Error Correction Coding: Mathematical Methods and Algorithms [2 ed.] 1119567475, 9781119567479


English | Pages: 992 [995] | Year: 2021


Table of contents:
Cover
Title Page
Copyright
Contents
Preface
List of Program Files
List of Laboratory Exercises
List of Algorithms
List of Figures
List of Tables
List of Boxes
About the Companion Website
Part I Introduction and Foundations
Chapter 1 A Context for Error Correction Coding
1.1 Purpose of This Book
1.2 Introduction: Where Are Codes?
1.3 The Communications System
1.4 Basic Digital Communications
1.4.1 Binary Phase‐Shift Keying
1.4.2 More General Digital Modulation
1.5 Signal Detection
1.5.1 The Gaussian Channel
1.5.2 MAP and ML Detection
1.5.3 Special Case: Binary Detection
1.5.4 Probability of Error for Binary Detection
1.5.5 Bounds on Performance: The Union Bound
1.5.6 The Binary Symmetric Channel
1.5.7 The BSC and the Gaussian Channel Model
1.6 Memoryless Channels
1.7 Simulation and Energy Considerations for Coded Signals
1.8 Some Important Definitions and a Trivial Code: Repetition Coding
1.8.1 Detection of Repetition Codes Over a BSC
1.8.2 Soft‐Decision Decoding of Repetition Codes Over the AWGN
1.8.3 Simulation of Results
1.8.4 Summary
1.9 Hamming Codes
1.9.1 Hard‐Input Decoding Hamming Codes
1.9.2 Other Representations of the Hamming Code
1.9.2.1 An Algebraic Representation
1.9.2.2 A Polynomial Representation
1.9.2.3 A Trellis Representation
1.9.2.4 The Tanner Graph Representation
1.10 The Basic Questions
1.11 Historical Milestones of Coding Theory
1.12 A Bit of Information Theory
1.12.1 Information‐Theoretic Definitions for Discrete Random Variables
1.12.1.1 Entropy and Conditional Entropy
1.12.1.2 Relative Entropy, Mutual Information, and Channel Capacity
1.12.2 Data Processing Inequality
1.12.3 Channels
1.12.3.1 Binary Symmetric Channel
1.12.3.2 Binary Erasure Channel
1.12.3.3 Noisy Typewriter
1.12.3.4 Symmetric Channels
1.12.4 Channel Capacity
1.12.5 Information Theoretic Definitions for Continuous Random Variables
1.12.6 The Channel Coding Theorem
1.12.7 “Proof” of the Channel Coding Theorem
1.12.8 Capacity for the Continuous‐Time AWGN Channel
1.12.9 Transmission at Capacity with Errors
1.12.10 The Implication of the Channel Coding Theorem
1.12.11 Non‐Asymptotic Information Theory
1.12.11.1 Discrete Channels
1.12.11.2 The AWGN Channel
1.12.11.3 Comparison of Codes
Programming Laboratory 1: Simulating a Communications Channel
Objective
Background
Use of Coding in Conjunction with the BSC
Assignment
Preliminary Exercises
Programming Part
BPSK Simulation
Resources and Implementation Suggestions
1.13 Exercises
1.14 References
Part II Block Codes
Chapter 2 Groups and Vector Spaces
2.1 Introduction
2.2 Groups
2.2.1 Subgroups
2.2.2 Cyclic Groups and the Order of an Element
2.2.3 Cosets
2.2.4 Lagrange's Theorem
2.2.5 Induced Operations; Isomorphism
2.2.6 Homomorphism
2.3 Fields: A Prelude
2.4 Review of Linear Algebra
2.5 Exercises
2.6 References
Chapter 3 Linear Block Codes
3.1 Basic Definitions
3.2 The Generator Matrix Description of Linear Block Codes
3.2.1 Rudimentary Implementation
3.3 The Parity Check Matrix and Dual Codes
3.3.1 Some Simple Bounds on Block Codes
3.4 Error Detection and Correction Over Hard‐Output Channels
3.4.1 Error Detection
3.4.2 Error Correction: The Standard Array
3.5 Weight Distributions of Codes and Their Duals
3.6 Binary Hamming Codes and Their Duals
3.7 Performance of Linear Codes
3.7.1 Error Detection Performance
3.7.2 Error Correction Performance
3.7.3 Performance for Soft‐Decision Decoding
3.8 Erasure Decoding
3.8.1 Binary Erasure Decoding
3.9 Modifications to Linear Codes
3.10 Best‐Known Linear Block Codes
3.11 Exercises
3.12 References
Chapter 4 Cyclic Codes, Rings, and Polynomials
4.1 Introduction
4.2 Basic Definitions
4.3 Rings
4.3.1 Rings of Polynomials
4.4 Quotient Rings
4.5 Ideals in Rings
4.6 Algebraic Description of Cyclic Codes
4.7 Nonsystematic Encoding and Parity Check
4.8 Systematic Encoding
4.9 Some Hardware Background
4.9.1 Computational Building Blocks
4.9.2 Sequences and Power Series
4.9.3 Polynomial Multiplication
4.9.3.1 Last‐Element‐First Processing
4.9.3.2 First‐Element‐First Processing
4.9.4 Polynomial Division
4.9.4.1 Last‐Element‐First Processing
4.9.5 Simultaneous Polynomial Division and Multiplication
4.9.5.1 First‐Element‐First Processing
4.10 Cyclic Encoding
4.11 Syndrome Decoding
4.12 Shortened Cyclic Codes
4.12.1 Method 1: Simulating the Extra Clock Shifts
4.12.2 Method 2: Changing the Error Pattern Detection Circuit
4.13 Binary CRC Codes
4.13.1 Byte‐Oriented Encoding and Decoding Algorithms
4.13.2 CRC Protecting Data Files or Data Packets
4.A Linear Feedback Shift Registers
4.A.1 Basic Concepts
4.A.2 Connection With Polynomial Division
4.A.3 Some Algebraic Properties of Shift Sequences
Programming Laboratory 2: Polynomial Division and Linear Feedback Shift Registers
Objective
Preliminary Exercises
Programming Part: BinLFSR
Resources and Implementation Suggestions
Programming Part: BinPolyDiv
Follow‐On Ideas and Problems
Programming Laboratory 3: CRC Encoding and Decoding
Objective
Preliminary
Programming Part
Resources and Implementation Suggestions
4.14 Exercise
4.15 References
Chapter 5 Rudiments of Number Theory and Algebra
5.1 Motivation
5.2 Number Theoretic Preliminaries
5.2.1 Divisibility
5.2.2 The Euclidean Algorithm and Euclidean Domains
5.2.3 An Application of the Euclidean Algorithm: The Sugiyama Algorithm
5.2.4 Congruence
5.2.5 The ϕ Function
5.2.6 Some Cryptographic Payoff
5.2.6.1 Fermat's Little Theorem
5.2.6.2 RSA Encryption
5.3 The Chinese Remainder Theorem
5.3.1 The CRT and Interpolation
5.3.1.1 The Evaluation Homomorphism
5.3.1.2 The Interpolation Problem
5.4 Fields
5.4.1 An Examination of R and C
5.4.2 Galois Field Construction: An Example
5.4.3 Connection with Linear Feedback Shift Registers
5.5 Galois Fields: Mathematical Facts
5.6 Implementing Galois Field Arithmetic
5.6.1 Zech Logarithms
5.6.2 Hardware Implementations
5.7 Subfields of Galois Fields
5.8 Irreducible and Primitive Polynomials
5.9 Conjugate Elements and Minimal Polynomials
5.9.1 Minimal Polynomials
5.10 Factoring xⁿ − 1
5.11 Cyclotomic Cosets
5.A How Many Irreducible Polynomials Are There?
5.A.1 Solving for Im Explicitly: The Moebius Function
Programming Laboratory 4: Programming the Euclidean Algorithm
Objective
Preliminary Exercises
Background
Programming Part
Programming Laboratory 5: Programming Galois Field Arithmetic
Objective
Preliminary Exercises
Programming Part
5.12 Exercise
5.13 References
Chapter 6 BCH and Reed–Solomon Codes: Designer Cyclic Codes
6.1 BCH Codes
6.1.1 Designing BCH Codes
6.1.2 The BCH Bound
6.1.3 Weight Distributions for Some Binary BCH Codes
6.1.4 Asymptotic Results for BCH Codes
6.2 Reed–Solomon Codes
6.2.1 Reed–Solomon Construction 1
6.2.2 Reed–Solomon Construction 2
6.2.3 Encoding Reed–Solomon Codes
6.2.4 MDS Codes and Weight Distributions for RS Codes
6.3 Decoding BCH and RS Codes: The General Outline
6.3.1 Computation of the Syndrome
6.3.2 The Error Locator Polynomial
6.3.3 Chien Search
6.4 Finding the Error Locator Polynomial
6.4.1 Simplifications for Narrow‐Sense Binary Codes; Peterson's Algorithm
6.4.2 Berlekamp–Massey Algorithm
6.4.3 Characterization of LFSR Length in Massey's Algorithm
6.4.4 Simplifications for Binary Codes
6.5 Nonbinary BCH and RS Decoding
6.5.1 Forney's Algorithm
6.6 Euclidean Algorithm for the Error Locator Polynomial
6.7 Erasure Decoding for Nonbinary BCH or RS Codes
6.8 Galois Field Fourier Transform Methods
6.8.1 Equivalence of the Two Reed–Solomon Code Constructions
6.8.2 Frequency‐Domain Decoding
6.9 Variations and Extensions of Reed–Solomon Codes
6.9.1 Simple Modifications
6.9.2 Generalized Reed–Solomon Codes and Alternant Codes
6.9.3 Goppa Codes
6.9.4 Decoding Alternant Codes
6.9.5 Cryptographic Connections: The McEliece Public Key Cryptosystem
Programming Laboratory 6: Programming the Berlekamp–Massey Algorithm
Background
Assignment
Preliminary Exercises
Programming Part
Resources and Implementation Suggestions
Programming Laboratory 7: Programming the BCH Decoder
Objective
Preliminary Exercises
Programming Part
Resources and Implementation Suggestions
Follow‐On Ideas and Problems
Programming Laboratory 8: Reed–Solomon Encoding and Decoding
Objective
Background
Programming Part
6.A Proof of Newton's Identities
6.11 Exercise
6.12 References
Chapter 7 Alternate Decoding Algorithms for Reed–Solomon Codes
7.1 Introduction: Workload for Reed–Solomon Decoding
7.2 Derivations of Welch–Berlekamp Key Equation
7.2.1 The Welch–Berlekamp Derivation of the WB Key Equation
7.2.1.1 Single error in a message location
7.2.1.2 Multiple errors in message locations
7.2.1.3 Errors in check locations
7.2.2 Derivation from the Conventional Key Equation
7.3 Finding the Error Values
7.4 Methods of Solving the WB Key Equation
7.4.1 Background: Modules
7.4.2 The Welch–Berlekamp Algorithm
7.4.3 A Modular Approach to the Solution of the WB Key Equation
7.5 Erasure Decoding with the WB Key Equation
7.6 The Guruswami–Sudan Decoding Algorithm and Soft RS Decoding
7.6.1 Bounded Distance, ML, and List Decoding
7.6.2 Error Correction by Interpolation
7.6.3 Polynomials in Two Variables
7.6.3.1 Degree and Monomial Order
7.6.3.2 Zeros and Multiple Zeros
7.6.4 The GS Decoder: The Main Theorems
7.6.4.1 The Interpolation Theorem
7.6.4.2 The Factorization Theorem
7.6.4.3 The Correction Distance
7.6.4.4 The Number of Polynomials in the Decoding List
7.6.5 Algorithms for Computing the Interpolation Step
7.6.5.1 Finding Linearly Dependent Columns: The Feng–Tzeng Algorithm
7.6.5.2 Finding the Intersection of Kernels: The Kötter Algorithm
7.6.6 A Special Case: m=1 and L=1
7.6.7 An Algorithm for the Factorization Step: The Roth–Ruckenstein Algorithm
7.6.7.1 What to Do with Lists of Factors?
7.6.8 Soft‐Decision Decoding of Reed–Solomon Codes
7.6.8.1 Notation
7.6.8.2 A Factorization Theorem
7.6.8.3 Mapping from Reliability to Multiplicity
7.6.8.4 The Geometry of the Decoding Regions
7.6.8.5 Computing the Reliability Matrix
7.7 Exercises
7.8 References
Chapter 8 Other Important Block Codes
8.1 Introduction
8.2 Hadamard Matrices, Codes, and Transforms
8.2.1 Introduction to Hadamard Matrices
8.2.2 The Paley Construction of Hadamard Matrices
8.2.3 Hadamard Codes
8.3 Reed–Muller Codes
8.3.1 Boolean Functions
8.3.2 Definition of the Reed–Muller Codes
8.3.3 Encoding and Decoding Algorithms for First‐Order RM Codes
8.3.3.1 Encoding RM(1,m) Codes
8.3.3.2 Decoding RM(1,m) Codes
8.3.3.3 Expediting Decoding Using the Fast Hadamard Transform
8.3.4 The Reed Decoding Algorithm for RM(r,m) Codes, r≥1
8.3.4.1 Details for an RM(2,4) Code
8.3.4.2 A Geometric Viewpoint
8.3.5 Other Constructions of Reed–Muller Codes
8.4 Building Long Codes from Short Codes: The Squaring Construction
8.5 Quadratic Residue Codes
8.6 Golay Codes
8.6.1 Decoding the Golay Code
8.6.1.1 Algebraic Decoding of the 𝒢23 Golay Code
8.6.1.2 Arithmetic Decoding of the 𝒢24 Code
8.7 Exercises
8.8 References
Chapter 9 Bounds on Codes
9.1 The Gilbert–Varshamov Bound
9.2 The Plotkin Bound
9.3 The Griesmer Bound
9.4 The Linear Programming and Related Bounds
9.4.1 Krawtchouk Polynomials
9.4.2 Character
9.4.3 Krawtchouk Polynomials and Characters
9.5 The McEliece–Rodemich–Rumsey–Welch Bound
9.6 Exercises
9.7 References
Chapter 10 Bursty Channels, Interleavers, and Concatenation
10.1 Introduction to Bursty Channels
10.2 Interleavers
10.3 An Application of Interleaved RS Codes: Compact Discs
10.4 Product Codes
10.5 Reed–Solomon Codes
10.6 Concatenated Codes
10.7 Fire Codes
10.7.1 Fire Code Definition
10.7.2 Decoding Fire Codes: Error Trapping Decoding
10.8 Exercises
10.9 References
Chapter 11 Soft‐Decision Decoding Algorithms
11.1 Introduction and General Notation
11.2 Generalized Minimum Distance Decoding
11.2.1 Distance Measures and Properties
11.3 The Chase Decoding Algorithms
11.4 Halting the Search: An Optimality Condition
11.5 Ordered Statistic Decoding
11.6 Soft Decoding Using the Dual Code: The Hartmann Rudolph Algorithm
11.7 Exercises
11.8 References
Part III Codes on Graphs
Chapter 12 Convolutional Codes
12.1 Introduction and Basic Notation
12.1.1 The State
12.2 Definition of Codes and Equivalent Codes
12.2.1 Catastrophic Encoders
12.2.2 Polynomial and Rational Encoders
12.2.3 Constraint Length and Minimal Encoders
12.2.4 Systematic Encoders
12.3 Decoding Convolutional Codes
12.3.1 Introduction and Notation
12.3.2 The Viterbi Algorithm
12.3.3 Some Implementation Issues
12.3.3.1 The Basic Operation: Add‐Compare‐Select
12.3.3.2 Decoding Streams of Data: Windows on the Trellis
12.3.3.3 Output Decisions
12.3.3.4 Hard and Soft Decoding; Quantization
12.3.3.5 Synchronization Issues
12.4 Some Performance Results
12.5 Error Analysis for Convolutional Codes
12.5.1 Enumerating Paths Through the Trellis
12.5.1.1 Enumerating on More Complicated Graphs: Mason's Rule
12.5.2 Characterizing Node Error Probability Pe and Bit Error Rate Pb
12.5.3 A Bound on Pd for Discrete Channels
12.5.3.1 Performance Bound on the BSC
12.5.4 A Bound on Pd for BPSK Signaling Over the AWGN Channel
12.5.5 Asymptotic Coding Gain
12.6 Tables of Good Codes
12.7 Puncturing
12.7.1 Puncturing to Achieve Variable Rate
12.8 Suboptimal Decoding Algorithms for Convolutional Codes
12.8.1 Tree Representations
12.8.2 The Fano Metric
12.8.3 The Stack Algorithm
12.8.4 The Fano Algorithm
12.8.5 Other Issues for Sequential Decoding
12.8.5.1 Computational complexity
12.8.5.2 Code design
12.8.5.3 Variations on sequential decoding algorithms
12.8.6 A Variation on the Viterbi Algorithm: The M Algorithm
12.9 Convolutional Codes as Block Codes and Tailbiting Codes
12.10 A Modified Expression for the Path Metric
12.11 Soft Output Viterbi Algorithm (SOVA)
12.12 Trellis Representations of Block and Cyclic Codes
12.12.1 Block Codes
12.12.2 Cyclic Codes
12.12.3 Trellis Decoding of Block Codes
Programming Laboratory 9: Programming Convolutional Encoders
Objective
Background
Programming Part
Programming Laboratory 10: Convolutional Decoders: The Viterbi Algorithm
Objective
Background
Programming Part
12.13 Exercises
12.14 References
Chapter 13 Trellis‐Coded Modulation
13.1 Adding Redundancy by Adding Signals
13.2 Background on Signal Constellations
13.3 TCM Example
13.3.1 The General Ungerboeck Coding Framework
13.3.2 The Set Partitioning Idea
13.4 Some Error Analysis for TCM Codes
13.4.1 General Considerations
13.4.2 A Description of the Error Events
13.4.3 Known Good TCM Codes
13.5 Decoding TCM Codes
13.6 Rotational Invariance
13.6.1 Differential Encoding
13.6.2 Constellation Labels and Partitions
13.7 Multidimensional TCM
13.7.1 Some Advantages of Multidimensional TCM
13.7.1.1 Energy expansion advantage
13.7.1.2 Sphere‐packing advantages
13.7.1.3 Spectral efficiency
13.7.1.4 Rotational invariance
13.7.1.5 Signal shape
13.7.1.6 Peak‐to‐average power ratio
13.7.1.7 Decoding speed
13.7.2 Lattices and Sublattices
13.7.2.1 Basic Definitions
13.7.2.2 Common Lattices
13.7.2.3 Sublattices and Cosets
13.7.2.4 The Lattice Code Idea
13.7.2.5 Sources of Coding Gain in Lattice Codes
13.7.2.6 Some Good Lattice Codes
13.8 Multidimensional TCM Example: The V.34 Modem Standard
13.9 Exercises
Programming Laboratory 11: Trellis‐Coded Modulation Encoding and Decoding
13.10 References
Part IV Iteratively Decoded Codes
Chapter 14 Turbo Codes
14.1 Introduction
14.2 Encoding Parallel Concatenated Codes
14.3 Turbo Decoding Algorithms
14.3.1 The MAP Decoding Algorithm
14.3.2 Notation
14.3.3 Posterior Probability
14.3.4 Computing αₜ and βₜ
14.3.5 Computing γₜ
14.3.6 Normalization
14.3.7 Summary of the BCJR Algorithm
14.3.8 A Matrix/Vector Formulation
14.3.9 Comparison of the Viterbi Algorithm and the BCJR Algorithm
14.3.10 The BCJR Algorithm for Systematic Codes
14.3.11 Turbo Decoding Using the BCJR Algorithm
14.3.11.1 The Terminal State of the Encoders
14.3.12 Likelihood Ratio Decoding
14.3.12.1 Log Prior Ratio λₚ,ₜ
14.3.12.2 Log Posterior λₛ,ₜ(0)
14.3.13 Statement of the Turbo Decoding Algorithm
14.3.14 Turbo Decoding Stopping Criteria
14.3.14.1 The Cross Entropy Stopping Criterion
14.3.14.2 The Sign Change Ratio (SCR) Criterion
14.3.14.3 The Hard Decision Aided (HDA) Criterion
14.3.15 Modifications of the MAP Algorithm
14.3.15.1 The Max‐Log‐MAP Algorithm
14.3.16 Corrections to the Max‐Log‐MAP Algorithm
14.3.17 The Soft‐Output Viterbi Algorithm
14.4 On the Error Floor and Weight Distributions
14.4.1 The Error Floor
14.4.2 Spectral Thinning and Random Permuters
14.4.3 On Permuters
14.5 EXIT Chart Analysis
14.5.1 The EXIT Chart
14.6 Block Turbo Coding
14.7 Turbo Equalization
14.7.1 Introduction to Turbo Equalization
14.7.2 The Framework for Turbo Equalization
Programming Laboratory 12: Turbo Code Decoding
Objective
Background
Programming Part
14.8 Exercise
14.9 References
Chapter 15 Low‐Density Parity‐Check Codes: Introduction, Decoding, and Analysis
15.1 Introduction
15.2 LDPC Codes: Construction and Notation
15.3 Tanner Graphs
15.4 Decoding LDPC Codes
15.4.1 Decoding Using Log‐Likelihood Ratios
15.4.1.1 Log‐Likelihood Ratios
15.4.1.2 Log‐Likelihood Ratio of the Sum of Bits
15.4.1.3 Decoding: Message from a Check Node to a Variable Node
15.4.1.4 Log Likelihood of Repeated Observations About a Bit
15.4.1.5 Decoding: Message from a Variable Node to a Check Node
15.4.1.6 Inputs to the LDPC Decoding Algorithm
15.4.1.7 Terminating the Decoding Algorithm
15.4.1.8 Summary of Decoding: The Belief Propagation Algorithm
15.4.1.9 Messages on the Tanner Graph
15.4.2 Decoding Using Probabilities
15.4.2.1 Probability of Even Parity: Decoding at the Check Nodes
15.4.2.2 Probability of Independent Observations Decoding at a Variable Node
15.4.2.3 Computing the Leave‐One‐Out Product
15.4.2.4 Sparse Memory Organization
15.4.3 Variations on Decoding Algorithms: The Min‐Sum Decoder
15.4.3.1 The ⊞ Operation and the ϕ(x) Function
15.4.3.2 Attenuated and Offset Min‐Sum Decoders
15.4.4 Variations on Decoding Algorithms: Min‐Sum with Correction
15.4.4.1 Approximate min* Decoder
15.4.4.2 The Reduced Complexity Box‐Plus Decoder
15.4.4.3 The Richardson–Novichkov Decoder
15.4.5 Hard‐Decision Decoding
15.4.5.1 Bit Flipping
15.4.5.2 Gallager's Algorithms A and B
15.4.5.3 Weighted Bit Flipping
15.4.5.4 Gradient Descent Bit Flipping
15.4.6 Divide and Concur Decoding
15.4.6.1 Summary of the Divide and Concur Algorithm
15.4.6.2 DC Applied to LDPC Decoding
15.4.6.3 The Divide Projections
15.4.6.4 The Concur Projection
15.4.6.5 A Message‐Passing Viewpoint of DC Decoding
15.4.7 Difference Map Belief Propagation Decoding
15.4.8 Linear Programming Decoding
15.4.8.1 Background on Linear Programming
15.4.8.2 Formulation of the Basic LP Decoding Algorithm
15.4.8.3 LP Relaxation
15.4.8.4 Examples and Discussion
15.4.9 Decoding on the Binary Erasure Channel
15.4.10 BEC Channels and Stopping Sets
15.5 Why Low‐Density Parity‐Check Codes?
15.6 The Iterative Decoder on General Block Codes
15.7 Density Evolution
15.8 EXIT Charts for LDPC Codes
15.9 Irregular LDPC Codes
15.9.1 Degree Distribution Pairs
15.9.2 Density Evolution for Irregular Codes
15.9.3 Computation and Optimization of Density Evolution
15.9.4 Using Irregular Codes
15.10 More on LDPC Code Construction
15.11 Encoding LDPC Codes
15.12 A Variation: Low‐Density Generator Matrix Codes
Programming Laboratory 13: Programming an LDPC Decoder
15.13 Exercise
15.14 References
Chapter 16 Low‐Density Parity‐Check Codes: Designs and Variations
16.1 Introduction
16.2 Repeat‐Accumulate Codes
16.2.1 Decoding RA Codes
16.2.2 Irregular RA Codes
16.2.3 RA Codes with Multiple Accumulators
16.3 LDPC Convolutional Codes
16.4 Quasi‐Cyclic Codes
16.4.1 QC Generator Matrices
16.4.2 Constructing QC Generator Matrices from QC Parity Check Matrices
16.4.2.1 Full Rank Case
16.4.2.2 Non‐Full Rank Case
16.5 Construction of LDPC Codes Based on Finite Fields
16.5.1 I. Construction of QC‐LDPC Codes Based on the Minimum‐Weight Codewords of a Reed–Solomon Code with Two Information Symbols
16.5.2 II. Construction of QC‐LDPC Codes Based on a Special Subclass of RS Codes
16.5.3 III. Construction of QC‐LDPC Codes Based on Subgroups of a Finite Field
16.5.4 IV. Construction of QC‐LDPC Codes Based on Subgroups of the Multiplicative Group of a Finite Field
16.5.5 Construction of QC‐LDPC Codes Based on Primitive Elements of a Field
16.6 Code Design Based on Finite Geometries
16.6.1 Rudiments of Euclidean Geometry
16.6.1.1 Points in EG(m,q)
16.6.1.2 Lines in EG(m,q)
16.6.1.3 Incidence vectors in EG*(m,q)
16.6.2 A Family of Cyclic EG‐LDPC Codes
16.6.3 Construction of LDPC Codes Based on Parallel Bundles of Lines
16.6.4 Constructions Based on Other Combinatoric Objects
16.7 Ensembles of LDPC Codes
16.7.1 Regular ensembles
16.7.2 Irregular Ensembles
16.7.3 Multi‐edge‐type Ensembles
16.8 Constructing LDPC Codes by Progressive Edge Growth (PEG)
16.9 Protograph and Multi‐Edge‐Type LDPC Codes
16.10 Error Floors and Trapping Sets
16.11 Importance Sampling
16.11.1 Importance Sampling: General Principles
16.11.2 Importance Sampling for Estimating Error Probability
Conventional Sampling (MC)
Importance Sampling (IS)
16.11.3 Importance Sampling for Tanner Trees
16.11.3.1 Single Parity‐Check Codes
16.11.3.2 Symmetric Tanner Trees
16.11.3.3 General Trees
16.11.3.4 Importance Sampling for LDPC Codes
16.12 Fountain Codes
16.12.1 Conventional Erasure Correction Codes
16.12.2 Tornado Codes
16.12.3 Luby Transform Codes
16.12.4 Raptor Codes
16.13 References
Part V Polar Codes
Chapter 17 Polar Codes
17.1 Introduction and Preview
17.2 Notation and Channels
17.3 Channel Polarization, N=2 Channel
17.3.1 Encoding
17.3.2 Synthetic Channels and Mutual Information
17.3.3 Synthetic Channel Transition Probabilities
17.3.4 An Example with N=2 Using the Binary Erasure Channel
17.3.5 Successive Cancellation Decoding
17.3.5.1 Log‐Likelihood Ratio Computations
17.3.5.2 Computing the Log‐Likelihood Function with Lower Complexity
17.3.6 Tree Representation
17.3.7 The Polar Coding Idea
17.4 Channel Polarization, N>2 Channels
17.4.1 Channel Combining: Extension from N/2 to N Channels
17.4.2 Pseudocode for Encoder for Arbitrary N
17.4.3 Transition Probabilities and Channel Splitting
17.4.4 Channel Polarization Demonstrated: An Example Using the Binary Erasure Channel for N>2
17.4.5 Polar Coding
17.4.6 Tree Representation
17.4.7 Successive Cancellation Decoding
17.4.8 SC Decoding from a Message Passing Point of View on the Tree
17.4.9 A Decoding Example with N=4
17.4.10 A Decoding Example with N=8
17.4.11 Pseudo‐Code Description of Successive Cancellation Algorithm
17.5 Some Theorems of Polar Coding Theory
17.5.1 I(W) and Z(W) for general B‐DMCs
17.5.2 Channel Polarization
17.5.3 The Polarization Theorem
17.5.4 A Few Facts About Martingales
17.5.5 Proof of the Polarization Theorem
17.5.6 Another Polarization Theorem
17.5.7 Rate of Polarization
17.5.8 Probability of Error Performance
17.6 Designing Polar Codes
17.6.1 Code Design by Bhattacharyya Bound
17.6.2 Monte Carlo Estimation of Z(WN(i))
17.7 Perspective: The Channel Coding Theorem
17.8 Systematic Encoding of Polar Codes
17.8.1 Polar Systematic Encoding via the Encoder Graph
17.8.2 Polar Systematic Encoding via Arıkan's Method
17.8.3 Systematic Encoding: The Bit Reverse Permutation
17.8.4 Decoding Systematically Encoded Codes
17.8.5 Flexible Systematic Encoding
17.8.6 Involutions and Domination Contiguity
17.8.7 Polar Codes and Domination Contiguity
17.8.8 Modifications for Polar Codes with Bit‐Reverse Permutation
17.9 List Decoding of Polar Codes
17.9.1 The Likelihood Data Structure P
17.9.2 Normalization
17.9.3 Code to Compute P
17.9.4 The Bits Data Structure C
17.9.5 Code to Compute C
17.9.6 Supporting Data Structures
17.9.7 Code for Polar List Decoding
17.9.8 An Example of List Decoding
17.9.9 Computational Complexity
17.9.10 Modified Polar Codes
17.10 LLR‐Based Successive Cancellation List Decoding
17.10.1 Implementation Considerations
17.11 Simplified Successive Cancellation Decoding
17.11.1 Modified SC Decoding
17.12 Relations with Reed–Muller Codes
17.13 Hardware and Throughput Performance Results
17.14 Generalizations, Extensions, and Variations
17.A BN is a Bit‐Reverse Permutation
17.B The Relationship of the Bhattacharyya Parameter to Channel Capacity
17.B.1 Error Probability for Two Codewords
17.B.2 Proof of Inequality (17.59)
17.B.3 Proof of Inequality (17.60) [16]
17.C Proof of Theorem 17.12
17.15 Exercises
17.16 References
Part VI Applications
Chapter 18 Some Applications of Error Correction in Modern Communication Systems
18.1 Introduction
18.2 Digital Video Broadcast T2 (DVB‐T2)
18.2.1 BCH Outer Encoding
18.2.2 LDPC Inner Encoding
18.3 Digital Cable Television
18.4 E‐UTRA and Long‐Term Evolution
18.4.1 LTE Rate 1/3 Convolutional Encoder
18.4.2 LTE Turbo Code
18.5 References
Part VII Space-Time Coding
Chapter 19 Fading Channels and Space‐Time Codes
19.1 Introduction
19.2 Fading Channels
19.2.1 Rayleigh Fading
19.3 Diversity Transmission and Reception: The MIMO Channel
19.3.1 The Narrowband MIMO Channel
19.3.2 Diversity Performance with Maximal‐Ratio Combining
19.4 Space‐Time Block Codes
19.4.1 The Alamouti Code
19.4.2 A More General Formulation
19.4.3 Performance Calculation
19.4.3.1 Real Orthogonal Designs
19.4.3.2 Encoding and Decoding Based on Orthogonal Designs
19.4.3.3 Generalized Real Orthogonal Designs
19.4.4 Complex Orthogonal Designs
19.4.4.1 Future Work
19.5 Space‐Time Trellis Codes
19.5.1 Concatenation
19.6 How Many Antennas?
19.7 Estimating Channel Information
19.8 Exercises
19.9 References
References
Index
EULA

Citation preview

Error Correction Coding

Error Correction Coding
Mathematical Methods and Algorithms

Second Edition

Todd K. Moon
Utah State University, Utah, USA

This second edition first published 2021
© 2021 by John Wiley and Sons, Inc.

Edition History
John Wiley and Sons, Inc. (1e, 2005)

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Todd K. Moon to be identified as the author of this work has been asserted in accordance with law.

Registered Office
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office
111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty
MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the accuracy of the text or exercises in this book. This work's use or discussion of MATLAB® software or related products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® software.

While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging-in-Publication Data
Names: Moon, Todd K., author.
Title: Error correction coding : mathematical methods and algorithms / Todd K. Moon.
Description: Second edition. | Hoboken, NJ : Wiley, 2021. | Includes bibliographical references and index.
Identifiers: LCCN 2020027697 (print) | LCCN 2020027698 (ebook) | ISBN 9781119567479 (cloth) | ISBN 9781119567486 (adobe pdf) | ISBN 9781119567493 (epub)
Subjects: LCSH: Engineering mathematics. | Error-correcting codes (Information theory)
Classification: LCC TA331 .M66 2021 (print) | LCC TA331 (ebook) | DDC 621.3820285/572–dc23
LC record available at https://lccn.loc.gov/2020027697
LC ebook record available at https://lccn.loc.gov/2020027698

Cover Design: Wiley
Cover Image: © Putt Sakdhnagool/Getty Images

Set in 10/12pt NimbusRomNo9L by SPi Global, Chennai, India

10 9 8 7 6 5 4 3 2 1

Contents Preface

xvii

List of Program Files

xxiii

List of Laboratory Exercises

xxix

List of Algorithms

xxxi

List of Figures

xxxiii

List of Tables

xli

List of Boxes

xliii

About the Companion Website

Part I Introduction and Foundations 1

A Context for Error Correction Coding 1.1 Purpose of This Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2 Introduction: Where Are Codes? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3 The Communications System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.4 Basic Digital Communications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.4.1 Binary Phase-Shift Keying . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.4.2 More General Digital Modulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.5 Signal Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.5.1 The Gaussian Channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.5.2 MAP and ML Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.5.3 Special Case: Binary Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.5.4 Probability of Error for Binary Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.5.5 Bounds on Performance: The Union Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.5.6 The Binary Symmetric Channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.5.7 The BSC and the Gaussian Channel Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.6 Memoryless Channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.7 Simulation and Energy Considerations for Coded Signals . . . . . . . . . . . . . . . . . . . . . . . . . 1.8 Some Important Definitions and a Trivial Code: Repetition Coding . . . . . . . . . . . . . . . . 1.8.1 Detection of Repetition Codes Over a BSC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.8.2 Soft-Decision Decoding of Repetition Codes Over the AWGN . . . . . . . . . . . . 1.8.3 Simulation of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.8.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.9 Hamming Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.9.1 Hard-Input Decoding Hamming Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.9.2 Other Representations of the Hamming Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.10 The Basic Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.11 Historical Milestones of Coding Theory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.12 A Bit of Information Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.12.1 Information- Theoretic Definitions for Discrete Random Variables . . . . . . . . . 1.12.2 Data Processing Inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.12.3 Channels . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.12.4 Channel Capacity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.12.5 Information Theoretic Definitions for Continuous Random Variables . . . . . . . 1.12.6 The Channel Coding Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

xlv

1 3 3 3 4 10 10 11 14 14 16 18 19 21 23 24 24 25 27 27 31 32 32 32 34 36 38 39 39 39 43 44 46 47 48

vi

Contents 1.12.7 “Proof” of the Channel Coding Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.12.8 Capacity for the Continuous-Time AWGN Channel . . . . . . . . . . . . . . . . . . . . . . 1.12.9 Transmission at Capacity with Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.12.10 The Implication of the Channel Coding Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 1.12.11 Non-Asymptotic Information Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lab 1 Simulating a Communications Channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Use of Coding in Conjunction with the BSC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Programming Part . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Resources and Implementation Suggestions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.13 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.14 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

49 53 54 55 56 60 60 60 61 61 61 62 64 68

Part II Block Codes

69

2

Groups and Vector Spaces 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.1 Subgroups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.2 Cyclic Groups and the Order of an Element . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.3 Cosets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.4 Lagrange’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.5 Induced Operations; Isomorphism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.6 Homomorphism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Fields: A Prelude . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4 Review of Linear Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.6 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

71 71 71 74 75 76 77 78 81 82 84 89 91

3

Linear Block Codes 3.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 The Generator Matrix Description of Linear Block Codes . . . . . . . . . . . . . . . . . . . . . . . . . 3.2.1 Rudimentary Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3 The Parity Check Matrix and Dual Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3.1 Some Simple Bounds on Block Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4 Error Detection and Correction Over Hard-Output Channels . . . . . . . . . . . . . . . . . . . . . . 3.4.1 Error Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.2 Error Correction: The Standard Array . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.5 Weight Distributions of Codes and Their Duals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.6 Binary Hamming Codes and Their Duals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.7 Performance of Linear Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.7.1 Error Detection Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.7.2 Error Correction Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.7.3 Performance for Soft-Decision Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.8 Erasure Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.8.1 Binary Erasure Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.9 Modifications to Linear Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.10 Best-Known Linear Block Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.11 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.12 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

93 93 94 96 96 98 99 100 100 104 106 108 108 109 112 113 114 114 116 116 121

Contents

vii

4

Cyclic Codes, Rings, and Polynomials 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3 Rings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3.1 Rings of Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.4 Quotient Rings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.5 Ideals in Rings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.6 Algebraic Description of Cyclic Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.7 Nonsystematic Encoding and Parity Check . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.8 Systematic Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.9 Some Hardware Background. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.9.1 Computational Building Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.9.2 Sequences and Power Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.9.3 Polynomial Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.9.4 Polynomial Division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.9.5 Simultaneous Polynomial Division and Multiplication . . . . . . . . . . . . . . . . . . . . 4.10 Cyclic Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.11 Syndrome Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.12 Shortened Cyclic Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.12.1 Method 1: Simulating the Extra Clock Shifts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.12.2 Method 2: Changing the Error Pattern Detection Circuit . . . . . . . . . . . . . . . . . . 4.13 Binary CRC Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.13.1 Byte-Oriented Encoding and Decoding Algorithms . . . . . . . . . . . . . . . . . . . . . . . 4.13.2 CRC Protecting Data Files or Data Packets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Appendix 4.A Linear Feedback Shift Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Appendix 4.A.1 Basic Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Appendix 4.A.2 Connection With Polynomial Division . . . . . . . . . . . . . . . . . . . . . . . . . . . Appendix 4.A.3 Some Algebraic Properties of Shift Sequences . . . . . . . . . . . . . . . . . . . . Lab 2 Polynomial Division and Linear Feedback Shift Registers . . . . . . . . . . . . . Objective . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Preliminary Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Programming Part: BinLFSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Resources and Implementation Suggestions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Programming Part: BinPolyDiv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Follow-On Ideas and Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lab 3 CRC Encoding and Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Preliminary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Programming Part . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Resources and Implementation Suggestions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.14 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.15 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

123 123 123 124 125 126 128 130 131 134 136 136 137 137 138 141 143 147 153 154 155 155 157 161 161 162 165 167 168 168 168 168 169 169 170 170 170 170 170 171 172 178

5

Rudiments of Number Theory and Algebra 5.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 Number Theoretic Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.1 Divisibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.2 The Euclidean Algorithm and Euclidean Domains . . . . . . . . . . . . . . . . . . . . . . . . 5.2.3 An Application of the Euclidean Algorithm: The Sugiyama Algorithm . . . . . 5.2.4 Congruence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.5 The 𝜙 Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.6 Some Cryptographic Payoff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

179 179 182 182 184 189 192 193 193

viii

Contents

6

5.3

The Chinese Remainder Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3.1 The CRT and Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4 Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4.1 An Examination of ℝ and ℂ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4.2 Galois Field Construction: An Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4.3 Connection with Linear Feedback Shift Registers . . . . . . . . . . . . . . . . . . . . . . . . 5.5 Galois Fields: Mathematical Facts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.6 Implementing Galois Field Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.6.1 Zech Logarithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.6.2 Hardware Implementations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.7 Subfields of Galois Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.8 Irreducible and Primitive Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.9 Conjugate Elements and Minimal Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.9.1 Minimal Polynomials. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.10 Factoring xn − 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.11 Cyclotomic Cosets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Appendix 5.A How Many Irreducible Polynomials Are There? . . . . . . . . . . . . . . . . . . . . . . . . . . Appendix 5.A.1 Solving for Im Explicitly: The Moebius Function . . . . . . . . . . . . . . . . . Lab 4 Programming the Euclidean Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Preliminary Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Programming Part . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lab 5 Programming Galois Field Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Preliminary Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Programming Part . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . 5.13 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

196 198 200 201 204 207 207 211 211 212 213 214 216 218 222 223 224 227 228 228 228 228 229 229 229 229 230 231 239

BCH and Reed–Solomon Codes: Designer Cyclic Codes 6.1 BCH Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1.1 Designing BCH Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1.2 The BCH Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1.3 Weight Distributions for Some Binary BCH Codes . . . . . . . . . . . . . . . . . . . . . . . 6.1.4 Asymptotic Results for BCH Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2 Reed–Solomon Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2.1 Reed–Solomon Construction 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2.2 Reed–Solomon Construction 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2.3 Encoding Reed–Solomon Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2.4 MDS Codes and Weight Distributions for RS Codes . . . . . . . . . . . . . . . . . . . . . . 6.3 Decoding BCH and RS Codes: The General Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.3.1 Computation of the Syndrome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.3.2 The Error Locator Polynomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.3.3 Chien Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.4 Finding the Error Locator Polynomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.4.1 Simplifications for Narrow-Sense Binary Codes; Peterson’s Algorithm . . . . . 6.4.2 Berlekamp–Massey Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.4.3 Characterization of LFSR Length in Massey’s Algorithm . . . . . . . . . . . . . . . . . 6.4.4 Simplifications for Binary Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.5 Nonbinary BCH and RS Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.5.1 Forney’s Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

241 241 241 244 246 248 249 249 250 251 252 253 254 255 255 256 258 261 262 266 268 268

Contents

7

ix

6.6 6.7 6.8

Euclidean Algorithm for the Error Locator Polynomial
Erasure Decoding for Nonbinary BCH or RS Codes
Galois Field Fourier Transform Methods
6.8.1 Equivalence of the Two Reed–Solomon Code Constructions
6.8.2 Frequency-Domain Decoding
6.9 Variations and Extensions of Reed–Solomon Codes
6.9.1 Simple Modifications
6.9.2 Generalized Reed–Solomon Codes and Alternant Codes
6.9.3 Goppa Codes
6.9.4 Decoding Alternant Codes
6.9.5 Cryptographic Connections: The McEliece Public Key Cryptosystem
Lab 6 Programming the Berlekamp–Massey Algorithm
Background
Assignment
Preliminary Exercises
Programming Part
Resources and Implementation Suggestions
Lab 7 Programming the BCH Decoder
Objective
Preliminary Exercises
Programming Part
Resources and Implementation Suggestions
Follow-On Ideas and Problems
Lab 8 Reed–Solomon Encoding and Decoding
Objective
Background
Programming Part
Appendix 6.A Proof of Newton's Identities
6.11 Exercise
6.12 References


7 Alternate Decoding Algorithms for Reed–Solomon Codes
7.1 Introduction: Workload for Reed–Solomon Decoding
7.2 Derivations of Welch–Berlekamp Key Equation
7.2.1 The Welch–Berlekamp Derivation of the WB Key Equation
7.2.2 Derivation from the Conventional Key Equation
7.3 Finding the Error Values
7.4 Methods of Solving the WB Key Equation
7.4.1 Background: Modules
7.4.2 The Welch–Berlekamp Algorithm
7.4.3 A Modular Approach to the Solution of the WB Key Equation
7.5 Erasure Decoding with the WB Key Equation
7.6 The Guruswami–Sudan Decoding Algorithm and Soft RS Decoding
7.6.1 Bounded Distance, ML, and List Decoding
7.6.2 Error Correction by Interpolation
7.6.3 Polynomials in Two Variables
7.6.4 The GS Decoder: The Main Theorems
7.6.5 Algorithms for Computing the Interpolation Step
7.6.6 A Special Case: m = 1 and L = 1
7.6.7 An Algorithm for the Factorization Step: The Roth–Ruckenstein Algorithm
7.6.8 Soft-Decision Decoding of Reed–Solomon Codes
7.7 Exercises
7.8 References


8 Other Important Block Codes
8.1 Introduction
8.2 Hadamard Matrices, Codes, and Transforms
8.2.1 Introduction to Hadamard Matrices
8.2.2 The Paley Construction of Hadamard Matrices
8.2.3 Hadamard Codes
8.3 Reed–Muller Codes
8.3.1 Boolean Functions
8.3.2 Definition of the Reed–Muller Codes
8.3.3 Encoding and Decoding Algorithms for First-Order RM Codes
8.3.4 The Reed Decoding Algorithm for RM(r, m) Codes, r ≥ 1
8.3.5 Other Constructions of Reed–Muller Codes
8.4 Building Long Codes from Short Codes: The Squaring Construction
8.5 Quadratic Residue Codes
8.6 Golay Codes
8.6.1 Decoding the Golay Code
8.7 Exercises
8.8 References


9 Bounds on Codes
9.1 The Gilbert–Varshamov Bound
9.2 The Plotkin Bound
9.3 The Griesmer Bound
9.4 The Linear Programming and Related Bounds
9.4.1 Krawtchouk Polynomials
9.4.2 Character
9.4.3 Krawtchouk Polynomials and Characters
9.5 The McEliece–Rodemich–Rumsey–Welch Bound
9.6 Exercises
9.7 References


10 Bursty Channels, Interleavers, and Concatenation
10.1 Introduction to Bursty Channels
10.2 Interleavers
10.3 An Application of Interleaved RS Codes: Compact Discs
10.4 Product Codes
10.5 Reed–Solomon Codes
10.6 Concatenated Codes
10.7 Fire Codes
10.7.1 Fire Code Definition
10.7.2 Decoding Fire Codes: Error Trapping Decoding
10.8 Exercises
10.9 References


11 Soft-Decision Decoding Algorithms
11.1 Introduction and General Notation
11.2 Generalized Minimum Distance Decoding
11.2.1 Distance Measures and Properties
11.3 The Chase Decoding Algorithms
11.4 Halting the Search: An Optimality Condition
11.5 Ordered Statistic Decoding
11.6 Soft Decoding Using the Dual Code: The Hartmann Rudolph Algorithm


11.7 Exercises
11.8 References


Part III Codes on Graphs


12 Convolutional Codes
12.1 Introduction and Basic Notation
12.1.1 The State
12.2 Definition of Codes and Equivalent Codes
12.2.1 Catastrophic Encoders
12.2.2 Polynomial and Rational Encoders
12.2.3 Constraint Length and Minimal Encoders
12.2.4 Systematic Encoders
12.3 Decoding Convolutional Codes
12.3.1 Introduction and Notation
12.3.2 The Viterbi Algorithm
12.3.3 Some Implementation Issues
12.4 Some Performance Results
12.5 Error Analysis for Convolutional Codes
12.5.1 Enumerating Paths Through the Trellis
12.5.2 Characterizing Node Error Probability Pe and Bit Error Rate Pb
12.5.3 A Bound on Pd for Discrete Channels
12.5.4 A Bound on Pd for BPSK Signaling Over the AWGN Channel
12.5.5 Asymptotic Coding Gain
12.6 Tables of Good Codes
12.7 Puncturing
12.7.1 Puncturing to Achieve Variable Rate
12.8 Suboptimal Decoding Algorithms for Convolutional Codes
12.8.1 Tree Representations
12.8.2 The Fano Metric
12.8.3 The Stack Algorithm
12.8.4 The Fano Algorithm
12.8.5 Other Issues for Sequential Decoding
12.8.6 A Variation on the Viterbi Algorithm: The M Algorithm
12.9 Convolutional Codes as Block Codes and Tailbiting Codes
12.10 A Modified Expression for the Path Metric
12.11 Soft Output Viterbi Algorithm (SOVA)
12.12 Trellis Representations of Block and Cyclic Codes
12.12.1 Block Codes
12.12.2 Cyclic Codes
12.12.3 Trellis Decoding of Block Codes
Lab 9 Programming Convolutional Encoders
Objective
Background
Programming Part
Lab 10 Convolutional Decoders: The Viterbi Algorithm
Objective
Background
Programming Part



12.13 Exercises
12.14 References


13 Trellis-Coded Modulation
13.1 Adding Redundancy by Adding Signals
13.2 Background on Signal Constellations
13.3 TCM Example
13.3.1 The General Ungerboeck Coding Framework
13.3.2 The Set Partitioning Idea
13.4 Some Error Analysis for TCM Codes
13.4.1 General Considerations
13.4.2 A Description of the Error Events
13.4.3 Known Good TCM Codes
13.5 Decoding TCM Codes
13.6 Rotational Invariance
13.6.1 Differential Encoding
13.6.2 Constellation Labels and Partitions
13.7 Multidimensional TCM
13.7.1 Some Advantages of Multidimensional TCM
13.7.2 Lattices and Sublattices
13.8 Multidimensional TCM Example: The V.34 Modem Standard
13.9 Exercises
Lab 11 Trellis-Coded Modulation Encoding and Decoding
Objective
Background
Programming Part
13.10 References


Part IV Iteratively Decoded Codes


14 Turbo Codes
14.1 Introduction
14.2 Encoding Parallel Concatenated Codes
14.3 Turbo Decoding Algorithms
14.3.1 The MAP Decoding Algorithm
14.3.2 Notation
14.3.3 Posterior Probability
14.3.4 Computing αt and βt
14.3.5 Computing γt
14.3.6 Normalization
14.3.7 Summary of the BCJR Algorithm
14.3.8 A Matrix/Vector Formulation
14.3.9 Comparison of the Viterbi Algorithm and the BCJR Algorithm
14.3.10 The BCJR Algorithm for Systematic Codes
14.3.11 Turbo Decoding Using the BCJR Algorithm
14.3.12 Likelihood Ratio Decoding
14.3.13 Statement of the Turbo Decoding Algorithm
14.3.14 Turbo Decoding Stopping Criteria
14.3.15 Modifications of the MAP Algorithm
14.3.16 Corrections to the Max-Log-MAP Algorithm
14.3.17 The Soft-Output Viterbi Algorithm



14.4 On the Error Floor and Weight Distributions
14.4.1 The Error Floor
14.4.2 Spectral Thinning and Random Permuters
14.4.3 On Permuters
14.5 EXIT Chart Analysis
14.5.1 The EXIT Chart
14.6 Block Turbo Coding
14.7 Turbo Equalization
14.7.1 Introduction to Turbo Equalization
14.7.2 The Framework for Turbo Equalization
Lab 12 Turbo Code Decoding
Objective
Background
Programming Part
14.8 Exercise
14.9 References

616 616 618 621 622 624 626 628 628 629 631 631 631 631 631 634

15 Low-Density Parity-Check Codes: Introduction, Decoding, and Analysis
15.1 Introduction
15.2 LDPC Codes: Construction and Notation
15.3 Tanner Graphs
15.4 Decoding LDPC Codes
15.4.1 Decoding Using Log-Likelihood Ratios
15.4.2 Decoding Using Probabilities
15.4.3 Variations on Decoding Algorithms: The Min-Sum Decoder
15.4.4 Variations on Decoding Algorithms: Min-Sum with Correction
15.4.5 Hard-Decision Decoding
15.4.6 Divide and Concur Decoding
15.4.7 Difference Map Belief Propagation Decoding
15.4.8 Linear Programming Decoding
15.4.9 Decoding on the Binary Erasure Channel
15.4.10 BEC Channels and Stopping Sets
15.5 Why Low-Density Parity-Check Codes?
15.6 The Iterative Decoder on General Block Codes
15.7 Density Evolution
15.8 EXIT Charts for LDPC Codes
15.9 Irregular LDPC Codes
15.9.1 Degree Distribution Pairs
15.9.2 Density Evolution for Irregular Codes
15.9.3 Computation and Optimization of Density Evolution
15.9.4 Using Irregular Codes
15.10 More on LDPC Code Construction
15.11 Encoding LDPC Codes
15.12 A Variation: Low-Density Generator Matrix Codes
Lab 13 Programming an LDPC Decoder
Objective
Background
Assignment
Numerical Considerations
15.13 Exercise
15.14 References


16 Low-Density Parity-Check Codes: Designs and Variations
16.1 Introduction
16.2 Repeat-Accumulate Codes
16.2.1 Decoding RA Codes
16.2.2 Irregular RA Codes
16.2.3 RA Codes with Multiple Accumulators
16.3 LDPC Convolutional Codes
16.4 Quasi-Cyclic Codes
16.4.1 QC Generator Matrices
16.4.2 Constructing QC Generator Matrices from QC Parity Check Matrices
16.5 Construction of LDPC Codes Based on Finite Fields
16.5.1 I. Construction of QC-LDPC Codes Based on the Minimum-Weight Codewords of a Reed–Solomon Code with Two Information Symbols
16.5.2 II. Construction of QC-LDPC Codes Based on a Special Subclass of RS Codes
16.5.3 III. Construction of QC-LDPC Codes Based on Subgroups of a Finite Field
16.5.4 IV. Construction of QC-LDPC Codes Based on Subgroups of the Multiplicative Group of a Finite Field
16.5.5 Construction of QC-LDPC Codes Based on Primitive Elements of a Field
16.6 Code Design Based on Finite Geometries
16.6.1 Rudiments of Euclidean Geometry
16.6.2 A Family of Cyclic EG-LDPC Codes
16.6.3 Construction of LDPC Codes Based on Parallel Bundles of Lines
16.6.4 Constructions Based on Other Combinatoric Objects
16.7 Ensembles of LDPC Codes
16.7.1 Regular Ensembles
16.7.2 Irregular Ensembles
16.7.3 Multi-edge-type Ensembles
16.8 Constructing LDPC Codes by Progressive Edge Growth (PEG)
16.9 Protograph and Multi-Edge-Type LDPC Codes
16.10 Error Floors and Trapping Sets
16.11 Importance Sampling
16.11.1 Importance Sampling: General Principles
16.11.2 Importance Sampling for Estimating Error Probability
Conventional Sampling (MC)
Importance Sampling (IS)
16.11.3 Importance Sampling for Tanner Trees
16.12 Fountain Codes
16.12.1 Conventional Erasure Correction Codes
16.12.2 Tornado Codes
16.12.3 Luby Transform Codes
16.12.4 Raptor Codes
16.13 References


Part V Polar Codes


17 Polar Codes
17.1 Introduction and Preview
17.2 Notation and Channels
17.3 Channel Polarization, N = 2 Channel
17.3.1 Encoding
17.3.2 Synthetic Channels and Mutual Information


17.3.3 Synthetic Channel Transition Probabilities
17.3.4 An Example with N = 2 Using the Binary Erasure Channel
17.3.5 Successive Cancellation Decoding
17.3.6 Tree Representation
17.3.7 The Polar Coding Idea
17.4 Channel Polarization, N > 2 Channels
17.4.1 Channel Combining: Extension from N/2 to N Channels
17.4.2 Pseudocode for Encoder for Arbitrary N
17.4.3 Transition Probabilities and Channel Splitting
17.4.4 Channel Polarization Demonstrated: An Example Using the Binary Erasure Channel for N > 2
17.4.5 Polar Coding
17.4.6 Tree Representation
17.4.7 Successive Cancellation Decoding
17.4.8 SC Decoding from a Message Passing Point of View on the Tree
17.4.9 A Decoding Example with N = 4
17.4.10 A Decoding Example with N = 8
17.4.11 Pseudo-Code Description of Successive Cancellation Algorithm
17.5 Some Theorems of Polar Coding Theory
17.5.1 I(W) and Z(W) for general B-DMCs
17.5.2 Channel Polarization
17.5.3 The Polarization Theorem
17.5.4 A Few Facts About Martingales
17.5.5 Proof of the Polarization Theorem
17.5.6 Another Polarization Theorem
17.5.7 Rate of Polarization
17.5.8 Probability of Error Performance
17.6 Designing Polar Codes
17.6.1 Code Design by Battacharyya Bound
17.6.2 Monte Carlo Estimation of Z(W_N^(i))
17.7 Perspective: The Channel Coding Theorem
17.8 Systematic Encoding of Polar Codes
17.8.1 Polar Systematic Encoding via the Encoder Graph
17.8.2 Polar Systematic Encoding via Arıkan's Method
17.8.3 Systematic Encoding: The Bit Reverse Permutation
17.8.4 Decoding Systematically Encoded Codes
17.8.5 Flexible Systematic Encoding
17.8.6 Involutions and Domination Contiguity
17.8.7 Polar Codes and Domination Contiguity
17.8.8 Modifications for Polar Codes with Bit-Reverse Permutation
17.9 List Decoding of Polar Codes
17.9.1 The Likelihood Data Structure P
17.9.2 Normalization
17.9.3 Code to Compute P
17.9.4 The Bits Data Structure C
17.9.5 Code to Compute C
17.9.6 Supporting Data Structures
17.9.7 Code for Polar List Decoding
17.9.8 An Example of List Decoding
17.9.9 Computational Complexity
17.9.10 Modified Polar Codes
17.10 LLR-Based Successive Cancellation List Decoding
17.10.1 Implementation Considerations


17.11 Simplified Successive Cancellation Decoding
17.11.1 Modified SC Decoding
17.12 Relations with Reed–Muller Codes
17.13 Hardware and Throughput Performance Results
17.14 Generalizations, Extensions, and Variations
Appendix 17.A BN is a Bit-Reverse Permutation
Appendix 17.B The Relationship of the Battacharyya Parameter to Channel Capacity
Appendix 17.B.1 Error Probability for Two Codewords
Appendix 17.B.2 Proof of Inequality (17.59)
Appendix 17.B.3 Proof of Inequality (17.60) [16]
Appendix 17.C Proof of Theorem 17.12
17.15 Exercises
17.16 References


Part VI Applications


18 Some Applications of Error Correction in Modern Communication Systems
18.1 Introduction
18.2 Digital Video Broadcast T2 (DVB-T2)
18.2.1 BCH Outer Encoding
18.2.2 LDPC Inner Encoding
18.3 Digital Cable Television
18.4 E-UTRA and Long-Term Evolution
18.4.1 LTE Rate 1/3 Convolutional Encoder
18.4.2 LTE Turbo Code
18.5 References


Part VII Space-Time Coding


19 Fading Channels and Space-Time Codes
19.1 Introduction
19.2 Fading Channels
19.2.1 Rayleigh Fading
19.3 Diversity Transmission and Reception: The MIMO Channel
19.3.1 The Narrowband MIMO Channel
19.3.2 Diversity Performance with Maximal-Ratio Combining
19.4 Space-Time Block Codes
19.4.1 The Alamouti Code
19.4.2 A More General Formulation
19.4.3 Performance Calculation
19.4.4 Complex Orthogonal Designs
19.5 Space-Time Trellis Codes
19.5.1 Concatenation
19.6 How Many Antennas?
19.7 Estimating Channel Information
19.8 Exercises
19.9 References


References


Index


Preface

The goals of this second edition are much the same as those of the first edition: to provide a comprehensive introduction to error correction coding suitable for the engineering practitioner and to provide a solid foundation leading to more advanced treatments or research. Since the first edition, the content has been modernized to include substantially more on LDPC code design and decoder algorithms, as well as substantial coverage of polar codes. The thorough introduction to finite fields and algebraic codes of the first edition has been retained, with some additions in finite geometries used for LDPC code design. As observed in the first edition, the sophistication of the mathematical tools used increases over time. While keeping with the sense that this is the first time most readers will have seen these tools, a somewhat higher degree of sophistication is needed in some places. The presentation is intended to provide a background useful both to engineers, who need to understand algorithmic aspects for the deployment and implementation of error correction coding, and to researchers, who need sufficient background to prepare them to read, understand, and ultimately contribute to the research literature. The practical algorithmic aspects are built upon a firm foundation of mathematics, which is carefully motivated and developed.

Pedagogical Features

Since its inception, coding theory has drawn from a richly interacting variety of mathematical areas, including detection theory, information theory, linear algebra, finite geometries, combinatorics, optimization, system theory, probability, algebraic geometry, graph theory, statistical designs, Boolean functions, number theory, and modern algebra. The level of sophistication has increased over time: algebra has progressed from vector spaces to modules; practice has moved from polynomial interpolation to rational interpolation; Viterbi makes way for SOVA and BCJR. This richness can be bewildering to students, particularly engineering students who may be unaccustomed to posing problems and thinking abstractly. It is important, therefore, to motivate the mathematics carefully. Some of the major pedagogical features of the book are as follows.



While most engineering-oriented error-correction-coding textbooks clump the major mathematical concepts into a single chapter, in this book the concepts are developed over several chapters so they can be put to more immediate use. I have attempted to present the mathematics “just in time,” when they are needed and well-motivated. Groups and linear algebra suffice to describe linear block codes. Cyclic codes motivate polynomial rings. The design of cyclic codes motivates finite fields and associated number-theoretical tools. By interspersing the mathematical concepts with applications, a deeper and broader understanding is possible.



For most engineering students, finite fields, the Berlekamp–Massey algorithm, the Viterbi algorithm, BCJR, and other aspects of coding theory are abstract and subtle. Software implementations of the algorithms bring these abstractions closer to a meaningful reality, bringing deeper understanding than is possible by simply working homework problems and taking tests. Even when students grasp the concepts well enough to do homework on paper, these programs provide a further emphasis, as well as tools to help with the homework. The understanding becomes experiential, more than merely conceptual. Understanding of any subject typically improves when the student him- or herself has the chance to teach the material to someone (or something) else. A student must develop an especially clear understanding of a concept in order to “teach” it to something as dim-witted and literal-minded as a computer. In this process the computer can provide feedback to the student through debugging and program testing that reinforces understanding. In the coding courses I teach, students implement a variety of encoders and decoders, including Reed–Solomon encoders and decoders, convolutional encoders, turbo code decoders, and LDPC decoders. As a result of these programming activities, students move beyond an on-paper understanding, gaining a perspective of what coding theory can do and how to put it to work. A colleague of mine observed that many students emerge from a first course in coding theory more confused than informed. My experience with these programming exercises is that my students are, if anything, overconfident, and feel ready to take on a variety of challenges.

In this book, programming exercises are presented in a series of 13 Laboratory Exercises. These are supported with code providing most of the software “infrastructure,” allowing students to focus on the particular algorithm they are implementing.

[Marginal program-file icon: LabIntro.pdf]

These labs also help with the coverage of the course material. In my course I am able to offload classroom instruction of some topics for students to read, with the assurance that the students will learn it solidly on their own as they implement it. (The Euclidean algorithm is one of these topics in my course.)

Research in error control coding can benefit from having a flexible library of tools for the computations, particularly since analytical results are frequently not available and simulations are required. The laboratory assignments presented here can form the foundation for a research library, with the added benefit that, having written major components, the researcher can easily modify and extend them.

It is in light of these pedagogic features that this book bears the subtitle Mathematical Methods and Algorithms. There is sufficient material in this book for a one- or two-semester course, even for instructors who prefer to focus less on implementational aspects and the laboratories.

Over 200 programs, functions, and data files are associated with the text. The programs are written in MATLAB (a registered trademark of The MathWorks, Inc.), C, or C++. Some of these include complete executables which provide “tables” of primitive polynomials (over any prime field), cyclotomic cosets and minimal polynomials, and BCH codes (not just narrow sense), avoiding the need to tabulate this material. Other functions include those used to make plots and compute results in the book; these provide examples of how the theory is put into practice. Still other functions are used for the laboratory exercises. The files are highlighted in the book by the program-file icon, as in the marginal note above. The files are available at https://github.com/tkmoon/eccbook.

Other aspects of the book include the following:




Many recent advances in coding have resulted from returning to the perspective of coding as a detection problem. Accordingly, the book starts off with a digital communication framework with a discussion of detection theory.



Recent codes such as polar codes or LDPC codes are capable of achieving capacity, or nearly so. It is important, therefore, to understand what capacity is and what it means to transmit at capacity. Chapter 1 also summarizes information theory, to put coding into its historical and modern context. Added in this second edition is an introduction to nonasymptotic information theory, which will be increasingly important in developing communication systems. The information theory informs EXIT chart analysis of turbo and LDPC codes.





Pedagogically, Hamming codes are used to set the stage for the book by using them to demonstrate block codes, cyclic codes, trellises and Tanner graphs.



Homework exercises are drawn from a variety of sources and are at a variety of levels. Some are numerical, testing basic understanding of concepts. Others provide the opportunity to prove or extend results from the text. Others extend concepts or provide new results. Because of the programming laboratories, exercises requiring decoding by hand of given bit sequences are few, since I am of the opinion that it is better to know how to tell the computer to do it than to do it by hand. I have drawn these exercises from a variety of sources, including problems that I faced as a student and those which I have given to students on homework and exams over the years.



Number-theoretic concepts such as divisibility, congruence, and the Chinese remainder theorem are developed at a point where students can appreciate them.



Occasionally connections between the coding theoretic concepts and related topics are pointed out, such as public key cryptography and shift register sequences. These add spice and motivate students with the understanding that the tools they are learning have broad applicability.



There has been considerable recent progress made in decoding Reed–Solomon codes by re-examining their original definition. Accordingly, Reed–Solomon codes are defined both in this primordial way (as the image of a polynomial function) and also using a generator polynomial having roots that are consecutive powers of a primitive element. This sets the stage for several decoding algorithms for Reed–Solomon codes, including frequency-domain algorithms, the Welch–Berlekamp algorithm, and the soft-input Guruswami–Sudan algorithm.



Turbo codes, including EXIT chart analysis, are presented, with both BCJR and SOVA decoding algorithms. Both probabilistic and likelihood decoding viewpoints are presented.



LDPC codes are presented with an emphasis on the decoding algorithm. Density evolution analysis is also presented.



Polar codes are a family of codes developed since the first edition. They offer low-complexity encoding and decoding with soft inputs and without iterative decoding. They also provably achieve channel capacity (as the code length goes to infinity). This book offers a deep introduction to polar codes, including careful descriptions of encoding and decoding algorithms, with operational C++ code.



Several Applications involving state-of-the-art systems illustrate how the concepts can be applied.



Space-time codes, used for multi-antenna systems in fading channels, are presented.

Courses of Study

A variety of courses of study are possible. In the one-semester course I teach, I move quickly through principal topics of block, trellis, and iteratively decoded codes. Here is an outline of one possible one-semester course:

Chapter 1: Major topics only.
Chapter 2: All.
Chapter 3: Major topics.
Chapter 4: Most. Leave CRC codes and LFSR to labs.
Chapter 5: Most. Touch on RSA.
Chapter 6: Most. Light on GFFT and variations.
Chapter 12: Most. Skip puncturing and stack-oriented algorithms.
Chapter 13: Most. Skip the V.34 material.
Chapter 14: Basic definition and the BCJR algorithm.
Chapter 15: Basic definition of LDPC codes and the sum-product decoder.
Chapter 17: Introduce the idea of polar coding.

A guide in selecting material for this course is: follow the labs. To get through all 13 labs, selectivity is necessary. An alternative two-semester course could be a semester devoted to block codes followed by a semester on trellis and iteratively decoded codes. A two-semester sequence could move straight through the book, with possible supplements from the research literature on topics of particular interest to the instructor. A different two-semester approach would follow the outline above for one semester, followed by more advanced coverage of LDPC and polar codes.

Theorems, lemmas, corollaries, examples, and definitions are all numbered sequentially using the same counter in a chapter, which should make identifying these environments straightforward. Figures, tables, and equations each have their own counters. Definitions, examples, and proofs are terminated by the symbol ◽.

Use of Computers


The computer-based labs provide a means of working out some of the computational details that otherwise might require drudgery. These are primarily (with the exception of the first lab) to be done in C++, where the ability to overload operators and run at compiled speed makes the language very well suited. It may be helpful to have a “desk calculator” for homework and exploring ideas. Many tools exist now for this. The brief tutorial comptut.pdf provides an introduction to gap and magma, both of which can be helpful to students doing homework or research in this area. The sage package built on Python also provides considerable relevant capability.

Why This Book?

In my mind, the purposes of a textbook are these:

1. To provide a topographical map into the field of study, showing the peaks and the important roads to them. (However, in an area as rich as coding theory, it is impossible to be exhaustive.)
2. To provide specific details about important techniques.
3. To present challenging exercises that will strengthen students’ understanding of the material and present new perspectives.
4. To have some reference value, so that practitioners can continue to use it.
5. To provide references to literature that will lead readers to make their own discoveries. With a rapidly-changing field, the references can only provide highlights; web-based searches have changed the nature of the game. Nevertheless, having a starting point in the literature is still important.
6. To present a breadth of ideas which will motivate students to innovate on their own.

A significant difficulty I have faced is selection. The terrain is so richly textured that it cannot be mapped out in a single book. Every communication or information theory conference and every issue of the IEEE Transactions on Information Theory yields new and significant results. Publishing restrictions and practicality limit this book from being encyclopedic. My role as author has been merely to select what parts of the map to include and to present them in a pedagogically useful way.

Why This Book?

In my mind, the purposes of a textbook are these:

1. To provide a topographical map of the field of study, showing the peaks and the important roads to them. (However, in an area as rich as coding theory, it is impossible to be exhaustive.)
2. To provide specific details about important techniques.
3. To present challenging exercises that will strengthen students' understanding of the material and present new perspectives.
4. To have some reference value, so that practitioners can continue to use it.
5. To provide references to the literature that will lead readers to make their own discoveries. In a rapidly changing field, the references can only provide highlights; web-based searches have changed the nature of the game. Nevertheless, having a starting point in the literature is still important.
6. To present a breadth of ideas that will motivate students to innovate on their own.

A significant difficulty I have faced is selection. The terrain is so richly textured that it cannot be mapped out in a single book. Every communication or information theory conference and every issue of the IEEE Transactions on Information Theory yields new and significant results. Publishing restrictions and practicality keep this book from being encyclopedic. My role as author has been merely to select which parts of the map to include and to present them in a pedagogically useful way. In so doing, I have aimed to choose tools for the general practitioner and student (and of interest to me). Other than that selective role, no claim of creation is intended; I hope I have given credit where it is due.

This book is the result of teaching a course in error correction coding at Utah State University for nearly three decades. Over that time I have taught out of the books [45], [489], and [275], and my debt to these books is clear. Parts of some chapters grew out of lecture notes based on these books, and the connections will be obvious. I have felt compelled to include many of the exercises from the first coding course I took out of [275]. These books have defined for me the sine qua non of error-correction coding texts. I am also indebted to [296] for its rich theoretical treatment, to [401] for its presentation of trellis coding material, to [462] for its discussion of bounds, to [190] for its exhaustive treatment of turbo coding methods, and to the many great researchers and outstanding expositors whose works have contributed to my understanding. More recently, the extensive coverage of LDPC codes in [391] has been extremely helpful and motivated my own implementation of almost all of the decoding algorithms.

Acknowledgments

I am grateful for the supportive environment at Utah State University that has made it possible to undertake and to complete this task. Students in coding classes over the years have contributed to this material. I have benefitted tremendously from feedback from careful readers and translators who have provided their own error correction from the first edition. Their negative acknowledgment protocols have improved the delivery of this packet of information. I thank you all!

To my six wonderful children, Leslie, Kyra, Kaylie, Jennie, Kiana, and Spencer, and presently twelve grandchildren, and my wife Barbara, who have seen me slip away too often and too long to write, I express my deep gratitude for their trust and patience. In the end, all I do is for them.

T.K.M.
Logan, UT, October 2020

List of Program Files

comptut.pdf: Introduction to gap and magma
bpskprobplot.m: Set up to plot prob. of error for BPSK
bpskprob.m: Plot probability of error for BPSK
repcodeprob.m: Error prob. for (n, 1) repetition codes
testrepcode.cc: Test the repetition code performance
repcodes.m: Plot results for repetition code
mindist.m: Find minimum distance using exhaustive search
hamcode74pe.m: Probability of error for (7, 4) Hamming code
nchoosektest.m: Compute the binomial coefficient n choose k and test for k < 0

plotcapcmp.m: Capacity of the AWGN and BAWGNC channels
cawgnc2.m: Compute capacity of AWGN channel
cbawgnc2.m: Compute the capacity of the BAWGN channel
h2.m: Compute the binary entropy function
plotcbawn2.m: Plot capacity for AWGN and BAWGN channels
cbawgnc.m: Compute capacity for BAWGN channel
cawgnc.m: Compute capacity for AWGN channel
philog.m: Log φ function for the BAWGNC
phifun.m: φ function for the BAWGNC
gaussj2: Gaussian elimination over GF(2)
Hammsphere: Number of points in a Hamming sphere
genstdarray.c: Generate a standard array for a code
progdetH15.m: Error detection prob. for (15, 11) Hamming code
progdet.m: Error detection prob. for (31, 21) Hamming code
polyadd.m: Add polynomials
polysub.m: Subtract polynomials
polymult.m: Multiply polynomials
polydiv.m: Polynomial quotient and remainder
polyaddm.m: Add polynomials modulo a number
polysubm.m: Subtract polynomials modulo a number
polymultm.m: Multiply polynomials modulo a number
primitive.txt: Table of primitive polynomials
primfind.c: Program to find primitive polynomials
BinLFSR.h: (lab, complete) Binary LFSR class
BinLFSR.cc: (lab, complete) Binary LFSR class
testBinLFSR.cc: (lab, complete) Binary LFSR class tester
MakeLFSR: (lab, complete) Makefile for tester
BinPolyDiv.h: (lab, complete) Binary polynomial division
BinPolyDiv.cc: (lab, incomplete) Binary polynomial division
testBinPolyDiv.cc: (lab, complete) Binary polynomial division test
gcd.c: A simple example of the Euclidean algorithm
crtgamma.m: Compute the gammas for the CRT
fromcrt.m: Convert from CRT representation to integer
tocrt.m: Compute the CRT representation of an integer
testcrt.m: An example of CRT calculations
testcrp.m: Test a polynomial CRT calculation
tocrtpoly.m: CRT representation of a polynomial
fromcrtpoly.m: Polynomial CRT representation
crtgammapoly.m: Gammas for the CRT representation
primfind: Find primitive polynomials in GF(p)[x]
cyclomin: Cyclotomic cosets and minimal polynomials
ModArnew.h: Templatized modulo arithmetic class
testmodarnew1.cc: Templatized modulo arithmetic class tester

polynomialT.h: (lab, complete) Templatized polynomial class
polynomialT.cc: (lab, complete) Templatized polynomial class
testpoly1.cc: Demonstrate templatized polynomial class
testgcdpoly.cc: (lab, complete) Test polynomial GCD function
gcdpoly.cc: Polynomial GCD function (incomplete)
GF2.h: (lab, complete) GF(2) class
GFNUM2m.h: (lab, complete) Galois field GF(2^m) class
GFNUM2m.cc: (lab, incomplete) Galois field GF(2^m) class
testgfnum.cc: (lab, complete) Test Galois field class
bchweight.m: Weight dist. of BCH code from weight of dual
bchdesigner: Design a t-error-correcting binary BCH code
reedsolwt.m: Weight distribution for an (n, k) RS code
masseymodM.m: Return the shortest LFSR for a data sequence
testRSerasure.cc: RS with erasures using Euclidean alg.
erase.mag: Erasure decoding example in magma
testBM.cc: Test the Berlekamp–Massey algorithm
Chiensearch.h: (lab, complete) Chien Search class
Chiensearch.cc: (lab, incomplete) Chien Search class
testChien.cc: (lab, complete) Test the Chien Search class
BCHdec.h: (lab, complete) BCHdec decoder class
BCHdec.cc: (lab, incomplete) BCHdec decoder class
testBCH.cc: (lab, complete) BCHdec decoder class tester
RSdec.h: (lab, complete) RS decoder class header
RSdec.cc: (lab, incomplete) RS decoder class
testRS.cc: (lab, complete) Test RS decoder
rsencode.cc: (lab, complete) Encode a file using RS encoder
rsdecode.cc: (lab, complete) Decode a file using RS decoder
bsc: Simulate a binary symmetric channel
testpxy.cc: Concepts relating to 2-variable polynomials
computekm.m: Compute Km for the Guruswami–Sudan decoder
computeLm.cc: Maximum list length for GS(m) decoding
computeLm.m: Maximum list length for GS(m) decoding
testft.m: Test the Feng–Tzeng algorithm
fengtzeng.m: Poly. s.t. the first l + 1 columns are lin. dep.
invmodp.m: Compute the inverse of a number modulo p
testGS1.cc: Test the GS decoder (Kotter part)
kotter.cc: Kotter interpolation algorithm
testGS3.cc: Test the GS decoder
testGS5.cc: Test the GS decoder
kotter1.cc: Kotter algorithm with m = 1
testGS2.cc: Test the Roth–Ruckenstein algorithm
rothruck.cc: Roth–Ruckenstein algorithm (find y-roots)
rothruck.h: Roth–Ruckenstein algorithm (find y-roots)
Lbarex.m: Average performance of a GS(m) decoder
computetm.m: Tm error correction capability for GS decoder
computeLbar.m: Avg. no. of CWs around random pt.
computeLm.m: Compute max. length of list of decoder
pi2m1: Koetter–Vardy map from reliability to multiplicity
genrm.cc: Create a Reed–Muller generator matrix

rmdecex.m: RM(1, 3) decoding example
hadex.m: Computation of H8
testfht.cc: Test the fast Hadamard transform
fht.cc: Fast Hadamard transform
fht.m: Fast Hadamard transform
rmdecex2.m: RM(2, 4) decoding example
testQR.cc: Example of arithmetic for QR code decoding
golaysimp.m: Derive equations for algebraic Golay decoder
testGolay.cc: Test the algebraic Golay decoder
golayrith.m: Arithmetic Golay decoder
plotbds.m: Plot bounds for binary codes
simplex1.m: Linear program in standard form
pivottableau.m: Main function in simplex1.m
reducefree.m: Auxiliary linear programming function
restorefree.m: Auxiliary linear programming function
krawtchouk.m: Compute Krawtchouk polynomials recursively
lpboundex.m: Solve the linear programming for the LP bound
utiltkm.cc: Sort and random functions
utiltkm.h: Sort and random functions
concodequant.m: Quantization of the Euclidean metric
chernoff1.m: Chernoff bounds for conv. performance
plotconprob.m: Performance bounds for a conv. code
finddfree: Executable: Find dfree
teststack.m: Test the stack algorithm
stackalg.m: The stack algorithm for convolutional decoding
fanomet.m: Fano metric for convolutionally coded data
fanoalg.m: The Fano algorithm for convolutional decoding
BinConv.h: (lab, complete) Base class for bin. conv. enc.
BinConvFIR.h: (lab, complete) Bin. feedforward conv. enc.
BinConvFIR.cc: (incomplete) Bin. feedforward conv. encoder
BinConvIIR.h: (lab, complete) Binary recursive conv. enc.
BinConvIIR.cc: (lab, incomplete) Binary recursive conv. enc.
testconvenc: (lab, complete) Test convolutional encoders
Convdec.h: (lab, complete) Convolutional decoder class
Convdec.cc: (lab, incomplete) Convolutional decoder class
BinConvdec01.h: (lab, complete) Binary conv. decoder, BSC
BinConvdecBPSK.h: (lab, complete) Bin. conv. dec., BPSK
BinConvdecBPSK.cc: (lab, complete) Bin. conv. dec., BPSK
testconvdec.cc: (lab, complete) Test the convolutional decoder
makeB.m: Make the B matrix for an example code
tcmt1.cc: Test constellation for rotationally invariant code
tcmrot2.cc: Test constellation for rotationally invariant code
lattstuff.m: Gen. matrices for A2, D4, E6, E8, Λ16, Λ24
voln.m: Volume of n-dimensional unit-radius sphere
latta2.m: Plot A2 lattice
lattz2m: Plot Z2 lattice
BCJR.h: (lab, complete) BCJR algorithm class header
BCJR.cc: (lab, incomplete) BCJR algorithm class
testbcjr.cc: (lab, complete) Test BCJR algorithm class
testturbodec2.cc: (lab, complete) Test the turbo decoder
makegenfromA.m: Systematic gen. matrix from parity check matrix

gaussj2.m: Gauss–Jordan elimination over GF(2)
Agall.m: A parity check matrix for an LDPC code
Agall.txt: Sparse representation of the matrix
writesparse.m: Write a matrix into a file in sparse format
ldpcdecoder.cc: Many different LDPC decoding algorithms
ldpctest2.cc: ldpcdecoder runner
decode(): Decoding in class LDPCDECODER
lldecode(): LL BP decoding in class LDPCDECODER
probdecode(): Probability BP decoding
ldpc.m: Demonstrate LDPC decoding
ldpcdecode.m: LDPC decoder (nonsparse representation)
ldpclogdecode.m: LDPC decoder (log likelihood)
phidecode(): Belief propagation decoding using φ function
qphidecode(): Quantized φ decoding
minsumdecode(): Min-sum decoder
minsumcorrectdecode(): Corrected min-sum decoder
aminstartdecode(): Approximation min* decoder
rcbpdecode(): Reduced-complexity box-plus decoder
showchktable.m: Compare RCBP quantization and ⊞
bitflip1decode(): Bit flipping LDPC decoder
galAbitflipdecode(): Gallager Algorithm A bit flipping decoder
weightedbitflipdecode(): Weighted bit flipping decoder
modwtedbitflipdecode(): Modified weighted bit flipping decoder
grad1bitflipdecode(): Gradient descent bit flipping
grad2bitflipdecode(): Multigradient bit flip
DC1decode(): LDPC decoding using divide and concur
DMBPdecode(): DMBP decoding
LPdecode(): Linear programming LDPC decoding
ldpctest1: LDPC err. prob. & iteration plots
psifunc.m: Plot the Ψ function used in density evolution
densev1.m: An example of density evolution
densevtest.m: Plot density evolution results
Psi.m: Plot the Ψ function used in density evolution
Psiinv.m: Compute Ψ^-1 used in density evolution
threshtab.m: Convert threshold table to Eb/N0
ldpcsim.mat: LDPC decoder simulation results
exit1.m: Plot histograms of LDPC decoder outputs
loghist.m: Find histograms
exit3.m: Plot mutual information as a function of iteration
dotrajectory.m: Mutual information as a function of iteration
exit2.m: Plot EXIT chart
doexitchart.m: Take mutual information to EXIT chart
getinf.m: Convert data to mutual information
getinfs.m: Convert multiple data to mutual information
sparseHno4.m: Make sparse check matrix without girth 4
Asmall.txt: H matrix example
ldpcdec.h: (lab, complete) LDPC decoder class header
ldpcdec.cc: (lab, incomplete) LDPC decoder class
ldpctest.cc: (lab, complete) LDPC decoder class tester
ldpc.m: (lab, complete) Demonstrate LDPC decoding
ldpcdecode.m: LDPC decoder (not sparse)

ldpctest1.cc: Prob. of err. plots for LDPC code
A1-2.txt: Rate 1/2 parity check matrix, sparse format
A1-3.txt: Rate 1/3 parity check matrix, sparse format
gf257ex.m: LDPC check matrix from min. wt. RS codewords
ldpcencode.m: Encode LDPC codes
ldpccodesetup1.m: Make parity check matrix for DVB-T2 IRA code
fadeplot.m: Realizations of the amplitude of a fading channel
jakes.m: Jakes method for amplitude of a fading channel
fadepbplot.m: BPSK performance over Rayleigh fading channel

List of Laboratory Exercises

Lab 1: Simulating a Communications Channel
Lab 2: Polynomial Division and Linear Feedback Shift Registers
Lab 3: CRC Encoding and Decoding
Lab 4: Programming the Euclidean Algorithm
Lab 5: Programming Galois Field Arithmetic
Lab 6: Programming the Berlekamp–Massey Algorithm
Lab 7: Programming the BCH Decoder
Lab 8: Reed–Solomon Encoding and Decoding
Lab 9: Programming Convolutional Encoders
Lab 10: Convolutional Decoders: The Viterbi Algorithm
Lab 11: Trellis-Coded Modulation Encoding and Decoding
Lab 12: Turbo Code Decoding
Lab 13: Programming an LDPC Decoder

List of Algorithms

1.1 Hamming Code Decoding
1.2 Outline for Simulating Digital Communications
1.3 Outline for Simulating (n, k)-coded Digital Communications
1.4 Outline for Simulating (n, k) Hamming-coded Digital Communications

4.1 Fast CRC Encoding for a Stream of Bytes
4.2 Binary Linear Feedback Shift Register
4.3 Binary Polynomial Division

5.1 Extended Euclidean Algorithm
5.2 Modulo Arithmetic
5.3 Templatized Polynomials
5.4 Polynomial GCD
5.5 GF(2^m) Arithmetic

6.1 Massey's Algorithm (Pseudocode)
6.2 Massey's Algorithm for Binary BCH Decoding
6.3 Test Berlekamp–Massey Algorithm
6.4 Chien Search
6.5 BCH Decoder
6.6 Reed–Solomon Encoder Declaration
6.7 Reed–Solomon Decoder Declaration
6.8 Reed–Solomon Decoder Testing
6.9 Reed–Solomon File Encoder and Decoder
6.10 Binary Symmetric Channel Simulator

7.1 Welch–Berlekamp Interpolation
7.2 Welch–Berlekamp Interpolation, Modular Method v.1
7.3 Welch–Berlekamp Interpolation, Modular Method v.2
7.4 Welch–Berlekamp Interpolation, Modular Method v.3
7.5 The Feng–Tzeng Algorithm
7.6 Kötter's Interpolation for the Guruswami–Sudan Decoder
7.7 Guruswami–Sudan Interpolation Decoder with m = 1 and L = 1
7.8 Roth–Ruckenstein Algorithm for Finding y-roots of Q(x, y)
7.9 Koetter–Vardy Algorithm for Mapping from Π to M

8.1 Decoding for RM(1, m) Codes
8.2 Arithmetic Decoding of the Golay G24 Code

11.1 Generalized Minimum Distance (GMD) Decoding
11.2 Chase-2 Decoder
11.3 Chase-3 Decoder
11.4 Ordered Statistic Decoding

12.1 The Viterbi Algorithm
12.2 The Stack Algorithm
12.3 The SOVA
12.4 Base Class for Binary Convolutional Encoder
12.5 Derived Classes for FIR and IIR Encoders
12.6 Test Program for Convolutional Encoders
12.7 Base Decoder Class Declarations
12.8 Convolutional Decoder for Binary (0, 1) Data
12.9 Convolutional Decoder for BPSK Data
12.10 Test the Convolutional Decoder

14.1 The BCJR (MAP) Decoding Algorithm, Probability Form
14.2 The Turbo Decoding Algorithm, Probability Form
14.3 BCJR Algorithm
14.4 Test the Turbo Decoder Algorithm

15.1 Log-Likelihood Decoding Algorithm for Binary LDPC Codes
15.2 Probability Algorithm for Binary LDPC Codes
15.3 a-Min* Algorithm for Check Node Computation
15.4 Quantize the chk Function
15.5 Bit Flipping Decoder
15.6 Gallager Algorithm A Bit Flipping Decoder
15.7 Weighted Bit Flipping Decoder
15.8 Divide and Concur LDPC Decoder
15.9 Difference Map Belief Propagation Decoder
15.10 LDPC Decoder Class Declarations
15.11 MATLAB Code to Test LDPC Decoding
15.12 Make Performance Plots for LDPC Codes

16.1 Repeat Accumulate Decoding
16.2 Progressive Edge Growth

17.1 Polar Code Encoding
17.2 Successive Cancellation Decoder (Pseudocode)
17.3 Polar Code Design Using the Bhattacharyya Parameter
17.4 Polar Code Design Using Monte Carlo Simulation
17.5 Polar Code Systematic Encoder
17.6 Polar List Decoder: Recursively Calculate P
17.7 Polar List Decoder: Recursively Calculate C
17.8 Polar List Decoder: Main Loop
17.9 Polar List Decoder: Build Data Structures
17.10 Polar List Decoder: Initialize Data Structures
17.11 Polar List Decoder: Assign Initial Path
17.12 Polar List Decoder: Clone a Path
17.13 Polar List Decoder: Kill a Path
17.14 Polar List Decoder: Get Pointer P_λ^ℓ
17.15 Polar List Decoder: Get Pointer C_λ^ℓ
17.16 Polar List Decoder: Continue Paths for Unfrozen Bit

List of Figures

1.1 The binary entropy function H2(p).
1.2 A general framework for digital communications.
1.3 Signal constellation for BPSK.
1.4 Juxtaposition of signal waveforms.
1.5 Two-dimensional signal space.
1.6 8-PSK signal constellation.
1.7 Correlation processing (equivalent to matched filtering).
1.8 Conditional densities in BPSK modulation. (a) Conditional densities; (b) weighted conditional densities.
1.9 Distributions when two signals are sent in Gaussian noise.
1.10 Probability of error for BPSK signaling.
1.11 Probability of error bound for 8-PSK modulation.
1.12 A binary symmetric channel.
1.13 Communication system diagram and BSC equivalent.
1.14 Energy for a coded signal.
1.15 Probability of error for coded bits, before correction.
1.16 A (3, 1) binary repetition code. (a) The code as points in space; (b) the Hamming spheres around the points.
1.17 A representation of decoding spheres.
1.18 Performance of the (3, 1) and (11, 1) repetition code over the BSC.
1.19 Performance of the (7, 4) Hamming code in the AWGN channel.
1.20 The trellis of a (7, 4) Hamming code.
1.21 The Tanner graph for a (7, 4) Hamming code.
1.22 Illustration of the data-processing inequality.
1.23 BSC and BEC models. (a) Binary symmetric channel with crossover probability; (b) binary erasure channel with erasure probability.
1.24 Capacities of AWGNC, BAWGNC, and BSC.
1.25 Relationship between input and output entropies for a channel.
1.26 Capacity lower bounds on Pb as a function of SNR.
1.27 Rate-blocklength tradeoff for the BSC with δ = 0.11 and maximal block error rate ε = 10^-3.
1.28 Rate-blocklength tradeoff for the Gaussian channel with P = 1 and maximal block error rate ε = 10^-3.
1.29 Regions for bounding the Q function.

2.1 An illustration of cosets.
2.2 A lattice partitioned into cosets.

3.1 Error detection performance for a (15, 11) Hamming code.
3.2 Demonstrating modifications on a Hamming code.

4.1 Addition and multiplication tables for GF(2)[x]/⟨x^3 + 1⟩.
4.2 A circuit for multiplying two polynomials, last-element first.
4.3 High-speed circuit for multiplying two polynomials, last-element first.
4.4 A circuit for multiplying two polynomials, first-element first.
4.5 High-speed circuit for multiplying two polynomials, first-element first.
4.6 A circuit to perform polynomial division.
4.7 A circuit to divide by g(x) = x^5 + x + 1.
4.8 Realizing h(x)/g(x) (first-element first), controller canonical form.
4.9 Realizing h(x)/g(x) (first-element first), observability form.
4.10 Realization of H(x) = (1 + x)/(1 + x^3 + x^4), controller form.
4.11 Realization of H(x) = (1 + x)/(1 + x^3 + x^4), observability form.

4.12 Nonsystematic encoding of cyclic codes.
4.13 Circuit for systematic encoding using g(x).
4.14 Systematic encoder for (7, 4) code with generator g(x) = 1 + x + x^3.
4.15 A systematic encoder using the parity-check polynomial.
4.16 A systematic encoder for the Hamming code using h(x).
4.17 A syndrome computation circuit for a cyclic code example.
4.18 Cyclic decoder with r(x) shifted in the left end of syndrome register.
4.19 Decoder for a (7, 4) Hamming code, input on the left.
4.20 Cyclic decoder when r(x) is shifted in right end of syndrome register.
4.21 Hamming decoder with input fed into right end of the syndrome register.
4.22 Meggitt decoders for the (31, 26) Hamming code. (a) Input from the left end; (b) input from the right end.
4.23 Multiply r(x) by ρ(x) and compute the remainder modulo g(x).
4.24 Decoder for a shortened Hamming code.
4.25 Linear feedback shift register.
4.26 Linear feedback shift register with g(x) = 1 + x + x^2 + x^4.
4.27 Linear feedback shift register with g(x) = 1 + x + x^4.
4.28 Linear feedback shift register, reciprocal polynomial convention.
4.29 Another LFSR circuit.
4.30 An LFSR with state labels.

5.1 LFSR labeled with powers of α to illustrate Galois field elements.
5.2 Multiplication of β by α.
5.3 Multiplication of an arbitrary β by α^4.
5.4 Multiplication of β by an arbitrary field element.
5.5 Subfield structure of GF(2^24).

6.1 Chien search algorithm.

7.1 Remainder computation when errors are in message locations.
7.2 Comparing BD, ML, and list decoding. (a) BD decoding: codeword in same Hamming sphere as r; (b) ML decoding: nearest codeword to r (possibly beyond the Hamming sphere); (c) list decoding: all codewords within a given Hamming distance of r.
7.3 Km as a function of m for a (32, 8) Reed–Solomon code.
7.4 Fraction of errors corrected as a function of rate.
7.5 An example of the Roth–Ruckenstein algorithm over GF(5).
7.6 An example of the Roth–Ruckenstein algorithm over GF(5) (cont'd).
7.7 Convergence of M̄ to Π.
7.8 Computing the reliability function.

8.1 An encoder circuit for a RM(1, 3) code.
8.2 Signal flow diagram for the fast Hadamard transform.
8.3 Binary adjacency relationships in (a) three and (b) four dimensions.
8.4 Planes shaded to represent the equations orthogonal on bit m34.
8.5 Parity check geometry for second-order vectors for the RM(2, 4) code.
8.6 Parity check geometry for first-order vectors for the RM(2, 4) code.

9.1 Comparison of lower bound and various upper bounds.
9.2 A linear programming problem.
9.3 Finding Stirling's formula. (a) Under bound; (b) over bound.

10.1 A 3 × 4 interleaver and deinterleaver.
10.2 A cross interleaver and deinterleaver system.
10.3 The CD recording and data formatting process.
10.4 The error-correction encoding in the compact disc standard.
10.5 The product code C1 × C2.
10.6 A concatenated code.
10.7 Deep-space concatenated coding system.
10.8 Error trapping decoder for burst-error-correcting codes.

11.1 Signal labels for soft-decision decoding.

12.1 A rate R = 1/2 convolutional encoder.
12.2 A systematic R = 1/2 encoder.
12.3 A systematic R = 2/3 encoder.
12.4 A systematic R = 2/3 encoder with more efficient hardware.
12.5 Encoder, state diagram, and trellis for G(x) = [1 + x^2, 1 + x + x^2]. (a) Encoder; (b) state diagram; (c) one stage of trellis; (d) three stages of trellis.
12.6 State diagram and trellis for a rate R = 2/3 systematic encoder.
12.7 A feedforward R = 2/3 encoder.
12.8 A less efficient feedforward R = 2/3 encoder.
12.9 Processing stages for a convolutional code.
12.10 Notation associated with a state transition.
12.11 The Viterbi step: select the path with the best metric.
12.12 Path through trellis corresponding to true sequence.
12.13 Add-compare-select operation.
12.14 A 2-bit quantization of the soft-decision metric.
12.15 Quantization thresholds for four- and eight-level quantization.
12.16 Bit error rate for different constraint lengths.
12.17 Bit error rate for various quantization and window lengths.
12.18 Viterbi algorithm performance as a function of quantizer threshold spacing.
12.19 BER performance as a function of truncation block length.
12.20 Error events due to merging paths.
12.21 Two decoding examples. (a) Diverging path of distance 5; (b) diverging path of distance 6.
12.22 The state diagram and graph for diverging/remerging paths. (a) State diagram; (b) split state 0; (c) output weight i represented by D^i.
12.23 Rules for simplification of flow graphs.
12.24 Steps simplifying the flow graph for a convolutional code.
12.25 State diagram labeled with input/output weight and branch length.
12.26 A state diagram to be enumerated.
12.27 Performance of a (2, 1) convolutional code with dfree = 5.
12.28 Trellises for a punctured code. (a) Trellis for initial punctured code; (b) trellis formed by collapsing two stages of the initial trellis into a single stage.
12.29 A tree representation for a rate R = 1/2 code.
12.30 Stack contents for stack algorithm decoding example.
12.31 Flowchart for the Fano algorithm.
12.32 Computations of SOVA at t = 3 (see Example 12.23).
12.33 Computations of SOVA at t = 4.
12.34 The trellis of a (7, 4) Hamming code.
12.35 A systematic encoder for a (7, 4, 3) Hamming code.
12.36 A trellis for a cyclically encoded (7, 4, 3) Hamming code.
12.37 (a) State diagram and (b) trellis.

13.1 PSK signal constellations.
13.2 QAM signal constellations (overlaid).
13.3 Three communication scenarios. (a) QPSK, no coding; (b) QPSK, R = 2/3 coding; (c) 8-PSK, R = 2/3 coding.
13.4 Set partitioning of an 8-PSK signal.
13.5 R = 2/3 trellis-coded modulation example.
13.6 A TCM encoder employing subset selection and a four-state trellis.
13.7 An eight-state trellis for 8-PSK TCM.
13.8 Block diagram of a TCM encoder.
13.9 Set partitioning on a 16-QAM constellation.
13.10 Partition for 8-ASK signaling.
13.11 A correct path and an error path.
13.12 Example trellis for four-state code.
13.13 Trellis coder circuit.
13.14 TCM encoder for QAM constellations.
13.15 Mapping of edge (i, j) to edge (fφ(i), fφ(j)).
13.16 Encoder circuit for rotationally invariant TCM code.
13.17 Trellis for the rotationally invariant code of Figure 13.16.
13.18 32-cross constellation for rotationally invariant TCM code.
13.19 A portion of the lattice ℤ^2 and its cosets. (a) ℤ^2 lattice; (b) cosets of the lattice.
13.20 Hexagonal lattice. (a) Basic lattice; (b) the fundamental parallelotopes around the lattice points.
13.21 ℤ^2 and its partition chain and cosets. (a) ℤ^2; (b) cosets of Rℤ^2; (c) cosets of R^2ℤ^2.
13.22 Block diagram for a trellis lattice coder.
13.23 Lattice and circular boundaries for various constellations.
13.24 16-state trellis encoder for use with the V.34 standard.
13.25 Trellis diagram of the V.34 encoder.
13.26 The 192-point constellation employed in the V.34 standard.
13.27 Partition steps for the V.34 signal constellation.
13.28 Orbits of some of the points under rotation.

14.1 Decoding results for a (37, 21, 65536) code.
14.2 Block diagram of a turbo encoder.
14.3 Block diagram of a turbo encoder with puncturing.
14.4 Example turbo encoder with G(x) = 1/(1 + x^2).
14.5 Block diagram of a turbo decoder.
14.6 Processing stages for the BCJR algorithm.
14.7 One transition of the trellis for the encoder.
14.8 A log-likelihood turbo decoder.
14.9 State sequences for an encoding example.
14.10 Arrangements of n1 = 3 detours in a sequence of length N = 7.
14.11 A 6 × 6 "square" sequence written into the 120 × 120 permuter.
14.12 Variables used in the iterative decoder for EXIT chart analysis.
14.13 Qualitative form of the transfer characteristic IE = T(IA).
14.14 Trajectories of mutual information in iterated decoding.
14.15 Turbo BCH encoding.
14.16 Structure of an implementation of a parallel concatenated code.
14.17 A trellis for a cyclically encoded (7, 4, 3) Hamming code.
14.18 Framework for a turbo equalizer.

List of Figures

xxxvii

14.19 Trellis associated with a channel with L = 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14.20 Example for a (3, 2, 2) parity-check code. (a) A simple (3, 2, 2) parity check code; (b) received values Lc rt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

630

15.1 15.2 15.3 15.4 15.5 15.6 15.7

Bipartite graph associated with the parity check matrix H. . . . . . . . . . . . . . . . . . . . . . . . . . . . Structure of LDPC decoding. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A check node with four connections. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The message from check node Cm to variable node Vn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The message from variable node Vn to check node Cm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A parity-check tree associated with the Tanner graph. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A portion of the Tanner graph for the example code, with node for V1 as a root, showing a cycle. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Processing information through the graph determined by H. . . . . . . . . . . . . . . . . . . . . . . . . . . Probabilistic message from check node m to variable nodes n. . . . . . . . . . . . . . . . . . . . . . . . . The function 𝜙(x). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Comparison between LL-based computing and min-sum computing for check to variable node computation. (a) Original comparison; (b) scaled comparison. . . . . . . . . . . . . . . . . . . . s(x, y), the approximation s̃(x, y), L1 ⊞ L2 , and the approximation using s̃(x, y). (a) s(x, y); (b) s̃(x, y); (c) exact L1 ⊞ L2 ; (d) approximate L1 ⊞ L2 . . . . . . . . . . . . . . . . . . . . . . . . Comparison of ⊞ and its RCBP quantization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Approximations to the 𝜙(𝛽) function for the Richardson–Novichkov decoder. . . . . . . . . . . Steps in the DC algorithm. (a) Projection of 𝐫[0] and 𝐫[1] onto 0 and 1 ; (b) 𝐫(m) + (𝐫 ) − 𝐫(m) ) (Overshoot); (c) concur projection, and results of difference map. . . . . 2(Pm D (m) Linear programming problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Polytope of the codewords (0, 0, 0), (0, 1, 1), (1, 1, 0), (1, 0, 1).. . . . . . . . . . . . . . . . . . . . . . . . . Tanner graph for the Hamming (7, 4) code. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Number of constraints 𝜅m as a function of | (m)|. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hamming code Tanner graph. Solid = 1, dotted = 0, heavy dashed = erasure. (a) Tanner graph; (b) graph with deleted edges; (c) after one erasure corrected; (d) after two erasures corrected. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Stopping sets. (a) Idea of stopping sets; (b) stopping set {𝑣0 , 𝑣4 , 𝑣6 } on a (9, 3) example code [442]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Illustration of the decoding performance of LPDC codes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . Comparison of hard-decision Hamming and iterative decoding. . . . . . . . . . . . . . . . . . . . . . . . The function Ψ(x) compared with tanh(x∕2). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Behavior of density evolution for a R = 1∕3 code. . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . Messages from bits to checks and from checks to bits. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Histograms of the bit-to-check information for various decoder iterations. . . . . . . . . . . . . . Decoder information at various signal-to-noise ratios. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . EXIT charts at various signal-to-noise ratios. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Result of permutation of rows and columns on parity check matrix. . . . . . . . . . . . . . . . . . . .

641 642 645 646 647 652

A repeat-accumulate encoder. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Tanner graph for a (3, 1) RA code with two input bits. . . . . . . . . . . . . . . . . . . . . . . . . . . . Decoding steps for an RA code. (a) Initialization/parity bit to check; (b) check to message; (c) message to check; (d) check to parity bit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tanner graph for an irregular repeat-accumulate code. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Example Tanner graph for an irregular repeat-accumulate code. . . . . . . . . . . . . . . . . . . . . . . An IRAA encoder. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Encoding QC codes. (a) cyclic shift register adder accumulator (CSRAA); (b) parallelizing the CRSAA. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (3, 6) Ensemble. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

718 718

15.8 15.9 15.10 15.11 15.12 15.13 15.14 15.15 15.16 15.17 15.18 15.19 15.20

15.21 15.22 15.23 15.24 15.25 15.26 15.27 15.28 15.29 15.30 16.1 16.2 16.3 16.4 16.5 16.6 16.7 16.8

632

652 653 654 660 661 664 668 669 675 683 685 686 690

692 693 694 695 697 699 700 701 701 702 708

720 721 722 723 727 743

xxxviii

List of Figures 16.9 16.10 16.11 16.12 16.13

16.14

16.15

16.16 16.17

16.18 16.19 16.20 16.21

16.22 16.23 16.24

16.25 16.26 17.1

17.2

17.3 17.4 17.5 17.6

Illustrating an irregular ensemble. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . An example MET ensemble. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Notation for the PEG algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Performance of a (4000, 2000) LDPC code founding using PEG. . . . . . . . . . . . . . . . . . . . . . Graphs related to protograph description of LDPC code. (a) Initial protograph from adjacency matrix in 16.10; (b) repeating nodes Q = 3 times, and permutations of repeated edges; (c) detailed Tanner graph derived from protograph. . . . . . . . . . . . . . . . . . . . Subgraphs associated with trapping sets. ∘ = variable nodes. ◾ = unsatisfied check nodes. ◽ = incorrectly satisfied check nodes. (a) (12, 4) trapping set; (b) (14, 4) trapping set; (c) (5, 5) trapping set; (d) (5, 7) trapping set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Probability of error for a BPSK detection problem using MC and IS sampling. (a) Distribution for Monte Carlo integration; (b) optimal sampling distribution for this problem; (c) distribution for IS integration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Best gain and shift 𝜁 for an importance sampling example. (a) 𝛾 and PE as a function of 𝜎 2 . PE decreases when 𝛾 increases; (b) z as a function of 𝜎 2 . . . . . . . . . . . . . . . . . . . . . . . . . . Probability of error for three-symbols detection problem using MC and IS sampling. (a) Distribution for Monte Carlo integration; (b) optimal sampling distribution q∗ for this problem; (c) distribution for IS integration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Points chosen at random on a circle of radius 𝜌 = 2 with Gaussian noise added of covariance K̃ = K∕n, where K = 𝜎 2 I2 with 𝜎 2 = 1. (a) n = 0.5; (b) n = 1; (c) n = 2. . . . . An example constraint set A. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Example constellation. (a) Constellation; (b) error region; (c) spherical offset. . . . . . . . . . Geometry of coded symbol points. (a) The sphere of codewords around 0, and the sphere of codewords around 𝐬0 ; (b) side view of two spheres; (c) plane in ℝN−1 on which intersecting hypersphere lies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Decision boundary for (3, 2) single parity-check coding (𝜎 = 0.1). Squares denote codewords. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A (3, 2, 2) tree and labeling for a particular boundary. (a) Tree; (b) indication of variable nodes for the 𝜕E(2,4,6,8,14,16) boundary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Trees which are not symmetric. (a) Variable nodes 2 and 3 and nodes 4 and 5 isormophically interchangeable; (b) variable nodes 2 and 3 isormophically interchangeable. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tanner graph for (9, 3) regular code, and its unfolding into a tree. (a) Tanner graph for the code; (b) unfolded tree; (c) pruned unfolded tree. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. Graph defining Tornado codes. (a) One stage of code graph; (b) decoding an erased value; (c) two stages of check bits; (d) overall Tornado code with m + 1 stages. . . . . . . . . Error correction performance of polar codes compared with an LDPC code of H the same rate. (a) Results for a (2048,1024) polar code, with list decoding (see [431]); (b) comparison of R = 0.84 polar codes with LDPC code of same rate (see [391]). (Color versions of the page can be found at wileywebpage or https://github.com/ tkmoon/eccbook/colorpages.). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Basic ideas of polarization. (a) Two uses of the channel W; (b) simple coding and two channel uses; (c) recursively applying the coding construction, N = 4; (d) recursively applying the coding construction, N = 8 with some bits frozen. . . . . . . . . . . . . . . . . . . . . . . . First stage of the construction of the polar transform. (a) The basic building block; (b) representation in terms of the channels W2(0) and W2(1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . g(p) and h(p) for p ∈ [0, 1], showing that g(p) > h(p). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Successive cancellation on N = 2 channels. (a) Upper log-likelihood ratio computation; (b) lower log-likelihood ratio computation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Comparison of tanh rule (solid lines) with (17.26) (dashed lines). . . . . . . . . . . . . . . . . . . . . .

743 745 747 748

749

751

755 758

759 759 760 761

764 767 768

770 771 773

780

780 783 788 790 791

List of Figures 17.7 17.8

17.9 17.10 17.11 17.12 17.13 17.14 17.15 17.16 17.17

17.18 17.19 17.20

17.21 17.22 17.23

17.24

17.25

17.26 17.27 17.28 17.29 17.30

Tree representation of the encoding. (a) Encoder N = 2; (b) converting to a tree; (c) converted to a tree; (d) rotated with the root node at the top. . . . . . . . . . . . . . . . . . . . . . . . Repeating the basic construction by duplicating W2 to form W4 . (a) Duplication of W2 ; (b) creation of W4(0) and W4(1) ; (c) creation of W4(2) and W4(3) ; (d) reordering the inputs of W4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Another representation of W4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . General recursion to go from two WN∕2 channels to WN . (a) Basic form of general recursion to go from two WN∕2 channels to WN ; (b) alternative form for the recursion. . . The channel W8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The tree of recursively constructed channels. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tree decomposition of the information in the W4 channels. . . . . . . . . . . . . . . . . . . . . . . . . . . . Computing the synthetic BEC erasure probabilities on a tree. . . . . . . . . . . . . . . . . . . . . . . . . . Polarization effect for the BEC channel. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . BEC probability for polarized channels, p = 0.3. (a) Unsorted probabilities, 8 iterations; (b) sorted probabilities, 8 iterations; (c) sorted probabilities, 16 iterations. . . . . . . . . . . . . . A tree representation of the decoder structure. (a) Basic encoding structure; (b) basic encoding structure, sequential bit order; (c) identifying frozen and unfrozen nodes, and the groupings of adders; (d) first stage of converting groupings into nodes, with nodes colored to match colorings of children; (e) next stage of converting groupings into nodes; (f) final stage of converting groupings into nodes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rotated tree and messages in the tree. (a) Rotating the tree to put the root node at the top; (b) a portion of a tree, showing messages passed in a local decoder. . . . . . . . . . . . . . . . . . . . Encoder for N = 4 and its associated tree. (a) Encoder illustrating node groupings; (b) decoding tree for N = 4 with messages labeled. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Decoding steps for N = 4. (a) L4(0) ; (b) L4(1) ; (c) L4(2) ; (d) L4(3) . (Color versions of the page can be found at wileywebpage or https://github.com/tkmoon/eccbook/ colorpages.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Decoding tree for N = 8 with messages labeled. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The structure used for the decoding example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Successive cancellation steps, j = 0, 1, 2, i = 0, 4, 2. (a) LLR path for bit j = 0. i = 0, level1 = 3; (b) bit propagation, j = 0, i = 0; (c) LLR path for bit j = 1. i = 4, level1 = 1; (d) bit propagation j = 1. i = 4, level0 = 2; (e) LLR path for bit j = 2, i = 2. level1 = 2; (f) bit propagation j = 2, i = 2. (Color versions of the page can be found at wileywebpage or https://github.com/tkmoon/eccbook/ colorpages.). . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Successive cancellation steps, j = 3, 4, 5, i = 6, 1, 5. (a) LLR path for bit j = 3, i = 6. level1 = 1; (b) bit propagation for bit j = 3, i = 6. level0 = 3; (c) LLR path for bit j = 4, i = 1. level1 = 3; (d) bit propagation j = 4, i = 1; (e) LLR path for bit j = 5, i = 5. level1 = 1; (f) bit propagation j = 5, i = 5. level0 = 2. (Color versions of the page can be found at wileywebpage or https://github.com/tkmoon/eccbook/ colorpages.). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Successive cancellation steps, j = 6, 7, i = 3, 7. (a) LLR path for bit j = 6, i = 3. level1 = 2; (b) bit propagation j = 6, i = 3; (c) LLR path for bit j = 7, i = 7. level1 = 1. (Color versions of the page can be found at wileywebpage or https://github.com/tkmoon/eccbook/colorpages.). . . . . . . . . . . . . . . . . . . . . . . . . Z(W), I(W) and the bounds from 17.59 and 17.60. (a) BEC; (b) BSC. . . . . . . . . . . . . . . . . . Calculations for systematic polar encoding. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A systematic encoder for an (8, 5) polar code [396]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Evolution of paths in list decoding. Following [435]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Encoder for N = 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

xxxix

792

794 795 797 798 801 802 804 804 806

808 809 811

814 814 815

816

820

823 827 841 844 851 852

xl

List of Figures 17.31 Data structure P used in list decoding with N = 8, n = 3, L = 4. Numbers in the braces indicate the dimension of the array. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17.32 A two-step list decoder. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17.33 A tree labeled to illustrate the notation for simplified successive cancellation decoding. . 17.34 Identifying rate one nodes (black) and a rate zero nodes (white) in the tree, and the reduced tree. (a) Rate one nodes and rate zero nodes; (b) the reduced tree. . . . . . . . . . . . . . 17.35 Plot of E0 (𝜌, Q). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18.1 18.2 18.3 18.4

FEC encoding structure (see [107, Figure 12]).. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Concatenated Reed–Solomon and Trellis coding for television transmission (see [204]). Trellis encoder ([204, Figure 2]). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Set partition of 64-QAM. At the last stage, different shapes are used to indicate the different subsets. (This decomposition is for example here, and is not indicated in [204].) 18.5 Trellis decoder ([204, Figure 3]). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ̂ to subsets in the 64-QAM constellation. . . . . . . . . . . . . . . . . . . . . . . . . 18.6 Distances from (Î, Q) 18.7 Rate 4/5 convolutional encoder. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18.8 OFDM transmission over a frequency-selective channel (without and with coding) (see [87, Section 4.7]). (a) OFDM transmission over a frequency-selective channel; (b) channel coding and interleaving to provide frequency diversity. . . . . . . . . . . . . . . . . . . . 18.9 Rate 1/3 convolutional encoder for LTE [1, Figure 2]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18.10 LTE rate 1/3 turbo code encoder [1, Figure 3]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19.1 19.2 19.3

Multiple reflections from transmitter to receiver. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simulation of a fading channel. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Diversity performance of quasi-static, flat-fading channel with BPSK modulation. Solid is exact; dashed is approximation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19.4 Multiple transmit and receive antennas across a fading channel. . . . . . . . . . . . . . . . . . . . . . . 19.5 Two receive antennas and a maximal ratio combiner receiver. . . . . . . . . . . . . . . . . . . . . . . . . 19.6 A two-transmit antenna diversity scheme: the Alamouti code. . . . . . . . . . . . . . . . . . . . . . . . . 19.7 A delay diversity scheme. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19.8 8-PSK constellation and the trellis for a delay-diversity encoder. . . . . . . . . . . . . . . . . . . . . . . 19.9 Space-time codes with diversity 2 for 4-PSK having 8, 16, and 32 states [435]. . . . . . . . . . 19.10 Performance of codes with 4-PSK that achieve diversity 2 [435]. (a) Two receive and two transmit antennas; (b) one receive and two transmit antennas.. . . . . . . . . . . . . . . . . . . . . 19.11 Space-time codes with diversity 2 for 8-PSK having 16 and 32 states [435]. . . . . . . . . . . . 19.12 Performance of codes with 8-PSK that achieve diversity 2 [435]. (a) Two receive and two transmit antennas; (b) one receive and two transmit antennas.. . . . . . . . . . . . . . . . . . . . .

853 865 868 868 876 888 892 893 894 894 894 895

896 897 898 902 903 905 906 907 909 917 918 919 920 921 921

List of Tables

1.1 Historical Milestones
3.1 The Standard Array for a Code
4.1 Codewords in the Code Generated by g(x) = 1 + x² + x³ + x⁴
4.2 Systematic Codewords in the Code Generated by g(x) = 1 + x² + x³ + x⁴
4.3 Computation Steps for Long Division Using a Shift Register Circuit
4.4 Computing the Syndrome and Its Cyclic Shifts
4.5 Operation of the Meggitt Decoder, Input from the Left
4.6 Operation of the Meggitt Decoder, Input from the Right
4.7 CRC Generators
4.8 Lookup Table for CRC-ANSI
4.9 LFSR Example with g(x) = 1 + x + x² + x⁴ and Initial State 1
4.10 LFSR Example with g(x) = 1 + x + x² + x⁴ and Initial State 1 + x
4.11 LFSR Example with g(x) = 1 + x + x² + x⁴ and Initial State 1 + x² + x³
4.12 LFSR Example with g(x) = 1 + x + x⁴ and Initial State 1
4.13 Barker Codes
5.1 Power, Vector, and Polynomial Representations of GF(2⁴) as an Extension Using g(α) = 1 + α + α⁴
5.2 Conjugacy Classes over GF(2³) with Respect to GF(2)
5.3 Conjugacy Classes over GF(2⁴) with Respect to GF(2)
5.4 Conjugacy Classes over GF(2⁵) with Respect to GF(2)
5.5 Conjugacy Classes over GF(4²) with Respect to GF(4)
5.6 Cyclotomic Cosets Modulo 15 with Respect to GF(7)
6.1 Weight Distribution of the Dual of a Double-Error-Correcting Primitive Binary BCH Code of Length n = 2^m − 1, m ≥ 3, m Odd
6.2 Weight Distribution of the Dual of a Double-Error-Correcting Primitive Binary Narrow-Sense BCH Code, n = 2^m − 1, m ≥ 4, m Even
6.3 Weight Distribution of the Dual of a Triple-Error-Correcting Primitive Binary Narrow-Sense BCH Code, n = 2^m − 1, m ≥ 5, m Odd
6.4 Weight Distribution of the Dual of a Triple-Error-Correcting Primitive Binary Narrow-Sense BCH Code, n = 2^m − 1, m ≥ 6, m Even
6.5 Evolution of the Berlekamp–Massey Algorithm for the Input Sequence {1, 1, 1, 0, 1, 0, 0}
6.6 Berlekamp–Massey Algorithm for a Double-Error-Correcting Code
6.7 Berlekamp–Massey Algorithm for a Double-Error-Correcting Code: Simplifications for the Binary Code
6.8 Berlekamp–Massey Algorithm for a Triple-Error-Correcting Code
7.1 Monomials Ordered Under (1, 3)-revlex Order
7.2 Monomials Ordered Under (1, 3)-lex Order
8.1 Extended Quadratic Residue Codes Q
8.2 Weight Distributions for the G23 and G24 Codes
10.1 Performance Specification of the Compact Disc Coding System
12.1 Quantized Branch Metrics Using Linear Quantization
12.2 Best Convolutional Codes of Rates R = 1/2 and R = 1/3
12.3 Best Convolutional Codes of Rates R = 1/4, R = 2/3, and R = 3/4
12.4 Comparison of Free Distance as a Function of Constraint Length for Systematic and Nonsystematic Codes
12.5 Best Known R = 2/3 and R = 3/4 Convolutional Codes Obtained by Puncturing a R = 1/2 Code
12.6 Performance of Fano Algorithm on a Particular Sequence as a Function of Δ
13.1 Average Energy Requirements for Some QAM Constellations
13.2 Maximum Free-Distance Trellis Codes for 8-PSK Constellation
13.3 Maximum Free-Distance Trellis Codes for 16-PSK Constellation
13.4 Maximum Free-Distance Trellis Codes for Amplitude Modulated (One-Dimensional) Constellations
13.5 Encoder Connections and Coding Gains for Maximum Free-Distance QAM Trellis Codes
13.6 Attributes of Some Lattices
13.7 Comparison of Average Signal Energy for Circular Boundary ℤ² and A2 Constellations with Regular QAM
13.8 Some Good Multidimensional TCM Codes
13.9 Bit Converter: Sublattice Partition of 4D Rectangular Lattice [?]
13.10 4D Block Encoder [?]
14.1 α′t and β′t Example Computations
14.2 Posterior Input Bit Example Computations
15.1 Threshold Values for Various LDPC Codes for the Binary AWGN Channel
15.2 Degree Distribution Pairs for R = 1/2 Codes for Binary Transmission over an AWGN [?, p. 623]; στ is the decoding threshold, also expressed as (Eb/N0) (dB)
16.1 IS Simulation Results
16.2 MC and IS Comparison for Digital Communication for Various IS Mean Translation Schemes
16.3 IS Simulation Results for Single Parity-Check Codes of Length N (See [493])
16.4 IS Simulation Results for a (3, 6, 3) Decoding Tree, ε = 10% [493]
18.1 Coding Parameters for a Normal Frame NLDPC = 64,800 (See [105, Table 6(a)])
18.2 Coding Parameters for a Short Frame NLDPC = 16,200 (See [105, Table 6(b)])
18.3 BCH Minimal Polynomials for Normal Codes (NLDPC = 64,800) (See [105, Table 7(a)])
18.4 BCH Minimal Polynomials for Short Codes (NLDPC = 16,200) (See [105, Table 7(b)])
18.5 IRA Indices for DVB-T2 Rate 2/3 Code, NLDPC = 64,800 (See [105, Table A.3])
18.6 QLDPC Values for Frames of Normal and Short Length
18.7 Channel Coding Schemes for Transmission and Control Channels [2, Section 5.1.3]

List of Boxes

Box 1.1 The Union Bound
Box 2.1 One-to-One and Onto Functions
Box 3.1 Error Correction and Least Squares
Box 3.2 The UDP Protocol [SD1]
Box 4.1 The Division Algorithm
Box 5.1 Évariste Galois (1811–1832)
Box 9.1 O and o Notation
Box 9.2 The Cauchy–Schwartz Inequality
Box 12.1 Graphs: Basic Definitions

About the Companion Website

This book is accompanied by a companion website:

www.wiley.com/go/Moon/ErrorCorrectionCoding

The website includes:

• Solutions Manual
• Program Files

Part I

Introduction and Foundations

Chapter 1

A Context for Error Correction Coding

I will make weak things become strong unto them . . .
—Ether 12:27

. . . he denies that any error in the machine is responsible for the so-called errors in the answers. He claims that the Machines are self correcting and that it would violate the fundamental laws of nature for an error to exist in the circuits of relays. —Isaac Asimov I, Robot

1.1 Purpose of This Book

Error control coding in the context of digital communication has a history dating back to the middle of the twentieth century. In recent years, the field has been revolutionized by codes which are capable of approaching the theoretical limits of performance, the channel capacity. This has been impelled by a trend away from purely combinatoric and discrete approaches to coding theory toward codes which are more closely tied to a physical channel and soft decoding techniques. The purpose of this book is to present error correction and detection coding, covering traditional concepts thoroughly as well as modern developments in soft-decision and iteratively decoded codes and recent decoding algorithms for algebraic codes. An attempt has been made to maintain some degree of balance between the mathematics and their engineering implications by presenting both the mathematical methods used in understanding the codes and the algorithms which are used to efficiently encode and decode.

1.2 Introduction: Where Are Codes?

Error correction coding is the means whereby errors which may be introduced into digital data as a result of transmission through a communication channel can be corrected based upon received data. Error detection coding is the means whereby errors can be detected based upon received information. Collectively, error correction and error detection coding are error control coding. Error control coding can provide the difference between an operational communications system and a dysfunctional system. It has been a significant enabler in the telecommunications revolution, portable computers, the Internet, digital media, and space exploration. Error control coding is nearly ubiquitous in modern, information-based society. Every compact disc, CD-ROM, or DVD employs codes to protect the data embedded in the plastic disk. Every flash drive (thumb drive) and hard disk drive employs correction coding. Every phone call made over a digital cellular phone employs it. Every frame of digital television is protected by error correction coding. Every packet transmitted over the Internet has a protective coding “wrapper” used to determine if the packet has been received correctly. Even everyday commerce takes advantage of error detection coding, as the following examples illustrate.


Example 1.1 The ISBN (international standard book number) is used to uniquely identify books. An ISBN such as 0-201-36186-8 can be parsed as

0 (country) − 20 (publisher) − 136186 (book no.) − 8 (check).

Hyphens do not matter. The first digit indicates a country/language, with 0 for the United States. The next two digits are a publisher code. The next six digits are a publisher-assigned book number. The last digit is a check digit, used to validate if the code is correct using what is known as a weighted code. An ISBN is checked as follows: the cumulative sum of the digits is computed, then the cumulative sum of the cumulative sum is computed. For a valid ISBN, the sum-of-the-sum must be equal to 0, modulo 11. The character X is used for the check digit 10. For this ISBN, we have

Digit    Cumulative Sum    Cumulative Sum of Sum
  0             0                    0
  2             2                    2
  0             2                    4
  1             3                    7
  3             6                   13
  6            12                   25
  1            13                   38
  8            21                   59
  6            27                   86
  8            35                  121

The final sum-of-the-sum is 121, which is equal to 0 modulo 11 (i.e., the remainder after dividing by 11 is 0). ◽

Example 1.2 The Universal Product Codes (UPC) employed on the bar codes of most merchandise employ a simple error detection system to help ensure reliability in scanning. In this case, the error detection system consists of a simple parity check. A UPC consists of a 12-digit sequence, which can be parsed as

0 16000 (manufacturer identification number)   66610 (item number)   8 (parity check).

Denoting the digits as u1, u2, . . . , u12, the parity digit u12 is determined such that 3(u1 + u3 + u5 + u7 + u9 + u11) + (u2 + u4 + u6 + u8 + u10 + u12) is a multiple of 10. In this case, 3(0 + 6 + 0 + 6 + 6 + 0) + (1 + 0 + 0 + 6 + 1 + 8) = 70. If, when a product is scanned, the parity condition does not check, the operator is flagged so that the object may be re-scanned. ◽
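Both check computations are easy to mechanize. The following short C++ sketch (an illustration only, not one of the book’s program files; all function names are mine) verifies the ISBN weighted-code check and the UPC parity check for the two examples above.

```cpp
// Illustrative sketch (not from the book's companion code): verify the ISBN
// sum-of-sums check and the UPC parity check from Examples 1.1 and 1.2.
#include <iostream>
#include <vector>

// ISBN-10 check: the cumulative sum of the cumulative sums must be 0 mod 11.
// The check character 'X' would be represented here as the value 10.
bool isbn_valid(const std::vector<int>& digits) {
  int sum = 0, sumsum = 0;
  for (int d : digits) { sum += d; sumsum += sum; }
  return sumsum % 11 == 0;
}

// UPC check: 3*(u1+u3+...+u11) + (u2+u4+...+u12) must be a multiple of 10.
bool upc_valid(const std::vector<int>& u) {
  int total = 0, i = 0;
  for (int d : u) { total += ((i % 2 == 0) ? 3 : 1) * d; ++i; }
  return total % 10 == 0;
}

int main() {
  std::vector<int> isbn = {0, 2, 0, 1, 3, 6, 1, 8, 6, 8};        // 0-201-36186-8
  std::vector<int> upc  = {0, 1, 6, 0, 0, 0, 6, 6, 6, 1, 0, 8};  // 0 16000 66610 8
  std::cout << std::boolalpha
            << "ISBN check passes: " << isbn_valid(isbn) << "\n"
            << "UPC check passes:  " << upc_valid(upc)  << "\n";
}
```

Changing any single digit of either sequence causes the corresponding check to fail, which is exactly the single-error detection these weighted and parity checks are designed to provide.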

1.3 The Communications System

Appreciation of the contributions of coding and understanding its limitations require some awareness of information theory and how its major theorems delimit the performance of a digital communication system. Information theory is increasingly relevant to coding theory, because with recent advances in theory it is now possible to achieve the performance bounds of information theory. By contrast, in the past, the bounds were more of a backdrop to the action on the stage of coding research and practice. Part of this success has come by placing the coding problem more fully in its communications context,


marrying the coding problem more closely to the signal detection problem instead of treating the coding problem mostly as one of discrete combinatorics. Information theory treats information almost as a physical quantity which can be measured, transformed, stored, and moved from place to place. A fundamental concept of information theory is that information is conveyed by the resolution of uncertainty. Information can be measured by the amount of uncertainty resolved. For example, if a digital source always emits the same value, say 1, then no information is gained by observing that the source has produced, yet again, the output 1, since there was no uncertainty about the outcome to begin with. Probabilities are used to mathematically describe the uncertainty. For a discrete random variable X (i.e., one which produces discrete outputs, such as X = 0 or X = 1), the information conveyed by observing an outcome x is defined as −log2 P(X = x) bits. (If the logarithm is base 2, the units of information are in bits. If the natural logarithm is employed, the units of information are in nats.) For example, if P(X = 1) = 1 (the outcome 1 is certain), then observing X = 1 yields −log2(1) = 0 bits of information. On the other hand, observing X = 0 in this case yields −log2(0) = ∞: total surprise at observing an impossible outcome. The entropy is the average information. For a binary source X having two outcomes occurring with probabilities p and 1 − p, the binary entropy function, denoted as either H2(X) (indicating that it is the entropy of the source) or H2(p) (indicating that it is a function of the outcome probabilities) is

H2(X) = H2(p) = E[−log2 P(X)] = −p log2(p) − (1 − p) log2(1 − p) bits.

A plot of the binary entropy function as a function of p is shown in Figure 1.1. The peak information of 1 bit occurs when p = 1/2.

Figure 1.1: The binary entropy function H2(p).

Example 1.3 A fair coin is tossed once per second, with the outcomes being “head” and “tail” with equal probability. Each toss of the coin generates an event that can be described with H2(0.5) = 1 bit of information. The sequence of tosses produces information at a rate of 1 bit/second. An unfair coin, with P(head) = 0.01, is tossed. The average information generated by each throw in this case is H2(0.01) = 0.0808 bits. Another unfair coin, with P(head) = 1, is tossed. The information generated by each throw in this case is H2(1) = 0 bits. ◽

For a source X having M outcomes x1, x2, . . . , xM, with probabilities P(X = xi) = pi, i = 1, 2, . . . , M, the entropy is

H(X) = E[−log2 P(X)] = −∑_{i=1}^{M} pi log2 pi bits.    (1.1)
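These entropy formulas are easy to evaluate numerically. The short C++ sketch below (my own illustration, not one of the book’s program files) implements (1.1) with the usual convention 0 log2 0 = 0 and reproduces the values quoted in Example 1.3 for the binary case.

```cpp
// Illustrative sketch (not from the book's companion code): entropy of a
// discrete source, H(X) = -sum_i p_i log2 p_i, with 0*log2(0) taken as 0.
#include <cmath>
#include <iostream>
#include <vector>

double entropy(const std::vector<double>& p) {
  double H = 0.0;
  for (double pi : p)
    if (pi > 0.0) H -= pi * std::log2(pi);
  return H;
}

// Binary entropy function H2(p): the entropy of the distribution {p, 1-p}.
double H2(double p) { return entropy({p, 1.0 - p}); }

int main() {
  std::cout << H2(0.5)  << "\n";   // 1 bit: the fair coin of Example 1.3
  std::cout << H2(0.01) << "\n";   // about 0.0808 bits: the unfair coin
  std::cout << H2(1.0)  << "\n";   // 0 bits: the outcome is certain
}
```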


Note: The “bit” as a measure of entropy (or information content) is different from the “bit” as a measure of storage. For the unfair coin having P(head) = 1, the actual information content determined by a toss of the coin is 0: there is no information gained by observing that the outcome is again 1. For this process with this unfair coin, the entropy rate — that is, the amount of actual information it generates — is 0. However, if the coin outcomes were for some reason to be stored directly, without the benefit of some kind of coding, each outcome would require 1 bit of storage (even though they don’t represent any new information). With the prevalence of computers in our society, we are accustomed to thinking in terms of “bits” — e.g., a file is so many bits long, the register of a computer is so many bits wide. But these are “bits” as a measure of storage size, not “bits” as a measure of actual information content. Because of the confusion between “bit” as a unit of information content and “bit” as an amount of storage, the unit of information content is sometimes — albeit rarely — called a Shannon, in homage to the founder of information theory, Claude Shannon.¹

¹ This mismatch of object and value is analogous to the physical horse, which may or may not be capable of producing one “horsepower” of power, 550 ft-lbs/second. Thermodynamicists can dodge the issue by using the SI unit of Watts for power, information theorists might sidestep confusion by using the Shannon. Both of these units honor founders of their respective disciplines.

A digital communication system embodies functionality to perform physical actions on information. Figure 1.2 illustrates a fairly general framework for a single digital communication link. In this link, digital data from a source are encoded and modulated (and possibly encrypted) for communication over a channel. At the other end of the channel, the data are demodulated, decoded (and possibly decrypted), and sent to a sink. The elements in this link all have mathematical descriptions and theorems from information theory which govern their performance. The diagram indicates the realm of applicability of three major theorems of information theory.

[Figure 1.2 shows the blocks of a digital communication link: source → source encoder (data compression) → encryption encoder (privacy/security, sometimes) → channel encoder (error protection) → modulation (transmission waveform generation) → channel → demodulation/equalization → channel decoder → encryption decoder (sometimes) → source decoder → sink. The source coding and rate–distortion theorems govern the source encoder; capacity and the channel coding theorem govern the channel encoder. Coding and modulation are sometimes combined (e.g., TCM), as are decoding and demodulation in combined/iterative decoding.]

Figure 1.2: A general framework for digital communications.

There are actually many kinds of codes employed in a communication system. In the description below, we point out where some of these codes arise. Throughout the book we make some connections between these codes and our major focus of study, error correction codes.

The source is the data to be communicated, such as a computer file, a video sequence, or a telephone conversation. For our purposes, it is represented in digital form, perhaps as a result of an

analog-to-digital conversion step. Information-theoretically, sources are viewed as streams of random numbers governed by some probability distribution. Every source of data has a measure of the information that it represents, which (in principle) can be exactly quantified in terms of entropy.

The source encoder performs data compression by removing redundancy. As illustrated in Example 1.3, the number of bits used to store the information from a source may exceed the number of bits of actual information content. That is, the number of bits to represent the data may exceed the number of mathematical bits — Shannons — of actual information content. The amount a particular source of data can be compressed without any loss of information (lossless compression) is governed theoretically by the source coding theorem of information theory, which states that a source of information can be represented without any loss of information in such a way that the amount of storage required (in bits) is equal to the amount of information content — the entropy — in bits or Shannons. To achieve this lower bound, it may be necessary for long blocks of the data to be jointly encoded.

Example 1.4 For the unfair coin with P(head) = 0.01, the entropy is H(0.01) = 0.0808. Therefore, 10,000 such (independent) tosses convey 808 bits (Shannons) of information, so theoretically the information of 10,000 tosses of the coin can be represented exactly using only 808 (physical) bits of information. ◽

Thus a bit (in a computer register) in principle can represent an actual (mathematical) bit of information content, if the source of information is represented correctly. In compressing a data stream, a source encoder removes redundancy present in the data. For compressed binary data, 0 and 1 occur with equal probability in the compressed data (otherwise, there would be some redundancy which could be exploited to further compress the data). Thus, it is frequently assumed at the channel coder that 0 and 1 occur with equal probability. The source encoder employs special types of codes to do the data compression, called collectively source codes or data compression codes. Such coding techniques include Huffman coding, run-length coding, arithmetic coding, Lempel–Ziv coding, and combinations of these, all of which fall beyond the scope of this book.

If the data need to be compressed below the entropy rate of the source, then some kind of distortion must occur. This is called lossy data compression. In this case, another theorem governs the representation of the data. It is possible to do lossy compression in a way that minimizes the amount of distortion for a given rate of transmission. The theoretical limits of lossy data compression are established by the rate–distortion theorem of information theory. One interesting result of rate–distortion theory says that for a binary source having equiprobable outcomes, the minimal rate to which the data can be compressed with the average distortion per bit equal to p is

r = 1 − H2(p),   p ≤ 1/2.    (1.2)

Lossy data compression uses its own kinds of codes as well.

Example 1.5 In the previous example, suppose that a channel is available that only provides 800 bits of information to convey 10,000 tosses of the biased coin. Since this number of bits is less than allowed by the source coding theorem, there must be some distortion introduced. From above, the source rate is 0.0808 bits


of information per toss. The rate we are sending over this restricted channel is 800/10,000 = 0.08 bits/toss. The rate at which tosses are coded is

r = 800/808 = 0.9901.

The distortion is p = H2⁻¹(1 − r) = H2⁻¹(1 − 0.9901) = H2⁻¹(0.0099) = 0.00085. That is, out of 10,000 coin tosses, transmission at this rate would introduce distortion in (0.00085)(10,000) = 8.5 bits. ◽
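The numbers in Example 1.5 can be checked with a few lines of code. The sketch below (again only an illustration, not part of the book’s program files) computes the coded rate r and then finds the distortion p = H2⁻¹(1 − r) by bisection, which works because H2 is monotonically increasing on [0, 1/2].

```cpp
// Illustrative sketch (not from the book): reproduce the rate-distortion
// numbers of Example 1.5 for the biased coin with P(head) = 0.01.
#include <cmath>
#include <iostream>

double H2(double p) {                      // binary entropy function, in bits
  if (p <= 0.0 || p >= 1.0) return 0.0;
  return -p * std::log2(p) - (1.0 - p) * std::log2(1.0 - p);
}

// Inverse of H2 restricted to [0, 1/2], computed by bisection.
double H2inv(double h) {
  double lo = 0.0, hi = 0.5;
  for (int k = 0; k < 60; ++k) {
    double mid = 0.5 * (lo + hi);
    if (H2(mid) < h) lo = mid; else hi = mid;
  }
  return 0.5 * (lo + hi);
}

int main() {
  double info_bits = 10000 * H2(0.01);     // about 808 bits of information
  double r = 800.0 / info_bits;            // coded rate, about 0.9901
  double p = H2inv(1.0 - r);               // distortion per bit, about 0.00085
  std::cout << "r = " << r << ", p = " << p
            << ", distorted tosses out of 10,000: " << 10000 * p << "\n";
}
```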

The encrypter hides or scrambles information so that unintended listeners are unable to discern the information content. The codes used for encryption are generally different from the codes used for error correction. Encryption is often what the layperson frequently thinks of when they think of “codes,” as in secret codes, but as we are seeing, there are many other different kinds of codes. As we will see, however, the mathematical tools used in error correction coding can be applied to some encryption codes. In particular, we will meet RSA public key encryption as an outgrowth of number theoretic tools to be developed, and McEliece public key encryption as a direct application of a particular family of error correction codes.

The channel coder is the first step in the error correction or error detection process. The channel coder adds redundant information to the stream of input symbols in a way that allows errors which are introduced into the channel to be corrected. This book is primarily dedicated to the study of the channel coder and its corresponding channel decoder. It may seem peculiar to remove redundancy with the source encoder, then turn right around and add redundancy back in with the channel encoder. However, the redundancy in the source typically depends on the source in an unstructured way and may not provide uniform protection to all the information in the stream, nor provide any indication of how errors occurred or how to correct them. The redundancy provided by the channel coder, on the other hand, is introduced in a structured way, precisely to provide error control capability. Treating the problems of data compression and error correction separately, rather than seeking a jointly optimal source/channel coding solution, is asymptotically optimal (as the block sizes get large). This fact is called the source–channel separation theorem of information theory. (There has been work on combined source/channel coding for finite — practical — block lengths, in which the asymptotic theorems are not invoked. This work falls outside the scope of this book.) Because of the redundancy introduced by the channel coder, there must be more symbols at the output of the coder than at the input. Frequently, a channel coder operates by accepting a block of k input symbols and producing at its output a block of n symbols, with n > k. The rate of such a channel coder is R = k/n, so that R < 1. The input to the channel coder is referred to as the message symbols (or, in the case of binary codes, the message bits). The input may also be referred to as the information symbols (or bits).

The modulator converts symbol sequences from the channel encoders into signals appropriate for transmission over the channel. Many channels require that the signals be sent as a continuous-time voltage, or an electromagnetic waveform in a specified frequency band. The modulator provides the appropriate channel-conforming representation.

Included within the modulator block one may find codes as well. Some channels (such as magnetic recording channels) have constraints on the maximum permissible length of runs of 1s. Or they might have a restriction that the sequence must be DC-free. Enforcing such constraints employs special codes. Treatment of such runlength-limited codes appears in [275]; see also [209]. Some modulators employ mechanisms to ensure that the signal occupies a broad bandwidth. This spread-spectrum modulation can serve to provide multiple-user access, greater resilience to jamming, low probability of detection, and other advantages. (See, e.g., [509].) Spread-spectrum systems frequently make use of pseudorandom sequences, some of which are produced using linear feedback shift registers as discussed in Appendix 4.A.

The channel is the medium over which the information is conveyed. Examples of channels are telephone lines, internet cables, fiber-optic lines, microwave radio channels, high-frequency channels, cell phone channels, etc. These are channels in which information is conveyed between two distinct places. Information may also be conveyed between two distinct times, for example, by writing information onto a computer disk, then retrieving it at a later time. Hard drives, CD-ROMs, DVDs, and solid-state memory are other examples of channels.

As signals travel through a channel, they are corrupted. For example, a signal may have noise added to it; it may experience time delay or timing jitter, or suffer from attenuation due to propagation distance and/or carrier offset; it may be reflected by multiple objects in its path, resulting in constructive and/or destructive interference patterns; it may experience inadvertent interference from other channels, or be deliberately jammed. It may be filtered by the channel response, resulting in interference among symbols. These sources of corruption in many cases can all occur simultaneously. For purposes of analysis, channels are frequently characterized by mathematical models, which (it is hoped) are sufficiently accurate to be representative of the attributes of the actual channel, yet are also sufficiently abstracted to yield tractable mathematics. Most of our work in this book will assume one of two idealized channel models, the binary symmetric channel (BSC) and the additive white Gaussian noise channel (AWGNC), which are described in Section 1.5. While these idealized models do not represent all of the possible problems a signal may experience, they form a starting point for many, if not most, of the more comprehensive channel models. The experience gained by studying these simpler channel models forms a foundation for more accurate and complicated channel models. (As exceptions to the AWGN or BSC rule, in Section 14.7, we comment briefly on convolutive channels and turbo equalization, while in Chapter 19, coding for quasi-static Rayleigh flat fading channels are discussed.) As suggested by Figure 1.2, the channel encoding and modulation may be combined in what is known as coded modulation.

Channels have different information-carrying capabilities. For example, a dedicated fiber-optic line is capable of carrying more information than a plain-old-telephone-service (POTS) pair of copper wires. Associated with each channel is a quantity known as the capacity, C, which indicates how much information it can carry reliably. The information a channel can reliably carry is intimately related to the use of error correction coding. The governing theorem from information theory is Shannon’s channel coding theorem, which states essentially this: Provided that the rate R of transmission is less than the capacity C, there exists a code such that the probability of error can be made arbitrarily small.

The demodulator/equalizer receives the signal from the channel and converts it into a sequence of symbols. This typically involves many functions, such as filtering, demodulation, carrier synchronization, symbol timing estimation, frame synchronization, and matched filtering, followed

9

10

A Context for Error Correction Coding by a detection step in which decisions about the transmitted symbols are made. We will not concern ourselves in this book with these very important details (many of which are treated in textbooks on digital communication, such as [367]), but will focus on issues related to channel encoding and decoding. The channel decoder exploits the redundancy introduced by the channel encoder to correct any errors that may have been introduced. As suggested by the figure, demodulation, equalization, and decoding may be combined. Particularly in recent work, turbo equalizers (introduced in Chapter 14) are used in a powerful combination. The decrypter removes any encryption. The source decoder provides an uncompressed representation of the data. The sink is the ultimate destination of the data. As this summary description has indicated, there are many different kinds of codes employed in communications. This book treats only error correction (or detection) codes. There is a certain duality between some channel coding methods and some source coding methods. So, studying error correction does provide a foundation for other aspects of the communication system.

1.4

Basic Digital Communications

The study of modulation/channel/demodulation/detection falls in the realm of “digital communications,” and many of its issues (e.g., filtering, synchronization, carrier tracking) lie beyond the scope of this book. Nevertheless, some understanding of digital communications is necessary here because modern coding theory has achieved some of its successes by careful application of detection theory, in particular in maximum a posteriori (MAP) and maximum likelihood (ML) receivers. Furthermore, performance of codes is often plotted in terms of signal-to-noise ratio (SNR), which is understood in the context of the modulation of a physical waveform. Coded modulation relies on signal constellations beyond simple binary modulation, so an understanding of them is important. The material in this section is standard for a digital communications course. However, it is germane to our treatment here because these concepts are employed in the development of soft-decision decoding algorithms. 1.4.1

Binary Phase-Shift Keying

In digital communication, a stream of bits (i.e., a sequence of 1s and 0s) is mapped to a waveform for transmission over the channel. Binary phase-shift keying (BPSK) is a form of amplitude modulation in which a bit is represented by the sign of the transmitted waveform. (It is called “phase-shift” keying [PSK] because the sign change represents a 180∘ phase shift.) Let { . . . , b−2 , b−1 , b0 , b1 , b2 , . . . } represent a sequence of bits, bi ∈ {0, 1} which arrive at a rate of 1 bit every T seconds. The bits are assumed to be randomly generated with probabilities P1 = P(bi = 1) and P0 = P(bi = 0). While typically 0 and 1 are equally likely, we will initially retain a measure of generality and assume that P1 ≠ P0 necessarily. It will frequently be of interest to map the set {0, 1} to the set {−1, 1}. We will denote b̃ i as the ±1-valued bit corresponding to the {0, 1}-valued bit bi . Either of the mappings b̃ i = (2bi − 1)

or

b̃ i = −(2bi − 1)

may be used in practice, so some care make sure the proper mapping is understood. √ √is needed to√ Here, let ai = Eb (2bi − 1) = − Eb (−1)bi = Eb b̃ i be a mapping of bit bi (or b̃ i ) into a transmitted signal amplitude. This signal amplitude multiplies a waveform 𝜑1 (t), where 𝜑1 (t) is a unit-energy

1.4 Basic Digital Communications

11

signal,



∫−∞

𝜑1 (t)2 dt = 1,

which has support2 over [0, T). Thus, a bit bi arriving at time iT can be represented by the signal ai 𝜑1 (t − iT). The energy required to transmit a single bit bi is ∞

∫−∞

(ai 𝜑1 (t))2 dt = Eb .

Thus, Eb is the energy expended per bit. √ Eb 𝜑1 (t) and It √ is helpful to think of√ the transmitted signals −√Eb −√Eb √ 𝜑1(t) − Eb 𝜑1 (t) as points Eb and − Eb in a one-dimensional signal space, where the coordinate axis is the “𝜑1 (t)” axis. “0” “1” The two points in the signal space are plotted with their corresponding bit assignment in Figure 1.3. The points in Figure 1.3 Signal constellation for the signal space employed by the modulator are called the BPSK. signal constellation, so Figure 1.3 is a signal constellation with two points (or signals). A sequence of bits to be transmitted can be represented by a juxtaposition of 𝜑1 (t) waveforms, where the waveform representing bit bi starts at time iT. Then the sequence of bits is represented by the signal ∑ ai 𝜑1 (t − iT). (1.3) s(t) = i

Example 1.6 With 𝜑1 (t) as shown in Figure 1.4(a) and the bit sequence {1, 1, 0, 1, 0}, the signal s(t) is as shown in Figure 1.4(b).

1/√T

s (t)

𝜑1(t) √Eb /T

“1”

“1”

“1”

T −√Eb /T (a)

“0”

“0” (b)

Figure 1.4: Juxtaposition of signal waveforms. ◽

1.4.2

More General Digital Modulation

The concept of signal spaces generalizes immediately to higher dimensions and to larger signal constellations; we restrict our attention here to no more than two dimensions. Let 𝜑2 (t) be a unit-energy

2 Strictly speaking, functions not having this limited support can be used, but assuming support over [0, T) makes the discussion significantly simpler. Also, the signal 𝜑i (t) can in general be complex, but we restrict attention here to real signals.

12

A Context for Error Correction Coding function which is orthogonal to 𝜑1 (t). That is, ∞

∫−∞



𝜑2 (t)2 dt = 1 and

∫−∞

𝜑1 (t)𝜑2 (t) dt = 0.

In this case, we are defining “orthogonality” with respect to the inner product ∞

⟨𝜑1 (t), 𝜑2 (t)⟩ =

∫−∞

𝜑1 (t)𝜑2 (t) dt.

We say that {𝜑1 (t), 𝜑2 (t)} form an orthonormal set if they both have unit energy and are orthogonal: ⟨𝜑1 (t), 𝜑1 (t)⟩ = 1 ⟨𝜑2 (t), 𝜑2 (t)⟩ = 1 ⟨𝜑1 (t), 𝜑2 (t)⟩ = 0. The orthonormal functions 𝜑1 (t) and 𝜑2 (t) define the coordinate axes of a two-dimensional signal space, as suggested by Figure 1.5. Corresponding to every point (x1 , y1 ) of this two-dimensional signal space is a signal (i.e., a function of time) s(t) obtained as a linear combination of the coordinate functions: s(t) = x1 𝜑1 (t) + y1 𝜑2 (t). That is, there is a one-to-one correspondence between “points” in space and their represented signals. We can represent this as s(t) ↔ (x1 , y1 ). The geometric concepts of distance and angle can be expressed in terms of the signal space points. For example, let s1 (t) = x1 𝜑1 (t) + y1 𝜑2 (t) (i.e., s1 (t) ↔ (x1 , y1 )) (1.4) s2 (t) = x2 𝜑1 (t) + y2 𝜑2 (t) (i.e., s2 (t) ↔ (x2 , y2 )) We define the squared distance between s1 (t) and s2 (t) as ∞

d2 (s1 (t), s2 (t)) =

∫−∞

(s1 (t) − s2 (t))2 dt,

(1.5)

and the inner product between s1 (t) and s2 (t) as ∞

⟨s1 (t), s2 (t)⟩ =

∫−∞

s1 (t)s2 (t) dt.

(1.6)

Rather than computing distance using the integral (1.5), we can equivalently and more easily compute using the coordinates in signal space (see Figure 1.5): d2 (s1 (t), s2 (t)) = (x1 − x2 )2 + (y1 − y2 )2 . 𝜑2(t) d

(x1, y1)

(x2, y2) 𝜑1(t)

Figure 1.5: Two-dimensional signal space.

(1.7)

1.4 Basic Digital Communications

13

This is the familiar squared Euclidean distance between the points (x1 , y1 ) and (x2 , y2 ). Also, rather than computing the inner product using the integral (1.6), we equivalently compute using the coordinates in signal space: ⟨s1 (t), s2 (t)⟩ = x1 x2 + y1 y2 . (1.8) This is the familiar inner product (or dot product) between the points (x1 , y1 ) and (x2 , y2 ): ⟨(x1 , y1 ), (x2 , y2 )⟩ = x1 x2 + y1 y2 . The significance is that we can use the signal space geometry to gain insight into the nature of the signals, using familiar concepts of distance and angle. We can use this two-dimensional signal space for digital information transmission as follows. Let M = 2m , for some integer m, be the number of points in the signal constellation. M-ary transmission is obtained by placing M points (a1k , a2k ), k = 0, 1, . . . , M − 1, in this signal space and assigning a unique pattern of m bits to each of these points. These points are the signal constellation. Let  = {(a1k , a2k ), k = 0, 1, . . . , M − 1} denote the set of points in the signal constellation. Example 1.7 Figure 1.6 shows eight points arranged in two-dimensional space in a constellation known as 8-PSK. Each point has a 3-bit designation. The signal corresponding to the point (a1k , a2k ) is selected by three input bits and transmitted. Thus, the signal sk (t) = a1k 𝜑1 (t) + a2k 𝜑2 (t),

(a1k , a2k ) ∈ 

carries 3 bits of information.

𝜑2(t) 011 010

001 000

110

𝜑1(t)

√Es 100

111 101

dmin

Figure 1.6: 8-PSK signal constellation.

Note that the assignments of bits to constellation points in Figure 1.6 is such that adjacent points differ by only 1 bit. Such an assignment is called Gray code order. Since it is most probable that errors will move from a point to an adjacent point, this reduces the probability of bit error. ◽

Associated with each signal sk (t) = a1k 𝜑1 (t) + a2k 𝜑2 (t) and signal constellation point (a1k , a2k ) ∈  is a signal energy, ∞

Ek =

∫−∞

(sk (t))2 dt = a21k + a22k .

14

A Context for Error Correction Coding The average signal energy Es is obtained by averaging all the signal energies, usually by assuming that each signal point is used with equal probability: Es =

M−1 M−1 ∞ 1 ∑ 1 ∑ 2 sk (t)2 dt = (a + a22k ). M k=0 ∫−∞ M k=0 1k

The average energy per signal Es can be related to the average energy per bit Eb by Eb =

E energy per signal = s. number of bits/signal m

To send a sequence of bits using M-ary modulation, the bits are partitioned into blocks of m successive bits, where the data rate is such that m bits arrive every Ts seconds. The ith m-bit set is then used to index a point (a1i , a2i ) ∈ . This point corresponds to the signal which is transmitted over the signal interval for the m bits. These signals are juxtaposed to form the transmitted signal: ∑ a1i 𝜑1 (t − iTs ) + a2i 𝜑2 (t − iTs ), (a1i , a2i ) ∈ . (1.9) s(t) = i

The point (a1i , a2i ) is thus the point set at time i. Equation (1.9) can be expressed in its signal space vector equivalent, by simply letting 𝐬i = [a1i , a2i ]T denote the vector transmitted at time i. In what follows, we will express the operations in terms of the two-dimensional signal space. Restricting to a one-dimensional signal space (as for BPSK transmission), or extending to higherdimensional signal spaces is straightforward. In most channels, the signal s(t) is mixed with some carrier frequency before transmission. However, for simplicity, we will restrict attention to the baseband transmission case.

1.5 1.5.1

Signal Detection The Gaussian Channel

The signal s(t) is transmitted over the channel. Of all the possible disturbances that might be introduced by the channel, we will deal primarily with additive white Gaussian noise (AWGN), resulting in the received signal r(t) = s(t) + n(t). (1.10) In an AWGN channel, the signal n(t) is a white Gaussian noise process, having the properties that E[n(t)] = 0

∀t,

Rn (𝜏) = E[n(t)n(t − 𝜏)] =

N0 𝛿(𝜏), 2

and all sets of samples are jointly Gaussian distributed. The quantity N0 ∕2 is the (two-sided) noise power spectral density. Due to the added noise, the signal r(t) is typically not a point in the signal constellation, nor, in fact, is r(t) probably even in the signal space — it cannot be expressed as a linear combination of the basis functions 𝜑1 (t) and 𝜑2 (t). The detection process to be described below corresponds to the geometric operations of (1) projecting r(t) onto the signal space; and (2) finding the closest point in the signal space to this projected function. At the receiver, optimal detection requires first passing the received signal through a filter “matched” to the transmitted waveform. This is the projection operation, projecting r(t) onto the signal space. To detect the ith signal starting at iTs , the received signal is correlated with the waveforms

1.5 Signal Detection

15

𝜑1 (t − iTs ) and 𝜑2 (t − iTs ) to produce the point (R1i , R2i ) in signal space3 : ∞

R1i = ⟨r(t), 𝜑1 (t − iTs )⟩ =

∫−∞

r(t)𝜑1 (t − iTs ) dt, (1.11)



R2i = ⟨r(t), 𝜑2 (t − iTs )⟩ =

∫−∞

r(t)𝜑2 (t − iTs ) dt.

The integral is over the support of 𝜑1 (t) and 𝜑2 (t). For example, if 𝜑1 (t) has support over [0, Ts ], then (i+1)Ts the limits of integration in (1.11) are ∫iT . The processing in (1.11) is illustrated in Figure 1.7. s Using (1.9) and (1.10), it is straightforward to show that R1i = a1i + N1i

and

R2i = a2i + N2i ,

(1.12)

where (a1i , a2i ) is the transmitted point in the signal constellation for the ith symbol. The point (a1i , a2i ) is not known at the receiver — it is, in fact, what the receiver needs to decide — so at the receiver a1i and a2i are random variables. ʃ∞∞

R1

∞ ʃ−∞

R2

𝜙1(t – iTs) r(t) 𝜙2(t – iTs)

Figure 1.7: Correlation processing (equivalent to matched filtering). The noise random variables N1i and N2i defined by (i+1)Ts

N1i =

∫iTs

𝜑1 (t − iTs )n(t) dt

(i+1)Ts

and

N2i =

∫iTs

𝜑2 (t − iTs )n(t) dt

have the following properties: N1i and N2i are Gaussian random variables, with E[N1i ] = 0

and

E[N2i ] = 0

N0 2

and

var[N2i ] ≜ 𝜎 2 =

and4 var[N1i ] ≜ 𝜎 2 =

(1.13) N0 2

(1.14)

and E[N1i N2i ] = 0;

(1.15)

3 The operation in (1.11) can equivalently be performed by passing r(t) through filters with impulse response 𝜑 (−t) and 1 𝜑2 (−t) and sampling the output at time t = iTs . This is referred to as a matched filter. The matched filter implementation and the correlator implementation provide identical outputs. 4 The symbol ≜ means “is defined to be equal to.”

16

A Context for Error Correction Coding that is, N1i and N2i are uncorrelated and hence, being Gaussian, are independent. The probability density function (pdf) for N1i or N2i is − 1 n2 1 e 2𝜎 2 i . pN (ni ) = √ 2𝜋𝜎

It will be convenient to express (1.12) in vector form. Let 𝐑i = [R1i , R2i ]T (received vector), 𝐒i = [a1i , a2i ]T (sent vector), and 𝐍i = [N1i , N2i ]T (noise vector). Then 𝐑i = 𝐒i + 𝐍i . Then 𝐍i is jointly Gaussian distributed, with 0 mean and covariance matrix [ ] T 2 1 0 E[𝐍i 𝐍i ] = 𝜎 = 𝜎 2 I = RN . 0 1 Explicitly, the pdf of the vector 𝐍i is p𝐍 (𝐧) =

1.5.2

[ ] [ ] 1 1 1 2 2 exp − 𝐧T R−1 𝐧 = exp − (n + n ) . N 1 2 2 2𝜋𝜎 2 2𝜎 2 2𝜋 det(RN ) √

1

MAP and ML Detection

Let 𝐒 denote the transmitted value, where 𝐒 ∈  is chosen with prior probability P(𝐒 = 𝐬), or, more briefly, P(𝐬). The receiver uses the received point 𝐑 = 𝐫 to make a decision about what the transmitted signal 𝐒 is. Let P(𝐬|𝐫) be used to denote P(𝐒 = 𝐬|𝐑 = 𝐫) = P𝐒|𝐑 (𝐒 = 𝐬|𝐑 = 𝐫) for an observed value of the random variable 𝐑 = 𝐫. Conditioned on knowing that the transmitted signal is 𝐒 = 𝐬, 𝐑 is a Gaussian random variable with conditional density [ ] 1 1 exp − (𝐫 − 𝐬)T R−1 pR|S (𝐫|𝐬) = pN (𝐫 − 𝐬) = (𝐫 − 𝐬) √ N 2 2𝜋 det(RN ) (1.16) ] [ 1 2 = C exp − 2 ∥𝐫 − 𝐬∥ , 2𝜎 where ∥𝐫 − 𝐬∥2 is the squared Euclidean distance between 𝐫 and 𝐬 and C is a quantity that does not depend on either 𝐑 or 𝐒. The quantity pR|S (𝐫|𝐬) is called the likelihood function. The likelihood function pR|S (𝐫|𝐬) is typically viewed as a function of the conditioning argument, with the observed values 𝐫 as fixed parameters. The signal point 𝐬 ∈  depends uniquely upon the transmitted bits mapped to the signal constellation point. Conditioning upon knowing the transmitted signal is thus equivalent to conditioning on knowing the transmitted bits. Thus, the notation p(𝐫|𝐬) is used interchangeably with p(𝐫|𝐛), when 𝐬 is the signal √ used to represent the bits 𝐛. For example, for BPSK modulation, we could write either p(r|s =√ Eb ) or p(r|b = 1) or even p(r|b̃ = 1), since by the modulation described above the amplitude Eb is transmitted when the input bit is b = 1 (or b̃ = 1). Let us denote the estimated decision as 𝐬̂ = [â1 , â2 ]T ∈ . We will use the notation P(𝐬|𝐫) as a shorthand for P(𝐒 = 𝐬|𝐫). Theorem 1.8 The decision rule which minimizes the probability of error is to choose 𝐬̂ to be that value of 𝐬 which maximizes P(𝐒 = 𝐬|𝐫), where the possible values of 𝐬 are those in the signal constellation . That is, 𝐬̂ = arg max P(𝐬|𝐫). (1.17) 𝐬∈

1.5 Signal Detection

17

Proof Let us denote the constellation as  = {𝐬i , i = 1, 2, . . . , M}. Let p(𝐫|𝐬i ) denote the pdf of the received signal when 𝐒 = 𝐬i is the transmitted signal. Let Ω denote the space of possible received values; in the current case Ω = ℝ2 . Let us partition the space Ω into regions Ωi , where the decision rule is expressed as: set 𝐬̂ = 𝐬i if 𝐫 ∈ Ωi . That is, Ωi = {𝐫 ∈ Ω ∶ decide 𝐬̂ = 𝐬i }. The problem, then, is to determine the partitions Ωi . By the definition of the partition, the conditional probability of a correct answer when 𝐒 = 𝐬i is sent is P(̂𝐬 = 𝐬i |𝐒 = 𝐬i ) =

∫Ωi

p(𝐫|𝐬i ) d𝐫.

Denote the conditional probability of error when signal 𝐒 = 𝐬i is sent as Pi (): Pi () = P(̂𝐬 ≠ 𝐬i |𝐒 = 𝐬i ). Then we have Pi () = 1 − P(̂𝐬 = 𝐬i |𝐒 = 𝐬i ) = 1 −

∫Ωi

p(𝐫|𝐬i ) d𝐫.

The average probability of error is P() =

M ∑

Pi ()P(𝐒 = 𝐬i ) =

i=1

M ∑

[ P(𝐒 = 𝐬i ) 1 −

i=1

=1−

M ∑ i=1

∫Ωi

] ∫Ωi

p(𝐫|𝐬i ) d𝐫

d𝐫

p(𝐫|𝐬i )P(𝐒 = 𝐬i ) d𝐫.

The probability of a correct answer is P() = 1 − P() =

M ∑ i=1

∫Ωi

p(𝐫|𝐬i )P(𝐒 = 𝐬i ) d𝐫 =

M ∑ i=1

∫Ωi

P(𝐒 = 𝐬i |𝐫)p(𝐫) d𝐫.

Since p(𝐫) ≥ 0, to maximize P(), the region of integration Ωi should be chosen precisely so that it covers the region where P(𝐒 = 𝐬i |𝐫) is the largest possible. That is, Ωi = {𝐫 ∶ P(𝐒 = 𝐬i |𝐫) > P(𝐒 = 𝐬j |𝐫), i ≠ j}. ◽

This is equivalent to (1.17). Using Bayes’ rule we can write (1.17) as 𝐬̂ = arg max P(𝐬|𝐫) = arg max 𝐬∈

pR|S (𝐫|𝐬)P(𝐬)

𝐬∈

pR (𝐫)

.

Since the denominator of the last expression does not depend on 𝐬, we can further write 𝐬̂ = arg max pR|S (𝐫|𝐬)P(𝐬). 𝐬∈

(1.18)

This is called the MAP decision rule. In the case that all the prior probabilities are equal (or are assumed to be equal), this rule can be simplified to 𝐬̂ = arg max pR|S (𝐫|𝐬). 𝐬∈

This is called the ML decision rule.

18

A Context for Error Correction Coding Note: We will frequently suppress the subscript on a pdf or distribution function, letting the arguments themselves indicate the intended random variables. We could thus write p(𝐫|𝐬) in place of pR|S (𝐫|𝐬). Once the decision 𝐬̂ is made, the corresponding bits are determined by the bit-to-constellation mapping. The output of the receiver is thus an estimate of the bits. By the form of (1.16), we see that the ML decision rule for the Gaussian noise channel selects that point 𝐬̂ ∈  which is closest to 𝐫 in squared Euclidean distance, ∥𝐫 − 𝐬̂ ∥2 . 1.5.3

Special Case: Binary Detection

For the case of binary in a one-dimensional signal space, the signal constellation consists √ transmission √ of the points  = { Eb , − Eb }, corresponding, respectively, to the bit b = 1 or b = 0 (respectively, using the current mapping). The corresponding likelihood functions are √ √ √ √ 1 − 1 (r− Eb )2 1 − 1 (r+ Eb )2 p(r|s = Eb ) = √ e 2𝜎 2 p(r|s = − Eb ) = √ e 2𝜎 2 . 2𝜋 2𝜋 √ √ These densities are plotted in Figure 1.8(a). We see r|s = √Eb is a Gaussian . The MAP √ with mean Eb√ decision rule compares the weighted densities p(r|s = E )P(s = E ) and p(r|s = − Eb )P(s = b √ √b √ − Eb ). Figure 1.8(b) shows these densities in the case that P(s = − Eb ) > P(s = Eb ). Clearly, there is a threshold point 𝜏 at which √ √ √ √ p(r|s = Eb )P(s = Eb ) = p(r|s = − Eb )P(s = − Eb ). In this case, the decision rule (1.18) simplifies to {√ Eb (i.e., bi = 1) ŝ = √ − Eb (i.e., bi = 0)

if r > 𝜏

(1.19)

if r < 𝜏.

The threshold value can be computed explicitly as

√ P(s = − Eb ) 𝜎2 𝜏 = √ ln . (1.20) √ 2 Eb P(s = Eb ) √ √ In the case that P(s = Eb ) = P(s = − Eb ), the decision threshold is at 𝜏 = 0, as would be expected.

p(r |s = – √Eb)

p(r |s = – √Eb) P(s = – √Eb)

p(r |s = √Eb)

p(r|s = √Eb) P(s = √Eb)

r

r – √Eb

– √Eb

√Eb

τ

√Eb

(b)

(a)

Figure 1.8: Conditional densities in BPSK modulation. (a) Conditional densities; (b) weighted conditional densities. Binary detection problems are also frequently expressed in terms of likelihood ratios. For binary detection, the problem is one of determining, say, if b = 1 or if b = 0. The detection rule (1.18) becomes a test between p(r|b = 1)P(b = 1)

and

P(r|b = 0)P(b = 0).

1.5 Signal Detection

19

This can be expressed as a ratio, p(r|b = 1)P(b = 1) . p(r|b = 0)P(b = 0) In the case of equal priors, we obtain the likelihood ratio L(r) =

p(r|b = 1) . p(r|b = 0)

For many channels, it is more convenient to use the log likelihood ratio Λ(r) = log

p(r|b = 1) , p(r|b = 0)

where the natural logarithm is usually used. The decision is made that b̂ = 1 if Λ(r) > 0 and b̂ = 0 if Λ(r) < 0. For the Gaussian channel with BPSK modulation, we have √ √ √ exp(− 2𝜎1 2 (r − Eb )2 ) 2 Eb p(r|a = Eb ) = Λ(r) = log r = Lc r, (1.21) = log √ √ 𝜎2 p(r|a = − Eb ) exp(− 2𝜎1 2 (r + Eb )2 ) √ 2 E

where Lc = 𝜎 2 b is called the channel reliability.5 The quantity Λ(r) = Lc r can be used as soft information in a decoding system. The quantity sign(Λ(r)) is referred to as hard information in a decoding system. Most early error correction decoding algorithms employed hard information — actual estimated bit values — while there has been a trend toward increasing use of soft information decoders, which generally provide better performance. Note: Beginning √ in Section 1.7, error correction coding is used, and the transmitted BPSK amplitudes are ± Ec , where Ec = REb . Using these amplitudes, √ √ p(r|a = Ec ) 2 Ec Λ(r) = log r = √ 𝜎2 p(r|a = − Ec ) √ so that the channel reliability is Lc = 2 Ec ∕𝜎 2 . 1.5.4

Probability of Error for Binary Detection

Even with optimum decisions at the demodulator, errors can still be made with some probability (otherwise, error correction coding would not ever be needed). For binary detection problems in Gaussian noise, the probabilities can be expressed using the Q(x) function, which is the probability that a unit Gaussian N ∼  (0, 1) exceeds x:

The Q function has the properties that Q(x) = 1 − Q(−x)

Q(0) =

1 2

Q(−∞) = 1 Q(∞) = 0.

5 In some sources (e.g., [178]) the channel reliability L is expressed alternatively as equivalent to 2E ∕𝜎 2 . This is in some c b ways preferable, since it is unitless.

20

A Context for Error Correction Coding For a Gaussian random variable Z with mean 𝜇 and variance 𝜎 2 , Z ∼  (𝜇, 𝜎 2 ), it is straightforward to show that ∞ (x − 𝜇) 2 2 1 . e−(z−𝜇) ∕2𝜎 dz = Q P(Z > x) = √ 𝜎 2𝜋𝜎 ∫x Suppose there are two points a and b along an axis, and that R = s + N, where s is one of the two points, and N ∼  (0, 𝜎 2 ). The distributions P(R|s = a)P(s = a) and P(R|s = b)P(s = b) are plotted in Figure 1.9. A decision threshold 𝜏 is also shown. When a is sent, an error is made when R > 𝜏. Denoting  as the error event, this occurs with probability ∞ ) ( − 1 (r−a)2 1 𝜏 −a . e 2𝜎 2 dr = Q P(|s = a) = P(R > 𝜏) = √ 𝜎 2𝜋𝜎 ∫𝜏 When b is sent, an error is made when R < 𝜏, which occurs with probability ( ) ( ) 𝜏 −b b−𝜏 P(|s = b) = P(R < 𝜏) = 1 − P(R > 𝜏) = 1 − Q =Q . 𝜎 𝜎 The overall probability of error is P() = P(|s = a)P(s = a) + P(|s = b)P(s = b) ( ) ( ) 𝜏 −a b−𝜏 =Q P(s = a) + Q P(s = b). 𝜎 𝜎

(1.22)

An important special case is when P(s = a) = P(s = b) = 12 . Then the decision threshold is at the midpoint 𝜏 = (a + b)∕2. Let d = |b − a| be the distance between the two signals. Then (1.22) can be written ( ) d . (1.23) P() = Q 2𝜎 Even in the case that the signals are transmitted in multidimensional space, provided that the covariance of the noise is of the form 𝜎 2 I, the probability of error is still of the form (1.23). That is, if 𝐑=𝐬+𝐍 are n-dimensional vectors, with 𝐍 ∼  (0, 𝜎 2 I), and 𝐒 ∈( {𝐚,)𝐛} are two equiprobable transmitted d , where d =∥ 𝐚 − 𝐛 ∥ is the Euclidean vectors, then the probability of decision error is P() = Q 2𝜎 distance between vectors. This formula is frequently used in characterizing the performance√ of codes. √ √ For the particular case of BPSK signaling, we have a = − Eb , b = Eb , and d = 2 Eb . The probability P() is denoted as Pb , the “probability of a bit error.” Thus, ( (√ ) √ ) √ √ Eb − 𝜏 𝜏 + Eb (1.24) Pb = Q P(− Eb ) + Q P( Eb ). 𝜎 𝜎 p(R | s = a) P (s = a)

a

𝜏

p(R | s = b) P (s = b)

b P(𝜀|s = a) P (s = a)

Figure 1.9: Distributions when two signals are sent in Gaussian noise.

1.5 Signal Detection

21

√ √ When P( Eb ) = P(− Eb ), then 𝜏 = 0. Recalling that the variance for the channel is expressed as N 𝜎 2 = 20 , we have for BPSK transmission (√



Pb = Q( Eb ∕𝜎) = Q

2Eb N0

)

α α2 α3 α4 α5 α6 α7 α8 α9

.

(1.25)

bpskprobplot.m bpskprob.m

Figure 1.10 shows the probability of bit error for a BPSK as a function of the SNR in dB (decibel), where Eb ∕N0 dB = 10 log10 Eb ∕N0 , √ √ for the case P( Eb ) = P(− Eb ). 10−1

Probability of bit error

10−2 10−3 10−4 10−5

0

2

4

6

8

10

Eb/N0 (dB)

Figure 1.10: Probability of error for BPSK signaling. 1.5.5

00000

α10 α11 α12 α13 α14 α15 α16

The quantity Eb ∕N0 is frequently called the (bit) SNR.

10−6

+

01010

Bounds on Performance: The Union Bound

For some signal constellations, exact expressions for the probability of error are difficult to obtain. In many cases it is more convenient to obtain a bound on the probability of error using the union bound. (See Box 1.1.) Consider, for example, the 8-PSK constellation in Figure 1.11. If the point labeled 𝐬0 is transmitted, then an error occurs if the received signal falls in either shaded area. Let A be the event that the received signal falls on the incorrect side of threshold line L1 and let B be the event that the received signal falls on the incorrect side of the line L2 . Then Pr(symbol decoding error|𝐬0 sent) = P(A ∪ B). The events A and B are not disjoint, as is apparent from Figure 1.11. The exact probability computation is made more difficult by the overlapping region. Using the union bound, however, the probability of error can be bounded as Pr(symbol decoding error|𝐬0 sent) ≤ P(A) + P(B).

22

A Context for Error Correction Coding

L2

A s0

L1

B

Figure 1.11: Probability of error bound for 8-PSK modulation.

Box 1.1: The Union Bound For sets A and B, we have P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

A

A⋂B

B

Then, since P(A ∩ B) ≥ 0, clearly P(A ∪ B) ≤ P(A) + P(B). The event A occurs with the probability that the transmitted signal falls on the wrong side of the line L1 ; similarly for B. Assuming that the noise is independent Gaussian with variance 𝜎 2 in each coordinate direction, this probability is ( ) dmin P(A) = Q , 2𝜎 where dmin is the minimum distance between signal points. Denote the probability of a symbol error by Ps . Assuming that all symbols are sent with equal probability, we have Ps = Pr(symbol decoding error|𝐬0 sent), where the probability is bounded by ( ) dmin . (1.26) Ps ≤ 2Q 2𝜎 The factor 2 multiplying the Q function is the number of nearest neighbors around each constellation point. The probability of error is dominated by the minimum distance between points: better performance is obtained with larger distance. As Es ∕N0 (the symbol SNR) increases, the probability of falling in the intersection region decreases and the bound (1.26) becomes increasingly tight. More generally, the probability of detection error for a symbol s which has K neighbors in signal space at a distance dmin from it can be bounded by ( ) dmin Ps ≤ KQ , (1.27) 2𝜎 and the bound becomes increasingly tight as the SNR increases. For signal constellations larger than BPSK, it is common to plot the probability of a symbol error vs. the SNR in Es ∕N0 , where Es is the average signal energy. However, when the bits are assigned in

1.5 Signal Detection

23

Gray code order, then a symbol error is likely to be an adjacent symbol. One symbol thus results in 1 bit error, out of the log2 M bits the symbol represents, so Pb ≈

1.5.6

1 P log2 M s

for sufficiently large SNR.

(1.28)

The Binary Symmetric Channel

The BSC is a simplified channel model which contemplates only the transmission of bits over the channel; it does not treat details such as signal spaces, modulation, or matched filtering. The BSC accepts 1 bit per unit of time and transmits that bit with a probability of error p. A representation of the BSC is shown in Figure 1.12. An incoming bit of 0 or 1 is transmitted through the channel unchanged with probability 1 − p, or flipped with probability p. The sequence of output bits in a BSC can be modeled as (1.29) Ri = Si ⊕ Ni , where Ri ∈ {0, 1} are the output bits, Si ∈ {0, 1} are the input bits, Ni ∈ {0, 1} represents the possible bit errors, where Ni is 1 if an error occurs on bit i. The addition in (1.29) is modulo 2 addition (without carry), according to the addition table 0⊕0=0

0⊕1=1

1 ⊕ 1 = 0,

so that if Ni = 1, then Ri is the bit complement of Si . The BSC is an instance of a memoryless channel. This means each of the errors Ni is statistically independent of all the other Ni . The probability that bit i has an error is P(Ni = 1) = p, where p is called the BSC crossover probability. The sequence {Ni , i ∈ ℤ} can be viewed as an independent and identically distributed (i.i.d.) Bernoulli(p) random process. 1–p

0 bits

0 bits

p p 1

1–p

1

Figure 1.12: A binary symmetric channel. Suppose that S is sent over the channel and R is received. The likelihood function P(R|S) is { 1−p if R = S P(R|S) = (1.30) p if R ≠ S. Now suppose that the sequence 𝐬 = [s1 , s2 , . . . , sn ] is transmitted over a BSC and that the received sequence is 𝐑 = [r1 , r2 , . . . , rn ]. Because of independent noise samples, the likelihood function factors, P(𝐑|𝐒) =

n ∏

P(Ri |Si ).

(1.31)

i=1

Each factor in the product is of the form (1.30). Thus, there is a factor (1 − p) every time Ri agrees with Si , and a factor p every time Ri differs from Si . To represent this, we introduce the Hamming distance.

24

A Context for Error Correction Coding Definition 1.9 The Hamming distance between a sequence 𝐱 = [x1 , x2 , . . . , xn ] and a sequence 𝐲 = [y1 , y2 , . . . , yn ] is the number of positions that the corresponding elements differ: dH (𝐱, 𝐲) =

n ∑ [xi ≠ yi ].

(1.32)

i=1

Here we have used the notation (Iverson’s convention [166]) { 1 if xi ≠ yi [xi ≠ yi ] = 0 if xi = yi .



Using the notation of Hamming distance, we can write the likelihood function (1.31) as P(𝐑|𝐒) = (1 − p)n−dH (𝐑,𝐒) ⏟⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏟ number of places they are the same

pdH (𝐑,𝐒) . ⏟⏟⏟ number of places they differ

The likelihood function can also be written as ( )dH (𝐑,𝐒) p P(𝐑|𝐒) = (1 − p)n . 1−p Consider now the detection problem of deciding if the sequence 𝐒1 or the sequence 𝐒2 was sent, where each occur with equal probability. The ML decision rule says to choose that value of 𝐒 for which p dH (𝐑,𝐒) (1 − p)n is the largest. Assuming that p < 12 , this corresponds to choosing that value of 𝐒 for 1−p which dH (𝐑, 𝐒) is the smallest, that is, the vector 𝐒 nearest to 𝐑 in Hamming distance. We see that for detection in a Gaussian channel, the Euclidean distance is the appropriate distance for detection. For the BSC, the Hamming distance is the appropriate distance for detection. 1.5.7

The BSC and the Gaussian Channel Model

At a sufficiently coarse level of detail, the modulator/demodulator system with the AWGN channel can be viewed as a BSC. The modulation, channel, and detector collectively constitute a “channel” which accepts bits at the input and emits bits at the output. The end-to-end system viewed at this level, as suggested by the dashed box in Figure 1.13(b), forms a BSC. The crossover probability p can be computed based on the system parameters, √ p = P(bit out = 0|bit in = 1) = P(bit out = 1|bit in = 0) = Pb = Q( 2Eb ∕N0 ). In many cases the probability of error is computed using a BSC with an “internal” AWGN channel, so that the probability of error is produced as a function of Eb ∕N0 .

1.6

Memoryless Channels

A memoryless channel is one in which the output rn at the nth symbol time depends only on the input at time n. Thus, given the input at time n, the output at time n is statistically independent of the outputs at other times. That is, for a sequence of received signals 𝐑 = (R1 , R2 , . . . , Rm )

1.7 Simulation and Energy Considerations for Coded Signals

bits Modulator

s(t)

r(t) Channel (e.g., AWGN)

25

Matched filter or correlator

R

Matched filter or correlator

R

Decision rule (e.g., MAP or ML)

bits

(a)

bits Modulator

s(t)

r(t) Channel (e.g., AWGN)

0

1–p

bits 1–p

bits

0 bits

p p 1

Decision rule (e.g., MAP or ML)

1

(b)

Figure 1.13: (a) System diagram showing modulation, channel, and demodulation; (b) BSC equivalent. depending on transmitted signals S1 , S2 , . . . , Sm , the likelihood function p(R1 , R2 , . . . , Rm |S1 , S2 , . . . , Sm ) can be factored as p(R1 , R2 , . . . , Rm |S1 , S2 , . . . , Sm ) =

m ∏

p(Ri |Si ).

i=1

Both the additive Gaussian channel and the BSC that have been introduced are memoryless channels. We will almost universally assume that the channels are memoryless channels. The bursty channels discussed in Chapter 10 and the convolutive channel introduced in Chapter 14 are exceptions to this.

1.7

Simulation and Energy Considerations for Coded Signals

In channel coding with rate R = k∕n, k input bits yield n coded bits at the output of the encoder, where n > k. A transmission budget which allocates Eb Joules/bit for the uncoded data must spread that energy over more coded bits. Let Ec = REb denote the “energy per coded bit.” We thus have Ec < Eb . Consider the framework shown in Figure 1.14. From point “a” to point “b,” there is conventional (uncoded) BPSK modulation passing through an AWGN channel, except √ √ that the energy per bit is Ec , so that the amplitudes in the BPSK modulator are ± Ec , instead of ± Eb , as they would be in the uncoded case. Thus, at point “b” the probability of error of the coded bits appearing at point “a” can be computed as ) (√ 2Ec . Pb,coded = Q N0

26

A Context for Error Correction Coding Energy/bit= Eb R = k/n channel uncoded encoder bits

Energy/bit= Ec Modulator √Ec −√Ec “0"

“a”

n ~ 𝓝 (0, 𝜎2) s

r

+

Detector

“1"

“b” Channel decoder “c”

Figure 1.14: Energy for a coded signal.

100

Rate R = 1 Rate R = 1/2 Rate R = 1/3

Probability of bit error

10−1 10−2 10−3 10−4 10−5 10−6

0

2

4 6 Eb/N0 (dB)

8

10

Figure 1.15: Probability of error for coded bits, before correction. Since Ec < Eb , this is worse performance than uncoded BPSK would have had. Figure 1.15 shows the probability of error of coded bits for R = 1∕2 and R = 1∕3 error correction codes at point “b” in Figure 1.14. At the receiver, the detected coded bits are passed to the channel decoder, the error correction stage, which attempts to correct errors. Clearly, in order to be of any value, the code must be strong enough so that the bits emerging at point “c” of Figure 1.14 can compensate for the lower energy per bit in the channel, plus correct other errors. Fortunately, we will see that this is in fact the case. Now consider how this system might be simulated in software. It is common to simulate the modulator at point “a” of Figure 1.14 as having fixed amplitudes and to adjust the variance 𝜎 2 of the noise n in the channel. One of the primary considerations, therefore, is how to set 𝜎 2 . Frequently it is desired to simulate performance at a particular SNR, expressed in dB, denoted as (Eb ∕N0 )dB ). Let 𝛾 = Eb ∕N0 denote the desired SNR at which to simulate. The SNR in dB is converted to 𝛾 by 𝛾 = 10(SNR)dB ∕10 Recalling that 𝜎 2 = N0 ∕2, and knowing 𝛾, we have 𝛾=

Eb , 2𝜎 2

1.8 Some Important Definitions and a Trivial Code: Repetition Coding so 𝜎2 =

Eb . 2𝛾

𝜎2 =

Ec . 2R𝛾

Since Eb = Ec ∕R, we have

27

It is also common in simulation to normalize so that the simulated signal amplitude is Ec = 1.

1.8

Some Important Definitions and a Trivial Code: Repetition Coding

In this section, we introduce the important coding concepts of code rate, Hamming distance, minimum distance, Hamming spheres, and the generator matrix. These concepts are introduced by means of a simple, even trivial, example of an error correction code, the repetition code. Let 𝔽2 denote the set (field) with two elements in it, 0 and 1. In this field, arithmetic operations are defined as: 0+0=0 0+1=1 1+0=1 1+1=0 0⋅0=0 0⋅1=0 1⋅0=0

1 ⋅ 1 = 1.

Let 𝔽2n denote the (vector) space of n-tuples of elements of 𝔽2 . Suppressing for the moment a few details, here is a useful definition: an (n, k) binary code is a set of 2k distinct points in 𝔽2n . Another way of putting this: an (n, k) binary code is a code that accepts k bits as input and produces n bits as output. (We will be interested in linear codes, as the details unfold.) Definition 1.10 The rate of an (n, k) code is R=

k . n



The (n, 1) repetition code, where n is odd, is the code obtained by repeating the 1-bit input n times in the output codeword. That is, the codeword representing the input 0 is a block of n 0s and the codeword representing the input 1 is a block of n 1s. The code  consists of the set of two codewords  = {[0, 0, . . . , 0], [1, 1, . . . , 1]} ⊂ 𝔽2n . Letting m denote the message, the corresponding codeword is 𝐜 = [m, m, m, . . . , m]. ⏟⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏟ n copies

This is a rate R = 1∕n code. Encoding can be represented as a matrix operation. Let G be the 1 × n generator matrix given by [ ] G= 1 1 ··· 1 . Then the encoding operation is 𝐜 = mG. 1.8.1

Detection of Repetition Codes Over a BSC

Let us first consider decoding of this code when transmitted through a BSC with crossover probability p < 1∕2. Denote the output of the BSC by 𝐫 = 𝐜 ⊕ 𝐧,

28

A Context for Error Correction Coding where the addition is modulo 2 and 𝐧 is a binary vector of length n, with 1 in the positions where the channel errors occur. Assuming that the codewords are selected with equal probability, ML decoding is appropriate. As observed in Section 1.5.6, the ML decoding rule selects the codeword in C which is closest to the received vector 𝐫 in Hamming distance. For the repetition code, this decoding rule can be expressed as a majority decoding rule: if the majority of received bits are 0, decode a 0; otherwise, decode a 1. For example, take the (7, 1) repetition code and let m = 1. Then the codeword is 𝐜 = [1, 1, 1, 1, 1, 1, 1]. Suppose that the received vector is 𝐫 = [1, 0, 1, 1, 0, 1, 1]. Since 5 out of the 7 bits are 1, the decoded value is m ̂ = 1. An error detector can also be established. If the received vector 𝐫 is not one of the codewords, we detect that the channel has introduced one or more errors into the transmitted codeword. The codewords in a code C can be viewed as points in n-dimensional space. For example, Figure 1.16(a) illustrates the codewords as points (0, 0, 0) and (1, 1, 1) in three-dimensional space. (Beyond three dimensions, of course, the geometric viewpoint cannot be plotted, but it is still valuable conceptually.) In this geometric setting, we use the Hamming distance to measure distances between points.

Hamming “sphere” for (1,1,1) (1,1,1) (0,0,0)

Hamming “sphere” for (0,0,0) (b)

(a)

Figure 1.16: A (3, 1) binary repetition code. (a) The code as points in space; (b) the Hamming spheres around the points. Definition 1.11 The minimum distance dmin of a code  is the smallest Hamming distance between any two codewords in the code: min dH (𝐜i , 𝐜j ). dmin = 𝐜i ,𝐜j ∈C,𝐜i ≠𝐜j



The two codewords in the (n, 1) repetition code are clearly (Hamming) distance n apart. In this geometric setting, the ML decoding algorithm may be expressed as: Choose the codeword 𝐜̂ which is closest to the received vector 𝐫. That is, 𝐜̂ = arg min dH (𝐫, 𝐜). 𝐜∈C

Definition 1.12 The Hamming sphere of radius t around a codeword 𝐜 consists of all vectors which are at a Hamming distance ≤ t from 𝐜. ◽

1.8 Some Important Definitions and a Trivial Code: Repetition Coding For example, for the (3, 1) repetition code, the codewords and the points in their Hamming spheres are Codeword (0,0,0) (1,1,1)

Points in its sphere (0,0,0),(0,0,1),(0,1,0),(1,0,0) (1,1,1),(1,1,0),(1,0,1),(0,1,1),

as illustrated in Figure 1.16(b). When the Hamming spheres around each codeword are all taken to have the same radius, the largest such radius producing nonoverlapping spheres is determined by the separation between the nearest two codewords in the code, dmin . The radius of the spheres in this case is t = ⌊(dmin − 1)∕2⌋, where the notation ⌊x⌋ means to take the greatest integer ≤ x. Figure 1.17 shows the idea of these Hamming spheres. The black squares represent codewords in n-dimensional space and black dots represent other vectors in n-dimensional space. The dashed lines indicate the boundaries of the Hamming spheres around the codewords. If a vector 𝐫 falls inside the sphere around a codeword, then it is closer to that codeword than to any other codeword. By the ML criterion, 𝐫 should decode to that codeword inside the sphere. When all the spheres have radius t = ⌊(dmin − 1)∕2⌋, this decoding rule is referred to as bounded distance decoding.

dmin c1

v1

Figure 1.17: A representation of decoding spheres. The decoder will make a decoding error if the channel noise moves the received vector 𝐫 into a sphere other than the sphere the true codeword is in. Since the centers of the spheres lie a distance at least dmin apart, the decoder is guaranteed to decode correctly provided that no more than t errors occur in the received vector 𝐫. The number t is called the random error correction capability of the code. If dmin is even and two codewords lie exactly dmin apart and the channel introduces dmin ∕2 errors, then the received vector lies right on the boundary of two spheres. In this case, given no other information, the decoder must choose one of the two codewords arbitrarily; half the time it will make an error. Note from Figure 1.17 that in a bounded distance decoder there may be vectors that fall outside the Hamming spheres around the codewords, such as the vector labeled 𝐯1 . If the received vector 𝐫 = 𝐯1 , then the nearest codeword is 𝐜1 . A bounded distance decoder, however, would not be able to decode if 𝐫 = 𝐯1 , since it can only decode those vectors that fall in spheres of radius t. The decoder might have to declare a decoding failure in this case. A true ML decoder, which chooses the nearest codeword to the received vector, would be able to decode. Unfortunately, ML decoding is computationally very difficult for large codes. Most of the algebraic decoding algorithms in this book are only bounded distance decoders. An interesting

29

30

A Context for Error Correction Coding exception are the decoders presented in Chapters 7 and 11, which actually produce lists of codeword candidates. These decoders are called list decoders. If the channel introduces fewer than dmin errors, then these can be detected, since 𝐫 cannot be another codeword in this case. In summary, for a code with minimum distance dmin : Guaranteed error correction capability: Guaranteed error detection capability:

t = ⌊(dmin − 1)∕2⌋ dmin − 1

Having defined the repetition code, let us now characterize its probability of error performance as a function of the BSC crossover probability p. For the (n, 1) repetition code, dmin = n, and t = (n − 1)∕2 (remember n is odd). Suppose in particular that n = 3, so that t = 1. Then the decoder will make an error if the channel causes either 2 or 3 bits to be in error. Using Pne to denote the probability of decoding error for a code of length n, we have P3e = Prob(2 channel errors) + Prob(3 channel errors) = 3p2 (1 − p) + p3 = 3p2 − 2p3 . If p < 12 , then P3e < p, that is, the decoder will have fewer errors than using the channel without coding. Let us now examine the probability of decoding error for a code of length n. Note that it doesn’t matter what the transmitted codeword was; the probability of error depends only on the error introduced by the channel. Clearly, the decoder will make an error if more than half of the received bits are in error. More precisely, if more than t bits are in error, the decoder will make an error. The probability of error can be expressed as Pne =

n ∑

Prob(i channel errors occur out of n transmitted bits).

i=t+1

The probability of exactly i bits in error out of n bits, where each bit is drawn at random with probability p is6 ( ) n i p (1 − p)n−i , i so that Pne

n ( ) ∑ n i p (1 − p)n−i = i i=t+1 ( )t+1 ( ) p n = (1 − p)n + terms of higher degree in p. 1−p t+1

It would appear that as the code length increases, and thus t increases, the probability of decoder error decreases. (This is substantiated in Exercise 1.16b.) Thus, it is possible to obtain arbitrarily small probability of error, but at the cost of a very low rate: R = 1∕n → 0 as PNe → 0. Let us now consider using this repetition code for communication over the AWGN channel. Let us suppose that the transmitter has P = 1 Watt (W) of power available and that we want to send information at 1 bit/second. There is thus Eb = 1 Joule (J) of energy available for each bit of information. Now the information is coded using an (n, 1) repetition code. To maintain the information rate of 1 bit/second, we must send n coded bits/second. With n times as many bits to send, there is still only 1 W of power available, which must be shared among all the coded bits. The energy available for each coded bit, which we denote as Ec , is Ec = Eb ∕n. Thus, because of coding, there is less energy available for each bit to convey information! The probability of error for the AWGN channel (i.e., the

6

( ) The binomial coefficient is

n i

=

n! . i!(n−i)!

1.8 Some Important Definitions and a Trivial Code: Repetition Coding

31

binary crossover probability for the effective BSC) is √ √ p = Q( 2Ec ∕N0 ) = Q( 2Eb ∕nN0 ).

α α2 α3 α4 α5 α6 α7 α8 α9 +

The crossover probability p is higher as a result of using a code! However, the hope is that the error decoding capability of the overall system is better. Nevertheless, for the repetition code, this hope is in vain. Figure 1.18 shows the probability of error for repetition codes (here, consider only the hard-decision decoding). The coded performance is worse than the uncoded performance, and the performance gets worse with increasing n. 100

Probability of bit error

10−1 10−2 10−3

Uncoded Rep n = 3, soft Rep n = 11, soft Rep n = 3, hard Rep n = 11, hard

10−4 10−5 10−6

0

2

4

6

8

10

SNR (dB)

Figure 1.18: Performance of the (3, 1) and (11, 1) repetition code over BSC using both hard- and soft-decision decoding. 1.8.2

Soft-Decision Decoding of Repetition Codes Over the AWGN

Let us now consider decoding over the AWGN using a soft-decision decoder. Since the repetition code has a particularly simple codeword structure, it is straightforward to describe the soft-decision decoder and characterize its probability of error. The likelihood function is n ∏ p(ri |ci ), p(𝐫|𝐜) = i=1

so that the log likelihood ratio Λ(𝐫) = log

p(𝐫|m = 1) p(𝐫|m = 0)

can be computed using (1.21) as Λ(𝐫) = Lc

n ∑

ri .

i=1

Then the decoder decides m ̂ = 1 if Λ(𝐫) > 0, or m ̂ = 0 if Λ(𝐫) < 0. Since the threshold is 0 and Lc is a positive constant, the decoder decides { ∑ 1 if ni=1 ri > 0 m ̂ = ∑n 0 if i=1 ri < 0.

01010

00000

α10 α11 α12 α13 α14 α15 α16

repcodeprob.m

32

A Context for Error Correction Coding The decoder performs superior to the hard-decision decoder. Suppose the vector √ soft-decision √ √ (− Ec , − Ec , . . . , − Ec ) is sent (corresponding to the all-zero codeword). If one of the ri happens to be greater than 0, but other of the ri are correspondingly less than 0, the erroneous positive quantities might be canceled out by the other symbols. In fact, it is straightforward to show (see Exercise 1.18) that the probability of error for the (n, 1) repetition code with soft-decision decoding is √ (1.33) Pb = Q( 2Eb ∕N0 ). That is, it is the same as for uncoded transmission — still not effective as a code, but better than hard-decision decoding. While these simple-minded repetition codes do not, by themselves, make good error correction codes, repetition codes will be shown in Chapter 15 to be a component of very powerful codes, the repeat accumulate codes. 1.8.3

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

tstrepcode.cc repcoderes.m

Simulation of Results

While it is possible for these simple codes to compute explicit performance curves, it is worthwhile to consider how the performance might also be simulated, since other codes that we will examine may be more difficult to analyze. The program here illustrates a framework for simulating the performance of codes. The probability of error is estimated by running codewords through a simulated Gaussian channel until a specified number of errors has occurred. Then the estimated probability of error is the number of errors counted divided by the number of bits generated. Figure 1.18 shows the probability of error for uncoded transmission and both hard- and soft-decision decoding of (3, 1) and (11, 1) codes. 1.8.4

Summary

This lengthy example on a nearly useless code has introduced several concepts that will be useful for other codes:

• • • • •

The concept of minimum distance of a code. The probability of decoder error. The idea of a generator matrix. The fact that not every code is good! Recognition that soft-decision decoding is superior to hard-input decoding in terms of probability of error.

Prior to the proof of Shannon’s channel coding theorem and the research it engendered, communication engineers were in a quandary. It was believed that to obtain totally reliable communication, it would be necessary to transmit using very slow rates, essentially employing repetition codes to catch any errors and using slow symbol rate to increase the energy per bit. However, Shannon’s theorem dramatically changed this perspective, indicating that it is not necessary to slow the rate of communication to zero. It is only necessary to use better codes. It is hardly an overstatement that information-theoretic understanding has driven the information age!

1.9

Hamming Codes

As a second example we now introduce are Hamming codes. These are codes which are much better than repetition codes and were the first important codes discovered. Hamming codes lie at the intersection of many different kinds of codes, so we will also use them to briefly introduce

1.9 Hamming Codes

33

several important themes which will be developed throughout the course of this book. These themes include: generator matrices; parity check matrices; syndromes; generator polynomials; parity check polynomials; trellises; and Tanner graphs. These themes, touched on here, will be fully developed in later chapters. A (7, 4) Hamming code produces n = 7 bits of output for every k = 4 bits of input. Hamming codes are linear block codes, which means that the encoding operation can be described in terms of a 4 × 7 generator matrix, such as ⎡1 1 0 1 0 0 0⎤ ⎢0 1 1 0 1 0 0⎥ G=⎢ . (1.34) 0 0 1 1 0 1 0⎥ ⎢ ⎥ ⎣0 0 0 1 1 0 1⎦ The codewords are obtained as linear combination of the rows of G, where all the operations are computed modulo 2 in each vector element, that is, in the field 𝔽2 . The code is the row space of G. For a message vector 𝐦 = [m1 , m2 , m3 , m4 ] the codeword is 𝐜 = 𝐦G. For example, if 𝐦 = [1, 1, 0, 0], then

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

𝐜 = 𝐦G = [1, 1, 0, 1, 0, 0, 0] ⊕ [0, 1, 1, 0, 1, 0, 0] = [1, 0, 1, 1, 1, 0, 0]. It can be verified that the minimum distance of the Hamming code is dmin = 3, so the code is capable of correcting 1 error in every block of n bits. The codewords for this code are [0, 0, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0], [0, 1, 1, 0, 1, 0, 0], [1, 0, 1, 1, 1, 0, 0] [0, 0, 1, 1, 0, 1, 0], [1, 1, 1, 0, 0, 1, 0], [0, 1, 0, 1, 1, 1, 0], [1, 0, 0, 0, 1, 1, 0] [0, 0, 0, 1, 1, 0, 1], [1, 1, 0, 0, 1, 0, 1], [0, 1, 1, 1, 0, 0, 1], [1, 0, 1, 0, 0, 0, 1] [0, 0, 1, 0, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1], [0, 1, 0, 0, 0, 1, 1], [1, 0, 0, 1, 0, 1, 1].

(1.35)

The Hamming decoding algorithm presented here is slightly more complicated than for the repetition code. (There are other decoding algorithms.) Every (n, k) linear block code has associated with it a (n − k) × n matrix H called the parity check matrix, which has the property that GH T = 0 (all the operations are performed modulo 2). This means that a vector 𝐯 ∈ 𝔽2n (a vector of binary elements) 𝐯H T = 0 if and only if the vector 𝐯 is a codeword. The parity check matrix is not unique. For the written as ⎡1 0 H = ⎢0 1 ⎢ ⎣0 0

(1.36)

generator G of (1.34), the parity check matrix can be 1 1 1 0 0⎤ 0 1 1 1 0⎥ . ⎥ 1 0 1 1 1⎦

(1.37)

The matrix H can be expressed in terms of its columns as ] [ H = 𝐡1 𝐡2 𝐡3 𝐡4 𝐡5 𝐡6 𝐡7 . It may be observed that (for the Hamming code) the columns of H consist of the binary representations of the numbers 1 through 7 = n, though not in numerical order. On the basis of this observation, we can generalize to other Hamming codes. Hamming codes of length n = 2m − 1 and dimension k = 2m − m − 1 exist for every m ≥ 2, having parity check matrices whose columns are binary representations of the numbers from 1 through n.

00000

α10 α11 α12 α13 α14 α15 α16

mindist.m

34

A Context for Error Correction Coding 1.9.1

Hard-Input Decoding Hamming Codes

Suppose that a codeword 𝐜 is sent and the vector through a BSC is 𝐫 = 𝐜 ⊕ 𝐧 (addition modulo 2). The first decoding step is to compute the syndrome 𝐬 = 𝐫H T = (𝐜 ⊕ 𝐧)H T = 𝐧H T (all operations in 𝔽2 ). Because of Property (1.36), the syndrome depends only on the error 𝐧 and not on the transmitted codeword. The codeword information is “projected away.” Since a Hamming code is capable of correcting only a single error, suppose that 𝐧 is all zeros except at a single position, 𝐧 = [n1 , n2 , n3 , . . . , n7 ] = [0, . . . , 0, 1, 0, . . . , 0], where the 1 is equal to ni . (That is, the error is in the ith position.) Let us write H T in terms of its rows: ⎡𝐡T1 ⎤ ⎢ T⎥ ⎢𝐡 ⎥ T H = ⎢ 2⎥. ⎢⋮⎥ ⎢ T⎥ ⎣𝐡n ⎦ Then the syndrome is [ 𝐬 = 𝐫H T = 𝐧H T = n1 n2

⎡𝐡T1 ⎤ ⎢ ⎥ ] ⎢𝐡T2 ⎥ . . . nn ⎢ ⎥ = 𝐡Ti . ⎢⋮⎥ ⎢ T⎥ ⎣𝐡n ⎦

The error position i is the column i of H that is equal to the (transpose of) the syndrome 𝐬T . Algorithm 1.1 Hamming Code Decoding 1. For the received binary vector 𝐫, compute the syndrome 𝐬 = 𝐫H T . If 𝐬 = 0, then the decoded codeword is 𝐜̂ = 𝐫. 2. If 𝐬 ≠ 0, then let i denote the column of H which is equal to 𝐬T . There is an error in position i of 𝐫. The decoded codeword is 𝐜̂ = 𝐫 + 𝐧i , where 𝐧i is a vector which is all zeros except for a 1 in the ith position.

This decoding procedure fails if more than one error occurs. Example 1.13

Suppose that the message 𝐦 = [m1 , m2 , m3 , m4 ] = [0, 1, 1, 0]

is encoded, resulting in the codeword 𝐜 = [0, 1, 1, 0, 1, 0, 0] + [0, 0, 1, 1, 0, 1, 0] = [0, 1, 0, 1, 1, 1, 0]. When 𝐜 is transmitted over a BSC, the vector 𝐫 = [0, 1, 1, 1, 1, 1, 0] is received. The decoding algorithm proceeds as follows:

1.9 Hamming Codes

35

1. The syndrome 𝐬 = [0, 1, 1, 1, 1, 1, 0]H T = [1, 0, 1] is computed. 2. This syndrome corresponds to column 3 of H. The decoded value is therefore

α α2 α3 α4 α5 α6 α7 α8 α9 +

𝐜̂ = 𝐫 + [0, 0, 1, 0, 0, 0, 0] = [0, 1, 0, 1, 1, 1, 0],

01010

which is the transmitted codeword. ◽

The expression for the probability of bit error is significantly more complicated for Hamming codes than for repetition codes. We defer on the details of these computations to the appropriate location (Section 3.7) and simply plot the results here. The available energy per encoded bit is Ec = Eb (k∕n) = 4Eb ∕7, so, as for the repetition code, there is less energy available per bit. This represents a loss of 10 log10 (4∕7) = −2.4 dB of energy per transmitted bit compared to the uncoded system. The decrease in energy per bit is not as great as for the repetition code, since the rate is higher. Figure 1.19 shows the probability of bit error for uncoded channels (the solid line), and for the coded bits — that is, the bits coded with energy Ec per bit — (the dashed line). The figure also shows the probability of bit error for the bits after they have been through the decoder (the dash-dot line). In this case, the decoded bits do have a lower probability of error than the uncoded bits. For the uncoded system, to achieve a probability of error of Pb = 10−6 requires an SNR of 10.5 dB, while for the coded system, the same probability of error is achieved with 10.05 dB. The code was able to overcome the 2.4 dB of loss due to rate, and add another 0.45 dB of improvement. We say that the coding gain of the system (operated near 10 dB) is 0.45 dB: we can achieve the same performance as a system expending 10.5 dB SNR per bit, but with only 10.05 dB of expended transmitter energy per bit.

100

Probability of bit error Pb

10−2 10−4 10−6

Coding gain

10−8 BER with no coding

10−10

BER with coding Decoded BER

10−12

Asymptotic soft-input decoding 10−14

4

5

6

7

8 9 Eb/N0 (dB)

10

11

12

Figure 1.19: Performance of the (7, 4) Hamming code in the AWGN channel. Also shown in Figure 1.19 is the asymptotic (most accurate for large SNR) performance of soft-decision decoding. This is somewhat optimistic, having better performance than might be achieved in practice. But it does show the potential that soft-decision decoding has: it is significantly better than the hard-input decoding.

00000

α10 α11 α12 α13 α14 α15 α16

hamcode74pe.m nchoosektest.m

36

A Context for Error Correction Coding 1.9.2

Other Representations of the Hamming Code

In the brief introduction to the Hamming code, we showed that the encoding and decoding operations have matrix representations. This is because Hamming codes are linear block codes, which will be explored in Chapter 3. There are other representations for Hamming and other codes. We briefly introduce these here as bait and lead-in to further chapters. As these representations show, descriptions of codes involve algebra, polynomials, graph theory, and algorithms on graphs, in addition to the linear algebra we have already seen. 1.9.2.1

An Algebraic Representation

The columns of the parity check matrix H can example, we could write ⎡1 0 H = ⎢0 1 ⎢ ⎣0 0 as

be represented using a symbol for each column. For 1 1 1 0 0⎤ 0 1 1 1 0⎥ ⎥ 1 0 1 1 1⎦

] [ H = 𝜷1 𝜷2 𝜷3 𝜷4 𝜷5 𝜷6 𝜷7 ,

where each 𝛽i represents a 3-tuple. Then the syndrome 𝐬 = 𝐫H T can be represented as 𝐬=

7 ∑

ri 𝜷 i .

i=1

Then 𝐬 = 𝜷 j for some j, which indicates the column where the error occurred. This turns the decoding problem into a straightforward algebra problem. A question we shall take up later is how to generalize this operation. That is, can codes be defined which are capable of correcting more than a single error, for which finding the errors can be computed using algebra? In order to explore this question, we will need to carefully define how to perform algebra on discrete objects (such as the columns of H) so that addition, subtraction, multiplication, and division are defined in a meaningful way. Such algebraic operations are defined in Chapters 2 and 5. 1.9.2.2

A Polynomial Representation

Examination of the codewords in (1.35) reveals an interesting fact: if 𝐜 is a codeword, then so is every cyclic shift of 𝐜. For example, the codeword [1, 1, 0, 1, 0, 0, 0] has the cyclic shifts [0, 1, 1, 0, 1, 0, 0], [0, 0, 1, 1, 0, 1, 0], [0, 0, 0, 1, 1, 0, 1], [1, 0, 0, 0, 1, 1, 0], [0, 1, 0, 0, 0, 1, 1], [1, 0, 1, 0, 0, 0, 1], which are also codewords. Codes for which all cyclic shifts of every codeword are also codewords are called cyclic codes. As we will find in Chapter 4, Hamming codes, like most block codes of modern interest, are cyclic codes. In addition to the representation using a generator matrix, cyclic codes can also be represented using polynomials. For the (7, 4) Hamming code, there is a generator polynomial g(x) = x3 + x + 1 and a corresponding parity-check polynomial h(x) = x4 + x2 + x + 1, which is a polynomial such that h(x)g(x) = x7 + 1. In general, for a cyclic code of length n, a generator polynomial g(x) must divide xn + 1 (without remainder, operations in 𝔽2 ), and the quotient is the parity check polynomial, xn + 1 h(x) = , g(x) where the operations are done in 𝔽2 . The encoding operation can be represented using polynomial multiplication (with coefficient operations modulo 2). For this reason, the study of polynomial operations and the study of algebraic objects built out of polynomials is of great interest. The parity

1.9 Hamming Codes

37

check polynomial can be used to check if a polynomial is a code polynomial: A polynomial r(x) is a code polynomial if and only if r(x)h(x) is a multiple of xn + 1, so that r(x)h(x) xn + 1 divides with remainder 0. This provides a parity check condition: for this Hamming code (n = 7) compute r(x)h(x) modulo x7 + 1. If this is not equal to 0, then r(x) is not a code polynomial. Example 1.14

The message 𝐦 = [m0 , m1 , m2 , m3 ] = [0, 1, 1, 0] can be represented as a polynomial as m(x) = m0 + m1 x + m2 x2 + m3 x3 = 0 ⋅ 1 + 1 ⋅ x + 1 ⋅ x2 + 0 ⋅ x3 = x + x2 .

The code polynomial is obtained by c(x) = m(x)g(x), or c(x) = (x + x2 )(1 + x + x3 ) = (x + x2 + x4 ) + (x2 + x3 + x5 ) = x + 2x2 + x3 + x4 + x5 = x + x3 + x4 + x5 , (where 2x2 = 0 modulo 2), which corresponds to the code vector 𝐜 = [0, 1, 0, 1, 1, 1, 0].

1.9.2.3



A Trellis Representation

As we will see in Chapter 12, there is a graph associated with a block code. This graph is called the Wolf trellis for the code. We shall see that paths through the graph correspond to vectors 𝐯 that satisfy the parity check condition 𝐯H T = 0. For example, Figure 1.20 shows the trellis corresponding to the parity check matrix (1.37). The trellis states at the kth stage are obtained by taking all possible binary linear combinations of the first k columns of H. In Chapter 12, we will develop a decoding algorithm which essentially finds the best path through the graph. One such decoding algorithm is called the Viterbi algorithm. Such decoding algorithms will allow us to create soft-decision decoding algorithms for block codes.

h1 = [100] h2 = [010] h3 = [101] h4 = [110] h5 = [111] h6 = [011] h7 = [001]

000 001 010 011 100 101 110 111 s1

s2

s3

s4

s5

s6

s7

s8

Figure 1.20: The trellis of a (7, 4) Hamming code.

The Viterbi algorithm is also used for decoding codes which are defined using graphs similar to that of Figure 1.20. Such codes are called convolutional codes.

38

A Context for Error Correction Coding 1.9.2.4

The Tanner Graph Representation

Every linear block code also has another graph which represents it called the Tanner graph. For a parity check matrix, the Tanner graph has one node to represent each column of H (the “bit nodes,” denoted by ci in the figure) and one node to represent each row of H (the “check nodes,” denoted by zi in the figure). Edges occur only between bit nodes and check nodes. There is an edge between a bit node and a check node if there is a 1 in the parity check matrix at the corresponding location. For example, for the parity check matrix of (1.37), the Tanner graph representation is shown in Figure 1.21. Algorithms to be presented in Chapter 15 describe how to propagate information through the graph in order to perform decoding. These algorithms are usually associated with codes which are iteratively decoded, such as turbo codes and low-density parity-check codes. These modern families of codes have very good behavior, sometimes nearly approaching capacity.

c1 c2 c3 c4 c5 c6 c7 Bit nodes

z1 z2 z3 Check nodes

Figure 1.21: The Tanner graph for a (7, 4) Hamming code.

1.10 The Basic Questions The two simple codes we have examined so far bring out issues relevant for the codes we will investigate: 1. How is the code described and represented? 2. How is encoding accomplished? 3. How is decoding accomplished? (This frequently takes some cleverness!) 4. How are codewords to be represented, encoded, and decoded, in a computationally tractable way? 5. What is the performance of the code? What are the properties of the code? (e.g., How many codewords? What are the weights of the codewords?) 6. Are there other families of codes which can provide better coding gains? 7. How can these codes be found and described? 8. Are there constraints on allowable values of n, k, and dmin ? 9. Is there some limit to the amount of coding gain possible? 10. For a given available SNR, is there a lower limit on the probability of error that can be achieved? Questions of this nature shall be addressed throughout the remainder of this book, presenting the best answers available at this time.

1.12 A Bit of Information Theory

39

1.11 Historical Milestones of Coding Theory We present in Table 1.1 a brief summary of major accomplishments in coding theory and some of the significant contributors to that theory, or expositors who contributed by bringing together the significant contributions to date. Some dates and contributions may not be exactly as portrayed here; it is difficult to sift through the sands of recent history. Also, significant contributions to coding are made every month, so this cannot be a complete list.

1.12 A Bit of Information Theory This section serves as an appendix to this introductory chapter, providing background on the basic definitions and concepts of information theory. As information-theoretic concepts are invoked throughout this book, readers may want to refer to this section for this background material. 1.12.1 1.12.1.1

Information- Theoretic Definitions for Discrete Random Variables Entropy and Conditional Entropy

Definition 1.15 Let X be a discrete random variable taking values in a set  = {x1 , x2 , . . . , xm } with probabilities P(X = xi ), xi ∈ . Different notations for these probabilities are pX (xi ), p(xi ), or pi . The entropy of X is H(X) = E[−log2 P(X)] = − =−





P(X = x) log2 P(X = x) = −

x∈

m ∑

pi log2 pi

i=1

pX (x) log2 pX (x).

x∈

When logarithm base is 2, the units of entropy are in bits. When the logarithm base is e (natural log), the units of entropy are in nats . ◽ The entropy represents the uncertainty there is about X prior to its measurement; equivalently, it is the amount of information gained when X is measured. The entropy of X depends on the probabilities with which X takes on its values, not the particular values that X takes on, so it is a function of pX , not . When the base of logarithms is 2 (as above), then the entropy is in units of bits. When the base of logarithms is e, the entropy is in units of nats. Writing the base of the entropy as Hb (X), it is straightforward to show that Hb (X) = (logb a)Ha (X). For example, He (X) = (loge 2)H2 (X). Thus, to convert from bits to nats, multiply by loge 2. It is straightforward to show that for any distribution, H(X) ≥ 0. Example 1.16

Let

{ X=

1 0

with probability p with probability 1 − p.

Then H(X) = −p log p − (1 − p) log(1 − p). This is also denoted as H(p) (since it depends only on p). For example, when p = 0.1, then H(X) = 0.469 bits (using log2 ) and H(X) = 0.3251 nats (using loge ).



40

A Context for Error Correction Coding Table 1.1: Historical Milestones Year Milestone

Year Milestone

1948 Shannon publishes “A Mathematical Theory of Communication” [404] 1950 Hamming describes Hamming codes [181] 1954 Reed [362] and Muller [321] both present Reed–Muller codes and their decoders 1955 Elias introduces convolutional codes [109] 1957 Prange introduces cyclic codes [348] 1959 A. Hocquenghem [201] and . . . 1960 Bose and Ray-Chaudhuri [48] describe BCH codes Reed and Solomon produce eponymous codes [364] Peterson provides a solution to BCH decoding [336] 1961 Peterson produces his book [337], later extended and revised by Peterson and Weldon [338] 1962 Gallager introduces LDPC codes [148] 2400 BPS modem commercially available (4-PSK) (see [138]) 1963 The Fano algorithm for decoding convolutional codes introduced [113] Massey unifies the study of majority logic decoding [297] 1966 Forney produces an in-depth study of concatenated codes [126] introduces generalized minimum distance decoding [127] 1967 Berlekamp introduces a fast algorithm for BCH/Reed–Solomon decoding [36] Rudolph initiates the study of finite geometries for coding [385] 4800 BPS modem commercially available (8-PSK) (see [138]) 1968 Berlekamp produces Algebraic Coding Theory [33] Gallager produces Information theory and reliable communication [147] 1969 Jelinek describes the stack algorithm for decoding convolutional codes [219] Massey introduces his algorithm for BCH decoding [295] Reed–Muller code flies on Mariner deep space probes using Green machine decoder 1971 Viterbi introduces the algorithm for ML decoding of convolutional codes [468] 9600 BPS modem commercially available (16-QAM) (see [138]) 1972 The BCJR algorithm is described in the open literature [21]

1973 Forney elucidates the Viterbi algorithm [128] 1975 Sugiyama et al. propose the use of the Euclidean algorithm for decoding [424] 1977 MacWilliams and Sloane produce the encyclopedic The Theory of Error Correcting Codes [292] Voyager deep space mission uses a concatenated RS/convolutional code (see [305]) 1978 Wolf introduces a trellis description of block codes [488] 1980 14,400 BPS modem commercially available (64-QAM) (see [138]) Sony and Phillips standardize the compact disc, including a shortened Reed–Solomon code 1981 Goppa introduces algebraic-geometry codes [163, 164] 1982 Ungerboeck describes trellis-coded modulation [452] 1983 Lin and Costello produce their engineering textbook [271] Blahut publishes his textbook [45] 1984 14,400 BPS TCM modem commercially available (128-TCM) (see [138]) 1985 19,200 BPS TCM modem commercially available (160-TCM) (see [138]) 1993 Berrou, Glavieux, and Thitimajshima announce turbo codes [39] 1994 Theℤ4 linearity of families of nonlinear codes is announced [182] 1995 MacKay resuscitates LDPC codes [290] Wicker publishes his textbook [483] 1996 33,600 BPS modem (V.34) modem is commercially available (see [136]) 1998 Alamouti describes a space-time code [8] 1999 Guruswami and Sudan present a list decoder for RS and AG codes [172] 2000 Aji and McEliece [6] (and others [255]) synthesize several decoding algorithms using message passing ideas 2002 Hanzo, Liew, and Yeap characterize turbo algorithms in [187] 2003 Koetter and Vardy extend the GS algorithm for soft-decision decoding of RS codes [251] 2004 Lin and Costello second edition [272] 2009 Ar𝚤kan invents polar codes [16] 2020 Moon produces what is hoped to be a valuable 2nd Edition!

1.12 A Bit of Information Theory

41

Now suppose that Y = f (X) for some probabilistic function f (X). (For example, Y might be the output of a noisy channel that has X as the input, such that Y = X + N for a discrete random variable N.) Let  denote the set of possible Y outcomes. Let PY (y) = P(Y = y). X and Y are jointly distributed with joint probability PXY (x, y). The conditional probability of X, given that Y = y is P(X|Y = y) =

PXY (X, y) . PY (y)

H(X|y) is the uncertainty remaining about X when Y is measured as Y = y: ∑ PX|Y (x|y) log2 PX|Y (x|y) (bits). H(X|y) = E[−log2 PX|Y (X|y)] = − x∈

Then the average uncertainty in X averaged over the outcomes Y is called the conditional entropy, H(X|Y), computed as ∑∑ ∑ H(X|y)PY (y) = − PX|Y (x|y)PY (y) log2 PX|Y (x|y) H(X|Y) = y∈

∑∑

=−

y∈ x∈

PXY (x, y) log2 PX|Y (x|y) (bits).

y∈ x∈

The joint entropy is defined by H(X, Y) = −

∑∑

PXY (x, y) log2 PXY (x, y).

x∈ y∈

It is straightforward to show that H(X, Y) = H(X) + H(Y|X). This is referred to as the chain rule for entropy. This has the interpretation: The uncertainty in the joint variables (X, Y) is the uncertainty in X plus the uncertainty remaining in Y when X is given. By symmetry, H(X, Y) = H(Y) + H(X|Y). This rule also holds if all arguments are conditioned on the same event: H(X, Y|Z) = H(X|Z) + H(Y|X, Z). The chain rule for conditional entropy can be extended to multiple variables: H(X1 , X2 , . . . , Xn ) =

n ∑

H(Xi |Xi−1 , . . . , X1 )

i=1

= H(X1 ) + H(X2 |X1 ) + H(X3 |X2 , X1 ) + · · · + H(Xn |Xn−1 , . . . , X1 ).

1.12.1.2

Relative Entropy, Mutual Information, and Channel Capacity

Definition 1.17 Let P(x) and Q(x) be two probability mass functions on the outcomes in . The Kullback–Leibler distance D(P||Q) between two probability mass functions, also known as the relative entropy or the cross entropy is [ ] ∑ P(x) P(x) D(P||Q) = EP log P(x) log = . Q(x) Q(x) x∈ ◽

42

A Context for Error Correction Coding Lemma 1.18 D(P||Q) ≥ 0, with equality if and only if p = q; that is, if the two distributions are the same. Proof We use the inequality log x ≤ x − 1, with equality only at x = 1 (where the log is loge ). This inequality appears so frequently in information theory it has been termed the information theory inequality. Then ∑ Q(x) P(x) P(x) log =− Q(x) P(x) x∈ x∈ [ ] ∑ Q(x) P(x) 1 − ≥ (information theory inequality) P(x) x∈ ∑ = P(x) − Q(x) = 0.

D(P||Q) =



P(x) log



x∈

Definition 1.19 Let X be a random variable taking values in the set  and let Y be a random variable taking values in the set , and let X and Y be jointly and marginally distributed as PXY (x, y), PX (x), and PY (y), respectively. The mutual information between X and Y is the Kullback–Leibler distance between the joint distribution PXY (x, y) and the product of the marginals PX (x)PY (y): I(X; Y) = D(PXY ||PX PY ) =

∑∑

PXY (x, y) log

x∈ y∈

PXY (x, y) . PX (x)PY (y)

(1.38) ◽

If X and Y are independent, so that PXY (x, y) = PX (x)PY (y), then I(X; Y) = 0. That is, Y tells no information at all about X. From the definition, I(X; Y) = I(Y; X)

and

I(X; X) = H(X).

Using the definitions, mutual information can also be written as I(X; Y) = H(X) − H(X|Y). This is shown as follows: I(X; Y) =



PXY (x, y) log

x,y

=



PXY (x, y) log

x,y

=−



PXY (x, y) PX (x)PY (y) PX|Y (x|y) PX (x)

PXY (x, y) log PX (x) +

x,y

=−





( PX (x) log PX (x) −

x



PXY (x, y) log PX|Y (x|y)

x,y



) PXY (x, y) log PX|Y (x|y)

x,y

= H(X) − H(X|Y). The mutual information is the difference between the average uncertainty in X and the uncertainty in X there still is after measuring Y. Thus, it quantifies how much information Y tells about X. Since the Definition (1.38) is symmetric, also I(X; Y) = H(Y) − H(Y|X).

1.12 A Bit of Information Theory

43

We can also write I(X; Y) = H(X) + H(Y) − H(X, Y). In light of Lemma 1.18, we see that mutual information I(X; Y) can never be negative. Definition 1.20 The conditional mutual information of random variables X and Y given Z is I(X; Y|Z) = H(X|Z) − H(X|Y, Z).

(1.39) ◽

Similar to entropy, mutual information has a chain rule: I(X1 , X2 , . . . , Xn ; Y) =

n ∑

I(Xi ; Y|Xi−1 , Xi−2 , . . . , X1 )

i=1

= I(X1 ; Y) + I(X2 ; Y|X1 ) + I(X3 ; Y|X2 , X1 )

(1.40)

+ · · · + I(Xn ; Y|Xn−1 , Xn−2 , . . . , X1 ). 1.12.2

Data Processing Inequality

Let X, Y, and Z be random variables described by mass functions (or pdfs, in the case of continuous random variables). By the properties of conditional probability, for any three such random variables, the joint density factors as p(x, y, z) = p(x)p(y|x)p(z|x, y). (Here, the argument is used to also denote the random variables involved). In the particular case the X, Y, and Z form a Markov chain (in that order), the joint probability mass function (or pdf) can be factored as p(x, y, z) = p(x)p(y|x)p(z|y), that is, p(z|x, y) = p(z|y), so that Z is conditionally independent of X given Y. Variables X, Y, and Z forming a Markov chain are conditionally independent, given Y, since p(x, z|y) =

p(x, y, z) p(x)p(y|x)p(z|y) p(x, y)p(z|y) = = = p(x|y)p(z|y). p(y) p(y) p(y)

The following theorem is known as the data processing inequality. Theorem 1.21 If X, Y and Z form a Markov chain (in that order), then I(X; Y) ≥ I(X; Z). Proof Expand the mutual information using the chain rule in two different ways: I(X; Y, Z) = I(X; Z) + I(X; Y|Z) = I(X; Y) + I(X; Z|Y). Since, as observed above, X and Z are conditionally independent given Y, I(X; Z|Y) = 0, so I(X; Y) = I(X; Z) + I(X; Y|Z). Since I(X; Y|Z) ≥ 0, it follows that

I(X; Y) ≥ I(X; Z).



44

A Context for Error Correction Coding An interpretation of this can be given in light of Figure 1.22. X is an input to a channel (or system) which produces an output Y. There is some information that Y provides about X, as determined by I(X; Y). The Y data is processed by some processing block (which is not limited in any way) to produce an output Z. What the data processing inequality teaches is that Z cannot provide more information about X than was already available from Y. (It might provide a more useful representation, but it cannot create information that is not there originally.)

X

Channel (or system)

Y

Processing

Z

Figure 1.22: Illustration of data processing inequality.

1.12.3

Channels

From an information theoretic view, a channel is a model of the medium by which an input is conveyed to an output, often in a setting of communication. Since communication in the presence of noise involves random perturbations to the input, a channel is described by a transition probability, as follows. Definition 1.22 A discrete channel is a system with an input alphabet  and an output alphabet , with a transition probability PY|X (y|x) (or, more briefly, p(y|x)), which is the probability of observing the output symbol y ∈  when the input symbol is x ∈ . ◽ A channel is said to be memoryless if the probability distribution of the output depends only on the input at the corresponding time, and is conditionally independent of channel inputs or outputs at other times. Let 𝐱 = x1 , x2 , . . . , xn be a sequence of inputs and 𝐲 = y1 , y2 , . . . , yn be the corresponding sequence of outputs. Then for a memoryless channel, the joint conditional probability factors: p(𝐲|𝐱) =

n ∏

p(yi |xi ).

i=1

1.12.3.1

Binary Symmetric Channel

An important example of a channel is the BSC . This models a situation in which bits, having alphabet  = {0, 1}, may be flipped as they traverse the channel. A transmitted 0 is received (correctly) as a 0 with probability 1 − p, and as a 1 (incorrectly) with probability p, and correspondingly for a transmitted 1. Thus, p is the probability that a bit is corrupted as it passed through the channel. p is referred to as the crossover probability. A model for the BSC is shown in Figure 1.23(a). For the BSC, the mutual information between the input and the output is I(X; Y) = H(Y) − H(Y|X) ∑ = H(Y) − p(x)H(Y|X = x) x∈{0,1}

= H(Y) −

∑ x

≤ 1 − H(p).

p(x)H(P) = H(Y) − H(p)

1.12 A Bit of Information Theory

45

The last inequality is because Y is a binary random variable (maximum entropy = 1). When the probability distribution on the input is PX (0) = PX (1) = 12 , then by symmetry PY (0) = PY (1) = 12 , and the mutual information is maximized.

0 0

1–p p

Y 1

0

𝛼

0

X p 1–p

1–𝛼

X

E

1

Y

𝛼 1

(a)

1–𝛼 (b)

1

Figure 1.23: BSC and BEC models. (a) Binary symmetric channel with crossover; (b) binary erasure channel with erasure probability. A useful representation of the channel transition probabilities p(y|x) is as a transition matrix whose xth row and yth column represents p(y|x). For the BSC, the transmission matrix is [ ] 1−p p p(y|x) = . p 1−p 1.12.3.2

Binary Erasure Channel

Another important channel is the binary erasure channel (BEC). In the BEC, bits are not corrupted but some bits are lost. This might model, for example, an internet connection in which when packets are received, the bits in the channel are (essentially) reliably received, but some packets may be lost, resulting in lost bits. A model for the BEC is shown in Figure 1.23(b). The input alphabet is binary,  = {0, 1}, and the output alphabet has three symbols, 0, 1, and a symbol denoting an erasure,  = {0, 1, E}. The probability that a symbol is erased is 𝛼: P(Y = E|X = 0) = P(Y = E|X = 1) = 𝛼. The transition matrix is p(y|x) =

] [ 1−𝛼 𝛼 0 . 0 𝛼 1−𝛼

The mutual information between the input and the output is I(X; Y) = H(Y) − H(Y|X) = H(Y) − H(𝛼) where H(Y|X) = H(𝛼) (the entropy for a binary channel with crossover probability p). It can be shown that H(Y) is maximized when P(X = 1) = P(X = 0) = 12 , and in that case, I(X; Y) = 1 − 𝛼. 1.12.3.3

Noisy Typewriter

Less practical, but helpful to think about the nature of channels, is the noisy typewriter channel. In this case, the input set and output sets are  =  = {A, B, C, . . . , Z}. The channel input is either

46

A Context for Error Correction Coding received unchanged, or the next letter in the alphabet is received, with probability 12 . The mutual information is I(X; Y) = H(Y) − H(Y|X). For every given X, there are two possible outcomes Y, so H(Y|X) = 1 bit. When all input symbols are equally likely, then all output symbols are equally likely and H(Y) = log2 26 bits. For the uniform input probabilities, then, I(X; Y) = log2 26 − 1 = log2 13. 1.12.3.4

Symmetric Channels

Definition 1.23 [82, p. 190] A channel is said to be symmetric if the rows of the channel transition matrix are permutations of each other and the columns are permutations of each other. A channel is said to be weakly symmetric if every row of the transition matrix is a permutation ∑ ◽ of every other row, and all the column sums x p(y|x) are equal. The BSC introduced above is an example of a symmetric channel, but the BEC is not a symmetric channel. However, the BEC is weakly symmetric. 1.12.4

Channel Capacity

Definition 1.24 The channel capacity C of a channel with input X and output Y is defined as the maximum mutual information between X and Y, where the maximum is taken over all possible input distributions. C = max I(X; Y). PX (x) ◽ For a (weakly) symmetric channel, let 𝐫 denote the probabilities of one of the rows and let H(𝐫) be the entropy computed using these probabilities. The mutual information is I(X; Y) = H(Y) − H(Y|X) = H(Y) − H(𝐫). The entropy of Y is bounded by log ||, so I(X; Y) = log || − H(𝐫). For a symmetric channel, when the input distribution is uniform, PX (x) = 1∕||, then the output distribution is also uniform, so H(Y) = log ||. Thus C = max I(X; Y) = log || − H(𝐫). p(x)

(1.41)

For the BSC with crossover probability p, the capacity is C = 1 − H2 (p), which is achieved when the input symbols are chosen with P(X = 0) = P(X = 1) = 12 . For the BEC with erasure probability 𝛼, the capacity is C = 1 − 𝛼, which is achieved when the input symbols are chosen with P(X = 0) = P(X = 1) = 12 .

1.12 A Bit of Information Theory 1.12.5

47

Information Theoretic Definitions for Continuous Random Variables

Let Y be a continuous random variable taking on values in an (uncountable) set , with pdf pY (y). The differential entropy is defined as H(Y) = −E[log2 pY (y)] = − pY (y) log2 pY (y) dy. ∫ Whereas entropy for discrete random variables is always nonnegative, differential entropy (for a continuous random variable) can be positive or negative. Example 1.25

Let Y ∼  (0, 𝜎 2 ). Then ] [ [ ] ) ( 1 1 1 −Y 2 ∕2𝜎 2 2 = −E log2 √ H(Y) = −E log2 √ e + log2 (e) − 2 (Y ) 2𝜎 2𝜋𝜎 2𝜋𝜎 1 1 E[Y 2 ] + log2 2𝜋𝜎 2 2 2𝜎 2 1 1 1 = log2 (e) + log2 2𝜋𝜎 2 = log2 2𝜋e𝜎 2 (bits). 2 2 2

= log2 (e)



It can be shown that, of all continuous random variables having mean 0 and variance 𝜎 2 , the Gaussian  (0, 𝜎 2 ) has the largest differential entropy. Let X be a discrete-valued random variable taking on values in the alphabet  with probability Pr(X = x) = PX (x), x ∈  and let X be passed through a channel which produces a continuous-valued output Y for Y ∈ . A typical example of this is the AWGNC , where Y = X + N, and N ∼  (0, 𝜎 2 ). Let pXY (x, y) = pY|X (y|x)PX (x) denote the joint distribution of X and Y and let ∑ ∑ pXY (x, y) = pY|X (y|x)PX (x) pY (y) = x∈

x∈

denote the pdf of Y. Then the mutual information I(X; Y) is computed as I(X; Y) = D(pXY (X, Y)||PX (X)pY (Y)) = =

∑ x∈

∫y∈

∑ ∫

pY|X (y|x)PX (x) log2 ∑

pXY (x, y) log2



pXY (x, y) dy pY (y)PX (x)

pY|X (y|x) x′ ∈ pY|X (y|x

′ )P (x′ ) X

dy.

Example 1.26 Suppose  = {a, −a} (e.g., BPSK modulation with amplitude a) with probabilities P(X = a) = P(X = −a) = 12 . Let N ∼  (0, 𝜎 2 ) and let Y = X + N.

48

A Context for Error Correction Coding Because the channel has only binary inputs, this is referred to as the binary additive white Gaussian noise channel (BAWGNC). Then [ ∞ p(y|a) 1 p(y|a) log2 1 I(X; Y) = 2 ∫−∞ (p(y|a) + p(y| − a)) 2

+ p(y| − a) log2

]

p(y| − a) 1 (p(y|a) 2

dy

+ p(y| − a))



1 p(y|a) log2 p(y|a) + p(y| − a) log2 p(y| − a) 2 ∫−∞ [ ] 1 − (p(y|a) + p(y| − a)) log2 (p(y|a) + p(y| − a)) dy 2 1 = [−H(Y) − H(Y) 2 ∞ [ ] ] 1 − (p(y|a) + p(y| − a)) log2 (p(y|a) + p(y| − a)) dy ∫−∞ 2 =



= −

∫−∞

𝜙(y, a, 𝜎 2 ) log2 𝜙(y, a, 𝜎 2 ) dy −

1 log2 2𝜋e𝜎 2 (bits), 2

(1.42)

(1.43)

where we define the function 2 2 2 2 1 [e−(y−a) ∕2𝜎 + e−(y+a) ∕2𝜎 ]. 𝜙(y, a, 𝜎 2 ) = √ 8𝜋𝜎 2



When both the channel input X and the output Y are continuous random variables, then the mutual information is p (x, y) I(X; Y) = D(pXY (X, Y)||PX (x)pY (Y)) = p (x, y) log2 XY dy dx. ∫ ∫ XY pY (y)PX (x) Example 1.27

Let X ∼  (0, 𝜎x2 ) and N ∼  (0, 𝜎n2 ), independent of X. Let Y = X + N. Then Y ∼  (0, 𝜎x2 + 𝜎n2 ). I(X; Y) = H(Y) − H(Y|X) = H(Y) − H(X + N|X) = H(Y) − H(N|X) 1 1 log2 2𝜋e𝜎y2 − log2 2𝜋e𝜎n2 2 2 ( ) 𝜎2 1 = log2 1 + x2 (bits). 2 𝜎n = H(Y) − H(N) =

(1.44)

The quantity 𝜎x2 represents the average power in the transmitted signal X and 𝜎n2 represents the average power in the noise signal N. This channel is called the AWGNC. ◽

As for the discrete channel, the channel capacity C of a channel with input X and output Y is the maximum mutual information between X and Y, where the maximum is over all input distributions. In Example 1.26, the maximizing distribution is, in fact, the uniform distribution, P(X = a) = P(X = −a) = 12 , so (1.43) is the capacity for the BAWGNC. In Example 1.27, the maximizing distribution is, in fact, the Gaussian distribution (since this maximizes the entropy of the output), so (1.44) is the capacity for the AWGNC. 1.12.6

The Channel Coding Theorem

The channel capacity has been defined as the maximum mutual information between the input and the output. But Shannon’s channel coding theorem, tells us what the capacity means. Recall that an error

1.12 A Bit of Information Theory

49

correction code has a rate R = k∕n, where k is the number of input symbols and n is the number of output symbols, the length of the code. The channel coding theorem says this: Provided that the coded rate of transmission R is less than the channel capacity, for any given probability of error 𝜖 specified, there is an error correction code of length n0 such that there exist codes of length n exceeding n0 for which the decoded probability of error is less than 𝜖.

That is, provided that we transmit at a rate less than capacity, arbitrarily low probabilities of error can be obtained, if a sufficiently long error correction code is employed. The capacity is the amount of information that can be transmitted reliably through the channel per channel use. A converse to the channel coding theorem states that for a channel with capacity C, if R > C, then the probability of error is bounded away from zero: reliable transmission is not possible. The channel coding theorem is an existence theorem; it tells us that codes exist that can be used for reliable transmission, but not how to find practical codes. Shannon’s remarkable proof used random codes. But as the code gets long, the decoding complexity of a truly random (unstructured) code goes up exponentially with the length of the code. Since Shannon’s proof, engineers and mathematicians have been looking for ways of constructing codes that are both good (meaning they can correct a lot of errors) and practical, meaning that they have some kind of structure that makes decoding of sufficiently low complexity that decoders can be practically constructed. Figure 1.24 shows a comparison of the capacity of the AWGNC and the BAWGNC channels as a function of Ec ∕𝜎 2 (an SNR measure). In this figure, we observe that the capacity of the AWGNC increases with SNR beyond 1 bit per channel use, while the BAWGNC asymptotes to a maximum of 1 bit per channel use — if only binary data is put into the channel, only 1 bit of useful information can be obtained. It is always the case that CAWGNC > CBAWGNC . Over all possible input distributions, the Gaussian distribution is information maximizing, so CAWGNC is an upper bound on capacity for any modulation or coding that might be employed. However, at very low SNRs, CAWGNC and CBAWGNC are very nearly equal. √Figure 1.24 also shows the capacity of the equivalent BSC, with crossover probability p = Q( Ec ∕𝜎 2 ) and capacity CBSC = 1 − H2 (p). This corresponds to hard-input decoding. Clearly, there is some loss of potential rate due to hard-input decoding, although the loss diminishes as the SNR increases. 1.12.7

“Proof” of the Channel Coding Theorem

In this section, we present a “proof” of the channel coding theorem. While mathematically accurate, it is not complete. The arguments can be considerably tightened, but are sufficient to show the main ideas of coding. Also, the proof is only presented for the discrete input/discrete channel case. The intuition, however, generally carries over to the Gaussian channel. An important preliminary concept is the “asymptotic equipartition property” (AEP). Let X be a random variable taking values in a set . Let 𝐗 = (X1 , X2 , . . . , Xn ) be an i.i.d. (independent, identically distributed) random vector and let 𝐱 denote an outcome of 𝐗. Theorem 1.28 (AEP) Let X = (X1 , X2 , . . . , Xn ), where Xi are i.i.d., and let x = (x1 , x2 , . . . , xn ) be a particular draw. As n → ∞ (1.45) P(X = x) ≈ 2−nH(X) , x ∈  , The interpretation of this theorem is as follows. Note that the probability 2−nH(X) is a function of the distribution of X, not the particular outcome 𝐱. By the AEP, most of the probability of a draw of 𝐗 is “concentrated” in a set that occurs with probability 2−nH(X) . the typical set. That is, a “typical” outcome

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

plocapcmp.m cawgnc2.m cbawgnc2.m h2.m

50

A Context for Error Correction Coding 1.5

Capacity (bits/channel use)

AWGNC capacity BAWGNC capacity BSC capacity

1

0.5

0

0

1

2

3 Ec /σ2

4

5

6

Figure 1.24: Capacities of AWGNC, BAWGNC, and BSC. is likely to occur, while an outcome which is not “typical” is not likely to occur. Since the “typical” outcomes all have approximately the same probability, given by (1.45), there must be approximately 2nH(X) outcomes in the typical set  . Proof We sketch the main idea of the proof. Let the outcome space for a random variable Y be  = {b1 , b2 , . . . , bK }, occurring with probabilities Pi = P(Y = bi ). Out of n samples of the i.i.d. variable Y, let ni be the number of outcomes that are equal to bi . By the law of large numbers,7 when n is large, ni (1.46) ≈ Pi . n The product of n observations can be written as n

n

n

(n ∕n) (n ∕n)

(n ∕n)

y1 y2 · · · yn = b11 b22 · · · bKK = [b1 1 b2 2 · · · bK K ]n ]n [ ∑K ni ]n [ n1 n2 nK = 2 n log2 b1 2 n log2 b2 · · · 2 n log2 bK = 2 i=1 n log2 bi ]n [ ∑K ≈ 2 i=1 Pi log2 bi (by (1.46)) = [2E[log2 Y] ]n .

(1.47)

Now suppose that Y is, in fact, a function of a random variable X, Y = f (X), where, specifically, this function is the probability mass function of X: f (x) = PX (x) = P(X = x). Then n ∏ y1 y2 · · · yn = f (x1 )f (x2 ) · · · f (xn ) = PX (xi ) = P(𝐗 = 𝐱). i=1

Also, by (1.47), y1 y2 · · · yn = f (x1 )f (x2 ) · · · f (xn ) =

n ∏ i=1

This establishes (1.45).

pX (xi ) ≈ [2E[log2 pX (X)] ]n = 2−nH(X) . ◽

7 Thorough proof of the AEP merely requires putting all of the discussion in the formal language of the weak law of large numbers.

1.12 A Bit of Information Theory

51

Let X be a binary source with entropy H(X) and let each X be transmitted through a memoryless channel to produce the output Y. Consider transmitting the sequence of i.i.d. outcomes x1 , x2 , . . . , xn . While the number of possible sequences is M = 2n , the typical set has only about 2nH(X) sequences in it. Let the total possible number of output sequences 𝐲 = y1 , y2 , . . . , yn be N. There are about 2nH(Y) ≤ N typical output sequences. For each typical output sequence 𝐲 there are approximately 2nH(X|Y) input sequences that could have caused it. Furthermore, each input sequence 𝐱 typically could produce 2nH(Y|X) output sequences. This is summarized in Figure 1.25(a). 2n H (X|Y) typical inputs causing each output sequence

2n H (X|Y) typical inputs for each output sequence

2n H (X ) typical input sequences y

y

x x 2n H (Y) typical output sequences

M possible input sequences

N possible output sequences

2n H (Y /X) typical outputs for each intput sequence

(a)

(b)

Figure 1.25: Relationship between input and output entropies for a channel. Each • or ◾ represents a sequence. (a) Basic input output relationship; (b) ◾ represents codewords. Now let X be coded by a rate-R code to produce a coded sequence which selects, out of the 2n possible input sequences, only 2nR of these. In Figure 1.25(b), these coded sequences are denoted with filled squares, ◾. The mapping which selects the 2nR points is the code. Rather than select any particular code, we contemplate using all possible codes at random (using, however, only the typical sequences). Under the random code, a sequence selected at random is a codeword with probability 2nR

= 2n(R−H(X)) . 2nH(X) Now consider the problem of correct decoding. A sequence 𝐲 is observed. It can be decoded correctly if there is only one code vector 𝐱 that could have caused it. From Figure 1.25(b), the probability that none of the points in the “fan” leading to 𝐲 other than the original code point is a codeword is P = (probability a point 𝐱 is not a codeword)(typical number of inputs for this nH(X|Y)

= (1 − 2n(R−H(X)) )2

𝐲)

.

If we now choose R < maxPX (x) H(X) − H(X|Y), that is, choose R < the capacity C, then R − H(X) + H(X|Y) < 0

52

A Context for Error Correction Coding for any input distribution PX (x). In this case, R − H(X) = −H(X|Y) − 𝜂 for some 𝜂 > 0. Then

nH(X|Y)

P = (1 − 2n(−H(X|Y)−𝜂) )2

.

Expanding out using the binomial expansion, ( nH(X|Y) ) 2 2n(−H(X|Y)−𝜂) + higher order terms, P=1− 1 so as n → ∞,

P → 1 − 2−n𝜂 → 1.

Thus, the probability that none of the points except the original code point leading to 𝐲 is a codeword approaches 1, so that the probability of decoding error — due to multiple codewords mapping to a single received vector — approaches 0. We observe that if the average of an ensemble approaches zero, then there are elements in the ensemble that must approach 0. Thus, among all the possible codes in the ensemble, there are codes for which the probability of error approaches zero as n → ∞. There are two other ways of viewing the coding rate requirement. The 2nH(Y|X) typical sequences resulting from transmitting a vector 𝐱 must partition the 2nH(Y) typical output sequences, so that each observed output sequence can be attributed to a unique input sequence. The number of subsets in this partition is 2nH(Y) = 2n(H(Y)−H(Y|X)) , 2nH(Y|X) so the condition R < H(Y) − H(Y|X) must be enforced. Alternatively, the 2nH(X) typical input sequences must be partitioned so that the 2nH(X|Y) typical input sequences associated with an observation 𝐲 are disjoint. There must be 2n(H(X)−H(X|Y)) distinct subsets, so again the condition R < H(X) − H(X|Y) must be enforced. Let us summarize what we learn from the proof of the channel coding theorem:

• •

As long as R < C, arbitrarily reliable transmission is possible.



Since the theorem was based on ensembles of random codes, it does not specify what the best code should be. We don’t know how to “design” the best codes, we only know that they exist.



However, random codes have a high probability of being good. So we are likely to get a good code simply by picking one at random!

The code lengths, however, may have to be long to achieve the desired reliability. The closer R is to C, the larger we would expect n to need to be in order to obtain some specified level of performance.

So what, then, is the issue? Why the need for decades of research in coding theory, if a code can simply be selected at random? The answer has to do with the complexity of representing and decoding the code. To represent a random code of length n, there must be memory to store all the codewords, which requires n2Rn bits. Furthermore, to decode a received word 𝐲, ML decoding for a random code requires that a received vector 𝐲 must be compared with all 2Rn possible codewords. For a R = 1∕2 code with n = 1000 (a relatively modest code length and a low-rate code), 2500 comparisons must be made for each received vector. This is prohibitively expensive, beyond practical feasibility for even massively parallel computing systems, let alone a portable communication device. Ideally, we would like to explore the space of codes parameterized by rate, probability of decoding error, block length (which governs latency), and encoding and decoding complexity, identifying thereby

1.12 A Bit of Information Theory

53

all achievable tuples of (R, P, n, 𝜒 E , 𝜒 D ), where P is the probability of error and 𝜒 E and 𝜒 D are the encoding and decoding complexities. This is an overwhelmingly complex task. The essence of coding research has taken the pragmatic stance of identifying families of codes which have some kind of algebraic or graphical structure that will enable representation and decoding with manageable complexity. In some cases what is sought are codes in which the encoding and decoding can be accomplished readily using algebraic methods — essentially so that decoding can be accomplished by solving sets of equations. In other cases, codes employ constraints on certain graphs to reduce the encoding and decoding complexity. Most recently, families of codes have been found for which very long block lengths can be effectively obtained with low complexity using very sparse representations, which keep the decoding complexity in check. Describing these codes and their decoding algorithms is the purpose of this book. The end result of the decades of research in coding is that the designer has a rich palette of code options, with varying degrees of rate and encode and decode complexity. This book presents many of the major themes that have emerged from this research. 1.12.8

Capacity for the Continuous-Time AWGN Channel

Let Xi be a zero-mean random variable with E[Xi2 ] = 𝜎x2 which is input to a discrete AWGN channel, so that Ri = Xi + Ni , where the Ni are i.i.d. Ni ∼  (0, 𝜎n2 ). The capacity of this channel is ( ) 𝜎x2 1 C = log2 1 + 2 bits/channel use. 2 𝜎n Now consider sending a continuous-time signal x(t) according to x(t) =

n ∑

Xi 𝜑i (t),

i=1

where the 𝜑i (t) functions are orthonormal over [0, T]. Let us suppose that the transmitter power available is P watts, so that the energy dissipated in T seconds is E = PT. This energy is also expressed as n T ∑ x2 (t) dt = Xi2 . E= ∫0 i=1

We must therefore have

n ∑

Xi2 = PT

i=1

or

nE[Xi2 ]

𝜎x2

= PT, so that = PT∕n. Now consider transmitting a signal x(t) through a continuous-time channel with bandwidth W. By the sampling theorem (frequently attributed to Nyquist, but in this context it is frequently called Shannon’s sampling theorem), a signal of bandwidth W can be exactly characterized by 2W samples/second — any more samples than this cannot convey any more information about this bandlimited signal. So we can get 2W independent channel uses per second over this bandlimited channel. There are n = 2WT symbols transmitted over T seconds. If the received signal is R(t) = x(t) + N(t), where N(t) is a white Gaussian noise random process with two-sided power spectral density N0 ∕2, then in the discrete-time sample Ri = xi + Ni ,

54

A Context for Error Correction Coding T

where Ri = ∫0 R(t)𝜑i (t) dt, the variance of Ni is 𝜎 2 = N0 ∕2. The capacity for this bandlimited channel is ) ) ( ( PT∕n 1 log 1 + bits/channel use (2W channel uses/second) C= 2 2 N0 ∕2 ( ) 2PT = W log2 1 + bits/second. nN0 Now, using n = 2WT, we obtain C = W log2 (1 + P∕N0 W) bits/second.

(1.48)

Since P is the average transmitted power, in terms of its units, we have P=

energy = (energy/bit)(bits/second). second

Since Eb is the energy/bit and the capacity is the rate of transmission in bits per second, we have P = Eb Rb = Eb C (when transmitting at capacity), giving ( ) C Eb . (1.49) C = W log2 1 + W N0 Let 𝜂 = C∕W be the spectral efficiency in bits/second/Hz; this is the data rate available for each Hertz of channel bandwidth. From (1.49), ( ) Eb 𝜂 = log2 1 + 𝜂 N0 or Eb ∕N0 =

2𝜂 − 1 . 𝜂

(1.50)

For BPSK the spectral efficiency is 𝜂 = 1 bit/second/Hz, so (1.50) indicates that it is theoretically possible to transmit arbitrarily reliably at Eb ∕N0 = 1, which is 0 dB. In principle, then, it should be possible to devise a coding scheme which could transmit BPSK-modulated signals arbitrarily reliably at an SNR of 0 dB. By contrast, for uncoded transmission when Eb ∕N0 = 9.6 dB, the BPSK performance shown in Figure 1.10 has Pb = 10−5 . There is at least 9.6 dB of coding gain possible. The approximately 0.44 dB of gain provided by the (7,4) Hamming code of Section 1.9 falls over 9 dB short of what is theoretically possible! α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

plotcbawgn2.m cbawgnc.m cawgnc.m philog.m phifun.m

1.12.9

Transmission at Capacity with Errors

By the channel coding theorem, zero probability of error is (asymptotically) attainable provided that the transmission rate is less than the capacity. What if we allow a nonvanishing probability of error. What is the maximum rate of transmission? Or, equivalently, for a given rate, which is the minimum SNR that will allow transmission at that rate, with a specified probability of error? The theoretical tools we need to address these questions are the separation theorem and rate–distortion theory. The separation theorem says that we can separately consider and optimally (at least, asymptotically) data compression and error correction. Suppose that the source has a rate of r bits/second. First, compress the information so that the bits of the compressed signal match the bits of the source signal with probability p. From rate-distortion theory, this produces a source at rate 1 − H2 (p) per source bit (see (1.2)). These compressed bits, at a rate r(1 − H2 (p)) are then transmitted over the channel with vanishingly small probability of error. We must therefore have r(1 − H2 (p)) < C. The maximum rate achievable with average distortion (i.e., probability of bit error) p, which we denote

1.12 A Bit of Information Theory

55

10−2

Pb

10−3 BAWGNC R = 0.25 BAWGNC R = 0.5 BAWGNC R = 0.75 BAWGNC R = 0.9 AWGNC R = 0.25 AWGNC R = 0.5 AWGNC R = 0.75 AWGNC R = 0.9

10−4

10−5

10−6 −2

−1

0

1 Eb/N0 (dB)

2

3

4

Figure 1.26: Capacity lower bounds on Pb as a function of SNR. as C(p) , is therefore C(p) =

C . 1 − H2 (p)

Figure 1.26 shows the required SNR Eb ∕N0 for transmission at various rates for both the BAWGNC and the AWGNC. For any given line in the plot, the region to the right of the plot is achievable — it should theoretically be possible to transmit at that probability of error at that SNR. Curves such as these therefore represent a goal to be achieved by a particular code: we say that we are transmitting at capacity if the performance falls on the curve. We note the following from the plot:



At very low SNR, the binary channel and the AWGN channel have very similar performance. This was also observed in conjunction with Figure 1.24.

• •

The higher the rate, the higher the required SNR. The vertical asymptote (as Pb → 0) is the capacity C for that channel.

1.12.10

The Implication of the Channel Coding Theorem

The implication of the channel coding theorem, fundamentally, is that for a block code of length n and rate R = k∕n, the probability of a block decoding error can be bounded as P(E) ≤ 2−nEb (R) ,

(1.51)

where Eb (R) is a positive function of R for R < C. Work on a class of codes known as convolutional codes — to be introduced in Chapter 12 has shown (see, e.g., [471]) that P(E) ≤ 2−(m+1)nEc (R) ,

(1.52)

where m is the memory of the code and Ec (R) is positive for R < C. The problem, as we shall see (and what makes coding such a fascinating topic) is that, in the absence of some kind of structure, as either

56

A Context for Error Correction Coding n or m grow, the complexity can grow exponentially. How should this structure be introduced, to make the encoders and decoders tractable, while producing good error correction capability? 1.12.11

Non-Asymptotic Information Theory

The channel coding theorem described above is an asymptotic theory, asserting the existence of codes producing low probability of error, as the code length n goes to infinity. This asymptotic theory serves best when the code lengths are long. Recent technological developments, however, are pushing toward short codes. For example, in a cybercontrol system, near real-time response cannot wait around for a long codeword to be formed, so short codewords protecting brief data packets are needed to move data around in a responsive way. Thus, it becomes increasingly of interest to consider what the information-theoretic performance limits are for short codes. These issues are addressed by finite-block length, non-asymptotic information theory, briefly developed in this section. A code (not necessarily linear) may be described by the parameters (n, M, 𝜖), where:

• • •

n is the length of the code; M is the number of codewords in the code; 𝜖 is the probability of error that the code achieves.

Since M messages can be indexed by log2 M bits, the rate of the code (the number of bits to select a codeword divided by the codeword length) is R=

log2 M . n

Expressed in this language, Shannon’s channel coding theorem can be expressed as: there exists sequences of codes described by (n, Mn , 𝜖n ) such that R = lim

n→∞

log2 Mn > 0. n

that is, there is a positive asymptotic rate, and 𝜖n → 0. Here, Mn is the maximum number of codewords associated with a code of length n, and 𝜖n is the probability of error associated with the code of length n. Let M ∗ (n, 𝜖) = max{M ∶ ∃(n, M, 𝜖) − code} be the maximum number of codewords of length n such that it is possible to recover the original message with probability at least 1 − 𝜖. M ∗ (n, 𝜖) is basic to non-asymptotic information theory. Shannon’s asymptotic result can be expressed as lim lim

𝜖→0 n→∞

1 ∗ M (n, 𝜖) = C, n

where C is the capacity. For any 0 < 𝜖 < 1, it is also true that lim

n→∞

1 ∗ M (n, 𝜖) ≤ C. n

Thus, log2 M ∗ (n, 𝜖) = nC + o(n).

1.12 A Bit of Information Theory

57

We define the information density as iX;Y (x; y) = log

PXY (x, y) PX (x)PY (y)

The expectation of the information density is the mutual information, E[iX;Y (x; y)] = I(X; Y). Under asymptotic information theory, lim

n→∞

1 log M ∗ (n, 𝜖) = E[iX;Y (x, y)] = C. n 2

Asymptotic information theory is thus seen to be a first-order theory, accounting for only the mean value of the information density. 1.12.11.1

Discrete Channels

Under non-asymptotic information theory, it can be shown [343, Section 3.2.2] that for a discrete channel (such as a BSC with crossover probability 𝛿) the Gaussian approximation holds: √ 1 1 V −1 1 ∗ log M (n, 𝜖) = C − Q (𝜖) + log2 n + O(1). (1.53) n 2 n 2n n Here, Q−1 is the inverse of the Q function defined by ∞

Q(x) =

∫x

1 −y2 ∕2 dy. √ e 2𝜋

Also, V is called the channel dispersion, and is computed as the variance of the information density: V = var[iX;Y (x, y)]. For the BSC, it can be shown by computation that ) ( 1−𝛿 2 , V = 𝛿(1 − 𝛿) log 𝛿 so that 1 log M ∗ (n, 𝜖) ≈ C − n 2



𝛿(1 − 𝛿) −1 1 1−𝛿 Q (𝜖) log2 + log2 n. n 𝛿 2n

(1.54)

There are two other bounds that are interesting to compare with the Gaussian approximation. An upper bound is expressed as follows. Theorem 1.29 [343, Theorem 40, p. 51] For a BSC with crossover probability 𝛿, the size M of an (n, M, 𝜖) code is bounded by 1 (1.55) M≤ n , 𝛽1−𝜖 n where 𝛽1−𝜖 is computed as follows.

• •

Let 𝛼 = 1 − 𝜖. Determine the integer L and the number 𝜆 ∈ [0, 1) such that 𝛼 = (1 − 𝜆)𝛼L + 𝜆𝛼L+1 ,

58

A Context for Error Correction Coding where 𝛼𝓁 =

𝓁−1 (

∑ n) (1 − 𝛿)n−k 𝛿 k . k k=0

(Since 𝛼𝓁 is an increasing function of 𝓁, select L as the smallest value such that 𝛼𝓁 > 𝛼, then 𝜆 will be within its bounds.)



Compute n 𝛽1−𝜖 = (1 − 𝜆)𝛽L + 𝜆𝛽L+1 ,

where 𝛽t = 2−n

t ( ) ∑ n k=0

k

.

The proof is provided in [343]. This theorem describes a non-asymptotic converse to the channel coding theorem: in order to achieve a given probability of error 𝜖, the number of codewords in the code n cannot exceed 1∕𝛽1−𝜖 . This provides an upper bound on the rate of the code. Another bound is provided by the following. Theorem 1.30 [343, Corollary 39, p. 49] For the BSC with crossover probability 𝛿, there exists an (n, M, 𝜖) code (where 𝜖 represents the average probability of error) such that ) ( n ( ) t ( ) ∑ ∑ n t n n−t −n 𝜖≤ 𝛿 (1 − 𝛿) min 1, (M − 1)2 . (1.56) t s t=0 s=0

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

nonasymptinfol.m qfinv.m lognchoosek.m

This is an existence theorem. For a given probability of error, codes with a small number of codewords M (low rate) could be used. But this theorem promises that there exist codes bounding the probability of error 𝜖. This provides a lower bound on what values of M are possible. To employ this as a bound, a value of M which achieves the bound (1.56) is found. In the plot below, this was done by a binary search on the value log2 M∕n, starting with a bracketing solution of M = 2 and a value of M large enough that the right-hand side of (1.56) exceeds 𝜖. Figure 1.27 shows these bounds for a BSC with crossover probability 𝛿 = 0.11 (giving a capacity C = 1 − H(𝛿) = 0.5) and maximal block error 𝜖 = 10−3 , compared with the capacity C = 1 − H(𝛿) = 0.5. While (1.55) and (1.56) are actual bounds and not approximations, the plot shows that the Gaussian approximation (1.54) closely tracks these bounds, and has the benefit of being much easier to compute than the bounds. As the figure shows, there is a substantial gap between the asymptotic rate (Capacity C = 0.5) and the non-asymptotic rate — at lengths of n = 2000 bits, the rate achieving 𝜖 = 10−3 is only about 0.43, so only 88% of the asymptotic capacity is achievable even at n = 2000. For the BEC channel, the Gaussian approximation is [343, Section 3.3.2] √ 𝛿(1 − 𝛿) −1 1 1 ∗ log M (n, 𝜖) = (1 − 𝛿) log 2 − Q (𝜖) log 2 + O(1). (1.57) n n n 1.12.11.2

The AWGN Channel

For the Gaussian channel (a continuous channel), with codewords (c1 , c2 , . . . , cn ) satisfying the power constraint n ∑ c2i = nP i=1

1.12 A Bit of Information Theory

59

0.6

Rate (bits/channel use)

0.5

0.4

0.3

0.2 Capacity Upper bound Lower bound Gaussian approximation

0.1

0

0

500

1000 Block length (n)

1500

2000

Figure 1.27: Rate-block length tradeoff for the BSC with 𝛿 = 0.11 and maximal block error rate 𝜖 = 10−3 . having unit-covariance noise, let M ∗ (n, 𝜖, P) denote the number of codewords in a code of length satisfying the power constraint. Shannon’s (asymptotic) capacity is lim lim

𝜖→0 n→∞

1 1 log M ∗ (n, 𝜖, P) = log(1 + P) ≜ C(P). n 2

For the non-asymptotic theory, 1 ∗ M (n, 𝜖, P) ≤ C(P) − n



V(P) −1 1 Q (𝜖) + O(1). n n

Here, V(P) is the channel dispersion V(P) =

P P+2 (log e)2 . 2 (P + 1)2

It can be shown [343] that this approximation closely tracks other bounds. Figure 1.28 shows the asymptotic rate (capacity) compared with the Gaussian approximation. Again, the finite length effect is clear. Even with a codelength of n = 2000, only 88% of the channel capacity is achievable.

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

nonasymptinfo2.m

1.12.11.3

Comparison of Codes

Polyanskiy [343] suggests the use of M ∗ as a means of comparing different codes. For a code with M codewords, he defines (for the AWGN channel) the ratio Rnorm (𝜖) =

log M , log M ∗ (n, 𝜖, 𝛾min (𝜖))

where 𝛾min (𝜖) is the smallest SNR at which the code will admit decoding with probability of error less than 𝜖. Practically, M ∗ (n, 𝜖, 𝛾min (𝜖)) is approximated using (1.57) with negligible loss in accuracy.

60

A Context for Error Correction Coding

Rate (bits/channel use)

0.5

0.4

0.3

0.2

0.1 Capacity Gaussian approximation 0

0

500

1000 Block length (n)

1500

2000

Figure 1.28: Rate-block length tradeoff for the Gaussian channel with P = 1 and maximal block error rate 𝜖 = 10−3 . “Better” codes should have Rnorm (𝜖) which is near 1. A comparison of the sorts of codes discussed in this book is provided in [343].

Programming Laboratory 1: Simulating a Communications Channel

Objective

In this lab, you will simulate a BPSK communication system and a coded system with a Hamming code employing hard-input decoding rules.

Background

Reading: Sections 1.5, 1.7, 1.9.

In the case of BPSK, an exact expression for the probability of error is available, (1.25). However, in many more interesting communication systems, a closed-form expression for the probability of error is not available or is difficult to compute. Results must therefore be obtained by simulation of the system. One of the great strengths of the signal-space viewpoint is that probability-of-error simulations can be based only on points in the signal space. In other words, it suffices to simulate random variables as in the matched filter output (1.12), rather than creating the continuous-time functions as in (1.10). (However, for other kinds of questions, a simulation of the continuous-time function might be necessary. For example, if you are simulating the effect of synchronization, timing jitter, delay, or fading, simulating the time signal is probably necessary.)

A framework for simulating a communication system from the signal-space point of view for the purpose of computing the probability of error is as follows:

Algorithm 1.2 Outline for Simulating Digital Communications
1. Initialization: Store the points in the signal constellation. Fix Eb (typically Eb = 1).
2. FOR each signal-to-noise ratio 𝛾 = Eb∕N0:
3.   Compute N0 = Eb∕𝛾 and 𝜎² = N0∕2.
4.   DO:
5.     Generate some random bit(s) (the “transmitted” bits) according to the bit probabilities.
6.     Map the bit(s) into the signal constellation (e.g., BPSK or 8-PSK) to create the signal 𝐬.
7.     Generate a Gaussian random vector 𝐧 (noise) with variance 𝜎² = N0∕2 in each signal direction.
8.     Add the noise to the signal to create the matched filter output signal 𝐫 = 𝐬 + 𝐧.
9.     Perform a detection on the symbol (e.g., find the closest point in the signal constellation to 𝐫).
10.    From the detected symbol, determine the detected bits.
11.    Compare the detected bits with the transmitted bits.
12.    Accumulate the number of bits in error.
13.  UNTIL at least N bit errors have been counted.
14.  The estimated probability of error at this SNR is Pe ≈ (number of errors counted)∕(number of bits generated).
15. End FOR
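As a concrete illustration of this outline, here is a minimal C++ sketch of the inner loop for BPSK with equally likely bits (this is not the book's code: it uses the standard <random> library in place of the gran utility described under Resources, fixes Eb = 1, and uses the threshold-zero detector that is optimal only for equal priors; for unequal priors the threshold of (1.20) would be used instead):

#include <cmath>
#include <cstdio>
#include <random>

int main() {
  std::mt19937 rng(1);
  std::uniform_real_distribution<double> unif(0.0, 1.0);
  std::normal_distribution<double> gauss(0.0, 1.0);

  double Eb = 1.0;   // energy per bit
  long   N  = 100;   // stop after this many bit errors at each SNR

  for (double snrdB = 0.0; snrdB <= 10.0; snrdB += 1.0) {
    double gamma = std::pow(10.0, snrdB / 10.0);       // Eb/N0
    double N0 = Eb / gamma;
    double sigma = std::sqrt(N0 / 2.0);                // noise standard deviation
    long nerrs = 0, nbits = 0;
    do {
      int bit = (unif(rng) < 0.5) ? 0 : 1;             // "transmitted" bit
      double s = bit ? std::sqrt(Eb) : -std::sqrt(Eb); // map to BPSK signal point
      double r = s + sigma * gauss(rng);               // matched filter output r = s + n
      int bithat = (r > 0) ? 1 : 0;                    // detect: closest signal point
      if (bithat != bit) nerrs++;
      nbits++;
    } while (nerrs < N);
    std::printf("Eb/N0 = %4.1f dB   Pe = %g\n", snrdB, (double)nerrs / nbits);
  }
  return 0;
}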

As a general rule, the more errors N you count, the smaller will be the variance of your estimate of the probability of error. However, the bigger N is, the longer the simulation will take to run. For example, if the probability of error is near 10⁻⁶ at some particular value of SNR, around 1 million bits must be generated before you can expect an error. If you choose N = 100, then 100 million bits must be generated to estimate the probability of error, for just that one point on the plot!

Use of Coding in Conjunction with the BSC

For an (n, k) code having rate R = k∕n transmitted with energy per bit equal to Eb, the energy per coded bit is Ec = Eb R. It is convenient to fix the coded energy per bit in the simulation. To simulate the BSC channel with coding, the following outline can be used.

Algorithm 1.3 Outline for Simulating (n, k)-coded Digital Communications
1. Initialization: Store the points in the signal constellation. Fix Ec (typically Ec = 1). Compute R.
2. FOR each signal-to-noise ratio 𝛾 = Eb∕N0:
3.   Compute N0 = Ec∕(R𝛾) and 𝜎² = N0∕2.
4.   Compute the BSC crossover probability p = Q(√(2Ec∕N0)).
5.   DO:
6.     Generate a block of k “transmitted” input bits and accumulate the number of bits generated.
7.     Encode the input bits to n codeword bits.
8.     Pass the n bits through the BSC (flip each bit with probability p).
9.     Run the n bits through the decoder to produce k output bits.
10.    Compare the decoded output bits with the input bits.
11.    Accumulate the number of bits in error.
12.  UNTIL at least N bit errors have been counted.
13.  The estimated probability of error is Pe ≈ (number of errors counted)∕(number of bits generated).
14. End FOR

The encoding and decoding operations depend on the kind of code used. In this lab, you will use codes which are among the simplest possible, the Hamming codes. Since for linear codes the codeword is irrelevant, the simulation can be somewhat simplified by assuming that the input bits are all zero so that the codeword is also all zero. For the Hamming code, the simulation can be arranged as follows:

Algorithm 1.4 Outline for Simulating (n, k) Hamming-coded Digital Communications
1. Fix Ec (typically Ec = 1). Compute R.
2. FOR each signal-to-noise ratio 𝛾 = Eb∕N0:
3.   Compute N0 = Ec∕(R𝛾) and 𝜎² = N0∕2.
4.   Compute the BSC crossover probability p = Q(√(2Ec∕N0)).
5.   DO:
6.     Generate 𝐫 as a vector of n random bits which are 1 with probability p.
7.     Increment the number of bits generated by k.
8.     Compute the syndrome 𝐬 = 𝐫Hᵀ.
9.     If 𝐬 ≠ 0, determine the error location based on the column of H which is equal to 𝐬 and complement that bit of 𝐫.
10.    Count the number of decoded bits (out of k) in 𝐫 which do not match the all-zero message bits.
11.    Accumulate the number of bits in error.
12.  UNTIL at least N bit errors have been counted.
13.  Compute the probability of error.
14. End FOR
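As a sketch of how this outline might look in C++ for the (7, 4) Hamming code (not the book's code: the parity check matrix H below is one valid choice and need not match the H of (1.37), the crossover probability p is fixed rather than swept over SNR, and the standard <random> library stands in for the uran utility):

#include <cstdio>
#include <random>

int main() {
  // One valid (7,4) Hamming parity check matrix: columns are the 7 distinct
  // nonzero binary 3-tuples, in systematic form H = [P^T | I] so that the
  // message bits occupy positions 0..3.
  const int H[3][7] = {{1, 1, 0, 1, 1, 0, 0},
                       {1, 0, 1, 1, 0, 1, 0},
                       {0, 1, 1, 1, 0, 0, 1}};
  std::mt19937 rng(1);
  std::uniform_real_distribution<double> unif(0.0, 1.0);

  double p = 0.05;      // BSC crossover probability (one example value)
  long N = 100;         // stop after this many bit errors
  long nerrs = 0, nbits = 0;
  do {
    int r[7];           // all-zero codeword passed through the BSC
    for (int i = 0; i < 7; i++) r[i] = (unif(rng) < p) ? 1 : 0;
    nbits += 4;
    int s[3];           // syndrome s = r H^T, arithmetic modulo 2
    for (int j = 0; j < 3; j++) {
      s[j] = 0;
      for (int i = 0; i < 7; i++) s[j] ^= r[i] & H[j][i];
    }
    if (s[0] || s[1] || s[2]) {   // nonzero syndrome: flip the bit whose column matches s
      for (int i = 0; i < 7; i++)
        if (H[0][i] == s[0] && H[1][i] == s[1] && H[2][i] == s[2]) { r[i] ^= 1; break; }
    }
    for (int i = 0; i < 4; i++)   // decoded message bits should all be zero
      if (r[i] != 0) nerrs++;
  } while (nerrs < N);
  std::printf("p = %g   estimated Pe = %g\n", p, (double)nerrs / nbits);
  return 0;
}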

The coding gain for a coded system is the difference in the SNR required between uncoded and coded systems achieving the same probability of error. Usually the coding gain is expressed in dB.

Assignment

Preliminary Exercises

Show that if X is a random variable with mean 0 and variance 1, then Y = aX + b is a random variable with mean b and variance a².

Programming Part

BPSK Simulation

1. Write a program that will simulate a BPSK communication system with unequal prior bit probabilities. Using your program, create data from which to plot the probability of bit error obtained from your simulation for SNRs in the range from 0 to 10 dB, for the three cases that P0 = 0.5 (in which case your plot should look much like Figure 1.10), P0 = 0.25, and P0 = 0.1. Decide on an appropriate value of N.
2. Prepare data from which to plot the theoretical probability of error (1.24) for the same three values of P0. (You may want to combine these first two programs into a single program.)
3. Plot the simulated probability of error on the same axes as the theoretical probability of error. The plots should have Eb∕N0 in dB as the horizontal axis and the probability as the vertical axis, plotted on a logarithmic scale (e.g., semilogy in MATLAB).
4. Compare the theoretical and simulated results. Comment on the accuracy of the simulation and the amount of time it took to run the simulation. Comment on the importance of theoretical models (where it is possible to obtain them).
5. Plot the probability of error for P0 = 0.1, P0 = 0.25, and P0 = 0.5 on the same axes. Compare them and comment.

8-PSK Simulation

1. Write a program that will simulate an 8-PSK communication system with equal prior bit probabilities. Use a signal constellation in which the points are numbered in Gray code order. Make your program so that you can estimate both the symbol error probability and the bit error probability. Decide on an appropriate value of N.
2. Prepare data from which to plot the bound on the probability of symbol error Ps using (1.26) and the probability of bit error Pb using (1.28).
3. Plot the simulated probability of symbol error and bit error on the same axes as the bounds on the probabilities of error.
4. Compare the theoretical and simulated results. Comment on the accuracy of the bound compared to the simulation and the amount of time it took to run the simulation.

Coded BPSK Simulation

1. Write a program that will simulate the performance of the (7, 4) Hamming code over a BSC channel with channel crossover probability p = Q(√(2Eb∕N0)) and plot the probability of error as a function of Eb∕N0 in dB. On the same plot, plot the theoretical probability of error for uncoded BPSK transmission. Identify what the coding gain is for a probability of error Pb = 10⁻⁵.
2. Repeat this for a (15, 11) Hamming code. (See page 112 and Equations (3.6) and (3.4).)

Resources and Implementation Suggestions

• The function gran provides a unit Gaussian random variable, generated using the Box–Muller transformation of two uniform random variables. The function gran2 returns two unit Gaussian random variables. This is useful for simulations in two-dimensional signal constellations.
• A unit Gaussian random variable has mean 0 and variance 1. Given a unit Gaussian random variable, using the preliminary exercise, it is straightforward to generate a Gaussian random variable with any desired variance.
• Since the horizontal axis of the probability of error plot is expressed as a ratio Eb∕N0, there is some flexibility in how to proceed. Given a value of Eb∕N0, you can either fix N0 and determine Eb, or you can fix Eb and determine N0. An example of how this can be done is in testrepcode.cc.
• The function uran generates a uniform random number between 0 and 1. This can be used to generate a bit which is 1 with probability p.
• The Q function, used to compute the theoretical probability of error, is implemented in the function qf.
• There is nothing in this lab that makes the use of C++ imperative, as opposed to C. However, you may find it useful to use C++ in the following ways: (a) Create an AWGN class to represent a 1-D or 2-D channel. (b) Create a BSC class. (c) Create a Hamming code class to take care of encoding and decoding (as you learn more about coding algorithms, you may want to change how this is done). (d) In the literature, points in two-dimensional signal constellations are frequently represented as points in the complex plane. You may find it convenient to do similarly, using the complex number capabilities that are present in C++.
• There are two basic approaches to generating the sequence of bits in the simulation. One way is to generate and store a large array of bits (or their resulting signals), then process them all together. This is effective in a language such as MATLAB, where vectorized operations are faster than using

for loops. The other way, and the way recommended here, is to generate each signal separately and to process it separately. This is recommended because it is not necessarily known in advance how many bits should be generated. The number of bits to be generated could be extremely large, in the millions or even billions when the probability of error is small enough.



• For the Hamming encoding and decoding operation, vector/matrix multiply operations over GF(2) are required, such as 𝐜 = 𝐦G. (GF(2) arithmetic is addition/subtraction/multiplication/division modulo 2.) These could be done in the conventional way using nested for loops. However, for short binary codes, a computational simplification is possible. Write G in terms of its columns as

    G = [𝐠1 𝐠2 · · · 𝐠n].

Then the encoding process can be written as a series of vector/vector products (inner products)

    𝐜 = [c1, c2, . . . , cn] = [𝐦𝐠1 𝐦𝐠2 · · · 𝐦𝐠n].

Let us consider the inner product operation: it consists of element-by-element multiplication, followed by a sum. Let m be an integer variable whose bits represent the elements of the message vector 𝐦. Also, let g[i] be an integer variable in C whose bits represent the elements of the column 𝐠i. Then the element-by-element multiplication involved in the product 𝐦𝐠i can be written simply using the bitwise-and operator & in C. How, then, to sum up the elements of the resulting vector? One way, of course, is to use a for loop, such as:

// Compute c=m*G, where m is a bit-vector, and G is represented by g[i]
c = 0;                                  // set vector of bits to 0
for(i = 0; i < n; i++) {
  mg = m & g[i];                        // mod-2 multiplication of all elements
  bitsum = 0;
  for(j = 0, mask = 1; j < n; j++) {    // mask selects a single bit
    if(mg & mask) {
      bitsum++;                         // accumulate if the bit != 0
    }
    mask <<= 1;                         // shift mask over by 1 bit
  }
  bitsum = bitsum % 2;                  // mod-2 sum
  c = c | bitsum*(1<<i);                // assign to vector of bits
  ...
}

However, for sufficiently small codes (such as in this assignment) the inner for loop can be eliminated by precomputing the sums. Consider the table below. For a given number m, the last column provides the sum of all the bits in m, modulo 2.

     m   m (binary)   sum of bits of m   s[m] = sum of bits of m (mod 2)
     0      0000             0                        0
     1      0001             1                        1
     2      0010             1                        1
     3      0011             2                        0
     4      0100             1                        1
     5      0101             2                        0
     6      0110             2                        0
     7      0111             3                        1
     8      1000             1                        1
     9      1001             2                        0
    10      1010             2                        0
    11      1011             3                        1
    12      1100             2                        0
    13      1101             3                        1
    14      1110             3                        1
    15      1111             4                        0

To use this in a program, precompute the table of bit sums, then use this to look up the result. An outline follows:

// Compute the table s, having all the bit sums modulo 2
// ...
// Compute c=m*G, where m is a bit-vector, and G is represented by g[i]
c = 0;
for(i = 0; i < n; i++) {
  c = c | s[m & g[i]]*(1<<i);           // assign to vector of bits
}
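One possible way to fill in the elided table computation (a sketch, not the book's code, using the same variables s, m, and n as the outline above) builds each entry from a previously computed one:

// Precompute s[m] = (number of 1 bits in m) mod 2, for m = 0, ..., 2^n - 1.
// The parity of m is the parity of m >> 1, flipped if the low bit of m is set.
s[0] = 0;
for(m = 1; m < (1 << n); m++) {
  s[m] = s[m >> 1] ^ (m & 1);
}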


1.13 Exercises

1.1

Weighted codes. Let s1 , s2 , . . . , sn be a sequence of digits, each in the range 0 ≤ si < p, where p is a prime number. The weighted sum is W = ns1 + (n − 1)s2 + (n − 2)s3 + · · · + 2sn−1 + sn . The final digit sn is selected so that W modulo p is equal to 0. That is, W ≡ 0 (mod p). W is called the checksum. (a) Show that the weighted sum W can be computed by computing the cumulative sum sequence t1 , t2 , . . . , tn by t1 = s 1 , t2 = s 1 + s 2 , . . . , tn = s 1 + s 2 + · · · + s n then computing the cumulative sum sequence 𝑤1 = t1 , 𝑤2 = t1 + t2 , . . . , 𝑤n = t1 + t2 + · · · + tn , with W = 𝑤n . (b) Suppose that the digits sk and sk+1 are interchanged, with sk ≠ sk+1 , and then a new checksum W ′ is computed. Show that if the original sequence satisfies W ≡ 0 (mod p), then the modified sequence cannot satisfy W ′ ≡ 0 (mod p). Thus, interchanged digits can be detected. (c) For a sequence of digits of length < p, suppose that digit sk is altered to some s′k ≠ sk , and a new checksum W ′ is computed. Show that if the original sequence satisfies W ≡ 0 (mod p), then the modified sequence cannot satisfy W ′ ≡ 0 (mod p). Thus, a single modified digit can be detected. Why do we need the added restriction on the length of the sequence? (d) See if the ISBN 0-13-139072-4 is valid. (e) See if the ISBN 0-13-193072-4 is valid.

1.2 See if the UPCs 0 59280 00020 0 and 0 41700 00037 9 are valid.
1.3 A coin having P(head) = 0.001 is tossed 10,000 times, each toss independent. What is the lower limit on the number of bits it would take to accurately describe the outcomes? Suppose it were possible to send only 100 bits of information to describe all 10,000 outcomes. What is the minimum average distortion per bit that must be accrued sending the information in this case?
1.4 Show that the entropy of a source X with M outcomes described by (1.1) is maximized when all the outcomes are equally probable: p1 = p2 = · · · = pM.
1.5 Show that (1.7) follows from (1.5) using (1.4).
1.6 Show that (1.12) is true and that the mean and variance of N1i and N2i are as in (1.13) and (1.14).
1.7 Show that the decision rule and threshold in (1.19) and (1.20) are correct.
1.8 Show that (1.24) is correct.
1.9 Show that if X is a random variable with mean 0 and variance 1, then Y = aX + b is a random variable with mean b and variance a².
1.10 Show that the detection rule for 8-PSK, 𝐬̂ = arg max 𝐫ᵀ𝐬 (where the maximum is taken over the points 𝐬 in the signal constellation), follows from (1.18) when all points are equally likely.
1.11 Consider a series of M BSCs, each with transition probability p, where the outputs of each BSC are connected to the inputs of the next in the series. Show that the resulting overall channel is a BSC and determine the crossover probability as a function of M. What happens as M → ∞? Hint: To simplify, consider the difference of (x + y)^n and (x − y)^n.


1.12 [319] Bounds and approximations to the Q function. For many analyses it is useful to have analytical bounds and approximations to the Q function. This exercise introduces some of the most important of these.
(a) Show that

    √(2𝜋) Q(x) = (1∕x) e^(−x²∕2) − ∫ₓ^∞ (1∕y²) e^(−y²∕2) dy,   x > 0.

Hint: integrate by parts.
(b) Show that

    ∫ₓ^∞ (1∕y²) e^(−y²∕2) dy < (1∕x³) e^(−x²∕2),   x > 0.

(c) Hence conclude that

    (1∕(√(2𝜋) x)) e^(−x²∕2) (1 − 1∕x²) < Q(x) < (1∕(√(2𝜋) x)) e^(−x²∕2),   x > 0.

(d) Plot these lower and upper bounds on a plot with Q(x) (use a log scale). 2 (e) Another useful bound is Q(x) ≤ 12 e−x ∕2 . Derive this bound. Hint: Identify [Q(𝛼)]2 as the probability that the zero-mean unit-Gaussian random variables lie in the shaded region shown on the left in Figure 1.29, (the region [𝛼, ∞) × [𝛼, ∞)). This probability is exceeded by the probability that (x, y) lies in the shaded region shown on the right (extended out to ∞). Evaluate this probability.


Figure 1.29: Regions for bounding the Q function.

1.13 Let V2(n, t) be the number of points in a Hamming sphere of “radius” t around a binary codeword of length n. That is, it is the number of points within a Hamming distance t of a binary vector. Determine a formula for V2(n, t).
1.14 Show that the Hamming distance satisfies the triangle inequality. That is, for three binary vectors 𝐱, 𝐲, and 𝐳 of length n, show that dH(𝐱, 𝐳) ≤ dH(𝐱, 𝐲) + dH(𝐲, 𝐳).
1.15 Show that for BPSK modulation with amplitudes ±√Ec, the Hamming distance dH and the Euclidean distance dE between a pair of codewords are related by dE = 2√(Ec dH).
1.16 In this problem, we will demonstrate that the probability of error for a repetition code decreases exponentially with the code length. Several other useful facts will also be introduced by this problem.

(a) Show that

    2^(−nH₂(p)) = (1 − p)^n (p∕(1 − p))^(np).

(b) For the fact

    If 0 ≤ p ≤ 1∕2, then ∑_{0 ≤ i ≤ pn} (n choose i) ≤ 2^(nH₂(p)),

justify the following steps of its proof:

    1 = (p + (1 − p))^n ≥ ∑_{0 ≤ i ≤ pn} (n choose i) p^i (1 − p)^(n−i)
      ≥ ∑_{0 ≤ i ≤ pn} (n choose i) (p∕(1 − p))^(pn) (1 − p)^n
      = 2^(−nH₂(p)) ∑_{0 ≤ i ≤ pn} (n choose i).

(c) Show that the probability of error for a repetition code can be written as

    Pe^(n) = ∑_{j=0}^{n−(t+1)} (n choose j) (1 − p)^j p^(n−j),

where t = ⌊(n − 1)∕2⌋.
(d) Show that

    Pe^(n) ≤ [2 √(p(1 − p))]^n.

1.17 [292, p. 14] Identities on (n choose k). We can define

    (x choose m) = x(x − 1)(x − 2) · · · (x − m + 1)∕m!   if m is a positive integer,
    (x choose m) = 1                                       if m = 0,
    (x choose m) = 0                                       otherwise.

Show that
(a) (n choose k) = n!∕(k!(n − k)!) if k is a nonnegative integer.
(b) (n choose k) = 0 if n is an integer and k > n is a nonzero integer.
(c) (n choose k) + (n choose k − 1) = (n + 1 choose k).
(d) (−n choose k) = (−1)^k (n + k − 1 choose k).
(e) ∑_{k=0}^{n} (n choose k) = 2^n.
(f) ∑_{k even} (n choose k) = ∑_{k odd} (n choose k) = 2^(n−1) if n ≥ 1.
(g) ∑_{k=0}^{n} (−1)^k (n choose k) = 0 if n ≥ 1.

1.18 Show that for soft-decision decoding on the (n, 1) repetition code, (1.33) is correct.
1.19 For the (n, 1) code used over a BSC with crossover probability p, what is the probability that an error event occurs which is not detected?
1.20 Hamming code decoding.


(a) For G in (1.34) and H in (1.37), verify that GHᵀ = 0. (Recall that operations are computed modulo 2.)
(b) Let 𝐦 = [1, 1, 0, 0]. Determine the transmitted Hamming codeword when the generator of (1.34) is used.
(c) Let 𝐫 = [1, 1, 1, 1, 1, 0, 0]. Using Algorithm 1.1, determine the transmitted codeword 𝐜. Also determine the transmitted message 𝐦.
(d) The message 𝐦 = [1, 0, 0, 1] is encoded to form the codeword 𝐜 = [1, 1, 0, 0, 1, 0, 1]. The vector 𝐫 = [1, 0, 1, 0, 1, 0, 0] is received. Decode 𝐫 to obtain 𝐜̂. Is the codeword 𝐜̂ found the same as the original 𝐜? Why or why not?
1.21 For the (7, 4) Hamming code generator polynomial g(x) = 1 + x + x³, generate all possible code polynomials c(x). Verify that they correspond to the codewords in (1.35). Take a nonzero codeword c(x) and compute c(x)h(x) modulo x⁷ + 1. Do this also for two other nonzero codewords. What is the check condition for this code?
1.22 Is it possible that the polynomial g(x) = x⁴ + x³ + x² + 1 is a generator polynomial for a cyclic code of length n = 7?
1.23 For the parity check matrix

    H = [ 1 0 1 0 0
          0 1 0 1 0
          0 1 1 0 1 ]

draw the Wolf trellis and the Tanner graph.
1.24 Let X be a random variable taking on the values x = {a, b, c, d} with probabilities

    P(X = a) = 1∕2,  P(X = b) = 1∕4,  P(X = c) = P(X = d) = 1∕8.

Determine H(X). Suppose that 100 measurements of independent draws of X are made per second. Determine what the entropy rate of this source is. Determine how to encode the X data to achieve this rate.
1.25 Show that the information inequality log x ≤ x − 1 is true.
1.26 Show that for a discrete random variable X, H(X) ≥ 0.
1.27 Show that I(X; Y) ≥ 0 and that I(X; Y) = 0 only if X and Y are independent. Hint: Use the information inequality.
1.28 Show that the formulas I(X; Y) = H(X) − H(X|Y) and I(X; Y) = H(Y) − H(Y|X) follow from the Definition (1.38).
1.29 Show that H(X) ≥ H(X|Y). Hint: Use the previous two problems.
1.30 Show that the mutual information I(X; Y) can be written as

    I(X; Y) = ∑ₓ P_X(x) ∑_y P_{Y|X}(y|x) log2 [ P_{Y|X}(y|x) ∕ ∑_{x′} P_X(x′) P_{Y|X}(y|x′) ].

1.31 For a BSC with crossover probability p having input X and output Y, let the probability of the inputs be P(X = 0) = q and P(X = 1) = 1 − q. (a) Show that the mutual information is I(X; Y) = H(Y) + p log2 p + (1 − p) log2 (1 − p). (b) By maximizing over q show that the channel capacity per channel use is C = 1 − H2 (p) (bits).

1.32 Consider the channel model shown here, which accepts three different symbols.


The first symbol is not affected by noise, while the second and third symbols have a probability p of not being corrupted, and a probability q of being changed into the other of the pair. Let 𝛼 = −p log p − q log q, and let P be the probability that the first symbol is chosen and let Q be the probability that either of the other two is chosen, so that P + 2Q = 1.
(a) Show that H(X) = −P log P − 2Q log Q.
(b) Show that H(X|Y) = 2Q𝛼.
(c) Choose the input distribution (i.e., choose P and Q) in such a way as to maximize I(X; Y) = H(X) − H(X|Y) subject to P + 2Q = 1. What is the capacity for this channel?
1.33 Let X ∼ 𝒰(−a, a) (that is, X is uniformly distributed on [−a, a]). Compute H(X). Compare H(X) with the entropy of a Gaussian distribution having the same variance.
1.34 Let g(x) denote the pdf of a random variable X with variance 𝜎². Show that

    H(X) ≤ (1∕2) log2(2𝜋e𝜎²),

with equality if and only if X is Gaussian. Hint: Let p(x) denote the pdf of a Gaussian r.v. with variance 𝜎 2 and consider D(g||p). Also, note that log p(x) is quadratic in x. 1.35 Show that H(X + N|X) = H(N).

1.14 References

The information age was heralded with Shannon's work [404]. Thorough coverage of information theory appears in [82], [147] and [498]. The books [300] and [471] place coding theory in its information theoretic context. Our discussion of the AEP follows [26], while our “proof” of the channel coding theorem closely follows Shannon's original [404]. More analytical proofs appear in the textbooks cited above. See also [458]. Discussion about tradeoffs with complexity are in [376], as is the discussion in Section 1.12.9. Non-asymptotic information theory is developed in the masterful dissertation [343]. The detection theory and signal space background is available in most books on digital communication. See, for example, [26, 319, 344, 354]. Hamming codes were presented in [181]. The trellis representation was presented first in [488]; a thorough treatment of the concept appears in [273]. The Tanner graph representation appears in [432]; see also [148]. Exercise 1.16b comes from [458, p. 21]. The discussion relating to simulating communication systems points out that such simulations can be very slow. Faster results can in some cases be obtained using importance sampling. Some references on importance sampling are [123, 279, 403].

Part II

Block Codes

Chapter 2

Groups and Vector Spaces

2.1 Introduction

Linear block codes form a group and a vector space. Hence, the study of the properties of this class of codes benefits from a formal introduction to these concepts. The codes, in turn, reinforce the concepts of groups and subgroups that are valuable in the remainder of our study. Our study of groups leads us to cyclic groups, subgroups, cosets, and factor groups. These concepts, important in their own right, also build insight in understanding the construction of extension fields which are essential for some coding algorithms to be developed.

2.2 Groups

A group formalizes some of the basic rules of arithmetic necessary for cancellation and solution of some simple algebraic equations. Definition 2.1 A binary operation ∗ on a set is a rule that assigns to each ordered pair of elements of the set (a, b) some element of the set. (Since the operation returns an element in the set, this is actually defined as closed binary operation. We assume that all binary operations are closed.) ◽ ◽

Example 2.2 On the set of positive integers, we can define a binary operation ∗ by a ∗ b = min(a, b). ◽

Example 2.3 On the set of real numbers, we can define a binary operation ∗ by a ∗ b = a (i.e., the first argument). ◽

Example 2.4 On the set of real numbers, we can define a binary operation ∗ by a ∗ b = a + b. That is, the binary operation is regular addition. ◽

Definition 2.5 A group ⟨G, ∗⟩ is a set G together with a binary operation ∗ on G such that: G1 The operator is associative: for any a, b, c ∈ G, (a ∗ b) ∗ c = a ∗ (b ∗ c). G2 There is an element e ∈ G called the identity element such that a ∗ e = e ∗ a = a for all a ∈ G. G3 For every a ∈ G, there is an element b ∈ G known as the inverse of a such that a ∗ b = b ∗ a = e. The inverse of a is sometimes denoted as a−1 (when the operator ∗ is multiplication-like) or as −a (when the operator ∗ is addition-like). Where the operation is clear from context, the group ⟨G, ∗⟩ may be denoted simply as G.



It should be noted that the notation ∗ and a−1 are generic labels to indicate the concept. The particular notation used is modified to fit the context. Where the group operation is addition, the operator + is used and the inverse of an element a is more commonly represented as −a. When the group operation is multiplication, either ⋅ or juxtaposition is used to indicate the operation and the inverse is denoted as a−1.

Definition 2.6 If G has a finite number of elements, it is said to be a finite group. The order of a finite group G, denoted |G|, is the number of elements in G. ◽

This definition of order (of a group) is to be distinguished from the order of an element, given below.

Definition 2.7 A group ⟨G, ∗⟩ is commutative if a ∗ b = b ∗ a for every a, b ∈ G.



Example 2.8 The set ⟨ℤ, +⟩, which is the set of integers under addition, forms a group. The identity element is 0, since 0 + a = a + 0 = a for any a ∈ ℤ. The inverse of any a ∈ ℤ is −a. This is a commutative group. ◽

As a matter of convention, a group that is commutative with an additive-like operator is said to be an Abelian group (after the mathematician N.H. Abel). Example 2.9 The set ⟨ℤ, ⋅⟩, the set of integers under multiplication, does not form a group. There is a multiplicative identity, 1, but there is not a multiplicative inverse for every element in ℤ. ◽ Example 2.10 The set ⟨ℚ∖{0}, ⋅⟩, the set of rational numbers excluding 0, is a group with identity element 1. ◽ The inverse of an element a is a−1 = 1∕a.

The requirements on a group are strong enough to introduce the idea of cancellation. In a group G, if a ∗ b = a ∗ c, then b = c (this is left cancellation). To see this, let a−1 be the inverse of a in G. Then a−1 ∗ (a ∗ b) = a−1 ∗ (a ∗ c) = (a−1 ∗ a) ∗ c = e ∗ c = c and a−1 ∗ (a ∗ b) = (a−1 ∗ a) ∗ b = e ∗ b = b, by the properties of associativity and identity. Under group requirements, we can also verify that solutions to linear equations of the form a ∗ x = b are unique. Using the group properties we get immediately that x = a−1 b. If x1 and x2 are two solutions, such that a ∗ x1 = b = a ∗ x2, then by cancellation we get immediately that x1 = x2.

Example 2.11 Let ⟨ℤ5, +⟩ denote addition on the numbers {0, 1, 2, 3, 4} modulo 5. The operation is demonstrated in tabular form in the table below:

    +  | 0  1  2  3  4
    ---+---------------
    0  | 0  1  2  3  4
    1  | 1  2  3  4  0
    2  | 2  3  4  0  1
    3  | 3  4  0  1  2
    4  | 4  0  1  2  3

Clearly 0 is the identity element. Since 0 appears in each row and column, every element has an inverse. By the uniqueness of solution, we must have every element appearing in every row and column, as it does. By the symmetry of the table it is clear that the operation is Abelian (commutative). Thus, we verify that ⟨ℤ5 , +⟩ is an Abelian group. (Typically, when using a table to represent a group operation a ∗ b, the first operand a is the row and the second operand b is the column in the table.) ◽

In general, we denote the set of numbers 0, 1, . . . , n − 1 with addition modulo n by ⟨ℤn , +⟩ or, more briefly, ℤn .
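The claim that ⟨ℤn, +⟩ is a group for any n can be checked by brute force. The following short C++ sketch (not from the text) verifies the identity, inverses, and associativity for a chosen modulus n:

#include <cstdio>

int main() {
  const int n = 5;                       // try any modulus n >= 1
  bool ok = true;
  for (int a = 0; a < n; a++) {
    bool has_inverse = false;
    for (int b = 0; b < n; b++) {
      if ((a + b) % n == 0) has_inverse = true;   // b is the additive inverse of a
      for (int c = 0; c < n; c++)                 // associativity of addition mod n
        if (((a + b) % n + c) % n != (a + (b + c) % n) % n) ok = false;
    }
    if ((a + 0) % n != a) ok = false;             // 0 is the identity
    if (!has_inverse) ok = false;
  }
  std::printf("<Z_%d, +> satisfies the group axioms: %s\n", n, ok ? "yes" : "no");
  return 0;
}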


Example 2.12 Consider the set of numbers {1, 2, 3, 4, 5} using the operation of multiplication modulo 6. The operation is shown in the following table:

    ⋅  | 1  2  3  4  5
    ---+---------------
    1  | 1  2  3  4  5
    2  | 2  4  0  2  4
    3  | 3  0  3  0  3
    4  | 4  2  0  4  2
    5  | 5  4  3  2  1

The number 1 acts as an identity, but this does not form a group, since not every element has a multiplicative inverse. In fact, the only elements that have a multiplicative inverse are those that are relatively prime to 6, that is, those numbers that don’t share a divisor with 6 other than 1. We will see this example later in the context of rings. ◽
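The observation about multiplicative inverses can be checked directly. This small C++ sketch (not from the text) searches for an inverse of each element of {1, . . . , 5} under multiplication modulo 6 and reports which elements are invertible (only 1 and 5, the elements relatively prime to 6):

#include <cstdio>

int main() {
  const int n = 6;
  for (int a = 1; a < n; a++) {
    int inv = -1;
    for (int b = 1; b < n; b++)
      if ((a * b) % n == 1) inv = b;     // b is a multiplicative inverse of a mod n
    if (inv > 0) std::printf("%d has inverse %d modulo %d\n", a, inv, n);
    else         std::printf("%d has no inverse modulo %d\n", a, n);
  }
  return 0;
}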

One way to construct groups is to take the Cartesian, or direct, product of groups. Given groups ⟨G1, ∗⟩, ⟨G2, ∗⟩, . . . , ⟨Gr, ∗⟩, the direct product group G1 × G2 × · · · × Gr has elements (a1, a2, . . . , ar), where each ai ∈ Gi. The operation in G is defined element-by-element. That is, if (a1, a2, . . . , ar) ∈ G and (b1, b2, . . . , br) ∈ G, then

    (a1, a2, . . . , ar) ∗ (b1, b2, . . . , br) = (a1 ∗ b1, a2 ∗ b2, . . . , ar ∗ br).

Example 2.13 The group ⟨ℤ2 × ℤ2, +⟩ consists of 2-tuples with addition defined element-by-element modulo 2. An addition table for the group is shown here:

    +     | (0,0) (0,1) (1,0) (1,1)
    ------+------------------------
    (0,0) | (0,0) (0,1) (1,0) (1,1)
    (0,1) | (0,1) (0,0) (1,1) (1,0)
    (1,0) | (1,0) (1,1) (0,0) (0,1)
    (1,1) | (1,1) (1,0) (0,1) (0,0)

This group is called the Klein 4-group. ◽

Example 2.14 This example introduces the idea of permutations as elements in a group. It is interesting because it introduces a group operation that is function composition, as opposed to the mostly arithmetic group operations presented to this point. A permutation of a set A is a one-to-one, onto function (a bijection) of a set A onto itself. It is convenient for purposes of illustration to let A be a set of n integers. For example, A = {1, 2, 3, 4}. A permutation p can be written in the notation

    p = ( 1 2 3 4 ; 3 4 1 2 ),

which means that 1 → 3, 2 → 4, 3 → 1, 4 → 2. There are n! different permutations on n distinct elements. We can think of p as an operator expressed in prefix notation. For example, p ∘ 1 = 3 or p ∘ 4 = 2.

Let p1 = p and

    p2 = ( 1 2 3 4 ; 4 3 1 2 ).

The composition permutation p2 ∘ p1 first applies p1, then p2, so that

    p2 ∘ p1 = ( 1 2 3 4 ; 4 3 1 2 ) ∘ ( 1 2 3 4 ; 3 4 1 2 ) = ( 1 2 3 4 ; 1 2 4 3 ).

This is again another permutation, so the operation of composition of permutations is closed under the set of permutations. The identity permutation is

    e = ( 1 2 3 4 ; 1 2 3 4 ).

There is an inverse permutation under composition. For example,

    p1⁻¹ = ( 1 2 3 4 ; 3 4 1 2 ).

It can be shown that composition of permutations is associative: for three permutations p1, p2, and p3, (p1 ∘ p2) ∘ p3 = p1 ∘ (p2 ∘ p3). Thus, the set of all n! permutations on n elements forms a group, where the group operation is function composition. This group is referred to as the symmetric group on n letters. The group is commonly denoted by Sn. It is also interesting to note that composition is not commutative. This is clear from this example since p2 ∘ p1 ≠ p1 ∘ p2. So S4 is an example of a noncommutative group. ◽
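Permutation composition is easy to experiment with in code. The following C++ sketch (not from the text) represents a permutation of {1, 2, 3, 4} as an array whose i-th entry is the image of i, composes p2 ∘ p1, and confirms that it differs from p1 ∘ p2:

#include <cstdio>

// Compose permutations of {1,...,4}: out = outer composed with inner (apply inner first).
void compose(const int outer[5], const int inner[5], int out[5]) {
  for (int i = 1; i <= 4; i++) out[i] = outer[inner[i]];
}

int main() {
  // index 0 unused; p1 maps 1->3, 2->4, 3->1, 4->2; p2 maps 1->4, 2->3, 3->1, 4->2
  const int p1[5] = {0, 3, 4, 1, 2};
  const int p2[5] = {0, 4, 3, 1, 2};
  int a[5], b[5];
  compose(p2, p1, a);   // p2 after p1: gives 1->1, 2->2, 3->4, 4->3
  compose(p1, p2, b);   // p1 after p2: a different permutation
  for (int i = 1; i <= 4; i++)
    std::printf("%d: p2 o p1 -> %d,  p1 o p2 -> %d\n", i, a[i], b[i]);
  return 0;
}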

2.2.1 Subgroups

Definition 2.15 A subgroup ⟨H, ∗⟩ of a group ⟨G, ∗⟩ is a group formed from a subset of elements in a group G with the same operation ∗. Notationally, we may write H < G to indicate that H is a subgroup of G. (There should be no confusion using < with comparisons between numbers because the operands are different in each case.) ◽ If the elements of H are a strict subset of the elements of G (i.e., H ⊂ G but not H = G), then the subgroup is said to be a proper subgroup. If H = G, then H is an improper subgroup of G. The subgroups H = {e} ⊂ G (e is the identity) and H = G are said to be trivial subgroups. Example 2.16 Let G = ⟨ℤ6 , +⟩, the set of numbers {0, 1, 2, 3, 4, 5} using addition modulo 6. Let H = ⟨{0, 2, 4}, +⟩, with addition taken modulo 6. As a set, H ⊂ G. It can be shown that H forms a group. Let K = ⟨{0, 3}, +⟩, with addition taken modulo 6. Then K is a subgroup of G. ◽ Example 2.17

A variety of familiar groups can be arranged as subgroups. For example, ⟨ℤ, +⟩ < ⟨ℚ, +⟩ < ⟨ℝ, +⟩ < ⟨ℂ, +⟩.

Example 2.18

The group of permutations on four letters, S4 , has a subgroup formed by the permutations ( ( ) ) 1 2 3 4 1 2 3 4 p0 = p1 = 1 2 3 4 2 3 4 1 ( ( ) ) 1 2 3 4 1 2 3 4 p3 = p2 = 3 4 1 2 4 1 2 3



2.2 Groups

75 ( 1 2 ( 1 p6 = 3 p4 =

( 1 4 ( 1 p7 = 1

) 2 3 4 1 4 3 ) 2 3 4 2 1 4

p5 =

) 2 3 4 3 2 1 ) 2 3 4 4 3 2

(2.1)

Compositions of these permutations are closed. These permutations correspond to the ways that the corners of a square can be moved to other corners by rotation about the center and reflection across edges or across diagonals (without bending the square). The geometric depiction of these permutations and the group operation table are shown here:

p4 p3

4

p5 1 p7

2 p1

p6

This group is known as D4 . D4 has a variety of subgroups of its own. (Can you find them?)

2.2.2



Cyclic Groups and the Order of an Element

In a group G with operation ∗ or multiplication operation, we use the notation an to indicate a ∗ a ∗ a ∗ · · · ∗ a, with the operand a appearing n times. Thus, a1 = a, a2 = a ∗ a, etc. We take a0 to be the identity element in the group G. We use a−2 to indicate (a−1 )(a−1 ), and a−n to indicate (a−1 )n . For a group with an additive operator +, the notation na is often used, which means a + a + a + · · · + a, with the operand appearing n times. Throughout this section we use the an notation; making the switch to the additive operator notation is straightforward. Let G be a group and let a ∈ G. Any subgroup containing a must also contain a2 , a3 , and so forth. The subgroup must contain e = aa−1 , and hence a−2 , a−3 , and so forth, are also in the subgroup. Definition 2.19 For any a ∈ G, the set {an |n ∈ ℤ} generates a subgroup of G called the cyclic subgroup. The element a is said to be the generator of the subgroup. The cyclic subgroup generated by a is denoted as ⟨a⟩. ◽ Definition 2.20 If every element of a group can be generated by a single element, the group is said to be cyclic. ◽

76

Groups and Vector Spaces Example 2.21 The group ⟨ℤ5 , +⟩ is cyclic, since every element in the set can be generated by a = 2 (under the appropriate addition law): 2,

2 + 2 = 4,

2 + 2 + 2 = 1,

2 + 2 + 2 + 2 = 3,

2 + 2 + 2 + 2 + 2 = 0.

In this case we could write ℤ5 = ⟨2⟩. Observe that there are several generators for ℤ5 .



The permutation group S3 is not cyclic: there is no element which generates the whole group. Definition 2.22 In a group G, with a ∈ G, the smallest positive n such that an is equal to the identity in G is said to be the order of a. If no such n exists, a is of infinite order. ◽ The order of an element should not be confused with the order of a group, which is the number of elements in the group. In ℤ5 , the computations above show that the element 2 is of order 5. In fact, the order of every nonzero element in ℤ5 is 5. Example 2.23

Let G = ⟨ℤ6 , +⟩. Then ⟨2⟩ = {0, 2, 4} ⟨3⟩ = {0, 3} ⟨5⟩ = {0, 1, 2, 3, 4, 5} = ℤ6 .

It is easy to verify that an element a ∈ ℤ6 is a generator for the whole group if and only if a and 6 are relatively prime. ◽

2.2.3

Cosets

Definition 2.24 Let H be a subgroup of ⟨G, ∗⟩ (where G is not necessarily commutative) and let a ∈ G. The left coset of H, a ∗ H, is the set {a ∗ h|h ∈ H}. The right coset of H is similarly defined, H ∗ a = {h ∗ a|h ∈ H}. ◽ Of course, in a commutative group, the left and right cosets are the same. Figure 2.1 illustrates the idea of cosets. If G is the group ⟨ℝ3 , +⟩ and H is the white plane shown, then the cosets of H in G are the translations of H.

H

x1 + H x2 + H x3 + H

Figure 2.1: An illustration of cosets.

2.2 Groups

77

Let G be a group and let H be a subgroup of G. Let a ∗ H be a (left) coset of H in G. Then clearly b ∈ a ∗ H if and only if b = a ∗ h for some h ∈ H. This means (by cancellation) that we must have a−1 ∗ b ∈ H. Thus, to determine if a and b are in the same (left) coset of H, we determine if a−1 ∗ b ∈ H. Example 2.25

Let G = ⟨ℤ, +⟩ and let S0 = 3ℤ = { . . . , −6, −3, 0, 3, 6, . . . }.

Then S0 is a subgroup of G. Now let us form the cosets S1 = S0 + 1 = { . . . , −5, −2, 1, 4, 7, . . . } and S2 = S0 + 2 = { . . . , −4, −1, 2, 5, 8, . . . }. Note that neither S1 nor S2 are groups (they do not contain the identity). The sets S0 , S1 , and S2 collectively cover the original group, G = S0 ∪ S1 ∪ S2 . Let us check whether a = 4 and b = 6 are in the same coset of S0 by checking whether (−a) + b ∈ S0 . Since ◽ −a + b = 2 ∉ S0 , a and b are not in the same coset.

2.2.4

Lagrange’s Theorem

Lagrange’s theorem prescribes the size of a subgroup compared to the size of its group. This little result is used in a variety of ways in the developments to follow. Lemma 2.26 Every coset of H in a group G has the same number of elements. Proof We will show that every coset has the same number of elements as H. Let a ∗ h1 ∈ a ∗ H and let a ∗ h2 ∈ a ∗ H be two elements in the coset a ∗ H. If a ∗ h1 = a ∗ h2 , then by cancellation we must ◽ have h1 = h2 . Thus, the elements of a coset are uniquely identified by the elements in H. We summarize some important properties about cosets: Reflexive An element a is in the same coset as itself. Symmetric If a and b are in the same coset, then b and a are in the same coset. Transitive If a and b are in the same coset, and b and c are in the same coset, then a and c are in the same coset. Reflexivity, symmetricity, and transitivity are properties of the relation “in the same coset.” Definition 2.27 A relation which has the properties of being reflexive, symmetric, and transitive is said to be an equivalence relation. ◽ An important fact about equivalence relations is that every equivalence relation partitions its elements into disjoint sets. Let us consider here the particular case of cosets. Lemma 2.28 The distinct cosets of H in a group G are disjoint.

78

Groups and Vector Spaces Proof Suppose A and B are distinct cosets of H; that is, A ≠ B. Assume that A and B are not disjoint, then there is some element c which is common to both. We will show that this implies that A ⊂ B. Let b ∈ B. For any a ∈ A, a and c are in the same coset (since c is in A). And c and b are in the same coset (since c is in B). By transitivity, a and b must be in the same coset. Thus, every element of A is in B, so A ⊂ B. Turning the argument around, we find that B ⊂ A. Thus, A = B. This contradiction shows that distinct A and B must also be disjoint. ◽ Theorem 2.29 (Lagrange’s theorem) Let G be a group of finite order and let H be a subgroup of G. Then the order of H divides1 the order of G. That is, |H| divides |G|. Proof The set of cosets partition G into disjoint sets, each of which has the same number of elements, |H|. These disjoint sets completely cover G, since every element g ∈ G is in some coset, g ∗ H. So the number of elements of G must be equal to a multiple of |H|. ◽ Lagrange’s theorem can be stated more succinctly using a notation which we now introduce: Definition 2.30 The vertical bar || means divides. We write a || b if a divides b (without remainder). ◽ Then Lagrange’s theorem can be written: If |G| < ∞ and H < G, then |H||||G|. One implication of Lagrange’s theorem is the following. Lemma 2.31 Every group of prime order is cyclic. Proof Let G be of prime order, let a ∈ G, and denote the identity in G by e. Let H = ⟨a⟩, the cyclic subgroup generated by a. Then a ∈ H and e ∈ H. But by Theorem 2.29, the order of H must divide the order of G. Since G is of prime order, then we must have |H| = |G|; hence a generates G, so G is cyclic. ◽ 2.2.5

Induced Operations; Isomorphism

Example 2.32 Let us return to the three cosets S0 , S1 , and S2 defined in Example 2.25. We thus have a set of three objects, S = {S0 , S1 , S2 }. Let us define an addition operation on S as follows: for A, B and C ∈ S, A+B=C

if and only if a + b = c for any a ∈ A, b ∈ B and some c ∈ C.

That is, addition of the sets is defined by representatives in the sets. The operation is said to be the induced operation on the cosets. For example, S1 + S2 = S0 , taking as representatives, for example, 1 ∈ S1 , 2 ∈ S2 and noting that 1 + 2 = 3 ∈ S0 . Similarly, S1 + S1 = S2 , taking as representatives 1 ∈ S1 and noting that 1 + 1 = 2 ∈ S2 . Based on this induced operation, an addition table can be built for the set S: + S0 S1 S2

1

That is, divides without remainder.

S0 S0 S1 S2

S1 S1 S2 S0

S2 S2 S0 S1

2.2 Groups

79

It is clear that this addition table defines a group, which we can call ⟨S, +⟩. Now compare this addition table with the addition table for ℤ3 : + 0 1 2

0 0 1 2

1 1 2 0

2 2 0 1

Structurally, the two addition tables are identical: entries in the second table are obtained merely by replacing ◽ Sk with k, for k = 0, 1, 2. We say that the group ⟨S, +⟩ and the group ⟨ℤ3 , +⟩ are isomorphic.

Definition 2.33 Two groups ⟨G, ∗⟩ and ⟨, ⋄⟩ are said to be (group) isomorphic if there exists a one-to-one, onto function 𝜙 ∶ G →  called the isomorphism such that for every a, b ∈ G (Box 2.1), 𝜙( a ∗ b ) = 𝜙(a) ⋄ 𝜙(b). ⏟⏟⏟ ⏟⏞⏞⏞⏟⏞⏞⏞⏟ operation operation in G in  The fact that groups G and  are isomorphic are denoted by G ≅ .

(2.2)



We can thus write S ≅ ℤ3 (where the operations are unstated but understood from context). Whenever two groups are isomorphic they are, for all practical purposes, the same thing. Different objects in the groups may have different names, but they represent the same sorts of relationships among themselves. Box 2.1: One-to-One and Onto Functions. A function 𝜙 ∶ G →  is said to be one-to-one if 𝜙(a) = 𝜙(b) implies a = b for every a and b in G. That is, two distinct values a, b ∈ G with a ≠ b do not map to the same value of 𝜙. A one-to-one function is also called an injective function. A contrasting example is 𝜙(x) = x2 , where 𝜙 ∶ ℝ → ℝ, which is not one-to-one since 4 = 𝜙(2) and 4 = 𝜙(−2). Definition 2.34 A function 𝜙 ∶ G →  is said to be onto if for every g ∈ , there is an element a ∈ G such that 𝜙(a) = g. An onto function is also called an surjective function. ◽ That is, the function goes onto everything in . A contrasting example is 𝜙(x) = x2 , where 𝜙 ∶ ℝ → ℝ, since the point g = −3 is not mapped onto by 𝜙 from any point in ℝ. Definition 2.35 A function which is one-to-one and onto (i.e., surjective and injective) is called bijective. ◽ Bijective functions are always invertible. If 𝜙 ∶ G →  is bijective, then |G| = || (the two sets have the same cardinality). Definition 2.36 Let ⟨G, ∗⟩ be a group, H a subgroup and let S = {H0 = H, H1 , H2 , . . . , HM } be the set of cosets of H in G. Then the induced operation between cosets A and B in S is defined by A ∗ B = C if and only if a ∗ b = c for any a ∈ A, b ∈ B and some c ∈ C, provided that this operation is well defined. The operation is well defined if for every a ∈ A and b ∈ B, a ∗ b ∈ C; there is thus no ambiguity in the induced operation. ◽

80

Groups and Vector Spaces For commutative groups, the induced operation is always well defined. However, the reader should be cautioned that for noncommutative groups, the operation is well defined only for normal subgroups.2 Example 2.37

Consider the group G = ⟨ℤ6 , +⟩ and let H = {0, 3}. The cosets of H are H0 = {0, 3}

H1 = 1 + H = {1, 4}

H2 = 2 + H = {2, 5}.

Then, under the induced operation, for example, H2 + H2 = H1 since 2 + 2 = 4 and 4 ∈ H1 . We could also choose different representatives from the cosets. We get 5+5=4 in G. Since 5 ∈ H2 and 4 ∈ H1 , we again have H2 + H2 = H1 . If by choosing different elements from the addend cosets, we were to end up with a different sum coset, the operation would not be well defined. Let us write the addition table for ℤ6 reordered and separated out by the cosets. The induced operation is clear. We observe that H0 , H1 and H2 themselves constitute a group, with addition table also shown.

H0 H1 H2

+ 0 3 1 4 2 5

H0 0 3 0 3 3 0 1 4 4 1 2 5 5 2

H1 1 4 1 4 4 1 2 5 5 2 3 0 0 3

H2 2 5 2 5 5 2 3 0 0 3 4 1 1 4

+ H0 H1 H2

H0 H0 H1 H2

H1 H1 H2 H0

H2 H2 H0 H1

The group of cosets is clearly isomorphic to ⟨ℤ3 , +⟩: {H0 , H1 , H2 } ≅ ℤ3 .



From this example, we see that the cosets themselves form a group. Theorem 2.38 If H is a subgroup of a commutative group ⟨G, ∗⟩, the induced operation ∗ on the set of cosets of H satisfies (a ∗ b) ∗ H = (a ∗ H) ∗ (b ∗ H). Furthermore, the operation is well defined. The proof is explored in Exercise 2.13. This defines an operation. Clearly, H itself acts as an identity for the operation defined on the set of cosets. Also, by Theorem 2.38, (a ∗ H) ∗ (a−1 ∗ H) = (a ∗ a−1 ) ∗ H = H, so every coset has an inverse coset. Thus, the set of cosets of H form a group. Definition 2.39 The group formed by the cosets of H in a commutative3 group G with the induced operation is said to be the factor group of G modulo H, denoted by G∕H. The cosets are said to be the residue classes of G modulo H. ◽ In the last example, we could write ℤ3 ≅ ℤ6 ∕H. From Example 2.32, the group of cosets was also isomorphic to ℤ3 , so we can write ℤ∕3ℤ ≅ ℤ3 . In general, it can be shown that ℤ∕nℤ ≅ ℤn . 2 3

A subgroup H of a group G is normal if g−1 Hg = H for all g ∈ G. Clearly all Abelian groups are normal. Or of a normal subgroup in a noncommutative group.

2.2 Groups

81

Example 2.40 A lattice is formed by taking all possible integer linear[ combinations] of a set of basis vectors. That is, let 𝐯1 , 𝐯2 , . . . , 𝐯n be a set of linearly independent vectors, let V = 𝐯1 , 𝐯2 , . . . , 𝐯n . Then a lattice is formed from these basis vectors by Λ = {V𝐳 ∶ 𝐳 ∈ ℤn }. [ ] 1 0 For example, the lattice formed by V = is the set of points with integer coordinates in the plane, denoted 0 1 2 as ℤ . For the lattice Λ = ℤ2 , let Λ′ = 2ℤ2 be a subgroup. Then the cosets S0 = Λ′ (denoted by •) S2 = (0, 1) + Λ′ (denoted by ◽)

S1 = (1, 0) + Λ′ (denoted by ∘ ) S3 = (1, 1) + Λ′ (denoted by ◊)

are indicated in Figure 2.2. It is straightforward to verify that Λ∕Λ′ ≅ ℤ2 × ℤ2 .

Figure 2.2: A lattice partitioned into cosets.

Such decompositions of lattices into subsets find application in trellis coded modulation, as we shall see in Chapter 13. ◽

2.2.6

Homomorphism

For isomorphism, two sets G and  are structurally the same, as defined by (2.2), and they have the same number of elements (since there is a bijective function 𝜙 ∶ G → ). From an algebraic point of view, G and  are identical, even though they may have different names for their elements. Homomorphism is a somewhat weaker condition: the sets must have the same algebraic structure, but they might have different numbers of elements. Definition 2.41 The groups ⟨G, ∗⟩ and ⟨, ⋄⟩ are said to be (group) homomorphic if there exists a function (that is not necessarily one-to-one) 𝜙 ∶ G →  called the homomorphism such that 𝜙( a ∗ b ) = 𝜙(a) ⋄ 𝜙(b). ⏟⏟⏟ ⏟⏞⏞⏞⏟⏞⏞⏞⏟ operation operation in G in 

(2.3)



82

Groups and Vector Spaces Example 2.42 Let G = ⟨ℤ, +⟩ and let  = ⟨ℤn , +⟩. Let 𝜙 ∶ G →  be defined by 𝜙(a) = a mod n, the remainder when a is divided by n. Let a, b ∈ ℤ. We have (see Exercise 2.32) 𝜙(a + b) = 𝜙(a) + 𝜙(b). Thus, ⟨ℤ, +⟩ and ⟨ℤn , +⟩ are homomorphic, although they clearly do not have the same number of elements.



Theorem 2.43 Let ⟨G, ∗⟩ be a commutative group and let H be a subgroup, so that G∕H is the factor group. Let 𝜙 ∶ G → G∕H be defined by 𝜙(a) = a ∗ H. Then 𝜙 is a homomorphism. The homomorphism 𝜙 is said to be the natural or canonical homomorphism. Proof Let a, b ∈ G. Then by Theorem 2.38, 𝜙( a ∗ b ) = 𝜙(a) ∗ 𝜙(b). ⏟⏟⏟ ⏟⏞⏞⏞⏞⏟⏞⏞⏞⏞⏟ operation operation in G in G∕H



Definition 2.44 The kernel of a homomorphism 𝜙 of a group G into a group  is the set of all elements of G which are mapped onto the identity element of  by 𝜙. ◽

Example 2.45

2.3

For the canonical map ℤ → ℤn of Example 2.42, the kernel is nℤ, the set of multiples of n.



Fields: A Prelude

We shall have considerably more to say about fields in Chapter 5, but we introduce the concept here since fields are used in defining vector spaces and simple linear block codes. Definition 2.46 A field ⟨𝔽 , +, ⋅⟩ is a set of objects 𝔽 on which the operations of addition and multiplication, subtraction (or additive inverse), and division (or multiplicative inverse) apply in a manner analogous to the way these operations work for real numbers. In particular, the addition operation + and the multiplication operation ⋅ (or juxtaposition) satisfy the following: F1 Closure under addition: For every a and b in 𝔽 , a + b is also in 𝔽 . F2 Additive identity: There is an element in 𝔽 , which we denote as 0, such that a + 0 = 0 + a = a for every a ∈ 𝔽 . F3 Additive inverse (subtraction): For every a ∈ 𝔽 , there exists an element b in 𝔽 such that a + b = b + a = 0. The element b is frequently called the additive inverse of a and is denoted as −a. F4 Associativity: (a + b) + c = a + (b + c) for every a, b, c ∈ 𝔽 . F5 Commutativity: a + b = b + a for every a, b ∈ 𝔽 . The first four requirements mean that the elements of 𝔽 form a group under addition; with the fifth requirement, a commutative group is obtained. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... F6 Closure under multiplication: For every a and b in 𝔽 , a ⋅ b is also in 𝔽 .

2.3 Fields: A Prelude

83

F7 Multiplicative identity: There is an element in 𝔽 , which we denote as 1, such that a ⋅ 1 = 1 ⋅ a = a for every a ∈ 𝔽 with a ≠ 0. F8 Multiplicative inverse: For every a ∈ 𝔽 with a ≠ 0, there is an element b ∈ 𝔽 such that a ⋅ b = b ⋅ a = 1. The element b is called the multiplicative inverse, or reciprocal, of a and is denoted as a−1 . F9 Associativity: (a ⋅ b) ⋅ c = a ⋅ (b ⋅ c) for every a, b, c ∈ 𝔽 . F10 Commutativity: a ⋅ b = b ⋅ a for every a, b ∈ 𝔽 . Thus, the nonzero elements of 𝔽 form a commutative group under multiplication. .......................................................................................................................................................... F11 Multiplication distributes over addition: a ⋅ (b + c) = a ⋅ b + a ⋅ c . The field ⟨𝔽 , +, ⋅⟩ is frequently referred to simply as 𝔽 . A field with q elements in it may be denoted as 𝔽q . ◽

Example 2.47 tables

The field with two elements in it, 𝔽2 = ℤ2 = GF(2) has the following addition and multiplication + 0 1 0 0 1 1 1 0 ⏟⏞⏞⏞⏞⏟⏞⏞⏞⏞⏟

⋅ 0 1 0 0 0 1 0 1 ⏟⏞⏞⏞⏞⏟⏞⏞⏞⏞⏟

“exclusive or”

“and”

The field GF(2) is very important to our work, since it is the field where the operations involved in binary codes work. However, we shall have occasion to use many other fields as well. ◽

Example 2.48

The field 𝔽5 = ℤ5 = GF(5) has the following addition and multiplication tables: +

0

1

2

3

4



0

1

2

3

4

0 1 2 3 4

0 1 2 3 4

1 2 3 4 0

2 3 4 0 1

3 4 0 1 2

4 0 1 2 3

0 1 2 3 4

0 0 0 0 0

0 1 2 3 4

0 2 4 1 3

0 3 1 4 2

0 4 3 2 1



There are similarly constructed fields for every prime p, denoted by either GF(p) or 𝔽p . Example 2.49

A field with four elements can be constructed with the following operation tables: + 0 1 2 3

0 0 1 2 3

1 1 0 3 2

2 2 3 0 1

3 3 2 1 0

⋅ 0 1 2 3

0 0 0 0 0

1 0 1 2 3

2 0 2 3 1

3 0 3 1 2

(2.4)

This field is called GF(4). Note that it is definitely not the same as ⟨ℤ4 , +, ⋅⟩! (Why not?) We learn in Chapter 5 how to construct such a field. ◽

84

Groups and Vector Spaces Just as for groups, we can define the concepts of isomorphism and homomorphism. Two fields ⟨F, +, ⋅⟩ and ⟨ , +, ⋅⟩ are (field) isomorphic if there exists a bijective function 𝜙 ∶ F →  such that for every a, b ∈ F, 𝜙( a + b ) = 𝜙(a) + 𝜙(b) ⏟⏟⏟ ⏟⏞⏞⏞⏞⏟⏞⏞⏞⏞⏟ operation operation in F in 

𝜙(

ab ) = 𝜙(a)𝜙(b) . ⏟⏟⏟ ⏟⏞⏟⏞⏟ operation operation in F in 

For example, the field  defined on the elements {−1, 1} with operation tables +

−1

1



−1

1

−1 1

−1 1

1 −1

−1 1

−1 −1

−1 1

is isomorphic to the field GF(2) defined above, with 𝜙 mapping 0 → −1 and 1 → 1. Fields F and  are homomorphic if such a structure-preserving map 𝜙 exists which is not necessarily bijective.

2.4

Review of Linear Algebra

Linear block codes are based on concepts from linear algebra. In this section, we review concepts from linear algebra which are immediately pertinent to our study of linear block codes. Up to this point, our examples have dealt primarily with binary alphabets having the symbols {0, 1}. As your algebraic and coding-theoretic skills are deepened, you will learn that larger alphabets are feasible and often desirable for good codes. However, rather than presenting the algebra first and the codes second, it seems pedagogically worthwhile to present the basic block coding concepts first using binary alphabets and introduce the algebra for larger alphabets later. For the sake of generality, we present definitions in terms of larger alphabets, but for the sake of concrete exposition we present examples in this chapter using binary alphabets. For now, understand that we will eventually need to deal with alphabets with more than two symbols. We denote the number of symbols in the alphabet by q, where q = 2 usually in this chapter. Furthermore, the alphabets we use usually form a finite field, denoted here as 𝔽q , which is briefly introduced in Section 2.3 and thoroughly developed in Chapter 5. Definition 2.50 Let V be a set of elements called vectors and let 𝔽 be a field of elements called scalars. An addition operation + is defined between vectors. A scalar multiplication operation ⋅ (or juxtaposition) is defined such that for a scalar a ∈ 𝔽 and a vector 𝐯 ∈ V, a ⋅ 𝐯 ∈ V. Then V is a vector space over 𝔽 if + and ⋅ satisfy the following: V1 V forms a commutative group under +. V2 For any element a ∈ 𝔽 and 𝐯 ∈ V, a ⋅ 𝐯 ∈ V. Combining V1 and V2, we must have a ⋅ 𝐯 + b ⋅ 𝐰 ∈ V for every 𝐯, 𝐰 ∈ V and a, b ∈ 𝔽 . V3 The operations + and ⋅ distribute: (a + b) ⋅ 𝐯 = a ⋅ 𝐯 + b ⋅ 𝐯 and

a ⋅ (𝐮 + 𝐯) = a ⋅ 𝐮 + a ⋅ 𝐯

for all scalars a, b ∈ 𝔽 and vectors 𝐯, 𝐮 ∈ V. V4 The operation ⋅ is associative: (a ⋅ b) ⋅ 𝐯 = a ⋅ (b ⋅ 𝐯) for all a, b ∈ 𝔽 and 𝐯 ∈ V. 𝔽 is called the scalar field of the vector space V.



2.4 Review of Linear Algebra

85

Example 2.51 1. The set of n-tuples (𝑣0 , 𝑣1 , . . . , 𝑣n−1 ), with elements 𝑣i ∈ ℝ forms a vector space which we denote as ℝn , with addition defined element-by-element, (𝑣0 , 𝑣1 , . . . , 𝑣n−1 ) + (u0 , u1 , . . . , un−1 ) = (𝑣0 + u0 , 𝑣1 + u1 , . . . , 𝑣n−1 + un−1 ), and scalar multiplication defined by a ⋅ (𝑣0 , 𝑣1 , . . . , 𝑣n−1 ) = (a𝑣0 , a𝑣1 , . . . , a𝑣n−1 ).

(2.5)

2. The set of n-tuples of (𝑣0 , 𝑣1 , . . . , 𝑣n−1 ) with elements 𝑣i ∈ 𝔽2 forms a vector space which we denote as 𝔽2n . There are 2n elements in the vector space 𝔽2n . For n = 3, the elements of the vector space are (0, 0, 0) (0, 0, 1) (0, 1, 0) (0, 1, 1) (1, 0, 0) (1, 0, 1) (1, 1, 0) (1, 1, 1) 3. In general, the set V = 𝔽qn of n-tuples of elements of the field 𝔽q with element-by-element addition and scalar multiplication as in (2.5) constitutes a vector space. We call an n-tuple of elements of 𝔽q simply an n-vector. ◽

Definition 2.52 Let 𝐯1 , 𝐯2 , . . . , 𝐯k be vectors in a vector space V and let a1 , a2 , . . . , ak be scalars in 𝔽 . The operation a1 𝐯1 + a2 𝐯2 + · · · ak 𝐯k ◽

is said to be a linear combination of the vectors. Notationally, observe that the linear combination a1 𝐯1 + a2 𝐯2 + · · · + ak 𝐯k can be obtained by forming a matrix G by stacking the vectors as columns, ] [ G = 𝐯1 𝐯2 · · · 𝐯k then forming the product with the column vector of coefficients: ⎡a1 ⎤ ] ⎢a2 ⎥ a1 𝐯1 + a2 𝐯2 + · · · ak 𝐯k = 𝐯1 𝐯2 · · · 𝐯k ⎢ ⎥ . ⋮ ⎢ ⎥ ⎣ak ⎦ [

(2.6)

Alternatively, the vectors 𝐯i can be envisioned as row vectors and stacked as rows. The linear combination can be obtained by the product with a row vector of coefficients: ⎡𝐯1 ⎤ ] ⎢𝐯2 ⎥ a1 𝐯1 + a2 𝐯2 + · · · ak 𝐯k = a1 a2 · · · ak ⎢ ⎥ . ⋮ ⎢ ⎥ ⎣𝐯k ⎦ [

Definition 2.53 Let V be a vector space. A set of vectors G = {𝐯1 , 𝐯2 , . . . , 𝐯k }, each in V, is said to be a spanning set for V if every vector 𝐯 ∈ V can be written as a linear combination of the vectors in G. That is, for every 𝐯 ∈ V, there exists a set of scalars a1 , a2 , . . . , ak such that 𝐯 = a1 𝐯1 + a2 𝐯2 + · · · + ak 𝐯k . For a set of vectors G, the set of vectors obtained from every possible linear combination of vectors in G is called the span of G, span(G). ◽

86

Groups and Vector Spaces It may be verified that the span of a set of vectors is itself a vector space. In light of the notation in (2.6), it is helpful to think of G as a matrix whose columns are the vectors 𝐯i , and not simply as a set of vectors. If G is interpreted as a matrix, we take span(G) as the set of linear combinations of the columns of G. The space obtained by the linear combination of the columns of a matrix G is called the column space of G. The space obtained by the linear combination of the rows of a matrix G is called the row space of G. It may be that there is redundancy in the vectors of a spanning set, in the sense that not all of them are needed to span the space because some of them can be expressed in terms of other vectors in the spanning set. In such a case, the vectors in the spanning set are not linearly independent: Definition 2.54 A set of vectors 𝐯1 , 𝐯2 , . . . , 𝐯k is said to be linearly dependent if a set of scalars {a1 , a2 , . . . , ak } exists, with not all ai = 0 such that a1 𝐯1 + a2 𝐯2 + · · · + ak 𝐯k = 0. A set of vectors which is not linearly dependent is linearly independent.



From the definition, if a set of vectors {𝐯1, . . . , 𝐯k} is linearly independent and there exists a set of coefficients {a1, . . . , ak} such that a1𝐯1 + a2𝐯2 + · · · + ak𝐯k = 0, then it must be the case that a1 = a2 = · · · = ak = 0.

Definition 2.55 A spanning set for a vector space V that has the smallest possible number of vectors in it is called a basis for V. The number of vectors in a basis for V is the dimension of V. ◽

Clearly the vectors in a basis must be linearly independent (or it would be possible to form a smaller set of vectors).

Example 2.56 Let V = 𝔽2⁴, the set of binary 4-tuples, and let

   G = {(1, 0, 1, 0), (0, 1, 1, 0), (1, 1, 0, 0)}

be a set of vectors. Let W = span(G);

   W = {(0, 0, 0, 0), (1, 0, 1, 0), (0, 1, 1, 0), (1, 1, 0, 0)}.

It can be verified that W is a vector space. The set G is a spanning set for W, but it is not a spanning set for V. However, G is not a basis for W; the set G has some redundancy in it, since the third vector is a linear combination of the first two:

   (1, 1, 0, 0) = (1, 0, 1, 0) + (0, 1, 1, 0).


The vectors in G are not linearly independent. The third vector in G can be removed, resulting in the set

   G′ = {(1, 0, 1, 0), (0, 1, 1, 0)},

which has span(G′ ) = W. No spanning set for W has fewer vectors in it than does G′ , so dim(W) = 2.



Theorem 2.57 Let V be a k-dimensional vector space defined over a scalar field with a finite number q of elements in it. Then the number of elements in V is |V| = qᵏ.

Proof Every vector 𝐯 in V can be written as 𝐯 = a1𝐯1 + a2𝐯2 + · · · + ak𝐯k for a basis 𝐯1, 𝐯2, . . . , 𝐯k. Thus, the number of elements in V is the number of distinct k-tuples (a1, a2, . . . , ak) that can be formed, which is qᵏ. ◽

Definition 2.58 Let V be a vector space over a scalar field 𝔽 and let W ⊂ V be a vector space. That is, for any 𝐰1 and 𝐰2 ∈ W, a𝐰1 + b𝐰2 ∈ W for any a, b ∈ 𝔽. Then W is called a vector subspace (or simply a subspace) of V. ◽

Example 2.59 The set W in Example 2.56 is a vector space, and is a subset of V. So W is a vector subspace of V. ◽

Note, as specified by Theorem 2.57, that W has 4 = 2² elements in it.

We now augment the vector space with a new operator called the inner product, creating an inner product space.

Definition 2.60 Let 𝐮 = (u0, u1, . . . , un−1) and 𝐯 = (𝑣0, 𝑣1, . . . , 𝑣n−1) be vectors in a vector space V, where ui, 𝑣i ∈ 𝔽. The inner product is a function that accepts two vectors and returns a scalar. It may be written as ⟨𝐮, 𝐯⟩ or as 𝐮 ⋅ 𝐯. It is defined as

   ⟨𝐮, 𝐯⟩ = 𝐮 ⋅ 𝐯 = Σ_{i=0}^{n−1} ui ⋅ 𝑣i.  ◽



It is straightforward to verify the following properties:

1. Commutativity: 𝐮 ⋅ 𝐯 = 𝐯 ⋅ 𝐮.
2. Associativity: a ⋅ (𝐮 ⋅ 𝐯) = (a ⋅ 𝐮) ⋅ 𝐯.
3. Distributivity: 𝐮 ⋅ (𝐯 + 𝐰) = 𝐮 ⋅ 𝐯 + 𝐮 ⋅ 𝐰.

In physics and elementary calculus, the inner product is often called the dot product and is used to describe the physical concept of orthogonality. We similarly define orthogonality for the vector spaces of interest to us, even though there may not be a physical interpretation.

Definition 2.61 Two vectors 𝐮 and 𝐯 are said to be orthogonal if 𝐮 ⋅ 𝐯 = 0. When 𝐮 and 𝐯 are orthogonal, this is sometimes denoted as 𝐮 ⟂ 𝐯. ◽

Combining the idea of vector subspaces with orthogonality, we get the concept of a dual space:

Definition 2.62 Let W be a k-dimensional subspace of a vector space V. The set of all vectors 𝐮 ∈ V which are orthogonal to all the vectors of W is called the dual space (or orthogonal complement) of W, denoted W⟂. (The symbol W⟂ is sometimes pronounced "W perp," for "perpendicular.") That is,

   W⟂ = {𝐮 ∈ V : 𝐮 ⋅ 𝐰 = 0 for all 𝐰 ∈ W}.  ◽



Geometric intuition regarding dual spaces frequently may be gained by thinking in three-dimensional space ℝ³ and letting W be a plane through the origin and W⟂ a line through the origin orthogonal to the plane.

Example 2.63 Let V = 𝔽2⁴ and let W be as in Example 2.56. Then it can be verified that

   W⟂ = {(0, 0, 0, 0), (0, 0, 0, 1), (1, 1, 1, 0), (1, 1, 1, 1)}.

Note that

   W⟂ = span({(0, 0, 0, 1), (1, 1, 1, 0)})

and that dim(W⟂) = 2. ◽
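The dual space in this small example can be checked mechanically. The following C sketch (not from the text; the packing of a 4-tuple into the low 4 bits of an unsigned int, with 𝑣0 in the least significant bit, is an implementation choice for illustration) enumerates all of 𝔽2⁴ and keeps the vectors orthogonal to every element of W.

#include <stdio.h>

/* GF(2) inner product of two packed 4-tuples: AND selects positions where
   both are 1; the parity of the result is the modulo-2 sum. */
static int gf2_dot(unsigned a, unsigned b) {
    unsigned x = a & b;
    int parity = 0;
    while (x) { parity ^= (x & 1); x >>= 1; }
    return parity;
}

int main(void) {
    /* W from Example 2.56: (0,0,0,0), (1,0,1,0), (0,1,1,0), (1,1,0,0). */
    unsigned W[4] = { 0x0, 0x5, 0x6, 0x3 };
    printf("W-perp:");
    for (unsigned u = 0; u < 16; u++) {          /* all of F_2^4 */
        int ok = 1;
        for (int i = 0; i < 4; i++)
            if (gf2_dot(u, W[i])) { ok = 0; break; }
        if (ok)                                   /* orthogonal to all of W */
            printf(" (%u,%u,%u,%u)", u & 1, (u >> 1) & 1, (u >> 2) & 1, (u >> 3) & 1);
    }
    printf("\n");   /* prints the four vectors of W-perp found in Example 2.63 */
    return 0;
}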



This example demonstrates the important principle stated in the following theorem.

Theorem 2.64 Let V be a finite-dimensional vector space of n-tuples, 𝔽ⁿ, with a subspace W of dimension k. Let U = W⟂ be the dual space of W. Then dim(W⟂) = dim(V) − dim(W) = n − k.

Proof Let 𝐠1, 𝐠2, . . . , 𝐠k be a basis for W and let

   G = [𝐠1 𝐠2 · · · 𝐠k].

This is a rank-k matrix, meaning that the dimension of its column space is k and the dimension of its row space is k. Any vector 𝐰 ∈ W is of the form 𝐰 = G𝐱 for some vector 𝐱 ∈ 𝔽ᵏ. Any vector 𝐮 ∈ U must satisfy 𝐮ᵀG𝐱 = 0 for all 𝐱 ∈ 𝔽ᵏ. This implies that 𝐮ᵀG = 0. (That is, 𝐮 is orthogonal to every basis vector for W.)

Let {𝐡1, 𝐡2, . . . , 𝐡r} be a basis for W⟂, then extend this to a basis for the whole n-dimensional space, {𝐡1, 𝐡2, . . . , 𝐡r, 𝐟1, 𝐟2, . . . , 𝐟n−r}. Every vector 𝐯 in the row space of G is expressible (not necessarily uniquely) as 𝐯 = 𝐛ᵀG for some vector 𝐛 ∈ V. But since {𝐡1, 𝐡2, . . . , 𝐡r, 𝐟1, 𝐟2, . . . , 𝐟n−r} spans V, 𝐛 must be a linear combination of these vectors:

   𝐛 = a1𝐡1 + a2𝐡2 + · · · + ar𝐡r + ar+1𝐟1 + · · · + an𝐟n−r.

So a vector 𝐯 in the row space of G can be written as

   𝐯 = a1𝐡1ᵀG + a2𝐡2ᵀG + · · · + an𝐟n−rᵀG,


from which we observe that the row space of G is spanned by the vectors

   {𝐡1ᵀG, 𝐡2ᵀG, . . . , 𝐡rᵀG, 𝐟1ᵀG, . . . , 𝐟n−rᵀG}.

The vectors {𝐡1, 𝐡2, . . . , 𝐡r} are in W⟂, so that 𝐡iᵀG = 0 for i = 1, 2, . . . , r. The remaining vectors {𝐟1ᵀG, . . . , 𝐟n−rᵀG} remain to span the k-dimensional row space of G. Hence, we must have n − r ≥ k. Furthermore, these vectors are linearly independent, because if there is a set of coefficients {ai} such that

   a1(𝐟1ᵀG) + · · · + an−r(𝐟n−rᵀG) = 0,

then

   (a1𝐟1ᵀ + · · · + an−r𝐟n−rᵀ)G = 0.

But the vectors 𝐟i are not in W⟂, so we must have

   a1𝐟1ᵀ + · · · + an−r𝐟n−rᵀ = 0.

Since the vectors {𝐟i} are linearly independent, we must have a1 = a2 = · · · = an−r = 0. Therefore, we must have dim span({𝐟1ᵀG, . . . , 𝐟n−rᵀG}) = k, so n − r = k. ◽

2.5 Exercises

2.1 A group can be constructed by using the rotations and reflections of a regular pentagon into itself. The group operator is "followed by" (e.g., a reflection 𝜌 "followed by" a rotation r). This is a permutation group, as in Example 2.14.
    (a) How many elements are in this group?
    (b) Construct the group (i.e., show the "multiplication table" for the group).
    (c) Is it an Abelian group?
    (d) Find a subgroup with five elements and a subgroup with two elements.
    (e) Are there any subgroups with four elements? Why?
2.2 Show that only one group exists with three elements "up to isomorphism." That is, there is only one way of filling out a binary operation table that satisfies all the requirements of a group.
2.3 Show that there are two groups with four elements, up to isomorphism. One of these groups is isomorphic to ℤ4. The other is called the Klein 4-group.
2.4 Prove that in a group G, the identity element is unique.
2.5 Prove that in a group G, the inverse a⁻¹ of an element a is unique.
2.6 Let G = ⟨ℤ16, +⟩, the group of integers modulo 16. Let H = ⟨4⟩, the cyclic group generated by the element 4 ∈ G.
    (a) List the elements of H.
    (b) Determine the cosets of G∕H.
    (c) Draw the "addition" table for G∕H.
    (d) To what group is G∕H isomorphic?
2.7 Show that if G is an Abelian group and 𝒢 is isomorphic to G, then 𝒢 is also Abelian.
2.8 Let G be a cyclic group and let 𝒢 be isomorphic to G. Show that 𝒢 is also a cyclic group.
2.9 Let G be a cyclic group with generator a and let 𝒢 be a group isomorphic to G. If 𝜙 : G → 𝒢 is an isomorphism, show that for every x ∈ G, x may be written as aʲ (for some j, using multiplicative notation) and that 𝜙(x) is determined by 𝜙(a).


2.10 An automorphism of a group G is an isomorphism of the group with itself, 𝜙 : G → G. Using Exercise 2.9, how many automorphisms are there of ℤ2? of ℤ6? of ℤ8? of ℤ17?
2.11 [144] Let G be a finite Abelian group of order n, and let r be a positive integer relatively prime to n (i.e., they have no factors in common except 1). Show that the map 𝜙r : G → G defined by 𝜙r(a) = aʳ is an isomorphism of G onto itself (an automorphism). Deduce that the equation xʳ = a always has a unique solution in a finite Abelian group G if r is relatively prime to the order of G.
2.12 Show that the induced operation defined in Definition 2.36 is well defined if G is commutative.
2.13 Prove Theorem 2.38.
2.14 Show for the lattice with coset decomposition in Figure 2.2 that Λ∕Λ′ ≅ ℤ2 × ℤ2.
2.15 Let G be a cyclic group with generator a and let 𝜙 : G → G′ be a homomorphism onto a group G′. Show that the value of 𝜙 on every element of G is determined by the value of the homomorphism 𝜙(a).
2.16 Let G be a group and let a ∈ G. Let 𝜙 : ℤ → G be defined by 𝜙(n) = aⁿ. Show that 𝜙 is a homomorphism. Describe the image of 𝜙 in G.
2.17 Show that if G, G′, and G′′ are groups and 𝜙 : G → G′ and 𝜓 : G′ → G′′ are homomorphisms, then the composite function 𝜓 ∘ 𝜙 : G → G′′ is a homomorphism.
2.18 Consider the set S = {0, 1, 2, 3} with the operations

        + | 0 1 2 3        • | 0 1 2 3
        --+--------        --+--------
        0 | 0 1 2 3        0 | 0 0 0 0
        1 | 1 2 3 0        1 | 0 1 2 3
        2 | 2 3 0 1        2 | 0 2 3 1
        3 | 3 0 1 2        3 | 0 3 1 2

     Is this a field? If not, why not?
2.19 Construct the addition and multiplication tables for ⟨ℤ4, +, ⋅⟩ and compare to the tables in equation (2.4). Does ⟨ℤ4, +, ⋅⟩ form a field?
2.20 Use the representation of GF(4) in (2.4) to solve the following pair of equations:

        2x + y = 3
        x + 2y = 3.

2.21 Show that the vectors in a basis must be linearly independent.
2.22 Let G = {𝐯1, 𝐯2, . . . , 𝐯k} be a basis for a vector space V. Show that for every vector 𝐯 ∈ V, there is a unique representation for 𝐯 as a linear combination of the vectors in G.
2.23 Show that if 𝐮 is orthogonal to every basis vector for W, then 𝐮 ⟂ W.
2.24 Let V be a vector space, and let W ⊂ V be a subspace. The dual space W⟂ of W is the set of vectors in V which are orthogonal to every vector in W. Show that the dual space W⟂ is a subspace of V.
2.25 Show that the set of binary polynomials (i.e., polynomials with binary coefficients, with operations in GF(2)) with degree less than r forms a vector space over GF(2) with dimension r.
2.26 What is the dimension of the vector space spanned by the vectors {(1, 1, 0, 1, 0, 1), (0, 1, 0, 1, 1, 1), (1, 1, 0, 0, 1, 1), (0, 1, 1, 1, 0, 1), (1, 0, 0, 0, 0, 0)} over GF(2)?
2.27 Find a basis for the dual space to the vector space spanned by {(1, 1, 1, 0, 0), (0, 1, 1, 1, 0), (0, 0, 1, 1, 1)}.


2.28 Let S = {𝐯1, 𝐯2, . . . , 𝐯n} be an arbitrary basis for the vector space V. Let 𝐯 be an arbitrary vector in V; it may be expressed as the linear combination 𝐯 = a1𝐯1 + a2𝐯2 + · · · + an𝐯n. Develop an expression for computing the coefficients {ai} in this representation.
2.29 Is it true that if 𝐱, 𝐲, and 𝐳 are linearly independent vectors over GF(q), then so also are 𝐱 + 𝐲, 𝐲 + 𝐳, and 𝐳 + 𝐱?
2.30 Let V be a vector space and let 𝐯1, 𝐯2, . . . , 𝐯k ∈ V. Show that span({𝐯1, 𝐯2, . . . , 𝐯k}) is a vector space.
2.31 Let U and V be linear subspaces of a vector space S. Show that the intersection U ∩ V is also a subspace of S.
2.32 Let G = ⟨ℤ, +⟩ and let 𝒢 = ⟨ℤn, +⟩. Let 𝜙 : G → 𝒢 be defined by 𝜙(a) = a mod n. Show that 𝜙(a + b) = 𝜙(a) + 𝜙(b).
2.33 In this exercise, let 𝐱 ⋅ 𝐲 denote the inner product over the real numbers. Let 𝐱 and 𝐲 be vectors of length n with elements from the set {−1, 1}. Show that dH(𝐱, 𝐲) = (n − 𝐱 ⋅ 𝐲)∕2.

2.6 References

Group theory is presented in a variety of books; see, for example, [42] or [144]. Our summary of linear algebra was drawn from [45, 483] and [319]. Some of the exercises were drawn from [483] and [144].

Chapter 3

Linear Block Codes

3.1 Basic Definitions

Consider a source that produces symbols from an alphabet 𝒜 having q symbols, where 𝒜 forms a field. We refer to a tuple (c0, c1, . . . , cn−1) ∈ 𝒜ⁿ with n elements as an n-vector or an n-tuple.

Definition 3.1 An (n, k) block code 𝒞 over an alphabet of q symbols is a set of qᵏ n-vectors called codewords or code vectors. Associated with the code is an encoder which maps a message, a k-tuple 𝐦 ∈ 𝒜ᵏ, to its associated codeword. ◽

For a block code to be useful for error correction purposes, there should be a one-to-one correspondence between a message 𝐦 and its codeword 𝐜. However, for a given code 𝒞, there may be more than one possible way of mapping messages to codewords. A block code can be represented as an exhaustive list, but for large k this would be prohibitively complex to store and decode. The complexity can be reduced by imposing some sort of mathematical structure on the code. The most common requirement is linearity.

Definition 3.2 A block code 𝒞 over a field 𝔽q of q symbols of length n and qᵏ codewords is a q-ary linear (n, k) code if and only if its qᵏ codewords form a k-dimensional vector subspace of the vector space of all the n-tuples 𝔽qⁿ. The number n is said to be the length of the code and the number k is the dimension of the code. The rate of the code is R = k∕n. ◽

In some literature, an (n, k) linear code is denoted using square brackets, [n, k]. A linear code may also be designated as (n, k, dmin), where dmin is the minimum distance of the code (as discussed below).

For a linear code, the sum of any two codewords is also a codeword. More generally, any linear combination of codewords is a codeword.

Definition 3.3 The Hamming weight wt(𝐜) of a codeword 𝐜 is the number of nonzero components of the codeword. The minimum weight 𝑤min of a code 𝒞 is the smallest Hamming weight of any nonzero codeword: 𝑤min = min_{𝐜∈𝒞, 𝐜≠0} wt(𝐜). ◽

Recall from Definition 1.11 that the minimum distance is the smallest Hamming distance between any two codewords of the code.

Theorem 3.4 For a linear code 𝒞, the minimum distance dmin satisfies dmin = 𝑤min. That is, the minimum distance of a linear block code is equal to the minimum weight of its nonzero codewords.

Proof The result relies on the fact that linear combinations of codewords are codewords. If 𝐜i and 𝐜j are codewords, then so is 𝐜i − 𝐜j. The distance calculation can then be "translated to the origin":

   dmin = min_{𝐜i,𝐜j∈𝒞, 𝐜i≠𝐜j} dH(𝐜i, 𝐜j) = min_{𝐜i,𝐜j∈𝒞, 𝐜i≠𝐜j} dH(𝐜i − 𝐜j, 𝐜j − 𝐜j) = min_{𝐜∈𝒞, 𝐜≠0} wt(𝐜).  ◽

An (n, k) code with minimum distance dmin is sometimes denoted as an (n, k, dmin) code. As described in Section 1.8.1, the random error correcting capability of a code with minimum distance dmin is t = ⌊(dmin − 1)∕2⌋.


3.2 The Generator Matrix Description of Linear Block Codes

Since a linear block code 𝒞 is a k-dimensional vector space, there exist k linearly independent vectors which we designate as 𝐠0, 𝐠1, . . . , 𝐠k−1 such that every codeword 𝐜 in 𝒞 can be represented as a linear combination of these vectors,

   𝐜 = m0𝐠0 + m1𝐠1 + · · · + mk−1𝐠k−1,      (3.1)

where mi ∈ 𝔽q. (For binary codes, all arithmetic in (3.1) is done modulo 2; for codes over 𝔽q, the arithmetic is done in 𝔽q.) Thinking of the 𝐠i as row vectors¹ and stacking up, we form the k × n matrix G,

   G = [ 𝐠0
         𝐠1
         ⋮
         𝐠k−1 ].

Let

   𝐦 = [m0 m1 · · · mk−1].

Then (3.1) can be written as

   𝐜 = 𝐦G,      (3.2)

and every codeword 𝐜 ∈ 𝒞 has such a representation for some vector 𝐦. Since the rows of G generate (or span) the (n, k) linear code 𝒞, G is called a generator matrix for 𝒞. Equation (3.2) can be thought of as an encoding operation for the code 𝒞. Representing the code thus requires storing only k vectors of length n (rather than the qᵏ vectors that would be required to store all codewords of a nonlinear code).

Note that the representation of the code provided by G is not unique. From a given generator G, another generator G′ can be obtained by performing row operations (nonzero linear combinations of the rows). Then an encoding operation defined by 𝐜 = 𝐦G′ maps the message 𝐦 to a codeword in 𝒞, but it is not necessarily the same codeword that would be obtained using the generator G.

Example 3.5 The (7, 4) Hamming code of Section 1.9 has the generator matrix

   G = [ 1 1 0 1 0 0 0
         0 1 1 0 1 0 0
         0 0 1 1 0 1 0
         0 0 0 1 1 0 1 ].      (3.3)

To encode the message 𝐦 = [1 0 0 1], add the first and fourth rows of G (modulo 2) to obtain

   𝐜 = [1 1 0 0 1 0 1].

Another generator is obtained by replacing the first row of G with the sum of the first two rows of G:

   G′ = [ 1 0 1 1 1 0 0
          0 1 1 0 1 0 0
          0 0 1 1 0 1 0
          0 0 0 1 1 0 1 ].

For 𝐦 the corresponding codeword is

   𝐜′ = 𝐦G′ = [1 0 1 0 0 0 1].

This is a different codeword than 𝐜, but is still a codeword in 𝒞. ◽

¹ Most signal processing and communication work employs column vectors by convention. However, a venerable tradition in coding theory has employed row vectors and we adhere to that through most of the book.


Definition 3.6 Let 𝒞 be an (n, k) block code (not necessarily linear). An encoder is systematic if the message symbols m0, m1, . . . , mk−1 may be found explicitly and unchanged in the codeword. That is, there are coordinates i0, i1, . . . , ik−1 (which are most frequently sequential, i0, i0 + 1, . . . , i0 + k − 1) such that c_{i0} = m0, c_{i1} = m1, . . . , c_{ik−1} = mk−1. For a linear code, the generator for a systematic encoder is called a systematic generator. ◽

It should be emphasized that being systematic is a property of the encoder and not a property of the code. For a linear block code, the encoding operation represented by G is systematic if an identity matrix can be identified among the rows of G. Neither the generator G nor G′ of Example 3.5 is systematic.

Frequently, a systematic generator is written in the form

   G = [P  Ik] =
       [ p0,0     p0,1     · · ·  p0,n−k−1      1 0 0 · · · 0
         p1,0     p1,1     · · ·  p1,n−k−1      0 1 0 · · · 0
         p2,0     p2,1     · · ·  p2,n−k−1      0 0 1 · · · 0
           ⋮                        ⋮                    ⋱
         pk−1,0   pk−1,1   · · ·  pk−1,n−k−1    0 0 0 · · · 1 ],      (3.4)

where Ik is the k × k identity matrix and P is a k × (n − k) matrix which generates parity symbols. The encoding operation is

   𝐜 = 𝐦[P  Ik] = [𝐦P  𝐦].

The codeword is divided into two parts: the part 𝐦 consists of the message symbols, and the part 𝐦P consists of the parity check symbols. Performing elementary row operations (replacing a row with linear combinations of some rows) does not change the row span, so that the same code is produced. If two columns of a generator are interchanged, then the corresponding positions of the code are changed, but the distance structure of the code is preserved.

Definition 3.7 Two linear codes which are the same except for a permutation of the components of the code are said to be equivalent codes. ◽

Let G and G′ be generator matrices of equivalent codes. Then G and G′ are related by the following operations:

1. column permutations;
2. elementary row operations.

Given an arbitrary generator G, it is possible to put it into the form (3.4) by performing Gaussian elimination with pivoting.

Example 3.8 For G of (3.3), an equivalent generator in systematic form is

   G′′ = [ 1 1 0 1 0 0 0
           0 1 1 0 1 0 0
           1 1 1 0 0 1 0
           1 0 1 0 0 0 1 ].      (3.5)

gaussj2.m

For the Hamming code with this generator, let the message be 𝐦 = [m0, m1, m2, m3] and let the corresponding codeword be 𝐜 = [c0, c1, . . . , c6]. Then the parity bits are obtained by

   c0 = m0 + m2 + m3
   c1 = m0 + m1 + m2
   c2 = m1 + m2 + m3,

and the systematically encoded bits are c3 = m0, c4 = m1, c5 = m2, and c6 = m3. ◽

3.2.1 Rudimentary Implementation

Implementing encoding operations for binary codes is straightforward, since the multiplication operation corresponds to the and operation and the addition operation corresponds to the exclusive-or operation. For software implementations, encoding is accomplished by straightforward matrix/vector multiplication. This can be greatly accelerated for binary codes by packing several bits into a single word (e.g., 32 bits in an unsigned int of 4 bytes). The multiplication is then accomplished using the bit exclusive-or operation of the language (e.g., the ^ operator of C). Addition must be accomplished by looping through the bits, or by precomputing bit sums and storing them in a table, where they can be immediately looked up.
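As a concrete illustration of packed-bit encoding, the following C sketch (not the book's software; the convention that bit i of a word holds coordinate ci, with c0 in the least significant bit, is an assumption made here) stores the rows of the systematic generator G′′ of (3.5) one per word and forms a codeword by exclusive-oring together the rows selected by the message bits.

#include <stdio.h>

/* Rows of G'' of (3.5), one row per word (bit i = column i, c0 in the LSB). */
static const unsigned G_rows[4] = {
    0x0B,  /* 1 1 0 1 0 0 0 : bits 0,1,3   */
    0x16,  /* 0 1 1 0 1 0 0 : bits 1,2,4   */
    0x27,  /* 1 1 1 0 0 1 0 : bits 0,1,2,5 */
    0x45   /* 1 0 1 0 0 0 1 : bits 0,2,6   */
};

/* c = m G'' over GF(2): XOR of the rows of G'' selected by the message bits. */
unsigned encode(unsigned m)
{
    unsigned c = 0;
    for (int i = 0; i < 4; i++)
        if ((m >> i) & 1)
            c ^= G_rows[i];
    return c;
}

int main(void)
{
    unsigned c = encode(0x9);            /* message m = [1 0 0 1] */
    for (int i = 0; i < 7; i++)
        printf("%u", (c >> i) & 1);      /* prints 0111001: parity bits c0..c2,
                                            then the message in c3..c6 */
    printf("\n");
    return 0;
}

Because G′′ is systematic, the four message bits reappear unchanged in positions c3 through c6 of the printed codeword.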

3.3 The Parity Check Matrix and Dual Codes

Since a linear code 𝒞 is a k-dimensional vector subspace of 𝔽qⁿ, by Theorem 2.64 there must be a dual space to 𝒞 of dimension n − k.

Definition 3.9 The dual space to an (n, k) code 𝒞 of dimension k is the (n, n − k) dual code of 𝒞, denoted by 𝒞⟂. A code 𝒞 such that 𝒞 = 𝒞⟂ is called a self-dual code. ◽

As a vector space, 𝒞⟂ has a basis which we denote by {𝐡0, 𝐡1, . . . , 𝐡n−k−1}. We form a matrix H using these basis vectors as rows:

   H = [ 𝐡0
         𝐡1
         ⋮
         𝐡n−k−1 ].

This matrix is known as the parity check matrix for the code 𝒞. The generator matrix and the parity check matrix for a code satisfy GHᵀ = 0. The parity check matrix has the following important property:

Theorem 3.10 Let 𝒞 be an (n, k) linear code over 𝔽q and let H be a parity check matrix for 𝒞. A vector 𝐯 ∈ 𝔽qⁿ is a codeword if and only if 𝐯Hᵀ = 0. That is, the codewords in 𝒞 lie in the (left) nullspace of H. (Sometimes additional linearly dependent rows are included in H, but the same result still holds.)

Proof Let 𝐜 ∈ 𝒞. By the definition of the dual code, 𝐡 ⋅ 𝐜 = 0 for all 𝐡 ∈ 𝒞⟂. Any row vector 𝐡 ∈ 𝒞⟂ can be written as 𝐡 = 𝐱H for some vector 𝐱. Since 𝐱 is arbitrary, and in fact can select individual rows of H, we must have 𝐜𝐡iᵀ = 0 for i = 0, 1, . . . , n − k − 1. Hence 𝐜Hᵀ = 0.


Conversely, suppose that 𝐯Hᵀ = 0. Then 𝐯𝐡iᵀ = 0 for i = 0, 1, . . . , n − k − 1, so that 𝐯 is orthogonal to the basis of the dual code, and hence orthogonal to the dual code itself. Hence, 𝐯 must be in the code 𝒞. ◽

When G is in systematic form (3.4), a parity check matrix is readily determined:

   H = [In−k  −Pᵀ].      (3.6)

(For the field 𝔽2, −1 = 1, since 1 is its own additive inverse.) Frequently, a parity check matrix for a code is obtained by finding a generator matrix in systematic form and employing (3.6).

Example 3.11 For the systematic generator G′′ of (3.5), a parity check matrix is

   H = [ 1 0 0 1 0 1 1
         0 1 0 1 1 1 0
         0 0 1 0 1 1 1 ].      (3.7)

It can be verified that G′′Hᵀ = 0. Furthermore, even though G is not in systematic form, it still generates the same code, so that GHᵀ = 0. H is a generator for a (7, 3) code, the dual code to the (7, 4) Hamming code. ◽
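The construction (3.6) and the check G′′Hᵀ = 0 are easy to carry out in a few lines of code. The following C sketch is illustrative only (array names and layout are assumptions, not the book's programs): it builds H from the P part of the systematic generator of (3.5) and verifies that every row of G′′ is orthogonal to every row of H.

#include <stdio.h>

#define N 7
#define K 4
#define R (N - K)   /* number of parity checks */

/* Systematic generator G'' = [P  I_4] of (3.5). */
static const int G[K][N] = {
    {1,1,0, 1,0,0,0},
    {0,1,1, 0,1,0,0},
    {1,1,1, 0,0,1,0},
    {1,0,1, 0,0,0,1}
};

int main(void)
{
    int H[R][N];

    /* Build H = [I_{n-k}  -P^T] as in (3.6); over GF(2), -P^T = P^T. */
    for (int i = 0; i < R; i++)
        for (int j = 0; j < N; j++) {
            if (j < R) H[i][j] = (i == j);        /* identity part        */
            else       H[i][j] = G[j - R][i];     /* P^T entry from G's P */
        }

    /* Check G H^T = 0: every row of G is orthogonal (mod 2) to every row of H. */
    int ok = 1;
    for (int a = 0; a < K; a++)
        for (int b = 0; b < R; b++) {
            int s = 0;
            for (int j = 0; j < N; j++) s ^= (G[a][j] & H[b][j]);
            if (s) ok = 0;
        }
    printf("G H^T == 0: %s\n", ok ? "yes" : "no");
    return 0;
}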

The condition 𝐜Hᵀ = 0 imposes linear constraints among the bits of 𝐜 called the parity check equations.

Example 3.12 The parity check matrix of (3.7) gives rise to the equations

   c0 + c3 + c5 + c6 = 0
   c1 + c3 + c4 + c5 = 0
   c2 + c4 + c5 + c6 = 0

or, equivalently, some equations for the parity symbols are

   c0 = c3 + c5 + c6
   c1 = c3 + c4 + c5
   c2 = c4 + c5 + c6.  ◽



A parity check matrix for a code (whether systematic or not) provides information about the minimum distance of the code.

Theorem 3.13 Let a linear block code 𝒞 have a parity check matrix H. The minimum distance dmin of 𝒞 is equal to the smallest positive number of columns of H which are linearly dependent. That is, all combinations of dmin − 1 or fewer columns are linearly independent, but there is some set of dmin columns which are linearly dependent.

Proof Let the columns of H be designated as 𝐡0, 𝐡1, . . . , 𝐡n−1. Then since 𝐜Hᵀ = 0 for any codeword 𝐜, we have

   0 = c0𝐡0 + c1𝐡1 + · · · + cn−1𝐡n−1,

that is, a linear combination of columns of H. Let 𝐜 be the codeword of smallest weight, 𝑤 = wt(𝐜) = dmin, with nonzero positions only at indices i1, i2, . . . , i𝑤. Then

   c_{i1}𝐡_{i1} + c_{i2}𝐡_{i2} + · · · + c_{i𝑤}𝐡_{i𝑤} = 0.

Clearly, the columns of H corresponding to the elements of 𝐜 are linearly dependent.


On the other hand, if there were a linearly dependent set of u < 𝑤 columns of H, then there would be a codeword of weight u. ◽

Example 3.14 For the parity check matrix H of (3.7), the parity check condition is

   𝐜Hᵀ = [c0, c1, c2, c3, c4, c5, c6] [ 1 0 0
                                        0 1 0
                                        0 0 1
                                        1 1 0
                                        0 1 1
                                        1 1 1
                                        1 0 1 ]

        = c0[1, 0, 0] + c1[0, 1, 0] + c2[0, 0, 1] + c3[1, 1, 0] + c4[0, 1, 1] + c5[1, 1, 1] + c6[1, 0, 1].

The first, second, and fourth rows of Hᵀ are linearly dependent, and no fewer rows of Hᵀ are linearly dependent. ◽

3.3.1 Some Simple Bounds on Block Codes

Theorem 3.13 leads to a relationship between dmin, n, and k:

Theorem 3.15 (The Singleton bound) The minimum distance for an (n, k) linear code is bounded by

   dmin ≤ n − k + 1.      (3.8)

Note: Although this bound is proved here for linear codes, it is also true for nonlinear codes. (See [292].)

Proof An (n, k) linear code has a parity check matrix with n − k linearly independent rows. Since the row rank of a matrix is equal to its column rank, rank(H) = n − k. Any collection of n − k + 1 columns must therefore be linearly dependent. Thus, by Theorem 3.13, the minimum distance cannot be larger than n − k + 1. ◽

A code for which dmin = n − k + 1 is called a maximum distance separable (MDS) code.

Thinking geometrically, around each code point is a cloud of points corresponding to non-codewords. (See Figure 1.17.) For a q-ary code, there are (q − 1)n vectors at a Hamming distance 1 away from a codeword, (q − 1)²\binom{n}{2} vectors at a Hamming distance 2 away from a codeword and, in general, (q − 1)ˡ\binom{n}{l} vectors at a Hamming distance l from a codeword.

Example 3.16 Let 𝒞 be a code of length n = 4 over GF(3), so q = 3. Then the vectors at a Hamming distance of 1 from the [0, 0, 0, 0] codeword are

   [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]
   [2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 2].  ◽

The vectors at Hamming distance ≤ t away from a codeword form a "sphere" called the Hamming sphere of radius t. The number of vectors in a Hamming sphere up to radius t for a code of length n over an alphabet of q symbols is denoted Vq(n, t), where

   Vq(n, t) = Σ_{j=0}^{t} \binom{n}{j} (q − 1)ʲ.      (3.9)

Hammsphere.m


The bounded distance decoding sphere of a codeword is the Hamming sphere of radius t = ⌊(dmin − 1)∕2⌋ around the codeword. Equivalently, a code whose random error correction capability is t must have a minimum distance between codewords satisfying dmin ≥ 2t + 1.

The redundancy of a code is essentially the number of parity symbols in a codeword. More precisely we have r = n − log_q M, where M is the number of codewords. For a linear code we have M = qᵏ, so r = n − k.

Theorem 3.17 (The Hamming bound) A t-random error correcting q-ary code 𝒞 must have redundancy r satisfying

   r ≥ log_q Vq(n, t).

Proof Each of the M spheres in 𝒞 has radius t. The spheres do not overlap, or else it would not be possible to decode t errors. The total number of points enclosed by the spheres must be ≤ qⁿ. We must have MVq(n, t) ≤ qⁿ, so qⁿ∕M ≥ Vq(n, t), from which the result follows by taking log_q of both sides. ◽

A code satisfying the Hamming bound with equality is said to be a perfect code. Actually, being perfect does not mean the codes are the best possible codes; it is simply a designation regarding how points fall in the Hamming spheres. The set of perfect codes is actually quite limited. It has been proved (see [292]) that the entire set of perfect codes is:

1. The set of all n-tuples, with minimum distance = 1 and t = 0.
2. Odd-length binary repetition codes.
3. Binary and nonbinary Hamming codes (linear) or other nonlinear codes with equivalent parameters.
4. The binary (23, 12, 7) Golay code 𝒢23.
5. The ternary (i.e., over GF(3)) (11, 6, 5) Golay code 𝒢11.

These codes are discussed in Chapter 8.

3.4 Error Detection and Correction Over Hard-Output Channels

Definition 3.18 Let 𝐫 be an n-vector over 𝔽q and let H be a parity check matrix for a code 𝒞. The vector

   𝐬 = 𝐫Hᵀ      (3.10)

is called the syndrome of 𝐫. ◽

By Theorem 3.10, 𝐬 = 0 if and only if 𝐫 is a codeword of 𝒞. In medical terminology, a syndrome is a pattern of symptoms that aids in diagnosis; here 𝐬 aids in diagnosing if 𝐫 is a codeword or has been corrupted by noise. As we will see, it also aids in determining what the error is.

3.4.1 Error Detection

The syndrome can be used as an error detection scheme. Suppose that a codeword 𝐜 in a linear block code 𝒞 over 𝔽q is transmitted through a hard channel (e.g., a binary code over a BSC) and that the n-vector 𝐫 is received. We can write

   𝐫 = 𝐜 + 𝐞,

where the arithmetic is done in 𝔽q, and where 𝐞 is the error vector, being 0 in precisely the locations where the channel does not introduce errors. The received vector 𝐫 could be any of the vectors in 𝔽qⁿ, since any error pattern is possible. Let H be a parity check matrix for 𝒞. Then the syndrome is

   𝐬 = 𝐫Hᵀ = (𝐜 + 𝐞)Hᵀ = 𝐞Hᵀ.

From Theorem 3.10, 𝐬 = 0 if 𝐫 is a codeword. However, if 𝐬 ≠ 0, then an error condition has been detected: we do not know what the error is, but we do know that an error has occurred. (Some additional perspective to this problem is provided in Box 3.1.)

Box 3.1: Error Correction and Least Squares

The hard-input decoding problem is: Given 𝐫 = 𝐦G + 𝐞, compute 𝐦. Readers familiar with least-squares problems (see, e.g., [319]) will immediately recognize the structural similarity of the decoding problem to least squares. If a least-squares solution were possible, the decoded value could be written as

   𝐦̂ = 𝐫Gᵀ(GGᵀ)⁻¹,

reducing the decoding problem to numerical linear algebra. Why cannot least-squares techniques be employed here? In the first place, it must be recalled that in least squares, the distance function d is induced from an inner product, d(x, y) = ⟨x − y, x − y⟩^{1∕2}, while in our case the distance function is the Hamming distance (which measures the likelihood), and the Hamming distance is not induced from an inner product. The Hamming distance is a function 𝔽qⁿ × 𝔽qⁿ → ℕ, while the inner product is a function 𝔽qⁿ × 𝔽qⁿ → 𝔽q: the codomains of the Hamming distance and the inner product are different.

3.4.2 Error Correction: The Standard Array

Let us now consider one method of decoding linear block codes transmitted through a hard channel using maximum likelihood (ML) decoding. As discussed in Section 1.8.1, ML decoding of a vector 𝐫 consists of choosing the codeword 𝐜 ∈ 𝒞 that is closest to 𝐫 in Hamming distance. That is,

   𝐜̂ = arg min_{𝐜∈𝒞} dH(𝐜, 𝐫).

Let the set of codewords in the code be {𝐜0, 𝐜1, . . . , 𝐜M−1}, where M = qᵏ. Let us take 𝐜0 = 0, the all-zero codeword. Let Vi denote the set of n-vectors which are closer to the codeword 𝐜i than to any other codeword. (Vectors which are equidistant to more than one codeword are assigned into a set Vi at random.) The sets {Vi, i = 0, 1, . . . , M − 1} partition the space of n-vectors into M disjoint subsets. If 𝐫 falls in the set Vi, then, being closer to 𝐜i than to any other codeword, 𝐫 is decoded as 𝐜i. So, decoding can be accomplished if the Vi sets can be found.

The standard array is a representation of the partition {Vi}. It is a two-dimensional array in which the columns of the array represent the Vi. The standard array is built as follows. First, every codeword 𝐜i belongs in its own set Vi. Writing down the set of codewords thus gives the first row of the array. From the remaining vectors in 𝔽qⁿ, find the vector 𝐞1 of smallest weight. This must lie in the set V0,


since it is closest to the codeword 𝐜0 = 0. But dH(𝐞1 + 𝐜i, 𝐜i) = dH(𝐞1, 0) for each i, so the vector 𝐞1 + 𝐜i must also lie in Vi for each i. So 𝐞1 + 𝐜i is placed into each Vi. The vectors 𝐞1 + 𝐜i are included in their respective columns of the standard array to form the second row of the standard array. The procedure continues, selecting an unused vector of minimal weight and adding it to each codeword to form the next row of the standard array, until all qⁿ possible vectors have been used in the standard array. In summary:

1. Write down all the codewords of the code 𝒞.
2. Select from the remaining unused vectors of 𝔽qⁿ one of minimal weight, 𝐞. Write 𝐞 in the column under the all-zero codeword, then add 𝐞 to each codeword in turn, writing the sum in the column under the corresponding codeword.
3. Repeat step 2 until all qⁿ vectors in 𝔽qⁿ have been placed in the standard array.

Example 3.19

For a (7, 3) code, a generator matrix is

   G = [ 0 1 1 1 1 0 0
         1 0 1 1 0 1 0
         1 1 0 1 0 0 1 ].

genstdarray.c

The codewords for this code are

   row 1   0000000  0111100  1011010  1100110  1101001  1010101  0110011  0001111

From the remaining 7-tuples, one of minimal weight is selected; take (1000000). The second row is obtained by adding this to each codeword:

   row 1   0000000  0111100  1011010  1100110  1101001  1010101  0110011  0001111
   row 2   1000000  1111100  0011010  0100110  0101001  0010101  1110011  1001111

Now, proceed until all 2ⁿ vectors are used, selecting an unused vector of minimum weight and adding it to all the codewords. The result is shown in Table 3.1. ◽

Table 3.1: The Standard Array for a Code

   Row 1    0000000  0111100  1011010  1100110  1101001  1010101  0110011  0001111

   Row 2    1000000  1111100  0011010  0100110  0101001  0010101  1110011  1001111
   Row 3    0100000  0011100  1111010  1000110  1001001  1110101  0010011  0101111
   Row 4    0010000  0101100  1001010  1110110  1111001  1000101  0100011  0011111
   Row 5    0001000  0110100  1010010  1101110  1100001  1011101  0111011  0000111
   Row 6    0000100  0111000  1011110  1100010  1101101  1010001  0110111  0001011
   Row 7    0000010  0111110  1011000  1100100  1101011  1010111  0110001  0001101
   Row 8    0000001  0111101  1011011  1100111  1101000  1010100  0110010  0001110

   Row 9    1100000  1011100  0111010  0000110  0001001  0110101  1010011  1101111
   Row 10   1010000  1101100  0001010  0110110  0111001  0000101  1100011  1011111
   Row 11   0110000  0001100  1101010  1010110  1011001  1100101  0000011  0111111
   Row 12   1001000  1110100  0010010  0101110  0100001  0011101  1111011  1000111
   Row 13   0101000  0010100  1110010  1001110  1000001  1111101  0011011  0100111
   Row 14   0011000  0100100  1000010  1111110  1110001  1001101  0101011  0010111
   Row 15   1000100  1111000  0011110  0100010  0101101  0010001  1110111  1001011

   Row 16   1110000  1001100  0101010  0010110  0011001  0100101  1000011  1111111

The horizontal separations in the standard array (shown here as blank lines) separate the error patterns of different weights.

We make the following observations:

1. There are qᵏ codewords (columns) and qⁿ possible vectors, so there are qⁿ⁻ᵏ rows in the standard array. We observe, therefore, that an (n, k) code is capable of correcting qⁿ⁻ᵏ different error patterns.
2. The difference (or sum, over GF(2)) of any two vectors in the same row of the standard array is a code vector. In a row, the vectors are 𝐜i + 𝐞 and 𝐜j + 𝐞. Then (𝐜i + 𝐞) − (𝐜j + 𝐞) = 𝐜i − 𝐜j, which is a codeword, since linear codes form a vector subspace.
3. No two vectors in the same row of a standard array are identical. Otherwise we would have 𝐞 + 𝐜i = 𝐞 + 𝐜j with i ≠ j, which means 𝐜i = 𝐜j, which is impossible.
4. Every vector appears exactly once in the standard array. We know every vector must appear at least once, by the construction. If a vector appears in both the lth row and the mth row, we must have 𝐞l + 𝐜i = 𝐞m + 𝐜j for some i and j. Let us take l < m. We have 𝐞m = 𝐞l + 𝐜i − 𝐜j = 𝐞l + 𝐜k for some k. This means that 𝐞m is on the lth row of the array, which is a contradiction.

The rows of the standard array are called cosets. Each row is of the form

   𝐞 + 𝒞 = {𝐞 + 𝐜 : 𝐜 ∈ 𝒞}.

That is, the rows of the standard array are translations of 𝒞. These are the same cosets we met in Section 2.2.3 in conjunction with groups.

The vectors in the first column of the standard array are called the coset leaders. They represent the error patterns that can be corrected by the code under this decoding strategy. The decoder of Example 3.19 is capable of correcting all errors of weight 1, seven different error patterns of weight 2, and one error pattern of weight 3.

To decode with the standard array, we first locate the received vector 𝐫 in the standard array. Then identify

   𝐫 = 𝐞 + 𝐜

for a vector 𝐞 which is a coset leader (in the left column) and a codeword 𝐜 (on the top row). Since we designed the standard array with the smallest error patterns as coset leaders, the codeword so identified in the standard array is the ML decision. The coset leaders are called the correctable error patterns.

Example 3.20

For the code of Example 3.19, let

   𝐫 = [0, 0, 1, 1, 0, 1, 1]

(which appears in row 13 of Table 3.1); then its coset leader is 𝐞 = [0, 1, 0, 1, 0, 0, 0] and the codeword is 𝐜 = [0, 1, 1, 0, 0, 1, 1], which corresponds to the message 𝐦 = [0, 1, 1], since the generator is systematic. ◽

Example 3.21 It is interesting to note that for the standard array of Example 3.19, not all \binom{7}{2} = 21 patterns of two errors are correctable. Only seven patterns of two errors are correctable. However, there is one pattern of three errors which is correctable.

As this decoding example shows, the standard array decoder may have coset leaders with weight higher than the random-error-correcting capability of the code t = ⌊(dmin − 1)∕2⌋. This observation motivates the following definition. Definition 3.22 A complete error correcting decoder is a decoder that given the received word 𝐫, selects the codeword 𝐜 which minimizes dH (𝐫, 𝐜). That is, it is the ML decoder for the BSC channel. ◽ If a standard array is used as the decoding mechanism, then complete decoding is achieved. On the other hand, if the rows of the standard array are filled out so that all instances of up to t errors appear in the table, and all other rows are left out, then a bounded distance decoder is obtained. Definition 3.23 A t-error correcting bounded distance decoder selects the codeword 𝐜 given ◽ the received vector 𝐫 if dH (𝐫, 𝐜) ≤ t. If no such 𝐜 exists, then a decoder failure is declared. Example 3.24 In Table 3.1, only up to row 8 of the table would be used in a bounded distance decoder, which is capable of correcting up to t = ⌊(dmin − 1)∕2⌋ = ⌊(4 − 1)∕2⌋ = 1 error. A received vector 𝐫 appearing in rows 9 through 16 of the standard array would result in a decoding failure. ◽

A perfect code can be understood ( in) terms of the standard array: it is one for which there are no “leftover” rows: for binary codes all nt error patterns of weight t and all lighter error patterns appear as coset(leaders in the table, with no “leftovers.” (For q-ary codes, the number of error patterns is ) (q − 1)t nt .) What makes it “perfect” then, is that the bounded distance decoder is also the ML decoder. The standard array can, in principle, be used to decode any linear block code, but suffers from a major problem: the memory required to represent the standard array quickly become excessive, and decoding requires searching the entire table to find a match for a received vector 𝐫. For example, a (256,200) binary code — not a particularly long code by modern standards — would require 2256 ≈ 1.2 × 1077 vectors of length 256 bits to be stored in it and every decoding operation would require on average searching through half of the table. A first step in reducing the storage and search complexity (which doesn’t go far enough) is to use syndrome decoding. Let 𝐞 + 𝐜 be a vector in the standard array. The syndrome for this vector is 𝐬 = (𝐞 + 𝐜)H T = 𝐞H T . Furthermore, every vector in the coset has the same syndrome: (𝐞 + 𝐜)H T = 𝐞H T . We therefore only need to store syndromes and their associated error patterns. This table is called the syndrome decoding table. It has qn−k rows but only two columns, so it is smaller than the entire standard array. But is still impractically large in many cases. With the syndrome decoding table, decoding is done as follows: 1. Compute the syndrome, 𝐬 = 𝐫H T . 2. In the syndrome decoding table look up the error pattern 𝐞 corresponding to 𝐬. 3. Then 𝐜 = 𝐫 + 𝐞. Example 3.25

For the code of Example 3.19, a parity check matrix is

   H = [ 1 0 0 0 0 1 1
         0 1 0 0 1 0 1
         0 0 1 0 1 1 0
         0 0 0 1 1 1 1 ].


The syndrome decoding table is

   Error     Syndrome
   0000000   0000
   1000000   1000
   0100000   0100
   0010000   0010
   0001000   0001
   0000100   0111
   0000010   1011
   0000001   1101
   1100000   1100
   1010000   1010
   0110000   0110
   1001000   1001
   0101000   0101
   0011000   0011
   1000100   1111
   1110000   1110

Suppose that 𝐫 = [0, 0, 1, 1, 0, 1, 1], as before. The syndrome is

   𝐬 = 𝐫Hᵀ = [0 1 0 1],

which corresponds to the coset leader 𝐞 = [0 1 0 1 0 0 0]. The decoded codeword is then

   𝐜̂ = [0, 0, 1, 1, 0, 1, 1] + [0, 1, 0, 1, 0, 0, 0] = [0, 1, 1, 0, 0, 1, 1],

as before. ◽
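The syndrome table above can be generated and used mechanically. The following C sketch (a simplified illustration, not the book's genstdarray.c; the bit-packing convention and names are assumptions) builds the table for this (7, 3) code by scanning all 2⁷ vectors in order of increasing weight, so that each syndrome is assigned a minimum-weight coset leader, and then decodes 𝐫 = 0011011.

#include <stdio.h>

#define N  7
#define NK 4                 /* n - k = number of parity checks */

/* Rows of the parity check matrix of Example 3.25 (bit i of a word = column i). */
static const unsigned Hrow[NK] = { 0x61, 0x52, 0x34, 0x78 };

static int weight(unsigned x) { int w = 0; while (x) { w += x & 1; x >>= 1; } return w; }

/* Syndrome s = r H^T: bit i of s is the GF(2) inner product of r with row i of H. */
static unsigned syndrome(unsigned r)
{
    unsigned s = 0;
    for (int i = 0; i < NK; i++) {
        unsigned x = r & Hrow[i];                 /* componentwise product */
        int par = 0;
        while (x) { par ^= x & 1; x >>= 1; }      /* parity = mod-2 sum    */
        s |= (unsigned)par << i;
    }
    return s;
}

int main(void)
{
    unsigned leader[1u << NK];
    int filled[1u << NK] = {0};

    /* Build the syndrome decoding table: scan vectors in order of increasing
       weight so each syndrome gets a minimum-weight coset leader. */
    for (int w = 0; w <= N; w++)
        for (unsigned v = 0; v < (1u << N); v++)
            if (weight(v) == w && !filled[syndrome(v)]) {
                leader[syndrome(v)] = v;
                filled[syndrome(v)] = 1;
            }

    unsigned r = 0x6C;                 /* r = 0011011 (c0 in the low bit) */
    unsigned e = leader[syndrome(r)];  /* coset leader 0101000            */
    unsigned c = r ^ e;                /* decoded codeword 0110011        */
    printf("e = ");
    for (int i = 0; i < N; i++) printf("%u", (e >> i) & 1);
    printf("  c = ");
    for (int i = 0; i < N; i++) printf("%u", (c >> i) & 1);
    printf("\n");
    return 0;
}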

Despite the significant reduction compared to the standard array, the memory requirements for the syndrome decoding table are still very high. It is still infeasible to use this technique for very long codes. Additional algebraic structure must be imposed on the code to enable decoding long codes.

3.5 Weight Distributions of Codes and Their Duals

The weight distribution of a code plays a significant role in calculating probabilities of error.

Definition 3.26 Let 𝒞 be an (n, k) code. Let Ai denote the number of codewords of weight i in 𝒞. Then the set of coefficients {A0, A1, . . . , An} is called the weight distribution for the code. It is convenient to represent the weight distribution as a polynomial,

   A(z) = A0 + A1z + A2z² + · · · + Anzⁿ.      (3.11)

This polynomial is called the weight enumerator. ◽

The weight enumerator is (essentially) the z-transform of the weight distribution sequence.

Example 3.27 For the code of Example 3.19, there is one codeword of weight 0. The rest of the codewords all have weight 4. So A0 = 1, A4 = 7. Thus, A(z) = 1 + 7z⁴. ◽
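For a small code, the weight distribution can be found by direct enumeration of the qᵏ codewords. The following C sketch (illustrative only; names and packing are assumptions) does this for the (7, 3) code of Example 3.19 and reproduces A0 = 1, A4 = 7.

#include <stdio.h>

#define N 7
#define K 3

/* Generator of the (7,3) code of Example 3.19, one row per word (bit i = column i). */
static const unsigned G[K] = { 0x1E, 0x2D, 0x4B };

static int weight(unsigned x) { int w = 0; while (x) { w += x & 1; x >>= 1; } return w; }

int main(void)
{
    int A[N + 1] = {0};
    for (unsigned m = 0; m < (1u << K); m++) {    /* all 2^k = 8 messages */
        unsigned c = 0;
        for (int i = 0; i < K; i++)
            if ((m >> i) & 1) c ^= G[i];          /* c = m G over GF(2)  */
        A[weight(c)]++;
    }
    for (int i = 0; i <= N; i++)
        if (A[i]) printf("A_%d = %d\n", i, A[i]); /* prints A_0 = 1, A_4 = 7 */
    return 0;
}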


There is a relationship, known as the MacWilliams identities, between the weight enumerator of a linear code and that of its dual. This relationship is of interest because for many codes it is possible to directly characterize the weight distribution of the dual code, from which the weight distribution of the code of interest is obtained by the MacWilliams identity.

Theorem 3.28 (The MacWilliams Identities) Let 𝒞 be an (n, k) linear block code over 𝔽q with weight enumerator A(z) and let B(z) be the weight enumerator of 𝒞⟂. Then

   B(z) = q^{−k} (1 + (q − 1)z)^n A((1 − z)∕(1 + (q − 1)z)),      (3.12)

or, turning this around algebraically,

   A(z) = q^{−(n−k)} (1 + (q − 1)z)^n B((1 − z)∕(1 + (q − 1)z)).      (3.13)

For binary codes with q = 2, the MacWilliams identities are

   B(z) = (1∕2ᵏ) (1 + z)^n A((1 − z)∕(1 + z)),     A(z) = (1∕2^{n−k}) (1 + z)^n B((1 − z)∕(1 + z)).

The proof of this theorem reveals some techniques that are very useful in coding. We give the proof for codes over 𝔽2, but it is straightforward to extend to larger fields (once you are familiar with them). The proof relies on the Hadamard transform. For a function f defined on 𝔽2ⁿ, the Hadamard transform f̂ of f is

   f̂(𝐮) = Σ_{𝐯∈𝔽2ⁿ} (−1)^{⟨𝐮,𝐯⟩} f(𝐯) = Σ_{𝐯∈𝔽2ⁿ} (−1)^{Σ_{i=0}^{n−1} u_i 𝑣_i} f(𝐯),    𝐮 ∈ 𝔽2ⁿ,

where the sum is taken over all 2ⁿ n-tuples 𝐯 = (𝑣0, 𝑣1, . . . , 𝑣n−1), where each 𝑣i ∈ 𝔽2. The inverse Hadamard transform is

   f(𝐮) = (1∕2ⁿ) Σ_{𝐯∈𝔽2ⁿ} (−1)^{⟨𝐮,𝐯⟩} f̂(𝐯).

Lemma 3.29 If 𝒞 is an (n, k) binary linear code and f is a function defined on 𝔽2ⁿ, then

   Σ_{𝐮∈𝒞⟂} f(𝐮) = (1∕|𝒞|) Σ_{𝐮∈𝒞} f̂(𝐮)    or    Σ_{𝐮∈𝒞} f(𝐮) = (1∕|𝒞⟂|) Σ_{𝐮∈𝒞⟂} f̂(𝐮).

Here |𝒞| denotes the number of elements in the code 𝒞.

Proof of lemma.

   Σ_{𝐮∈𝒞} f̂(𝐮) = Σ_{𝐮∈𝒞} Σ_{𝐯∈𝔽2ⁿ} (−1)^{⟨𝐮,𝐯⟩} f(𝐯) = Σ_{𝐯∈𝔽2ⁿ} f(𝐯) Σ_{𝐮∈𝒞} (−1)^{⟨𝐮,𝐯⟩}
                 = Σ_{𝐯∈𝒞⟂} f(𝐯) Σ_{𝐮∈𝒞} (−1)^{⟨𝐮,𝐯⟩} + Σ_{𝐯∈𝔽2ⁿ∖𝒞⟂} f(𝐯) Σ_{𝐮∈𝒞} (−1)^{⟨𝐮,𝐯⟩},

where 𝔽2ⁿ has been partitioned into two disjoint sets, the dual code 𝒞⟂ and the vectors not in the dual code, 𝔽2ⁿ∖𝒞⟂. In the first sum, ⟨𝐮, 𝐯⟩ = 0, so the inner sum is |𝒞|. In the second sum, for every 𝐯 in the outer sum, ⟨𝐮, 𝐯⟩ takes on the values 0 and 1 equally often as 𝐮 varies over 𝒞 in the inner sum, so the inner sum is 0. Therefore,

   Σ_{𝐮∈𝒞} f̂(𝐮) = Σ_{𝐯∈𝒞⟂} f(𝐯) |𝒞| = |𝒞| Σ_{𝐯∈𝒞⟂} f(𝐯).  ◽

Proof of Theorem 3.28. The weight enumerators of the code and its dual can be written as

   A(z) = Σ_{𝐮∈𝒞} z^{wt(𝐮)},     B(z) = Σ_{𝐮∈𝒞⟂} z^{wt(𝐮)}.

Let f(𝐮) = z^{wt(𝐮)}. Its Hadamard transform is

   f̂(𝐮) = Σ_{𝐯∈𝔽2ⁿ} (−1)^{⟨𝐮,𝐯⟩} z^{wt(𝐯)}.

Writing out the inner product and the weight function explicitly on the vectors 𝐮 = (u0, u1, . . . , un−1) and 𝐯 = (𝑣0, 𝑣1, . . . , 𝑣n−1), we have

   f̂(𝐮) = Σ_{𝐯∈𝔽2ⁿ} (−1)^{Σ_{i=0}^{n−1} u_i 𝑣_i} Π_{i=0}^{n−1} z^{𝑣_i} = Σ_{𝐯∈𝔽2ⁿ} Π_{i=0}^{n−1} (−1)^{u_i 𝑣_i} z^{𝑣_i}.

The sum over the 2ⁿ values of 𝐯 ∈ 𝔽2ⁿ can be expressed as n nested summations over the binary values of the elements of 𝐯:

   f̂(𝐮) = Σ_{𝑣0=0}^{1} Σ_{𝑣1=0}^{1} · · · Σ_{𝑣n−1=0}^{1} Π_{i=0}^{n−1} (−1)^{u_i 𝑣_i} z^{𝑣_i}.

Now the distributive law can be used to pull the product out of the summations,

   f̂(𝐮) = Π_{i=0}^{n−1} Σ_{𝑣i=0}^{1} (−1)^{u_i 𝑣_i} z^{𝑣_i}.

If ui = 0, then the inner sum is 1 + z. If ui = 1, then the inner sum is 1 − z. Thus,

   f̂(𝐮) = (1 + z)^{n−wt(𝐮)} (1 − z)^{wt(𝐮)} = ((1 − z)∕(1 + z))^{wt(𝐮)} (1 + z)^n.      (3.14)

The weight enumerator for the dual code is

   B(z) = Σ_{𝐮∈𝒞⟂} z^{wt(𝐮)} = Σ_{𝐮∈𝒞⟂} f(𝐮)
        = (1∕|𝒞|) Σ_{𝐮∈𝒞} f̂(𝐮)                                        (by Lemma 3.29)
        = (1∕|𝒞|) Σ_{𝐮∈𝒞} ((1 − z)∕(1 + z))^{wt(𝐮)} (1 + z)^n              (by (3.14))
        = (1∕|𝒞|) (1 + z)^n Σ_{𝐮∈𝒞} ((1 − z)∕(1 + z))^{wt(𝐮)}
        = (1∕|𝒞|) (1 + z)^n A(z) |_{z=(1−z)∕(1+z)}.  ◽

3.6 Binary Hamming Codes and Their Duals

We now formally introduce a family of binary linear block codes, the Hamming codes, and their duals.

Definition 3.30 For any integer m ≥ 2, a (2^m − 1, 2^m − m − 1, 3) binary code (where the 3 in the designation refers to the minimum distance of the code) may be defined by its m × n parity check matrix H, which is obtained by writing all possible binary m-tuples, except the all-zero tuple, as the columns of H. For example, simply write out the m-bit binary representation for the numbers from 1 to n in order. Codes equivalent to this construction are called Hamming codes. ◽
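The "counting order" construction is mechanical: column j of H is simply the binary expansion of j. The short C sketch below (illustrative only; m is hard-coded to 4 here, and the row ordering puts the least significant bit in the first row) prints the parity check matrix of the (2^m − 1, 2^m − m − 1) Hamming code.

#include <stdio.h>

/* Print the counting-order parity check matrix of the Hamming code for a given m. */
int main(void)
{
    int m = 4;
    int n = (1 << m) - 1;                      /* n = 15 for m = 4 */

    for (int row = 0; row < m; row++) {        /* row = bit position (LSB first) */
        for (int j = 1; j <= n; j++)
            printf("%d ", (j >> row) & 1);     /* bit 'row' of column index j */
        printf("\n");
    }
    return 0;
}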


For example, when m = 4, we get

   H = [ 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
         0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
         0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
         0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 ]

(columns indexed 1 through 15, left to right) as the parity check matrix for a (15, 11) Hamming code. However, it is usually convenient to reorder the columns — resulting in an equivalent code — so that the identity matrix which is interspersed throughout the columns of H appears in the first m columns. We therefore write

   H = [ 1 0 0 0 1 1 0 1 1 0 1 0 1 0 1
         0 1 0 0 1 0 1 1 0 1 1 0 0 1 1
         0 0 1 0 0 1 1 1 0 0 0 1 1 1 1
         0 0 0 1 0 0 0 0 1 1 1 1 1 1 1 ],

where the first four columns are the original columns 1, 2, 4, 8 and the remaining columns are the original columns 3, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15.

1001011 0010111 1011100 1110010

Observe that except for the zero codeword, all codewords have weight 4.



In general, all of the nonzero codewords of the (2^m − 1, m) simplex code have weight 2^{m−1} (see Exercise 3.12) and every pair of codewords is at a distance 2^{m−1} apart (which is why it is called a simplex). For example, for the m = 2 case, the codewords {(000), (101), (011), (110)} form a tetrahedron. Thus, the weight enumerator of the dual code is

   B(z) = 1 + (2^m − 1) z^{2^{m−1}}.      (3.15)

From the weight enumerator of the dual, we find using (3.13) that the weight distribution of the Hamming code is

   A(z) = (1∕(n + 1)) [(1 + z)^n + n(1 − z)(1 − z²)^{(n−1)∕2}].      (3.16)

Example 3.32 For the (7, 4) Hamming code, the weight enumerator is

   A(z) = (1∕8) [(1 + z)⁷ + 7(1 − z)(1 − z²)³] = 1 + 7z³ + 7z⁴ + z⁷.      (3.17)  ◽
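As a quick numerical sanity check of (3.13) and (3.17), the following C sketch (not part of the book's software; function names are illustrative) evaluates A(z) = 2^{−(n−k)} (1 + z)^n B((1 − z)∕(1 + z)) using the dual (simplex) enumerator B(z) = 1 + 7z⁴ from (3.15) and compares it with the direct expression 1 + 7z³ + 7z⁴ + z⁷ at a few sample points.

#include <stdio.h>
#include <math.h>

/* B(z) for the (7,3) simplex code, the dual of the (7,4) Hamming code. */
static double B(double z) { return 1.0 + 7.0 * pow(z, 4); }

/* Direct weight enumerator (3.17) of the (7,4) Hamming code. */
static double A_direct(double z) { return 1.0 + 7*pow(z,3) + 7*pow(z,4) + pow(z,7); }

int main(void)
{
    int n = 7, k = 4;
    for (double z = 0.1; z < 1.0; z += 0.2) {
        /* MacWilliams identity (3.13) with q = 2 */
        double A_mac = pow(2.0, -(n - k)) * pow(1.0 + z, n) * B((1.0 - z) / (1.0 + z));
        printf("z = %.1f   A_direct = %.6f   A_MacWilliams = %.6f\n",
               z, A_direct(z), A_mac);
    }
    return 0;
}

The two columns printed by this program agree to floating-point precision, as the identity requires.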

Example 3.33 For the (15, 11) Hamming code, the weight enumerator is

   A(z) = (1∕16) [(1 + z)¹⁵ + 15(1 − z)(1 − z²)⁷]
        = 1 + 35z³ + 105z⁴ + 168z⁵ + 280z⁶ + 435z⁷ + 435z⁸ + 280z⁹ + 168z¹⁰ + 105z¹¹ + 35z¹² + z¹⁵.      (3.18)  ◽

3.7 Performance of Linear Codes

There are several different ways that we can characterize the error detecting and correcting capabilities of codes at the output of the channel decoder [483].

P(E) is the probability of decoder error, also known as the word error rate. This is the probability that the codeword at the output of the decoder is not the same as the codeword at the input of the encoder.

Pb(E) or Pb is the probability of bit error, also known as the bit error rate. This is the probability that the decoded message bits (extracted from a decoded codeword of a binary code) are not the same as the encoded message bits. Note that when a decoder error occurs, there may be anywhere from 1 to k message bits in error, depending on what codeword is sent, what codeword was decoded, and the mapping from message bits to codewords.

Pu(E) is the probability of undetected codeword error, the probability that errors occurring in a codeword are not detected.

Pd(E) is the probability of detected codeword error, the probability that one or more errors occurring in a codeword are detected.

Pub is the undetected bit error rate, the probability that a decoded message bit is in error and is contained within a codeword corrupted by an undetected error.

Pdb is the detected bit error rate, the probability that a received message bit is in error and is contained within a codeword corrupted by a detected error.

P(F) is the probability of decoder failure, which is the probability that the decoder is unable to decode the received vector (e.g., for a bounded distance decoder) and is able to determine that it cannot decode.

In what follows, bounds and exact expressions for these probabilities will be developed.

3.7.1 Error Detection Performance

All errors with weight up to dmin − 1 can be detected, so in computing the probability of detection, only error patterns with weight dmin or higher need be considered. If a codeword 𝐜 of a linear code is transmitted and the error pattern 𝐞 happens to be a codeword, 𝐞 = 𝐜′, then the received vector 𝐫 = 𝐜 + 𝐜′ is also a codeword. Hence, the error pattern would be undetectable. Thus, the probability that an error pattern is undetectable is precisely the probability that it is a codeword.

We consider only errors in transmission of binary codes over the BSC with crossover probability p. (Extension to codes with larger alphabets is discussed in [483].) The probability of any particular pattern of j errors in a codeword is p^j (1 − p)^{n−j}. Recalling that Aj is the number of codewords in 𝒞 of weight j, the probability that j errors form a codeword is Aj p^j (1 − p)^{n−j}. The probability of undetectable


error in a codeword is then

   Pu(E) = Σ_{j=dmin}^{n} Aj p^j (1 − p)^{n−j}.      (3.19)

The probability of a detected codeword error is the probability that one or more errors occur minus the probability that the error is undetected:

   Pd(E) = Σ_{j=1}^{n} \binom{n}{j} p^j (1 − p)^{n−j} − Pu(E) = 1 − (1 − p)^n − Pu(E).

Computing these probabilities requires knowing the weight distribution of the code, which is not always available. It is common, therefore, to provide bounds on the performance. A bound on Pu(E) can be obtained by observing that the probability of undetected error is bounded above by the probability of occurrence of any error patterns of weight greater than or equal to dmin. Since there are \binom{n}{j} different ways that j positions out of n can be changed,

   Pu(E) ≤ Σ_{j=dmin}^{n} \binom{n}{j} p^j (1 − p)^{n−j}.      (3.20)

A bound on Pd(E) is simply

   Pd(E) ≤ 1 − (1 − p)^n.

Example 3.34 For the Hamming (7, 4) code with A(z) = 1 + 7z³ + 7z⁴ + z⁷,

   Pu(E) = 7p³(1 − p)⁴ + 7p⁴(1 − p)³ + p⁷.

If p = 0.01 then Pu(E) ≈ 6.79 × 10⁻⁶. The bound (3.20) gives Pu(E) < 3.39 × 10⁻⁵. ◽
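The numbers in Example 3.34 can be reproduced with a few lines of C. The sketch below is illustrative only (the helper binom and the variable names are not from the book's programs); it evaluates (3.19) with A3 = 7, A4 = 7, A7 = 1 and the union-style bound (3.20).

#include <stdio.h>
#include <math.h>

/* Binomial coefficient C(n, k) computed in floating point. */
static double binom(int n, int k)
{
    double b = 1.0;
    for (int i = 1; i <= k; i++) b = b * (n - k + i) / i;
    return b;
}

int main(void)
{
    int n = 7, dmin = 3;
    double p = 0.01;

    /* Exact Pu(E) from (3.19) with A3 = 7, A4 = 7, A7 = 1. */
    double Pu = 7*pow(p,3)*pow(1-p,4) + 7*pow(p,4)*pow(1-p,3) + pow(p,7);

    /* Upper bound (3.20): all error patterns of weight >= dmin. */
    double bound = 0.0;
    for (int j = dmin; j <= n; j++)
        bound += binom(n, j) * pow(p, j) * pow(1-p, n-j);

    printf("Pu(E) = %.3g, bound = %.3g\n", Pu, bound);  /* about 6.79e-06 and 3.39e-05 */
    return 0;
}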



The corresponding bit error rates can be bounded as follows. The undetected bit error rate Pub can be lower-bounded by assuming the undetected codeword error corresponds to only a single message bit error. Pub can be upper-bounded by assuming that the undetected codeword error corresponds to all k message bits being in error. Thus,

   (1∕k) Pu(E) ≤ Pub ≤ Pu(E).

Similarly for Pdb:

   (1∕k) Pd(E) ≤ Pdb ≤ Pd(E).

Example 3.35 Figure 3.1 illustrates the detection probabilities for a BSC derived from a BPSK system, with p = Q(√(2REb∕N0)), for a Hamming (15, 11) code. The weight enumerator is in (3.18). For comparison, the probability of an undetected error for the uncoded system is shown, in which any error is undetected, so Pu,uncoded = 1 − (1 − p_uncoded)^k, where p_uncoded = Q(√(2Eb∕N0)). Note that the upper bound on Pu is not very tight, but the upper bound on Pd is tight — they are indistinguishable on the plot, since they differ by Pu(E), which is orders of magnitude smaller than Pd(E). The uncoded probability of undetected error is much greater than the coded Pu. ◽

probdeth15.m  probdet.m

[Figure 3.1: Error detection performance for a (15, 11) Hamming code. The plot shows Pu(E), its upper bound, Pd(E), its upper bound, and the uncoded Pu(E) as a function of Eb∕N0 (dB).]

3.7.2 Error Correction Performance

An error pattern is correctable if and only if it is a coset leader in the standard array for the code, so the probability of correcting an error is the probability that the error is a coset leader. Let 𝛼i denote the number of coset leaders of weight i. The numbers 𝛼0, 𝛼1, . . . , 𝛼n are called the coset leader weight distribution. Over a BSC with crossover probability p, the probability of j errors forming one of the coset leaders is 𝛼j p^j (1 − p)^{n−j}. The probability of a decoding error is thus the probability that the error is not one of the coset leaders:

   P(E) = 1 − Σ_{j=0}^{n} 𝛼j p^j (1 − p)^{n−j}.

This result applies to any linear code with a complete decoder.

Example 3.36 For the standard array in Table 3.1, the coset leader weight distribution is

   𝛼0 = 1,  𝛼1 = 7,  𝛼2 = 7,  𝛼3 = 1.

If p = 0.01, then P(E) = 0.0014. ◽

Most hard-decision decoders are bounded-distance decoders, selecting the codeword 𝐜 which lies within a Hamming distance of ⌊(dmin − 1)∕2⌋ of the received vector 𝐫. An exact expression for j the probability of error for a bounded-distance decoder can be developed as follows. Let Pl be the probability that a received word 𝐫 is exactly Hamming distance l from a codeword of weight j. Lemma 3.37 [[483], p. 249] j

Pl =

l ( )( ) ∑ j n − j j−l+2r p (1 − p)n−j+l−2r . r l − r r=0

Proof Assume (without loss of generality) that the all-zero codeword was sent. Let $\mathbf{c}$ be a codeword of weight j, where $j \ne 0$. Let the coordinates of $\mathbf{c}$ which are 1 be called the 1-coordinates and let the coordinates of $\mathbf{c}$ which are 0 be called the 0-coordinates. There are thus j 1-coordinates and n − j 0-coordinates of $\mathbf{c}$. Consider now the ways in which the received vector $\mathbf{r}$ can be a Hamming distance l away from $\mathbf{c}$. To differ in l bits, it must differ in some number r of the 0-coordinates and l − r of the 1-coordinates, where $0 \le r \le l$. The number of ways that $\mathbf{r}$ can differ from $\mathbf{c}$ in r of the 0-coordinates is $\binom{n-j}{r}$. The total probability of $\mathbf{r}$ differing from $\mathbf{c}$ in exactly r 0-coordinates is
$$\binom{n-j}{r} p^r (1-p)^{n-j-r}.$$
The number of ways that $\mathbf{r}$ can differ from $\mathbf{c}$ in l − r of the 1-coordinates is $\binom{j}{j-(l-r)} = \binom{j}{l-r}$. Since the all-zero codeword was transmitted, these l − r coordinates of $\mathbf{r}$ must be 0 (there was no crossover in the channel) and the remaining j − (l − r) bits must be 1. The total probability of $\mathbf{r}$ differing from $\mathbf{c}$ in exactly l − r 1-coordinates is
$$\binom{j}{l-r} p^{j-l+r}(1-p)^{l-r}.$$
The probability $P_l^j$ is obtained by multiplying the probabilities on the 0-coordinates and the 1-coordinates (they are independent events since the channel is memoryless) and summing over r:
$$P_l^j = \sum_{r=0}^{l} \binom{n-j}{r} p^r (1-p)^{n-j-r} \binom{j}{l-r} p^{j-l+r}(1-p)^{l-r} = \sum_{r=0}^{l} \binom{n-j}{r}\binom{j}{l-r} p^{j-l+2r}(1-p)^{n-j+l-2r}. \qquad \square$$

The probability of error is now obtained as follows.

Theorem 3.38 For a binary (n, k) code with weight distribution $\{A_i\}$, the probability of decoding error for a bounded distance decoder is
$$P(E) = \sum_{j=d_{\min}}^{n} A_j \sum_{l=0}^{\lfloor(d_{\min}-1)/2\rfloor} P_l^j. \qquad (3.21)$$

Proof Assume that the all-zero codeword was sent. For a particular codeword of weight $j \ne 0$, the probability that the received vector $\mathbf{r}$ falls in the decoding sphere of that codeword is
$$\sum_{l=0}^{\lfloor(d_{\min}-1)/2\rfloor} P_l^j.$$
Then the result follows by adding up over all possible weights, scaled by the number of codewords of weight j, $A_j$. ◽

The probability of decoder failure for the bounded distance decoder is the probability that the received codeword does not fall into any of the decoding spheres,
$$P(F) = 1 - \underbrace{\sum_{j=0}^{\lfloor(d_{\min}-1)/2\rfloor}\binom{n}{j} p^j(1-p)^{n-j}}_{\text{probability of falling in the correct decoding sphere}} - \underbrace{P(E)}_{\text{probability of falling in an incorrect decoding sphere}}.$$

Exact expressions to compute Pb (E) require information relating the weight of the message bits and the weight of the corresponding codewords. This information is summarized in the number 𝛽j , which is the total weight of the message blocks associated with codewords of weight j.

Example 3.39 For the (7, 4) Hamming code, $\beta_3 = 12$, $\beta_4 = 16$, and $\beta_7 = 4$. That is, the total weight of the messages producing codewords of weight 3 is 12; the total weight of messages producing codewords of weight 4 is 16. ◽

Modifying (3.21), we obtain
$$P_b(E) = \frac{1}{k}\sum_{j=d_{\min}}^{n} \beta_j \sum_{l=0}^{\lfloor(d_{\min}-1)/2\rfloor} P_l^j.$$
(See hamcode74pe.m.) Unfortunately, while obtaining values for $\beta_j$ for small codes is straightforward computationally, appreciably large codes require theoretical expressions which are usually unavailable.

The probability of decoder error can be easily bounded by the probability of any error pattern of weight greater than $\lfloor(d_{\min}-1)/2\rfloor$:
$$P(E) \le \sum_{j=\lfloor(d_{\min}+1)/2\rfloor}^{n}\binom{n}{j} p^j(1-p)^{n-j}.$$
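As an illustration (a C++ sketch, not the hamcode74pe.m program itself; the function names are illustrative), the following fragment evaluates Lemma 3.37 and Theorem 3.38, together with $P_b(E)$ using the $\beta_j$ of Example 3.39, for the (7, 4) Hamming code, whose weight distribution is $A_0 = 1$, $A_3 = 7$, $A_4 = 7$, $A_7 = 1$.

#include <cmath>
#include <cstdio>

// Binomial coefficient as a double (small arguments only).
double binom(int n, int k) {
    if (k < 0 || k > n) return 0.0;
    double r = 1.0;
    for (int i = 1; i <= k; i++) r *= (double)(n - k + i) / i;
    return r;
}

// Lemma 3.37: probability that the received word is exactly distance l
// from a codeword of weight j, over a BSC with crossover probability p.
double P_lj(int l, int j, int n, double p) {
    double sum = 0.0;
    for (int r = 0; r <= l; r++)
        sum += binom(n - j, r) * binom(j, l - r)
             * std::pow(p, j - l + 2 * r) * std::pow(1 - p, n - j + l - 2 * r);
    return sum;
}

int main() {
    int n = 7, dmin = 3, k = 4;
    double p = 0.01;
    double A[8]    = {1, 0, 0, 7, 7, 0, 0, 1};     // weight distribution of the (7,4) Hamming code
    double beta[8] = {0, 0, 0, 12, 16, 0, 0, 4};   // Example 3.39
    int t = (dmin - 1) / 2;
    double PE = 0.0, Pb = 0.0;
    for (int j = dmin; j <= n; j++)
        for (int l = 0; l <= t; l++) {
            PE += A[j] * P_lj(l, j, n, p);         // Theorem 3.38
            Pb += beta[j] * P_lj(l, j, n, p);      // bit error rate numerator
        }
    Pb /= k;
    std::printf("P(E) = %g (about 2.0e-3), P_b(E) = %g\n", PE, Pb);
    return 0;
}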

An easy bound on the probability of failure is the same as this bound on the probability of error. Bounds on the probability of bit error can be obtained as follows. A lower bound is obtained by assuming that a decoder error causes a single bit error out of the k message bits. An upper bound is obtained by assuming that all k message bits are incorrect when the block is incorrectly decoded. This leads to the bounds
$$\frac{1}{k}P(E) \le P_b(E) \le P(E).$$

3.7.3 Performance for Soft-Decision Decoding

While all of the decoding in this chapter has been for hard-input decoders, it is interesting to examine the potential performance for soft-decision decoding. Suppose the codewords of an $(n, k, d_{\min})$ code $\mathcal{C}$ are modulated to a vector $\mathbf{s}$ using BPSK having energy $E_c = RE_b$ per coded bit and transmitted through an AWGN channel with variance $\sigma^2 = N_0/2$. The transmitted vector $\mathbf{s}$ is a point in n-dimensional space. In Exercise 1.15, it is shown that the Euclidean distance between two BPSK-modulated codewords is related to the Hamming distance between the codewords by
$$d_E = 2\sqrt{E_c d_H}.$$
Suppose that there are K codewords (on average) at a distance $d_{\min}$ from a codeword. By the union bound (1.27), the probability of a block decoding error is given by
$$P(E) \approx K\,Q\!\left(\frac{d_{E,\min}}{2\sigma}\right) = K\,Q\!\left(\sqrt{\frac{2Rd_{\min}E_b}{N_0}}\right).$$
Neglecting the multiplicity constant K, we see that we achieve essentially comparable performance to uncoded transmission when
$$\left.\frac{E_b}{N_0}\right|_{\text{uncoded}} = \left.\frac{Rd_{\min}E_b}{N_0}\right|_{\text{coded}}.$$
The asymptotic coding gain is the factor by which the coded $E_b/N_0$ can be decreased to obtain equivalent performance. (It is called asymptotic because it applies only as the SNR becomes large enough that the union bound can be regarded as a reasonable approximation.) In this case the asymptotic coding gain is $Rd_{\min}$. Recall that Figure 1.19 illustrated the advantage of soft-input decoding compared with hard-input decoding.
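A small numerical sketch of these quantities for the (15, 11) Hamming code of Example 3.35 follows. The multiplicity K is set to 35, the assumed number of minimum-weight codewords of that code; this value and the names used are illustrative, not from the text.

#include <cmath>
#include <cstdio>

// Q function in terms of the complementary error function.
double Q(double x) { return 0.5 * std::erfc(x / std::sqrt(2.0)); }

int main() {
    double R = 11.0 / 15.0;   // rate of the (15,11) Hamming code
    int dmin = 3;
    double K = 35.0;          // assumed multiplicity of minimum-weight codewords
    std::printf("asymptotic coding gain = %.2f dB\n",
                10.0 * std::log10(R * dmin));         // about 3.4 dB
    for (double EbN0dB = 4; EbN0dB <= 10; EbN0dB += 2) {
        double EbN0 = std::pow(10.0, EbN0dB / 10.0);
        double PE = K * Q(std::sqrt(2.0 * R * dmin * EbN0));   // union-bound estimate
        std::printf("Eb/N0 = %4.1f dB: P(E) approx %.3e\n", EbN0dB, PE);
    }
    return 0;
}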

3.8 Erasure Decoding

An erasure is an error in which the error location is known, but the value of the error is not. Erasures can arise in several ways. In some receivers the received signal can be examined to see if it falls outside acceptable bounds. If it falls outside the bounds, it is declared as an erasure. (For example, for BPSK signaling, if the received signal is too close to the origin, an erasure might be declared.)

Example 3.40 Another way that an erasure can occur in packet-based transmission is as follows. Suppose that a sequence of codewords $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_N$ are written into the rows of a matrix
$$\begin{matrix} c_{1,0} & c_{1,1} & c_{1,2} & \cdots & c_{1,n-1} \\ c_{2,0} & c_{2,1} & c_{2,2} & \cdots & c_{2,n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ c_{N,0} & c_{N,1} & c_{N,2} & \cdots & c_{N,n-1} \end{matrix}$$
then the columns are read out, giving the data sequence
$$[c_{1,0}, c_{2,0}, \ldots, c_{N,0}],\ [c_{1,1}, c_{2,1}, \ldots, c_{N,1}],\ [c_{1,2}, c_{2,2}, \ldots, c_{N,2}],\ \ldots,\ [c_{1,n-1}, c_{2,n-1}, \ldots, c_{N,n-1}].$$
Suppose that these are now sent as a sequence of n data packets, each of length N, over a channel which is susceptible to packet loss, but where the loss of a packet is known at the receiver (such as the Internet using a protocol that does not guarantee delivery, such as UDP). At the receiver, the packets are written into a matrix in column order, leaving an empty column corresponding to lost packets, then read out in row order. Suppose in this scheme that one of the packets, say the third, is lost in transmission. Then the data in the receiver interleaver matrix would look like
$$\begin{matrix} c_{1,0} & c_{1,1} & \square & \cdots & c_{1,n-1} \\ c_{2,0} & c_{2,1} & \square & \cdots & c_{2,n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ c_{N,0} & c_{N,1} & \square & \cdots & c_{N,n-1} \end{matrix}$$
where the boxes indicate lost data. While a lost packet results in an entire column of lost data, it represents only one erased symbol from each of the de-interleaved codewords, a symbol whose location is known. ◽

Erasures can also sometimes be dealt with using concatenated coding techniques, where an inner code declares erasures at some symbol positions, which an outer code can then correct.

BOX 3.2: The UDP Protocol

UDP (user datagram protocol) is one of the protocols in the TCP/IP protocol suite. The most common protocol, TCP, ensures packet delivery by acknowledging each packet successfully received, retransmitting packets which are garbled or lost in transmission. UDP, on the other hand, is an open-ended protocol which does not guarantee packet delivery. For a variety of reasons, it incurs lower delivery latency and as a result, it is of interest in near real-time communication applications. The application designer must deal with dropped packets using, for example, error correction techniques.

Consider the erasure capability for a code of distance dmin . A single erased symbol removed from a code (with no additional errors) leaves a code with a minimum distance at least dmin − 1. Thus, f erased symbols can be “filled” provided that f < dmin . For example, a Hamming code with dmin = 3 can correct up to 2 erasures.

Now suppose that there are both errors and erasures. For a code with $d_{\min}$ experiencing a single erasure, there are still n − 1 unerased coordinates and the codewords are separated by a distance of at least $d_{\min} - 1$. More generally, if there are f erased symbols, then the distance among the remaining digits is at least $d_{\min} - f$. Letting $t_f$ denote the random error decoding distance in the presence of f erasures, we can correct up to $t_f = \lfloor(d_{\min} - f - 1)/2\rfloor$ errors. If there are f erasures and e errors, they can be corrected provided that
$$2e + f < d_{\min}. \qquad (3.22)$$

Since correcting an error requires determination of both the error position and the error value, while filling an erasure requires determination only of the error value, essentially twice as many erasures can be filled as errors corrected.

3.8.1 Binary Erasure Decoding

We consider now how to simultaneously fill f erasures and correct e errors in a binary code with a given decoding algorithm [[483], p. 229]. In this case, all that is necessary is to determine for each erasure whether the missing value should be a one or a zero. An erasure decoding algorithm for this case can be described as follows:

1. Place zeros in all erased coordinates and decode using the usual decoder for the code. Call the resulting codeword $\mathbf{c}_0$.
2. Place ones in all erased coordinates and decode using the usual decoder for the code. Call the resulting codeword $\mathbf{c}_1$.
3. Find which of $\mathbf{c}_0$ and $\mathbf{c}_1$ is closest to $\mathbf{r}$. This is the decoder output.

Let us examine why this decoder works. Suppose we have $2e + f < d_{\min}$ (so that correct decoding is possible). In assigning 0 to the f erased coordinates, we generate $e_0$ errors, $e_0 \le f$, so that the total number of errors to be corrected is $e_0 + e$. In assigning 1 to the f erased coordinates, we make $e_1$ errors, $e_1 \le f$, so that the total number of errors to be corrected is $e_1 + e$. Note that $e_0 + e_1 = f$, so that either $e_0$ or $e_1$ is less than or equal to $f/2$. Thus, either
$$2(e + e_0) \le 2(e + f/2) \quad\text{or}\quad 2(e + e_1) \le 2(e + f/2),$$
and $2(e + f/2) < d_{\min}$, so that one of the two decodings must be correct.

Erasure decoding for nonbinary codes depends on the particular code structure. For example, decoding of Reed-Solomon codes is discussed in Section 6.7.
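The two-trial procedure above is simple to wrap around any hard-decision decoder. The following C++ sketch assumes a caller-supplied decoder (the type names and the convention of measuring distance only over the unerased coordinates in step 3 are assumptions made for illustration); it is not from the text's program library.

#include <vector>
#include <functional>

using Word = std::vector<int>;
using Decoder = std::function<Word(const Word&)>;   // hard-decision decoder, supplied by the caller

// Hamming distance counted only over the unerased positions.
static int dist_unerased(const Word& a, const Word& b, const std::vector<int>& erased) {
    int d = 0;
    for (size_t i = 0; i < a.size(); i++)
        if (!erased[i] && a[i] != b[i]) d++;
    return d;
}

// Fill erasures and decode: r is the received word, erased[i] = 1 marks an erased coordinate.
Word erasure_decode(const Word& r, const std::vector<int>& erased, const Decoder& decode) {
    Word r0 = r, r1 = r;
    for (size_t i = 0; i < r.size(); i++)
        if (erased[i]) { r0[i] = 0; r1[i] = 1; }   // steps 1 and 2: fill with zeros, then ones
    Word c0 = decode(r0);
    Word c1 = decode(r1);
    // Step 3: keep whichever decoded codeword is closer to r on the unerased coordinates.
    return (dist_unerased(c0, r, erased) <= dist_unerased(c1, r, erased)) ? c0 : c1;
}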

3.9 Modifications to Linear Codes

We introduce some minor modifications to linear codes. These are illustrated for some particular examples in Figure 3.2. Definition 3.41 An (n, k, d) code is extended by adding an additional parity coordinate, producing an (n + 1, k, d + 1) code. ◽

Example 3.42 We demonstrate the operations by modifying a (7, 4, 3) Hamming code. A parity check matrix for the (7, 4, 3) Hamming code is
$$H = \begin{bmatrix} 1&0&0&1&0&1&1 \\ 0&1&0&1&1&1&0 \\ 0&0&1&0&1&1&1 \end{bmatrix}.$$
The parity check matrix for an extended Hamming code, with an extra check bit that checks the parity of all the bits, can be written as
$$H = \begin{bmatrix} 1&0&0&1&0&1&1&0 \\ 0&1&0&1&1&1&0&0 \\ 0&0&1&0&1&1&1&0 \\ 1&1&1&1&1&1&1&1 \end{bmatrix}.$$
The last row is the overall check bit row. By linear operations, this can be put in equivalent systematic form
$$H = \begin{bmatrix} 1&0&0&0&1&1&0&1 \\ 0&1&0&0&0&1&1&1 \\ 0&0&1&0&1&1&1&0 \\ 0&0&0&1&1&0&1&1 \end{bmatrix}$$
with the corresponding generator
$$G = \begin{bmatrix} 1&0&1&1&1&0&0&0 \\ 1&1&1&0&0&1&0&0 \\ 0&1&1&1&0&0&1&0 \\ 1&1&0&1&0&0&0&1 \end{bmatrix}.$$
See Figure 3.2. ◽
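The extension operation itself is easy to exercise numerically. The sketch below encodes all messages of a (7, 4) Hamming code and appends an overall parity bit to each codeword, confirming that the minimum nonzero weight grows from 3 to 4. The generator used is the systematic companion $G = [A^T\ I_4]$ of the parity check matrix $H = [I_3\ A]$ given in Example 3.42; this particular construction is a standard assumption, not spelled out in the example.

#include <cstdio>

int main() {
    // Assumed systematic generator G = [A^T I_4] for the (7,4) code of Example 3.42.
    int G[4][7] = {
        {1,1,0, 1,0,0,0},
        {0,1,1, 0,1,0,0},
        {1,1,1, 0,0,1,0},
        {1,0,1, 0,0,0,1}
    };
    int minw7 = 8, minw8 = 9;
    for (int m = 1; m < 16; m++) {           // all nonzero messages
        int c[8] = {0};
        for (int i = 0; i < 4; i++)
            if (m & (1 << i))
                for (int j = 0; j < 7; j++) c[j] ^= G[i][j];
        int w = 0;
        for (int j = 0; j < 7; j++) w += c[j];
        c[7] = w % 2;                         // extension: overall parity bit
        if (w < minw7) minw7 = w;
        if (w + c[7] < minw8) minw8 = w + c[7];
    }
    std::printf("d_min before extension: %d, after: %d\n", minw7, minw8);   // prints 3 and 4
    return 0;
}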

Definition 3.43 A code is punctured by deleting one of its parity symbols. An (n, k) code becomes an (n − 1, k) code. ◽

[Figure 3.2: Demonstrating modifications on a Hamming code. The figure shows generator and parity check matrices for the Hamming $(2^m - 1,\ 2^m - 1 - m,\ 3)$ code, its even-weight subcode $(2^m - 1,\ 2^m - 2 - m,\ 4)$, and the extended Hamming $(2^m,\ 2^m - 1 - m,\ 4)$ code, related by: expurgate by deleting some codewords (e.g., odd-weight codewords) / augment by adding new codewords; extend by adding an overall parity check / puncture by deleting a parity check; lengthen by adding message coordinates / shorten by deleting message coordinates.]


Puncturing an extended code can return it to the original code (if the extended symbols are the ones punctured). Puncturing can reduce the weight of each codeword by its weight in the punctured positions. The minimum distance of a code is reduced by puncturing if the minimum weight codeword is punctured in a nonzero position. Puncturing an (n, k, d) code p times can result in a code with minimum distance as small as d − p.

Definition 3.44 A code is expurgated by deleting some of its codewords. It is possible to expurgate a linear code in such a way that it remains a linear code. The minimum distance of the code may increase. ◽

Example 3.45 If all the odd-weight codewords are deleted from the (7, 4) Hamming code, an even-weight subcode is obtained. ◽

Definition 3.46 A code is augmented by adding new codewords. It may be that the new code is not linear. The minimum distance of the code may decrease. ◽

Definition 3.47 A code is shortened by deleting a message symbol. This means that a row is removed from the generator matrix (corresponding to that message symbol) and a column is removed from the generator matrix (corresponding to the encoded message symbol). An (n, k) code becomes an (n − 1, k − 1) code. Shortened cyclic codes are discussed in more detail in Section 4.12. ◽

Definition 3.48 A code is lengthened by adding a message symbol. This means that a row is added to the generator matrix (for the message symbol) and a column is added to represent the coded message symbol. An (n, k) code becomes an (n + 1, k + 1) code. ◽

3.10 Best-Known Linear Block Codes

Tables of the best-known linear block codes are available. An early version appears in [292]. More recent tables can be found at [50].

3.11 Exercises

3.1 Find, by trial and error, a set of four binary codewords of length three such that each word is at least a distance of 2 from every other word.

3.2 Find a set of 16 binary words of length 7 such that each word is at least a distance of 3 from every other word. Hint: Hamming code.

3.3 Perhaps the simplest of all codes is the binary parity check code, an (n, n − 1) code, where k = n − 1. Given a message vector $\mathbf{m} = (m_0, m_1, \ldots, m_{k-1})$, the codeword is $\mathbf{c} = (m_0, m_1, \ldots, m_{k-1}, b)$, where $b = \sum_{j=0}^{k-1} m_j$ (arithmetic in GF(2)) is the parity bit. Such a code is called an even parity code, since all codewords have even parity, that is, an even number of 1 bits.
(a) Determine the minimum distance for this code.
(b) How many errors can this code correct? How many errors can this code detect?
(c) Determine a generator matrix for this code.
(d) Determine a parity check matrix for this code.


(e) Suppose that bit errors occur independently with probability $p_c$. The probability that a parity check is satisfied is the probability that an even number of bit errors occur in the received codeword. Verify the following expression for this probability:
$$\sum_{\substack{i=0 \\ i\ \mathrm{even}}}^{n} \binom{n}{i} p_c^i (1-p_c)^{n-i} = \frac{1 + (1-2p_c)^n}{2}.$$

3.4 For the (n, 1) repetition code, determine a parity check matrix.

3.5 [483] Let p = 0.1 be the probability that any bit in a received vector is incorrect. Compute the probability that the received vector contains undetected errors given the following encoding schemes:
(a) No code, word length n = 8.
(b) Even parity (see Exercise 3.3), word length n = 4.
(c) Odd parity, word length n = 9. (Is this a linear code?)
(d) Even parity, word length = n.

3.6 [272] Let $\mathcal{C}_1$ be an $(n_1, k, d_1)$ binary linear systematic code with generator $G_1 = [P_1\ I_k]$. Let $\mathcal{C}_2$ be an $(n_2, k, d_2)$ binary linear systematic code with generator $G_2 = [P_2\ I_k]$. Form the parity check matrix for an $(n_1 + n_2, k)$ code as
$$H = \begin{bmatrix} & P_1^T \\ I_{n_1+n_2-k} & I_k \\ & P_2^T \end{bmatrix}.$$
Show that this code has minimum distance at least $d_1 + d_2$.

3.7 The generator matrix for a code over GF(2) is given by
$$G = \begin{bmatrix} 1&1&1&0&1&0 \\ 1&0&0&1&1&1 \\ 0&0&1&0&1&1 \end{bmatrix}.$$
Find a generator matrix and parity check matrix for an equivalent systematic code.

3.8 The generator and parity check matrix for a binary code $\mathcal{C}$ are given by
$$G = \begin{bmatrix} 1&0&1&0&1&1 \\ 0&1&1&1&0&1 \\ 0&1&1&0&1&0 \end{bmatrix} \qquad H = \begin{bmatrix} 1&1&0&1&1&0 \\ 1&0&1&0&1&1 \\ 0&1&0&0&1&1 \end{bmatrix}. \qquad (3.23)$$
This code is small enough that it can be used to demonstrate several concepts from throughout the chapter.
(a) Verify that H is a parity check matrix for this generator.
(b) Draw a logic diagram schematic for an implementation of an encoder for the nonsystematic generator G using "and" and "xor" gates.
(c) Draw a logic diagram schematic for an implementation of a circuit that computes the syndrome.
(d) List the vectors in the orthogonal complement of the code (that is, the vectors in the dual code $\mathcal{C}^\perp$).
(e) Form the standard array for the code $\mathcal{C}$.
(f) Form the syndrome decoding table for $\mathcal{C}$.
(g) How many codewords are there of weight 0, 1, . . . , 6 in $\mathcal{C}$? Determine the weight enumerator A(z).
(h) Using the generator matrix in (3.23), find the codeword with $\mathbf{m} = [1, 1, 0]$ as message bits.
(i) Decode the received word $\mathbf{r} = [1, 1, 1, 0, 0, 1]$ using the generator of (3.23).
(j) Determine the weight enumerator for the dual code.
(k) Write down an explicit expression for $P_u(E)$ for this code. Evaluate this when p = 0.01.
(l) Write down an explicit expression for $P_d(E)$ for this code. Evaluate this when p = 0.01.
(m) Write down an explicit expression for P(E) for this code. Evaluate this when p = 0.01.
(n) Write down an explicit expression for P(E) for this code, assuming a bounded distance decoder is used. Evaluate this when p = 0.01.
(o) Write down an explicit expression for P(F) for this code. Evaluate this when p = 0.01.
(p) Determine the generator G for an extended code, in systematic form.
(q) Determine the generator for a code which has expurgated all codewords of odd weight. Then express it in systematic form.

3.9 [271] Let a binary systematic (8, 4) code have parity check equations
$$c_0 = m_1 + m_2 + m_3, \quad c_1 = m_0 + m_1 + m_2, \quad c_2 = m_0 + m_1 + m_3, \quad c_3 = m_0 + m_2 + m_3.$$
(a) Determine the generator matrix G for this code in systematic form. Also determine the parity check matrix H.
(b) Using Theorem 3.13, show that the minimum distance of this code is 4.
(c) Determine A(z) for this code. Determine B(z).
(d) Show that this is a self-dual code.

3.10 Show that a self-dual code has a generator matrix G which satisfies $GG^T = 0$.

3.11 Given a code with a parity check matrix H, show that the coset with syndrome $\mathbf{s}$ contains a vector of weight $w$ if and only if some linear combination of $w$ columns of H equals $\mathbf{s}$.

3.12 Show that all of the nonzero codewords of the $(2^m - 1, m)$ simplex code have weight $2^{m-1}$. Hint: Start with m = 2 and work by induction.

3.13 Show that (3.13) follows from (3.12).

3.14 Show that (3.16) follows from (3.15) using the MacWilliams identity.

3.15 Let $f(u_1, u_2) = u_1 u_2$, for $u_i \in \mathbb{F}_2$. Determine the Hadamard transform $\hat{f}$ of f.

3.16 The weight enumerator A(z) of (3.11) for a code $\mathcal{C}$ is sometimes written as
$$W_A(x, y) = \sum_{i=0}^{n} A_i x^{n-i} y^i.$$
In this problem, consider binary codes, q = 2.
(a) Show that $A(z) = W_A(x, y)|_{x=1, y=z}$.
(b) Let $W_B(x, y) = \sum_{i=0}^{n} B_i x^{n-i} y^i$ be the weight enumerator for the code dual to $\mathcal{C}$. Show that the MacWilliams identity can be written as
$$W_B(x, y) = \frac{1}{q^k} W_A(x + y, x - y)$$
or
$$W_A(x, y) = \frac{1}{q^{n-k}} W_B(x + y, x - y). \qquad (3.24)$$
(c) In the following subproblems, assume a binary code. Let x = 1 in (3.24). We can write
$$\sum_{i=0}^{n} A_i y^i = \frac{1}{2^{n-k}} \sum_{i=0}^{n} B_i (1 + y)^{n-i}(1 - y)^i. \qquad (3.25)$$
Set y = 1 in this and show that $\sum_{i=0}^{n} \frac{A_i}{2^k} = 1$. Justify this result.
(d) Now differentiate (3.25) with respect to y and set y = 1 to show that
$$\sum_{i=1}^{n} \frac{iA_i}{2^k} = \frac{1}{2}(n - B_1).$$
If $B_1 = 0$, this gives the average weight.
(e) Differentiate (3.25) $\nu$ times with respect to y and set y = 1 to show that
$$\sum_{i=\nu}^{n} \binom{i}{\nu}\frac{A_i}{2^k} = \frac{1}{2^\nu}\sum_{i=0}^{\nu}(-1)^i\binom{n-i}{\nu-i}B_i.$$
Hint: Define $(x)_n^+ = \ldots$

For a burst of length $l > n - k + 1$ starting at position i, all $2^{l-2}$ of the bursts are detectable except those of the form $e(x) = x^i a(x)g(x)$ for some $a(x) = a_0 + a_1 x + \cdots + a_{l-n+k-1}x^{l-n+k-1}$ with $a_0 = a_{l-n+k-1} = 1$. The number of undetectable bursts is $2^{l-n+k-2}$, so the fraction of undetectable bursts is $2^{-(n-k)}$.

Example 4.36 Let $g(x) = x^{16} + x^{15} + x^2 + 1$. This can be factored as $g(x) = (1 + x)(1 + x + x^{15})$, so the CRC is capable of detecting any odd number of bit errors. It can be shown that the smallest integer m such that g(x) divides $1 + x^m$ is m = 32767, so by Exercise 4.37, the CRC is able to detect any pattern of two errors (a double-error pattern) provided that the code block length $n \le 32767$. All burst errors of length 16 or less are detectable. Bursts of length 17 are detectable with probability 0.99997. Bursts of length ≥ 18 are detectable with probability 0.99998. ◽

Table 4.7 [489, p. 123, 364] lists commonly used generator polynomials for CRC codes of various lengths.

4.13.1 Byte-Oriented Encoding and Decoding Algorithms

The syndrome computation algorithms described above are well-adapted to bit-oriented hardware implementations. However, CRCs are frequently used to check the integrity of files or data packets on computer systems which are intrinsically byte-oriented. An algorithm is presented here which produces the same result as a bit-oriented algorithm, but which operates on a byte at a time. The algorithm is faster because it deals with larger pieces of data and also because it makes use of parity information which is computed in advance and stored. It thus has higher storage requirements than the bitwise encoding algorithm but lower operational complexity; for a degree 16 polynomial, 256 2-byte integers must be stored. Consider a block of N bytes of data, as in a file or a data packet. We think of the first byte of data as corresponding to higher powers of x in its polynomial representation, since polynomial division

Table 4.7: CRC Generators

CRC Code     Generator Polynomial
CRC-4        $g(x) = x^4 + x^3 + x^2 + x + 1$
CRC-7        $g(x) = x^7 + x^6 + x^4 + 1$
CRC-8        $g(x) = x^8 + x^7 + x^6 + x^4 + x^2 + 1$
CRC-12       $g(x) = x^{12} + x^{11} + x^3 + x^2 + x + 1$
CRC-ANSI     $g(x) = x^{16} + x^{15} + x^2 + 1$
CRC-CCITT    $g(x) = x^{16} + x^{12} + x^5 + 1$
CRC-SDLC     $g(x) = x^{16} + x^{15} + x^{13} + x^7 + x^4 + x^2 + x + 1$
CRC-24       $g(x) = x^{24} + x^{23} + x^{14} + x^{12} + x^8 + 1$
CRC-32a      $g(x) = x^{32} + x^{30} + x^{22} + x^{15} + x^{12} + x^{11} + x^7 + x^6 + x^5 + x$
CRC-32b      $g(x) = x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1$

requires dealing first with the highest powers of x. This convention allows the file to be processed in storage order. The data are stored in bytes, as in $d_0, d_1, \ldots, d_{N-1}$, where $d_i$ represents an 8-bit quantity. For a byte of data $d_i$, let $d_{i,7}, d_{i,6}, \ldots, d_{i,0}$ denote the bits, where $d_{i,0}$ is the least-significant bit (LSB) and $d_{i,7}$ is the most significant bit (MSB). The byte $d_i$, the "next byte" to be processed, has a corresponding polynomial representation
$$b_{i+1}(x) = d_{i,7}x^7 + d_{i,6}x^6 + \cdots + d_{i,1}x + d_{i,0}.$$
The algorithm described below reads in a byte of data and computes the CRC parity check information for all of the data up to that byte. It is described in terms of a CRC polynomial g(x) of degree 16, but generalization to other degrees is straightforward. For explicitness of examples, the generator polynomial $g(x) = x^{16} + x^{15} + x^2 + 1$ (CRC-ANSI) is used throughout.

Let $m^{[i]}(x)$ denote the message polynomial formed from the 8i bits of the first i data bytes $\{d_0, d_1, \ldots, d_{i-1}\}$,
$$m^{[i]}(x) = d_{0,7}x^{8i-1} + d_{0,6}x^{8i-2} + \cdots + d_{i-1,1}x + d_{i-1,0},$$
and let $p^{[i]}(x)$ denote the corresponding parity polynomial of degree $\le 15$,
$$p^{[i]}(x) = \underbrace{p^{[i]}_{15}x^{15} + p^{[i]}_{14}x^{14} + \cdots + p^{[i]}_{8}x^{8}}_{\text{upper byte, crc1}} + \underbrace{p^{[i]}_{7}x^{7} + \cdots + p^{[i]}_{1}x + p^{[i]}_{0}}_{\text{lower byte, crc0}}.$$
(For the future, note that this is divided into 2 bytes, an upper most significant byte and a lower least significant byte.) By the operation of cyclic encoding, $p^{[i]}(x) = R_{g(x)}[x^{16}m^{[i]}(x)]$, that is,
$$x^{16}m^{[i]}(x) = q(x)g(x) + p^{[i]}(x) \qquad (4.12)$$
for some quotient polynomial q(x).

Let us now augment the message by one more byte. This is done by shifting the current message polynomial eight positions (bits) and inserting the new byte in the empty bit positions. We can write
$$m^{[i+1]}(x) = \underbrace{x^8 m^{[i]}(x)}_{\text{shift 8 positions}} + \underbrace{b_{i+1}(x)}_{\text{add new byte}},$$


where $b_{i+1}(x)$ represents the new data. The new parity polynomial is computed by
$$p^{[i+1]}(x) = R_{g(x)}[x^{16}m^{[i+1]}(x)] = R_{g(x)}[x^{16}(x^8 m^{[i]}(x) + b_{i+1}(x))]. \qquad (4.13)$$
Using (4.12), we can write (4.13) as
$$p^{[i+1]}(x) = R_{g(x)}[x^8 g(x)q(x) + x^8 p^{[i]}(x) + x^{16}b_{i+1}(x)].$$
This can be expanded as
$$p^{[i+1]}(x) = R_{g(x)}[x^8 g(x)q(x)] + R_{g(x)}[x^8 p^{[i]}(x) + x^{16}b_{i+1}(x)],$$
or
$$p^{[i+1]}(x) = R_{g(x)}[x^8 p^{[i]}(x) + x^{16}b_{i+1}(x)]$$
since g(x) evenly divides $x^8 g(x)q(x)$. The argument can be expressed in expanded form as
$$x^8 p^{[i]}(x) + x^{16}b_{i+1}(x) = p^{[i]}_{15}x^{23} + p^{[i]}_{14}x^{22} + \cdots + p^{[i]}_{1}x^{9} + p^{[i]}_{0}x^{8} + d_{i,7}x^{23} + d_{i,6}x^{22} + \cdots + d_{i,1}x^{17} + d_{i,0}x^{16}$$
$$= (d_{i,7} + p^{[i]}_{15})x^{23} + (d_{i,6} + p^{[i]}_{14})x^{22} + \cdots + (d_{i,0} + p^{[i]}_{8})x^{16} + p^{[i]}_{7}x^{15} + p^{[i]}_{6}x^{14} + \cdots + p^{[i]}_{0}x^{8}.$$
Let $t_j = d_{i,j} + p^{[i]}_{j+8}$ for $j = 0, 1, \ldots, 7$. The 8 bits of t are the 8 bits in the new byte plus the eight MSBs of $p^{[i]}(x)$. Then
$$x^8 p^{[i]}(x) + x^{16}b_{i+1}(x) = t_7 x^{23} + t_6 x^{22} + \cdots + t_0 x^{16} + p^{[i]}_{7}x^{15} + p^{[i]}_{6}x^{14} + \cdots + p^{[i]}_{0}x^{8}.$$
The updated parity is thus
$$p^{[i+1]}(x) = R_{g(x)}[t_7 x^{23} + t_6 x^{22} + \cdots + t_0 x^{16} + p^{[i]}_{7}x^{15} + \cdots + p^{[i]}_{0}x^{8}]$$
$$= R_{g(x)}[t_7 x^{23} + t_6 x^{22} + \cdots + t_0 x^{16}] + R_{g(x)}[p^{[i]}_{7}x^{15} + \cdots + p^{[i]}_{0}x^{8}]$$
$$= R_{g(x)}[t_7 x^{23} + t_6 x^{22} + \cdots + t_0 x^{16}] + p^{[i]}_{7}x^{15} + p^{[i]}_{6}x^{14} + \cdots + p^{[i]}_{0}x^{8},$$
where the last equality follows since the degree of the argument of the second $R_{g(x)}$ is less than the degree of g(x). There are $2^8 = 256$ possible remainders of the form
$$R_{g(x)}[t_7 x^{23} + t_6 x^{22} + \cdots + t_0 x^{16}]. \qquad (4.14)$$
For each 8-bit combination $t = (t_7, t_6, \ldots, t_0)$, the remainder in (4.14) can be computed and stored in advance. For example, when t = 1 (i.e., $t_0 = 1$ and the other $t_i$ are 0), we find $R_{g(x)}[x^{16}] = x^{15} + x^2 + 1$, which has the representation in bits [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1], or in hex, 8005. Table 4.8 shows the remainder values for all 256 possible values of t, where the hex number R(t) represents the bits of the syndrome. Let $\tilde{t}(x) = t_7 x^{23} + t_6 x^{22} + \cdots + t_0 x^{16}$ and let $R(t) = R_{g(x)}[\tilde{t}(x)]$ (i.e., the polynomial represented by the data in Table 4.8). The encoding update rule is summarized as
$$p^{[i+1]}(x) = \underbrace{p^{[i]}_{7}x^{15} + p^{[i]}_{6}x^{14} + \cdots + p^{[i]}_{0}x^{8}}_{\text{upper byte, new crc1}} + \underbrace{R(t)}_{\text{lower byte, new crc0}}.$$

Table 4.8: Lookup Table for CRC-ANSI

t 0 8 10 18 20 28 30 38 40 48 50 58 60 68 70 78 80 88 90 98 a0 a8 b0 b8 c0 c8 d0 d8 e0 e8 f0 f8

R(t) 0000 8033 8063 0050 80c3 00f0 00a0 8093 8183 01b0 01e0 81d3 0140 8173 8123 0110 8303 0330 0360 8353 03c0 83f3 83a3 0390 0280 82b3 82e3 02d0 8243 0270 0220 8213

t 1 9 11 19 21 29 31 39 41 49 51 59 61 69 71 79 81 89 91 99 a1 a9 b1 b9 c1 c9 d1 d9 e1 e9 f1 f9

R(t) 8005 0036 0066 8055 00c6 80f5 80a5 0096 0186 81b5 81e5 01d6 8145 0176 0126 8115 0306 8335 8365 0356 83c5 03f6 03a6 8395 8285 02b6 02e6 82d5 0246 8275 8225 0216

t 2 a 12 1a 22 2a 32 3a 42 4a 52 5a 62 6a 72 7a 82 8a 92 9a a2 aa b2 ba c2 ca d2 da e2 ea f2 fa

R(t) 800f 003c 006c 805f 00cc 80ff 80af 009c 018c 81bf 81ef 01dc 814f 017c 012c 811f 030c 833f 836f 035c 83cf 03fc 03ac 839f 828f 02bc 02ec 82df 024c 827f 822f 021c

t 3 b 13 1b 23 2b 33 3b 43 4b 53 5b 63 6b 73 7b 83 8b 93 9b a3 ab b3 bb c3 cb d3 db e3 eb f3 fb

R(t) 000a 8039 8069 005a 80c9 00fa 00aa 8099 8189 01ba 01ea 81d9 014a 8179 8129 011a 8309 033a 036a 8359 03ca 83f9 83a9 039a 028a 82b9 82e9 02da 8249 027a 022a 8219

t 4 c 14 1c 24 2c 34 3c 44 4c 54 5c 64 6c 74 7c 84 8c 94 9c a4 ac b4 bc c4 cc d4 dc e4 ec f4 fc

R(t) 801b 0028 0078 804b 00d8 80eb 80bb 0088 0198 81ab 81fb 01c8 815b 0168 0138 810b 0318 832b 837b 0348 83db 03e8 03b8 838b 829b 02a8 02f8 82cb 0258 826b 823b 0208

t 5 d 15 1d 25 2d 35 3d 45 4d 55 5d 65 6d 75 7d 85 8d 95 9d a5 ad b5 bd c5 cd d5 dd e5 ed f5 fd

R(t) 001e 802d 807d 004e 80dd 00ee 00be 808d 819d 01ae 01fe 81cd 015e 816d 813d 010e 831d 032e 037e 834d 03de 83ed 83bd 038e 029e 82ad 82fd 02ce 825d 026e 023e 820d

t 6 e 16 1e 26 2e 36 3e 46 4e 56 5e 66 6e 76 7e 86 8e 96 9e a6 ae b6 be c6 ce d6 de e6 ee f6 fe

R(t) 0014 8027 8077 0044 80d7 00e4 00b4 8087 8197 01a4 01f4 81c7 0154 8167 8137 0104 8317 0324 0374 8347 03d4 83e7 83b7 0384 0294 82a7 82f7 02c4 8257 0264 0234 8207

t 7 f 17 1f 27 2f 37 3f 47 4f 57 5f 67 6f 77 7f 87 8f 97 9f a7 af b7 bf c7 cf d7 df e7 ef f7 ff

R(t) 8011 0022 0072 8041 00d2 80e1 80b1 0082 0192 81a1 81f1 01c2 8151 0162 0132 8101 0312 8321 8371 0342 83d1 03e2 03b2 8381 8291 02a2 02f2 82c1 0252 8261 8231 0202

Values for t and R(t) are expressed in hex.

In words: The lower (least significant) byte of the updated parity is from R(t), looked up in the table. The upper (most significant) byte of the updated parity is obtained from the lower byte of the previous parity, shifted over 8 bits.

The algorithm described above in terms of polynomials can be efficiently implemented in terms of byte-oriented arithmetic on a computer. The parity check information is represented in 2 bytes, crc1 and crc0, with crc1 representing the high-order parity byte. Together, [crc1,crc0] forms the 2-byte (16-bit) parity. Also, let R(t) denote the 16-bit parity corresponding to t, as in Table 4.8. The operation ⊕ indicates bitwise modulo-2 addition (i.e., exclusive or). The fast CRC algorithm is summarized in Algorithm 4.1.

Algorithm 4.1 Fast CRC encoding for a stream of bytes
Input: A sequence of bytes $d_0, d_1, \ldots, d_{N-1}$.
1  Initialization: Clear the parity information: Set [crc1,crc0] = [0,0].
2  For i = 0 to N − 1:
3      Compute t = $d_i$ ⊕ crc1
4      [crc1,crc0] = [crc0,0] ⊕ R(t)
5  End
6  Output: Return the 16-bit parity [crc1,crc0].

Example 4.37 A file consists of 2 bytes of data, $d_0 = 109$ and $d_1 = 39$, or in hexadecimal notation, $d_0 = \mathrm{6D}_{16}$ and $d_1 = \mathrm{27}_{16}$. This corresponds to the bits
$$\underbrace{0110\ 1101}_{d_0}\ \underbrace{0010\ 0111}_{d_1},$$

with the LSB on the right, or, equivalently, the polynomial
$$x^{14} + x^{13} + x^{11} + x^{10} + x^8 + x^5 + x^2 + x + 1.$$
(It is the same data as in Example 4.35.) The steps of the algorithm are as follows:

1  [crc1,crc0] = [0,0]
2  i = 0:
3      t = $d_0$ ⊕ crc1 = $\mathrm{6D}_{16}$ ⊕ 0 = $\mathrm{6D}_{16}$
4      [crc1,crc0] = [crc0,0] ⊕ R(t) = [0,0] ⊕ $\mathrm{816D}_{16}$
2  i = 1:
3      t = $d_1$ ⊕ crc1 = $\mathrm{27}_{16}$ ⊕ $\mathrm{81}_{16}$ = $\mathrm{A6}_{16}$
4      [crc1,crc0] = [crc0,0] ⊕ R(t) = [$\mathrm{6D}_{16}$, 0] ⊕ $\mathrm{03D4}_{16}$ = $\mathrm{6ED4}_{16}$
6  Return $\mathrm{6ED4}_{16}$.

The return value corresponds to the bits 0110 1110 1101 0100, which has polynomial representation
$$p(x) = x^{14} + x^{13} + x^{11} + x^{10} + x^9 + x^7 + x^6 + x^4 + x^2.$$
This is the same parity as obtained in Example 4.35. ◽
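Algorithm 4.1 and the table entries of (4.14) can be exercised with a few lines of C++. The sketch below (variable names are illustrative; it is not the FastCRC16 class developed in the laboratory) builds R(t) by bit-wise division by $g(x) = x^{16} + x^{15} + x^2 + 1$ (feedback taps 0x8005) and then applies the byte-wise update of Algorithm 4.1; run on the two bytes of Example 4.37 it reproduces $R(\mathrm{6D}_{16}) = \mathrm{816D}_{16}$, $R(\mathrm{A6}_{16}) = \mathrm{03D4}_{16}$, and the parity $\mathrm{6ED4}_{16}$.

#include <cstdint>
#include <cstdio>

static uint16_t R[256];   // lookup table of remainders, as in Table 4.8

// Build R(t) = remainder of t(x) x^16 divided by g(x) = x^16 + x^15 + x^2 + 1.
void build_table() {
    for (int t = 0; t < 256; t++) {
        uint16_t r = (uint16_t)(t << 8);
        for (int k = 0; k < 8; k++)   // divide out the eight bits of t, MSB first
            r = (r & 0x8000) ? (uint16_t)((r << 1) ^ 0x8005) : (uint16_t)(r << 1);
        R[t] = r;
    }
}

// Algorithm 4.1: byte-oriented CRC for a stream of bytes.
uint16_t fast_crc(const uint8_t* d, int N) {
    uint8_t crc1 = 0, crc0 = 0;              // [crc1,crc0] = [0,0]
    for (int i = 0; i < N; i++) {
        uint8_t t = d[i] ^ crc1;             // line 3: t = d_i XOR crc1
        uint16_t r = R[t];
        crc1 = (uint8_t)(crc0 ^ (r >> 8));   // line 4: [crc1,crc0] = [crc0,0] XOR R(t)
        crc0 = (uint8_t)(r & 0xFF);
    }
    return (uint16_t)((crc1 << 8) | crc0);
}

int main() {
    build_table();
    uint8_t data[2] = {0x6D, 0x27};          // the file of Example 4.37
    std::printf("R(6D)=%04X R(A6)=%04X CRC=%04X\n", R[0x6D], R[0xA6], fast_crc(data, 2));
    return 0;
}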

4.13.2 CRC Protecting Data Files or Data Packets

When protecting data files or data packets using CRC codes, the CRC codeword length is selected in bytes. As suggested in Example 4.36, for the CRC-16 code the number of bits in the codeword should be less than 32767, so the number of bytes in the codeword should be less than 4095. That is, the number of message bytes should be less than 4093. Let K denote the number of message bytes and let N denote the number of code bytes; for example, N = K + 2. The encoded file is, naturally, longer than the unencoded file, since parity bytes are included.

In encoding a file, the file is divided into blocks of length K. Each block of data is written to an encoded file, followed by the parity bytes. At the end of the file, if the number of bytes available is less than K, a shorter block is written out, followed by its parity bytes.

In decoding (or checking) a file, blocks of N bytes are read in and the parity for the block is computed. If the parity is not zero, one or more errors have been detected in that block. At the end of the file, if the block size is shorter, the appropriate block length read in is used.

Appendix 4.A Linear Feedback Shift Registers

Closely related to polynomial division is the linear feedback shift register (LFSR). This is simply a divider with no input — the output is computed based on the initial condition of its storage elements.

With proper feedback connections, the LFSR can be used to produce a sequence with many properties of random noise sequences (for example, the correlation function approximates a δ function). These pseudonoise sequences are widely used in spread spectrum communication and as synchronization sequences in common modem protocols. The LFSR can also be used to provide an important part of the representation of Galois fields, which are fundamental to many error correction codes (see Chapter 5). The LFSR also reappears in the context of decoding algorithms for BCH and Reed-Solomon codes, where an important problem is to determine a shortest LFSR and its initial condition which could produce a given output sequence (see Chapter 6).

Appendix 4.A.1 Basic Concepts

A binary LFSR circuit is built using a polynomial division circuit with no input. Eliminating the input to the division circuit of Figure 4.6, we obtain the LFSR shown in Figure 4.25. Since there is no input, the output generated is due to the initial state of the registers. Since there are only a finite number of possible states for this digital device, the circuit must eventually return to a previous state. The number of steps before a state reappears is called the period of the sequence generated by the circuit. A binary LFSR with p storage elements has $2^p$ possible states. Since the all-zero state never changes, it is removed from consideration, so the longest possible period is $2^p - 1$.

[Figure 4.25: Linear feedback shift register, with delay elements D, feedback coefficients $g_0, g_1, g_2, \ldots, g_{p-1}, g_p = 1$, output sequence $\{y_m\}$, and internal nodes labeled a, b, and d.]

Example 4.38 Figure 4.26 illustrates the LFSR with connection polynomial $g(x) = 1 + x + x^2 + x^4$. Table 4.9 shows the sequence of states and the output of the LFSR when it is loaded with the initial condition (1, 0, 0, 0). The sequence of states repeats after seven steps, so the output sequence is periodic with period 7. Table 4.10 shows the sequence of states for the same connection polynomial when the LFSR is loaded with the initial condition (1, 1, 0, 0), which again repeats after seven steps. Of the 15 possible nonzero states of the LFSR, these two sequences exhaust all but one of the possible states. The sequence for the last remaining state, corresponding to an initial condition (1, 0, 1, 1), is shown in Table 4.11; this repeats after only one step.

[Figure 4.26: Linear feedback shift register with $g(x) = 1 + x + x^2 + x^4$.] ◽

Example 4.39 Figure 4.27 illustrates the LFSR with connection polynomial $g(x) = 1 + x + x^4$. Table 4.12 shows the sequence of states and the output of the LFSR when it is loaded with the initial condition (1, 0, 0, 0). In this case, the shift register sequences through 15 states before it repeats. ◽

Definition 4.40 A sequence generated by a connection polynomial g(x) of degree n is said to be a maximal length sequence if the period of the sequence is $2^n - 1$. ◽


Table 4.9: LFSR Example with $g(x) = 1 + x + x^2 + x^4$ and Initial State 1

Count   State      Output
0       1 0 0 0    0
1       0 1 0 0    0
2       0 0 1 0    0
3       0 0 0 1    1
4       1 1 1 0    0
5       0 1 1 1    1
6       1 1 0 1    1
7       1 0 0 0    0

Table 4.10: LFSR Example with $g(x) = 1 + x + x^2 + x^4$ and Initial State $1 + x$

Count   State      Output
0       1 1 0 0    0
1       0 1 1 0    0
2       0 0 1 1    1
3       1 1 1 1    1
4       1 0 0 1    1
5       1 0 1 0    0
6       0 1 0 1    1
7       1 1 0 0    0

Table 4.11: LFSR Example with $g(x) = 1 + x + x^2 + x^4$ and Initial State $1 + x^2 + x^3$

Count   State      Output
0       1 0 1 1    1
1       1 0 1 1    1

Thus, the output sequence of Example 4.39 is a maximal-length sequence, while the output sequences of Example 4.38 are not. A connection polynomial which produces a maximal-length sequence is a primitive polynomial. A program to exhaustively search for primitive polynomials modulo p for arbitrary (small) p is primfind.

primitive.txt
primfind.c

The sequence of outputs of the LFSR satisfies the equation
$$y_m = \sum_{j=0}^{p-1} g_j y_{m-p+j}. \qquad (4.15)$$

This may be seen as follows. Denote the output sequence of the LFSR in Figure 4.25 by $\{y_m\}$. For immediate convenience, assume that the sequence is infinite $\{\ldots, y_{-2}, y_{-1}, y_0, y_1, y_2, \ldots\}$ and represent this sequence as a formal power series
$$y(x) = \sum_{i=-\infty}^{\infty} y_i x^i.$$

Consider the output at point “a” in Figure 4.25. Because of the delay x, at point “a” the signal is g0 xy(x),



[Figure 4.27: Linear feedback shift register with $g(x) = 1 + x + x^4$.]

Table 4.12: LFSR Example with $g(x) = 1 + x + x^4$ and Initial State 1

Count   State      Output
0       1 0 0 0    0
1       0 1 0 0    0
2       0 0 1 0    0
3       0 0 0 1    1
4       1 1 0 0    0
5       0 1 1 0    0
6       0 0 1 1    1
7       1 1 0 1    1
8       1 0 1 0    0
9       0 1 0 1    1
10      1 1 1 0    0
11      0 1 1 1    1
12      1 1 1 1    1
13      1 0 1 1    1
14      1 0 0 1    1
15      1 0 0 0    0

where the factor x represents the delay through the memory element. At point "b," the signal is $g_0 x^2 y(x) + g_1 x y(x)$. Continuing likewise through the system, at the output point "d" the signal is $(g_0 x^p + g_1 x^{p-1} + \cdots + g_{p-1}x)y(x)$, which is the same as the output signal:
$$(g_0 x^p + g_1 x^{p-1} + \cdots + g_{p-1}x)y(x) = y(x). \qquad (4.16)$$

Equation (4.16) can be true only if coefficients of corresponding powers of x match. This produces the relationship
$$y_i = \sum_{j=0}^{p-1} g_j y_{i-p+j}. \qquad (4.17)$$

Letting $g_j^* = g_{p-j}$, (4.17) can be written in the somewhat more familiar form as a convolution,
$$y_i = \sum_{j=1}^{p} g_j^* y_{i-j}. \qquad (4.18)$$
Equation (4.18) can also be rewritten as
$$\sum_{j=0}^{p} g_j^* y_{i-j} = 0 \qquad (4.19)$$
with the stipulation that $g_0^* = 1$.


The polynomial $g^*(x)$ with coefficients $g_j^* = g_{p-j}$ is sometimes referred to as the reciprocal polynomial. That is, $g^*(x)$ has its coefficients in the reverse order from g(x). (The term "reciprocal" does not mean that $g^*(x)$ is a multiplicative inverse of g(x); it is just a conventional name.) The reciprocal polynomial of g(x) is denoted as $g^*(x)$. If g(x) is a polynomial of degree p with nonzero constant term (i.e., $g_0 = 1$ and $g_p = 1$), then the reciprocal polynomial can be obtained by $g^*(x) = x^p g(1/x)$. It is clear in this case that the reciprocal of the reciprocal is the same as the original polynomial. However, if $g_0 = 0$, then the degree of $x^p g(1/x)$ is less than the degree of g(x) and this latter statement is not true.

With the understanding that the output sequence is periodic with period $2^p - 1$, so that $y_{-1} = y_{2^p-2}$, (4.19) is true for all $i \in \mathbb{Z}$. Because the sum in (4.19) is equal to 0 for all i, the polynomial $g^*(x)$ is said to be an annihilator of y(x).

Example 4.41 For the coefficient polynomial $g(x) = 1 + x + x^2 + x^4$, the reciprocal polynomial is $g^*(x) = 1 + x^2 + x^3 + x^4$ and the LFSR relationship is
$$y_i = y_{i-2} + y_{i-3} + y_{i-4} \qquad \text{for all } i \in \mathbb{Z}. \qquad (4.20)$$
For the output sequence of Table 4.9, $\{0, 0, 0, 1, 0, 1, 1, 0, \ldots\}$, it may be readily verified that (4.20) is satisfied. ◽

The LFSR circuit diagram is sometimes expressed in terms of the reciprocal polynomials, as shown in Figure 4.28. It is important to be careful of the conventions used.

[Figure 4.28: Linear feedback shift register, reciprocal polynomial convention, with feedback coefficients $g_p^*, g_{p-1}^*, \ldots, g_1^*, g_0^* = 1$.]

Appendix 4.A.2 Connection With Polynomial Division

The output sequence produced by an LFSR has a connection with polynomial long division. To illustrate this, let us take $g(x) = 1 + x + x^2 + x^4$, as in Example 4.38. The reciprocal polynomial is $g^*(x) = 1 + x^2 + x^3 + x^4$. Let the dividend polynomial be $d(x) = x^3$. (The relationship between the sequence and the dividend is explored in Exercise 4.43.) The power series obtained by dividing d(x) by $g^*(x)$, with $g^*(x)$ written in order of increasing degree, is obtained by formal long division:

The quotient polynomial corresponds to the sequence $\{0, 0, 0, 1, 0, 1, 1, \ldots\}$, the same as the output sequence shown in Table 4.9.

Let $y_0, y_1, \ldots$ be an infinite sequence produced by an LFSR, which we represent with $y(x) = y_0 + y_1 x + y_2 x^2 + \cdots = \sum_{n=0}^{\infty} y_n x^n$. Furthermore, represent the initial state of the shift register as $y_{-1}, y_{-2}, \ldots, y_{-p}$. Using the recurrence relation (4.18) we have
$$y(x) = \sum_{n=0}^{\infty}\sum_{j=1}^{p} g_j^* y_{n-j} x^n = \sum_{j=1}^{p} g_j^* x^j \sum_{n=0}^{\infty} y_{n-j} x^{n-j}$$
$$= \sum_{j=1}^{p} g_j^* x^j \left[ (y_{-j}x^{-j} + \cdots + y_{-1}x^{-1}) + \sum_{n=0}^{\infty} y_n x^n \right]$$
$$= \sum_{j=1}^{p} g_j^* x^j \left[ (y_{-j}x^{-j} + \cdots + y_{-1}x^{-1}) + y(x) \right],$$
so that
$$y(x) = \frac{\sum_{j=1}^{p} g_j^* x^j (y_{-j}x^{-j} + \cdots + y_{-1}x^{-1})}{1 - \sum_{j=1}^{p} g_j^* x^j} = \frac{\sum_{j=1}^{p} g_j^* (y_{-j} + y_{-j+1}x + \cdots + y_{-1}x^{j-1})}{g^*(x)}. \qquad (4.21)$$

Example 4.42 Returning to Example 4.38, with the periodic sequence $y_{-1} = 1$, $y_{-2} = 1$, $y_{-3} = 0$, and $y_{-4} = 1$. From (4.21), we find
$$y(x) = \frac{x^3}{g^*(x)} = \frac{x^3}{1 + x^2 + x^3 + x^4},$$
as before. ◽
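The expansion of $x^3/g^*(x)$ as a power series is easy to check numerically. The following sketch (array names are illustrative) performs the formal division over GF(2) using the recursion $q_i = d_i + \sum_{j\ge 1} g_j^* q_{i-j}$, which follows from matching coefficients in $q(x)g^*(x) = d(x)$; it prints two periods of the quotient, matching the output sequence 0, 0, 0, 1, 0, 1, 1 of Table 4.9.

#include <cstdio>

int main() {
    // g*(x) = 1 + x^2 + x^3 + x^4 and dividend d(x) = x^3, as in Example 4.42.
    int gstar[5] = {1, 0, 1, 1, 1};
    int d[14] = {0, 0, 0, 1};                 // remaining coefficients are zero
    int q[14] = {0};
    for (int i = 0; i < 14; i++) {            // q_i = d_i + sum_{j=1..4} g*_j q_{i-j}  (mod 2)
        int s = d[i];
        for (int j = 1; j <= 4 && j <= i; j++) s ^= gstar[j] & q[i - j];
        q[i] = s;
    }
    for (int i = 0; i < 14; i++) std::printf("%d ", q[i]);   // 0 0 0 1 0 1 1 0 0 0 1 0 1 1
    std::printf("\n");
    return 0;
}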

Theorem 4.43 Let y(x) be produced by an LFSR with connection polynomial g(x) of degree p. If y(x) is periodic with period N, then $g^*(x) \mid (x^N - 1)d(x)$, where d(x) is a polynomial of degree < p.

Proof By the results above, $y(x) = \frac{d(x)}{g^*(x)}$ for a polynomial d(x) with $\deg(d(x)) < p$. If y(x) is periodic, then
$$y(x) = (y_0 + y_1 x + \cdots + y_{N-1}x^{N-1}) + x^N(y_0 + y_1 x + \cdots + y_{N-1}x^{N-1}) + x^{2N}(y_0 + y_1 x + \cdots + y_{N-1}x^{N-1}) + \cdots$$
$$= (y_0 + y_1 x + \cdots + y_{N-1}x^{N-1})(1 + x^N + x^{2N} + \cdots) = -\frac{y_0 + y_1 x + \cdots + y_{N-1}x^{N-1}}{x^N - 1}.$$
So
$$\frac{d(x)}{g^*(x)} = -\frac{y_0 + y_1 x + \cdots + y_{N-1}x^{N-1}}{x^N - 1}$$
or $g^*(x)(y_0 + y_1 x + \cdots + y_{N-1}x^{N-1}) = -d(x)(x^N - 1)$, establishing the result. ◽

For a given d(x), the period is the smallest N such that $g^*(x) \mid (x^N - 1)d(x)$.

Example 4.44 The polynomial $g^*(x) = 1 + x^2 + x^3 + x^4$ can be factored as
$$g^*(x) = (1 + x)(1 + x + x^3).$$


Taking N = 1 and $d(x) = 1 + x + x^3$, we see that y(x) has period 1. This is the sequence shown in Table 4.11. We note that $g^*(x) \mid x^7 - 1$, so that any d(x) of appropriate degree will serve. ◽

As a sort of converse to the previous theorem, we have the following.

Theorem 4.45 If $g^*(x) \mid x^N - 1$, then $y(x) = \frac{1}{g^*(x)}$ is periodic with period N or some divisor of N.

Proof Let $q(x) = \frac{x^N - 1}{g^*(x)} = y_0 + y_1 x + \cdots + y_{N-1}x^{N-1}$. Then
$$\frac{1}{g^*(x)} = \frac{y_0 + y_1 x + \cdots + y_{N-1}x^{N-1}}{1 - x^N} = (y_0 + y_1 x + \cdots + y_{N-1}x^{N-1})(1 + x^N + x^{2N} + \cdots)$$
$$= (y_0 + y_1 x + \cdots + y_{N-1}x^{N-1}) + x^N(y_0 + y_1 x + \cdots + y_{N-1}x^{N-1}) + \cdots,$$
which represents a periodic sequence. ◽

Theorem 4.46 If the sequence y(x) produced by the connection polynomial g(x) of degree p has period $2^p - 1$, that is, y(x) is a maximal-length sequence, then $g^*(x)$ is irreducible.

Proof Since the shift register moves through $2^p - 1$ states before repeating, the shift register must progress through all possible nonzero conditions. Therefore, there is some "initial condition" corresponding to d(x) = 1. Without loss of generality we can take $y(x) = 1/g^*(x)$. Suppose that $g^*(x)$ factors as $a^*(x)b^*(x)$, where $\deg(a^*(x)) = p_1$ and $\deg(b^*(x)) = p_2$, with $p_1 + p_2 = p$. Then
$$y(x) = \frac{1}{g^*(x)} = \frac{1}{a^*(x)b^*(x)} = \frac{c(x)}{a^*(x)} + \frac{d(x)}{b^*(x)}$$
by partial fraction expansion. $c(x)/a^*(x)$ represents a series with period at most $2^{p_1} - 1$ and $d(x)/b^*(x)$ represents a series with period at most $2^{p_2} - 1$. The period of the sum $\frac{c(x)}{a^*(x)} + \frac{d(x)}{b^*(x)}$ is at most the least common multiple of these periods, which is at most the product of the periods, $(2^{p_1} - 1)(2^{p_2} - 1) \le 2^p - 3$. But this is less than the period $2^p - 1$, so $g^*(x)$ must not have such factors. ◽

As mentioned above, irreducibility does not imply maximal length. The polynomial $g^*(x) = 1 + x + x^2 + x^3 + x^4$ divides $x^5 + 1$. But by Theorem 4.45, $y(x) = 1/g^*(x)$ has period 5, instead of the period 15 that a maximal-length sequence would have. What is needed is for the polynomial to be primitive.

Appendix 4.A.3 Some Algebraic Properties of Shift Sequences

Let y(x) be a sequence with period N. Then y(x) can be considered an element of $R_N = GF(2)[x]/(x^N - 1)$. Let g(x) be a connection polynomial and $g^*(x)$ be its reciprocal. Let $w(x) = g^*(x)y(x)$, where computation occurs in the ring $R_N$, and let $w(x) = w_0 + w_1 x + \cdots + w_{N-1}x^{N-1}$. The coefficient $w_i$ of this polynomial is computed by
$$w_i = \sum_{j=0}^{p} g_j^* y_{i-j}.$$

However, by (4.19), this is equal to 0. That is, $g^*(x)y(x) = 0$ in the ring $R_N$. In this case, we say that $g^*(x)$ annihilates the sequence y(x). Let $V(g^*)$ be the set of sequences annihilated by $g^*(x)$. We observe that $V(g^*)$ is an ideal in the ring $R_N$ and has a generator $h^*(x)$ which must divide $x^N - 1$. The generator


$h^*(x)$ is the polynomial factor of $(x^N - 1)/g^*(x)$ of smallest positive degree. If $(x^N - 1)/g^*(x)$ is irreducible, then $h^*(x) = (x^N - 1)/g^*(x)$.

Example 4.47 Let $g(x) = 1 + x + x^2 + x^4$, as in Example 4.38. Then $g^*(x) = 1 + x^2 + x^3 + x^4$. This polynomial divides $x^7 + 1$:
$$h^*(x) = \frac{x^7 + 1}{g^*(x)} = 1 + x^2 + x^3.$$
The polynomial $y(x) = h^*(x)$ corresponds to the output sequence 1, 0, 1, 1, 0, 0, 0 and its cyclic shifts, which appears in Table 4.9. The polynomial $y(x) = (1 + x)h^*(x) = 1 + x + x^2 + x^4$ corresponds to the sequence 1, 1, 1, 0, 1, 0, 0 and its cyclic shifts, which appears in Table 4.10. The polynomial $y(x) = (1 + x + x^3)h^*(x) = 1 + x + x^2 + x^3 + x^4 + x^5 + x^6$ corresponds to the sequence 1, 1, 1, 1, 1, 1, 1 and its cyclic shifts. This sequence appears in Table 4.11. This sequence also happens to have period 1. ◽

Example 4.48 For the generator polynomial $g(x) = 1 + x + x^4$, the reciprocal is $g^*(x) = 1 + x^3 + x^4$. This polynomial divides $x^{15} + 1$:
$$h^*(x) = \frac{x^{15} + 1}{g^*(x)} = 1 + x^3 + x^4 + x^6 + x^8 + x^9 + x^{10} + x^{11}.$$

The polynomial $y(x) = h^*(x)$ corresponds to the sequence 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, which appears in Table 4.12. ◽

Programming Laboratory 2: Polynomial Division and Linear Feedback Shift Registers

Objective

Computing quotients and remainders in polynomial division is an important computational step for encoding and decoding cyclic codes. In this lab, you are to create a C++ class which performs these operations for binary polynomials. You will also create an LFSR class, which will be used in the construction of a Galois field class.

Preliminary Exercises

Reading: Sections 4.9 and Appendix 4.A.1.

1) Let $g(x) = x^4 + x^3 + x + 1$ and $d(x) = x^8 + x^7 + x^5 + x^4 + x^3 + x + 1$.
(a) Perform polynomial long division of d(x) by g(x), computing the quotient and remainder, as in Example 4.24.
(b) Draw the circuit configuration for dividing by g(x).
(c) Trace the operation of the circuit for the g(x) and d(x) given, identifying the polynomials represented by the shift register contents at each step of the algorithm, as in Table 4.3. Also, identify the quotient and the remainder produced by the circuit.

2) For the connection polynomial $g(x) = x^4 + x^3 + x + 1$, trace the LFSR when the initial register contents are (1, 0, 0, 0), as in Example 4.38. Also, if this does not exhaust all possible 15 states of the LFSR, determine other initial states and the sequences they generate.

Programming Part: BinLFSR

Create a C++ class BinLFSR which implements an LFSR for a connection polynomial of degree ≤ 32 (or the number of bits stored in an unsigned int). Store the state of the LFSR in an unsigned int variable. Create a constructor with arguments

BinLFSR(unsigned int g, int n, unsigned int initstate=1);

The first argument g is a representation of the connection polynomial. For example, g = 0x17 represents the bits 10111, which represents the polynomial $g(x) = x^4 + x^2 + x + 1$. The second argument n is the degree of the connection polynomial. The third argument has a default value, corresponding to the initial state (1, 0, 0, . . . , 0). Use a single unsigned int internally to hold the state of the shift register. The class should have member functions as follows:

BinLFSR(void) { g=n=state=mask=mask1=0; }   // default constructor
BinLFSR(unsigned int g, int n, unsigned int initstate=1);   // constructor
~BinLFSR() {};   // destructor
void setstate(unsigned int state);
   // Set the initial state of the LFSR
unsigned char step(void);
   // Step the LFSR one step, and return 1-bit output
unsigned char step(unsigned int &state);
   // Step the LFSR one step, return 1-bit output and the new state
void steps(int nstep, unsigned char *outputs);
   // Step the LFSR nstep times, returning the array of 1-bit outputs

Test the class as follows:

1. Use the LFSR class to generate the three sequences of Example 4.38.
2. Use the LFSR class to generate the output sequence and the sequence of states shown in Table 4.12.

Resources and Implementation Suggestions

The storage of the polynomial divider and LFSR could be implemented with a character array, as in

unsigned char *storage = new unsigned char[n];

Shifting the registers would require a for loop. However, since the connection polynomial is of degree ≤ 32, all the memory can be contained in a single 4-byte unsigned integer, and the register shift can be accomplished with a single-bit shift operation.

- The operator << shifts bits left, shifting 0 into the LSBs. Thus, if a=3, then a<<2 is equal to 12. The number 1<<m is equal to $2^m$ for m ≥ 0.
- The operator >> shifts bits right, shifting in 0 to the MSB. Thus, if a=13, then a>>2 is equal to 3.
- Hexadecimal constants can be written using 0xnnnn, as in 0xFF (the number 255), or 0x101 (the number 257). Octal constants can be written using 0nnn, as in 0123, which has the bit pattern 001 010 011.
- The bitwise and operator & can be used to mask bits off. For example, if a = 0x123, then in b = a & 0xFF;, b is equal to 0x23. To retain the lowest m bits of a number, mask it with ((1<<m)-1).
- The algorithms can be implemented either by shifting right using >> or by shifting left using <<. For a few reasons, it makes sense to shift left, so that the input comes into the LSB and the output comes out of the most significant bit. This may be initially slightly confusing, since the pictures portray shifts to the right.
- As a tutorial, the code for the LFSR is provided.

Algorithm 4.2 BinLFSR
File: BinLFSR.h  BinLFSR.cc  testBinLFSR.cc  MakeLFSR

The class declarations are given in BinLFSR.h. The class definitions are given in BinLFSR.cc. In this case, the definitions are short enough that it would make sense to merge the .cc file into the .h file, but they are separated for pedagogical reasons.5 A simple test program is testBinLFSR.cc. A very simple makefile (if you choose to use make) is in MakeLFSR.

Programming Part: BinPolyDiv

Create a C++ class BinPolyDiv which implements a polynomial divisor/remainder circuit, where the degree of g(x) is < 32. The constructor has arguments representing the divisor polynomial and its degree:

BinPolyDiv(unsigned char *g, int p);

The class should have member functions div and remainder which compute, respectively, the quotient and the remainder, with arguments as follows:

unsigned int div(unsigned char *d,   // dividend
                 int ddegree,
                 unsigned char *q,
                 unsigned int &quotientdegree,
                 unsigned int &remainderdegree);
unsigned int remainder(unsigned char *d,
                       int n,
                       int &remainderdegree);

5 The comment at the end of this code is parsed by the emacs editor in the c++ mode. This comment can be used by the compile command in emacs to run the compiler inside the editor.

The dividend d is passed in as an unsigned char array, 1 bit per character, so that arbitrarily long dividend polynomials can be accommodated. The remainder is returned as a single integer whose bits represent the storage register, with the least-significant bit representing the coefficient of smallest degree of the remainder. Internally, the remainder should be stored in a single unsigned int. Test your function on the polynomials $g(x) = x^4 + x^3 + x + 1$ and $d(x) = x^8 + x^7 + x^5 + x^4 + x^3 + x + 1$ from the Preliminary Exercises. Also test your function on the polynomials from Example 4.24.

Algorithm 4.3 BinPolyDiv
File: BinPolyDiv.h  BinPolyDiv.cc  testBinPolyDiv.cc

Follow-On Ideas and Problems

A binary {0, 1} sequence $\{y_n\}$ can be converted to a binary ±1 sequence $z_n$ by $z_n = (-1)^{y_n}$. For a binary ±1 sequence $\{z_0, z_1, \ldots, z_{N-1}\}$ with period N, define the cyclic autocorrelation by
$$r_z(\tau) = \frac{1}{N}\sum_{i=0}^{N-1} z_i z_{((i+\tau))},$$
where $((i + \tau))$ denotes $i + \tau$ modulo N. Using your LFSR class, generate the sequence with connection polynomial $g(x) = 1 + x + x^4$ and compute and plot $r_z(\tau)$ for $\tau = 0, 1, \ldots, 15$. (You may want to make the plots by saving the computed data to a file, then plotting using some convenient plotting tool such as MATLAB.) You should observe that there is a single point with correlation 1 (at $\tau = 0$) and that the correlation at all other lags is $-1/N$. The shape of the correlation function is one reason that maximal length sequences are called pseudonoise sequences: the correlation function approximates a δ function (with the approximation improving for longer N). As a comparison, generate a sequence with period 7 using $g(x) = 1 + x + x^2 + x^4$ and plot $r_z(\tau)$ for this sequence.

Programming Laboratory 3: CRC Encoding and Decoding

Objective

In this lab, you become familiar with cyclic encoding and decoding, both in bit-oriented and byte-oriented algorithms.

Preliminary

Reading: Section 4.13. Verify that the remainder d(x) in (4.11) is correct by dividing $x^{16}m(x)$ by g(x). (You may want to do this both by hand and using a test program invoking BinPolyDiv, as a further test on your program.)

Programming Part

1. Write a C++ class CRC16 which computes the 16-bit parity for a stream of data, where g(x) is a generator polynomial of degree 16. The algorithm should use the polynomial division idea (that is, a bit-oriented algorithm). You probably want to make use of a BinPolyDiv object from Lab 2 in

your class). Here is a class declaration you might find useful:

class CRC16 {
protected:
   BinPolyDiv div;   // the divider object
public:
   CRC16(unsigned int crcpoly);   // constructor
   int CRC(unsigned char *data, int len);
   // Compute the CRC for the data
   // data = data to be encoded
   // len = number of bytes to be encoded
   // Return value: the 16 bits of parity
   // (data[0] is associated with the highest power of x^n)
};

Test your program first using Example 4.35.

2. Write a standalone program crcenc which encodes a file, making use of your CRC16 class. Use $g(x) = x^{16} + x^{15} + x^2 + 1$. The program should accept three command line arguments:

crcenc K filein fileout

where K is the message block length (in bytes), filein is the input file, and fileout is the encoded file.

3. Write a standalone program crcdec which decodes a file, making use of your CRC class. The program should accept three arguments:

crcdec K filein fileout

where K is the message block length (in bytes), filein is an encoded file, and fileout is a decoded file.

4. Test crcenc and crcdec by first encoding then decoding a file, then comparing the decoded file with the original. (A simple compare program is cmpsimple.) The decoded file should be the same as the original file. Use a message block length of 1024 bytes. Use a file of 1,000,000 random bytes created using the makerand program for the test.

5. Test your programs further by passing the encoded data through a binary symmetric channel using the bsc program. Try channel crossover probabilities of 0.00001, 0.001, 0.01, and 0.1. Are there any blocks of data that have errors that are not detected?

6. Write a class FastCRC16 which uses the byte-oriented algorithm to compute the parity bits for a generator g(x) of degree 16. A sample class definition follows:

class FastCRC16 {
protected:
   static int *crctable;
   unsigned char crc0, crc1;   // the two bytes of parity
public:
   FastCRC16(unsigned int crcpoly);   // constructor
   int CRC(unsigned char *data, int len);
   // Compute the CRC for the data
   // data[0] corresponds to the highest powers of x
};

The table of parity values (as in Table 4.8) should be stored in a static class member variable (see the discussion below about static variables). The constructor for the class should allocate space for the table and fill the table, if it has not already been built.

7. Test your program using the data in Example 4.37.

8. Write a standalone program fastcrcenc which encodes a file using FastCRC16. Use $g(x) = x^{16} + x^{15} + x^2 + 1$. The program should have the same arguments as the program crcenc. Test your program by encoding some data and verify that the encoded file is the same as for a file encoded using crcenc.

9. Write a decoder program fastcrcdec which decodes using FastCRC16. Verify that it decodes correctly.

10. How much faster is the byte-oriented algorithm?

Resources and Implementation Suggestions

Static Member Variables

A static member variable of a class is a variable that is associated with the class. However, all instances of the class share that same data, so the data are not really part of any particular object. To see why these might be used, suppose that you want to build a system that has two FastCRC16 objects in it:

FastCRC16 CRC1(g);
FastCRC16 CRC2(g);   // instantiate two objects

The FastCRC16 algorithm needs the data from Table 4.8. This data could be represented using member data as in

class FastCRC16 {
protected:
   int *crctable;
   unsigned char crc0, crc1;   // the two bytes of parity
public:
   FastCRC16(int crcpoly);   // constructor
   int CRC(unsigned char *data, int len);
};

However, there are two problems with this:

1. Each object would have its own table. This wastes storage space.
2. Each object would have to construct its table, as part of the constructor routine. This wastes time.

As an alternative, the lookup table could be stored in a static member variable. Then it would only need to be constructed once (saving computation time) and only stored once (saving memory). The tradeoff is that it is not possible by this arrangement to have two or more different lookup tables in the same system of software at the same time. (There are ways to work around this problem, however. You should try to think of a solution on your own.)

The declaration

static int *crctable;

which appears in the .h file does not define the variable. There must be a definition somewhere, in a C++ source file that is only compiled once. Also, since it is a static object, in a sense external to the class, its definition must be fully scoped. Here is how it is defined:

// File: FastCRC.cc
// ...
#include "FastCRC.h"

172

Cyclic Codes, Rings, and Polynomials

int *FastCRC16::crctable=0;

This defines the pointer and initializes it to 0. Allocation of space for the table and computation of its contents is accomplished by the constructor:

main(int argc, char *argv[]) { int K; char *infname, *outfname; // ... if(argc!=4) { // check number of arguments is as expected cout « "Usage: " « argv[0] « "K infile outfile" « endl; exit(-1); } K = atoi(argv[1]); // read blocksize as an integer infname = argv[2]; // pointer to input file name outfname = argv[3]; // pointer to output file name // ...

// Constructor for FastCRC16 object FastCRC16::FastCRC16(int crcpoly) { if(FastCRC16::crctable==0) { // the table has not been allocated yet FastCRC16::crctable = new int[256]; // Now build the tables // ... } // ... }

Static member variables do not necessarily disappear when an object goes out of scope. We shall use static member variables again in the Galois field arithmetic implementation. Command Line Arguments For operating systems which provide a command-line interface, reading the command line arguments into a program is very straightforward. The arguments are passed in to the main routine using the variables argc and argv. These may then be parsed and used. argc is the total number of arguments on the command line, including the program name. If there is only the program name (with no other arguments), then argc==1. argv is an array of pointers to the string commands. argv[0] is the name of the program being run. As an example, to read the arguments for crcenc K filein fileout, you could use the following code: // Program crcenc

}

Picking Out All the Bits in a File  To write the bit-oriented decoder algorithm, you need to pick out all the bits in an array of data. Here is some sample code:

   // data is an array of unsigned characters with 'len' elements
   unsigned char bits[8];   // an array that holds the bits of one byte of data
   for(int i = 0; i < len; i++) {        // work through all the bytes of data
      for(int j = 7; j >= 0; j--) {      // work through the bits in each byte
         bits[j] = (data[i] & (1 << j)) != 0;
      }
      // bits now has the bits of data[i] in it
      // ...
   }

// ...

4.14 Exercises

4.1 List the codewords for the (7, 4) Hamming code. Verify that the code is cyclic.
4.2 In a ring with identity (that is, multiplicative identity), denote this identity as 1. Prove:
   (a) The multiplicative identity is unique.
   (b) If an element a has both a right inverse b (i.e., an element b such that ab = 1) and a left inverse c (i.e., an element c such that ca = 1), then b = c. In this case, the element a is said to have an inverse (denoted by a^{−1}). Show that the inverse of an element a, when it exists, is unique.
   (c) If a has a multiplicative inverse a^{−1}, then (a^{−1})^{−1} = a.
   (d) The set of units of a ring forms a group under multiplication. (Recall that a unit of a ring is an element that has a multiplicative inverse.)
   (e) If c = ab and c is a unit, then a has a right inverse and b has a left inverse.

   (f) In a ring, a nonzero element a such that ax = 0 for some x ≠ 0 is said to be a zero divisor. Show that if a has an inverse, then a is not a zero divisor.
4.3 Construct the ring R4 = GF(2)[x]/(x^4 + 1). That is, construct the addition and multiplication tables for the ring. Is R4 a field?
4.4 Let R be a commutative ring and let a ∈ R. Let I = {b ∈ R : ab = 0}. Show that I is an ideal of R.
4.5 An element a of a ring R is nilpotent if a^n = 0 for some positive integer n. Show that the set of all nilpotent elements in a commutative ring R is an ideal.
4.6 Let A and B be ideals in a ring R. The sum A + B is defined as A + B = {a + b : a ∈ A, b ∈ B}. Show that A + B is an ideal in R. Show that A ⊂ A + B.
4.7 Show that in the ring ℤ15 the polynomial p(x) = x^2 − 1 has more than two zeros. In a field there would be only two zeros. What may be lacking in a ring that leads to “too many” zeros?
4.8 In the ring R4 = GF(2)[x]/(x^4 + 1), multiply a(x) = 1 + x^2 + x^3 and b(x) = x + x^2. Also, cyclically convolve the sequences {1, 0, 1, 1} and {0, 1, 1}. What is the relationship between these two results?
4.9 For the (15, 11) binary Hamming code with generator g(x) = x^4 + x + 1:
   (a) Determine the parity check polynomial h(x).
   (b) Determine the generator matrix G and the parity check matrix H for this code in nonsystematic form.
   (c) Determine the generator matrix G and the parity check matrix H for this code in systematic form.
   (d) Let m(x) = x + x^2 + x^3. Determine the code polynomial c(x) = g(x)m(x).
   (e) Let m(x) = x + x^2 + x^3. Determine the systematic code polynomial c(x) = x^{n−k} m(x) + R_{g(x)}[x^{n−k} m(x)], where R_{g(x)}[ ] computes the remainder after division by g(x).
   (f) For the codeword c(x) = 1 + x + x^3 + x^4 + x^5 + x^9 + x^10 + x^11 + x^13, determine the message if nonsystematic encoding is employed.
   (g) For the codeword c(x) = 1 + x + x^3 + x^4 + x^5 + x^9 + x^10 + x^11 + x^13, determine the message if systematic encoding is employed.
   (h) Let r(x) = x^14 + x^10 + x^5 + x^3. Determine the syndrome for r(x).
   (i) Draw the systematic encoder circuit for this code using the g(x) feedback polynomial.
   (j) Draw the decoder circuit for this code with r(x) input on the left of the syndrome register. Determine in particular the error pattern detection circuit.
   (k) Draw the decoder circuit for this code with r(x) input on the right of the syndrome register. Determine in particular the error pattern detection circuit.
   (l) Let r(x) = x^13 + x^10 + x^9 + x^5 + x^2 + 1. Trace the execution of the Meggitt decoder with the input on the left, analogous to Table 4.5.
   (m) Let r(x) = x^13 + x^10 + x^9 + x^5 + x^2 + 1. Trace the execution of the Meggitt decoder with the input on the right, analogous to Table 4.6.

4.10 Let f(x) be a polynomial of degree m in 𝔽[x], where 𝔽 is a field. Show that if a is a root of f(x) (so that f(a) = 0), then (x − a) | f(x). Hint: Use the division algorithm. Inductively, show that f(x) has at most m roots in 𝔽.
4.11 The following are code polynomials from binary cyclic codes. Determine the highest-degree generator g(x) for each code.
   (a) c(x) = 1 + x^4 + x^5
   (b) c(x) = 1 + x + x^2 + x^3 + x^4 + x^5 + x^6 + x^7 + x^8 + x^9 + x^10 + x^11 + x^12 + x^13 + x^14
   (c) c(x) = x^13 + x^12 + x^9 + x^5 + x^4 + x^3 + x^2 + 1



   (d) c(x) = x^8 + 1
   (e) c(x) = x^10 + x^7 + x^5 + x^4 + x^3 + x^2 + x + 1
4.12 Let g(x) = g_0 + g_1 x + · · · + g_{n−k} x^{n−k} be the generator for a cyclic code. Show that g_0 ≠ 0.
4.13 Show that g(x) = 1 + x + x^4 + x^5 + x^7 + x^8 + x^9 generates a binary (21, 12) cyclic code. Devise a syndrome computation circuit for this code. Let r(x) = 1 + x^4 + x^16 be a received polynomial. Compute the syndrome of r(x). Also, show the contents of the syndrome computation circuit as each digit of r(x) is shifted in.
4.14 [272] Let g(x) be the generator for a binary (n, k) cyclic code 𝒞. The reciprocal of g(x) is defined as g^*(x) = x^{n−k} g(1/x). (In this context, “reciprocal” does not mean multiplicative inverse.)
   (a) As a particular example, let g(x) = 1 + x^2 + x^4 + x^6 + x^7 + x^10. Determine g^*(x). The following subproblems deal with arbitrary cyclic code generator polynomials g(x).
   (b) Show that g^*(x) also generates an (n, k) cyclic code.
   (c) Let 𝒞^* be the code generated by g^*(x). Show that 𝒞 and 𝒞^* have the same weight distribution.
   (d) Suppose 𝒞 has the property that whenever c(x) = c_0 + c_1 x + · · · + c_{n−1} x^{n−1} is a codeword, so is its reciprocal c^*(x) = c_{n−1} + c_{n−2} x + · · · + c_0 x^{n−1}. Show that g(x) = g^*(x). Such a code is said to be a reversible cyclic code.
4.15 [272] Let g(x) be the generator polynomial of a binary cyclic code of length n.
   (a) Show that if g(x) has x + 1 as a factor, then the code contains no codevectors of odd weight.
   (b) Show that if x + 1 is not a factor of g(x), then the code contains the all-one codeword. Hint: The following is true in any ring 𝔽[x]: 1 − x^n = (1 − x)(1 + x + x^2 + · · · + x^{n−1}).
   (c) Show that the code has minimum weight at least 3 if n is the smallest integer such that g(x) divides x^n − 1.
4.16 Let A(z) be the weight enumerator for a binary cyclic code 𝒞 with generator g(x). Suppose furthermore that x + 1 is not a factor of g(x). Show that the code generated by g̃(x) = (x + 1)g(x) has weight enumerator Ã(z) = (1/2)[A(z) + A(−z)].

4.17 Let g(x) be the generator polynomial of an (n, k) cyclic code . Show that g(x𝜆 ) generates an (𝜆n, 𝜆k) cyclic code that has the same minimum weight as the code generated by g(x). 4.18 Let  be a (2m − 1, 2m − m − 1) Hamming code. Show that if a Meggitt decoder with input on the right-hand side is used, as in Figure 4.20, then the syndrome to look for to correct the digit m rn−1 is s(x) = xm−1 . Hint: g(x) divides x2 −1 + 1. Draw the Meggitt decoder for this Hamming code decoder. 4.19 [45] The code of length 15 generated by g(x) = 1 + x4 + x6 + x7 + x8 is capable of correcting 2 errors. (It is a (15, 7) BCH code.) Show that there are 15 correctable error patterns in which the highest-order bit is equal to 1. Devise a Meggitt decoder for this code with the input applied to the right of the syndrome register. Show that the number of syndrome patterns to check can be reduced to 8. 4.20 Let g(x) be the generator of a (2m − 1, 2m − m − 1) Hamming code and let g̃ (x) = (1 + x)g(x). Show that the code generated by g̃ (x) has minimum distance exactly 4. (a) Show that there exist distinct integers i and j such that xi + xj is not a codeword generated by g(x). Write xi + xj = q1 (x)g(x) + r1 (x).

4.14 Exercise (b) Choose an integer k such that the remainder upon dividing xk by g(x) is not equal to r1 (x). Write xi + xj + xk = q2 (x)g(x) + r2 (x). (c) Choose an integer l such that when xl is divided by g(x) the remainder is r2 (x). Show that l is not equal to i, j, or k. (d) Show that xi + xj + xk + xl = [q2 (x) + q3 (x)]g(x) and that xi + xj + xk + xl is a multiple of (x + 1)g(x). 4.21 [272] An error pattern of the form e(x) = xi + xi+1 is called a double-adjacent-error pattern. Let  be the (2m − 1, 2m − m − 2) cyclic code generated by g(x) = (x + 1)p(x), where p(x) is a primitive polynomial of degree m. Show that no two double-adjacent-error patterns can be in the same coset of a standard array for . Also show that no double-adjacent-error pattern and single-error pattern can be in the same coset of the standard array. Conclude that the code is capable of correcting all the single-error patterns and all the double-adjacent-error patterns. 4.22 [272] Let c(x) be a code polynomial in a cyclic code of length n and let c(i) (x) be its ith cyclic shift. Let l be the smallest positive integer such that c(l) (x) = c(x). Show that l is a factor of n. 4.23 Verify that the circuit shown in Figure 4.2 computes the product a(x)h(x). 4.24 Verify that the circuit shown in Figure 4.3 computes the product a(x)h(x). 4.25 Verify that the circuit shown in Figure 4.4 computes the product a(x)h(x). 4.26 Verify that the circuit shown in Figure 4.5 computes the product a(x)h(x). 4.27 Let h(x) = 1 + x2 + x3 + x4 . Draw the multiplier circuit diagrams as in Figures 4.4, and 4.5. 4.28 Let r(x) = 1 + x3 + x4 + x5 be the input to the decoder in Figure 4.21. Trace the execution of the decoder by following the contents of the registers. If the encoding is systematic, what was the transmitted message? 4.29 Cyclic code dual. (a) Let  be a cyclic code. Show that the dual code  ⟂ is also a cyclic code. (b) Given a cyclic code  with generator polynomial g(x), describe how to obtain the generator polynomial for the dual code  ⟂ . 4.30 As described in Section 3.6, the dual to a Hamming code is a (2m − 1, m) maximal-length code. Determine the generator matrix for a maximal-length code of length 2m − 1. 4.31 Let  be an (n, k) binary cyclic code with minimum distance dmin and let  ′ ⊂  be the shortened code for which the l high-order message bits are equal to 0. Show that  ′ has 2k−l codewords and ′ is a linear code. Show that the minimum distance dmin of  ′ is at least as large as dmin . 4.32 For the binary (31, 26) Hamming code generated using g(x) = 1 + x2 + x5 shortened to a (28,23) Hamming code. (a) Draw the decoding circuit for the (28, 23) shortened code using the method of simulating extra clock shifts. (b) Draw the decoding circuit for the (28, 23) shortened code using the method of changing the error pattern detection circuit. 4.33 Explain why CRC codes can be thought of as shortened cyclic codes. 4.34 Let g(x) = 1 + x2 + x4 + x5 . Determine the fraction of all burst errors of the following lengths that can be detected by a cyclic code using this generator: (a) burst length = 4; (b) burst length = 5; (c) burst length = 6; (d) burst length = 7; (e) burst length = 8. 4.35 Let g(x) = x8 + x7 + x6 + x4 + x2 + 1 be the generator polynomial for a CRC code. (a) Let m(x) = 1 + x + x3 + x6 + x7 + x12 + x16 + x21 . Determine the CRC-encoded message c(x).



Cyclic Codes, Rings, and Polynomials (b) The CRC-encoded polynomial r(x) = x2 + x3 + x5 + x6 + x7 + x8 + x10 + x11 + x14 + x17 + x20 + x23 + x26 + x28 is received. Has an error been made in transmission? 4.36 Verify the entry for t = 3 in Table 4.8. Also verify the entry for t =dc. 4.37 A double-error pattern is one of the form e(x) = xi + xj for 0 ≤ i < j ≤ n − 1. If g(x) does not have x as a factor and does not evenly divide 1 + xj−i , show that any double-error pattern is detectable. 4.38 A file containing the 2 bytes d0 = 56 = 3816 and d1 = 125 =7D16 is to be CRC encoded using the CRC-ANSI generator polynomial g(x) = x16 + x15 + x2 + 1. (a) Convert these data to a polynomial m(x). (b) Determine the CRC-encoded data c(x) = xr m(x) + Rg(x) [xr m(x)] and represent the encoded data as a stream of bits. (c) Using the fast CRC encoding of Algorithm 4.1, encode the data. Verify that it corresponds to the encoding obtained previously. 4.39 The output sequence of an LFSR with connection polynomial g(x) can be obtained by formal division of some dividend d(x) by g∗ (x). Let g(x) = 1 + x + x4 . Show by computational examples that when the connection polynomial is reversed (i.e., reciprocated), the sequence generated by it is reversed (with possibly a different starting point in the sequence). Verify this result analytically. 4.40 Show that (4.17) follows from (4.16). Show that (4.18) follows from (4.17). 4.41 Show that the circuit in Figure 4.29 produces the same sequence as that of Figure 4.25. (Of the two implementations, the one in Figure 4.25 is generally preferred, since the cascaded summers of Figure 4.29 result in propagation delays which inhibit high-speed operations.)

Figure 4.29: Another LFSR circuit. (The figure shows a chain of delay elements D producing y_{i−1}, y_{i−2}, . . . , y_{i−p} from y_i, with taps g_p = 1, g_{p−1}, g_{p−2}, . . . , g_1, g_0 and a cascade of adders.)

4.42 Figure 4.30 shows an LFSR circuit with the outputs of the memory elements labeled as state variables x_1 through x_p. Let 𝐱[k] = [x_1[k], x_2[k], . . . , x_p[k]]^T.
   (a) Show that for the state labels as in Figure 4.30 the state update equation is 𝐱[k + 1] = M𝐱[k],



where M is the companion matrix

   M = ⎡ 0  0  0  0  · · ·  0  −g_0     ⎤
       ⎢ 1  0  0  0  · · ·  0  −g_1     ⎥
       ⎢ 0  1  0  0  · · ·  0  −g_2     ⎥
       ⎢ 0  0  1  0  · · ·  0  −g_3     ⎥ .
       ⎢ ⋮                       ⋮      ⎥
       ⎣ 0  0  0  0  · · ·  1  −g_{p−1} ⎦

Figure 4.30: An LFSR with state labels. (The figure shows state variables x_1[k], . . . , x_p[k], delay elements D, taps g_0, g_1, . . . , g_{p−2}, g_{p−1}, and a −1 gain in the feedback path.)

   (b) The characteristic polynomial of a matrix is p(x) = det(xI − M). Show that p(x) = g_0 + g_1 x + g_2 x^2 + · · · + g_{p−1} x^{p−1} + x^p = g(x).
   (c) It is a fact that every matrix satisfies its own characteristic polynomial. That is, p(M) = 0. (This is the Cayley–Hamilton theorem.) Use this to show that if g(x) | (1 − x^k) then M^k = I.
   (d) The period k of an LFSR with initial vector 𝐱[0] is the smallest k such that M^k = I. Interpret this in light of the Cayley–Hamilton theorem, if p(x) is irreducible.
   (e) A particular output sequence {x_p[0], x_p[1], . . .} is to be produced from this LFSR. Determine what the initial vector 𝐱[0] should be to obtain this sequence. (That is, what is the initial value of the LFSR register?)
4.43 Given a sequence y(x) produced by dividing by the reciprocal polynomial of g(x), y(x) = d(x)/g^*(x),

determine what d(x) should be to obtain the given y(x). The sequence {0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, . . . } is generated by the polynomial g∗ (x) = 1 + x3 + x4 . Determine the numerator polynomial d(x). 4.44 Show that the set of sequences annihilated by a polynomial g∗ (x) is an ideal. 4.45 [266] A Barker code is a binary-valued sequence {bn } of length n whose autocorrelation function has values of 0, 1, and n. Only nine such sequences are known, shown in Table 4.13. (a) Compute the autocorrelation value for the Barker sequence {b5 }. (b) Contrast the autocorrelation function for a Barker code with that of a maximal-length LFSR sequence. 4.46 [193] A nonbinary LFSR: It is known that x3 + x + 𝛽 is irreducible for some values of 𝛽 ∈ GF(27 ). Find a 𝛽 ∈ GF(128) such that x3 + x + 𝛽 is not only irreducible, but also primitive. Form the extension field GF(128) using the primitive polynomial g(𝛼) = 𝛼 7 + 𝛼 3 + 1. (This could be used, for example, as a 7-bit scrambler.) Answer: The value of 𝛽 can be found by checking each nonzero element of GF(128) in succession, building (in software) an LFSR and clocking to see if the LFSR generates all


Table 4.13: Barker Codes

   n     {b_n}
   2     [1, 1]
   2     [−1, 1]
   3     [1, 1, −1]
   4     [1, 1, −1, 1]
   4     [1, 1, 1, −1]
   5     [1, 1, 1, −1, 1]
   7     [1, 1, 1, −1, −1, 1, −1]
   11    [1, 1, 1, −1, −1, −1, 1, −1, −1, 1, −1]
   13    [1, 1, 1, 1, 1, −1, −1, 1, 1, −1, 1, −1, 1]

possible nonzero values. Each register in the LFSR holds 7 bits, so the three registers can take 2^{3·7} − 1 = 2^{21} − 1 = 2,097,151 distinct states. By this exhaustive search, it is found that 𝛽 = 𝛼^3 generates a primitive sequence.

4.15 References Cyclic codes were explored by Prange [350, 352, 353]. Our presentation owes much to Wicker [483], who promotes the idea of cyclic codes as ideals in a ring of polynomials. The Meggitt decoder is described in [306]. Our discussion of the Meggitt decoder closely follows [271]; many of the exercises were also drawn from that source. The tutorial paper [360] provides an overview of CRC codes, comparing five different implementations and also providing references to more primary literature. Much of the material on polynomial operations was drawn from [338]. The table of primitive polynomials is from [509], which in turn was drawn from [338]. An early but still important and thorough work on LFSRs is [160]. See also [158, 159]. The circuit implementations presented here can take other canonical forms. For other realizations, consult a book on digital signal processing, such as [328], or controls [145, 239]. The paper [510] has some of the fundamental algebraic results in it. An example of maximal length sequences to generate modem synchronization sequences is provided in [212]. The paper [394] has descriptions of correlations of maximal length sequences under decimation.

Chapter 5

Rudiments of Number Theory and Algebra

5.1 Motivation

We have seen that the cyclic structure of a code provides a convenient way to encode and reduces the complexity of decoders for some simple codes compared to linear block codes. However, there are several remaining questions to be addressed in approaching practical long code designs and effective decoding algorithms.

1. The cyclic structure means that the error pattern detection circuitry must only look for errors in the last digit. This reduces the amount of storage compared to the syndrome decoding table. However, for long codes, the complexity of the error pattern detection circuitry may still be considerable. It is therefore of interest to have codes with additional algebraic structure, in addition to the cyclic structure, that can be exploited to develop efficient decoding algorithms.
2. The decoders presented in Chapter 4 are for binary codes: knowing the location of errors is sufficient to decode. However, there are many important nonbinary codes, for which both the error locations and values must be determined. We have presented no theory yet for how to do this.
3. We have seen that generator polynomials g(x) must divide x^n − 1. Some additional algebraic tools are necessary to describe how to find such factorizations over arbitrary finite fields.
4. Finally, we have not yet presented a design methodology by which codes having a specified minimum distance might be designed.

This chapter develops mathematical tools to address these issues. In reality, the amount of algebra presented in this chapter is both more and less than is needed. It is more than is needed, in that concepts are presented which are not directly called for in later chapters (even though their presence helps put other algebraic notions more clearly in perspective). It is less than is needed, in that the broad literature of coding theory uses all of the algebraic concepts presented here, and much more. An attempt has been made to strike a balance in presentation.

Example 5.1 We present another example motivating the use of the algebra over finite fields [33]. This example will preview many of the concepts to be developed in this chapter, including modulo operations, equivalence, the Euclidean algorithm, irreducibility, and operations over a finite field. We have seen in Section 1.9.2 that the decoding algorithm for the Hamming code can be expressed purely in an algebraic way: finding the (single) error can be expressed as finding the solution to a single algebraic equation. It is possible to extend this to a two-error-correcting code whose solution is found by solving two polynomial equations in two unknowns. We demonstrate this by a particular example, starting from a Hamming (31, 26) code having a parity check matrix

   H = ⎡ 0 0 0 · · · 1 1 ⎤
       ⎢ 0 0 0 · · · 1 1 ⎥
       ⎢ 0 0 0 · · · 1 1 ⎥ .
       ⎢ 0 1 1 · · · 1 1 ⎥
       ⎣ 1 0 1 · · · 0 1 ⎦



The 5-tuple in the ith column is obtained from the binary representation of the integer i. As in Section 1.9.2, we represent the 5-tuple in the ith column as a single “number,” denoted by 𝛾_i, so we write

   H = [𝛾_1 𝛾_2 𝛾_3 · · · 𝛾_30 𝛾_31].

Let us now attempt to move beyond a single error correction code by appending five additional rows to H. We will further assume that the five new elements in each column are some function of the column number. That is, we assume that we can write H as

       ⎡   0       0       0     · · ·    1        1     ⎤
       ⎢   0       0       0     · · ·    1        1     ⎥
       ⎢   0       0       0     · · ·    1        1     ⎥
       ⎢   0       1       1     · · ·    1        1     ⎥
   H = ⎢   1       0       1     · · ·    0        1     ⎥ .        (5.1)
       ⎢ f_1(1)  f_1(2)  f_1(3)  · · ·  f_1(30)  f_1(31) ⎥
       ⎢ f_2(1)  f_2(2)  f_2(3)  · · ·  f_2(30)  f_2(31) ⎥
       ⎢ f_3(1)  f_3(2)  f_3(3)  · · ·  f_3(30)  f_3(31) ⎥
       ⎢ f_4(1)  f_4(2)  f_4(3)  · · ·  f_4(30)  f_4(31) ⎥
       ⎣ f_5(1)  f_5(2)  f_5(3)  · · ·  f_5(30)  f_5(31) ⎦

The function 𝐟 (i) = [ f1 (i), f2 (i), f3 (i), f4 (i), f5 (i)]T has binary components, so fl (i) ∈ {0, 1} . This function tells what binary pattern should be associated with each column. Another way to express this is to note that 𝐟 maps binary 5-tuples to binary 5-tuples. We can also use our shorthand notation. Let f (𝛾) be the symbol represented by the 5-tuple (f1 (i), f2 (i), f3 (i), f4 (i), f5 (i)), where i is the integer corresponding to 𝛾i = 𝛾. Using our shorthand notation, we could write (5.1) as [ ] 𝛾2 𝛾3 · · · 𝛾30 𝛾31 𝛾1 H= . f (𝛾1 ) f (𝛾2 ) f (𝛾3 ) · · · f (𝛾30 ) f (𝛾31 ) The problem now is to select a function 𝐟 so that H represents a code capable of correcting two errors, and does so in such a way that an algebraic solution is possible. To express the functions 𝐟 , we need some way of dealing with these 𝛾i 5-tuples as algebraic objects in their own right, with arithmetic defined to add, subtract, multiply, and divide. That is, the 𝛾i need to form a field, as defined in Section 2.3, or (since there are only finitely many of them) a finite field. Addition in the field is straightforward: we could define addition element-by-element. But how do we multiply in a meaningful, nontrivial way? How do we divide? The key is to think of each 5-tuple as corresponding to a polynomial of degree ≤4. For example: (0, 0, 0, 0, 0) ↔ 0 (0, 0, 0, 0, 1) ↔ 1 (0, 0, 0, 1, 0) ↔ x (0, 0, 1, 0, 0) ↔ x2 (1, 0, 1, 0, 1) ↔ x4 + x2 + 1. Note that each coefficient of the polynomials is binary; we assume that addition is modulo 2 (i.e., over GF(2)). Clearly, addition of polynomials accomplishes exactly the same thing as addition of the vectors. (They are isomorphic.) How can we multiply? We want our polynomials representing the 5-tuples to have degree ≤4, and yet when we multiply the degree may exceed that. For example, (x3 + x + 1)(x4 + x3 + x + 1) = x7 + x6 + x5 + x4 + x2 + 1. To reduce the degree, we choose some polynomial M(x) of degree 5, and reduce the product modulo M(x). That is, we divide by M(x) and take the remainder. Let us take M(x) = x5 + x2 + 1. When we divide x7 + x6 + x5 + x4 + x2 + 1 by M(x), we get a quotient of x2 + x + 1, and a remainder of x3 + x2 + x. We use the remainder: (x3 + x + 1)(x4 + x3 + x + 1) = x7 + x6 + x5 + x4 + x2 + 1 ≡ x3 + x2 + x (mod x5 + x2 + 1).
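The modulo-M(x) arithmetic just described is easy to experiment with in software. The following is a minimal sketch (not from the text's program library, and the function name is only illustrative): it stores each 5-tuple in the bits of an integer, multiplies two polynomials over GF(2) by shift-and-XOR, and reduces the product modulo M(x) = x^5 + x^2 + 1. It reproduces the worked product above.

   // Sketch: multiplication of binary polynomials modulo M(x) = x^5 + x^2 + 1.
   // A polynomial is stored in the bits of an unsigned int (bit i <-> x^i).
   #include <iostream>

   unsigned int polymulmod(unsigned int a, unsigned int b, unsigned int M, int degM)
   {
      unsigned int prod = 0;
      for(int i = 0; (b >> i) != 0; i++) {    // carry-less multiply
         if((b >> i) & 1) prod ^= (a << i);
      }
      for(int d = 31; d >= degM; d--) {       // reduce modulo M(x)
         if((prod >> d) & 1) prod ^= (M << (d - degM));
      }
      return prod;
   }

   int main()
   {
      unsigned int a = 0xB;    // x^3 + x + 1
      unsigned int b = 0x1B;   // x^4 + x^3 + x + 1
      unsigned int M = 0x25;   // x^5 + x^2 + 1
      std::cout << std::hex << polymulmod(a, b, M, 5) << "\n";  // prints e, i.e., x^3 + x^2 + x
      return 0;
   }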



Our modulo operations allow us now to add, subtract, and multiply these 5-tuples, considered as polynomials modulo some M(x). Can we divide? More fundamentally, given a polynomial a(x), is there some other polynomial s(x) — we may consider it a multiplicative inverse or a reciprocal — such that a(x)s(x) ≡ 1 mod M(x)? The answer lies in the oldest algorithm in the world, the Euclidean algorithm. (More details later!) For now, just be aware that if M(x) is irreducible — it cannot be factored — then we can define division so that all of the 5-tuples 𝛾_i have a multiplicative inverse except (0, 0, 0, 0, 0).

Let us return now to the problem of creating a two-error-correcting code. Suppose that there are two errors, occurring at positions i_1 and i_2. Since the code is linear, it is sufficient to consider the received vector 𝐫 = (0, 0, . . . , 1, . . . , 1, 0, . . . , 0), with 1s in positions i_1 and i_2. We find 𝐫H^T = (s_1, s_2) with

   𝛾_{i_1} + 𝛾_{i_2} = s_1
   f(𝛾_{i_1}) + f(𝛾_{i_2}) = s_2.        (5.2)

If the two equations in (5.2) are functionally independent, then we have two equations in two unknowns, which we could solve for 𝛾i1 and 𝛾i2 which, in turn, will determine the error locations i1 and i2 . Let us consider some possible simple functions. One might be a simple multiplication: f (𝛾) = a𝛾. But this would lead to the two equations 𝛾i1 + 𝛾i2 = s1 a𝛾i1 + a𝛾i2 = s2 , representing the dependency s2 = as1 ; the new parity check equations would tell us nothing new. We could try f (𝛾) = 𝛾 + a; this would not help, since we would always have s2 = s1 . Let us try some powers. Say f (𝛾) = 𝛾 2 . We would then obtain 𝛾i1 + 𝛾i2 = s1

   𝛾_{i_1}^2 + 𝛾_{i_2}^2 = s_2.

These look like independent equations, but we have to remember that we are dealing with operations modulo 2. Notice that s_1^2 = (𝛾_{i_1} + 𝛾_{i_2})^2 = 𝛾_{i_1}^2 + 𝛾_{i_2}^2 + 2𝛾_{i_1}𝛾_{i_2} = 𝛾_{i_1}^2 + 𝛾_{i_2}^2 = s_2. We have only the redundant s_1^2 = s_2: the second equation is the square of the first and still conveys no new information. Try f(𝛾) = 𝛾^3. Now the decoder equations are 𝛾_{i_1} + 𝛾_{i_2} = s_1

   𝛾_{i_1}^3 + 𝛾_{i_2}^3 = s_2.

These are independent! Now let's see what we can do to solve these equations algebraically. In a finite field, we can do conventional algebraic manipulation, keeping in the back of our mind how we do multiplication and division. We can write

   s_2 = 𝛾_{i_1}^3 + 𝛾_{i_2}^3 = (𝛾_{i_1} + 𝛾_{i_2})(𝛾_{i_1}^2 − 𝛾_{i_1}𝛾_{i_2} + 𝛾_{i_2}^2) = s_1(𝛾_{i_1}^2 + 𝛾_{i_1}𝛾_{i_2} + 𝛾_{i_2}^2) = s_1(𝛾_{i_1}𝛾_{i_2} − s_1^2)

(where the signs have changed with impunity because these values are based on GF(2)). Hence we have the two equations

   𝛾_{i_1} + 𝛾_{i_2} = s_1        𝛾_{i_1}𝛾_{i_2} = s_1^2 + s_2/s_1

if s_1 ≠ 0. We can combine these two equations into a quadratic:

   𝛾_{i_1}(s_1 + 𝛾_{i_1}) = s_1^2 + s_2/s_1,

or

   𝛾_{i_1}^2 + s_1 𝛾_{i_1} + (s_1^2 + s_2/s_1) = 0,

or

   1 + s_1 𝛾_{i_1}^{−1} + (s_1^2 + s_2/s_1) 𝛾_{i_1}^{−2} = 0.

For reasons to be made clear later, it is more useful to deal with the reciprocals of the roots. Let z = 𝛾_{i_1}^{−1}. We then have the equation

   q(z) = 1 + s_1 z + (s_1^2 + s_2/s_1) z^2 = 0.

The polynomial q(z) is said to be an error locator polynomial: the reciprocals of its roots tell the 𝛾_{i_1} and 𝛾_{i_2}, which, in turn, tell the locations of the errors. If there is only one error, then 𝛾_{i_1} = s_1 and 𝛾_{i_1}^3 = s_2, and we end up with the equation 1 + s_1 𝛾_{i_1}^{−1} = 0. If there are no errors, then s_1 = s_2 = 0.

Let us summarize the steps we have taken. First, we have devised a way of operating on 5-tuples as single algebraic objects, defining addition, subtraction, multiplication, and division. This required finding some irreducible polynomial M(x) which works behind the scenes. Once we have got this, the steps are as follows:

1. We compute the syndrome 𝐫H^T.
2. From the syndrome, we set up the error locator polynomial. We note that there must be some relationship between the sums of the powers of roots and the coefficients.
3. We then find the roots of the polynomial, which determine the error locations.

For binary codes, knowing where the error is suffices to correct the error. For nonbinary codes, there is another step: knowing the error location, we must also determine the error value at that location. This involves setting up another polynomial, the error evaluation polynomial, whose roots determine the error values. The above steps establish the outline for this and the next chapters. Not only will we develop more fully the arithmetic, but we will be able to generalize to whole families of codes, capable of correcting many errors. However, the concepts are all quite similar to those demonstrated here. (It is historically interesting that it took roughly 10 years of research to bridge the gap between Hamming and the code presented above. Once this was accomplished, other generalizations followed quickly.) ◽

5.2 Number Theoretic Preliminaries

We begin with some notation and concepts from elementary number and polynomial theory.

5.2.1 Divisibility

Definition 5.2 An integer b is divisible by a nonzero integer a if there is an integer c such that b = ac. This is indicated notationally as a|b (read “a divides b”). If b is not divisible by a, we write a ∕| b. Let a(x) and b(x) be polynomials in F[x] (that is, the ring of polynomials with coefficients in F) where F is a field and assume that a(x) is not identically 0. Then b(x) is divisible by a polynomial a(x) if there is some polynomial c(x) ∈ F[x] such that b(x) = a(x)c(x); this is indicated by a(x)|b(x). ◽ Example 5.3

For a(x) and b(x) in ℝ[x], with b(x) = 112 + 96x + 174x2 + 61x3 + 42x4

and a(x) = (3/4)x^2 + (5/7)x + 2,

we have a(x)|b(x) since b(x) = 28(2 + x + 2x2 )a(x).

The following properties of divisibility of integers are straightforward to show. Lemma 5.4 [325] For integers, 1. a|b implies a|bc for any integer c.





2. a|b and b|c imply a|c.
3. a|b and a|c imply a|(bs + ct) for any integers s and t.
4. a|b and b|a imply a = ±b.
5. a|b, a > 0, and b > 0 imply a ≤ b.
6. If m ≠ 0, then a|b if and only if ma|mb.
7. If a|b and c|d then ac|bd.

These properties apply with a few modifications to polynomials with coefficients from a finite field. Property (4) is different for polynomials: if a(x)|b(x) and b(x)|a(x), then a(x) = cb(x), where c is a nonzero element of the field of coefficients. Property (5) is also different for polynomials: a(x)|b(x) implies deg(a(x)) ≤ deg(b(x)). An important fact regarding division is expressed in the following theorem.

Theorem 5.5 (Division algorithm)

For any integers a and b with a > 0, there exist unique integers q and r such that b = qa + r,

where 0 ≤ r < a. The number q is the quotient and r is the remainder. For polynomials, for a(x) and b(x) in F[x], F a field, there is a unique representation b(x) = q(x)a(x) + r(x), where deg(r(x)) < deg(a(x)). Proof [325, p. 5] We provide a partial proof for integers. Form the arithmetic progression . . . , b − 3a, b − 2a, b − a, b, b + a, b + 2a, b + 3a, . . . extending indefinitely in both directions. In this sequence select the smallest non-negative element and denote it by r; this satisfies the inequality 0 ≤ r < a and implicitly defines q by r = b − qa. ◽ Example 5.6

With b = 23 and a = 7 we have 23 = 3 ⋅ 7 + 2.

The quotient is 3 and the remainder is 2. Example 5.7



With b(x) = 2x3 + 3x + 2 and a(x) = x2 + 7 in ℝ[x], b(x) = (2x)(x2 + 7) + (−11x + 2).



Definition 5.8 If d|a and d|b, then d is said to be a common divisor of a and b. A common divisor g > 0 such that every common divisor of a and b divides g is called the greatest common divisor (GCD) and is denoted by (a, b). Integers a and b with a GCD equal to 1 are said to be relatively prime. The integers a1 , a2 , . . . , ak are pairwise relatively prime if (ai , aj ) = 1 for i ≠ j. If d(x)|a(x) and d(x)|b(x) then d(x) is said to be a common divisor of a(x) and b(x). If either a(x) or b(x) is not zero, the common divisor g(x) such that every common divisor of a(x) and b(x) divides g(x) is referred to as the GCD of a(x) and b(x) and is denoted by (a(x), b(x)). The GCD of polynomials (a(x), b(x)) is, by convention, normalized so that it is a monic polynomial.


If the GCD of a(x) and b(x) is a constant (which can be normalized to 1), then a(x) and b(x) are said to be relatively prime. ◽ Example 5.9

If a = 24 and b = 18 then, clearly, (a, b) = (24, 18) = 6.



Example 5.10 By some trial and error (to be reduced to an effective algorithm), we can determine that (851, 966) = 23. ◽ Example 5.11

With a(x) = 4x^3 + 10x^2 + 8x + 2 and b(x) = 8x^3 + 14x^2 + 7x + 1 in ℝ[x], it can be shown that (a(x), b(x)) = x^2 + (3/2)x + 1/2.



Useful properties of the GCD:

Theorem 5.12
1. For any positive integer m, (ma, mb) = m(a, b).
2. As a consequence of the previous result, if d|a and d|b and d > 0, then (a/d, b/d) = (1/d)(a, b).
3. If (a, b) = g, then (a/g, b/g) = 1.
4. If (a, c) = (b, c) = 1, then (ab, c) = 1.
5. If c|ab and (b, c) = 1 then c|a.
6. Every divisor d of a and b divides (a, b). This follows immediately from (3) in Lemma 5.4 (or from the definition).
7. (a, b) = |a| if and only if a|b.
8. (a, (b, c)) = ((a, b), c) (associativity).
9. (ac, bc) = |c|(a, b) (distributivity).

5.2.2 The Euclidean Algorithm and Euclidean Domains

The Euclidean algorithm is perhaps the oldest algorithm in the world, being attributed to Euclid over 2000 years ago and appearing in his Elements. It was formulated originally to find the greatest common divisor of two integers. It has since been generalized to apply to elements in an algebraic structure known as a Euclidean domain. The powerful algebraic consequences include a method for solving a key step in the decoding of Reed–Solomon and BCH codes. To understand the Euclidean algorithm, it is perhaps most helpful to first see the Euclidean algorithm in action, without worrying formally yet about how it works. The Euclidean algorithm works by simple repeated division: Starting with two numbers, a and b, divide a by b to obtain a remainder. Then divide b by the remainder, to obtain a new remainder. Proceed in this manner, dividing the last divisor by the most recent remainder, until the remainder is 0. Then the last nonzero remainder is the GCD (a, b).
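This repeated-division loop is only a few lines of code. The following is a minimal stand-alone sketch (an assumed example, separate from the text's own gcd program):

   // Sketch: GCD by repeated division, as described above.
   #include <iostream>

   int gcd(int a, int b)
   {
      while(b != 0) {        // divide the last divisor by the most recent remainder
         int r = a % b;      // remainder of a divided by b
         a = b;
         b = r;
      }
      return a;              // last nonzero remainder
   }

   int main()
   {
      std::cout << gcd(966, 851) << "\n";   // prints 23, as in Example 5.13
      return 0;
   }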



Example 5.13 Find (966, 851). Let a = 966 and b = 851. Divide a by b and express the result in terms of quotient and remainder:

   966 = 851 · 1 + 115.

Now take the divisor (851) and divide it by the remainder (115):

   851 = 115 · 7 + 46.

Now take the divisor (115) and divide it by the remainder (46):

   115 = 46 · 2 + 23.

Now take the divisor (46) and divide it by the remainder (23):

   46 = 23 · 2 + 0.

The remainder is now 0; the last nonzero remainder 23 is the GCD: (966, 851) = 23.



Example 5.14 In this example, we perform computations over ℤ_5[x], that is, with coefficient operations modulo 5. Determine (a(x), b(x)) = (x^7 + 3x^6 + 4x^4 + 2x^3 + x^2 + 4, x^6 + 3x^3 + 2x + 4), where a(x), b(x) ∈ ℤ_5[x].

   (x^7 + 3x^6 + 4x^4 + 2x^3 + x^2 + 4) = (x + 3)(x^6 + 3x^3 + 2x + 4) + (x^4 + 3x^3 + 4x^2 + 2)
   (x^6 + 3x^3 + 2x + 4) = (x^2 + 2x)(x^4 + 3x^3 + 4x^2 + 2) + (3x^2 + 3x + 4)                    (5.3)
   (x^4 + 3x^3 + 4x^2 + 2) = (2x^2 + 4x + 3)(3x^2 + 3x + 4) + 0.

With the last remainder equal to zero, we take the last nonzero remainder, 3x^2 + 3x + 4, and normalize it to obtain the GCD: g(x) = 3^{−1}(3x^2 + 3x + 4) = 2(3x^2 + 3x + 4) = x^2 + x + 3. For future reference, the polynomials s(x) and t(x) for this GCD are

   s(x) = 3x^2 + x        t(x) = 2x^3 + 2x + 2.
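The same divide-and-remainder loop can be carried out on polynomials with coefficients in ℤ_p. The following is a minimal sketch (an assumed helper, not the text's library code); polynomials are stored as coefficient vectors, lowest degree first, and the program reports the last nonzero remainder, a scalar multiple of the GCD, for the polynomials of this example.

   // Sketch: GCD of polynomials over Z_p by repeated division (p = 5 here).
   // A polynomial is a vector of coefficients, lowest degree first.
   #include <iostream>
   #include <vector>
   using namespace std;

   const int p = 5;

   int inverse_mod_p(int a) {                  // brute-force inverse in Z_p
      for(int x = 1; x < p; x++) if((a * x) % p == 1) return x;
      return 0;
   }

   int degree(const vector<int>& f) {
      for(int d = (int)f.size() - 1; d >= 0; d--) if(f[d] != 0) return d;
      return -1;                                // the zero polynomial
   }

   // remainder of a(x) divided by b(x), coefficients in Z_p
   vector<int> poly_rem(vector<int> a, const vector<int>& b) {
      int db = degree(b), binv = inverse_mod_p(b[db]);
      for(int da = degree(a); da >= db; da = degree(a)) {
         int q = (a[da] * binv) % p;            // leading coefficient of quotient term
         for(int i = 0; i <= db; i++)           // subtract q * x^(da-db) * b(x)
            a[da - db + i] = ((a[da - db + i] - q * b[i]) % p + p) % p;
      }
      return a;
   }

   int main() {
      // a(x) = x^7+3x^6+4x^4+2x^3+x^2+4,  b(x) = x^6+3x^3+2x+4  (Example 5.14)
      vector<int> a = {4,0,1,2,4,0,3,1}, b = {4,2,0,3,0,0,1};
      while(degree(b) >= 0) {                   // Euclidean algorithm
         vector<int> r = poly_rem(a, b);
         a = b;  b = r;
      }
      for(int c : a) cout << c << " ";          // prints 4 3 3 0 ..., i.e., 3x^2 + 3x + 4
      cout << "\n";
      return 0;
   }

Normalizing the printed remainder by the inverse of its leading coefficient gives the monic GCD x^2 + x + 3, as in the example.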

The Euclidean algorithm is established with the help of the following theorems and lemmas. Theorem 5.15 If g = (a, b), then there exist integers s and t such that g = (a, b) = as + bt.




For polynomials, if g(x) = (a(x), b(x)), then there are polynomials s(x) and t(x) such that g(x) = a(x)s(x) + b(x)t(x).

(5.4)

Proof [325] We provide the proof for the integer case; modification for the polynomial case is straightforward. Consider the linear combinations as + bt where s and t range over all integers. The set of integers E = {as + bt, s ∈ ℤ, t ∈ ℤ} contains positive and negative values and 0. Choose s0 and t0 so that as0 + bt0 is the smallest positive integer in the set: l = as0 + bt0 > 0. We now establish that l|a; showing that l|b is analogous. By the division algorithm, a = lq + r with 0 ≤ r < l. Hence r = a − ql = a − q(as0 + bt0 ) = a(1 − qs0 ) + b(−qt0 ), so r itself is in the set E. However, since l is the smallest positive integer in R, r must be 0, so a = lq, or l|a. Since g is the GCD of a and b, we may write a = gm and b = gn for some integers m and n. Then l = as0 + bt0 = g(ms0 + nt0 ), so g|l. Since it cannot be that g < l, since g is the greatest common divisor, it must be that g = l. ◽ From the proof of this theorem, we make the following important observation: the GCD g = (a, b) is the smallest positive integer value of as + bt as s and t range over all integers. Lemma 5.16 For any integer n, (a, b) = (a, b + an). For any polynomial n(x) ∈ F [x], (a(x), b(x)) = (a(x), b(x) + a(x)n(x)). Proof Let d = (a, b) and g = (a, b + an). By Theorem 5.15 there exist s0 and t0 such that d = as0 + bt0 . Write this as d = a(s0 − nt0 ) + (b + an)t0 = as1 + (b + an)t0 . It follows (from Lemma 5.4 part (3)) that g|d. We now show that d|g. Since d|a and d|b, we have that d|(an + b). Since g is the GCD of a and an + b and any divisor of a and an + b must divide the GCD, it follows that d|g. Since d|g and g|d, we must have g = d. (For polynomials, the proof is almost exactly the same, except that it is possible that g(x) = d(x) only if both are monic.) ◽ We demonstrate the use of this theorem and lemma by an example. Example 5.17 Determine g = (966, 851); this is the same as in Example 5.13, but now we keep track of a few more details. By the division algorithm, 966 = 1 ⋅ 851 + 115. (5.5) By Lemma 5.16, g = (851, 966) = (851,966 − 1 ⋅ 851) = (851,115) = (115,851). Thus, the problem has been reduced using the lemma to one having smaller numbers than the original, but with the same GCD. Applying the division algorithm again, 851 = 7 ⋅ 115 + 46

(5.6)

hence, again applying Lemma 5.16, (115,851) = (115,851 − 7 ⋅ 115) = (115, 46) = (46,115). Again, the GCD problem is reduced to one with smaller numbers. Proceeding by application of the division algorithm and the property, we obtain successively 115 = 2 ⋅ 46 + 23 (46,115) = (46,115 − 2 ⋅ 46) = (46, 23) = (23, 46) 46 = 2 ⋅ 23 + 0 (23, 46) = 23.

(5.7)



Chaining together the equalities, we obtain (966,851) = 23. We can find the s and t in the representation suggested by Theorem 5.15, (966,851) = 966s + 851t, by working the equations backward, substituting in for the remainders from each division in reverse order 23 = 115 − 2 ⋅ 46

“23” from (5.7)

= 115 − 2 ⋅ (851 − 7 ⋅ 115) = −2 ⋅ 851 + 15 ⋅ 115

“46” from (5.6)

= −2 ⋅ 851 + 15(966 − 1 ⋅ 851) = 15 ⋅ 966 − 17 ⋅ 851

“115” from (5.5) ◽

so s = 15 and t = −17. Example 5.18

It can be shown that for the polynomials in Example 5.14, s(x) = 3x2 + x

t(x) = 2x3 + 2x + 2.



Having seen the examples and the basic theory, we can now be a little more precise. In fullest generality, the Euclidean algorithm applies to algebraic structures known as Euclidean domains. Definition 5.19 [144, p. 301] A Euclidean domain is a set D with operations + and ⋅ satisfying: 1. D forms a commutative ring with identity. That is, D has an operation + such that ⟨D, +⟩ is a commutative group. Also, there is a commutative operation “multiplication,” denoted using ⋅ (or merely juxtaposition), such that for any a and b in D, a ⋅ b is also in D. The distributive property also applies: a ⋅ (b + c) = a ⋅ b + a ⋅ c for any a, b, c ∈ D. Also, there is an element 1, the multiplicative identity, in D such that a ⋅ 1 = 1 ⋅ a = a. 2. Multiplicative cancellation holds: if ab = cb and b ≠ 0, then a = c. 3. Every a ∈ D has a valuation 𝑣(a) ∶ D → ℕ ∪ {−∞} such that: (a) 𝑣(a) ≥ 0 for all a ∈ D (except when a = 0). (b) 𝑣(a) ≤ 𝑣(ab) for all a, b ∈ D, b ≠ 0. (c) For all a, b ∈ D with 𝑣(a) > 𝑣(b) there is a q ∈ D (quotient) and r ∈ D (remainder) such that a = qb + r with 𝑣(r) < 𝑣(b) or r = 0. 𝑣(b) is never −∞ except possibly when b = 0. The valuation 𝑣 is also called a Euclidean function. ◽ We have seen two examples of Euclidean domains: 1. The ring of integers under integer addition and multiplication, where the valuation is 𝑣(a) = |a| (the absolute value). Then the statement a = qb + r is obtained simply by integer division with remainder (the division algorithm). 2. Let F be a field. Then F[x] is a Euclidean domain with valuation function 𝑣(a(x)) = deg(a(x)) (the degree of the polynomial a(x) ∈ F[x]). It is conventional for this domain to take 𝑣(0) = −∞. Then the statement a(x) = q(x)b(x) + r(x) follows from polynomial division. The Euclidean algorithm can be stated in two versions. The first simply computes the GCD.


Theorem 5.20 (The Euclidean Algorithm) Let a and b be nonzero elements in a Euclidean domain. Then by repeated application of the division algorithm in the Euclidean domain, we obtain a series of equations: a = bq_1 + r_1

r1 ≠ 0 and 𝑣(r1 ) < 𝑣(b)

b = r1 q2 + r2

r2 ≠ 0 and 𝑣(r2 ) < 𝑣(r1 )

r1 = r2 q3 + r3

r3 ≠ 0 and 𝑣(r3 ) < 𝑣(r2 )

⋮ rj−2 = rj−1 qj + rj

rj ≠ 0 and 𝑣(rj ) < 𝑣(rj−1 )

rj−1 = rj qj+1 + 0

(rj+1 = 0).

Then (a, b) = r_j , the last nonzero remainder of the division process.

   (Program file: gcd.c)

That the theorem stops after a finite number of steps follows since every remainder must be smaller (in valuation) than the preceding remainder and the (valuation of the) remainder must be nonnegative. That the final nonzero remainder is the GCD follows from property Lemma 5.16. This form of the Euclidean algorithm is very simple to code. Let ⌊a∕b⌋ denote the “quotient” without remainder of a∕b, that is, a = ⌊a∕b⌋b + r. Then recursion in the Euclidean algorithm may be expressed as qi = ⌊ri−2 ∕ri−1 ⌋ ri = ri−2 − ri−1 qi

(5.8)

i = 1, 2, . . . (until termination) with r−1 = a and r0 = b. The second version of the Euclidean algorithm, sometimes called the extended Euclidean algorithm, computes g = (a, b) and also the coefficients s and t of Theorem 5.15 such that

for

as + bt = g. The values for s and t are computed by finding intermediate quantities si and ti satisfying asi + bti = ri

(5.9)

at every step of the algorithm. The formula to update si and ti is (see Exercise 5.18) si = si−2 − qi si−1 ti = ti−2 − qi ti−1 ,

(5.10)

for i = 1, 2, . . . (until termination), with s−1 = 1 t−1 = 0

s0 = 0 t0 = 1.

(5.11)

The Extended Euclidean Algorithm is as shown in Algorithm 5.1. The following are some facts about the GCD which are proved using the Euclidean algorithm. Analogous results hold for polynomials. (It is helpful to verify these properties using small integer examples.) Lemma 5.21 1. For integers, (a, b) is the smallest positive value of as + bt, where s and t range over all integers. 2. If as + bt = 1 for some integers s and t, then (a, b) = 1; that is, a and b are relatively prime. Thus, a and b are relatively prime if and only if there exist s and t such that as + bt = 1.



Algorithm 5.1 Extended Euclidean Algorithm

   Initialization: Set s and t as in (5.11). Let r_{−1} = a, r_0 = b, s_{−1} = 1, s_0 = 0, t_{−1} = 0, t_0 = 1, i = 0
   while(r_i ≠ 0) {                         (repeat until remainder is 0)
      i = i + 1
      q_i = ⌊r_{i−2}/r_{i−1}⌋               (compute quotient)
      r_i = r_{i−2} − q_i r_{i−1}           (compute remainder)
      s_i = s_{i−2} − q_i s_{i−1}           (compute s and t values)
      t_i = t_{i−2} − q_i t_{i−1}
   }
   Return: s = s_{i−1}, t = t_{i−1}, g = r_{i−1}
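A direct translation of Algorithm 5.1 for integers might look like the following sketch (an assumed stand-alone example, not the text's library code); it returns g together with the coefficients s and t of Theorem 5.15.

   // Sketch: extended Euclidean algorithm for integers (Algorithm 5.1).
   #include <iostream>

   void extended_gcd(int a, int b, int& g, int& s, int& t)
   {
      int rprev = a, r = b;        // r_{i-2}, r_{i-1}
      int sprev = 1, scur = 0;     // s_{i-2}, s_{i-1}
      int tprev = 0, tcur = 1;     // t_{i-2}, t_{i-1}
      while(r != 0) {
         int q = rprev / r;                  // quotient
         int rnext = rprev - q * r;          // remainder
         int snext = sprev - q * scur;
         int tnext = tprev - q * tcur;
         rprev = r;     r = rnext;
         sprev = scur;  scur = snext;
         tprev = tcur;  tcur = tnext;
      }
      g = rprev;  s = sprev;  t = tprev;     // a*s + b*t = g
   }

   int main()
   {
      int g, s, t;
      extended_gcd(966, 851, g, s, t);
      std::cout << g << " = 966*(" << s << ") + 851*(" << t << ")\n";  // 23 = 966*15 + 851*(-17)
      return 0;
   }

For a = 966, b = 851 the routine reproduces s = 15 and t = −17 from Example 5.17.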

5.2.3 An Application of the Euclidean Algorithm: The Sugiyama Algorithm

The Euclidean algorithm, besides computing the GCD, has a variety of other applications. Here, the Euclidean algorithm is put to use as a means of solving the problem of finding the shortest LFSR which produces a given output. This problem, as we shall see, is important in decoding BCH and Reed–Solomon codes. (The Berlekamp–Massey algorithm is another way of arriving at this solution.) We introduce the problem as a prediction problem. Given a set of 2p data points {b_k} = {b_k, k = 0, 1, . . . , 2p − 1} satisfying the LFSR equation¹

   b_k = − ∑_{j=1}^{𝜈} t_j b_{k−j},        k = 𝜈, 𝜈 + 1, . . . , 2p − 1,        (5.12)

we want to find the coefficients {t_j} so that (5.12) is satisfied. That is, we want to find coefficients to predict b_k using prior values. Furthermore, we want the number of nonzero coefficients 𝜈 to be as small as possible. Equation (5.12) can also be written as

   ∑_{j=0}^{𝜈} t_j b_{k−j} = 0,        k = 𝜈, 𝜈 + 1, . . . , 2p − 1,        (5.13)

with t_0 = 1. One way to find the coefficients {t_j}, given a set {b_j}, is to explicitly set up and solve the Toeplitz matrix equation

   ⎡ b_{𝜈−1}   b_{𝜈−2}   b_{𝜈−3}   · · ·   b_0     ⎤ ⎡ t_1 ⎤   ⎡ −b_𝜈      ⎤
   ⎢ b_𝜈       b_{𝜈−1}   b_{𝜈−2}   · · ·   b_1     ⎥ ⎢ t_2 ⎥   ⎢ −b_{𝜈+1}  ⎥
   ⎢ b_{𝜈+1}   b_𝜈       b_{𝜈−1}   · · ·   b_2     ⎥ ⎢ t_3 ⎥ = ⎢ −b_{𝜈+2}  ⎥ .
   ⎢    ⋮                                          ⎥ ⎢  ⋮  ⎥   ⎢     ⋮     ⎥
   ⎣ b_{2p−2}  b_{2p−3}  b_{2p−4}  · · ·   b_{p−1} ⎦ ⎣ t_𝜈 ⎦   ⎣ −b_{2p−1} ⎦

That is, 𝜈 rows of this matrix can be selected to obtain a square matrix. The value of 𝜈 is not known however, so different values of 𝜈 would need to be tried to select the smallest 𝜈 which could work. The Sugiyama algorithm is an efficient way of solving this problem which guarantees that t(x) has minimal degree. The Sugiyama algorithm thus provides a means of synthesizing LFSR coefficients of smallest degree that produces the sequence {bk }.

1 Comparison with (4.18) shows that this equation has a − where (4.18) does not. This is because (4.18) is expressed over GF(2).


Since the LFSR recursion (5.13) is a convolution, this suggests considering relationships related to polynomial multiplication. Let us represent the sequence and the LFSR in terms of polynomials by

   b(x) = ∑_{i=0}^{2p−1} b_i x^i        and        t(x) = 1 + ∑_{i=1}^{𝜈} t_i x^i

and consider the product 𝜃(x) = b(x)t(x). Expanding the product out term by term, we find

   𝜃(x) = b(x)t(x) = (b_0 + b_1 x + b_2 x^2 + · · · + b_{2p−1} x^{2p−1})(t_0 + t_1 x + t_2 x^2 + · · · + t_𝜈 x^𝜈)
        = 𝜃_0 + 𝜃_1 x + · · · + 𝜃_{𝜈−1} x^{𝜈−1}                        (incomplete sums 1)
          + 𝜃_𝜈 x^𝜈 + · · · + 𝜃_{2p−1} x^{2p−1}                        (complete sums)
          + 𝜃_{2p} x^{2p} + · · · + 𝜃_{𝜈+2p−1} x^{𝜈+2p−1}              (incomplete sums 2)
        = b_0 t_0 + (b_1 t_0 + b_0 t_1)x + (b_2 t_0 + b_1 t_1 + b_0 t_2)x^2
          + · · · + (b_{𝜈−1} t_0 + b_{𝜈−2} t_1 + · · · + b_0 t_{𝜈−1})x^{𝜈−1}                    (incomplete sums 1)      (5.14)–(5.15)
          + (b_𝜈 t_0 + b_{𝜈−1} t_1 + · · · + b_0 t_𝜈)x^𝜈 + (b_{𝜈+1} t_0 + b_𝜈 t_1 + · · · + b_1 t_𝜈)x^{𝜈+1}
          + · · · + (b_{2p−1} t_0 + · · · + b_{2p−1−𝜈} t_𝜈)x^{2p−1}                              (complete sums)          (5.16)–(5.17)
          + (b_{2p−1} t_1 + b_{2p−2} t_2 + · · · )x^{2p} + · · · + b_{2p−1} t_𝜈 x^{𝜈+2p−1}.      (incomplete sums 2)      (5.18)

In this product, the coefficients in the sum terms “complete sums” in (5.16) and (5.17) involve 𝜈 + 1 terms and are each 0 by (5.13). The terms in the “incomplete sums 1” (5.14) and (5.15) are not necessarily zero because they do not have all 𝜈 + 1 terms in the sum (5.13). These terms collectively form a polynomial of degree ≤ 𝜈 − 1. The terms in the “incomplete sums 2” (5.18) are not necessarily zero and these terms collectively form a polynomial whose lowest term may have degree 2p. (Thinking in terms of convolution, the “incomplete sums” terms involve where the sequences {bi } and {ti } are sliding into and out of full overlap, and the “complete sums” terms involve where the sequences being convolved are fully overlapping.) Because the “complete sums” terms are 0, the product may be written as (5.19) b(x)t(x) = r(x) − x2p s(x), where here r(x) = b0 t0 + (b1 t0 + b0 t1 )x + (b2 t0 + b1 t1 + b0 t2 )x2 + · · · + (b𝜈−1 t0 + b𝜈−2 t1 + · · · + b0 t𝜈−1 )x𝜈−1 with deg(r(x)) < 𝜈 and s(x) = −((b2p−1 t1 + b2p−2 t2 + · · · + b2p−1−𝜈 t𝜈 ) + · · · + b2p−1 t𝜈 x𝜈−1 ). Example 5.22 In this example, computations are done in ℤ5 . The sequence {2, 3, 4, 2, 2, 3}, corresponding to the polynomial b(x) = 2 + 3x + 4x2 + 2x3 + 2x4 + 3x5 , can be generated (as can be found using techniques



described below or the Berlekamp–Massey algorithm) using the coefficients t_1 = 3, t_2 = 4, t_3 = 2, so that t(x) = 1 + 3x + 4x^2 + 2x^3. We have p = 3 and 𝜈 = 3. Then in ℤ_5[x],

   b(x)t(x) = 2 + 4x + x^2 + x^6 + x^7 + x^8 = (2 + 4x + x^2) + x^6(1 + x + x^2),        (5.20)

where the first group of terms is r(x) and the second is −x^{2p}s(x). The terms x^3, x^4, and x^5 are missing, corresponding to terms with “complete sums.” We identify

   r(x) = 2 + 4x + x^2        s(x) = −(1 + x + x^2).
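Products such as this are easy to check numerically by convolving the coefficient sequences modulo 5. The following is a small stand-alone sketch (an assumed example, not part of the text's software):

   // Sketch: multiply b(x)t(x) with coefficients in Z_5 and print the result.
   #include <iostream>
   #include <vector>
   using namespace std;

   int main()
   {
      vector<int> b = {2, 3, 4, 2, 2, 3};      // b(x) = 2 + 3x + 4x^2 + 2x^3 + 2x^4 + 3x^5
      vector<int> t = {1, 3, 4, 2};            // t(x) = 1 + 3x + 4x^2 + 2x^3
      vector<int> c(b.size() + t.size() - 1, 0);
      for(size_t i = 0; i < b.size(); i++)     // polynomial product = convolution
         for(size_t j = 0; j < t.size(); j++)
            c[i + j] = (c[i + j] + b[i] * t[j]) % 5;
      for(size_t k = 0; k < c.size(); k++)     // prints 2 4 1 0 0 0 1 1 1
         cout << c[k] << " ";
      cout << "\n";
      return 0;
   }

The zero coefficients at x^3, x^4, and x^5 are exactly the "complete sums" annihilated by the recursion (5.13).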



The linear relationship (5.13) implies and is implied by the structure of (5.19). Equation (5.19) can also be expressed as r(x) − b(x)t(x) = s(x), x2p that is, x2p |(r(x) − b(x)t(x)). As is described below, this means that we can also write b(x)t(x) ≡ r(x) (mod x2p ). There are two ways of interpreting this relationship. The first is, as observed, that x2p |(r(x) − b(x)t(x)). The second is to work with remainders modulo x2p , which results in discarding all terms of degree x2p or higher. For example, if b(x) and t(x) are multiplied together, then taking the result and discarding terms of degree 2p or higher, the result is r(x). Equation (5.19) can also be expressed as x2p s(x) + b(x)t(x) = r(x).

(5.21)

Equation (5.21) has the form of the equation related to the GCD in the extended Euclidean algorithm in (5.4), where in the Euclidean algorithm we set a(x) = x2p and b(x) is formed from the sequence {bk }. Rather than iterating the extended Euclidean algorithm until a remainder ri (x) = 0, iteration continues until deg(ri (x)) < p. Then the polynomial ti (x) returned by the extended Euclidean algorithm (normalized so that the constant coefficient is equal to 1) is the desired connection polynomial. Example 5.23 Given the sequence {2, 3, 4, 2, 2, 3}, where the coefficients are in ℤ5 , calling the gcd function with a(x) = x6 and b(x) = 2 + 3x + 4x2 + 2x3 + 2x4 + 3x5 results after three iterations in ri (x) = 3 + x + 4x2

si (x) = 1 + x + x2

ti (x) = 4 + 2x + x2 + 3x3 .

Normalizing ti (x) and the other polynomials by scaling by 4−1 = 4, we find t(x) = 4(4 + 2x + x2 + 3x3 ) = 1 + 3x + 4x2 + 2x3 r(x) = 4(3 + x + 4x2 ) = 2 + 4x + x2 s(x) = 4(1 + x + x2 ) = 4 + 4x + 4x2 = −(1 + x + x2 ). These correspond to the polynomials in (5.20).



One of the useful attributes of the Sugiyama algorithm is that it determines the coefficients {t1 , . . . , tp } satisfying (5.13) with the smallest value of 𝜈. Put another way, it determines the t(x) of smallest degree satisfying (5.19). Example 5.24 To see this, consider the sequence {2, 3, 2, 3, 2, 3}. This can be generated by the polynomial t1 (x) = 1 + 3x + 4x2 + 2x3 , since b(x)t1 (x) = 2 + 4x + 4x2 + 3x6 + x7 + x8 = (2 + 4x + 4x2 ) + x6 (3 + x + x2 ).


However, as a result of calling the Sugiyama algorithm, we obtain the polynomial t(x) = 1 + x, so b(x)t(x) = 2 + 3x^6. It may be observed (in retrospect) that the sequence of coefficients in b happen to satisfy b_k = −b_{k−1}, consistent with the t(x) obtained. ◽

5.2.4 Congruence

Operations modulo an integer are fairly familiar. We frequently deal with operations on a clock modulo 24, “If it is 10:00 now, then in 25 hours it will be 11:00,” or on a week modulo 7, “If it is Tuesday, then in eight days it will be Wednesday.” The concept of congruence provides a notation to capture the idea of modulo operations. Definition 5.25 If an integer m ≠ 0 divides a − b, then we say that a is congruent to b modulo m and write a ≡ b (mod m). If a polynomial m(x) ≠ 0 divides a(x) − b(x), then we say that a(x) is congruent to b(x) modulo m(x) and write a(x) ≡ b(x) (mod m(x)). In summary: (5.22) a ≡ b (mod m) if and only if m|(a − b). ◽ Example 5.26 1. 7 ≡ 20 (mod 13). 2. 7 ≡ −6 (mod 13).



Congruences have the following basic properties. Theorem 5.27 [325, Theorem 2.1,Theorem 2.3, Theorem 2.4] For integers a, b, c, d, x, y, m: 1. a ≡ b (mod m) ⇔ b ≡ a (mod m) ⇔ b − a ≡ 0 (mod m). 2. If a ≡ b (mod m) and b ≡ c (mod m) then a ≡ c (mod m). 3. If a ≡ b (mod m) and c ≡ d (mod m) then ax + cy ≡ bx + dy (mod m). 4. If a ≡ b (mod m) and c ≡ d (mod m) then ac ≡ bd (mod m). From this it follows that if a ≡ b (mod m) then an ≡ bn (mod m). 5. If a ≡ b (mod m) and d|m and d > 0 then a ≡ b (mod d). 6. If a ≡ b (mod m) then for c > 0, ac ≡ bc (mod mc). 7. ax ≡ ay (mod m) if and only if x ≡ y (mod m∕(a, m)). 8. If ax ≡ ay (mod m) and (a, m) = 1 then x ≡ y (mod m). 9. If a ≡ b (mod m) then (a, m) = (b, m). From the definition, we note that if n|a, then a ≡ 0 (mod n).

5.2.5 The 𝜙 Function

Definition 5.28 The Euler totient function 𝜙(n) is the number of positive integers less than n that are relatively prime to n. This is also called the Euler 𝜙 function, or sometimes just the 𝜙 function. ◽ Example 5.29 1. 𝜙(5) = 4 (the numbers 1, 2, 3, 4 are relatively prime to 5). 2. 𝜙(4) = 2 (the numbers 1 and 3 are relatively prime to 4). 3. 𝜙(6) = 2 (the numbers 1 and 5 are relatively prime to 6).



It can be shown that the 𝜙 function can be written as

   𝜙(n) = n ∏_{p|n} (1 − 1/p) = n ∏_{p|n} (p − 1)/p,

where the product is taken over all primes p dividing n.

Example 5.30 𝜙(189) = 𝜙(3 ⋅ 3 ⋅ 3 ⋅ 7) = 189(1 − 1/3)(1 − 1/7) = 108. 𝜙(64) = 𝜙(2^6) = 64(1 − 1/2) = 32.
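A small sketch (an assumed example, not from the text) that evaluates 𝜙(n) by factoring n with trial division and applying the product formula:

   // Sketch: Euler phi function, phi(n) = n * prod_{p | n} (1 - 1/p).
   #include <iostream>

   long phi(long n)
   {
      long result = n;
      for(long p = 2; p * p <= n; p++) {
         if(n % p == 0) {                 // p is a prime factor of n
            result -= result / p;         // multiply result by (1 - 1/p)
            while(n % p == 0) n /= p;     // remove all copies of p
         }
      }
      if(n > 1) result -= result / n;     // leftover prime factor
      return result;
   }

   int main()
   {
      std::cout << phi(189) << " " << phi(64) << " " << phi(5) << "\n";  // 108 32 4
      return 0;
   }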



We observe that:

1. 𝜙(p) = p − 1 if p is prime.
2. For distinct primes p_1 and p_2, 𝜙(p_1 p_2) = (p_1 − 1)(p_2 − 1).        (5.23)
3. 𝜙(p^m) = p^{m−1}(p − 1) for p prime.
4. 𝜙(p^m q^n) = p^{m−1} q^{n−1}(p − 1)(q − 1) for distinct primes p and q.
5. For positive integers m and n with (m, n) = 1, 𝜙(mn) = 𝜙(m)𝜙(n).        (5.24)

5.2.6 Some Cryptographic Payoff

With all the effort so far introducing number theory, it is interesting to put it to work on a problem of practical interest: public key cryptography using the RSA algorithm. This is really a topic distinct from error correction coding, but the application is important in modern communication and serves to motivate some of these theoretical ideas. In a symmetric public key encryption system, a user B has a private “key” which is only known to B and a public “key” which may be known to any interested party, C. A message encrypted by one key (either the public or private) can be decrypted by the other. For example, if C wants to send a sealed letter so that only B can read it, C encrypts using B’s public key. Upon reception, B can read it by deciphering using his private key. Or, if B wants to send a letter that is known to come from only him, B encrypts with his private key. Upon receipt, C can successfully decrypt only using B’s public key.

194

Rudiments of Number Theory and Algebra Public key encryption relies upon a “trapdoor”: an operation which is exceedingly difficult to compute unless some secret information is available. For the RSA encryption algorithm, the secret information is number theoretic: it relies upon the difficulty of factoring very large integers. 5.2.6.1

Fermat’s Little Theorem

Theorem 5.31 1. (Fermat’s little theorem)2 If p is a prime and if a is an integer such that (a, p) = 1 (i.e., p does not divide a), then p divides ap−1 − 1. Stated another way, if a ≢ 0 (mod p), ap−1 ≡ 1 (mod p). 2. (Euler’s generalization of Fermat’s little theorem) If n and a are integers such that (a, n) = 1, then a𝜙(n) ≡ 1 (mod n), where 𝜙 is the Euler 𝜙 function. For any prime p, 𝜙(p) = p − 1 and we get Fermat’s little theorem. Example 5.32 1. Let p = 7 and a = 2. Then ap−1 − 1 = 26 − 1 = 63 and p|26 − 1. 2. Compute the remainder of 8103 when divided by 13. Note that 103 = 8 ⋅ 12 + 7. Then with all computations modulo 13, 8103 = (812 )8 (87 ) ≡ (18 )(87 ) ≡ (−5)7 ≡ (−5)6 (−5) ≡ (25)3 (−5) ≡ (−1)3 (−5) ≡ 5. In the first step 812 can be found as follows. With n = 13, 𝜙(n) = 12, and (8, 13) = 1. By Euler’s generalization, 8𝜙(n) ≡ 1 modulo 13. ◽

Proof of Theorem 5.31. 1. The nonzero elements in the group ℤp , {1, 2, . . . , p − 1} form a group of order p − 1 under multiplication. By Lagrange’s theorem (Theorem 2.29), the order of any element in a group divides the order of the group, so for a ∈ ℤp with a ≠ 0, ap−1 = 1 in ℤp . If a ∈ ℤ and a ∉ ℤp , write a = (ã + kp) for some k ∈ ℤ and for 0 ≤ ã < p, then reduce modulo p. 2. Let Gn be the set of elements in ℤn that are relatively prime to n. Then (it can be shown that) Gn forms a group under multiplication. Note that the group Gn has 𝜙(n) elements in it. Now let a ∈ ℤn be relatively prime to n. Then a is in the group Gn . Since the order of an element divides the order of the group, we have a𝜙(n) ≡ 1 (mod n). If a ∉ ℤn , write a = (ã + kn) where ã ∈ ℤn . Then reduce modulo n. ◽

2 Fermat’s little theorem should not be confused with “Fermat’s last theorem,” proved by A. Wiles, which states that xn + yn = zn has no solution over the integers if n > 2.

5.2 Number Theoretic Preliminaries 5.2.6.2

195

RSA Encryption

Named after its inventors, Ron Rivest, Adi Shamir, and Leonard Adleman [378], the RSA encryption algorithm gets its security from the difficulty of factoring large numbers. The steps in setting up the system are:



Choose two distinct random prime numbers p and q (of roughly equal length for best security) and compute n = pq. Note that 𝜙(n) = (p − 1)(q − 1).



Randomly choose an encryption key e, an integer e such that the GCD (e, (p − 1)(q − 1)) = 1. By the extended Euclidean algorithm, there are numbers d and f such that de − f (p − 1)(q − 1) = 1, or de = 1 + (p − 1)(q − 1)f . That is, d ≡ e−1 (mod (p − 1)(q − 1)). (If the value of d returned by the Euclidean algorithm is negative, then, since the computations are done modulo (p − 1)(q − 1), a positive value can be obtained by adding multiples of (p − 1)(q − 1) to d until a positive integer is obtained.)



Publish the pair of numbers {e, n} as the public key. Retain the pair of numbers {d, n} as the private key. The factors p and q of n are never disclosed.

To encrypt (say, using the public key), break the message m (as a sequence of numbers) into blocks mi of length less than the length of n. Furthermore, assume that (mi , n) = 1 (which is highly probable, since n has only two factors). For each block mi , compute the encrypted block ci by ci = mei (mod n). (If e < 0, then find the inverse modulo n.) To decrypt (using the corresponding private key) compute f (p−1)(q−1)+1

cdi (mod n) = mde i (mod n) = m i Since (mi , n) = 1,

f (p−1)(q−1)

mi

f (p−1)(q−1)

(mod n) = mi mi

(mod n).

= (mi )(p−1)(q−1) = (mi )𝜙(n) ≡ 1 (mod n), f

f

so that cdi (mod n) = mi , as desired. To crack this, a person knowing n and e would have to factor n to find d. While multiplication (and computing powers) is easy and straightforward, factoring very large integers is very difficult. Example 5.33 Then

Let p = 47 and q = 71 and n = 3337. (This is clearly too short to be of cryptographic value.) (p − 1)(q − 1) = 3220.

The encryption key e must be relatively prime to 3220; take e = 79. Then we find by the Euclidean algorithm that d = 1019. The public key is (79, 3337). The private key is (1019, 3337). To encode the message block m1 = 688, we compute c1 = (688)79 (mod 3337) = 1570. To decrypt this, exponentiate cd1 = (1570)1019 (mod 3337) = 688.



In practical applications, primes p and q are usually chosen to be at least several hundred digits long. This makes factoring n = pq exceedingly difficult!

196

Rudiments of Number Theory and Algebra

5.3

The Chinese Remainder Theorem

The material from this section is not used until Section 7.4.2. Example 5.34

The integer x = 1028 has the property that x ≡ 0 (mod 4)

x ≡ 2 (mod 27) x ≡ 3 (mod 25).

(5.25)

In this forward direction, computation is very straightforward. But suppose you are given only the equivalences in (5.25), and it is desired to determine the integer x. The method of solution is found using the Chinese Remainder Theorem (CRT). ◽

In its simplest interpretation, the CRT is a method for finding the simultaneous solution to the set of congruences x ≡ a1 (mod m1 )

x ≡ a2 (mod m2 )

...

x ≡ ar (mod mr ).

(5.26)

However, the CRT applies not only to integers, but to other Euclidean domains, including rings of polynomials. The CRT provides an interesting isomorphism between rings which is useful in some decoding algorithms. One approach to the solution of (5.26) would be to find the solution set to each congruence separately, then determine if there is a point in the intersection of these sets. The following theorem provides a more constructive solution. Theorem 5.35 If m1 , m2 , . . . , mr are pairwise relatively prime elements with positive valuation in a Euclidean domain R, and a1 , a2 , . . . , ar are any elements in the Euclidean domain, then the set of congruences in (5.26) have common solutions. Let m = m1 m2 · · · mr . If x0 is a solution, then so is x = x0 + km for any k ∈ R. ∏ Proof Let m = rj=1 mj . Observe that (m∕mj , mj ) = 1 since the mi s are relatively prime. By Theorem 5.15, there are unique elements s and t such that (m∕mj )s + mj t = 1, which is to say that (m∕mj )s ≡ 1 (mod mj ). Let bj = s in this expression, so that we can write (m∕mj )bj ≡ 1 (mod mj ).

(5.27)

Also (m∕mj )bj ≡ 0 (mod mi ) if i ≠ j since mi |m. Form a solution x0 by x0 =

r ∑ (m∕mj )bj aj .

(5.28)

j=1

Then x0 ≡ (m∕mi )bi ai ≡ ai (mod mi ). Uniqueness is straightforward to verify, as is the fact that x = x0 + km is another solution.



It is convenient to introduce the notation Mi = m∕mi . The solution (5.28) can be written as ∑ x0 = rj=1 𝛾j aj , where m (5.29) 𝛾j = bj = Mj bj mj

5.3 The Chinese Remainder Theorem

197

with bj determined by the solution to (5.27). Observe that 𝛾j depends only upon the set of moduli {mi } and not upon x. If the 𝛾j s are precomputed, then the synthesis of x from the {ai } is a simple inner product. 2 3 4 5 6 7 8 9 αα α α α α α α α

+

Example 5.36

Find a solution x to the set of congruences x ≡ 0 (mod 4)

01010

x ≡ 2 (mod 27) x ≡ 3 (mod 25).

10

11

00000

12

13

14

15

16

α α α α α α α

crtgamma.m fromcrt.m tocrt.m testcrt.m

Since the moduli mi are powers of distinct primes, they are pairwise relative prime. Then m = m1 m2 m3 = 2700. Using the Euclidean algorithm it is straightforward to show that (m∕4)b1 ≡ 1 (mod 4) has solution b1 = −1. Similarly b2 = 10 b3 = −3. The solution to the congruences is given by x = (m∕4)(−1)(0) + (m∕27)(10)(2) + (m∕25)(−3)(3) = 1028.



The CRT provides a means of representing integers in the range 0 ≤ x < m, where m = m1 m2 · · · mr and the mi are pairwise relatively prime. Let R∕⟨m⟩ denote the ring of integers modulo m and let R∕⟨mi ⟩ denote the ring of integers modulo mi . Given a number x ∈ R∕⟨m⟩, it can be decomposed into an r-tuple [x1 , x2 , . . . , xr ] by x ≡ xi (mod mi ), i = 1, 2, . . . , r, where xi ∈ R∕⟨mi ⟩. Going the other way, an r-tuple of numbers [x1 , x2 , . . . , xr ] with 0 ≤ xi < mi can be converted into the number x they represent using (5.28). If we let x = [x1 , x2 , . . . , xr ], then the correspondence between a number x and its representation using the CRT can be represented as x ↔ x. We also denote this as x = CRT(x)

x = CRT−1 (x).

Ring operations can be equivalently computed in the original ring R∕⟨m⟩, or in each of the rings R∕⟨mi ⟩ separately. Example 5.37 Let m1 = 4, m2 = 27, m3 = 25. Let x = 25; then x = [1, 25, 0]. Let y = 37; then y = [1, 10, 12]. The sum z = x + y = 62 has z = [2, 8, 12], which represents x + y, added element by element, with the first component modulo 4, the second component modulo 27, and the third component modulo 25. The product z = x ⋅ y = 925 has z = [1, 7, 0], corresponding to the element-by-element product (modulo 4, 27, and 25, respectively). ◽

More generally, we have a ring isomorphism by the CRT. Let 𝜋i ∶ R∕⟨m⟩ → R∕⟨mi ⟩, i = 1, 2, . . . , r denote the ring homomorphism defined by 𝜋i (a) = a (mod mi ). We define the homomorphism 𝜒 ∶ R∕⟨m⟩ → R∕⟨m1 ⟩ × R∕⟨m2 ⟩ × · · · × R∕⟨mr ⟩ by 𝜒 = 𝜋1 × 𝜋2 × · · · × 𝜋r , that is, 𝜒(a) = (a mod m1 , a mod m2 , . . . , a mod mr ).

(5.30)

Then 𝜒 defines a ring isomorphism: both the additive and multiplicative structure of the ring are preserved, and the mapping is bijective. Example 5.38

The CRT can be applied to any Euclidean domain, not just integers.

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

testcrtp.m tocrtpoly.m fromcrtpoly.m crtgammapoly.m

198

Rudiments of Number Theory and Algebra Suppose the Euclidean domain is ℝ[x] (polynomials with real coefficients) and the polynomials are m1 (x) = (x − 1)

m2 (x) = (x − 2)2

m3 (x) = (x − 3)3 .

These are clearly pairwise relatively prime, as can be verified with the Euclidean algorithm. Let f (x) = x5 + 4x4 + 5x3 + 2x2 + 3x + 2, then

f (x) (mod m1 (x)) ≡ a1 (x) = 17 f (x) (mod m2 (x)) ≡ a2 (x) = 279x − 406 f (x) (mod m3 (3)) ≡ a3 (x) = 533x2 − 2211x + 2567.

We find that m(x) = m1 (x)m2 (x)m3 (x) = x6 − 14x5 + 80x4 − 238x3 + 387x2 − 324x + 108. The polynomial f (x) can be represented as the 3-tuple of moduli polynomials: f (x) ↔ (17, 279x − 406, 533x2 − 2211 + 2567) = f (x). 1

5.3.1 5.3.1.1



The CRT and Interpolation The Evaluation Homomorphism

Let 𝔽 be a field and let R = 𝔽 [x]. Let f (x) ∈ R and let m1 (x) = x − u1 . Then computing the remainder of f (x) modulo x − u1 gives exactly f (u1 ). Example 5.39 Let f (x) = x4 + 3x3 + 2x2 + 4 ∈ ℝ[x] and let m1 (x) = x − 3. Then computing f (x)∕m1 (x) by long division, we obtain the quotient and remainder f (x) = (x − 3)(x3 + 6x2 + 20x + 60) + 184. But we also find that f (3) = 184. ◽

So f (x) mod (x − 3) = 184 = f (3).

Thus, we can write f (x) mod (x − u) = f (u). The mapping 𝜋i ∶ 𝔽 [x] → 𝔽 defined by 𝜋i (f (x)) = f (ui ) = f (x) (mod (x − ui )) is called the evaluation homomorphism. It can be shown that it is, in fact, a homomorphism: for two polynomials f (x), g(x) ∈ 𝔽 [x], 𝜋i (f (x) + g(x)) = 𝜋i (f (x)) + 𝜋i (g(x)) 𝜋i (f (x)g(x)) = 𝜋i (f (x))𝜋i (g(x)). 5.3.1.2

The Interpolation Problem

Suppose we are given the following problem: Given a set of points (ui , ai ), i = 1, 2, . . . , r, determine an interpolating polynomial f (x) ∈ 𝔽 [x] of degree < r such that f (u1 ) = a1 , f (u2 ) = a2 , . . . , f (ur ) = ar .

(5.31)

5.3 The Chinese Remainder Theorem

199

Now let f (x) = f0 + f1 x + · · · + fr−1 xr−1 . Since deg(f (x)) < r, we can think of f (x) as being in 𝔽 [x]∕(xr − 1). Also let mi (x) = x − ui ∈ 𝔽 [x] for i = 1, 2, . . . , r where the u1 , u2 , . . . , ur ∈ F are pairwise distinct. Then the mi (x) are pairwise relatively prime. By the evaluation homomorphism, the set of constraints (5.31) can be expressed as f (x) ≡ a1 (mod m1 (x)) f (x) ≡ a2 (mod m2 (x)) f (x) ≡ ar (mod mr (x)). So solving the interpolation problem simply becomes an instance of solving a CRT. ∏ The interpolating polynomial is found using the CRT. Let m(x) = ri=1 (x − ui ). By the proof of Theorem 5.35, we need functions bj (x) such that (m(x)∕mj (x))bj (x) ≡ 1 (mod mj ) and (m(x)∕mj (x))bj (x) ≡ 0 (mod mk ). That is,

[

r ∏

] (x − ui ) bj (x) ≡ 1 (mod mj ),

i=1,i≠j

[

and

r ∏

] (x − ui ) bj (x) ≡ 0 (mod mk )

i=1,i≠j

for k ≠ j. Let bj (x) = and let

r ∏

1 (u − ui ) j i=1,i≠j

r ∏ (x − ui ) lj (x) = (m(x)∕mj (x))bj (x) = . (u j − ui ) i=1,i≠j

(5.32)

Since lj (uj ) = 1 and

j ≠ k,

lj (uk ) = 0,

we see that bj (x) satisfies the necessary requirements. By (5.28), the interpolating polynomial is then simply r r ∑ ∑ (m(x)∕mj (x))bj (x)aj = aj lj (x). (5.33) f (x) = j=1

j=1

This form of an interpolating polynomial is called a Lagrange interpolator. The basis functions li (x) are called Lagrange interpolants. By the CRT, this interpolating polynomial is unique modulo m(x). The Lagrange interpolator can be expressed in another convenient form. Let m(x) =

r ∏

(x − ui ).

i=1

Then the derivative3 is ′

m (x) =

r ∑ ∏

(x − ui )

k=1 i≠k

3

Or formal derivative, if the field of operations is not real or complex.

200

Rudiments of Number Theory and Algebra so that m′ (uj ) =



(uj − ui ).

i≠j

(See also Definition 6.35.) The interpolation formula (5.33) can now be written as f (x) =

r ∑ i=1

ai

m(x) 1 . ′ (x − ui ) m (ui )

(5.34)

Example 5.40 An important instance of interpolation is the discrete Fourier transform (DFT). Let f (x) = f0 + f1 x + · · · + fN−1 xN−1 , with x = z−1 , be the Z-transform of a complex cyclic sequence. Then f (x) ∈ ℂ[x]∕⟨xN − 1⟩, since it is a polynomial of degree ≤ N − 1. Let m(x) = xN − 1. The N roots of m(x) are the complex numbers e−ij2𝜋∕N , i = 0, 1, . . . , N − 1. Let 𝜔 = e−j2𝜋∕N ; this is a primitive Nth root of unity. Then the factorization of m(x) can be written as N−1 ∏ (x − 𝜔i ), m(x) = xN − 1 = i=0

where the factors are pairwise relative prime. Define the evaluations Fk = 𝜋k (f (x)) = f (𝜔k ) =

N−1 ∑

fi 𝜔ik ,

k = 0, 1, . . . , N − 1.

i=0

Expressed another way, Fk = f (x) (mod x − 𝜔k ). The coefficients {fk } may be thought of as existing in a “time domain,” while the coefficients {Fk } may be thought of as existing in a “frequency domain.” By the ring isomorphism, for functions in the “time domain,” multiplication is polynomial multiplication (modulo xN − 1). That is, for polynomials f (x) and g(x), multiplication is f (x)g(x) (mod xN − 1), which amounts to cyclic convolution. For functions in the “transform domain,” multiplication is element by element. That is, for sequences (F0 , F1 , . . . , FN−1 ) and (G0 , G1 , . . . , GN−1 ), multiplication is element by element as complex numbers: (F0 G0 , F1 G1 , . . . , FN−1 GN−1 ). Thus, the ring isomorphism validates the statement: (cyclic) convolution in the time domain is equivalent to multiplication in the frequency domain. ◽

5.4

Fields

Fields were introduced in Section 2.3. We review the basic requirements here, in comparison with a ring. In a ring, not every element has a multiplicative inverse. In a field, the familiar arithmetic operations that take place in the usual real numbers are all available: ⟨F, +⟩ is an Abelian group. (Denote the additive identity element by 0.) The set F∖{0} (the set F with the additive identity removed) forms a commutative group under multiplication. Denote the multiplicative identity element by 1. Finally, as in a ring the operations + and ⋅ distribute: a ⋅ (b + c) = a ⋅ b + a ⋅ c for all a, b, c ∈ F. In a field, all the elements except the additive identity form a group, whereas in a ring, there may not even be a multiplicative identity, let alone an inverse for every element. Every field is a ring, but not every ring is a field. Example 5.41 ⟨ℤ5 , +, ⋅⟩ forms a field; every nonzero element has a multiplicative inverse. So this set forms not only a ring but also a group. Since this field has only a finite number of elements in it, it is said to be a finite field. ◽ However, ⟨ℤ6 , +, ⋅⟩ does not form a field, since not every element has a multiplicative inverse.

One way to obtain finite fields is described in the following.

5.4 Fields Theorem 5.42 The ring ⟨ℤp , +, ⋅⟩ is a field if and only if p is a prime. Before proving this, we need the following definition and lemma. Definition 5.43 In a ring R, if a, b ∈ R with both a and b not equal to zero but ab = 0, then a and b are said to be zero divisors. ◽ Lemma 5.44 In a ring ℤn , the zero divisors are precisely those elements that are not relatively prime to n. Proof Let a ∈ ℤn be not equal to 0 and be not relatively prime to n. Let d be the GCD of n and a. Then a(n∕d) = (a∕d)n, which, being a multiple of n, is equal to 0 in ℤn . We have thus found a number b = n∕d such that ab = 0 in ℤn , so a is a zero divisor in ℤn . Conversely, suppose that there is an a ∈ ℤn relatively prime to n such that ab = 0. Then it must be the case that ab = kn for some integer k. Since n has no factors in common with a, then it must divide b, which means that b = 0 in ℤn . ◽ Observe from this lemma that if p is a prime, there are no divisors of 0 in ℤp . We now turn to the proof of Theorem 5.42. Proof of Theorem 5.42. We have already shown that if p is not prime, then there are zero divisors and hence ⟨ℤp , +, ⋅⟩ cannot form a field. Let us now show that if p is prime, ⟨ℤp , +, ⋅⟩ is a field. We have already established that ⟨ℤp , +⟩ is a group. The key remaining requirement is to establish that ⟨ℤp ∖{0}, ⋅⟩ forms a group. The multiplicative identity is 1 and multiplication is commutative. The key remaining requirement is to establish that every nonzero element in ℤp has a multiplicative inverse. Let {1, 2, . . . , p − 1} be a list of the nonzero elements in ℤp , and let a ∈ ℤp be nonzero. Form the list {1a, 2a, . . . , (p − 1)a}. (5.35) Every element in this list is distinct, since if any two were identical, say ma = na with m ≠ n, then a(m − n) = 0, which is impossible since there are no zero divisors in ℤp . Thus, the list (5.35) contains all nonzero elements in ℤp and is a permutation of the original list. Since 1 is in the original list, it must appear in the list in (5.35). ◽ 5.4.1

An Examination of ℝ and ℂ

Besides the finite fields ⟨ℤp , +, ⋅⟩ with p prime, there are other finite fields. These fields are extension fields of ℤp . However, before introducing them, it is instructive to take a look at how the field of complex numbers ℂ can be constructed as a field extension from the field of real numbers ℝ. Recall that there are several representations for complex numbers. Sometimes it is convenient to use a “vector” notation, in which a complex number is represented as (a, b). Sometimes it is convenient to use a “polynomial” notation a + bi, where i is taken to be a root of the polynomial x2 + 1. However, since there is some preconception about the meaning of the symbol i, we replace it with the symbol √ 𝛼, which doesn’t carry the same connotations (yet). In particular, 𝛼 is not (yet) the symbol for −1. You may think of a + b𝛼 as being a polynomial of degree ≤ 1 in the “indeterminate” 𝛼. There is also a polar notation for complex numbers, in which the complex number is written as a + ib = rei𝜃 for

201

202

Rudiments of Number Theory and Algebra the appropriate r and 𝜃. Despite the differences in notation, it should be borne in mind that they all represent the same number. Given two complex numbers we define the addition component-by-component in the vector notation (a, b) and (c, d), where a, b, c, and d are all in ℝ, based on the addition operation of the underlying field ℝ. The set of complex number thus forms a two-dimensional vector space of real numbers. We define (a, b) + (c, d) = (a + c, b + d). (5.36) It is straightforward to show that this addition operation satisfies the group properties for addition, based on the group properties it inherits from ℝ. Now consider the “polynomial notation.” Using the conventional rules for adding polynomials, we obtain a + b𝛼 + c + d𝛼 = (a + c) + (b + d)𝛼, which is equivalent to (5.36). How, then, to define multiplication in such a way that all the field requirements are satisfied? If we simply multiply using the conventional rules for polynomial multiplication, (a + b𝛼)(c + d𝛼) = ac + (ad + bc)𝛼 + bd𝛼 2 ,

(5.37)

we obtain a quadratic polynomial, whereas complex numbers are represented as polynomials having degree ≤ 1 in the variable 𝛼. To reduce the degree as needed, polynomial multiplication must be followed by another step, computing the remainder modulo some other polynomial. Let us pick the polynomial g(𝛼) = 1 + 𝛼 2 to divide by. Dividing the product in (5.37) by g(𝛼)

we obtain the remainder (ac − bd) + (ad + bc)𝛼. Summarizing this, we now define the product of (a + b𝛼) by (c + d𝛼) by the following two steps: 1. Multiply (a + b𝛼) by (c + d𝛼) as polynomials. 2. Compute the remainder of this product when divided by g(𝛼) = 𝛼 2 + 1. That is, the multiplication is defined in the ring ℝ[𝛼]∕g(𝛼), as described in Section 4.4. Of course, having established the pattern, it is not necessary to carry out the actual polynomial arithmetic: by this two-step procedure we have obtained the familiar formula (a + b𝛼) ⋅ (c + d𝛼) = (ac − bd) + (ad + bc)𝛼 or, in vector form, (a, b) ⋅ (c, d) = (ac − bd, ad + bc). As an important example, suppose we want to multiply the complex numbers (in vector form) (0, 1) times (0, 1), or (in polynomial form) 𝛼 times 𝛼. Going through the steps of computing the product and the remainder we find 𝛼 ⋅ 𝛼 = −1. (5.38)

5.4 Fields

203

In other words, in the arithmetic that we have defined, the element 𝛼 satisfies the equation 𝛼 2 + 1 = 0.

(5.39) √ In other words, the indeterminate 𝛼 acts like the number −1. This is a result of the fact that multiplication is computed modulo of the polynomial g(𝛼) = 𝛼 2 + 1: the symbol 𝛼 is (now by construction) a root of the polynomial g(x). To put it another way, the remainder of a polynomial 𝛼 2 + 1 divided by 𝛼 2 + 1 is exactly 0. So, by this procedure, any time 𝛼 2 + 1 appears in any computation, it may be replaced with 0. Let us take another look at the polynomial multiplication in (5.37): (a + b𝛼)(c + d𝛼) = ac + (ad + bc)𝛼 + bd𝛼 2 .

(5.40)

Using (5.38), we can replace 𝛼 2 in (5.40) wherever it appears with expressions involving lower powers of 𝛼. We thus obtain (a + b𝛼)(c + d𝛼) = ac + (ad + bc)𝛼 + bd(−1) = (ac − bd) + (ad + bc)𝛼, as expected. If we had an expression involving 𝛼 3 it could be similarly simplified and expressed in terms of lower powers of 𝛼: 𝛼 3 = 𝛼 ⋅ 𝛼 2 = 𝛼 ⋅ (−1) = −𝛼. Using the addition and multiplication as defined, it is (more or less) straightforward to show that we have created a field which is, in fact, the field of complex numbers ℂ. As is explored in the exercises, it is important that the polynomial g(𝛼) used to define the multiplication operation not have roots in the base field ℝ. If g(𝛼) were a polynomial so that g(b) = 0 for some b ∈ ℝ, then the multiplication operation defined would not satisfy the field requirements, as there would be zero divisors. A polynomial g(x) that cannot be factored into polynomials of lower degree is said to be irreducible. By the procedure above, we have taken a polynomial equation g(𝛼) which has no real roots (it is irreducible) and created a new element 𝛼 which is the root of g(𝛼), defining along the way an arithmetic system that is mathematically useful (it is a field). The new field ℂ, with the new element 𝛼 in it, is said to be an extension field of the base field ℝ. At this point, it might be a tempting intellectual exercise to try to extend ℂ to a bigger field. However, we won’t attempt this because: 1. The extension created is sufficient to demonstrate the operations necessary to extend a finite field to a larger finite field; and (more significantly) 2. It turns out that ℂ does not have any further extensions: it already contains the roots of all polynomials in ℂ[x], so there are no other polynomials by which it could be extended. This fact is called the fundamental theorem of algebra. There are a couple more observations that may be made about operations in ℂ. First, we point out again that addition in the extension field is easy, being simply element-by-element addition of the vector representation. Multiplication has its own special rules, determined by the polynomial g(𝛼). However, if we represent complex numbers in polar form, a + b𝛼 = r1 ej𝜃1

c + d𝛼 = r2 ej𝜃2 ,

then multiplication is also easy: simply multiply the magnitudes and add the angles: r1 ej𝜃1 ⋅ r2 ej𝜃2 = r1 r2 ej(𝜃1 +𝜃2 ) . Analogously, we will find that addition in the Galois fields we construct is achieved by straightforward vector addition, while multiplication is achieved either by some operation which depends on a polynomial g, or by using a representation loosely analogous to the polar form for complex numbers, in which the multiplication is more easily computed.

204

Rudiments of Number Theory and Algebra 5.4.2

Galois Field Construction: An Example

A subfield of a field is a subset of the field that is also a field. For example, ℚ is a subfield of ℝ. A more potent concept is that of an extension field. Viewed one way, it simply turns the idea of a subfield around: an extension field E of a field F is a field in which F is a subfield. The field F in this case is said to be the base field. But more importantly is the way that the extension field is constructed. Extension fields are constructed to create roots of irreducible polynomials that do not have roots in the base field. Definition 5.45 A nonconstant polynomial f (x) ∈ R[x] is irreducible over R if f (x) cannot be expressed as a product g(x)h(x) where both g(x) and h(x) are polynomials of degree less than the degree of f (x) and g(x) ∈ R[x] and h(x) ∈ R[x]. ◽ In this definition, the ring (or field) in which the polynomial is irreducible makes a difference. For example, the polynomial f (x) = x2 − 2 is irreducible over ℚ, but over the real numbers we can write √ √ f (x) = (x + 2)(x − 2), so f (x) is reducible over ℝ. We have already observed that ⟨ℤp , +, ⋅⟩ forms a field when p is prime. It turns out that all finite fields have order equal to some power of a prime number, pm . For m > 1, the finite fields are obtained as extension fields to ℤp using an irreducible polynomial in ℤp [x] of degree m. These finite fields are usually denoted by GF(pm ) or GF(q) where q = pm , where GF stands for “Galois field,” named after the French mathematician Èveriste Galois (See Box 5.1).

Box 5.1: Èveriste Galois (1811–1832) The life of Galois is a study in brilliance and tragedy. At an early age, Galois studied the works in algebra and analysis of Abel and Lagrange, convincing himself (justifiably) that he was a mathematical genius. His mundane schoolwork, however, remained mediocre. He attempted to enter the Ècole Polytechnique, but his poor academic performance resulted in rejection, the first of many disappointments. At the age of seventeen, he wrote his discoveries in algebra in a paper which he submitted to Cauchy, who lost it. Meanwhile, his father, an outspoken local politician who instilled in Galois a hate for tyranny, committed suicide after some persecution. Sometime later, Galois submitted another paper to Fourier. Fourier took the paper home and died shortly thereafter, thereby resulting in another lost paper. As a result of some outspoken criticism against its director, Galois was expelled from the normal school he was attending. Yet another paper presenting his works in finite fields was a failure, being rejected by the reviewer (Poisson) as being too incomprehensible. Disillusioned, Galois joined the National Guard, where his outspoken nature led to some time in jail for a purported insult against Louis Philippe. Later he was challenged to a duel — probably a setup — to defend the honor of a woman. The night before the duel, Galois wrote a lengthy letter describing his discoveries. The letter was eventually published in Revue Encylopèdique. Alas, Galois was not there to read it: he was shot in the stomach in the duel and died the following day of peritonitis at the tender age of 20.

We demonstrate the extension process by constructing the operations for the field GF(24 ), analogous to the way the complex field was constructed from the real field. Any number in GF(24 ) can be represented as a 4-tuple (a, b, c, d), where a, b, c, d ∈ GF(2). Addition of these numbers is defined

5.4 Fields

205

to be element-by-element, modulo 2: For (a1 , a2 , a3 , a4 ) ∈ GF(24 ) and (b1 , b2 , b3 , b3 ) ∈ GF(24 ), where ai ∈ GF(2) and bi ∈ GF(2), (a1 , a2 , a3 , a4 ) + (b1 , b2 , b3 , b4 ) = (a1 + b1 , a2 + b2 , a3 + b3 , a4 + b4 ). Example 5.46

Add the numbers (1, 0, 1, 1) + (0, 1, 0, 1). Recall that in GF(2), 1 + 1 = 0, so that we obtain (1, 0, 1, 1) + (0, 1, 0, 1) = (1, 1, 1, 0).



To define the multiplicative structure, we need an irreducible polynomial of degree 4. The polynomial g(x) = 1 + x + x4 is irreducible over GF(2). (This can be verified since g(0) = 1 and g(1) = 1, which eliminates linear factors and it can be verified by exhaustion that the polynomial cannot be factored into quadratic factors.) In the extension field GF(24 ), define 𝛼 to be a root of g: 𝛼 4 + 𝛼 + 1 = 0, or 𝛼 4 = 1 + 𝛼.

(5.41)

A 4-tuple (a, b, c, d) representing a number in GF(24 ) has a representation in polynomial form a + b𝛼 + c𝛼 2 + d𝛼 3 . Now take successive powers of 𝛼 beyond 𝛼 4 : 𝛼 4 = 1 + 𝛼, 𝛼 5 = 𝛼(𝛼 4 ) = 𝛼 + 𝛼 2 , 𝛼 6 = 𝛼 2 (𝛼 4 ) = 𝛼 2 + 𝛼 3 ,

(5.42)

𝛼 7 = 𝛼 3 (𝛼 4 ) = 𝛼 3 (1 + 𝛼) = 𝛼 3 + 1 + 𝛼, and so forth. In fact, because of the particular irreducible polynomial g(x) which we selected, powers of 𝛼 up to 𝛼 14 are all distinct and 𝛼 15 = 1. Thus, all 15 of the nonzero elements of the field can be represented as powers of 𝛼. This gives us something analogous to a “polar” form; we call it the “power” representation. The relationship between the vector representation, the polynomial representation, and the “power” representation for GF(24 ) is shown in Table 5.1. The fact that a 4-tuple has a corresponding representation as a power of 𝛼 is denoted using ↔. For example, (0, 1, 0, 0) ↔ 𝛼

(0, 1, 1, 0) ↔ 𝛼 5 .

The Vector Representation (integer) column of the table is obtained from the Vector Representation column by binary-to-decimal conversion, with the least-significant bit on the left. Example 5.47 In GF(24 ) multiply the Galois field numbers 1 + 𝛼 + 𝛼 3 and 𝛼 + 𝛼 2 . Step 1 is to multiply these “as polynomials” (where the arithmetic of the coefficients takes place in GF(2)): (1 + 𝛼 + 𝛼 3 ) ⋅ (𝛼 + 𝛼 2 ) = 𝛼 + 𝛼 3 + 𝛼 4 + 𝛼 5 . Step 2 is to reduce using Table 5.1 or, equivalently, to compute the remainder modulo 𝛼 4 + 𝛼 + 1: 𝛼 + 𝛼 3 + 𝛼 4 + 𝛼 5 = 𝛼 + 𝛼 3 + (1 + 𝛼) + (𝛼 + 𝛼 2 ) = 1 + 𝛼 + 𝛼 2 + 𝛼 3 . So in GF(24 ), (1 + 𝛼 + 𝛼 3 ) ⋅ (𝛼 + 𝛼 2 ) = 1 + 𝛼 + 𝛼 2 + 𝛼 3 .

206

Rudiments of Number Theory and Algebra Table 5.1: Power, Vector, and Polynomial Representations of GF(24 ) as an Extension Using g(𝛼) = 1 + 𝛼 + 𝛼 4 Polynomial Representation 0 1 𝛼

1 +

1 + 1 1 + 1 + 1 1

𝛼 𝛼

𝛼2

+

𝛼 𝛼 𝛼 𝛼 𝛼

𝛼2 𝛼2

+

𝛼2

+ + + +

𝛼 𝛼2 𝛼2 𝛼2

𝛼3

+ +

𝛼3 𝛼3

+

𝛼3

+ + + +

𝛼3 𝛼3 𝛼3 𝛼3

2

Vector Representation

Vector Representation (integer)

Power Representation 𝛼n

Logarithm n

Zech Logarithm z(n)

0000 1000 0100 0010 0001 1100 0110 0011 1101 1010 0101 1110 0111 1111 1011 1001

0 1 2 4 8 3 6 12 11 5 10 7 14 15 13 9

– 1 = 𝛼0 𝛼 𝛼2 𝛼3 𝛼4 𝛼5 𝛼6 𝛼7 𝛼8 𝛼9 𝛼 10 𝛼 11 𝛼 12 𝛼 13 𝛼 14

– 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

– – 4 8 14 1 10 13 9 2 7 5 12 11 6 3

Note that 𝛼 15 = 1. (The Zech logarithm is explained in Section 5.6.)

In vector notation, we could also write (1, 1, 0, 1) ⋅ (0, 1, 1, 0) = (1, 1, 1, 1). This product can also be computed using the power representation. Since (1, 1, 0, 1) ↔ 𝛼 7 and (0, 1, 1, 0) ↔ 𝛼 5 , we have (1, 1, 0, 1) ⋅ (0, 1, 1, 0) ↔ 𝛼 7 𝛼 5 = 𝛼 12 ↔ (1, 1, 1, 1). The product is computed simply using the laws of exponents (adding the exponents).

Example 5.48



In GF(24 ), compute 𝛼 12 ⋅ 𝛼 5 . In this case, we would get 𝛼 12 ⋅ 𝛼 5 = 𝛼 17 .

However since 𝛼 15 = 1, 𝛼 12 ⋅ 𝛼 5 = 𝛼 17 = 𝛼 15 𝛼 2 = 𝛼 2 .



We compute 𝛼 a 𝛼 b = 𝛼 c by finding c = (a + b) (mod pm − 1). Since the exponents are important, a nonzero number is frequently represented by the exponent. The exponent is referred to as the logarithm of the number. It should be pointed that this power (or logarithm) representation of the Galois field exists because of the particular polynomial g(𝛼) which was chosen. The polynomial is not only irreducible, it is also primitive which means that successive powers of 𝛼 up to 2m − 1 are all unique, just as we have seen. While different irreducible polynomials can be used to construct the field there is, in fact, only one field with q elements in it, up to isomorphism. Tables which provide both addition and multiplication operations for GF(23 ) and GF(24 ) are provided inside the back cover of this book.

5.5 Galois Fields: Mathematical Facts 5.4.3

207

Connection with Linear Feedback Shift Registers

The generation of a Galois field can be represented using an LFSR with g(x) as the connection polynomial by labeling the registers as 1, 𝛼, 𝛼 2 , and 𝛼 3 , as shown in Figure 5.1. As the LFSR is clocked, successive powers of 𝛼 are represented by the state of the LFSR. The sequence of register contents as the LFSR is clocked n steps is shown in the table below. n 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

𝛼 0 1 0 0 1 1 0 1 0 1 1 1 1 0 0 0

1 1 0 0 0 1 0 0 1 1 0 1 0 1 1 1 1

𝛼2 0 0 1 0 0 1 1 0 1 0 1 1 1 1 0 0

𝛼3 0 0 0 1 0 0 1 1 0 1 0 1 1 1 1 1

The register contents correspond exactly with the nonzero vectors of the Vector Representation column of Table 5.1. The state contents provide the vector representation, while the count provides the exponent in the power representation. 1

D

+

𝛼

𝛼2

𝛼3

D

D

D

Figure 5.1: LFSR labeled with powers of 𝛼 to illustrate Galois field elements.

5.5

Galois Fields: Mathematical Facts

Having presented an example of constructing a Galois field, we now lay out some aspects of the theory. We first examine the additive structure of finite fields, which tells us what size any finite field can be. Recalling Definition 4.6, that the characteristic is the smallest positive integer m such that m(1) = 1 + 1 + · · · + 1 = 0, we have the following. Lemma 5.49 The characteristic of a field must be either 0 or a prime number. Proof If the field has characteristic 0, the field must be infinite. Otherwise, suppose that the characteristic is a finite number k. Assume k is a composite number. Then k(1) = 0 and there are integers m ≠ 1 and n ≠ 1 such that k = mn. Then 0 = k(1) = (mn)(1) = m(1)n(1) = 0. But a field has no zero divisors, so either m or n is the characteristic, violating the minimality of the characteristic. ◽

208

Rudiments of Number Theory and Algebra It can be shown that any field of characteristic 0 contains the field ℚ. On the basis of this lemma, we can observe that in a finite field GF(q), there are p elements (p a prime number) {0, 1, 2 = 2(1), . . . , (p − 1) = (p − 1)(1)} which behave as a field (i.e., we can define addition and multiplication on them as a field). Thus, ℤp (or something isomorphic to it, which is the same thing) is a subfield of every Galois field GF(q). In fact, a stronger assertion can be made: Theorem 5.50 The order q of every finite field GF(q) must be a power of a prime. Proof By Lemma 5.49, every finite field GF(q) has a subfield of prime order p. We will show that GF(q) acts like a vector space over its subfield GF(p). Let 𝛽1 ∈ GF(q), with 𝛽1 ≠ 0. Form the elements a1 𝛽1 as a1 varies over the elements {0, 1, . . . , p − 1} in GF(p). The product a1 𝛽1 takes on p distinct values. (For if x𝛽1 = y𝛽1 we must have x = y, since there are no zero divisors in a field.) If by these p products we have “covered” all the elements in the field, we are done: they form a vector space over GF(p). If not, let 𝛽2 be an element which has not been covered yet. Then form a1 𝛽1 + a2 𝛽2 as a1 and a2 vary independently. This must lead to p2 distinct values in GF(q). If still not done, then continue, forming the linear combinations a1 𝛽1 + a2 𝛽2 + · · · + am 𝛽m until all elements of GF(q) are covered. Each combination of coefficients {a1 , a2 , . . . , am } corresponds ◽ to a distinct element of GF(q). Therefore, there must be pm elements in GF(q). This theorem shows that all finite fields have the structure of a vector space of dimension m over a finite field ℤp . For the field GF(pm ), the subfield GF(p) is called the ground field. This proof raises an important point about the representation of a field. In the construction of GF(24 ) in Section 5.4.2, we formed the field as a vector space over the basis vectors 1, 𝛼, 𝛼 2 , and 𝛼 3 . (Or, more generally, to form GF(pm ), we would use the elements {1, 𝛼, 𝛼 2 , . . . , 𝛼 m−1 } as the basis vectors.) However, another set of basis vectors could be used. Any set of m linearly independent nonzero elements of GF(pm ) can be used as a basis set. For example, for GF(24 ) we could construct the field as all linear combinations of {1 + 𝛼, 𝛼 + 𝛼 2 , 1 + 𝛼 3 , 𝛼 + 𝛼 2 + 𝛼 3 }. The multiplicative relationship prescribed by the irreducible polynomial still applies. While this is not as convenient a construction for most purposes, it is sometimes helpful to think of representations of a field in terms of different bases. Theorem 5.51 If x and y are elements in a field of characteristic p, (x + y)p = xp + yp . This rule is sometimes called “freshman exponentiation,” since it is erroneously employed by some students of elementary algebra. Proof By the binomial theorem,

p ( ) ∑ p

xi yp−i . i i=0 ( ) ( ) For a prime p and for any integer i ≠ 0 and i ≠ p, p| pi so that pi ≡ 0 (mod p). Thus, all the terms in the sum except the first and the last are p times some quantity, which are equal to 0 since the characteristic of the field is p. ◽ (x + y)p =

This theorem extends by induction in two ways: both to the number of summands and to the exponent: If x1 , x2 , . . . , xk are in a field of characteristic p, then ( k )pr k ∑ ∑ pr xi = xi (5.43) i=1

i=1

5.5 Galois Fields: Mathematical Facts

209

for all r ≥ 0. We now consider some multiplicative questions related to finite fields. Definition 5.52 Let 𝛽 ∈ GF(q). The order4 of 𝛽, written ord(𝛽) is the smallest positive integer n such that 𝛽 n = 1. ◽ Definition 5.53 An element with order q − 1 in GF(q) (i.e., it generates all the nonzero elements of the field) is called a primitive element. ◽ In other words, a primitive element has the highest possible order. We saw in the construction of GF(24 ) that the element we called 𝛼 has order 15, making it a primitive element in the field. We also saw that the primitive element enables the “power representation” of the field, which makes multiplication particularly easy. The questions addressed by the following lemmas are: Does a Galois field always have a primitive element? How many primitive elements does a field have? Lemma 5.54 If 𝛽 ∈ GF(q) and 𝛽 ≠ 0 then ord(𝛽)|(q − 1). Proof Let t = ord(𝛽). The set {𝛽, 𝛽 2 , . . . , 𝛽 t = 1} forms a subgroup of the nonzero elements in GF(q) under multiplication. Since the order of a subgroup must divide the order of the group (Lagrange’s theorem, Theorem 2.29), the result follows. ◽ Example 5.55

In the field GF(24 ), the element 𝛼 3 has order 5, since (𝛼 3 )5 = 𝛼 15 = 1,

and 5|15. In fact, we have the sequence 𝛼3 ,

(𝛼 3 )2 = 𝛼 6 ,

(𝛼 3 )3 = 𝛼 9 ,

(𝛼 3 )4 = 𝛼 12 ,

(𝛼 3 )5 = 𝛼 15 = 1.



Lemma 5.56 Let 𝛽 ∈ GF(q). 𝛽 s = 1 if and only if ord(𝛽)|s. Proof Let t = ord(𝛽). Let s be such that 𝛽 s = 1. Using the division algorithm, write s = at + r where 0 ≤ r < t. Then 1 = 𝛽 s = 𝛽 at 𝛽 r = 𝛽 r . By the minimality of the order (it must be the smallest positive integer), we must have r = 0. ◽ Conversely, if ord(𝛽)|s, then 𝛽 s = 𝛽 qt = (𝛽 t )q = 1, where t = ord(𝛽) and q = s∕t. Lemma 5.57 If 𝛼 has order s and 𝛽 has order t and (s, t) = 1, then 𝛼𝛽 has order st.

Example 5.58 Before embarking on the proof, an example of this lemma. In GF(24 ), 𝛼 3 and 𝛼 5 have orders that are relatively prime, being 5 and 3, respectively. It may be verified that 𝛼 3 𝛼 5 = 𝛼 8 has order 15 (it is primitive). ◽

4 The nomenclature is unfortunate, since we already have defined the order of a group and the order of an element within a group.

210

Rudiments of Number Theory and Algebra Proof First, (𝛼𝛽)st = (𝛼 s )t (𝛽 t )s = 1. Might there be a smaller value for the order than st? Let k be the order of 𝛼𝛽. Since (𝛼𝛽)k = 1, 𝛼 k = 𝛽 −k . Since 𝛼 s = 1, 𝛼 sk = 1, and hence 𝛽 −sk = 1. Furthermore, 𝛼 tk = 𝛽 −tk = 1. By Lemma 5.56, s|tk. Since (s, t) = 1, k must be a multiple of s. Similarly, 𝛽 −sk = 1 and so t|sk. Since (s, t) = 1, k must be a multiple of t. Combining these, we see that k must be a multiple of st. In light of the first observation, we have k = st. ◽ Lemma 5.59 In a finite field, if ord(𝛼) = t and 𝛽 = 𝛼 i , then ord(𝛽) =

t . (i, t)

Proof If ord(𝛼) = t, then 𝛼 s = 1 if and only if t|s (Lemma 5.56). Let ord(𝛽) = u. Note that i∕(i, t) is an integer. Then 𝛽 t∕(i,t) = (𝛼 i )t∕(i,t) = (𝛼 t )i∕(i,t) = 1. Thus, u|t∕(i, t). We also have (𝛼 i )u = 1 so t|iu. This means that t∕(i, t)|u. Combining the results, we have u = t∕(i, t).



Theorem 5.60 For a Galois field GF(q), if t|q − 1 then there are 𝜙(t) elements of order t in GF(q), where 𝜙(t) is the Euler totient function. Proof Observe from Lemma 5.54 that if t ∕| q − 1, then there are no elements of order t in GF(q). So assume that t|q − 1; we now determine how many elements of order t there are. Let 𝛼 be an element with order t. Then by Lemma 5.59, if 𝛽 = 𝛼 i for some i such that (i, t) = 1, then 𝛽 also has order t. The number of such is is 𝜙(t). Could there be other elements not of the form 𝛼 i having order t? Any element having order t is a root of the polynomial xt − 1. Each element in the set {𝛼, 𝛼 2 , 𝛼 3 , . . . , 𝛼 t } is a solution to the equation xt − 1 = 0. Since a polynomial of degree t over a field has no more than t roots (see Theorem 5.69 below), there are no elements of order t not in the set. ◽ The following theorem is a corollary of this result: Theorem 5.61 There are 𝜙(q − 1) primitive elements in GF(q). Example 5.62

In GF(7), the numbers 5 and 3 are primitive: 51 = 5,

52 = 4,

53 = 6,

54 = 2,

55 = 3,

56 = 1.

31 = 3,

32 = 4,

33 = 6,

34 = 4,

35 = 5,

36 = 1.

We also have 𝜙(q − 1) = 𝜙(6) = 2, so these are the only primitive elements.



Because primitive elements exist, the nonzero elements of the field GF(q) can always be written as powers of a primitive element. If 𝛼 ∈ GF(q) is primitive, then {𝛼, 𝛼 2 , 𝛼 3 , . . . , 𝛼 q−2 , 𝛼 q−1 = 1}

5.6 Implementing Galois Field Arithmetic

211

is the set of all nonzero elements of GF(q). If we let GF(q)∗ denote the set of nonzero elements of GF(q), we can write GF(q)∗ = ⟨𝛼⟩. If 𝛽 = 𝛼 i is also primitive (i.e., (i, q − 1) = 1), then the nonzero elements of the field are also generated by {𝛽, 𝛽 2 , 𝛽 3 , . . . , 𝛽 q−2 , 𝛽 q−1 = 1}, that is, GF(q)∗ = ⟨𝛽⟩. Despite the fact that these are different generators, these are not different fields, only different representations, so ⟨𝛼⟩ is isomorphic to ⟨𝛽⟩. We thus talk of the Galois field with q elements, since there is only one. Theorem 5.63 Every element of the field GF(q) satisfies the equation xq − x = 0. Furthermore, they constitute the entire set of roots of this equation. Proof Clearly, the equation can be written as x(xq−1 − 1) = 0. Thus, x = 0 is clearly a root. The nonzero elements of the field are all generated as powers of a primitive element 𝛼. For an element 𝛽 = 𝛼 i ∈ GF(q), 𝛽 q−1 = (𝛼 i )q−1 = (𝛼 q−1 )i = 1. Since there are q elements in GF(q), and at most q roots of the equation, the elements of GF(q) are all the roots. ◽ An extension field E of a field 𝔽 is a splitting field of a nonconstant polynomial f (x) ∈ 𝔽 [x] if f (x) can be factored into linear factors over E, but not in any proper subfield of E. Theorem 5.63 thus says that GF(q) is the splitting field for the polynomial xq − x. As an extension of Theorem 5.63, we have Theorem 5.64 Every element in a field GF(q) satisfies the equation n

xq − x = 0 for every n ≥ 0. Proof When n = 0, the result is trivial; when n = 1, we have Theorem 5.63, giving xq = x. The proof n−1 n−1 n ◽ is by induction: Assume that xq = x. Then (xq )q = xq = x, or xq = x. A field GF(p) can be extended to a field GF(pm ) for any m > 1. Let q = pm . The field GF(q) can be further extended to a field GF(qr ) for any r, by extending by an irreducible polynomial of degree r in GF(q)[x]. This gives the field GF(pmr ).

5.6

Implementing Galois Field Arithmetic

Lab 5 describes one way of implementing Galois field arithmetic in a computer using two tables. In this section, we present a way of computing operations using one table of Zech logarithms, as well as some concepts for hardware implementation. 5.6.1

Zech Logarithms

In a Galois field GF(2m ), the Zech logarithm z(n) is defined by 𝛼 z(n) = 1 + 𝛼 n ,

n = 1, 2, . . . , 2m − 2.

Table 5.1 shows the Zech logarithm for the field GF(24 ). For example, when n = 2, we have 1 + 𝛼2 = 𝛼8 so that z(2) = 8.

212

Rudiments of Number Theory and Algebra In the Zech logarithm approach to Galois field computations, numbers are represented using the exponent. Multiplication is thus natural. To see how to add, consider the following example: 𝛼3 + 𝛼5. The first step is to factor out the term with the smallest exponent, 𝛼 3 (1 + 𝛼 2 ). Now the Zech logarithm is used: 1 + 𝛼 2 = 𝛼 z(2) = 𝛼 8 . So 𝛼 3 + 𝛼 5 = 𝛼 3 (1 + 𝛼 2 ) = 𝛼 3 𝛼 8 = 𝛼 11 . The addition requires one table lookup and one multiply operation. It has been found that in many implementations, the use of Zech logarithms can significantly improve execution time. 5.6.2

Hardware Implementations

We present examples for the field GF(24 ) generated by g(x) = 1 + x + x4 . Addition is easily accomplished by simple mod-2 addition for numbers in vector representation. Multiplication of the element 𝛽 = b0 + b1 𝛼 + b2 𝛼 2 + b3 𝛼 3 by the primitive element 𝛼 is computed using 𝛼 4 = 1 + 𝛼 as 𝛼𝛽 = b0 𝛼 + b1 𝛼 2 + b2 𝛼 3 + b3 𝛼 4 = b3 + (b0 + b3 )𝛼 + b1 𝛼 2 + b2 𝛼 3 . These computations can be obtained using an LFSR as shown in Figure 5.2. Clocking the registers once fills them with the representation of 𝛼𝛽.

b0

+

b1

b2

b3

Figure 5.2: Multiplication of 𝛽 by 𝛼. Multiplication by specific powers of 𝛼 can be accomplished with dedicated circuits. For example, to multiply 𝛽 = b0 + b1 𝛼 + b2 𝛼 2 + b3 𝛼 3 by 𝛼 4 = 1 + 𝛼, we have 𝛽𝛼 4 = 𝛽 + 𝛼𝛽 = (b0 + b3 ) + (b0 + b1 + b3 )𝛼 + (b1 + b2 )𝛼 2 + (b2 + b3 )𝛼 3 , which can be represented as shown in Figure 5.3. Finally, we present a circuit which multiplies two arbitrary Galois field elements. Let 𝛽 = b0 + b1 𝛼 + b2 𝛼 2 + b3 𝛼 3 and let 𝛾 = c0 + c1 𝛼 + c2 𝛼 2 + c3 𝛼 3 . Then 𝛽𝛾 can be written in a Horner-like notation as 𝛽𝛾 = (((c3 𝛽)𝛼 + c2 𝛽)𝛼 + c1 𝛽)𝛼 + c0 𝛽.

b0

b1

b2

b3

+

+

+

+

Figure 5.3: Multiplication of an arbitrary 𝛽 by 𝛼 4 .

5.7 Subfields of Galois Fields

+

213

+

b3

+

b1

c0

c1

+

b2

c2

b3

c3

Figure 5.4: Multiplication of 𝛽 by an arbitrary field element. Figure 5.4 shows a circuit for this expression. Initially, the upper register is cleared. Then at the first clock the register contains c3 𝛽. At the second clock the register contains c3 𝛽𝛼 + c2 𝛽, where the multiplication by 𝛼 comes by virtue of the feedback structure. At the next clock the register contains (c3 𝛽𝛼 + c2 𝛽)𝛼 + c1 𝛽. At the final clock the register contains the entire product.

5.7

Subfields of Galois Fields

Elements in a base field GF(q) are also elements of its extension field GF(qm ). Given an element 𝛽 ∈ GF(qm ) in the extension field, it is of interest to know if it is an element in the base field GF(q). The following theorem provides the answer. Theorem 5.65 An element 𝛽 ∈ GF(qm ) lies in GF(q) if and only if 𝛽 q = 𝛽. Proof If 𝛽 ∈ GF(q), then by Lemma 5.54, ord(𝛽)|(q − 1), so that 𝛽 q = 𝛽. Conversely, assume 𝛽 q = 𝛽. Then 𝛽 is a root of xq − x = 0. Now observe that all q elements of GF(q) satisfy this polynomial and it can only have q roots. Hence 𝛽 ∈ GF(q). ◽ n

By induction, it follows that an element 𝛽 ∈ GF(qn ) lies in the subfield GF(q) if 𝛽 q = 𝛽 for any n ≥ 0. Example 5.66 The field GF(4) is a subfield of GF(256). Let 𝛼 be primitive in GF(256). We desire to find an element in GF(4) ⊂ GF(256). Let 𝛽 = 𝛼 85 . Then, invoking Theorem 5.65 𝛽 4 = 𝛼 85⋅4 = 𝛼 255 𝛼 85 = 𝛽. So 𝛽 ∈ GF(4) and GF(4) has the elements {0, 1, 𝛽, 𝛽 2 } = {0, 1, 𝛼 85 , 𝛼 170 }.



Theorem 5.67 GF(qk ) is a subfield of GF(qj ) if and only if k|j. The proof relies on the following lemma. Lemma 5.68 If n, r, and s are positive integers and n ≥ 2, then ns − 1|nr − 1 if and only if s|r. Proof of Theorem 5.67. If k|j, say j = mk, then GF(qk ) can be extended using an irreducible polynomial of degree m over GF(qk ) to obtain the field with (qk )m = qj elements. Conversely, let GF(qk ) be a subfield of GF(qj ) and let 𝛽 be a primitive element in GF(qk ). Then k −1 q = 1. As an element of the field GF(qj ), it must also be true (see, e.g., Theorem 5.63) that 𝛽

214

Rudiments of Number Theory and Algebra j

𝛽 q −1 = 1. From Lemma 5.56, it must be the case that qk − 1|qj − 1 and hence, from Lemma 5.68, it follows that k|j. ◽ As an example of this, Figure 5.5 illustrates the subfields of GF(224 ). GF(224) GF(212)

GF(28)

GF(26)

GF(24)

GF(23)

GF(22)

GF (2)

Figure 5.5: Subfield structure of GF(224 ).

5.8

Irreducible and Primitive Polynomials

We first present a result familiar from polynomials over complex numbers. Theorem 5.69 A polynomial of degree d over a field 𝔽 has at most d roots in any field containing 𝔽 . This theorem seems obvious, but in fact over a ring it is not necessarily true! The quadratic polynomial x2 − 1 has four roots in ℤ15 , namely 1, 4, 11, and 14 [33]. Proof Every polynomial of degree 1 (i.e., a linear polynomial) is irreducible. Since the degree of a product of several polynomials is the sum of their degrees, a polynomial of degree d cannot have more than d linear factors. By the division algorithm, (x − 𝛽) is a factor of a polynomial f (x) if and only if f (𝛽) = 0 (see Exercise 5.47). Hence f (x) can have at most d roots. ◽ While any irreducible polynomial can be used to construct the extension field, computation in the field is easier if a primitive polynomial is used. We make the following observation: Theorem 5.70 Let p be prime. An irreducible mth-degree polynomial f (x) ∈ GF(p)[x] divides m xp −1 − 1. Example 5.71

(x3 + x + 1)|(x7 + 1) in GF(2) (this can be shown by long division).



It is important to understand the implication of the theorem: an irreducible polynomial divides m m xp −1 − 1, but just because a polynomial divides xp −1 − 1 does not mean that it is irreducible. (Showing irreducibility is much harder than that!) Proof Let GF(q) = GF(pm ) be constructed using the irreducible polynomial f (x), where 𝛼 denotes m the root of f (x) in the field: f (𝛼) = 0. By Theorem 5.63, 𝛼 is a root of xp −1 − 1 in GF(q). Using the division algorithm write m (5.44) xp −1 − 1 = g(x)f (x) + r(x),

5.8 Irreducible and Primitive Polynomials

215

where deg(r(x)) < m. Evaluating (5.44) at x = 𝛼 in GF(q), we obtain 0 = 0 + r(𝛼). But the elements of the field GF(q) are represented as polynomials in 𝛼 of degree < m, so since r(𝛼) = 0 it must be that r(x) is the zero polynomial, r(x) = 0. ◽ A slight generalization, proved similarly using Theorem 5.64, is the following: Theorem 5.72 If f [x] ∈ GF(q)[x] is an irreducible polynomial of degree m, then k

f (x)|(xq − x) for any k such that m|k. Definition 5.73 An irreducible polynomial p(x) ∈ GF(pr )[x] of degree m is said to be primitive if the ◽ smallest positive integer n for which p(x) divides xn − 1 is n = prm − 1.

Example 5.74 Taking f (x) = x3 + x + 1, it can be shown by exhaustive checking that f (x) ∕| x4 + 1, f (x) ∕| x5 + 1, and f (x) ∕| x6 + 1, but f (x)|x7 + 1. In fact, x7 − 1 = (x3 + x + 1)(x4 + x2 + x + 1). ◽

Thus, f (x) is primitive.

The following theorem provides the motivation for using primitive polynomials. Theorem 5.75 The roots of an mth degree primitive polynomial p(x) ∈ GF(p)[x] are primitive elements in GF(pm ). That is, any of the roots can be used to generate the nonzero elements of the field GF(pm ). Proof Let 𝛼 be a root of an mth-degree primitive polynomial p(x). We have m −1

xp for some q(x). Observe that

m −1

𝛼p

− 1 = p(x)q(x)

− 1 = p(𝛼)q(𝛼) = 0q(𝛼) = 0,

from which we note that

m −1

𝛼p

= 1.

Now the question is, might there be a smaller power t of 𝛼 such that 𝛼 t = 1? If this were the case, then we would have 𝛼 t − 1 = 0. There would therefore be some polynomial xt − 1 that would have 𝛼 as a root. However, any root of m xt − 1 must also be a root of xp −1 − 1, because ord(𝛼)|pm − 1. To see this, suppose (to the contrary) that ord(𝛼) ∕| pm − 1. Then pm − 1 = k ord(𝛼) + r for some r with 0 < r < ord(𝛼). Therefore, we have m −1

1 = 𝛼p

= 𝛼k

ord(𝛼)+r

= 𝛼r ,

216

Rudiments of Number Theory and Algebra which contradicts the minimality of the order. m Thus, all the roots of xt − 1 are the roots of xp −1 − 1, so m −1

xt − 1|xp

− 1.

We show below that all the roots of an irreducible polynomial are of the same order. This means that ◽ p(x)|xt − 1. But by the definition of a primitive polynomial, we must have t = pm − 1. All the nonzero elements of the field can be generated as powers of the roots of the primitive polynomial. Example 5.76 The polynomial p(x) = x2 + x + 2 is primitive in GF(5). Let 𝛼 represent a root of p(x), so that 𝛼 2 + 𝛼 + 2 = 0, or 𝛼 2 = 4𝛼 + 3. The elements in GF(25) can be represented as powers of 𝛼 as shown in the following table. 0 𝛼 4 = 3𝛼 + 2 𝛼 9 = 3𝛼 + 4 𝛼 14 = 𝛼 + 2 𝛼 19 = 3𝛼 2 3 4 5 6 7 8 9 αα α α α α α α α

+

01010

10

11

13

14

𝛼1 = 𝛼 𝛼6 = 2 𝛼 11 = 3𝛼 + 3 𝛼 16 = 2𝛼 + 3 𝛼 21 = 2𝛼 + 1

𝛼 2 = 4𝛼 + 3 𝛼 7 = 2𝛼 𝛼 12 = 4 𝛼 17 = 𝛼 + 1 𝛼 22 = 4𝛼 + 1

𝛼 3 = 4𝛼 + 2 𝛼 8 = 3𝛼 + 1 𝛼 13 = 4𝛼 𝛼 18 = 3 𝛼 23 = 2𝛼 + 2

As an example of some arithmetic in this field,

00000

12

𝛼0 = 1 𝛼 5 = 4𝛼 + 4 𝛼 10 = 𝛼 + 4 𝛼 15 = 𝛼 + 3 𝛼 20 = 2𝛼 + 4

(3𝛼 + 4) + (4𝛼 + 1) = 2𝛼 15

(3𝛼 + 4)(4𝛼 + 1) = 𝛼 9 𝛼 22 = 𝛼 31 = (𝛼 24 )(𝛼 7 ) = 2𝛼.

16

α α α α α α α

primfind.c



The program primfind finds primitive polynomials in GF(p)[x], where the prime p can be specified. It does this by recursively producing all polynomials (or all of those of a weight you might specify) and evaluating whether they are primitive by using them as feedback polynomials in an LFSR. Those which generate maximal length sequences are primitive.

5.9

Conjugate Elements and Minimal Polynomials

From Chapter 4, we have seen that cyclic codes have a generator polynomial g(x) dividing xn − 1. Designing cyclic codes with a specified code length n thus requires the facility to factor xn − 1 into factors with certain properties. In this section, we explore aspects of this factorization question. It frequently happens that the structure of a code is defined over a field GF(qm ), but it is desired to employ a generator polynomial g(x) over the base field GF(q). For example, we might want a binary generator polynomial — for ease of implementation — but need to work over a field GF(2m ) for some m. How to obtain polynomials having coefficients in the base field but roots in the larger field is our first concern. The concepts of conjugates and minimal polynomials provide a language to describe the polynomials we need. We begin with a reminder and analogy from polynomials with real coefficients. Suppose we are given a complex number x1 = (2 + 3i). Over the (extension) field ℂ, there is a polynomial x − (2 + 3i) ∈ ℂ[x] which has x1 as a root. But suppose we are asked to find the polynomial with real coefficients that has x1 as a root. We are well acquainted with the fact that the roots of real polynomials come in complex conjugate pairs, so we conclude immediately that a real polynomial with root x1 must also have a root x2 = (2 − 3i). We say that x2 is a conjugate root to x1 . A polynomial having these roots is (x − (2 + 3i))(x − (2 − 3i)) = x2 − 4x + 13. Note in particular that the coefficients of the resulting polynomials are in ℝ, which was the base field for the extension to ℂ.


This concept of conjugacy has analogy to finite fields. Suppose that f(x) ∈ GF(q)[x] has α ∈ GF(q^m) as a root. (That is, the polynomial has coefficients in the base field, while the root comes from an extension field.) What are the other roots of f(x) in this field?

Theorem 5.77 Let GF(q) = GF(p^r) for some r ≥ 1. Let f(x) = ∑_{j=0}^{d} f_j x^j ∈ GF(q)[x]. That is, f_j ∈ GF(q). Then for any n ≥ 0,

    f(x^{q^n}) = [f(x)]^{q^n}.

Proof

    [f(x)]^{q^n} = [ ∑_{j=0}^{d} f_j x^j ]^{q^n} = ∑_{j=0}^{d} f_j^{q^n} (x^j)^{q^n}      (by (5.43))
                 = ∑_{j=0}^{d} f_j (x^{q^n})^j                                           (by Theorem 5.65)
                 = f(x^{q^n}).                                                           ◽

Thus, if β ∈ GF(q^m) is a root of f(x) ∈ GF(q)[x], then β^{q^n} is also a root of f(x). This motivates the following definition.

Definition 5.78 Let β ∈ GF(q^m). The conjugates of β with respect to a subfield GF(q) are β, β^q, β^{q^2}, β^{q^3}, . . . . (This list must, of course, repeat at some point since the field is finite.) The conjugates of β with respect to GF(q) form a set called the conjugacy class of β with respect to GF(q). ◽

Example 5.79

1. Let α ∈ GF(2^3) be primitive. The conjugates of α with respect to GF(2) are

       α,  α^2,  α^{2^2} = α^4,  α^{2^3} = α.

   So the conjugacy class of α is {α, α^2, α^4}. Let β = α^3, an element not in the conjugacy class of α. The conjugates of β with respect to GF(2) are

       β = α^3,   (α^3)^2 = α^6,   (α^3)^{2^2} = α^12 = α^7 α^5 = α^5,   (α^3)^{2^3} = α^24 = α^21 α^3 = α^3.

   So the conjugacy class of β is {α^3, α^6, α^5}. The only other elements of GF(2^3) are 1, which always forms its own conjugacy class, and 0, which always forms its own conjugacy class. We observe that the conjugacy classes of the elements of GF(2^3) form a partition of GF(2^3).

2. Let β ∈ GF(16) be an element such that ord(β) = 3. (Check for consistency: since 3 | 15, there are φ(3) = 2 elements of order 3 in GF(16).) The conjugacy class of β with respect to GF(2) is

       β,  β^2,  β^{2^2} = β^4 = β.

   So there are two elements in this conjugacy class, {β, β^2}.

3. Find all the conjugacy classes in GF(2^4) with respect to GF(2). Let α ∈ GF(2^4) be primitive. Pick α and list its conjugates with respect to GF(2):

       α, α^2, α^4, α^8, α^16 = α,

   so the first conjugacy class is {α, α^2, α^4, α^8}. Now pick an element unused so far. Take α^3 and write its conjugates:

       α^3, (α^3)^2 = α^6, (α^3)^4 = α^12, (α^3)^8 = α^9, (α^3)^16 = α^3,

   so the next conjugacy class is {α^3, α^6, α^9, α^12}. Take another unused element, α^5:

       α^5, (α^5)^2 = α^10, (α^5)^4 = α^5,

   so the next conjugacy class is {α^5, α^10}. Take another unused element, α^7:

       α^7, (α^7)^2 = α^14, (α^7)^4 = α^13, (α^7)^8 = α^11, (α^7)^16 = α^7,

   so the next conjugacy class is {α^7, α^14, α^13, α^11}. The only unused elements now are 0, with conjugacy class {0}, and 1, with conjugacy class {1}.

4. Find all the conjugacy classes in GF(2^4) with respect to GF(4). Let α be primitive in GF(2^4). The conjugacy classes with respect to GF(4) are:

       {α, α^4}   {α^2, α^8}   {α^3, α^12}   {α^5}   {α^6, α^9}   {α^7, α^13}   {α^10}   {α^11, α^14}.

Definition 5.80 The smallest positive integer d such that n | q^d − 1 is called the multiplicative order of q modulo n. (This is yet another usage of the word “order.”) ◽

Lemma 5.81 Let β ∈ GF(q^m) have ord(β) = n and let d be the multiplicative order of q modulo n. Then β^{q^d} = β. The d elements β, β^q, β^{q^2}, . . . , β^{q^{d−1}} are all distinct.

Proof Since ord(β) = n and n | q^d − 1, β^{q^d − 1} = 1, so β^{q^d} = β.
To check distinctness, suppose that β^{q^k} = β^{q^i} for 0 ≤ i < k < d. Then β^{q^k − q^i} = 1, which by Lemma 5.56 implies that n | q^k − q^i, that is, q^k ≡ q^i (mod n). By Theorem 5.27 it follows that q^{k−i} ≡ 1 (mod n/(n, q^i)), that is, q^{k−i} ≡ 1 (mod n) (since q is a power of a prime, and n | q^d − 1). By definition of d, this means that d | k − i, which is not possible since i < k < d. ◽

5.9.1 Minimal Polynomials

In this section, we examine the polynomial in GF(q)[x] which has an element β ∈ GF(q^m) and all of its conjugates as roots.

Definition 5.82 Let β ∈ GF(q^m). The minimal polynomial of β with respect to GF(q) is the smallest-degree, nonzero, monic polynomial p(x) ∈ GF(q)[x] such that p(β) = 0. ◽

Returning to the analogy with complex numbers, we saw that the polynomial f(x) = x^2 − 4x + 13 with real coefficients has the complex number x1 = 2 + 3i as a root. Furthermore, it is clear that there is no real polynomial of smaller degree which has x1 as a root. We would say that x^2 − 4x + 13 is the minimal polynomial of 2 + 3i with respect to the real numbers.

Some properties of minimal polynomials:

Theorem 5.83 [483, Theorem 3.2] For each β ∈ GF(q^m) there exists a unique monic polynomial p(x) of minimal degree in GF(q)[x] such that:

1. p(β) = 0.
2. The degree of p(x) is at most m.
3. If there is a polynomial f(x) ∈ GF(q)[x] such that f(β) = 0, then p(x) | f(x).
4. p(x) is irreducible in GF(q)[x].

Proof Existence: Given an element β ∈ GF(q^m), write down the (m + 1) elements 1, β, β^2, . . . , β^m, which are elements of GF(q^m). Since GF(q^m) is a vector space of dimension m over GF(q), these m + 1 elements must be linearly dependent. Hence there exist coefficients a_i ∈ GF(q) such that a_0 + a_1 β + ··· + a_m β^m = 0; these are the coefficients of a polynomial p(x) = ∑_{i=0}^{m} a_i x^i which has β as a root. (It is straightforward to make this polynomial monic.) This also shows that the degree of p(x) is at most m.

Uniqueness: Suppose that there are two minimal polynomials of β, which are normalized to be monic; call them f(x) and g(x). These must both have the same degree. Then there is a polynomial r(x) having deg(r(x)) < deg(f(x)) such that f(x) = g(x) + r(x). Since β is a root of f and g, we have 0 = f(β) = g(β) + r(β), so that r(β) = 0. Since a minimal polynomial f(x) is a nonzero polynomial of smallest degree such that f(β) = 0, it must be the case that r(x) = 0 (i.e., it is the zero polynomial), so f(x) = g(x).

Divisibility: Let p(x) be a minimal polynomial. If there is a polynomial f(x) such that f(β) = 0, we write using the division algorithm f(x) = p(x)q(x) + r(x), where deg(r) < deg(p). But then f(β) = p(β)q(β) + r(β) = 0, so r(β) = 0. By the minimality of the degree of p(x), r(x) = 0, so p(x) | f(x).

Irreducibility: If p(x) factors, so p(x) = f(x)g(x), then either f(β) = 0 or g(β) = 0, again a contradiction to the minimality of the degree of p(x). ◽

We observe that primitive polynomials are the minimal polynomials for primitive elements in a finite field.

Let p(x) ∈ GF(q)[x] be a minimal polynomial for β. Then β^q, β^{q^2}, . . . , β^{q^{d−1}} are also roots of p(x). Could there be other roots of p(x)? The following theorem shows that the conjugacy class for β contains all the roots for the minimal polynomial of β.

Theorem 5.84 [33, Theorem 4.410] Let β ∈ GF(q^m) have order n and let d be the multiplicative order of q mod n. Then the coefficients of the polynomial p(x) = ∏_{i=0}^{d−1} (x − β^{q^i}) are in GF(q). Furthermore, p(x) is irreducible. That is, p(x) is the minimal polynomial for β.

Proof From Theorem 5.77 we see that p(β) = 0 implies that p(β^q) = 0 for p(x) ∈ GF(q)[x]. It only remains to show that the polynomial having the conjugates of β as its roots has its coefficients in GF(q). Write

    [p(x)]^q = ∏_{i=0}^{d−1} (x − β^{q^i})^q = ∏_{i=0}^{d−1} (x^q − β^{q^{i+1}})     (by Theorem 5.51)
             = ∏_{i=1}^{d} (x^q − β^{q^i}) = ∏_{i=0}^{d−1} (x^q − β^{q^i})           (since β^{q^d} = β = β^{q^0}).

Thus, [p(x)]^q = p(x^q). Now writing p(x) = ∑_{i=0}^{d} p_i x^i, we have

    [p(x)]^q = ( ∑_{i=0}^{d} p_i x^i )^q = ∑_{i=0}^{d} p_i^q x^{iq}                  (5.45)

and

    p(x^q) = ∑_{i=0}^{d} p_i x^{iq}.                                                 (5.46)

The two polynomials in (5.45) and (5.46) are identical, so it must be that p_i^q = p_i, so p_i ∈ GF(q).

If p(x) = g(x)h(x), where g(x) ∈ GF(q)[x] and h(x) ∈ GF(q)[x] are monic, then p(β) = 0 implies that g(β) = 0 or h(β) = 0. If g(β) = 0, then g(β^q) = g(β^{q^2}) = ··· = g(β^{q^{d−1}}) = 0. g thus has d roots, so g(x) = p(x). Similarly, if h(β) = 0, then it follows that h(x) = p(x). ◽

As a corollary, we have the following.

Corollary 5.85 [483, p. 58] Let f(x) ∈ GF(q)[x] be irreducible. Then all of the roots of f(x) have the same order.

Proof Let GF(q^m) be the smallest field containing all the roots of the polynomial f(x) and let β ∈ GF(q^m) be a root of f(x). Then ord(β) | q^m − 1 (Lemma 5.54). By the theorem, the roots of f(x) are the conjugates of β and so are of the form {β, β^q, β^{q^2}, . . . }. Since q = p^r for some r, it follows that (q, q^m − 1) = 1. Also, if t | q^m − 1, then (q, t) = 1. By Lemma 5.59 we have

    ord(β^{q^k}) = ord(β) / (q^k, ord(β)) = ord(β).

Since this is true for any k, each root has the same order. ◽

Example 5.86 According to Theorem 5.84, we can obtain the minimal polynomial for an element β by multiplying the factors (x − β^{q^i}). In what follows, you may refer to the conjugacy classes determined in Example 5.79.

1. Determine the minimal polynomial for each conjugacy class in GF(8) with respect to GF(2). To do the multiplication, a representation of the field is necessary; we use the representation using primitive polynomial g(x) = x^3 + x + 1. Using the conjugacy classes we found before in GF(8), we obtain the minimal polynomials shown in Table 5.2.
2. Determine the minimal polynomial for each conjugacy class in GF(2^4) with respect to GF(2). Use Table 5.1 as a representation. The minimal polynomials are shown in Table 5.3.
3. Determine the minimal polynomial for each conjugacy class in GF(2^5) with respect to GF(2). Using the primitive polynomial x^5 + x^2 + 1, it can be shown that the minimal polynomials are as shown in Table 5.4.
4. Determine the minimal polynomials in GF(4^2) with respect to GF(4). Use the representation obtained from the subfield GF(4) = {0, 1, α^5, α^10} ⊂ GF(16) from Table 5.1. The result is shown in Table 5.5. ◽

Table 5.2: Conjugacy Classes over GF(2^3) with Respect to GF(2)

    Conjugacy Class         Minimal Polynomial
    {0}                     M−(x) = x
    {1}                     M0(x) = x + 1
    {α, α^2, α^4}           M1(x) = (x − α)(x − α^2)(x − α^4) = x^3 + x + 1
    {α^3, α^6, α^5}         M3(x) = (x − α^3)(x − α^5)(x − α^6) = x^3 + x^2 + 1

Table 5.3: Conjugacy Classes over GF(2^4) with Respect to GF(2)

    Conjugacy Class            Minimal Polynomial
    {0}                        M−(x) = x
    {1}                        M0(x) = x + 1
    {α, α^2, α^4, α^8}         M1(x) = (x − α)(x − α^2)(x − α^4)(x − α^8) = x^4 + x + 1
    {α^3, α^6, α^9, α^12}      M3(x) = (x − α^3)(x − α^6)(x − α^9)(x − α^12) = x^4 + x^3 + x^2 + x + 1
    {α^5, α^10}                M5(x) = (x − α^5)(x − α^10) = x^2 + x + 1
    {α^7, α^11, α^13, α^14}    M7(x) = (x − α^7)(x − α^11)(x − α^13)(x − α^14) = x^4 + x^3 + 1

Table 5.4: Conjugacy Classes over GF(2^5) with Respect to GF(2)

    Conjugacy Class                   Minimal Polynomial
    {0}                               M−(x) = x
    {1}                               M0(x) = x + 1
    {α, α^2, α^4, α^8, α^16}          M1(x) = (x − α)(x − α^2)(x − α^4)(x − α^8)(x − α^16) = x^5 + x^2 + 1
    {α^3, α^6, α^12, α^17, α^24}      M3(x) = (x − α^3)(x − α^6)(x − α^12)(x − α^17)(x − α^24) = x^5 + x^4 + x^3 + x^2 + 1
    {α^5, α^9, α^10, α^18, α^20}      M5(x) = (x − α^5)(x − α^9)(x − α^10)(x − α^18)(x − α^20) = x^5 + x^4 + x^2 + x + 1
    {α^7, α^14, α^19, α^25, α^28}     M7(x) = (x − α^7)(x − α^14)(x − α^19)(x − α^25)(x − α^28) = x^5 + x^3 + x^2 + x + 1
    {α^11, α^13, α^21, α^22, α^26}    M11(x) = (x − α^11)(x − α^13)(x − α^21)(x − α^22)(x − α^26) = x^5 + x^4 + x^3 + x + 1
    {α^15, α^23, α^27, α^29, α^30}    M15(x) = (x − α^15)(x − α^23)(x − α^27)(x − α^29)(x − α^30) = x^5 + x^3 + 1

Table 5.5: Conjugacy Classes over GF(4^2) with Respect to GF(4)

    Conjugacy Class      Minimal Polynomial
    {0}                  M−(x) = x
    {1}                  M0(x) = x + 1
    {α, α^4}             M1(x) = (x + α)(x + α^4) = x^2 + x + α^5
    {α^2, α^8}           M2(x) = (x + α^2)(x + α^8) = x^2 + x + α^10
    {α^3, α^12}          M3(x) = (x + α^3)(x + α^12) = x^2 + α^10 x + 1
    {α^5}                M5(x) = x + α^5
    {α^6, α^9}           M6(x) = (x + α^6)(x + α^9) = x^2 + α^5 x + 1
    {α^7, α^13}          M7(x) = (x + α^7)(x + α^13) = x^2 + α^5 x + α^5
    {α^10}               M10(x) = x + α^10
    {α^11, α^14}         M11(x) = (x + α^11)(x + α^14) = x^2 + α^10 x + α^10

As this example suggests, the notation Mi(x) is used to denote the minimal polynomial of the conjugacy class that α^i is in, where i is the smallest exponent in the conjugacy class. It can be shown (see [268, p. 96]) that for a minimal polynomial m(x) of degree d in a field GF(q^m), d | m.
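As a computational illustration of Theorem 5.84 (a sketch of ours, not code supplied with the book), the following program multiplies out the factors (x − β^{q^i}) for a conjugacy class in GF(2^4), using the representation of Table 5.1 with primitive polynomial x^4 + x + 1. The coefficients of the product land in GF(2), reproducing entries of Table 5.3.

    // Form the minimal polynomial of alpha^s with respect to GF(2), where
    // alpha is primitive in GF(16) built from x^4 + x + 1 (as in Table 5.1).
    #include <cstdio>
    #include <vector>

    static int exp_tab[15], log_tab[16];          // antilog/log tables for GF(16)

    static void build_gf16() {
        int x = 1;
        for (int i = 0; i < 15; ++i) {
            exp_tab[i] = x;  log_tab[x] = i;
            x <<= 1;
            if (x & 0x10) x ^= 0x13;              // reduce modulo x^4 + x + 1
        }
    }

    static int gf_mul(int a, int b) {             // multiply via log/antilog tables
        return (a == 0 || b == 0) ? 0 : exp_tab[(log_tab[a] + log_tab[b]) % 15];
    }

    int main() {
        build_gf16();
        int s = 5;                                // try the conjugacy class of alpha^5
        std::vector<int> cls;                     // exponents s, 2s, 4s, ... (mod 15)
        int e = s;
        do { cls.push_back(e); e = (2 * e) % 15; } while (e != s);

        std::vector<int> p{1};                    // product so far; p[i] = coeff of x^i
        for (int exponent : cls) {                // multiply p(x) by (x + alpha^exponent)
            std::vector<int> q(p.size() + 1, 0);
            int beta = exp_tab[exponent];
            for (std::size_t i = 0; i < p.size(); ++i) {
                q[i + 1] ^= p[i];                 // x * p_i x^i
                q[i]     ^= gf_mul(beta, p[i]);   // alpha^exponent * p_i x^i
            }
            p = q;
        }
        for (int i = (int)p.size() - 1; i >= 0; --i)
            std::printf("%d", p[i]);              // prints 111, i.e. M5(x) = x^2 + x + 1
        std::printf("\n");
        return 0;
    }

Changing s to 3 or 7 reproduces M3(x) = x^4 + x^3 + x^2 + x + 1 and M7(x) = x^4 + x^3 + 1, respectively.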


5.10 Factoring x^n − 1

We now have the theoretical tools necessary to describe how to factor x^n − 1 over arbitrary finite fields for various values of n. When n = q^m − 1, from Theorem 5.63, every nonzero element of GF(q^m) is a root of x^{q^m−1} − 1, so

    x^{q^m−1} − 1 = ∏_{i=0}^{q^m−2} (x − α^i)                                        (5.47)

for a primitive element α ∈ GF(q^m). To provide a factorization of x^{q^m−1} − 1 over the field GF(q), the factors (x − α^i) in (5.47) can be grouped together according to conjugacy classes, which then multiply together to form minimal polynomials. Thus, x^{q^m−1} − 1 can be expressed as a product of the minimal polynomials of the nonzero elements.

Example 5.87

1. The polynomial x^7 − 1 = x^{2^3−1} − 1 can be factored over GF(2) as a product of the minimal polynomials shown in Table 5.2:

       x^7 − 1 = (x + 1)(x^3 + x + 1)(x^3 + x^2 + 1).

2. The polynomial x^15 − 1 = x^{2^4−1} − 1 can be factored over GF(2) as a product of the minimal polynomials shown in Table 5.3:

       x^15 − 1 = (x + 1)(x^4 + x + 1)(x^4 + x^3 + x^2 + x + 1)(x^2 + x + 1)(x^4 + x^3 + 1).     ◽

We now pursue the slightly more general problem of factoring x^n − 1 when n ≠ q^m − 1. An element β ≠ 1 such that β^n = 1, but β^k ≠ 1 for 0 < k < n, is called a primitive nth root of unity. The first step is to determine a field GF(q^m) (that is, to determine m) in which nth roots of unity can exist. Once the field is found, factorization is accomplished using minimal polynomials in the field. Theorem 5.60 tells us that if

    n | q^m − 1,                                                                     (5.48)

then there are φ(n) elements of order n in GF(q^m). Finding a field GF(q^m) with nth roots of unity thus requires finding an m such that n | q^m − 1, which is usually done by trial and error.

Example 5.88 Determine an extension field GF(3^m) in which 13th roots of unity exist. We see that 13 | 3^3 − 1, so that 13th roots exist in the field GF(3^3). ◽
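The trial-and-error search for m can be automated. The helper below is a hypothetical sketch of ours (the name ord_mod is chosen for illustration); it simply computes the multiplicative order of q modulo n.

    // Find the smallest m such that n | q^m - 1, i.e., the multiplicative order
    // of q modulo n (Definition 5.80).  Assumes n > 1 and gcd(q, n) = 1 so that
    // a finite answer exists.
    #include <cstdio>

    int ord_mod(int q, int n) {
        int m = 1;
        long long r = q % n;
        while (r != 1) { r = (r * q) % n; ++m; }
        return m;
    }

    int main() {
        std::printf("%d\n", ord_mod(3, 13));   // 3:  13th roots of unity live in GF(3^3)
        std::printf("%d\n", ord_mod(2, 5));    // 4:  5th roots of unity live in GF(2^4)
        std::printf("%d\n", ord_mod(2, 25));   // 20: cf. Example 5.90
        std::printf("%d\n", ord_mod(7, 15));   // 4:  cf. Example 5.91
        return 0;
    }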

Once the field is found, we let β be an element of order n in the field GF(q^m). Then β is a root of x^n − 1 in that field, and so are the elements β^2, β^3, . . . , β^{n−1}. That is,

    x^n − 1 = ∏_{i=0}^{n−1} (x − β^i).

The roots are divided into conjugacy classes to form the factorization over GF(q).

Example 5.89 Determine an extension field GF(2^m) in which 5th roots of unity exist and express the factorization in terms of polynomials in GF(2)[x]. Using (5.48) we check:

    5 ∤ (2 − 1)    5 ∤ (2^2 − 1)    5 ∤ (2^3 − 1)    5 | (2^4 − 1).

So in GF(16) there are primitive fifth roots of unity. For example, if we let β = α^3, α primitive, then β^5 = α^15 = 1. The roots of x^5 − 1 = x^5 + 1 in GF(16) are

    1, β, β^2, β^3, β^4,


which can be expressed in terms of the primitive element α as 1, α^3, α^6, α^9, α^12. Using the minimal polynomials shown in Table 5.3 we have

    x^5 + 1 = (x + 1)M3(x) = (x + 1)(x^4 + x^3 + x^2 + x + 1).                       ◽

Example 5.90 We want to find a field GF(2^m) which has 25th roots of unity. We need 25 | (2^m − 1).

By trial and error we find that when m = 20, 25 | 2^m − 1. Now let us divide the roots of x^25 − 1 into conjugacy classes. Let β be a primitive 25th root of unity. The other roots of unity are the powers of β: β^0, β^1, β^2, . . . , β^24. Let us divide these powers into conjugacy classes:

    {1}
    {β, β^2, β^4, β^8, β^16, β^7, β^14, β^3, β^6, β^12, β^24, β^23, β^21, β^17, β^9, β^18, β^11, β^22, β^19, β^13}
    {β^5, β^10, β^20, β^15}

Letting Mi(x) ∈ GF(2)[x] denote the minimal polynomial having β^i, for the smallest such i, as a root, we have the factorization x^25 + 1 = M0(x)M1(x)M5(x). ◽

Example 5.91 Let us find a field GF(7^m) in which x^15 − 1 has roots; this requires an m such that 15 | 7^m − 1. m = 4 works. Let γ be a primitive 15th root of unity in GF(7^4). Then γ^0, γ^1, . . . , γ^14 are roots of unity. Let us divide these up into conjugacy classes with respect to GF(7):

    {1},  {γ, γ^7, γ^49 = γ^4, γ^{7^3} = γ^13},  {γ^2, γ^14, γ^8, γ^11},  {γ^3, γ^6, γ^12, γ^9},  {γ^5},  {γ^10}.

Thus, x^15 − 1 factors into six irreducible polynomials in GF(7)[x]. ◽

5.11 Cyclotomic Cosets

Definition 5.92 The cyclotomic cosets modulo n with respect to GF(q) contain the exponents of the n distinct powers of a primitive nth root of unity with respect to GF(q), each coset corresponding to a conjugacy class. These cosets provide a shorthand representation for the conjugacy classes. ◽

Example 5.93 For Example 5.91, n = 15 and q = 7. The cyclotomic cosets and the corresponding conjugacy classes are shown in Table 5.6. ◽


cyclomin.cc

“Tables” of cyclotomic cosets and minimal polynomials for GF(2^m) are available using the program cyclomin.

Table 5.6: Cyclotomic Cosets Modulo 15 with Respect to GF(7)

    Conjugacy Class                  Cyclotomic Coset
    {1}                        ↔     {0}
    {γ, γ^7, γ^4, γ^13}        ↔     {1, 7, 4, 13}
    {γ^2, γ^14, γ^8, γ^11}     ↔     {2, 14, 8, 11}
    {γ^3, γ^6, γ^12, γ^9}      ↔     {3, 6, 12, 9}
    {γ^5}                      ↔     {5}
    {γ^10}                     ↔     {10}
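A small sketch (ours, not the book's cyclomin program) that partitions {0, 1, . . . , n − 1} into cyclotomic cosets modulo n with respect to GF(q); with n = 15 and q = 7 it reproduces the cosets of Table 5.6.

    // Partition the exponents {0, ..., n-1} into cyclotomic cosets modulo n
    // with respect to GF(q) (Definition 5.92): each coset is closed under
    // multiplication by q modulo n.
    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 15, q = 7;                  // use q = 2 for cosets w.r.t. GF(2)
        std::vector<bool> used(n, false);
        for (int s = 0; s < n; ++s) {
            if (used[s]) continue;
            std::printf("{");
            int e = s;
            do {
                std::printf(" %d", e);
                used[e] = true;
                e = (e * q) % n;
            } while (e != s);
            std::printf(" }\n");
        }
        return 0;
    }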


Appendix 5.A How Many Irreducible Polynomials Are There?

The material in this appendix is not needed later in the book. However, it introduces several valuable analytical techniques and some useful facts.

A finite field GF(q^m) can be constructed as an extension of GF(q) if an irreducible polynomial of degree m over GF(q) exists. The question of the existence of finite fields of order any prime power, then, revolves on the question of the existence of irreducible polynomials of arbitrary degree. Other interesting problems are related to how many such irreducible polynomials there are.

To get some insight into the problem, let us first do some exhaustive enumeration of irreducible polynomials with coefficients over GF(2). Let I_n denote the number of irreducible polynomials of degree n. The polynomials of degree 1, x and x + 1, are both irreducible, so I_1 = 2. The polynomials of degree 2 are

    x^2 (reducible)                x^2 + 1 = (x + 1)^2 (reducible)
    x^2 + x = x(x + 1) (reducible) x^2 + x + 1 (irreducible).

So I_2 = 1. In general, there are 2^n polynomials of degree n. Each of these either factors into a product of powers of irreducible polynomials of lower degree, or is irreducible itself. Let us count how many different ways the set of binary cubics might factor. It can factor into a product of an irreducible polynomial of degree 2 and a polynomial of degree 1 in I_2 I_1 = 2 ways:

    x(x^2 + x + 1)        (x + 1)(x^2 + x + 1).

It can factor into a product of three irreducible polynomials of degree 1 in four ways:

    x^3        x^2(x + 1)        x(x + 1)^2        (x + 1)^3.

The remaining cubic binary polynomials,

    x^3 + x + 1   and   x^3 + x^2 + 1,

must be irreducible, so I_3 = 2. This sort of counting can continue, but becomes cumbersome without some sort of mechanism to keep track of the various combinations of factors. This is accomplished using a generating function approach.

Definition 5.94 A generating function of a sequence A_0, A_1, A_2, . . . is the formal power series

    A(z) = ∑_{k=0}^{∞} A_k z^k.                                                      ◽

The generating function is analogous to the z-transform of discrete-time signal processing, allowing us to formally manipulate sequences of numbers by polynomial operations. Generating functions A(z) and B(z) can be added (term by term), multiplied (using polynomial multiplication),

    A(z)B(z) = ∑_{i=0}^{∞} ( ∑_{k=0}^{i} B_k A_{i−k} ) z^i = ∑_{i=0}^{∞} ( ∑_{k=0}^{i} A_k B_{i−k} ) z^i,

and (formal) derivatives computed: if A(z) = ∑_{k=0}^{∞} A_k z^k, then

    A′(z) = ∑_{k=1}^{∞} k A_k z^{k−1},

with operations taking place in some appropriate field. The key theorem for counting the number of irreducible polynomials is the following.


Theorem 5.95 Let f(x) and g(x) be relatively prime, monic irreducible polynomials over GF(q) of degrees m and n, respectively. Let C_k be the number of monic polynomials of degree k whose only irreducible factors are f(x) and g(x). Then the generating function for C_k is

    C(z) = 1/(1 − z^m) · 1/(1 − z^n).

That is, C_k = ∑_i B_i A_{k−i}, where A_i is the ith coefficient in the generating function A(z) = 1/(1 − z^m) and B_i is the ith coefficient in the generating function B(z) = 1/(1 − z^n).

Example 5.96 Let f(x) = x and g(x) = x + 1 in GF(2)[x]. The set of polynomials whose factors are f(x) and g(x) are those with linear factors, for example, p(x) = (f(x))^a (g(x))^b, a, b ≥ 0. According to the theorem, the enumerator by degree for the number of such polynomials is

    1/(1 − z)^2.

This can be shown to be equal to

    1/(1 − z)^2 = ∑_{k=0}^{∞} (k + 1) z^k = 1 + 2z + 3z^2 + ···.                     (5.49)

That is, there are two polynomials of degree 1 (f(x) and g(x)), three polynomials of degree 2 (f(x)g(x), f(x)^2, and g(x)^2), four polynomials of degree 3, and so on. ◽

Proof Let A_k be the number of monic polynomials in GF(q)[x] of degree k which are powers of f(x). The kth power of f(x) has degree km, so

    A_k = 1 if m | k,  and  A_k = 0 otherwise.

We will take A_0 = 1 (corresponding to f(x)^0 = 1). The generating function for the A_k is

    A(z) = 1 + z^m + z^{2m} + ··· = 1/(1 − z^m).

A(z) is called the enumerator by degree of the powers of f(x). Similarly, let B_k be the number of monic polynomials in GF(q)[x] of degree k which are powers of g(x); arguing as before we have B(z) = 1/(1 − z^n). With C_k the number of monic polynomials of degree k whose only factors are f(x) and g(x), we observe that if deg(g(x)^b) = nb = i, then deg(f(x)^a) = ma = k − i for every 0 ≤ i ≤ k. Thus,

    C_k = ∑_{i=0}^{k} B_i A_{k−i}

or, equivalently, C(z) = A(z)B(z). ◽



The theorem can be extended by induction to multiple sets of polynomials, as in the following corollary.

Corollary 5.97 Let S_1, S_2, . . . , S_N be sets of polynomials such that any two polynomials in different sets are relatively prime. The set of polynomials which are products of a polynomial from each set has an enumerator by degree ∏_{i=1}^{N} A_i(z), where A_i(z) is the enumerator by degree of the set S_i.

Example 5.98 For the set of polynomials formed by products of x, x + 1, and x^2 + x + 1 ∈ GF(2)[x], the enumerator by degree is

    ( 1/(1 − z) )^2 · 1/(1 − z^2) = 1 + 2z + 4z^2 + 6z^3 + 9z^4 + ···.

That is, there are six different ways to form polynomials of degree 3, and nine different ways to form polynomials of degree 4. (Find them!) ◽

Let I_m be the number of monic irreducible polynomials of degree m. Applying the corollary, the set which includes I_1 irreducible polynomials of degree 1, I_2 irreducible polynomials of degree 2, and so forth, has the enumerator by degree

    ∏_{m=1}^{∞} ( 1/(1 − z^m) )^{I_m}.

Let us now extend this to a base field GF(q). We observe that the set of all monic polynomials in GF(q)[x] of degree k contains q^k polynomials in it. So the enumerator by degree of the set of all monic polynomials is

    ∑_{k=0}^{∞} q^k z^k = 1/(1 − qz).

Furthermore, the set of all products of powers of irreducible polynomials is precisely the set of all monic polynomials. Hence, we have the following.

Theorem 5.99 [33, Theorem 3.32]

    1/(1 − qz) = ∏_{m=1}^{∞} ( 1/(1 − z^m) )^{I_m}.                                  (5.50)

Equation (5.50) does not provide a very explicit formula for computing I_m. However, it can be manipulated into more useful forms. Reciprocating both sides, we obtain

    1 − qz = ∏_{m=1}^{∞} (1 − z^m)^{I_m}.                                            (5.51)

Taking the formal derivative of both sides and rearranging, we obtain

    q/(1 − qz) = ∑_{m=1}^{∞} m I_m z^{m−1} / (1 − z^m).                              (5.52)

Multiplying both sides of (5.52) by z and expanding both sides in formal series, we obtain

    ∑_{k=1}^{∞} (qz)^k = ∑_{m=1}^{∞} m I_m ∑_{k=1}^{∞} (z^m)^k
                       = ∑_{m=1}^{∞} m I_m ∑_{k: k≠0, m|k} z^k
                       = ∑_{k=1}^{∞} ( ∑_{m: m|k} m I_m ) z^k.

Equating the kth terms of the sums on the left and right sides of this equation, we obtain the following theorem.

Theorem 5.100

    q^k = ∑_{m|k} m I_m,                                                             (5.53)

where the sum is taken over all m which divide k, including 1 and k.

This theorem has the following interpretation. By Theorem 5.72, in a field of order q, the product of all distinct monic irreducible polynomials whose degrees divide k divides x^{q^k} − x. The degree of x^{q^k} − x is q^k, the left-hand side of (5.53). The degree of the product of all distinct monic irreducible polynomials whose degrees divide k is the sum of the degrees of those polynomials. Since there are I_m distinct monic irreducible polynomials of degree m, the contribution to the degree of the product of those polynomials is m I_m. Adding all of these up, we obtain the right-hand side of (5.53). This implies the following:

Theorem 5.101 [33, Theorem 4.415] In a field of order q, x^{q^k} − x factors into the product of all monic irreducible polynomials whose degrees divide k.

Example 5.102 Let us take q = 2 and k = 3. The polynomials whose degrees divide k = 3 have degree 1 or 3. The product of the binary irreducible polynomials of degrees 1 and 3 is

    x(x + 1)(x^3 + x + 1)(x^3 + x^2 + 1) = x^8 + x.                                  ◽
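As a quick sanity check of this product (a throwaway sketch of ours), binary polynomials can be represented as bit masks and multiplied carry-lessly:

    // Verify Example 5.102: multiply the binary irreducible polynomials of
    // degrees 1 and 3.  A polynomial over GF(2) is stored as a bit mask, with
    // bit i holding the coefficient of x^i.
    #include <cstdio>

    unsigned polymul2(unsigned a, unsigned b) {   // carry-less (GF(2)) multiplication
        unsigned p = 0;
        for (int i = 0; (b >> i) != 0; ++i)
            if ((b >> i) & 1) p ^= a << i;
        return p;
    }

    int main() {
        unsigned prod = polymul2(polymul2(0x2, 0x3),      // x,  x + 1
                                 polymul2(0xB, 0xD));     // x^3 + x + 1,  x^3 + x^2 + 1
        std::printf("0x%X\n", prod);                      // prints 0x102 = x^8 + x
        return 0;
    }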



Theorem 5.100 allows a sequence of equations to be built up for determining I_m for any m. Take for example q = 2:

    k = 1:   2 = (1)I_1              →  I_1 = 2
    k = 2:   4 = (1)I_1 + 2I_2       →  I_2 = 1
    k = 3:   8 = (1)I_1 + 3I_3       →  I_3 = 2.
    ⋮

Appendix 5.A.1 Solving for I_m Explicitly: The Moebius Function

However, Equation (5.53) only implicitly determines I_m. An explicit formula can also be found. Equation (5.53) is a special case of a summation of the form

    f(k) = ∑_{m|k} g(m)                                                              (5.54)

in which f(k) = q^k and g(m) = m I_m. Solving such equations for g(m) can be accomplished using the number-theoretic function known as the Moebius (or Möbius) function μ.

Definition 5.103 The function μ(n): ℤ+ → ℤ is the Moebius function, defined by

    μ(n) = 1         if n = 1,
    μ(n) = (−1)^r    if n is the product of r distinct primes,
    μ(n) = 0         if n contains any repeated prime factors.                       ◽

Theorem 5.104 The Moebius function satisfies the following formula:

    ∑_{d|n} μ(d) = 1 if n = 1,  and  0 if n > 1.                                     (5.55)

The proof is developed in Exercise 5.81. This curious “delta-function-like” behavior allows us to compute an inverse of some number-theoretic sums, as the following theorem indicates.

Theorem 5.105 (Moebius inversion formula) If f(n) = ∑_{d|n} g(d), then

    g(n) = ∑_{d|n} μ(d) f(n/d).

Proof Let d|n. Then from the definition of f(n), we have

    f(n/d) = ∑_{k|(n/d)} g(k).

Multiplying both sides of this by μ(d) and summing over divisors d of n, we obtain

    ∑_{d|n} μ(d) f(n/d) = ∑_{d|n} ∑_{k|(n/d)} μ(d) g(k).

The order of summation can be interchanged as

    ∑_{d|n} μ(d) f(n/d) = ∑_{k|n} ∑_{d|(n/k)} μ(d) g(k) = ∑_{k|n} g(k) ∑_{d|(n/k)} μ(d).

By (5.55), ∑_{d|(n/k)} μ(d) = 1 if n/k = 1, that is, if n = k, and is zero otherwise. So the double summation collapses down to g(n). ◽

Returning now to the problem of irreducible polynomials, (5.53) can be solved for I_m using the Moebius inversion formula of Theorem 5.105:

    m I_m = ∑_{d|m} μ(d) q^{m/d}    or    I_m = (1/m) ∑_{d|m} μ(d) q^{m/d}.          (5.56)
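Equation (5.56) is easy to evaluate directly. The following sketch (ours, under the assumption that exact 64-bit integer arithmetic suffices for the sizes tried) computes μ(d) by trial division and then I_m for small m over GF(2).

    // Count monic irreducible polynomials of degree m over GF(q) via (5.56):
    //     I_m = (1/m) * sum over d|m of mu(d) * q^(m/d).
    #include <cstdio>

    int moebius(int n) {                          // Definition 5.103
        int r = 0;
        for (int p = 2; p <= n; ++p) {
            if (n % p != 0) continue;
            if (n % (p * p) == 0) return 0;       // repeated prime factor
            n /= p;
            ++r;                                  // one more distinct prime factor
        }
        return (r % 2 == 0) ? 1 : -1;
    }

    long long ipow(long long q, int e) {
        long long r = 1;
        while (e-- > 0) r *= q;
        return r;
    }

    long long num_irreducible(int q, int m) {
        long long s = 0;
        for (int d = 1; d <= m; ++d)
            if (m % d == 0) s += moebius(d) * ipow(q, m / d);
        return s / m;
    }

    int main() {
        for (int m = 1; m <= 5; ++m)
            std::printf("I_%d = %lld\n", m, num_irreducible(2, m));
        // Over GF(2): I_1 = 2, I_2 = 1, I_3 = 2, I_4 = 3, I_5 = 6, matching the
        // recursion worked out from Theorem 5.100 above.
        return 0;
    }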

Programming Laboratory 4: Programming the Euclidean Algorithm

Objective

The Euclidean algorithm is important both for modular arithmetic in general and also for specific decoding algorithms for BCH/Reed–Solomon codes. In this lab, you are to implement the Euclidean algorithm over both integers and polynomials.

Background

Code is provided which implements modulo arithmetic in the class ModAr, implemented in the files indicated in Algorithm 5.2.

Algorithm 5.2 Modulo Arithmetic
File: ModAr.h  ModArnew.h  testmodarnew1.cc

Code is also provided which implements polynomial arithmetic in the class polynomialT, using the files indicated in Algorithm 5.3. This class is templatized, so that the coefficients can come from a variety of fields or rings. For example, if you want a polynomial with double coefficients or int coefficients or ModAr coefficients, the objects are declared as

    polynomialT<double> p1;
    polynomialT<int> p2;
    polynomialT<ModAr> p3;

Algorithm 5.3 Templatized Polynomials
File: polynomialT.h  polynomialT.cc  testpoly1.cc

Preliminary Exercises

Reading: Sections 5.2.2 and 5.2.3.

1) In ℤ5[x], determine g(x) = (2x^5 + 3x^4 + 4x^3 + 3x^2 + 2x + 1, x^4 + 2x^3 + 3x^2 + 4x + 3) and also s(x) and t(x) such that a(x)s(x) + b(x)t(x) = (a(x), b(x)).
2) Compute (x^3 + 2x^2 + x + 4, x^2 + 3x + 4), operations in ℝ[x], and also find polynomials s(x) and t(x) such that a(x)s(x) + b(x)t(x) = (a(x), b(x)).

Programming Part

1) Write a C or C++ function that performs the Euclidean algorithm on integers a and b, returning g, s, and t such that g = as + bt. The function should have declaration

       void gcd(int a, int b, int &g, int &s, int &t);

   Test your algorithm on (24, 18), (851, 966), and other pairs of integers. Verify in each case that as + bt = g.

2) Write a function that computes the Euclidean algorithm on polynomialT. The function should have declaration

       template <class T>
       void gcd(const polynomialT<T> &a, const polynomialT<T> &b,
                polynomialT<T> &g, polynomialT<T> &s, polynomialT<T> &t);

   Also, write a program to test your function. Algorithm 5.4 shows a test program and the framework for the program, showing how to instantiate the function with ModAr and double polynomial arguments.

   Algorithm 5.4 Polynomial GCD
   File: testgcdpoly.cc  gcdpoly.cc

   Test your algorithm as follows:
   (a) Compute (3x^7 + 4x^6 + 3x^4 + x^3 + 1, 4x^4 + x^3 + x) and t(x) and s(x) for polynomials in ℤ5[x]. Verify that a(x)s(x) + b(x)t(x) = g(x).
   (b) Compute (2x^5 + 3x^4 + 4x^3 + 3x^2 + 2x + 1, x^4 + 2x^3 + 3x^2 + 4x + 3) and s(x) and t(x) for polynomials in ℤ5[x]. Verify that a(x)s(x) + b(x)t(x) = g(x).
   (c) Compute (2 + 8x + 10x^2 + 4x^3, 1 + 7x + 14x^2 + 8x^3) and s(x) and t(x) for polynomials in ℝ[x]. For polynomials with real coefficients, extra care must be taken to handle roundoff. Verify that a(x)s(x) + b(x)t(x) = g(x).

3) Write a function which applies the Sugiyama algorithm to a sequence of data or its polynomial representation.

4) Test your algorithm over ℤ5[x] by finding the shortest polynomial generating the sequence {3, 2, 3, 1, 4, 0, 4, 3}. Having found t(x), compute b(x)t(x) and identify r(x) and s(x) and verify that they are consistent with the result found by the Sugiyama algorithm.

5) In ℤ5[x], verify that the sequence {3, 2, 1, 0, 4, 3, 2, 1} can be generated using the polynomial t1(x) = 1 + 2x + 4x^2 + 2x^3 + x^4. Then use the Sugiyama algorithm to find the shortest polynomial t(x) generating the sequence and verify that it works.

Programming Laboratory 5: Programming Galois Field Arithmetic

Objective

Galois fields are fundamental to algebraic block codes. This lab provides a tool to be used for BCH and Reed–Solomon codes. It builds upon the LFSR code produced in Lab 2.

Preliminary Exercises

Reading: Section 5.4.

Write down the vector, polynomial, and power representations of the field GF(2^3) generated with the polynomial g(x) = 1 + x + x^3. Based on this, write down the tables v2p and p2v for this field. (See the implementation suggestions for the definition of these tables.)

Programming Part

Create a C++ class GFNUM2m with overloaded operators to implement arithmetic over the field GF(2^m) for an arbitrary m < 32. This is similar in structure to the class ModAr, except that the details of the arithmetic are different. Test all operations of your class: +, -, *, /, ^, +=, -=, *=, /=, ^=, ==, != for the field generated by g(x) = 1 + x + x^4 by comparing the results the computer provides with results you calculate by hand. Then test for GF(2^3) generated by g(x) = 1 + x + x^3.

The class GFNUM2m of Algorithm 5.5 provides the declarations and definitions for the class. In this representation, the field elements are represented intrinsically in the vector form, with the vector elements stored as the bits in a single int variable. This makes addition fast (bit operations). Multiplication of Galois field elements is easier when they are in exponential form and addition is easier when they are in vector form. Multiplication here is accomplished by converting to the power representation, adding the exponents, then converting back to the vector form. In the GFNUM2m provided, all of the basic field operations are present except for completing the construction operator initgf which builds the tables v2p and p2v. The main programming task, therefore, is to build these tables. This builds upon the LFSR functions already written.

Algorithm 5.5
File: GFNUM2m.h  GF2.h  GFNUM2m.cc  testgfnum.cc

To make the conversion between the vector and power representations, two arrays are employed. The array v2p converts from vector to power representation and the array p2v converts from power to vector representation.

Example 5.106 In the field GF(2^4) represented in Table 5.1, the field element (1, 0, 1, 1) has the power representation α^7. The vector (1, 0, 1, 1) can be expressed as an integer using binary-to-decimal conversion (LSB on the right) as 11. We thus think of 11 as the vector representation. The number v2p[11] converts from the vector representation, 11, to the exponent of the power representation, 7. Turned around the other way, the number α^7 has the vector representation (as an integer) of 11. The number p2v[7] converts from the exponent of the power representation to the number 11. The conversion tables for the field are

    i        0   1   2   3   4   5   6   7
    v2p[i]   –   0   1   4   2   8   5  10
    p2v[i]   1   2   4   8   3   6  12  11

    i        8   9  10  11  12  13  14  15
    v2p[i]   3  14   9   7   6  13  11  12
    p2v[i]   5  10   7  14  15  13   9   –

To get the whole thing working, the arrays p2v and v2p need to be set up. To this end, a static member function initgf(int m, int g) is created. Given the degree of the extension m and the coefficients of g(x) in the bits of the integer g, initgf sets up the conversion arrays. This can take advantage of the LFSR programmed in Lab 2. Starting with an initial LFSR state of 1, the v2p and p2v arrays can be obtained by repeatedly clocking the LFSR: the state of the LFSR represents the vector representation of the Galois field numbers, while the number of times the LFSR has been clocked represents the power representation.

There are some features of this class which bear remarking on:

•  The output format (when printing) can be specified in either vector or power form. In power form, something like A^3 is printed; in vector form, an integer like 8 is printed. The format can be specified by invoking the static member function setouttype, as in

       GFNUM2m::setouttype(vector);   // set vector output format
       GFNUM2m::setouttype(power);    // set power output format

•  The v2p and p2v arrays are stored as static arrays. This means that (for this implementation) all field elements must come from the same size field. It is not possible, for example, to have some elements be in GF(2^4) and other elements in GF(2^8). (You may want to give some thought to how to provide for such flexibility in a memory-efficient manner.)

•  A few numbers are stored as static data in the class. The variable gfm represents the number m in GF(2^m). The variable gfN represents the number 2^m − 1. These should be set up as part of the initgf function. These numbers are used in various operators (such as multiplication, division, and exponentiation).

•  Near the top of the header are the lines

       extern GFNUM2m ALPHA;   // set up a global alpha
       extern GFNUM2m& A;      // and a reference to alpha, for shorthand

   These declare the variables ALPHA and A, the latter of which is a reference to the former. These variables can be used to represent the variable α in your programs, as in

       GFNUM2m a;
       a = (A^4) + (A^8);      // a is alpha^4 + alpha^8

   The definitions of these variables should be provided in the GFNUM2m.cc file.

Write your code, then extensively test the arithmetic using GF(2^4) (as shown above) and GF(2^8). For the field GF(2^8), use the primitive polynomial p(x) = 1 + x^2 + x^3 + x^4 + x^8:

    GFNUM2m::initgf(8, 0x11D);   // 1 0001 1101 = x^8 + x^4 + x^3 + x^2 + 1
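For concreteness, here is a minimal sketch (ours, not the GFNUM2m code provided with the book) of the table-building step just described: clocking an LFSR whose feedback polynomial is g(x) to fill in v2p and p2v.

    // Build the v2p and p2v conversion tables for GF(2^m) by clocking an LFSR.
    // g is the feedback polynomial with its x^m bit set, e.g. 0x13 for
    // 1 + x + x^4, or 0x11D for the GF(2^8) polynomial given above.
    #include <vector>

    void initgf_tables(int m, unsigned g,
                       std::vector<int> &v2p, std::vector<int> &p2v) {
        const unsigned N = (1u << m) - 1;
        v2p.assign(N + 1, -1);        // -1 marks entries with no valid conversion
        p2v.assign(N + 1, -1);
        unsigned state = 1;           // vector representation of alpha^0
        for (unsigned k = 0; k < N; ++k) {
            v2p[state] = (int)k;      // LFSR state  -> power (number of clocks)
            p2v[k] = (int)state;      // power       -> vector (LFSR state)
            state <<= 1;              // clock the LFSR: multiply by alpha
            if (state & (1u << m)) state ^= g;
        }
    }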

5.12 Exercises

5.1 Referring to the computations outlined in Example 5.1:
(a) Write down the polynomial equivalents for γ_1, γ_2, . . . , γ_31. (That is, find the binary representation for γ_i and express it as a polynomial.)
(b) Write down the polynomial representation for γ_i^3, using operations modulo M(x) = 1 + x^2 + x^5.
(c) Explicitly write down the 10 × 31 binary parity check matrix H in (5.1).

5.2 Prove the statements in Lemma 5.4 that apply to integers.
5.3 [325] Let s and g > 0 be integers. Show that integers x and y exist satisfying x + y = s and (x, y) = g if and only if g | s.
5.4 [325] Show that if m > n, then (a^{2^n} + 1) | (a^{2^m} − 1).
5.5 Wilson's theorem: Show that if p is a prime, then (p − 1)! ≡ −1 (mod p).
5.6 Let f(x) = x^10 + x^9 + x^5 + x^4 and g(x) = x^2 + x + 1 be polynomials in GF(2)[x]. Write f(x) = g(x)q(x) + r(x), where deg(r(x)) < deg(g(x)).
5.7 Uniqueness of division algorithm: Suppose that for integers a > 0 and b there are two representations b = q_1 a + r_1 and b = q_2 a + r_2, with 0 ≤ r_1 < a and 0 ≤ r_2 < a. Show that r_1 = r_2.

Prove the statements in Lemma 5.4 that apply to integers. [325] Let s and g > 0 be integers. Show that integers x and y exist satisfying x + y = s and (x, y) = g if and only if g|s. n m [325] Show that if m > n, then (a2 + 1)|(a2 − 1). Wilson’s theorem: Show that if p is a prime, then (p − 1)! ≡ −1 (mod p). Let f (x) = x10 + x9 + x5 + x4 and g(x) = x2 + x + 1 be polynomials in GF(2)[x]. Write f (x) = g(x)q(x) + r(x), where deg(r(x)) < deg(g(x)). Uniqueness of division algorithm: Suppose that for integers a > 0 and b there are two representations b = q1 a + r1 b = q2 a + r2 , with 0 ≤ r1 < a and 0 ≤ r2 < a. Show that r1 = r2 . Let Ra [b] be the remainder of b when divided by a, where a and b are integers. That is, by the division algorithm, b = qa + Ra [b]. Prove the following by relating both sides to the division algorithm. (a) Ra [b + c] = Ra [Ra [b] + Ra [c]]. (b) Ra [bc] = Ra [Ra [b]Ra [c]]. (c) Do these results extend to polynomials?

5.9

Find the GCD g of 6409 and 42823. Also, find s and t such that 6409s + 42823t = g.

232

Rudiments of Number Theory and Algebra 5.10 Use the extended Euclidean algorithm over ℤp [x] to find g(x) = (a(x), b(x)) and the polynomials s(x) and t(x) such that a(x)s(t) + b(x)t(x) = g(x) for (a) a(x) = x3 + x + 1, b(x) = x2 + x + 1 for p = 2 and p = 3. (b) a(x) = x6 + x5 + x + 1, b(x) = x4 + x3 , p = 2 and p = 3. 5.11 Let a ∈ ℤn . Describe how to use the Euclidean algorithm to find an integer b ∈ ℤn such that ab = 1 in ℤn , if such a b exists and determine conditions when such a b exists. 5.12 Show that all Euclidean domains with a finite number of elements are fields. 5.13 Prove the GCD properties in Theorem 5.12. 5.14 Let a and b be integers. The least common multiple (LCM) m of a and b is the smallest positive integer such that a|m and b|m. The LCM of a and b is frequently denoted [a, b] (For polynomials a(x) and b(x), the LCM is the polynomial m(x) of smallest degree such that a(x)|m(x) and b(x)|m(x).) (a) If s is any common multiple of a and b (that is, a|r and b|r) and m = [a, b] is the least common multiple of a and b, then m|s. Hint: division algorithm. (b) Show that for m > 0, [ma, mb] = m[a, b]. (c) Show that [a, b](a, b) = |ab| 5.15 Let 1 and 2 be cyclic codes of length n generated by g1 (x) and g2 (x), respectively, with g1 (x) ≠ g2 (x). Let 3 = 1 ∩ 2 . Show that 3 is also a cyclic code and determine its generator polynomial g3 (x). If d1 and d2 are the minimum distances of 1 and 2 , respectively, what can you say about the minimum distance of 3 ? ∑p−1 5.16 Show that i=1 i−2 ≡ 0 (mod p), where p is a prime. Hint: The sum of the squares of the first n natural numbers is n(n + 1)(2n + 1)∕6. 5.17 [472] Show that {as + bt ∶ s, t ∈ ℤ} = {k(a, b) ∶ k ∈ ℤ} for all a, b ∈ ℤ. 5.18 Show that the update equations for the extended Euclidean algorithm in (5.10) are correct. That is, show that the recursion (5.10) produces si and ti satisfying the equation asi + bti = ri for all i. Hint: Show for the initial conditions given in (5.11) that (5.9) is satisfied for i = −1 and i = 0. Then do a proof by induction. 5.19 [45] A matrix formulation of the Euclidean algorithm. For polynomials a(x) and b(x), ⌋ ⌋ use a(x) b(x) + r(x) to denote the division algorithm, where q(x) = is the the notation a(x) = a(x) b(x) b(x) [ ] 1 0 (0) (0) (0) quotient. Let deg(a(x)) > deg(b(x)). Let a (x) = a(x) and b (x) = b(x) and A (x) = . 0 1 Let ⌋ a(k) (x) (k) q (x) = (k) b (x) [ ] 0 1 (k+1) A (x) = A(k) (x) 1 −q(k) (x) [ (k+1) ] [ ] [ (k) ] 0 1 a (x) a (x) = . 1 −q(k) (x) b(k) (x) b(k+1) (x) [ ] [ (k+1) ] (x) a (k+1) a(x) . = A b(x) b(k+1) (x) (K) (b) Show that [b (x) =] 0 for some[ integer ] K. a(x) a(K) (x) (K) . Conclude that any divisor of both a(x) and b(x) also (c) Show that = A (x) b(x) 0 divides a(K) (x). Therefore, (a(x), b(x))|a(K) (x). (a) Show that

5.12 Exercise (d) Show that

233 [ ] [ K [ (k) ]] [ (K) ] ∏ q (x) 1 a(x) a (x) = . b(x) 1 0 0 k=1

Hence conclude that a(K) (x)|a(x) and a(K) (x)|b(x), and therefore that a(K) (x)|(a(x), b(x)). (e) Conclude that a(K) (x) = 𝛾(a(x), b(x)) for some scalar 𝛾. Furthermore, show that a(K) (x) = (x)a(x) + A(K) (x)b(x). A(K) 11 12 5.20 [472] More on the matrix formulation of the Euclidean algorithm. Let [ [ ] [ ] ] 0 1 s−1 t−1 1 0 0 (i) R = = and Q = 0 1 s 0 t0 1 −q(i) (x), where q(i) (x) = ⌊ri−2 ∕ri−1 ⌋ and let R(i+1) = Q(i+1) R(i) . Show that: [ ] ] [ a(x) ri−1 (x) (i) , 0 ≤ i ≤ K, where K is the last index such that rK (x) ≠ 0. =R (a) b(x) ri (x) [ ] s t (b) R(i) = i−1 i−1 , 0 ≤ i ≤ K. si ti (c) Show that any common divisor of ri and ri+1 is a divisor of rK and that rK |ri and rK |ri+1 for −1 ≤ i < K. (d) si ti+1 − si+1 ti = (−1)i+1 , so that (si , ti ) = 1. Hint: determinant of R(i+1) . (e) si a + ti b = ri , −1 ≤ i ≤ K. (f) (ri , ti ) = (a, ti ). 5.21 [304] Properties of the extended Euclidean algorithm. Let qi , ri , si , and ti be defined as in Algorithm 5.1. Let n be the value of i such that rn = 0 (the last iteration). Using proofs by induction, show that the following relationships exist among these quantities: ti ri−1 − ti−1 ri = (−1)i a

0≤i≤n

si ri−1 − si−1 ri = (−1)i+1 b

0≤i≤n

si ti−1 − si−1 ti = (−1)

0≤i≤n

i+1

−1 ≤ i ≤ n

s i a + ti b = r i deg(si ) + deg(ri−1 ) = deg(b)

1≤i≤n

deg(ti ) + deg(ri−1 ) = deg(a)

0 ≤ i ≤ n.

5.22 Continued fractions and the Euclidean algorithm. Let u0 and u1 be in a field 𝔽 or ring of polynomials over a field 𝔽 [x]. A continued fraction representation of the ratio u0 ∕u1 is a fraction of the form u0 1 = a0 + . (5.57) 1 u1 a1 + 1 a2 + · · · 1 aj−1 + aj For example,

51 =2+ 22

1 3+

1 7

.

The continued fraction (5.57) can be denoted as ⟨a0 , a1 , . . . , aj ⟩.

234

Rudiments of Number Theory and Algebra (a) Given u0 and u1 , show how to use the Euclidean algorithm to find the a0 , a1 , . . . , aj in the continued fraction representation of u0 ∕u1 . Hint: by the division algorithm, u0 = u1 a0 + u2 . This is equivalent to u0 ∕u1 = a0 + 1∕(u1 ∕u2 ). (b) Determine the continued fraction expansion for u0 = 966, u1 = 815. Verify that it works. (c) Let u0 (x), u1 (x) ∈ ℤ5 [x], where u0 (x) = 1 + 2x + 3x2 + 4x3 + 3x5 and u1 (x) = 1 + 3x2 + 2x3 . Determine the continued fraction expansion for u0 (x)∕u1 (x). 5.23 [304] Padé Approximation and the Euclidean Algorithm. Let A(x) = a0 + a1 x + a2 x2 + · · · be a power series with coefficients in a field 𝔽 . A (𝜇, 𝜈) Padé approximant to A(x) is a rational function p(x)∕q(x) such that q(x)A(x) ≡ p(x) (mod xN+1 ), where 𝜇 + 𝜈 = N and where deg(p(x) ≤ 𝜇 and deg(q(x)) ≤ 𝜈. That is, A(x) agrees with the expansion p(x)∕q(x) for terms up to xN . The Padé condition can be written as q(x)AN (x) ≡ p(x) (mod xN+1 ), where AN (x) is the Nth truncation of A(x), AN (x) = a0 + a1 x + · · · + aN xN .

(a) Describe how to use the Euclidean algorithm to obtain a sequence of polynomials rj (x) and tj (x) such that ti (x)AN (x) ≡ rj (x) (mod xN+1 ). (b) Let 𝜇 + 𝜈 = deg(a(x)) − 1, with 𝜇 ≥ deg((a(x), b(x)). Show that there exists a unique index j such that deg(rj ) ≤ 𝜇 and deg(tj ) ≤ 𝜈. Hint: See the last property in Exercise 5.21 (c) Let A(x) = 1 + 2x + x3 + 3x7 + x9 + · · · be a power series. Determine a Padé approximation with 𝜇 = 5 and 𝜈 = 3; that is, an approximation to the truncated series A8 (x) = 1 + 2x + x3 + 3x7 . 5.24 Let I1 and I2 be ideals in 𝔽 [x] generated by g1 (x) and g2 (x), respectively. (a) The LCM of g1 (x) and g2 (x) is the polynomial g(x) of smallest degree such that g1 (x)|g(x) and g2 (x)|g(x). Show that I1 ∩ I2 is generated by the LCM of g1 (x) and g2 (x). (b) Let I1 + I2 mean the smallest ideal which contains I1 and I2 . Show that I1 + I2 is generated by the greatest common divisor of g1 (x) and g2 (x). 5.25 Using the extended Euclidean algorithm, determine the shortest linear feedback shift register that could have produced the sequence [1, 4, 2, 2, 4, 1] with elements in ℤ5 . 5.26 Prove statements (1)–(9) of Theorem 5.27. 5.27 If x is an even number, then x ≡ 0 (mod 2). What congruence does an odd integer satisfy? What congruence does an integer of the form x = 7k + 1 satisfy? 5.28 [325] Write a single congruence equivalent to the pair of congruences x ≡ 1 (mod 4) and x ≡ 2 (mod 3). 5.29 Compute: 𝜙(190), 𝜙(191), 𝜙(192). 5.30 [325] Prove the following divisibility facts: (a) (b) (c) (d) (e)

n6 − 1 is divisible by 7 if (n, 7) = 1. n7 − n is divisible by 42 for any integer n. n12 − 1 is divisible by 7 if (n, 7) = 1. n6k − 1 is divisible by 7 if (n, 7) = 1, k a positive integer. n13 − n is divisible by 2, 3, 5, 7, and 13 for any positive integer n.

5.31 Prove that 𝜙(n) = n

∏ p|n

( ) 1 1− , p e

e

e

where the product is taken over all primes p dividing n. Hint: write n = p11 p22 · · · pr r .

5.12 Exercise

235

5.32 In this exercise, you will prove an important property of the Euler 𝜙 function: ∑ 𝜙(d) = n,

(5.58)

d|n

where the sum is over all the numbers d that divide n. (a) Suppose n = pe , where p is prime. Show that (5.58) is true. (b) Now proceed by induction. Suppose that (5.58) is true for integers with k or fewer distinct prime factors. Consider any integer N with k + 1 distinct prime factors. Let p denote one of the prime factors of N and let pe be the highest power of p that divides N. Then N = pe n, where n has k distinct prime factors. As d ranges over the divisors of n, the set d, pd, p2 d, . . . , pe d ranges over the divisors of N. Now complete the proof. 5.33 Let Gn be the elements in ℤn that are relatively prime to n. Show that Gn forms a group under multiplication. 5.34 RSA Encryption: Let p = 97 and q = 149. Encrypt the message m = 1234 using the public key {e, n} = {35, 14453}. Determine the private key {d, n}. Then decrypt. 5.35 A message is encrypted using the public key {e, n} = {23, 64777}. The encrypted message is c = 1216. Determine the original message m. (That is, crack the code.) Check your answer by re-encoding it using e. 5.36 Find all integers that simultaneously satisfy the congruences x ≡ 2 (mod 4), x ≡ 1 (mod 9), and x ≡ 2 (mod 5). 5.37 Find a polynomial f (x) ∈ ℤ5 [x] simultaneously satisfying the three congruences f (x) ≡ 2 (mod x − 1) f (x) ≡ 3 + 2x (mod (x − 2)2 ) f (x) ≡ x2 + 3x + 2 (mod (x − 3)3 ). 5.38 Evaluation homomorphism: Show that f (x) (mod x − u) = f (u). Show that 𝜋 ∶ 𝔽 [x] → 𝔽 defined by 𝜋u (f (x)) = f (u) is a ring homomorphism. 5.39 Determine a Lagrange interpolating polynomial f (x) ∈ ℝ[x] such that f (1) = 3 f (2) = 6

f (3) = 1.

5.40 Let (x1 , y1 ), (x2 , y2 ), . . . , (xN , yN ) be points. (a) Write down the Lagrange interpolants li (x) for these points. (b) Write down a polynomial f (x) that interpolates through these points. 5.41 An interesting identity. In this exercise you will prove that for p1 , p2 , . . . , pN all distinct, the following identity holds: N N ∑ ∏ pi =1 p − pn i=1 n=1,n≠i i in any field. (a) Verify the identity when N = 2 and N = 3. (b) Let f (x) = xN−1 . Find a Lagrange interpolating polynomial g(x) for the points ), (p2 , pN−1 ), . . . , (pN , pN−1 ). (p1 , pN−1 N 1 2 (c) Determine the (N − 1)st derivative of g(x), g(N−1) (x). (d) Determine the (N − 1)st derivative of f (x), f (N−1) (x). (e) Show that the identity is true. (f) Based on this identity, prove the following facts:

236

Rudiments of Number Theory and Algebra

i.

∑ N ∏N i=1

i n=1,n≠i i−n

= 1. Based on this, show that N ( ) 1 ∑ N N i = 1. (−1)N−i N! i=1 i

ii.

∑ N ∏N i=1

n n=1,n≠i n−i

= 1. Based on this, show that N ∑

(−1)i−1

i=1

iii. iv.

∑ N ∏N

1 i=1 n=1,n≠i 1−xn−i = 1, x ≠ ∑ N ∏N 1 i=1 n=1,n≠i 1−(n∕i)x = 1 for

( ) N = 1. i

1. all x ≠ 0.

∑ 5.42 Let lj (x) be a Lagrange interpolant, as in (5.32). Show that rj=1 lj (x) = 1. 5.43 Show that x5 + x3 + 1 is irreducible over GF(2). 5.44 Determine whether each of the following polynomials in GF(2)[x] is irreducible. If irreducible, determine if it is also primitive. (a) x2 + 1 (b) x2 + x + 1 (c) x3 + x + 1

(d) x4 + x2 + 1 (e) x4 + x2 + x2 + x + 1. (f) x4 + x3 + x + 1

(g) x5 + x3 + x2 + x + 1 (h) x5 + x2 + 1 (i) x6 + x5 + x4 + x + 1

5.45 Let p(x) ∈ GF(2)[x] be p(x) = x4 + x3 + x2 + x + 1. This polynomial is irreducible. Let this polynomial be used to create a representation of GF(24 ). Using this representation of GF(24 ), do the following: (a) Let 𝛼 be a root of p(x). Show that 𝛼 is not a primitive element. (b) Show that 𝛽 = 𝛼 + 1 is primitive. (c) Find the minimal polynomial of 𝛽 = 𝛼 + 1. 5.46 Solve the following set of equations over GF(24 ), using the representation in Table 5.1. 𝛼x + 𝛼 5 y + z = 𝛼 𝛼2x + 𝛼3y + 𝛼8z = 𝛼4 𝛼 2 x + y + 𝛼 9 z = 𝛼 10 . 5.47 Show that (x − 𝛽) is a factor of a polynomial f (x) if and only if f (𝛽) = 0. 5.48 Create a table such as Table 5.1 for the field GF(23 ) generated by the primitive polynomial x3 + x + 1, including the Zech logarithms. 5.49 Construct the field GF(8) using the primitive polynomial p(x) = 1 + x2 + x3 , producing a table similar to Table 5.1. Use 𝛽 to represent the root of p(x): 𝛽 3 + 𝛽 2 + 1 = 0. 5.50 Extension of GF(3): (a) Prove that p(x) = x2 + x + 2 is irreducible in GF(3). (b) Construct the field GF(32 ) using the primitive polynomial x2 + x + 2. 5.51 Let g(x) = (x2 − 3x + 2) ∈ ℝ[x]. Show that in ℝ[x]∕⟨g(x)⟩ there are zero divisors, so that this does not produce a field. 5.52 Let f (x) be a polynomial of degree n over GF(2). The reciprocal of f (X) is defined as f ∗ (x) = xn f (1∕x).

5.12 Exercise

237

(a) Find the reciprocal of the polynomial f (x) = 1 + x + x5 . (b) Let f (x) be a polynomial with nonzero constant term. Prove that f (x) is irreducible over GF(2) if and only if f ∗ (x) is irreducible over GF(2). (c) Let f (x) be a polynomial with nonzero constant term. Prove that f (x) is primitive if and only if f ∗ (x) is primitive. 5.53 Extending Theorem 5.51. Show that ( k )p k ∑ ∑ p xi = xi . i=1

Show that

i=1 r

r

r

(x + y)p = xp + yp . 5.54 Prove Lemma 5.68. 5.55 Show that, over any field, xs − 1|xr − 1 if and only if s|r. 5.56 [33, p. 29] Let d = (m, n). Show that (xm − 1, xn − 1) = xd − 1. Hint: Let rk denote the remainders for the Euclidean algorithm over integers computing (m, n), with rk−2 = qk rk−1 + rk , and let r(k) be the remainder for the Euclidean algorithm over polynomials computing (xm − 1, xn − 1). Show that [q −1 ] k ∑ rk rk−1 i x (x ) (xrk−1 − 1) = xqk rk−1 +rk − xrk , i=0

so that x

rk−1

−1=x

rk

[q −1 k ∑

] (x

rk−1 i

)

(xrk−1 − 1) + xrk − 1.

i=0

5.57 The set of Gaussian integers ℤ[i] is made by adjoining i =

√ −1 to ℤ:

ℤ[i] = {a + bi ∶ a, b ∈ ℤ}. √ (This is analogous to adjoining i = −1 to ℝ to form ℂ.) For a ∈ ℤ[i], define the valuation function 𝑣(a) = aa∗ , where ∗ denotes complex conjugation. 𝑣(a) is also called the norm. ℤ[i] with this valuation forms a Euclidean domain. (a) Show that 𝑣(a) is a Euclidean function. Hint: Let a = a1 + a2 i and b = b1 + b2 i, a∕b = Q + iS with Q, S ∈ ℚ and q = q1 + iq2 , q1 , q2 ∈ ℤ be the nearest integer point to (a∕b) and let r = a − bq. Show that 𝜈(r) < 𝜈(b) by showing that 𝜈(r)∕𝜈(b) < 1. (b) Show that the units in ℤ[i] are the elements with norm 1. (c) Compute the GCDs of 6 and 3 + i in ℤ[i]. Express them as linear combinations of 6 and 3 + i. 2

r−1

5.58 The trace is defined as follows: For 𝛽 ∈ GF(pr ), Tr(𝛽) = 𝛽 + 𝛽 p + 𝛽 p + · · · + 𝛽 p the trace has the following properties:

. Show that

(a) For every 𝛽 ∈ GF(pr ), Tr(𝛽) ∈ GF(p). (b) There is at least one element 𝛽 ∈ GF(pr ) such that Tr(𝛽) ≠ 0. (c) The trace is a GF(p)-linear function. That is, for 𝛽, 𝛾 ∈ GF(p) and 𝛿1 , 𝛿2 ∈ GF(pr ), Tr[𝛽𝛿1 + 𝛾𝛿2 ] = 𝛽Tr[𝛿1 ] + 𝛾Tr[𝛿2 ].

238

Rudiments of Number Theory and Algebra 5.59 Square roots in finite fields: Show that every element in GF(2m ) has a square root. That is, for every 𝛽 ∈ GF(2m ), there is an element 𝛾 ∈ GF(2m ) such that 𝛾 2 = 𝛽. e e e 5.60 [472, p. 238] Let q = pm and let t be a divisor of q − 1, with prime factorization t = p11 p22 · · · pr r . Prove the following: (a) (b) (c) (d) (e)

For 𝛼 ∈ GF(q) with 𝛼 ≠ 0, ord(𝛼) = t if and only if 𝛼 t = 1 and 𝛼 t∕pi ≠ 1 for i = 1, 2, . . . , r. e GF(q) contains an element 𝛽i of order pi i for i = 1, 2, . . . , r. If 𝛼 ∈ GF(q) and 𝛽 ∈ GF(q) have ( ord(𝛼), ord(𝛽)) = 1, then ord(𝛼𝛽) = ord(𝛼) ord(𝛽). GF(q) has an element of order t. GF(q) has a primitive element.

5.61 Let f (x) = (x − a1 )r1 · · · (x − al )rl . Let f ′ (x) be the formal derivative of f (x). Show that (f (x), f ′ (x)) = (x − a1 )r1 −1 · · · (x − al )rl −1 . √ √ 5.62 The polynomial f (x) = x2 − 2 is irreducible over ℚ[x] because 2 is irrational. Prove that 2 is irrational. 5.63 Express the following as products of binary irreducible polynomials over GF(2)[x]. (a) x7 + 1. (b) x15 + 1. 5.64 Construct all binary cyclic codes of length 7. (That is, write down the generator polynomials for all binary cyclic codes of length 7.) 5.65 Refer to Theorem 4.16. List all of the distinct ideals in the ring GF(2)[x]∕(x15 − 1) by their generators. 5.66 [483] List by dimension all of the binary cyclic codes of length 31. 5.67 [483] List by dimension all of the 8-ary cyclic codes of length 33. Hint: x33 − 1 = (x + 1)(x2 + x + 1)(x10 + x7 + x5 + x3 + 1)(x10 + x9 + x5 + x + 1) (x10 + x9 + x8 + x7 + x6 + x5 + x4 + x3 + x2 + x + 1). 5.68 List the dimensions of all the binary cyclic codes of length 19. Hint: x19 + 1 = (x + 1)

18 ∑

xi .

i=0

5.69 Let 𝛽 be an element of GF(2 ), with 𝛽 ≠ 0 and 𝛽 ≠ 1. Let q(x) be the minimal polynomial of 𝛽. What is the shortest cyclic code with q(x) as the generator polynomial? 5.70 Let 𝛽 ∈ GF(q) have minimal polynomial m(x) of degree d. Show that the reciprocal xd m(1∕x) is the minimal polynomial of 𝛽 −1 . 5.71 Let 𝛼 ∈ GF(210 ) be primitive. Find the conjugates of 𝛼 with respect to GF(2), GF(4), and GF(32). 5.72 In the field GF(9) constructed with the primitive polynomial x2 + x + 2 (see Exercise 5.50), determine the minimal polynomials of all the elements with respect to GF(3) and determine the cyclotomic cosets. 5.73 [483] Determine the degree of the minimal polynomial of 𝛽 with respect to GF(2) for a field element 𝛽 with the following orders: (a) ord(𝛽) = 3. Example: {𝛽, 𝛽 2 , 𝛽 4 = 𝛽}, so there are two conjugates. Minimal polynomial has degree 2. (b) ord(𝛽) = 5. (c) ord(𝛽) = 7. (d) ord(𝛽) = 9. 5.74 For each of the following polynomials, determine a field GF(2m ) where the polynomials can be factored into polynomials in GF(2)[x]. Determine the cyclotomic cosets in each case and the number of binary irreducible polynomials in the factorization. m

(a) x9 + 1 (b) x11 + 1

(c) x13 + 1 (d) x17 + 1

(e) x19 + 1 (f) x29 + 1

5.13 Exercise

239

5.75 Let g(x) be the generator of a cyclic code over GF(q) of length n and let q and n be relatively prime. Show that the vector of all 1s is a codeword if and only if g(1) ≠ 0. 5.76 Let g(x) = x9 + 𝛽 2 x8 + x6 + x5 + 𝛽 2 x2 + 𝛽 2 be the generator for a cyclic code of length 15 over GF(4), where 𝛽 = 𝛽 2 + 1 is primitive over GF(4). (a) (b) (c) (d) (e) (f)

Determine h(x). Determine the dimension of the code. Determine the generator matrix G and the parity check matrix H for this code. Let r(x) = 𝛽x3 + 𝛽 2 x4 . Determine the syndrome for r(x). Draw a circuit for a systematic encoder for this code. Draw a circuit which computes the syndrome for this code.

5.77 [483] Let ⟨f(x), g(x)⟩ be the ideal I ⊂ GF(2)[x]/(x^n − 1) formed by all linear combinations of the form a(x)f(x) + b(x)g(x), where a(x), b(x) ∈ GF(2)[x]/(x^n − 1). By Theorem 4.16, I is principal. Determine the generator for I.
5.78 Let A(z) = ∑_{n=0}^{∞} A_n z^n and B(z) = ∑_{n=0}^{∞} B_n z^n be generating functions and let C(z) = A(z)B(z). Using the property of the formal derivative that A′(z) = ∑_{n=0}^{∞} n A_n z^{n−1}, show that C′(z) = A′(z)B(z) + A(z)B′(z). By extension, show that if C(z) = ∏_{i=1}^{k} A_i(z), then
    [∏_{i=1}^{k} A_i(z)]′ / ∏_{i=1}^{k} A_i(z) = ∑_{i=1}^{k} [A_i(z)]′ / A_i(z).
5.79 Show how to make the transition from (5.51) to (5.52).
5.80 Show that:
(a) For an LFSR with an irreducible generator g(x) of degree p, the period of the sequence is a factor of 2^p − 1.
(b) If 2^p − 1 is prime, then every irreducible polynomial of degree p produces a maximal-length shift register.
Incidentally, if 2^n − 1 is a prime number, it is called a Mersenne prime. A few values of n which yield Mersenne primes are: n = 2, 3, 5, 7, 13, 17, 19, 31, 61, 89.
5.81 Prove Theorem 5.104.
5.82 Using Exercise 5.32 and Theorem 5.105, show that ϕ(n) = ∑_{d|n} μ(d)(n/d).
5.83 Let q = 2. Use (5.56) to determine I_1, I_2, I_3, I_4, and I_5.

5.13 References

The discussion of number theory was drawn from [325] and [472]. Discussions of the computational complexity of the Euclidean algorithm are in [247] and [472]. The Sugiyama algorithm is discussed in [424] and [304]. Fast Euclidean algorithm implementations are discussed in [44]. Continued fractions are discussed in [325]; Padé approximation appears in [304, 309, 472]. A delightful introduction to applications of number theory appears in [399]. Modern algebra can be found in [33, 42, 144, 472, 483]. More advanced treatments can be found in [206, 215], while a thorough treatment of finite fields is in [268, 269]. Zech logarithms and some of their interesting properties are presented in [399]. Our discussion of hardware implementations is drawn from [271], while the format of the add/multiply tables inside the back cover is due to [351]. Discussion of irreducible polynomials was drawn closely from [33]. The RSA algorithm was presented first in [378]; for introductory discussions, see also [472] and [399]. The algebraic approach using the CRT for fast transforms in multiple dimensions is explored in [356].

Chapter 6

BCH and Reed–Solomon Codes: Designer Cyclic Codes The most commonly used cyclic error-correcting codes may be the BCH and Reed–Solomon (RS) codes. The BCH code is named after Bose, Ray–Chaudhuri, and Hocquenghem (see the references at the end of the chapter), who published work in 1959 and 1960 which revealed a means of designing codes over GF(2) with a specified design distance. Decoding algorithms were then developed by Peterson and others. The RS codes are also named after their inventors, who published them in 1960. It was later realized that RS codes and BCH codes are related and that their decoding algorithms are quite similar. This chapter describes the construction of BCH and RS codes and several decoding algorithms. Decoding of these codes is an extremely rich area. Chapter 7 describes other “modern” decoding algorithms.

6.1 BCH Codes

6.1.1 Designing BCH Codes

BCH codes are cyclic codes and hence may be specified by a generator polynomial. A BCH code over GF(q) of length n capable of correcting at least t errors is specified as follows:

1. Determine the smallest m such that GF(q^m) has a primitive nth root of unity β.
2. Select a nonnegative integer b. Frequently, b = 1.
3. Write down a list of 2t consecutive powers of β: β^b, β^{b+1}, . . . , β^{b+2t−1}. Determine the minimal polynomial with respect to GF(q) of each of these powers of β. (Because of conjugacy, frequently these minimal polynomials are not distinct.)
4. The generator polynomial g(x) is the least common multiple (LCM) of these minimal polynomials. The code is an (n, n − deg(g(x))) cyclic code.

Because the generator is constructed using minimal polynomials with respect to GF(q), the generator g(x) has coefficients in GF(q), and the code is over GF(q).

Definition 6.1 If b = 1 in the construction procedure, the BCH code is said to be narrow sense. If n = q^m − 1, then the BCH code is said to be primitive. ◽

Because not all codes are primitive, in this chapter we distinguish between the β used to form the roots of the generator polynomial, and a primitive element α in GF(q^m). (See, for example, Example 6.5.)

Two fields are involved in the construction of the BCH codes. The “small field” GF(q) is where the generator polynomial has its coefficients and is the field where the elements of the codewords are. The “big field” GF(q^m) is the field where the generator polynomial has its roots. For encoding purposes, it is sufficient to work only with the small field. However, as we shall see, decoding requires operations in the big field. (It will be seen that for RS codes both the encode and decode operations take place in the “big” field.)
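Because the degree of each minimal polynomial equals the size of the corresponding cyclotomic coset, step 4 can be checked with integer arithmetic alone: deg g(x) is the number of distinct exponents in the union of the q-ary cyclotomic cosets (mod n) of b, b + 1, . . . , b + 2t − 1. The following C++ sketch (ours, not part of the text; the function name is arbitrary) computes the resulting dimension and reproduces the code parameters worked out in Examples 6.2–6.5 below.

#include <iostream>
#include <set>

// Dimension k = n - deg g(x) of a q-ary BCH code of length n designed to
// correct t errors, with first designed root beta^b.  deg g(x) equals the size
// of the union of the q-ary cyclotomic cosets (mod n) of b, ..., b+2t-1.
int bch_dimension(int n, int q, int b, int t) {
  std::set<int> roots;                       // exponents of the roots of g(x)
  for (int j = b; j <= b + 2 * t - 1; j++) {
    int e = j % n;
    do {                                     // q-ary cyclotomic coset of j mod n
      roots.insert(e);
      e = (e * q) % n;
    } while (e != j % n);
  }
  return n - (int)roots.size();
}

int main() {
  std::cout << bch_dimension(31, 2, 1, 1) << "\n";  // Example 6.2: 26
  std::cout << bch_dimension(31, 2, 2, 1) << "\n";  // Example 6.2, b = 2: 21
  std::cout << bch_dimension(31, 2, 1, 2) << "\n";  // Example 6.3: 21
  std::cout << bch_dimension(31, 2, 1, 3) << "\n";  // Example 6.4: 16
  std::cout << bch_dimension(51, 4, 1, 2) << "\n";  // Example 6.5: 39 (deg g = 12)
  return 0;
}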

Example 6.2 Let n = 31 = 2^5 − 1 for a primitive code with m = 5 and let β be a root of the primitive polynomial x^5 + x^2 + 1 in GF(2^5). (That is, β is an element with order n.) Let us take b = 1 (narrow-sense code) and construct a single-error-correcting binary BCH code. That is, we have t = 1. The 2t consecutive powers of β are β, β^2. The minimal polynomials of β and β^2 with respect to GF(2) are the same (they are conjugates). Let us denote this minimal polynomial by M1(x). Since β is primitive, M1(x) = x^5 + x^2 + 1 (see Table 5.4). Then g(x) = M1(x) = x^5 + x^2 + 1. Since deg(g) = 5, we have a (31, 26) code. (This is, in fact, a Hamming code.)
As a variation on this, consider now a non-narrow-sense code with b = 2. The 2t consecutive powers of β are β^2, β^3. The root β^3 is in a different conjugacy class than the root β^2, so another minimal polynomial must be included. The minimal polynomial for β^2 is M1(x) (from above) and the minimal polynomial for β^3 is M3(x) = x^5 + x^4 + x^3 + x^2 + 1. The generator polynomial is

    g(x) = M1(x)M3(x) = (x^5 + x^2 + 1)(x^5 + x^4 + x^3 + x^2 + 1) = x^10 + x^9 + x^8 + x^6 + x^5 + x^3 + 1.

The code is a (31, 21) code. In this case, the non-narrow-sense choice of b = 2 has led to a lower rate code. However, the following example indicates that the story is not yet complete. ◽

Example 6.3 As before, let n = 31, but now construct a t = 2-error-correcting code, with b = 1, and let β be a primitive element. We form 2t = 4 consecutive powers of β: β, β^2, β^3, β^4. Dividing these into conjugacy classes with respect to GF(2), we have {β, β^2, β^4}, {β^3}. Using Table 5.4 we find the minimal polynomials

    M1(x) = x^5 + x^2 + 1
    M3(x) = x^5 + x^4 + x^3 + x^2 + 1

so that

g(x) = LCM[M1 (x), M3 (x)] = (x5 + x2 + 1)(x5 + x4 + x3 + x2 + 1) = x10 + x9 + x8 + x6 + x5 + x3 + 1.

This gives a (31, 31 − 10) = (31, 21) binary cyclic code. Comparing with the previous example, we see that in the previous case, with b = 2, additional roots were “accidentally” included in the conjugacy class by the minimal polynomial so that the code which was designed in that example is actually a two-error-correcting code. As these examples show, it can happen that the actual correction capability of a BCH code exceeds the design-correction capability. Let us now repeat the design for a t = 2-error-correcting code, but with a non-narrow-sense b = 2. The roots are 𝛽2, 𝛽3, 𝛽4, 𝛽5, which have minimal polynomials:

𝛽 2 , 𝛽 4 ∶ M2 (x) 𝛽 3 ∶ M3 (x) 𝛽 5 ∶ M5 (x).

The conjugacy classes for these three polynomials are (see Table 5.4) {𝛼, 𝛼 2 , 𝛼 4 , 𝛼 8 , 𝛼 16 },

{𝛼 3 , 𝛼 6 , 𝛼 12 , 𝛼 17 , 𝛼 24 },

{𝛼 5 , 𝛼 9 , 𝛼 10 , 𝛼 18 , 𝛼 20 }.

6.1 BCH Codes

243

The exponents of the pooled roots are 1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 16, 17, 18, 20, 24. Since there is a run of six consecutive roots, this code actually is capable of correcting three errors. The generator is g(x) = M1 (x)M3 (x)M5 (x) = x15 + x11 + x10 + x9 + x8 + x7 + x5 + x3 + x2 + x + 1. ◽
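The “accidental” extra roots described in this example can also be detected mechanically: collect the exponents of all roots of g(x) (the union of the cyclotomic cosets) and find the longest run of consecutive exponents. By the BCH bound developed in Section 6.1.2, a run of 2t′ consecutive roots guarantees correction of t′ errors. A C++ sketch of this check (ours, not from the text, reusing the coset computation from the earlier sketch):

#include <iostream>
#include <set>

// Guaranteed (BCH-bound) error-correction capability of the q-ary BCH code of
// length n whose designed roots are beta^b, ..., beta^(b+2t-1): report half the
// longest run of consecutive exponents among all roots of g(x).
int guaranteed_t(int n, int q, int b, int t) {
  std::set<int> roots;
  for (int j = b; j <= b + 2 * t - 1; j++) {
    int e = j % n;
    do { roots.insert(e); e = (e * q) % n; } while (e != j % n);
  }
  int best = 0, run = 0;
  for (int e = 1; e < n; e++) {              // simple scan; cyclic wraparound ignored
    run = roots.count(e) ? run + 1 : 0;
    if (run > best) best = run;
  }
  return best / 2;                           // a run of 2t' consecutive roots corrects t' errors
}

int main() {
  std::cout << guaranteed_t(31, 2, 2, 2) << "\n";  // this design (b = 2, t = 2): prints 3
  std::cout << guaranteed_t(31, 2, 1, 2) << "\n";  // narrow-sense t = 2 design: prints 2
  std::cout << guaranteed_t(31, 2, 1, 3) << "\n";  // Example 6.4 design: prints 3
  return 0;
}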

Example 6.4 As before, let n = 31, but now construct a t = 3-error-correcting code, with b = 1. We form 2t = 6 consecutive powers of 𝛽: 𝛽, 𝛽 2 , 𝛽 3 , 𝛽 4 , 𝛽 5 , 𝛽 6 . Divided into conjugacy classes with respect to GF(2), we have {𝛽, 𝛽 2 , 𝛽 4 },

{𝛽 3 , 𝛽 6 },

{𝛽 5 }.

Denote the minimal polynomials for these sets as M1 (x), M3 (x), and M5 (x), respectively. Using Table 5.4 we find that the minimal polynomials for elements in these classes are M1 (x) = x5 + x2 + 1 M3 (x) = x5 + x4 + x3 + x2 + 1 M5 (x) = x5 + x4 + x2 + x + 1 so the generator is g(x) = LCM[M1 (x), M3 (x), M5 (x)] = M1 (x)M3 (x)M5 (x) = x15 + x11 + x10 + x9 + x8 + x7 + x5 + x3 + x2 + x + 1. This gives a (31, 31 − 15) = (31, 16) binary cyclic code.



Example 6.5 Construct a generator for a quaternary, narrow-sense BCH 2-error- correcting code of length n = 51. For a quaternary code, we have q = 4. The smallest m such that 51|qm − 1 is m = 4, so the arithmetic takes place in the field GF(44 ) = GF(256). Let 𝛼 be primitive in the field GF(256). The element 𝛽 = 𝛼 5 is a 51st root of unity. The subfield GF(4) in GF(256) can be represented (see Example 5.66) using the elements {0, 1, 𝛼 85 , 𝛼 170 }. The 2t consecutive powers of 𝛽 are 𝛽, 𝛽 2 , 𝛽 3 , 𝛽 4 . Partitioned into conjugacy classes with respect to GF(4), these powers of 𝛽 are {𝛽, 𝛽 4 }, {𝛽 2 }, {𝛽 3 }. The conjugacy classes for these powers of 𝛽 are {𝛽, 𝛽 4 , 𝛽 16 , 𝛽 64 }

{𝛽 2 , 𝛽 8 , 𝛽 32 , 𝛽 128 } {𝛽 3 , 𝛽 12 , 𝛽 48 , 𝛽 192 }

with corresponding minimal polynomials p1 (x) = x4 + 𝛼 170 x3 + x2 + x + 𝛼 170 p2 (x) = x4 + 𝛼 85 x3 + x2 + 𝛼 85 p3 (x) = x4 + 𝛼 170 x3 + x2 + 𝛼 170 x + 1. These are in GF(4)[x], as expected. The generator polynomial is g(x) = p1 (x)p2 (x)p3 (x) = x12 + 𝛼 85 x11 + 𝛼 170 x10 + x8 + 𝛼 170 x6 + 𝛼 170 x5 + 𝛼 170 x4 + 𝛼 85 x3 + 𝛼 85 x2 + 𝛼 85 x + 1.



6.1.2 The BCH Bound

The BCH bound is the proof that the constructive procedure described above produces codes with at least the specified minimum distance. We begin by constructing a parity check matrix for the code. Let c(x) = m(x)g(x) be a code polynomial. Then, for j = b, b + 1, . . . , b + 2t − 1, c(𝛽 j ) = m(𝛽 j )g(𝛽 j ) = m(𝛽 j )0 = 0, since these powers of 𝛽 are, by design, the roots of g(x). Writing c(x) = c0 + c1 x + · · · + cn−1 xn−1 , we have c0 + c1 𝛽 j + c2 (𝛽 j )2 + · · · + cn−1 (𝛽 j )n−1 = 0 j = b, b + 1, . . . , b + 2t − 1. The parity check conditions can be expressed in the form

    [ 1   β^j   β^{2j}   · · ·   β^{(n−1)j} ] [ c_0, c_1, c_2, . . . , c_{n−1} ]^T = 0,    j = b, b + 1, . . . , b + 2t − 1.

Let δ = 2t + 1; δ is called the design distance of the code. Stacking the row vectors for different values of j, we obtain a parity check matrix H,

    H = ⎡ 1   β^b         β^{2b}         · · ·   β^{(n−1)b}       ⎤
        ⎢ 1   β^{b+1}     β^{2(b+1)}     · · ·   β^{(n−1)(b+1)}   ⎥
        ⎢ ⋮                                                       ⎥        (6.1)
        ⎢ 1   β^{b+δ−3}   β^{2(b+δ−3)}   · · ·   β^{(n−1)(b+δ−3)} ⎥
        ⎣ 1   β^{b+δ−2}   β^{2(b+δ−2)}   · · ·   β^{(n−1)(b+δ−2)} ⎦
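As a concrete check of these parity conditions, the following C++ sketch (ours, not from the text) builds the field GF(2^5) from the primitive polynomial x^5 + x^2 + 1 used in Examples 6.2–6.4 and verifies that the generator g(x) = x^10 + x^9 + x^8 + x^6 + x^5 + x^3 + 1 of the (31, 21) code (itself a codeword) satisfies c(β^j) = 0 for the designed roots j = 1, . . . , 4, but not at a power such as β^7 that is not a root of g(x).

#include <cstdio>

int antilog[31];                       // antilog[i] = beta^i as a 5-bit polynomial

void build_field() {
  int a = 1;                           // beta^0
  for (int i = 0; i < 31; i++) {
    antilog[i] = a;
    a <<= 1;                           // multiply by beta = x
    if (a & 0x20) a ^= 0x25;           // reduce modulo x^5 + x^2 + 1
  }
}

// Evaluate a binary polynomial (given by the exponents with coefficient 1)
// at beta^j; addition in GF(2^5) is XOR.
int eval_at_beta_power(const int *exps, int nexp, int j) {
  int sum = 0;
  for (int k = 0; k < nexp; k++) sum ^= antilog[(exps[k] * j) % 31];
  return sum;
}

int main() {
  build_field();
  int g[] = {0, 3, 5, 6, 8, 9, 10};    // x^10 + x^9 + x^8 + x^6 + x^5 + x^3 + 1
  for (int j = 1; j <= 4; j++)         // designed roots: prints 0 each time
    std::printf("g(beta^%d) = %d\n", j, eval_at_beta_power(g, 7, j));
  std::printf("g(beta^7) = %d\n", eval_at_beta_power(g, 7, 7));  // nonzero
  return 0;
}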

With this in place, there is one more important result needed from linear algebra to prove the BCH bound. A Vandermonde matrix is a square matrix of the form

    V = V(x_0, x_1, . . . , x_{n−1}) =
        ⎡ 1   x_0        x_0^2        · · ·   x_0^{n−1}      ⎤
        ⎢ 1   x_1        x_1^2        · · ·   x_1^{n−1}      ⎥
        ⎢ 1   x_2        x_2^2        · · ·   x_2^{n−1}      ⎥        (6.2)
        ⎢ ⋮                                                  ⎥
        ⎣ 1   x_{n−1}    x_{n−1}^2    · · ·   x_{n−1}^{n−1}  ⎦

or the transpose of such a matrix.

Lemma 6.6 Let V be an n × n matrix as in (6.2). Then

    det(V) = ∏_{0≤j<k≤n−1} (x_k − x_j).

If x > 0, then

    ⌊f^{−1}(x)⌋ ≤ r_B(x) ≤ ⌊g^{−1}(x)⌋.        (7.86)

Proof Let L = r_B(x). By definition of r_B(x), we have B(v, L) ≤ x. Invoking the inequalities associated with these quantities, we have

    g(L) ≤ B(v, L) ≤ x < B(v, L + 1) ≤ f(L + 1).

Thus, L ≤ g^{−1}(x) and f^{−1}(x) < L + 1. That is,

    r_B(x) ≤ g^{−1}(x)

and

f −1 (x) < rB (x) + 1,

or f −1 (x) − 1 < rB (x) ≤ g−1 (x).

Since r_B(x) is integer valued, taking ⌊⋅⌋ throughout, we obtain (7.86).



Using (7.84), we have B(v, L) = f(L) = vL^2/2 + (v + 2)L/2. If f(L) = x, then (using the quadratic formula)

    f^{−1}(x) = −(v + 2)/(2v) + √( ((v + 2)/(2v))^2 + 2x/v ).

Using Lemma 7.53, we reach the conclusion:

Theorem 7.54

    L_m = r_B( n m(m + 1)/2 ) = ⌊ f^{−1}( n m(m + 1)/2 ) ⌋ = ⌊ √( nm(m + 1)/v + ((v + 2)/(2v))^2 ) − (v + 2)/(2v) ⌋.        (7.87)

(See the program files computeLm.cc and computeLm.m.)

A convenient upper bound is (see Exercise 7.22)

    L_m < (m + 1/2) √(n/v).
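Equation (7.87) is straightforward to evaluate. A C++ sketch (ours; the companion files computeLm.cc and computeLm.m referenced above presumably perform the same computation) reproduces, for example, the L_m values listed for the (32, 8) and (31, 15) codes in Examples 7.73 and 7.74 later in this section.

#include <cmath>
#include <cstdio>

// Maximum list size L_m of (7.87) for interpolation multiplicity m, code
// length n, and v = k - 1.
long Lm(long n, long k, long m) {
  double v = k - 1.0;
  double a = (v + 2.0) / (2.0 * v);
  return (long)std::floor(std::sqrt(n * m * (m + 1.0) / v + a * a) - a);
}

int main() {
  // (32, 8) code of Example 7.73: m = 1, 2, 4 give L_m = 2, 4, 8
  for (long m : {1L, 2L, 4L})
    std::printf("(32, 8)   m = %ld   Lm = %ld\n", m, Lm(32, 8, m));
  // (31, 15) code of Example 7.74: m = 1, 3, 21 give L_m = 1, 4, 31
  for (long m : {1L, 3L, 21L})
    std::printf("(31, 15)  m = %ld   Lm = %ld\n", m, Lm(31, 15, m));
  return 0;
}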

7.6.5 Algorithms for Computing the Interpolation Step

As observed in the proof of the Interpolation Theorem (Theorem 7.46), the interpolating polynomial can be found by solving the set of C = n m(m + 1)/2 linear interpolation constraint equations with more than C unknowns, represented by (7.72). However, brute force numerical solutions (e.g., Gaussian elimination) would have complexity cubic in the size of the problem. In this section, we develop two other solutions which have, in principle, lower complexity.
The interpolation constraint operations in (7.68) act as linear functionals.

Definition 7.55 A mapping D : 𝔽[x, y] → 𝔽 is said to be a linear functional if for any polynomials Q(x, y) and P(x, y) ∈ 𝔽[x, y] and any constants u, v ∈ 𝔽,

    D(uQ(x, y) + vP(x, y)) = uDQ(x, y) + vDP(x, y).



The operation Q(x, y) → Dr,s Q(𝛼, 𝛽) is an instance of a linear functional. We will recast the interpolation problem and solve it as a more general problem involving linear functionals: Find Q(x, y) satisfying a set of constraints of the form Di Q(x, y) = 0,

i = 1, 2, . . . , C,

where each D_i is a linear functional. For our problem, each D_i corresponds to some D_{r,s}, according to a particular order relating i to (r, s). (But other linear functionals could also be used, making this a more general interpolation algorithm.)
Let us write Q(x, y) = ∑_{j=0}^{J} a_j φ_j(x, y), where the φ_j(x, y) are ordered with respect to some monomial order, and where a_C ≠ 0. The upper limit J is bounded by J ≤ C, where C is given by (7.70). The operation of any linear functional D_i on Q is

    D_i Q = ∑_{j=0}^{J} a_j D_i φ_j(x, y) ≜ ∑_{j=0}^{J} a_j d_{i,j},        (7.88)

with coefficients d_{i,j} = D_i φ_j(x, y).

7.6.5.1 Finding Linearly Dependent Columns: The Feng–Tzeng Algorithm

The set of functionals D_1, D_2, . . . , D_C can be represented as the columns of a matrix 𝒟,

    𝒟 = ⎡ d_{1,0}  d_{1,1}  · · ·  d_{1,J} ⎤
        ⎢ d_{2,0}  d_{2,1}  · · ·  d_{2,J} ⎥  ≜  [ 𝐝^{(0)}  𝐝^{(1)}  · · ·  𝐝^{(J)} ]  ∈  𝔽^{C×(J+1)}.        (7.89)
        ⎢    ⋮                             ⎥
        ⎣ d_{C,0}  d_{C,1}  · · ·  d_{C,J} ⎦

Example 7.56 Over the field GF(5), we desire to create a polynomial of minimal (1, 3)-revlex rank through the following points, with the indicated multiplicities:

    Point    Multiplicity
    (1, 1)        1
    (2, 3)        2
    (4, 2)        2

There are 1 + 3 + 3 = 7 constraints (a point of multiplicity m imposes m(m + 1)/2 constraints). These constraints are (using the notation Q_{r,s} introduced in (7.66))

    Q_{0,0}(1, 1) = 0
    Q_{0,0}(2, 3) = 0     Q_{0,1}(2, 3) = 0     Q_{1,0}(2, 3) = 0
    Q_{0,0}(4, 2) = 0     Q_{0,1}(4, 2) = 0     Q_{1,0}(4, 2) = 0.

With seven constraints, some linear combination of the first eight monomials listed in Table 7.1 suffices. These monomials are 1, x, x^2, x^3, y, x^4, xy, and x^5. The polynomial we seek is

    Q(x, y) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 y + a_5 x^4 + a_6 xy + a_7 x^5.

This polynomial should satisfy the constraints (using (7.67))

    D_{0,0} 1 = 1     D_{0,0} x = x     D_{0,0} y = y     · · ·     D_{0,0} x^5 = x^5
    D_{1,0} 1 = 0     D_{1,0} x = 1     D_{1,0} y = 0     · · ·     D_{1,0} x^5 = 5x^4 = 0
    D_{0,1} 1 = 0     D_{0,1} x = 0     D_{0,1} y = 1     · · ·     D_{0,1} x^5 = 0.

Now form a matrix 𝒟 whose columns correspond to the eight monomials and whose rows correspond to the seven constraints; this is the matrix referred to as (7.90).
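The entries of this matrix are easy to generate, since for a monomial the functional D_{r,s} evaluates in closed form. The following C++ sketch (ours, not from the text) assumes the Hasse-derivative form of D_{r,s} from (7.67), under which D_{r,s} x^a y^b evaluated at (α, β) equals binom(a, r) binom(b, s) α^{a−r} β^{b−s} (and 0 if a < r or b < s); it prints the 7 × 8 constraint matrix for the points, multiplicities, and monomial order of this example. Each printed row has zero dot product (mod 5) with the coefficient vector (1, 0, 2, 4, 0, 2, 0, 1) of the polynomial Q(x, y) found by the Feng–Tzeng algorithm in Example 7.58, which gives a quick check of the interpolation conditions.

#include <cstdio>

const int P = 5;

int powm(int x, int e) { int r = 1; while (e--) r = r * x % P; return r; }

int binom(int n, int k) {                  // small binomial coefficient, reduced mod P
  if (k < 0 || k > n) return 0;
  long b = 1;
  for (int i = 1; i <= k; i++) b = b * (n - k + i) / i;
  return (int)(b % P);
}

// D_{r,s} x^a y^b evaluated at (alpha, beta) in GF(5), assuming the Hasse
// derivative of (7.67): binom(a,r) binom(b,s) alpha^(a-r) beta^(b-s).
int d(int a, int b, int r, int s, int alpha, int beta) {
  if (a < r || b < s) return 0;
  return binom(a, r) * binom(b, s) % P * powm(alpha, a - r) % P * powm(beta, b - s) % P;
}

int main() {
  // monomials 1, x, x^2, x^3, y, x^4, xy, x^5 (the (1,3)-revlex order), as (a, b)
  int mono[8][2] = {{0,0},{1,0},{2,0},{3,0},{0,1},{4,0},{1,1},{5,0}};
  // constraints (alpha, beta, r, s), in the order listed in Example 7.56
  int con[7][4] = {{1,1,0,0},{2,3,0,0},{2,3,0,1},{2,3,1,0},
                   {4,2,0,0},{4,2,0,1},{4,2,1,0}};
  for (auto &c : con) {
    for (auto &m : mono)
      std::printf("%d ", d(m[0], m[1], c[2], c[3], c[0], c[1]));
    std::printf("\n");
  }
  return 0;
}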



The condition that all the constraints are simultaneously satisfied, D_i Q(x, y) = 0, i = 1, 2, . . . , C, can be expressed using the matrix 𝒟 as

    𝒟 [a_0, a_1, . . . , a_J]^T = 0    or

𝐝(0) a0 + 𝐝(1) a1 + · · · + 𝐝(J) aJ = 0,

so the columns of  are linearly dependent. The decoding problem can be expressed as follows: Interpolation problem 1: Determine the smallest J such that the first J columns of  are linearly dependent.

This is a problem in computational linear algebra, which may be solved by an extension of the BM algorithm known as the Feng–Tzeng algorithm. Recall that the BM algorithm determines the shortest linear-feedback shift register (LFSR) which annihilates a sequence of scalars. The Feng–Tzeng algorithm produces the shortest “LFSR” which annihilates a sequence of vectors. We will express the problem solved by the Feng–Tzeng algorithm this way. Let ⎡ a11 a12 · · · a1N ⎤ ⎢a a · · · a ⎥ [ ] 21 22 2N ⎥ A=⎢ = 𝐚1 𝐚2 · · · 𝐚N . ⎥ ⎢ ⋮ ⎥ ⎢ ⎣aM1 aM2 · · · aMN ⎦ The first l + 1 columns of A are linearly dependent if there exist coefficients c1 , c2 , . . . , cl , not all zero, such that (7.91) 𝐚l+1 + c1 𝐚l + · · · + cl 𝐚1 = 0. The problem is to determine the minimum l and the coefficients c1 , c2 , . . . , cl such that the linear dependency (7.91) holds. Let C(x) = c0 + c1 x + · · · + cl xl , where c0 = 1, denote the set of coefficients in the linear combination. Let 𝐚(x) = 𝐚0 + 𝐚1 x + · · · + 𝐚N xN be a representation of the matrix A, with 𝐚0 = 𝟏 (the vector of all ones) and let a(i) (x) = ai,0 + ai,1 x + · · · + ai,N xN be the ith row of 𝐚(x). We will interpret C(x)𝐚(x) element by element; that is, ⎡ C(x)a(1) (x) ⎤ ⎢ C(x)a(2) (x) ⎥ ⎥. C(x)𝐚(x) = ⎢ ⎥ ⎢ ⋮ ⎥ ⎢ (M) ⎣C(x)a (x)⎦ For n = l + 1, l + 2, . . . , N, let [C(x)𝐚(x)]n denote the coefficient (vector) of xn in C(x)𝐚(x). That is, [C(x)𝐚(x)]n = c0 𝐚n + c1 𝐚n−1 + · · · + cl 𝐚n−l =

∑_{j=0}^{l} c_j 𝐚_{n−j}.

The problem to be solved can be stated as follows: Determine the minimum l and a polynomial C(x) with deg C(x) ≤ l such that [C(x)𝐚(x)]l+1 = 0. The general flavor of the algorithm is like that of the BM algorithm, with polynomials being updated if they result in a discrepancy. The algorithm proceeds element-by-element through the matrix down the columns of the matrix. At each element, the discrepancy is computed. If the discrepancy is nonzero, previous columns on this row are examined to see if they had a nonzero discrepancy. If so, then the polynomial is updated using the previous polynomial that had the discrepancy. If there is no previous nonzero discrepancy on that row, then the discrepancy at that location is saved and the column is considered “blocked” — that is, no further work is done on that column and the algorithm moves to the next column.

7.6 The Guruswami–Sudan Decoding Algorithm and Soft RS Decoding (i−1,j)

(i−1,j)

(i−1,j)

343

(i−1,j)

Let C(i−1,j) (x) = c0 + c1 x + · · · + cj−1 xj−1 , with c0 = 1, be defined for each column j, where j = 1, 2, . . . , l + 1, and for i = 1, 2, . . . , M, where each polynomial C(i−1,j) has the property that (i−1,j)

[C(i−1,j) (x)a(h) (x)]j = ah,j + c1

(i−1,j)

ah,j−1 + · · · + cj−1 ah,1 = 0 for h ≤ i − 1.

That is, in column j at position (i − 1, j) of the matrix and all previous rows there is no discrepancy. The initial polynomial for the first column is defined as C(0,1) (x) = 1. The discrepancy at position (i, j) is computed as (i−1,j) (i−1,j) ai,j−1 + · · · + cj−1 ai,1 . di,j = [C(i−1,j) (x)a(i) (x)]j = ai,j + c1 If di,j = 0, then no update to the polynomial is necessary and C(i,j) (x) = C(i−1,j) (x). If, on the other hand, di,j ≠ 0, then an update is necessary. If there is on row i a previous column u that had a nonzero discrepancy (that was not able to be resolved by updating the polynomial), then the polynomial is updated according to di,j j−u (u) x C (x), (7.92) C(i,j) (x) = C(i−1,j) (x) − di,u where u is the column where the previous nonzero discrepancy occurred and C(u) (x) is the polynomial which had the nonzero discrepancy in column u. If there is a nonzero discrepancy di,j , but there is no previous nonzero discrepancy on that row, then that discrepancy is saved, the row at which the discrepancy occurred is saved, 𝜌(j) = i, and the polynomial is saved, Cj (x) = C(i−1,j) (x). The column is considered “blocked,” and processing continues on the next column with C(0,j+1) (x) = C(i−1,j) (x). The following lemma indicates that the update (7.92) zeros the discrepancy. Lemma 7.57 If di,j ≠ 0 and there is a previous polynomial C(u) (x) at column u, so that C(u) (x) = C(i−1,u) (x) and di,u ≠ 0, then the update (7.92) is such that [C(i,j) (x)a(r) (x)]j = 0 for r = 1, 2, . . . , i. Proof We have [C(i,j) (x)a(r) (x)]j = [C(i−1,j) (x), a(r) (x)]j − = [C(i−1,j) (x), a(r) (x)]j − ⎧ di,j ⎪0 − di,u 0 =⎨ di,j ⎪di,j − di,u di,u ⎩

di,j di,u di,j di,u

[C(u) (x)a(r) (x)xj−u ]j [C(u) (x)a(r) (x)]u

for r = 0, 1, . . . , i − 1 for r = i,

where the second equality follows from a result in Exercise 7.18.

Algorithm 7.5 The Feng–Tzeng Algorithm This representation is due to McEliece [302] 1 Input: A matrix A of size M × N 2 Output: A polynomial C of minimal degree annihilating columns of A 3 Initialize: s = 0 (column counter) 4 C = 1 (C holds the current polynomial)



344

Alternate Decoding Algorithms for Reed–Solomon Codes

5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

26 27 28

dsave = zeros(1,N) (holds nonzero discrepancies) 𝜌 = zeros(1,N) (holds row where nonzero discrepancy is) Begin: while(1) (loop over columns) s = s + 1; r = 0; (move to beginning of next column) columnblocked = 0; while(1) (loop over rows in this column) r = r + 1; (move to next row) drs = [C(x)a(r) (x)]s (compute discrepancy here using current poly.) if(drs ≠ 0) (if nonzero discrepancy) if(there is a u such that 𝜌(u) = r) (if a previous nonzero disc. on this row) drs C(x) = C(x) − dsa𝑣e(u) Cu (x)xs−u ; (update polynomial) else (no previous nonzero discrepancy on this row) 𝜌(s) = r; (save row location of nonzero discrepancy) Cs (x) = C(x); (save polynomial for this column) dsave(s) = drs ; (save nonzero discrepancy for this column) columnblocked = 1; (do no more work on this column) end (else) end (if drs ) if(r ≥ M or columnblocked=1) break; end; (end of loop over row) end (while (1)) if(columnblocked=0) break; end; (end loop over columns) end (while(1)) End

α10 α11 α12 α13 α14 α15 α16

testft.m fengtzeng.m invmodp.m

It can be shown that the polynomial C(x) produced by this algorithm results in the minimal number of first columns of A which are linearly dependent [121]. Example 7.58 We apply the algorithm to the matrix  of (7.90) in Example 7.56. The following matrix outlines the steps of the algorithm. 0 0 0 0 0 ⎤ ⎡ 1a 1 b 0 ⎥ ⎢ 2 0 2 0 1 0 0 1 c ⎥ ⎢ d h m ⎥ ⎢ 0 0 0 0 1i 0 3n 0 ⎥ ⎢ 1e 2f 0 1j 2o 0 ⎥ ⎢ 0 0 ⎥ ⎢ 0 0 0 2 0 4 1 1 g ⎢ k p r ⎥ ⎥ ⎢ 0 0 0 0 0 0 2q 0 ⎥ ⎢ ⎣ 0 0 0 0 0 2l 0 2s ⎦ The initially nonzero discrepancies are shown in this matrix; those which are in squares resulted in polynomials updated by (7.92) and the discrepancy was zeroed. a: Starting with C(x) = 1, a nonzero discrepancy is found. There are no previous nonzero discrepancies on this row, so C(1) (x) = 1 is saved and we jump to the next column. b: Nonzero discrepancy found, but the polynomial was updated using previous nonzero discrepancy on this row (at a): C(x) = 4x + 1. c: Nonzero discrepancy found and no update is possible. Save C(2) (x) = 4x + 1 and the discrepancy and jump to next column. d: Nonzero discrepancy found, but updated polynomial computed (using polynomial at c) C(x) = 2x2 + 2x + 1. e: Nonzero discrepancy found; no update possible. Save C(3) (x) = 2x2 + 2x + 1 and the discrepancy and jump to next column. f: Update polynomial: C(x) = x3 + 3x2 + 1 g: Save C(4) (x) = x3 + 3x2 + 1 h: Update polynomial: C(x) = 2x4 + 4x3 + 3x2 + 1

7.6 The Guruswami–Sudan Decoding Algorithm and Soft RS Decoding

345

i: Save C(5) (x) = 2x4 + 4x3 + 3x2 + 1 j: Update polynomial: C(x) = 3x5 + 3x3 + 3x2 + 1 k: Update polynomial: C(x) = x5 + 4x4 + 3x3 + 1x2 + 1 l: Save C(6) (x) = x5 + 4x4 + 3x3 + 1x2 + 1 m: Update polynomial C(x) = x6 + 4x4 + 3x3 + x2 + 1 n: Update polynomial C(x) = 3x5 + 3x3 + 3x2 + 1 o: Update polynomial C(x) = x6 + 4x5 + 3x4 + 3x3 + 3x2 + 1 p: Update polynomial C(x) = 3x6 + 3x4 + 3x2 + 1 q: Save C(7) (x) = 3x6 + 3x4 + 3x2 + 1 r: Update polynomial C(x) = 2x7 + 4x6 + 3x2 + 1 s: Update polynomial C(x) = x7 + 2x5 + 4x4 + 2x2 + 1 Returning to the interpolation problem of Example 7.56, we obtain from the coefficients of C(x) the coefficients of the polynomial Q(x, y) = 1 + 0x + 2x2 + 4x3 + 0y + 2x4 + 0xy + x5 . It can be easily verified that this polynomial satisfies the interpolation and multiplicity constraints specified in Example 7.56. ◽

The computational complexity goes as the cube of the size of the matrix. One view of this algorithm is that it is simply a restatement of conventional Gauss–Jordan reduction and has similar computational complexity. 7.6.5.2

Finding the Intersection of Kernels: The Kötter Algorithm

Let 𝔽L [x, y] ⊂ 𝔽 [x, y] denote the set of polynomials whose y-degree is ≤ L. (The variable y is distinguished here because eventually we will be looking for y-roots of Q(x, y).) Then any Q(x, y) ∈ 𝔽L [x, y] can be written in the form L ∑ Q(x, y) = qk (x)yk k=0

for polynomials qk (x) ∈ 𝔽 [x]. 𝔽L [x, y] is an 𝔽 [x]-module (see Section 7.4.1): for any polynomials a(x) and b(x) in 𝔽 [x] and polynomials Q(x, y) and P(x, y) in 𝔽L [x, y], (a(x)P(x, y) + b(x)Q(x, y)) ∈ 𝔽L [x, y] since the y-degree of the linear combination does not change. For a linear functional D, we will generically write KD as the kernel of D: KD = ker D = {Q(x, y) ∈ 𝔽 [x, y] ∶ DQ(x, y) = 0}. For a set of linear functionals D1 , D2 , . . . , DC defined on 𝔽L [x, y], let K1 , K2 , . . . , KC be their corresponding kernels, so that Ki = ker Di = {Q(x, y) ∈ 𝔽 [x, y] ∶ Di Q(x, y) = 0}. Then a solution to the problem Di Q(x, y) = 0 for all i = 1, 2, . . . , C

(7.93)

lies in the intersection of the kernels K = K1 ∩ K2 ∩ · · · ∩ KC . We see that the interpolation problem can be expressed as follows: Interpolation Problem 2: Determine the polynomial of minimal rank in K.

346

Alternate Decoding Algorithms for Reed–Solomon Codes To find the intersection constructively, we will employ cumulative kernels, defined as follows: K 0 = 𝔽L [x, y] and K i = K i−1 ∩ Ki = K1 ∩ · · · ∩ Ki . That is, K i is the space of solutions of the first i problems in (7.93). The solution of the interpolation is a polynomial of minimum (1, 𝑣)-degree in K C . We will partition the polynomials in 𝔽L [x, y] according to the exponent of y. Let Sj = {Q(x, y) ∈ 𝔽L [x, y] ∶ LM(Q) = xi yj for some i} be the set of polynomials whose leading monomial has y-degree j. Let gi,j be the minimal element of K i ∩ Sj , where here and throughout the development “minimal” or “min” means minimal rank, with respect to a given monomial order. Then {gC,j }Lj=0 is a set of polynomials that satisfy all of the constraints (7.93). The Kötter algorithm generates a sequence of sets of polynomials (G0 , G1 , . . . , GC ), where Gi = (gi,0 , gi,1 , . . . , gi,L ), and where gi,j is a minimal element of K i ∩ Sj . (That is, it satisfies the first i constraints and the y-degree is j.) Then the output of the algorithm is the element of GC of minimal order in the set GC which has polynomials satisfying all C constraints: Q(x, y) = min gC,j (x, y). 0≤j≤L

This satisfies all the constraints (since it is in K C ) and is of minimal order. We introduce a linear functional and some important properties associated with it. For a given linear functional D, define the mapping [⋅, ⋅]D ∶ 𝔽 [x, y] × 𝔽 [x, y] → 𝔽 [x, y] by [P(x, y), Q(x, y)]D = (DQ(x, y))P(x, y) − (DP(x, y))Q(x, y). Lemma 7.59 For all P(x, y), Q(x, y) ∈ 𝔽 [x, y], [P(x, y), Q(x, y)]D ∈ ker D. Furthermore, if P(x, y) > Q(x, y) (with respect to some fixed monomial order) and Q(x, y) ∉ KD , then rank [P(x, y), Q(x, y)]D = rank P(x, y). Proof We will prove the latter statement of the lemma (the first part is in Exercise 7.19). For notational convenience, let a = DQ(x, y) and b = DP(x, y). (Recall that a, b ∈ 𝔽 .) If a ≠ 0 and P(x, y) > Q(x, y), then [P(x, y), Q(x, y)]D = aP(x, y) − bQ(x, y), which does not change the leading monomial, so that ◽ LM[P(x, y), Q(x, y)]D = LMP(x, y) and furthermore, rank [P(x, y), Q(x, y)]D = rank P(x, y). The algorithm is initialized with G0 = (g0,0 , g0,1 , . . . , g0,L ) = (1, y, y2 , . . . , yL ). To form Gi+1 given the set Gi , we form the set Ji = {j ∶ Di+1 (gi,j ) ≠ 0} as the set of polynomials in Gi which do not satisfy the i + 1st constraint. If J is not empty, then an update is necessary (i.e., there is a discrepancy). In this case, let j∗ index the polynomial of minimal rank, j∗ = arg min gi,j j∈Ji

7.6 The Guruswami–Sudan Decoding Algorithm and Soft RS Decoding and let f denote the polynomial gi,j∗ :

f = min gi,j . j∈Ji

347

(7.94)

The update rule is as follows: ⎧gi,j ⎪ = ⎨[gi,j , f ]Di+1 ⎪[xf , f ] Di+1 ⎩

gi+1,j

if j ∉ Ji if j ∈ Ji but j ≠ j∗ if j = j∗ .

(7.95)

The key theorem governing this algorithm is the following. Theorem 7.60 For i = 0, . . . , C, gi,j = min{g ∶ g ∈ K i ∩ Sj } for j = 0, 1, . . . , L.

(7.96)

Proof The proof is by induction on i. The result is trivial when i = 0. It is to be shown that gi+1,j = min{g ∶ g ∈ K i+1 ∩ Sj } for j = 0, 1, . . . , L is true, given that (7.96) is true. We consider separately the three cases in (7.95). Case 1: j ∉ Ji , so that gi+1,j = gi,j . The constraint Di+1 gi,j = 0 is satisfied (since j ∉ Ji ), so gi+1,j = gi,j ∈ Ki+1 . By the inductive hypothesis, gi+1,j ∈ K i ∩ Sj . Combining these, we have gi+1,j ∈ K i+1 ∩ Sj . Since gi,j is minimal in K i ∩ Sj , it must also be minimal in the set K i+1 ∩ Sj , since the latter is contained in the former. Case 2: In this case, gi+1,j = [gi,j , f ]Di+1 = (Di+1 gi,j )f − (Di+1 f )gi,j , which is a linear combination of f and gi,j . Since both f and gi,j are in K i , gi+1,j is also in K i . By Lemma 7.59, gi+1,j ∈ Ki+1 . Combining these inclusions, gi+1,j ∈ K i ∩ Ki+1 = K i+1 . By (7.94), rank gi,j > rank f , so that by Lemma 7.59, rank gi+1,j = rank gi,j . Since gi,j ∈ Sj , it follows that gi+1,j ∈ Sj also. And, since gi+1,j has the same rank as gi,j , which is minimal in K i ∩ Sj , it must also be minimal in the smaller set K i+1 ∩ Sj . Case 3: In this case, the update is gi+1,j = [xf , f ]Di+1 = (Di+1 xf )f − (Di+1 f )xf , which is a linear combination of f and xf . But f ∈ K i by the induction hypothesis. We must show that xf ∈ K i . Let f̃ (x, y) = xf (x, y). In Exercise 7.21, it is shown that, in terms of the Hasse partial derivatives of f and f̃ , f̃r,s (x, y) = fr−1,s (x, y) + xfr,s (x, y).

(7.97)

If fr−1,s (xj , yj ) = 0 and fr,s (xj , yj ) = 0 for j = 0, 1, . . . , i, then f̃r,s (xj , yj ) = 0, so xf ∈ K i . This will hold provided that the sequence of linear functionals (D0 , D1 , . . . , DC ) are ordered such that Dr−1,s always precedes Dr,s .

348

Alternate Decoding Algorithms for Reed–Solomon Codes

Assuming this to be the case, we conclude that f ∈ K i and xf ∈ K i , so that gi+1,j ∈ K i . By Lemma 7.59, gi+1,j ∈ Ki+1 , so gi+1,j ∈ K i ∩ Ki+1 = K i+1 . Since f ∈ Sj (by the induction hypothesis), then xf ∈ Sj (since multiplication by x does not change the y-degree). By Lemma 7.59, rank gi+1,j = rank xf , which means that gi+1,j ∈ Sj also. Showing that gi+1,j is minimal is by contradiction. Suppose there exists a polynomial h ∈ K i+1 ∩ Sj such that h < gi+1,j . Since h ∈ K i ∩ Sj , we must have f ≤ h and rank gi+1,j = rank xf . There can be no polynomial f ′ ∈ Sj with rank f < rank f ′ < rank xf , so that it follows that LM(h) = LM(f ). By suitable normalization, the leading coefficients of h and f can be equated. Now let f̃ = h − f . By linearity, f̃ ∈ K i . By cancellation of the leading terms, f̃ < f . Now Di+1 h = 0, since h ∈ K i+1 , but Di+1 f ≠ 0, since j ∈ Ji in (7.94). Thus, we have a polynomial f̃ such that f̃ ∈ K i \K i+1 and f̃ < f . But f was supposed to be the minimal element of K i \K i+1 , by its selection in (7.94). This leads to a ◽ contradiction: gi+1,j must be minimal. Let us return to the ordering condition raised in Case 3. We must have an order in which (r − 1, s) precedes (r, s). This is accomplished when the (r, s) data are ordered according to (m − 1, 1) lex order: (0, 0), (0, 1), . . . , (0, m − 1), (1, 0), (1, 1), . . . , (1, m − 2), . . . , (m − 1, 0). At the end of the algorithm, we select the minimal element out of GC as Q0 (x, y). Kötter’s algorithm for polynomial interpolation is shown in Algorithm 7.6. This algorithm is slightly more general than just described: the point (xi , yi ) is interpolated up to order mi , where mi can vary with i, rather than having a fixed order m at each point. There is one more explanation necessary, regarding line 19. In line 19, the update is computed as gi,j = Δ xf − (Di xf )f = (Dr,s f (xi , yi ) − Dr,s xf (x, y)|x=xi ,y=yi )f (x, y) = (xDr,s f (xi , yi ) − (Dr−1,s f (xi , yi ) + xi Dr,s f (xi , yi )))f (x, y) (using (7.97)) = (xDr,s f (xi , yi ) − xi Dr,s f (xi , yi ))f (x, y) (since Dr−1,s f (xi , yi ) = 0) = Dr,s f (xi , yi )(x − xi )f (x, y).

Algorithm 7.6 Kötter’s Interpolation for Guruswami–Sudan Decoder 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Input: Points: (xi , yi ), i = 1, … , n; Interpolation order mi ; a (1, 𝑣) monomial order; L = Lm Returns: Q0 (x, y) satisfying the interpolationproblem. Initialize: gj = yj for j = 0 to L. for i = 1 to n (go from i − 1st stage to ith stage) C = (mi + 1)mi ∕2 (compute number of derivatives involved) for (r, s) = (0, 0) to (mi − 1, 0) by (mi − 1, 1) lex order (from 0 to C) for j = 0 to L Δj = Dr,s gj (xi , yi ) (compute‘‘discrepancy’’) end (for j) J = {j ∶ Δj ≠ 0} (set of nonzero discrepancies) if(J ≠ ∅) j∗ = arg min{gj ∶ j ∈ J} (polynomial of least weighted degree) f = gj∗ Δ = Δj∗ for(j ∈ J) if(j ≠ j∗ )

7.6 The Guruswami–Sudan Decoding Algorithm and Soft RS Decoding

17 18 19 20 21 22 23 24 25

349

gj = Δgj − Δj f (update without change in rank) else if(j = j∗ ) gj = (x − xi )f (update with change in rank) end (if) end (for j) end (if J) end (for (r, s)) end (for i) Q0 (x, y) = minj {gj (x, y)} (least weighted degree)

α α2 α3 α4 α5 α6 α7 α8 α9 +

Example 7.61 The points (1, 𝛼 3 ), (𝛼, 𝛼 4 ), (𝛼 2 , 𝛼 5 ), (𝛼 3 , 𝛼 7 ), and (𝛼 4 , 𝛼 8 ) are to be interpolated by a polynomial Q(x, y) using the Kötter algorithm, where operations are over GF(24 ) with 1 + 𝛼 + 𝛼 4 = 0. Use multiplicity m = 1 interpolation at each point and the (1, 2)-revlex order. At each point, there is one constraint, since m = 1. Initial: G0 : g0 (x, y) = 1, g1 (x, y) = y, g2 (x, y) = y2 , g3 (x, y) = y3 , g4 (x, y) = y4 . i = 0: (r, s) = (0, 0), (xi , yi ) = (1, 𝛼 3 ). Discrepancies: Δ0 = g0 (1, 𝛼 3 ) = 1, Δ1 = g1 (1, 𝛼 3 ) = 𝛼 3 , Δ2 = g2 (1, 𝛼 3 ) = 𝛼 6 , Δ3 = g3 (1, 𝛼 3 ) = 𝛼 9 , Δ4 = g4 (1, 𝛼 3 ) = 𝛼 12 . J = {0, 1, 2, 3, 4}, j∗ = 0, f = 1, Δ = 1. G1 : g0 = 1 + x, g1 = 𝛼 3 + y, g2 = 𝛼 6 + y2 , g3 = 𝛼 9 + y3 , g4 = 𝛼 12 + y4 . i = 1: (r, s) = (0, 0), (xi , yi ) = (𝛼, 𝛼 4 ). Discrepancies: Δ0 = 𝛼 4 , Δ1 = 𝛼 7 , Δ2 = 𝛼 14 , Δ3 = 𝛼 8 , Δ4 = 𝛼 13 . J = {0, 1, 2, 3, 4}, j∗ = 0, f = 1 + x, Δ = 𝛼4 . g0 = 𝛼 + 𝛼 4 x + x2 , g1 = 𝛼 7 x + 𝛼 4 y, g2 = (𝛼 11 + 𝛼 14 x) + 𝛼 4 y2 , g3 = (𝛼 3 + 𝛼 8 x) + 𝛼 4 y3 , g4 = (𝛼 12 + G2 : 13 4 4 𝛼 x) + 𝛼 y . i = 2: (r, s) = (0, 0), (xi , yi ) = (𝛼 2 , 𝛼 5 ). Discrepancies: Δ0 = 𝛼 13 , Δ1 = 0, Δ2 = 𝛼 8 , Δ3 = 𝛼 6 , Δ4 = 𝛼 2 . J = {0, 2, 3, 4}, j∗ = 0, f = 𝛼 + 𝛼 4 x + x2 , Δ = 𝛼 13 . G3 : g0 = 𝛼 3 + 𝛼 11 x + 𝛼 10 x2 + x3 , g1 = 𝛼 7 x + 𝛼 4 y, g2 = 𝛼 8 x2 + 𝛼 2 y2 , g3 = (𝛼 14 + 𝛼 7 x + 𝛼 6 x2 ) + 𝛼 2 y3 , g4 = (𝛼 12 + 𝛼x + 𝛼 2 x2 ) + 𝛼 2 y4 . i = 3: (r, s) = (0, 0), (xi , yi ) = (𝛼 3 , 𝛼 7 ). Discrepancies: Δ0 = 𝛼 14 , Δ1 = 𝛼 14 , Δ2 = 𝛼 7 , Δ3 = 𝛼 2 , Δ4 = 𝛼 3 . J = {0, 1, 2, 3, 4}, j∗ = 1, f = 𝛼 7 x + 𝛼 4 y, Δ = 𝛼 14 . Δ = 𝛼 14 . G4 : g0 = (𝛼 2 + 𝛼 7 x + 𝛼 9 x2 + 𝛼 14 x3 ) + 𝛼 3 y, g1 = (𝛼 10 x + 𝛼 7 x2 ) + (𝛼 7 + 𝛼 4 x)y, g2 = (𝛼 14 x + 𝛼 7 x2 ) + 𝛼 11 y + 𝛼y2 , g3 = (𝛼 13 + 𝛼 5 x + 𝛼 5 x2 ) + 𝛼 6 y + 𝛼y3 , g4 = (𝛼 11 + 𝛼 5 x + 𝛼x2 ) + 𝛼 7 y + 𝛼y4 .

01010

00000

α10 α11 α12 α13 α14 α15 α16

testGS1.cc kotter.cc

350

Alternate Decoding Algorithms for Reed–Solomon Codes i = 4: (r, s) = (0, 0), (xi , yi ) = (𝛼 4 , 𝛼 8 ). Discrepancies: Δ0 = 𝛼 11 , Δ1 = 𝛼 7 , Δ2 = 𝛼 11 , Δ3 = 𝛼 2 , Δ4 = 𝛼 10 . J = {0, 1, 2, 3, 4}, j∗ = 0, f = (𝛼 2 + 𝛼 7 x + 𝛼 9 x2 + 𝛼 14 x3 ) + (𝛼 3 )y, Δ = 𝛼 11 . G5 : g0 = (𝛼 6 + 𝛼 9 x + 𝛼 5 x2 + 𝛼x3 + 𝛼 14 x4 ) + (𝛼 7 + 𝛼 3 x)y, g1 = (𝛼 9 + 𝛼 8 x + 𝛼 9 x2 + 𝛼 6 x3 ) + (𝛼 12 + x)y, g2 = (𝛼 13 + 𝛼 12 x + 𝛼 11 x2 + 𝛼 10 x3 ) + 𝛼y + 𝛼 12 y2 , g3 = (𝛼 14 + 𝛼 3 x + 𝛼 6 x2 + 𝛼x3 ) + 𝛼y + 𝛼 12 y3 , g4 = (𝛼 2 + 𝛼 5 x + 𝛼 6 x2 + 𝛼 9 x3 ) + 𝛼 8 y + 𝛼 12 y4 .

α α2 α3 α4 α5 α6 α7 α8 α9

Final: Q0 (x, y) = g1 (x, y) = (𝛼 9 + 𝛼 8 x + 𝛼 9 x2 + 𝛼 6 x3 ) + (𝛼 12 + x)y.

+

01010

00000

α10 α11 α12 α13 α14 α15 α16



Example 7.62

testgs3.cc

Let  be a (15, 7) code over GF(24 ) and let m(x) = 𝛼 + 𝛼 2 x + 𝛼 3 x2 + 𝛼 4 x3 + 𝛼 5 x4 + 𝛼 6 x5 + 𝛼 7 x6 be encoded over the support set (1, 𝛼, . . . , 𝛼 14 ). That is, the codeword is (m(1), m(𝛼), m(𝛼 2 ), . . . , m(𝛼 14 )). The corresponding code polynomial is c(x) = 𝛼 6 + 𝛼 11 x + x2 + 𝛼 6 x3 + 𝛼x4 + 𝛼 14 x5 + 𝛼 8 x6 + 𝛼 11 x7 + 𝛼 8 x8 + 𝛼x9 + 𝛼 12 x10 + 𝛼 12 x11 + 𝛼 14 x12 + x13 + 𝛼x14 . Suppose the received polynomial is r(x) = 𝛼 6 + 𝛼 9 x + x2 + 𝛼 2 x3 + 𝛼x4 + 𝛼 9 x5 + 𝛼 8 x6 + 𝛼 3 x7 + 𝛼 8 x8 + 𝛼x9 + 𝛼 12 x10 + 𝛼 12 x11 + 𝛼 14 x12 + x13 + 𝛼x14 . That is, the error polynomial is e(x) = 𝛼 2 x + 𝛼 3 x3 + 𝛼 4 x5 + 𝛼 5 x7 . For this code and interpolation multiplicity, let tm = 4 Lm = 3. The first decoding step is to determine an interpolating polynomial for the points (1, 𝛼 6 ), (𝛼, 𝛼 9 ), (𝛼 2 , 1), (𝛼 3 , 𝛼 2 ), (𝛼 4 , 𝛼), (𝛼 5 , 𝛼 9 ), (𝛼 6 , 𝛼 8 ), (𝛼 7 , 𝛼 3 ), (𝛼 8 , 𝛼 8 ), (𝛼 9 , 𝛼), (𝛼 10 , 𝛼 12 ), (𝛼 11 , 𝛼 12 ), (𝛼 12 , 𝛼 14 ), (𝛼 13 , 1), (𝛼 14 , 𝛼) with interpolation multiplicity m = 2 at each point using the (1, 6)-revlex degree.

7.6 The Guruswami–Sudan Decoding Algorithm and Soft RS Decoding

351

Using testGS3, the final set of interpolating polynomials GC = {g0 , g1 , g2 , g3 } can be shown to be g0 (x, y) = (𝛼 13 + 𝛼x + 𝛼 4 x2 + 𝛼 6 x3 + 𝛼 13 x4 + 𝛼 11 x5 + 𝛼 10 x6 + 𝛼 8 x8 + 𝛼 8 x9 + 𝛼 5 x10 + 𝛼 9 x11 + 𝛼 3 x12 + 𝛼 6 x13 + 𝛼 3 x14 + 𝛼 2 x15 + 𝛼 10 x16 + 𝛼 7 x17 + 𝛼 11 x18 + 𝛼 8 x19 + 𝛼 3 x20 + 𝛼 14 x21 + 𝛼 14 x22 ) + (𝛼 6 + 𝛼 5 x + 𝛼 3 x2 + 𝛼 2 x3 + 𝛼 13 x4 + 𝛼 7 x5 + 𝛼 7 x6 + x7 + 𝛼 7 x8 + 𝛼 11 x9 + 𝛼 14 x10 + 𝛼 12 x11 + 𝛼 5 x12 + 𝛼 8 x13 + 𝛼 13 x14 + 𝛼 9 x15 )y + (𝛼 13 + 𝛼 9 x + 𝛼 11 x2 + 𝛼 7 x3 + x4 + 𝛼 10 x5 + 𝛼 11 x6 + 𝛼 5 x7 + 𝛼 5 x8 + 𝛼 7 x9 )y2 + (1 + 𝛼 2 x + 𝛼 2 x2 + 𝛼 4 x3 )y3 g1 (x, y) = (𝛼 13 + 𝛼 13 x + 𝛼 14 x2 + x3 + 𝛼 3 x4 + 𝛼 6 x5 + 𝛼 12 x6 + 𝛼 14 x7 + 𝛼 8 x8 + 𝛼 6 x9 + 𝛼 8 x10 + 𝛼 7 x11 + 𝛼 13 x12 + 𝛼x13 + 𝛼 11 x14 + 𝛼 10 x15 + x16 + 𝛼 9 x17 + 𝛼 14 x18 + 𝛼 3 x19 + 𝛼 3 x21 ) + (𝛼 2 + 𝛼 9 x + 𝛼 11 x2 + 𝛼x3 + 𝛼 11 x4 + 𝛼 10 x5 + 𝛼 5 x6 + 𝛼x7 + 𝛼 6 x8 + 𝛼 10 x10 + 𝛼 7 x11 + 𝛼 10 x12 + 𝛼 13 x13 + 𝛼 4 x14 )y + (𝛼 4 + 𝛼 5 x + 𝛼 12 x2 + 𝛼 12 x3 + 𝛼 11 x4 + 𝛼 5 x5 + 𝛼 7 x6 + 𝛼x7 )y2 + (𝛼 11 + 𝛼x + 𝛼 14 x2 )y3 g2 (x, y) = (𝛼 13 + 𝛼 11 x + 𝛼 5 x2 + 𝛼 13 x3 + 𝛼 7 x4 + 𝛼 4 x5 + 𝛼 11 x6 + 𝛼 12 x7 + 𝛼 9 x8 + 𝛼 4 x9 + 𝛼 10 x10 + 𝛼 5 x11 + 𝛼 2 x12 + 𝛼 14 x13 + 𝛼 6 x14 + 𝛼 8 x15 + 𝛼 13 x16 + 𝛼 7 x17 + 𝛼 4 x18 + 𝛼x19 + 𝛼 7 x21 ) + (1 + 𝛼 7 x + 𝛼 9 x2 + 𝛼 14 x3 + 𝛼 9 x4 + 𝛼 8 x5 + 𝛼 3 x6 + 𝛼 14 x7 + 𝛼 4 x8 + 𝛼 8 x10 + 𝛼 5 x11 + 𝛼 8 x12 + 𝛼 11 x13 + 𝛼 2 x14 )y + (𝛼 3 x + 𝛼 7 x2 + 𝛼 10 x3 + 𝛼 6 x4 + 𝛼 3 x5 + 𝛼 14 x7 + x8 )y2 + (𝛼 9 + 𝛼 14 x + 𝛼 12 x2 )y3 g3 (x, y) = (𝛼 5 + 𝛼 9 x + 𝛼 13 x2 + 𝛼 2 x3 + 𝛼x4 + 𝛼 14 x5 + 𝛼 2 x6 + 𝛼x7 + 𝛼 12 x8 + 𝛼x9 + 𝛼 14 x10 + 𝛼 7 x11 + 𝛼 9 x13 + 𝛼 5 x14 + 𝛼 5 x15 + 𝛼 9 x16 + 𝛼 11 x17 + 𝛼 3 x18 + 𝛼 13 x19 ) + (𝛼 7 + 𝛼 7 x + 𝛼 8 x2 + 𝛼 14 x3 + x4 + 𝛼 11 x5 + 𝛼 7 x6 + 𝛼 2 x7 + 𝛼 5 x8 + 𝛼 14 x9 + 𝛼 12 x10 + 𝛼 8 x11 + 𝛼 5 x12 + 𝛼 5 x13 )y + (𝛼 5 + 𝛼x + 𝛼 10 x2 + 𝛼 11 x3 + 𝛼 13 x4 + 𝛼 6 x5 + 𝛼 4 x6 + 𝛼 8 x7 )y2 + (1 + 𝛼 8 x)y3 of weighted degrees 22, 20, 20, and 19, respectively. The one of minimal order selected is Q(x, y) = g3 (x, y).



The computational complexity of this algorithm, like the Feng–Tzeng algorithm, goes as the cube of the size of the problem, O(m3 ), with a rather large multiplicative factor. The cubic complexity of these problems makes decoding impractical for large values of m. 7.6.6

A Special Case: m = 1 and L = 1

We will see below how to take the interpolating polynomial Q(x, y) and find its y-roots. This will handle the general case of arbitrary m. However, we treat here an important special case which occurs when m = 1 and L = 1. In this case, the y-degree of Q(x, y) is equal to 1 and it is not necessary to employ a sophisticated factorization algorithm. This special case allows for conventional decoding of Reed–Solomon codes without computation of syndromes and without the error evaluation step. When L = 1, let us write Q(x, y) = P1 (x)y − P0 (x). If it is the case that P1 (x)|P0 (x), let p(x) =

P0 (x) . P1 (x)

352

Alternate Decoding Algorithms for Reed–Solomon Codes Then it is clear that y − p(x)|Q(x, y), since P1 (x)y − P0 (x) = P1 (x)(y − p(x)). Furthermore, it can be shown that the p(x) returned will produce a codeword within a distance ≤ ⌊(n − k)∕2⌋ of the transmitted codeword. Hence, it decodes up to the design distance. In light of these observations, the following decoding algorithm is an alternative to the more conventional approaches (e.g.: find syndromes; find error locator; find roots; find error values; or: find remainder; find rational interpolator; find roots; find error values). Algorithm 7.7 Guruswami–Sudan Interpolation Decoder with m = 1 and L = 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

Input: Points: (xi , yi ), i = 1, … , n; a (1, k − 1) monomial order Returns: p(x, y) as a decoded codeword if it exists Initialize: g0 (x, y) = 1, g1 (x, y) = y for i = 1 to n Δ0 = g0 (xi , yi ) (compute discrepancies) Δ1 = g1 (xi , yi ) J = {j ∶ Δj ≠ 0} (set of nonzero discrepancies) if(J ≠ ∅) j∗ = arg min{gj ∶ j ∈ J} (polynomial of min. weighted degree) f = gj∗ Δ = Δj∗ for(j ∈ J) if(j ≠ j∗ ) gj = Δgj − Δj f (update without change in rank) else if(j = j∗ ) gj = (x − xi )f (update with change in rank) end(if) end(for j) end(if J) end(for i) Q(x, y) = minj {gj (x, y)} (least weighted degree) Write Q(x, y) = P1 (x)y − P0 (x) r(x) = P0 (x) mod P1 (x). if(r(x) = 0) p(x) = P0 (x)∕P1 (x). if(deg p(x) ≤ k − 1) then p(x) is decoded message end(if) else Uncorrectable error pattern end(if)

Example 7.63 the codeword

Consider a (5, 2) RS code over GF(5), using support set (0, 1, 2, 3, 4). Let m(x) = 1 + 4x. Then (m(0), m(1), m(2), m(3), m(4)) = (1, 0, 4, 3, 2) → c(x) = 1 + 4x2 + 3x3 + 2x4 .

7.6 The Guruswami–Sudan Decoding Algorithm and Soft RS Decoding

353

Let the received polynomial be r(x) = 1 + 2x + 4x2 + 3x3 + 2x4 . The following table indicates the steps of the algorithm. i (xi , yi ) g0 (x, y) – – 1

g1 (x, y) y

0 1 2 3 4

4+y (4 + 4x) + y (2 + 4 + 4x2 ) + 3y (3 + 4x + 3x2 ) + (2 + 3x)y (3 + 4x + 3x2 ) + (2 + 3x)y

(0,1) (1,0) (2,4) (3,3) (4,2)

x 4x + x2 (2 + x + x2 ) + 3y (4 + 4x + 3x2 + x3 ) + (1 + 3x)y (4 + 3x + 2x2 + 4x3 + x4 ) + (1 + 4x + 3x2 )y

(An interesting thing happens at i = 2: It appears initially that g0 (x, y) = (2 + x + x2 ) + 3y is no longer in the set S0 , the set of polynomials whose leading monomial has y-degree 0, because a term with y appears in g0 (x, y). However, the leading term is actually x2 . Similar behavior is seen at other steps.) At the end of the algorithm, we take Q(x, y) = (2 + 3x)y + (3 + 4x + 3x2 ) = (2 + 3x)y − (2 + x + 2x2 ), α α2 α3 α4 α5 α6 α7 α8 α9

so that

+

2

p(x) =

2 + x + 2x = 1 + 4x, 2 + 3x ◽

which was the original message.

7.6.7

01010

An Algorithm for the Factorization Step: The Roth–Ruckenstein Algorithm

We now consider the problem of factorization when the y-degree of Q(x, y) ≥ 1. Having obtained the polynomial Q(x, y) interpolating a set of data points (xi , yi ), i = 1, 2, . . . , n, the next step in the GS decoding algorithm is to determine all factors of the form y − p(x), where p(x) is a polynomial of degree ≤ 𝑣, such that (y − p(x))|Q(x, y). We have the following observation: Lemma 7.64 (y − p(x))|Q(x) if and only if Q(x, p(x)) = 0. (This is analogous to the result in univariate polynomials that (x − a)|g(x) if and only if g(a) = 0.) Proof Think of Q(x, y) as a polynomial in the variable y with coefficients over 𝔽 (x). The division algorithm applies, so that upon division by y − p(x), we can write Q(x, y) = q(x, y)(y − p(x)) + r(x), where the y-degree of r(x) must be 0, since the divisor y − p(x) has y-degree 1. Evaluating at y = p(x), we see that Q(x, p(x)) = r(x). Then Q(x, p(x)) = 0 if and only if r(x) = 0 (identically). ◽ Definition 7.65 A function p(x) such that Q(x, p(x)) = 0 is called a y-root of Q(x, y).



The algorithm described in this section, the Roth–Ruckenstein algorithm [382], finds y-roots. (Another algorithm due to Gao and Shokrollahi [151] is also known to be effective for this factorization.) The notation ⟨⟨Q(x, y)⟩⟩ denotes the coefficient (polynomial) of the highest power of x that divides Q(x, y). That is, if xm |Q(x, y) but xm+1∕| Q(x, y), then ⟨⟨Q(x, y)⟩⟩ =

Q(x, y) . xm

Then Q(x, y) = ⟨⟨Q(x, y)⟩⟩xm

00000

α10 α11 α12 α13 α14 α15 α16

testGS5.cc kotter1.cc

354

Alternate Decoding Algorithms for Reed–Solomon Codes for some m ≥ 0. Example 7.66 If Q(x, y) = xy, then ⟨⟨Q(x, y)⟩⟩ = y. If Q(x, y) = x2 y4 + x3 y5 , then ⟨⟨Q(x, y)⟩⟩ = y4 + xy5 . If ◽ Q(x, y) = 1 + x2 y4 + x3 y5 , then ⟨⟨Q(x, y)⟩⟩ = Q(x, y).

Let p(x) = a0 + a1 x + a2 x2 + · · · + a𝑣 x𝑣 be a y-root of Q(x, y). The Roth–Ruckenstein algorithm will determine the coefficients of p(x) one at a time. The coefficient a0 is found using the following lemma. Lemma 7.67 Let Q0 (x, y) = ⟨⟨Q(x, y)⟩⟩. If (y − p(x))|Q(x, y), then y = p(0) = a0 is a root of the equation Q0 (0, y) = 0. Proof If (y − p(x))|Q(x, y), then (y − p(x))|xm Q0 (x, y) for some m ≥ 0. But since y − p(x) and xm must be relatively prime, it must be the case that (y − p(x))|Q0 (x, y), so that Q0 (x, y) = T0 (x, y)(y − p(x)) for some quotient polynomial T0 (x, y). Setting y = p(0), we have Q0 (0, y) = Q0 (0, p(0)) = T0 (0, p(0))(p(0) − p(0)) = T0 (0, p(0))0 = 0.



From this lemma, the set of possible values of the coefficient a0 of p(x) are the roots of the polynomial Q0 (0, y). The algorithm now works by inductively “peeling off” layers, leaving a structure by which a1 can similarly be found, then a2 , and so forth. It is based on the following theorem, which defines the peeling off process and extends Lemma 7.67. Theorem 7.68 Let Q0 (x, y) = ⟨⟨Q(x, y)⟩⟩. Let p0 (x) = p(x) = a0 + a1 x + · · · + a𝑣 x𝑣 ∈ 𝔽𝑣 [x]. For j ≥ 1 define (pj−1 (x) − pj−1 (0)) pj (x) = (7.98) = aj + · · · + a𝑣 x𝑣−j x Tj (x, y) = Qj−1 (x, xy + aj−1 )

(7.99)

Qj (x, y) = ⟨⟨Tj (x, y)⟩⟩.

(7.100)

Then for any j ≥ 1, (y − p(x))|Q(x, y) if and only if (y − pj (x))|Qj (x, y). Proof The proof is by induction: We will show that (y − pj−1 (x))|Qj−1 (x, y) if and only if (y − pj (x))|Qj (x, y). ⇒: Assuming (y − pj−1 (x))|Qj−1 (x, y), we write Qj−1 (x, y) = (y − pj−1 (x))u(x, y) for some quotient polynomial u(x, y). Then from (7.99), Tj (x, y) = (xy + aj−1 − pj−1 (x))u(x, xy + aj−1 ). Since aj−1 − pj−1 (x) = −xpj (x), Tj (x, y) = x(y − pj (x))u(x, xy + aj−1 ) so that (y − pj (x))|Tj (x, y). From (7.100), Tj (x, y) = xm Qj (x, y) for some m ≥ 0, so (y − pj (x))|xm Qj (x, y). Since xm and y − p(x) are relatively prime, (y − pj (x))|Qj (x, y).

7.6 The Guruswami–Sudan Decoding Algorithm and Soft RS Decoding

355

⇐: Assuming (y − pj (x))|Qj (x, y) and using (7.100), (y − pj (x))|Tj (x, y). From (7.99), (y − pj (x))|Qj−1 (x, xy + aj−1 ) so that Qj−1 (x, xy + aj−1 ) = (y − pj (x))u(x, y)

(7.101)

for some quotient polynomial u(x, y). Replace y by (y − aj−1 )∕x in (7.101) to obtain Qj−1 (x, y) = ((y − aj−1 )∕x − pj (x))u(x, (y − aj−1 )∕x). The fractions can be cleared by multiplying both sides by some sufficiently large power L of x. Then using the fact that pj−1 (x) = aj−1 + xpj (x), we obtain xL Qj−1 (x, y) = (y − pj−1 (x))𝑣(x, y) for some polynomial 𝑣(x, y). Thus, (y − pj−1 (x))|xL Qj−1 (x, y) and so (y − pj−1 (x))|Qj−1 (x, y). ◽ The following lemma is a repeat of Lemma 7.67 and is proved in a similar manner. Lemma 7.69 If (y − p(x))|Q(x, y), then y = pj (0) is a root of the equation Qj (0, y) = 0 for j = 0, 1, . . . , 𝑣. Since pj (0) = aj , this lemma indicates that the coefficient aj can be found by finding the roots of the equation Qj (0, y) = 0. Finally, we need a termination criterion, provided by the following lemma. Lemma 7.70 If y|Q𝑣+1 (x, y), then p(x) = a0 + a1 x + · · · + a𝑣 x𝑣 is a y-root of Q(x, y). Proof Note that if y|Q𝑣+1 (x, y), then Q𝑣+1 (x, 0) = 0. By the construction (7.98), pj (x) = 0 for j ≥ 𝑣 + 1. The condition y|Q𝑣+1 (x, y) is equivalent to (y − p𝑣+1 (x))|Q𝑣+1 (x, y). Thus, by Theorem 7.68, it must be the case that (y − p(x))|Q(x). ◽ The overall operation of the algorithm is outlined as follows. for each a0 in the set of roots of Q0 (0, y) for each a1 in the set of roots of Q1 (0, y) for each a2 in the set of roots of Q2 (0, y) ⋮ for each au in the set of roots of Qu (0, y) if Qu (x, 0) = 0, then (by Lemma 7.70), p(x) = a0 + a1 x + · · · + au xu is a y-root end for ⋮ end for end for end for The iterated “for” loops (up to depth u) can be implemented using a recursive programming structure. The following algorithm uses a depth-first tree structure.

356

Alternate Decoding Algorithms for Reed–Solomon Codes

Algorithm 7.8 Roth–Ruckenstein Algorithm for Finding y-roots of Q(x, y) 1 2 3 4 5 6

7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

Input: Q(x, y), D (maximum degree of p(x)) Output: List of polynomials p(x) of degree ≤ D such that (y − p(x))|Q(x, y) Initialization: Set p(x) = 0, u = deg(p) = −1, D = maximum degree (set as internal global) Set up linked list where polynomials are saved. Set 𝜈 = 0 (the number of the node; global variable) Call rothrucktree (Q(x, y), u, p) Function rothrucktree (Q, u, p): Input: Q(x, y), p(x) and u (degree of p) Output: List of polynomials 𝜈 = 𝜈 + 1 (increment node number) if(Q(x, 0) = 0) add p(x) to the output list end(if) else if(u < D) (try another branch of the tree) R = list of roots of Q(0, y) for each 𝛼 ∈ R Qnew (x, y) = Q(x, xy + 𝛼) (shift the polynomial) pu+1 = 𝛼 (new coefficient of p(x)) Call rothrucktree (⟨⟨Qnew (x, y)⟩⟩, u + 1, p)(recursive call) end (for) else (leaf of tree reached with nonzero polynomial) (no output) end (if) end

Example 7.71 α α2 α3 α4 α5 α6 α7 α8 α9

01010

Let Q(x, y) = (4 + 4x2 + 2x3 + 3x4 + x5 + 3x6 + 4x7 ) + (1 + 2x + 2x2 + 3x4 + 3x6 )y

+

00000

+ (1 + x + 2x2 + x3 + x4 )y2 + (4 + 2x)y3 α10 α11 α12 α13 α14 α15 α16

testGS2.cc rothruck.cc rothruck.h

be a polynomial in GF(5)[x, y]. Figures 7.5 and 7.6 illustrate the flow of the algorithm with D = 2.



At Node 1, the polynomial Q(0, y) = 4 + y + y2 + 4y3 is formed (see Figure 7.5) and its roots are computed as {1, 4} (1 is actually a repeated root).

• • • •

At Node 1, the root 1 is selected. Node 2 is called with ⟨⟨Q(x, xy + 1)⟩⟩.

• • • •

The recursion returns to Node 2, where the root 3 is selected, and Node 4 is called with ⟨⟨Q(x, xy + 3)⟩⟩.

At Node 2, the polynomial Q(0, y) = 3 + 3y2 is formed, with roots {2, 3}. At Node 2, the root 2 is selected. Node 3 is called with ⟨⟨Q(x, xy + 2)⟩⟩. At Node 3, the polynomial Q(x, 0) = 0, so the list of roots selected to this node {1, 2} forms an output polynomial, p(x) = 1 + 2x. At Node 4, the polynomial Q(0, y) = 2 + 3y is formed, with roots {1}. At Node 4, the root 1 is selected. Node 5 is called with ⟨⟨Q(x, xy + 1)⟩⟩. At Node 5, it is not the case that Q(x, 0) = 0. Since the level of the tree (3) is greater than D (2), no further searching is performed along this branch, so no output occurs. (However, if D had been equal to 3, the next branch would have been called with a root of 2 and a polynomial would have been found; the polynomial p(x) = 1 + 3x + x2 + 2x3 would have been added to the list.)

7.6 The Guruswami–Sudan Decoding Algorithm and Soft RS Decoding

357

• •

The recursion returns to Node 1, where the root 4 is selected and Node 6 is called with ⟨⟨Q(x, xy + 4)⟩⟩.

• •

At Node 7, the polynomial Q(0, y) = 3 + y is formed, with root {2}. Node 8 is called with ⟨⟨Q(x, xy + 2)⟩⟩.

At Node 6, the polynomial Q(0, y) = 2 + y is formed, with root {3}. (See Figure 7.6.) Node 7 is called with ⟨⟨Q(x, xy + 3)⟩⟩. At Node 8, Q(x, 0) = 0, so the list of roots selected to this node {4, 3, 2} forms an output polynomial p(x) = 4 + 3x + 2x2 .

The set of polynomials produced is {1 + 2x, 4 + 3x + 2x2 }.



Example 7.72 For the interpolating polynomial Q(x, y) of Example 7.62, the Roth–Ruckenstein algorithm determines the following y-roots (see testGS3.cc): p(x) ∈ {𝛼 + 𝛼 2 x + 𝛼 3 x2 + 𝛼 4 x3 + 𝛼 5 x4 + 𝛼 6 x5 + 𝛼 7 x6 , 𝛼 7 + 𝛼 6 x + 𝛼 14 x2 + 𝛼 5 x3 + 𝛼 10 x4 + 𝛼 10 x5 + 𝛼 2 x6 , 𝛼 12 + 𝛼 10 x + 𝛼 11 x3 + 𝛼 13 x4 + 𝛼 11 x5 + 𝛼 11 x6 } = . Note that the original message m(x) is among this list, resulting in a codeword a Hamming distance 4 away from the received r(x). There are also two others (confer the fact that Lm = 3 was computed in Example 7.62). The other polynomials result in distances 5 and 6, respectively, away from r(x). Being further than tm away from r(x), these can be eliminated from further consideration. ◽

7.6.7.1

What to Do with Lists of Factors?

The GS algorithm returns the message m(x) on its list of polynomials , provided that the codeword for m(x) is within a distance tm of the received vector. This is called the causal codeword. It may also return other codewords at a Hamming distance ≤ tm , which are called plausible codewords. Other codewords, at a distance > tm from r(x) may also be on the list; these are referred to as noncausal codewords (i.e., codewords not caused by the original, true, codeword). We have essentially showed, by the results above, that all plausible codewords are in  and that the maximum number of codewords is ≤ Lm . In Example 7.72, it was possible to winnow the list down to a single plausible codeword. But in the general case, what should be done when there are multiple plausible codewords in ? If the algorithm is used in conjunction with another coding scheme (in a concatenated scheme), it may be possible to eliminate one or more of the plausible codewords. Or there may be external ways of selecting a correct codeword from a short list. Another approach is to exploit soft decoding information, to determine using a more refined measure which codeword is closest to r(x). But there is, in general, no universally satisfactory solution to this problem. However, it turns out that for many codes, the number of elements in  is equal to 1: the list decoder may not actually return a list with more than one element in it. This concept has been explored in [302]. We summarize some key conclusions. When a list decoder returns more than one plausible codeword, there is the possibility of a decoding failure. Let L denote the number of codewords on the list and let list L′ denote the number of noncausal codewords. There may be a decoding error if L′ > 1. Let PE denote the probability of a decoding error. We can write PE < E[L′ ], the average number of noncausal codewords on the list. Now consider selecting a point at random in the space GF(q)n and placing a Hamming sphere of radius tm around it. How many codewords of an (n, k) code, on average, are in this Hamming sphere?

358

Alternate Decoding Algorithms for Reed–Solomon Codes

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

Lbarex.m computetm.m computeLbar.m computeLm.m

The “density” of codewords in the space is qk ∕qn . As we have seen (Section 3.3.1), the number of points in the Hamming sphere is tm ( ) ∑ n (q − 1)2 . s s=0

Therefore, the average number of codewords in a sphere of radius tm around a random point is L(tm ) =

tm ( ) qk ∑ n (q − 1)2 . qn s=0 s

It can be shown [302] that L(tm ) is slightly less than a rigorous bound on the average number of noncausal codewords on the list. Example 7.73 [302] For a (32, 8) RS code over GF(32), we have the following results (only values of m are shown that lead to distinct values of tm ): m

tm

Lm

L(tm )

0 1 2 4 120

12 14 15 16 17

1 2 4 8 256

1.36305e − 10 2.74982e − 07 1.02619e − 05 0.000339205 0.00993659

Thus, while the list for m = q may have as many as L4 = 8 polynomials, the probability L(t4 ) is very small — the list is very likely to contain only one codeword, which will be the causal codeword. It is highly likely that this code is capable of correcting up to 16 errors. (Actually, even for m = 120, the probability of more than codeword is still quite small; however, the computational complexity for m = 120 precludes its use as a practical decoding option.) ◽ Example 7.74

[302] For a (31,15) code over GF(32), we have m

tm

Lm L(tm )

0 8 1 5.62584e-06 3 9 4 0.000446534 21 10 31 0.0305164 It is reasonable to argue that the code can correct up to nine errors with very high probability.

7.6.8



Soft-Decision Decoding of Reed–Solomon Codes

The Reed–Solomon decoding algorithms described up to this point in the book have been hard decision decoding algorithms, making explicit use of the algebraic structure of the code and employing symbols which can be interpreted as elements in a Galois field. A long outstanding problem in coding theory has been to develop a soft-decision decoding algorithm for Reed–Solomon codes, which is able to exploit soft channel outputs without mapping it to hard values, while still retaining the ability to exploit the algebraic structure of the code. This problem was solved [251] by an extension of the GS algorithm which we call the Koetter–Vardy (KV) algorithm.3 3

The name Koetter is simply a transliteration of the name Kötter.

7.6 The Guruswami–Sudan Decoding Algorithm and Soft RS Decoding Recall that the GS algorithm has a parameter m representing the interpolation multiplicity, m. For most of this chapter, a fixed multiplicity has been employed at each point. It is possible, however, to employ a different multiplicity at each point. (In fact, Algorithm 7.6 already handles multiple multiplicities.) In the KV algorithm, a mapping is found from posterior probabilities (soft information) to the multiplicities, after which the conventional interpolation and factorization steps of the GS are used to decode. (The mapping still results in some degree of “hardness,” since probabilities exist on a continuum, while the multiplicities must be integers.) We present here their algorithm for memoryless channels; for further results on channels with memory and concatenated channels, the reader is referred to [251]. 7.6.8.1

Notation

Recall (see Section 1.6) that a memoryless channel can be modeled as an input alphabet 𝔛, an output alphabet 𝔜, and a set of |𝔛| functions f (⋅|x) ∶ 𝔜 → ℝ. The channel input and output are conventionally viewed as a random variables X and Y. If Y is continuous, then f (⋅|x) is a probability density function (for example, for a Gaussian channel, f (y|x) would be a Gaussian likelihood function). If Y is discrete, then f (⋅|x) is a probability mass function. The memoryless nature of the channel is reflected by the assumption that the joint likelihood factors, f (y1 , y2 , . . . , yn |x1 , x2 , . . . , xn ) =

n ∏

f (yi |xi ).

i=1

It is assumed that the X random variable is uniformly distributed (i.e., that each codeword is selected with equal probability). Given an observation y ∈ 𝔜, the probability that some 𝛼 ∈ 𝔛 was transmitted is found using Bayes’ theorem, P(X = 𝛼|Y = y) = ∑

f (y|𝛼) , x∈𝔛 f (y|x)

where the assumption of uniformity of X is used. For Reed–Solomon codes, the input alphabet is the field over which the symbols occur, 𝔛 = GF(q). We have therefore 𝔛 = {𝛼1 , 𝛼2 , . . . , 𝛼q }, for some arbitrary ordering of the elements of GF(q). Let 𝐲 = (y1 , y2 , . . . , yn ) ∈ 𝔜n be a vector of observations. We define the posterior probabilities 𝜋i,j = P(X = 𝛼i |Y = yj ),

i = 1, 2, . . . , q,

j = 1, 2, . . . , n

and form the q × n matrix Π with elements 𝜋i,j . The matrix Π is called the reliability matrix. It is convenient below to use the notation Π(𝛼, j) to refer the element in the row indexed by 𝛼 and the column indexed by j. It is assumed that the reliability matrix is provided (somehow) as the input to the decoding algorithm. A second matrix is also employed. Let M be a q × n multiplicity matrix with non-negative elements mi,j , where mi,j is the interpolation multiplicity associated with the point (𝛼i , yj ). The key step of the algorithm to be described below is to provide a mapping from the reliability matrix Π to the multiplicity matrix M. Definition 7.75 Let M be a multiplicity matrix with elements mi,j . We will denote by QM (x, y) the polynomial of minimal (1, k − 1)-weighted degree that has a zero of multiplicity at least mi,j at the ◽ point (𝛼i , yj ) for every i, j such that mi,j ≠ 0. Recall that the main point of the interpolation theorem is that there must be more degrees of freedom (variables) than there are constraints. The number of constraints introduced by a multiplicity

359

360

Alternate Decoding Algorithms for Reed–Solomon Codes ( ) mi,j is equal to mi,j2+1 . The total number of constraints associated with a multiplicity matrix M is called the cost of M, denoted C(M), where 1 ∑∑ m (m + 1). 2 i=1 j=1 i,j i,j q

C(M) =

n

As before, let C(𝑣, l) be the number of monomials of weighted (1, 𝑣)-degree less than or equal to l. Then by the interpolation theorem an interpolating solution exists if C(𝑣, l) > C(M). Let 𝑣 (x) = min{l ∈ ℤ ∶ C(𝑣, l) > x} be the smallest (1, 𝑣)-weighted degree which has the number of monomials of lesser (or equal) degree exceeding x. Then k−1 (C(M)) is the smallest (1, k − 1)-weighted degree which has a sufficiently large number of monomials to exceed the cost C(M). (Confer with Km defined in (7.76).) By (7.64), for a given cost C(M), we must have √ 𝑣 (C(M)) < 2𝑣C(M). (7.102) It will be convenient to represent vectors in GF(q)n as indicator matrices over the reals, as follows. Let 𝐯 = (𝑣1 , 𝑣2 , . . . , 𝑣n ) ∈ GF(q)n and let [𝐯] denote the q × n matrix which has [𝐯]i,j = 1 if 𝑣j = 𝛼i , and [𝐯]i,j = 0 otherwise. That is, [𝐯]i,j = I𝛼i (𝑣j ), { 1 𝑣j = 𝛼i where I𝛼i (𝑣j ) is the indicator function, I𝛼i (𝑣j ) = 0 otherwise. Example 7.76

In the field GF(3), let 𝐯 = (1, 2, 0, 1). The matrix representation is



For two q × n matrices A and B, define the inner product ⟨A, B⟩ = tr(AB ) = T

q n ∑ ∑

ai,j bi,j .

i=1 j=1

Now using this, we define the score of a vector 𝐯 = (𝑣1 , 𝑣2 , . . . , 𝑣n ) with respect to a given multiplicity matrix M as SM (𝐯) = ⟨M, [𝐯]⟩. The score thus represents the total multiplicities of all the points associated with the vector 𝐯.

Example 7.77

⎡m1,1 m1,2 m1,3 m1,4 ⎤ ⎢ ⎥ If M = ⎢m2,1 m2,2 m2,3 m2,4 ⎥, then the score of the vector 𝐯 from the last example is ⎢m m m m ⎥ ⎣ 3,1 3,2 3,3 3,4 ⎦ SM (𝐯) = m2,1 + m3,2 + m1,3 + m2,4 .



7.6 The Guruswami–Sudan Decoding Algorithm and Soft RS Decoding 7.6.8.2

361

A Factorization Theorem

Our key result is an extension of the factorization theorem for different multiplicities. Theorem 7.78 Let C(M) be the cost of the matrix M (the number of constraints to satisfy). Let 𝐜 be a codeword in a Reed–Solomon (n, k) code over GF(q) and let p(x) be a polynomial that evaluates to 𝐜 (i.e., 𝐜 = (p(x1 ), p(x2 ), . . . , p(xn )) for code support set {x1 , x2 , . . . , xn }). If SM (c) > k−1 (C(M)), then (y − p(x))|QM (x, y). Proof Let 𝐜 = (c1 , . . . , cn ) be a codeword and let p(x) be a polynomial of degree < k that maps to the codeword: p(xj ) = cj for all xj in the support set of the code. Let g(x) = QM (x, p(x)). We will show that g(x) is identically 0, which will imply that (y − p(x))|QM (x, y). Write SM (𝐜) = m1 + m2 + · · · + mn . The polynomial QM (x, y) has a zero of order mj at (xj , cj ) (by the definition of QM ). By Lemma 7.47, (x − xj )mj |QM (x, p(x)), for j = 1, 2, . . . , n. Thus, g(x) = QM (x, p(x)) is divisible by the product (x − x1 )m1 (x − x2 )m2 · · · (x − xn )mn having degree m1 + m2 + · · · + mn = SM (𝐜), so that either deg(g(x)) ≥ SM (𝐜), or g(x) must be zero. Since deg p(x) ≤ k − 1, the degree of QM (x, p(x)) is less than or equal to the (1, k − 1)-weighted degree of QM (x, y). Furthermore, since QM (x, y) is of minimal (1, k − 1)-degree, deg1,k−1 QM (x, y) ≤ k−1 (C(M)): deg g(x) ≤ deg1,k−1 QM (x, y) ≤ k−1 (C(M)). By hypothesis, we have SM (𝐜) > k−1 (C(M)). Since we cannot have deg g(x) ≥ SM (𝐜), we must therefore have g(x) is zero (identically). As before, we write using the factorization theorem Q(x, y) = (y − p(x))q(x, y) + r(x) so that at y = p(x), we have Q(x, p(x)) = g(x) = 0 = 0q(x, y) + r(x) so r(x) = 0, giving the stated divisibility.



In light of (7.102), this theorem indicates that QM (x, y) has a factor y − p(x), where p(x) evaluates to a codeword 𝐜, if √ (7.103) SM (𝐜) ≥ 2(k − 1)C(M). 7.6.8.3

Mapping from Reliability to Multiplicity

Given a cost C, we define the set 𝔐(C) as the set of all matrices with non-negative elements whose cost is equal to C: { } q n 1 ∑∑ q×n 𝔐(C) = M ∈ ℤ ∶ mi,j ≥ 0 and m (m + 1) = C . 2 i=1 j=1 i,j i,j The problem we would address now is to select a multiplicity matrix M which maximizes the score SM (𝐜) for a transmitted codeword 𝐜 (so that the condition SM (𝐜) > k−1 (C(M)) required by Theorem 7.78 is satisfied). Not knowing which codeword was transmitted, however, the best that can be hoped for is to maximize some function of a codeword chosen at random. Let X = (X1 , X2 , . . . , Xn ) be a

362

Alternate Decoding Algorithms for Reed–Solomon Codes random transmitted vector. The score for this vector is SM (X) = ⟨M, [X]⟩. We choose to maximize the expected value of this score,4 ∑ E[SM (X)] = SM (𝐱)P(𝐱) 𝐱∈𝔛n

with respect to a probability distribution P(𝐱). We adopt the distribution determined by the channel output (using the memoryless channel model), P(𝐱) = P(x1 , x2 , . . . , xn ) =

n ∏

P(Xj = xj |Yj = yj ) =

j=1

n ∏

Π(xj , j),

(7.104)

j=1

where Π is the reliability matrix. This would be the posterior distribution of X given the observations if the prior distribution of X were uniform over GF(q)n . However, the X are, in fact, drawn from the set of codewords, so this distribution model is not accurate. But computing the optimal distribution function, which takes into account all that the receiver knows about the code locations, can be shown to be NP complete [251]. We therefore adopt the suboptimal, but tractable, stance. The problem can thus be stated as: Select the matrix M ∈ 𝔐 maximizing E[SM (X)], where the expectation is with respect to the distribution P in (7.104). We will denote the solution as M(Π, C), where M(Π, C) = arg max E[SM (X)]. M∈𝔐(C)

We have the following useful lemma. Lemma 7.79 E[SM (X)] = ⟨M, Π⟩. Proof E[SM (X)] = E⟨M, [X]⟩ = ⟨M, E[X]⟩ by linearity. Now consider the (i, j)th element: E[X]i,j = E[I𝛼i (Xj )] = (1)P(Xj = 𝛼i ) + (0)P(Xj ≠ 𝛼i ) = P(Xj = 𝛼i ) = Π(𝛼i , j), where the last equality follows from the assumed probability model in (7.104).



The following algorithm provides a mapping from a reliability matrix Π to a multiplicity matrix M. Algorithm 7.9 Koetter–Vardy Algorithm for Mapping from Π to M 1 2 3 4 5 6 7 8 9

Input: A reliability matrix Π; a positive integer S indicating the number of interpolation points Output: A multiplicity matrix M Initialization: Set Π∗ = Π and M = 0. Do: Find the position (i, j) of the largest element 𝜋i,j∗ of Π∗ . Set 𝜋i,j∗ =

∗ 𝜋i,j

mi,j +2

Set mi,j = mi,j + 1 Set S = S − 1 While S > 0

4 More correctly, one might want to maximize P(S (X) >  M k−1 (C(M))), but it can be shown that this computation is, in general, very complicated.

7.6 The Guruswami–Sudan Decoding Algorithm and Soft RS Decoding

363

Let M be formed by normalizing the columns of M produced by this algorithm to sum to 1. It can be shown that M → Π as S → ∞. That is, the algorithm produces an integer matrix which, when normalized to look like a probability matrix, asymptotically approaches Π. Since one more multiplicity is introduced for every iteration, it is clear that the score increases essentially linearly with S and that the cost increases essentially quadratically with S. Example 7.80

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

pi2ml.m

Suppose

⎡0.1349 ⎢ ⎢0.2444 Π = ⎢0.2232 ⎢ ⎢0.1752 ⎢ ⎣0.2222

0.3046 0.2584 0.2335⎤ ⎥ 0.1578 0.1099 0.1816⎥ 0.1337 0.2980 0.1478⎥ . ⎥ 0.1574 0.2019 0.2307⎥ ⎥ 0.2464 0.1317 0.2064⎦

Figure 7.7 shows a plot of ∥ M − Π ∥ after S iterations as S varies up to 400 iterations. It also shows the score and the cost 1n ⟨M, Π⟩. The M matrix produced after 400 iterations, and its normalized equivalent, are ⎡13 ⎢ ⎢25 M = ⎢22 ⎢ ⎢17 ⎢ ⎣22

31 26 23⎤ ⎥ 16 11 18⎥ 13 30 15⎥ ⎥ 16 20 23⎥ ⎥ 25 13 21⎦

⎡0.1313 ⎢ ⎢0.2525 M = ⎢0.2222 ⎢ ⎢0.1717 ⎢ ⎣0.2222

0.3069 0.2600 0.2300⎤ ⎥ 0.1584 0.1100 0.1800⎥ 0.1287 0.3000 0.1500⎥ . ⎥ 0.1584 0.2000 0.2300⎥ ⎥ 0.2475 0.1300 0.2100⎦



Some understanding of this convergence result — why M proportional to Π is appropriate — can be obtained as follows. We note that the cost C(M) can be written as C(M) =

1 (⟨M, M⟩ + ⟨M, 1⟩), 2

where 𝟏 is the matrix of all ones. For sufficiently large M, C(M) is close to 12 ⟨M, M⟩, which is (1/2) the Frobenius norm of M. For fixed norm ⟨M, M⟩, maximizing the expected score ⟨M, Π⟩ is accomplished (as indicated by the Cauchy–Schwartz inequality) by setting M to be proportional to Π. 7.6.8.4

The Geometry of the Decoding Regions

In the bound (7.103), write SM (𝐜) = ⟨M, [𝐜]⟩ and C(M) = ⟨M, Π⟩ + ⟨M, 𝟏⟩; we thus see from Theorem 7.78 that the decoding algorithm outputs a codeword 𝐜 if √ ⟨M, [𝐜]⟩ ≥ k − 1. √ ⟨M, M⟩ + ⟨M, 1⟩ Asymptotically (as S → ∞ and the normalized M → Π), we have √ ⟨Π, [𝐜]⟩ ≥ k − 1 + o(1) √ ⟨Π, Π⟩

(7.105)

(where the o(1) term accounts for the neglected ⟨M, 1⟩ term). Observe that for any codeword 𝐜, ⟨[𝐜], [𝐜]⟩ = n. Recall that in the Hilbert space ℝqn , the cosine of the angle 𝛽 between vectors X and Y is ⟨X, Y⟩ . cos 𝛽(X, Y) = √ ⟨X, X⟩⟨Y, Y⟩

364

Alternate Decoding Algorithms for Reed–Solomon Codes

Figure 7.5: An example of the Roth–Ruckenstein Algorithm over GF(5). In light of (7.105), a codeword 𝐜 is on the decoder list if √ ⟨Π, [𝐜]⟩ ≥ (k − 1)∕n + o(1). √ n⟨Π, Π⟩ Asymptotically (n → ∞), the codeword is on the list if √ cos 𝛽([𝐜], Π) ≥ R + o(1).

7.6 The Guruswami–Sudan Decoding Algorithm and Soft RS Decoding

Figure 7.6: An example of the Roth–Ruckenstein Algorithm over GF(5) (cont’d). The asymptotic decoding regions are thus cones in Euclidean space ℝqn : the central axis √ of the cone is the line from the origin to the point Π. Codewords which lie within an angle of cos−1 R of the central axis are included in the decoding list of Π. √ √ Each codeword lies on a sphere S of radius ⟨[𝐜], [𝐜]⟩ = n. To contrast the KV algorithm with the GS algorithm, the GS algorithm would take a reliability matrix Π and project it (by some √nonlinear means) onto the sphere S, and then determine the codewords within an angle of cos−1 R of this projected point. Conventional decoding is similar, except that the angle of inclusion is cos−1 (1 + R)∕2 and the decoding regions are nonoverlapping. 7.6.8.5

Computing the Reliability Matrix

We present here a suggested method for computing a reliability matrix for transmission using a large signal constellation. Consider the constellation shown in Figure 7.8. Let the signal points in the constellation be 𝐬0 , . . . , 𝐬M−1 and let the received point be 𝐫. Then the likelihood functions f (𝐫|𝐬i ) can be computed. Rather than using all M points in computing the reliability, we compute using only the N largest likelihoods. From the N largest of these likelihoods (corresponding to the N closest points in Gaussian

365

Alternate Decoding Algorithms for Reed–Solomon Codes

Norm difference

366 0.6 0.4 0.2 0

0

50

100

150

200

250

300

350

400

0

50

100

150

200

250

300

350

400

0

50

100

150

200 250 Number of steps

300

350

400

Score

30 20 10 0

Cost

6000 4000 2000 0

Figure 7.7: Convergence of M to Π, and the score and cost as a function of the number of iterations.

si1

si3

r

si4

si2

Figure 7.8: Computing the reliability function. noise) f (𝐫|𝐬i1 ), f (𝐫|𝐬i2 ), . . . , f (𝐫|𝐬iN ), we form P(𝐫|𝐬ik ) =

f (𝐫|𝐬ik ) N ∑ l=1

.

f (𝐫|𝐬il )

Using likelihoods computed this way, the authors of [251] have examined codes with a rate near 1/2 for a 256 QAM signal constellation. The soft-decision decoding algorithm achieved gains of up to 1.6 dB compared to conventional (hard-decision) decoding of the same RS codes.

7.7 Exercises

7.7 7.1

367

Exercises [84] Let M be a module over a ring R. (a) Show that the additive identity 0 ∈ M is unique. (b) Show that each f ∈ M has a unique additive inverse. (c) Show that 0f = 0, where 0 ∈ R on the left-hand side is the zero element in R and 0 ∈ M on the right-hand side is the additive identity element in M.

7.2 7.3 7.4

7.5 7.6

Show that an ideal I ⊂ R is an R-module. Show that if a subset M ⊂ R is a module over R, then M is an ideal in R. Let I be an ideal in R. Show that the quotient M = R∕I is an R-module under the quotient ring sum operation and the scalar multiplication defined for cosets [g] ∈ R∕I and f ∈ R by f [g] = [fg] ∈ R∕I. Show that the expression leading to (7.19) is true, that is, g′0 (𝛼 k )𝛼 kb = g′ (𝛼 b+k )𝛼 b(2−d+k) . Show that g′ (𝛼 b+k )𝛼 b(k+2−d) pk 𝛼 k = C̃ for k = 0, 1, . . . , d − 2, where C̃ is a constant, using the ∏ ∏ ∑ following steps. Let g[r] (x) = ri=0 (x − 𝛼 i ) and p[r] (x) = ri=1 (x − 𝛼 i ) = rk=0 p[r] xk . (That is, k these correspond to polynomials with b = 0.) You may prove the result by induction, using [r]

g (𝛼

k

)p[r] 𝛼k k

=C

[r]

=

r−1 ∏

(𝛼 r+1 − 𝛼 i+1 ),

k = 0, 1, . . . , r

i=0

as the inductive hypothesis. . (a) Show that g[r] (𝛼 k ) = g[r] (𝛼 k−1 ) 𝛼 𝛼k−1−𝛼 { ′ −𝛼r [r] k k ′ g (𝛼 )(𝛼 − 𝛼 r+1 ) k = 0, 1, . . . , r . Hint: g[r+1] (x) = (b) Show that g[r+1] (𝛼 k ) = k =r+1 g[r] (𝛼 r+1 ) g[r] (x)(x − 𝛼 r+1 ). ⎧ −p[r] 𝛼 r+1 k=0 0 ⎪ [r+1] [r] r+1 (c) Show that pk = ⎨p[r] − 𝛼 pk k = 1, 2, . . . , r . Hint: p[r+1] (x) = p[r] (x)(x − 𝛼 r+1 ). k−1 ⎪ 1 k = r + 1. ⎩ ∏ ′ ′ [r+1] r+1 𝛼 = g[r+1] (𝛼 r+1 )𝛼 r+1 = ri=0 𝛼 r+2 − (d) Show that for the case k = r + 1, g[r+1] (𝛼 r+1 )pr+1 ′



r+k−1

r−1



𝛼 i+1 = C[r+1] . ′ (e) For the case that k = 0, show (using the inductive hypothesis) that g[r+1] (𝛼 0 )p0[r+1] 𝛼 0 = C[r+1] . (f) For the case that k = 1, 2, . . . , r, show that g[r+1] (𝛼 k )pk[r+1] 𝛼 k = C[r+1] . ∏ ∏b+d−2 i (g) Now extend to the case that b ≠ 0. Let g0 (x) = d−2 and let i=0 (x − 𝛼 ) and g(x) = i=b ∏d−2 ∏ i ). Show that g′ (x) = g′ (𝛼 b x)𝛼 −b(d−2) and p0 (x) = i=1 (x − 𝛼 i ) and p(x) = b+d−2 (x − 𝛼 i=b+1 0 p0k = pk 𝛼 b(k−d+2) . Conclude that ̃ g′ (𝛼 b+k )pk 𝛼 k+b(k−r) = C𝛼 br = C. (h) Hence show that

7.7

N2 (𝛼 k )∕W2 (𝛼 k ) ̃ = C. N1 (𝛼 k )∕W1 (𝛼 k )

̃ b(d−1−k) , k = d − 1, . . . , n − 1, where C̃ is defined in Exercise 7.6. Show that f (𝛼 k )g(𝛼 b+k ) = −C𝛼 This may be done as follows:

368

Alternate Decoding Algorithms for Reed–Solomon Codes ∑ ∏d−2 𝛼 i p0i i ̃ (a) Let f0 (x) = d−2 i=0 𝛼i −x and g0 (x) = i=0 (x − 𝛼 ). Show that f0 (x)g0 (x) = −C. Hint: Lagrange interpolation. (b) Show that f (x) = x−b f0 (x)𝛼 b(d−2) to conclude the result. 7.8 7.9

Show that (7.23) and (7.24) follow from (7.21) and (7.22) and the results of Exercises 7.6 and 7.7. Work through the steps of Lemma 7.15. (a) Explain why there must by polynomials Q1 (x) and Q2 (x) such that N(x) − W(x)P(x) = Q1 (x)Π(x) and M(x) − V(x)P(x) = Q2 (x)Π(x). (b) Show that (N(x)V(x) − M(x)W(x))P(x) = (M(x)Q1 (x) − N(x)Q2 (x))Π(x). (c) Explain why (Π(x), P(x))|N(x) and (Π(x), P(x))|M(x). (Here, (Π(x), P(x)) is the GCD.) (d) Show that (N(x)V(x) − M(x)W(x))

M(x)Q1 (x) − N(x)Q2 (x) P(x) = Π(x) (Π(x), P(x)) Π(x), P(x)

and hence that Π(x)|(N(x)V(x) − M(x)W(x)). (e) Show that deg(N(x)V(x) − M(x)W(x)) < k. Hence conclude that N(x)V(x) − M(x)W(x) = 0. (f) Let d(x) = (W(x), V(x)) (the GCD), so that W(x) = d(x)𝑤(x) and V(x) = d(x)𝑣(x) for relatively prime polynomials 𝑣(x) and 𝑤(x). Show that N(x) = h(x)𝑤(x) and M(x) = h(x)𝑣(x) for a polynomial h(x) = N(x)∕𝑤(x) = M(x)∕𝑣(x). (g) Show that h(x)𝑤(x) − d(x)𝑤(x)P(x) = Q1 (x)Π(x) and h(x)𝑣(x) − d(x)𝑣(x)P(x) = Q2 (x)Π(x). (h) Show that there exist polynomials s(x) and t(x) such that s(x)𝑤(x) + t(x)𝑣(x) = 1. Then show that h(x) − d(x)P(x) = (s(x)Q1 (x) + t(x)Q2 (x))Π(x). (i) Conclude that (h(x), d(x)) is also a solution. Conclude that deg(𝑤(x)) = 0, so that only (M(x), V(x)) has been reduced. 7.10 [59] Explain how the Euclidean algorithm can be used to solve the rational interpolation problem (i.e., how it can be used to solve the WB key equation). 7.11 Show that when xi is an error in a check location that Ψ2,1 (xi ) ≠ 0 and that h(xi ) ≠ 0. Hint: If Ψ2,1 (xi ) = 0, show that Ψ2,2 (xi ) must also be zero; furthermore, we have Ψ1,1 (xi ) = 0, which implies Ψ1,2 (xi ) = 0. Show that this leads to a contradiction, since (x − xi )2 cannot divide det(Ψ(x)). 7.12 Write down the monomials up to weight 8 in the (1, 4)-revlex order. Compute C(4, 8) and compare with the number of monomials you obtained. 7.13 Write down the polynomials up to weight 8 in the (1, 4)-lex order. Compute C(4, 8) and compare with the number of monomials you obtained. 7.14 Write the polynomial Q(x, y) = x5 + x2 y + x3 + y + 1 ∈ GF(5)[x, y] with the monomials in (1, 3)-revlex order. Write the polynomial with the monomials in (1, 3)-lex order. 7.15 Let Q(x, y) = 2x2 y + 3x3 y3 + 4x2 y4 . Compute the Hasse derivatives D1,0 Q(x, y), D0,1 Q(x, y), D1,1 Q(x, y), D2,0 Q(x, y), D2,1 Q(x, y), and D3,2 Q(x, y). 7.16 For a (16, 4) RS code over GF(16), plot Km as a function of m. 7.17 A polynomial Q(x, y) ∈ GF(5)[x, y] is to be found such that: Q(2, 3) = 0 D0,1 Q(2, 3) = 0 D1,0 Q(2, 3) = 0 Q(3, 4) = 0 D0,1 Q(3, 4) = 0 D1,0 Q(3, 4) = 0

(a) Determine the monomials that constitute Q(x, y) in (1, 4)-revlex order.

7.8 Exercises

369

(b) Determine the matrix  as in (7.89) representing the interpolation and multiplicity constraints for a polynomial. 7.18 7.19 7.20 7.21 7.22

In relation to the Feng–Tzeng algorithm, show that [C(x)a(i) (x)xp ]n = [C(x)a(i) (x)]n−p . From Lemma 7.59, show that for P(x, y), Q(x, y) ∈ 𝔽 [x, y], [P(x, y), Q(x, y)]D ∈ ker D. Write down the proof to Lemma 7.40. Show that (7.97) is true. Bounds on Lm : (a) Show that √ ⌊√ ⌋ ) ( ⎢ ( )⎥ nm(m + 1) nm(m + 1) 𝑣+2 ⎥ ⎢ − ≤ rB (nm(m + 1)∕2) ≤ . ⎢ 𝑣 2𝑣 ⎥ 𝑣 ⎦ ⎣ √ (b) Show that Lm < (m + 12 ) n∕𝑣.

7.23 [302] Let A(𝑣, K) be the rank of the monomial xK in (1, 𝑣)-revlex order. From Table 7.1, we have K 0 1 2 3 4 5 K(3, L) 0 1 2 3 5 7 (a) (b) (c) (d)

Show that A(𝑣, K + 1) = C(𝑣, K). Show that A(𝑣, K) = |{(i, j) ∶ i + 𝑣j < K}|. Show that B(L, 𝑣) = A(𝑣L + 1, 𝑣) − 1. Euler’s integration formula [[247], Section 1.2.11.2, (3)] indicates that the sum of a function f (k) can be represented in terms of an integral as n ∑ k=0

n

f (k) =

∫0

1 f (x) dx − (f (n) − f (0)) + ∫0 2

n{

x−

1 2

}

f ′ (x) dx,

where {x} = x − ⌊x⌋ is the fractional part of x. Based on this formula show that A(𝑣, K) =

K 2 K r(𝑣 − r) + + , 2𝑣 2 2𝑣

(7.106)

where r = K (mod 𝑣). (e) Show the following bound: (K + 𝑣∕2)2 K2 < A(K, 𝑣) ≤ . 2𝑣 2𝑣 7.24 Bounds on Km : (a) Show that

⌊ ( ) ⌋ m+1 )∕m + 1, Km = rA (n 2

where rA (x) = max{K ∶ A(𝑣, K) ≤ x}. (b) Show that

√ √ ⌊ 𝑣n(m + 1)∕m − 𝑣∕(2m)⌋ + 1 ≤ Km ≤ ⌊ 𝑣n(m + 1)∕m⌋.

(c) Hence show that asymptotically (as m → ∞)

√ t∞ = n − K∞ = n − 1 − ⌊ 𝑣n⌋.

(7.107)

370

Alternate Decoding Algorithms for Reed–Solomon Codes

7.8

References

The original Welch–Berlekamp algorithm appeared in [480]. In addition to introducing the new key equation, it also describes using different symbols as check symbols in a novel application to magnetic recording systems. This was followed by [34], which uses generalized minimum distance to improve the decoding behavior. The notation we use in Section 7.2.1 comes from this and from [320]. A comparison of WB key equations is in [317]. Our introduction to modules was drawn from [84]; see also [209, 218]. Our discussion of the DB form of the key equation, as well as the idea of “exact sequences” and the associated algorithms, is drawn closely from [86]. Other derivations of WB key equations are detailed in [284] and [320]. The development of the Welch–Berlekamp algorithm from Section 7.4.2 closely follows [284]. The modular form follows from [86]. Other related work is in [58] and [59]. The idea of list decoding goes back to [110]. The idea of this interpolating recovery is expressed in [480]. Work preparatory to the work here appears in [253] and was extended in [423], building in turn on [14], to decode beyond the RS design distance for some low rate codes. In particular, a form of Theorem 7.48 appeared originally in [14]; our statement and proof follows [172]. In [172], the restriction to low-rate codes was removed by employing higher multiplicity interpolating polynomials. The Feng–Tzeng algorithm appears in [121], which also shows how to use their algorithm for decoding up to the Hartmann–Tzeng and Roos BCH bounds. A preceding paper, [120] shows how to solve the multi-sequence problem using a generalization of the Euclidean algorithm, essentially producing a Gröbner basis approach. The algorithm attributed to Kötter [253] is clearly described in [302]. Other algorithms for the interpolation step are in [248] and in [327], which puts a variety of algorithms under the unifying framework of displacements. The factorization step was efficiently expressed in [382]. The description presented of the Roth–Ruckenstein algorithm draws very closely in parts from the excellent tutorial paper [302]. Alternative factorization algorithms appear in [19, 121,154,495 in edits].

Chapter 8

Other Important Block Codes 8.1

Introduction

There are a variety of block codes of both historical and practical importance which are used either as building blocks or components of other systems, which we have not yet seen in this book. In this chapter, we introduce some of the most important of these.

8.2 8.2.1

Hadamard Matrices, Codes, and Transforms Introduction to Hadamard Matrices

A Hadamard matrix of order n is an n × n matrix Hn of ±1 such that Hn HnT = nI.

√ That is, by normalizing Hn by 1∕ n an orthogonal matrix is obtained. The distinct columns of H are pairwise orthogonal, as are the rows. Some examples of Hadamard matrices are: [ ] H1 = 1 ⎡ ⎢ ⎢ ⎢ H8 = ⎢ ⎢ ⎢ ⎢ ⎢ ⎣

1 1 1 1 1 1 1 1

[ H2 = 1 −1 1 −1 1 −1 1 −1

1 1 −1 −1 1 1 −1 −1

1 1 1 −1 1 −1 −1 1 1 −1 −1 1

]

1 1 1 1 −1 −1 −1 −1

H4 1 −1 1 −1 −1 1 −1 1

⎡ 1 1 1 1 ⎤ ⎢ 1 −1 1 −1 ⎥ =⎢ 1 1 −1 −1 ⎥ ⎢ ⎥ ⎣ 1 −1 −1 1 ⎦ 1 1 ⎤ 1 −1 ⎥ −1 −1 ⎥ ⎥ −1 1 ⎥ . −1 −1 ⎥ −1 1 ⎥ ⎥ 1 1 ⎥ 1 −1 ⎦

(8.1)

The operation of computing 𝐫Hn , where 𝐫 is a row vector of length n, is sometimes called computing the Hadamard transform of 𝐫. As we show in Section 8.3.3, there are fast algorithms for computing the Hadamard transform which are useful for decoding certain Reed–Muller codes (among other things). Furthermore, the Hadamard matrices can be used to define some error-correction codes. It is clear that multiplying a row or a column of Hn by −1 produces another Hadamard matrix. By a sequence of such operations, a Hadamard matrix can be obtained which has the first row and the first column equal to all ones. Such a matrix is said to be normalized. Some of the operations associated with Hadamard matrices can be expressed using the Kronecker product. Definition 8.1 The Kronecker product A ⊗ B of an m × n matrix A with a p × q matrix B is the mp × nq obtained by replacing every element aij of A with the matrix aij B. The Kronecker product is associative and distributive, but not commutative. ◽

Error Correction Coding: Mathematical Methods and Algorithms, Second Edition. Todd K. Moon © 2021 John Wiley & Sons, Inc. Published 2021 by John Wiley & Sons, Inc. Companion website: www.wiley.com/go/Moon/ErrorCorrectionCoding

372

Other Important Block Codes Example 8.2

Let A=

Then

[ ] a11 a12 a13 a21 a22 a23

⎡a11 b11 ⎢a b A ⊗ B = ⎢ 11 21 ⎢a21 b11 ⎢ ⎣a21 b21

a11 b12 a11 b22 a21 b12 a21 b22

[ and

a12 b11 a12 b21 a22 b11 a22 b21

B=

a12 b12 a12 b22 a22 b12 a22 b22

] b11 b12 . b21 b22 a13 b11 a13 b21 a23 b11 a23 b21

a13 b12 ⎤ a13 b22 ⎥⎥ . a23 b12 ⎥ ⎥ a23 b22 ⎦



Theorem 8.3 The Kronecker product has the following properties [319, Chapter 9]: 1. A ⊗ B ≠ B ⊗ A in general. (The Kronecker product does not commute.) 2. For a scalar x, (xA) ⊗ B = A ⊗ (xB) = x(A ⊗ B). 3. Distributive properties: (A + B) ⊗ C = (A ⊗ C) + (B ⊗ C). A ⊗ (B + C) = (A ⊗ B) + (A ⊗ C). 4. Associative property: (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C). 5. Transposes: (A ⊗ B)T = AT ⊗ BT . 6. Trace (for square A and B): tr(A ⊗ B) = tr(A)tr(B). 7. If A is diagonal and B is diagonal, then A ⊗ B is diagonal. 8. Determinant, where A is m × m and B is n × n: det(A ⊗ B) = det (A)n det (B)m . 9. The Kronecker product theorem: (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD),

(8.2)

provided that the matrices are shaped such that the indicated products are allowed. 10. Inverses: If A and B are nonsingular, then A ⊗ B is nonsingular and (A ⊗ B)−1 = A−1 ⊗ B−1 .

(8.3)

Returning now to Hadamard matrices, it may be observed that the Hadamard matrices in (8.1) have the structure [ ] Hn Hn . H2n = H2 ⊗ Hn = Hn −Hn This works in general: Theorem 8.4 If Hn is a Hadamard matrix, then so is H2n = H2 ⊗ Hn . Proof By the properties of the Kronecker product, T = (H2 ⊗ Hn )(H2 ⊗ Hn )T = H2 H2T ⊗ Hn HnT = (2I2 ) ⊗ (nIn ) H2n H2n

= 2n(I2 ⊗ In ) = 2nI2n .



This construction of Hadamard matrices is referred to as the Sylvester construction. By this construction, Hadamard matrices of sizes 1, 2, 4, 8, 16, 32, etc., exist. However, unless a Hadamard

8.2 Hadamard Matrices, Codes, and Transforms

373

matrix of size 6 exists, for example, then this construction cannot be used to construct a Hadamard matrix of size 12. As the following theorem indicates, there is no Hadamard matrix of size 6. Theorem 8.5 A Hadamard matrix must have an order that is either 1, 2, or a multiple of 4. Proof [292, p. 44] Suppose without loss of generality that Hn is normalized. By column permutations, we can put the first three rows of Hn in the following form: 1 1 ··· 1 1 1 ··· 1 1 1 ··· 1 ⏟⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏟

1 1 ··· 1 1 1 ··· 1 −1 −1 · · · −1 ⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟

1 1 ··· 1 −1 −1 · · · −1 1 1 ··· 1 ⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟

1 1 ··· 1 −1 −1 · · · −1 −1 −1 · · · −1 ⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟

i

j

k

l

For example, j is the number of columns such that the first two rows of Hn have ones while the third row has negative ones. Since the rows are orthogonal, we have i+j−k−l=0

(inner product of row 1 with row 2)

i−j+k−l=0

(inner product of row 1 with row 3)

i−j−k+l=0

(inner product of row 2 with row 3),

which collectively imply i = j = k = l. Thus, n = 4i, so n must be a multiple of 4. (If n = 1 or 2, then there are not three rows to consider.) ◽ This theorem does not exclude the possibility of a Hadamard matrix of order 12. However, it cannot be obtained by the Sylvester construction. 8.2.2

The Paley Construction of Hadamard Matrices

Another method of constructing Hadamard matrices is by the Paley construction, which employs some number-theoretic concepts. This allows, for example, creation of the Hadamard matrix H12 . While in practice Hadamard matrices of order 4k are most frequently employed, the Paley construction introduces the important concepts of quadratic residues and the Legendre symbol, both of which have application to other error- correction codes. Definition 8.6 For all numbers a such that (a, p) = 1, the number a is called a quadratic residue modulo p if the congruence x2 ≡ a (mod p) has some solution x. That is to say, a is the square of some number, modulo p. If a is not a quadratic residue, then a is called a quadratic nonresidue. ◽ If a is a quadratic residue modulo p, then so is a + p, so we consider as distinct residues only these which are distinct modulo p. Example 8.7 The easiest way to find the quadratic residues modulo a prime p is to list the nonzero numbers modulo p, then square them. Let p = 7. The set of nonzero numbers modulo p is {1, 2, 3, 4, 5, 6}. Squaring these numbers modulo p, we obtain the list {12 , 22 , 32 , 42 , 52 , 62 } = {1, 4, 2, 2, 4, 1}. So the quadratic residues modulo 7 are {1, 2, 4}. The quadratic nonresidues are {3, 5, 6}. The number 9 is a quadratic residue modulo 7, since 9 = 7 + 2, and 2 is a quadratic residue. Now let p = 11. Forming the list of squares, we have {12 , 22 , 32 , 42 , 52 , 62 , 72 , 82 , 92 , 102 } = {1, 4, 9, 5, 3, 3, 5, 9, 4, 1}. The quadratic residues modulo 11 are {1, 3, 4, 5, 9}.



374

Other Important Block Codes Theorem 8.8 Quadratic residues have the following properties: 1. There are (p − 1)∕2 quadratic residues modulo p for an odd prime p. 2. The product of two quadratic residues or two quadratic nonresidues is always a quadratic residue. The product of a quadratic residue and a quadratic nonresidue is a quadratic nonresidue. 3. If p is of the form 4k + 1, then −1 is a quadratic residue modulo p. If p is of the form 4k + 3, then −1 is a nonresidue modulo p. The Legendre symbol is a number-theoretic function associated with quadratic residues. Definition 8.9 Let p be an odd prime. The Legendre symbol 𝜒 p (x) is defined as ⎧0 ⎪ 𝜒 p (x) = ⎨1 ⎪−1 ⎩

if x is a multiple of p if x is a quadratic residue modulo p if x is a quadratic nonresidue modulo p. ( ) The Legendre symbol 𝜒 p (x) is also denoted as px . Example 8.10



Let p = 7. The Legendre symbol values are x: 𝜒 7 (x):

0 0

1 1

2 1

3 −1

4 1

5 1

4 1

5 −1

6 −1

When p = 11, the Legendre symbol values are x: 𝜒 11 (x):

0 0

1 1

2 −1

3 1

6 −1

7 −1

8 −1

9 1

10 −1 ◽

The key to the Paley construction of Hadamard matrices is the following theorem. Lemma 8.11 [292, p. 46] For any c ≢ 0 (mod p), where p is an odd prime, p−1 ∑

𝜒 p (b)𝜒 p (b + c) = −1.

(8.4)

b=0

Proof From Theorem 8.8 and the definition of the Legendre symbol, 𝜒 p (xy) = 𝜒 p (x)𝜒 p (y). Since b = 0 contributes nothing to the sum in (8.4), suppose b ≠ 0. Let z ≡ (b + c)b−1 (mod p). As b runs from 1, 2, . . . , p − 1, z takes on distinct values in 0, 2, 3, . . . , p − 1, but not the value 1. Then p−1 ∑

𝜒 p (b)𝜒 p (b + c) =

b=0

p−1 ∑ b=1



𝜒 p (b)𝜒 p (bz) =

p−1 ∑

𝜒 p (b)𝜒 p (b)𝜒 p (z) =

b=1

p−1 ∑

𝜒 p (b)2 𝜒 p (z)

b=1

p−1

=

𝜒 p (z) − 𝜒 p (1) = 0 − 𝜒 p (1) = −1,

z=0

where the last equality follows since half of the numbers z from 0 to p − 1 have 𝜒 p (z) = −1 and the ◽ other half 𝜒 p (z) = 1, by Theorem 8.8.

8.2 Hadamard Matrices, Codes, and Transforms

375

With this background, we can now define the Paley construction. 1. First, construct the p × p Jacobsthal matrix Jp , with elements qij given by qij = 𝜒 p (j − i) (with zero-based indexing). Note that the first row of the matrix is 𝜒 p (j), which is just the Legendre symbol sequence. The other rows are obtained by cyclic shifting. Note that each row and column of Jp sums to 0 since each row contains (p − 1)∕2 elements of 1 and (p − 1)∕2 elements of −1, and that Jp + JpT = 0. [

2. Second, form the matrix Hp+1

] 1 𝟏T = , 𝟏 Jp − Ip

where 𝟏 is a column vector of length p containing all ones. Example 8.12 ⎡ 0 ⎢ −1 ⎢ ⎢ −1 J7 = ⎢ 1 ⎢ ⎢ −1 ⎢ 1 ⎢ ⎣ 1

Example 8.13

Let p = 7. For the first row of the matrix, see Example 8.10. 1 0 −1 −1 1 −1 1

1 1 0 −1 −1 1 −1

−1 1 1 0 −1 −1 1

1 −1 1 1 0 −1 −1

−1 1 −1 1 1 0 −1

−1 −1 1 −1 1 1 0

⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦

⎡ ⎢ ⎢ ⎢ ⎢ H8 = ⎢ ⎢ ⎢ ⎢ ⎢ ⎣

1 1 1 1 1 1 1 1

1 −1 −1 −1 1 −1 1 1

1 1 −1 −1 −1 1 −1 1

1 1 1 −1 −1 −1 1 −1

1 −1 1 1 −1 −1 −1 1

1 1 −1 1 1 −1 −1 −1

1 −1 1 −1 1 1 −1 −1

1 −1 −1 1 −1 1 1 −1

⎤ ⎥ ⎥ ⎥ ⎥ ⎥. ⎥ ⎥ ⎥ ⎥ ⎦



We now show the construction of H12 . The 11 × 11 Jacobsthal matrix is

J11

⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ =⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣

0 −1 1 −1 −1 −1 1 1 1 −1 1

1 0 −1 1 −1 −1 −1 1 1 1 −1

−1 1 0 −1 1 −1 −1 −1 1 1 1

1 −1 1 0 −1 1 −1 −1 −1 1 1

1 1 −1 1 0 −1 1 −1 −1 −1 1

1 1 1 −1 1 0 −1 1 −1 −1 −1

−1 1 1 1 −1 1 0 −1 1 −1 −1

−1 −1 1 1 1 −1 1 0 −1 1 −1

−1 −1 −1 1 1 1 −1 1 0 −1 1

1 −1 −1 −1 1 1 1 −1 1 0 −1

−1 1 −1 −1 −1 1 1 1 −1 1 0

⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦

and the Hadamard matrix is ⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ H12 = ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣

1 1 1 1 1 1 1 1 1 1 1 1

1 −1 −1 1 −1 −1 −1 1 1 1 −1 1

1 1 −1 −1 1 −1 −1 −1 1 1 1 −1

1 −1 1 −1 −1 1 −1 −1 −1 1 1 1

1 1 −1 1 −1 −1 1 −1 −1 −1 1 1

1 1 1 −1 1 −1 −1 1 −1 −1 −1 1

1 1 1 1 −1 1 −1 −1 1 −1 −1 −1

1 −1 1 1 1 −1 1 −1 −1 1 −1 −1

1 −1 −1 1 1 1 −1 1 −1 −1 1 −1

1 −1 −1 −1 1 1 1 −1 1 −1 −1 1

1 1 −1 −1 −1 1 1 1 −1 1 −1 −1

1 −1 1 −1 −1 −1 1 1 1 −1 1 −1

⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥. ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦

(8.5)



376

Other Important Block Codes The following lemma establishes that the Paley construction gives a Hadamard matrix. Lemma 8.14 Let Jp be a p × p Jacobsthal matrix. Then Jp JpT = pI − U and Jp U = UJp = 𝟎, where U is the matrix of all ones. Proof Let P = Jp JpT . Then pii =

p−1 ∑

q2ik = p − 1

(since 𝜒 2p (x) = 1 for x ≠ 0)

k=0

∑ p−1

pij =

qik qjk =

k=0

=

p−1 ∑

p−1 ∑

𝜒 p (k − i)𝜒 p (k − j)

k=0

𝜒 p (b)𝜒 p (b + c) = −1 (subs. b = k − i, c = i − j, then use Lemma 8.11).

b=0

Also, Jp U = 0 since each row contains (p − 1)∕2 elements of 1 and (p − 1)∕2 elements of −1. [

Now T Hp+1 Hp+1

1 𝟏T = 𝟏 Jp − I



][ ] [ ] 1 𝟏T p+1 0 = . 𝟏 JpT − I 0 U + (Jp − I)(JpT − I)

It can be shown that Jp + JpT = 0. So using Lemma 8.14, U + (Jp − I)(JpT − I) = U + pI − U − Jp − T JpT + I = (p + 1)I. So Hp+1 Hp+1 = (p + 1)Ip+1 . 8.2.3

Hadamard Codes

Let An be the binary matrix obtained by replacing the 1s in a Hadamard matrix with 0s, and replacing the −1s with 1s. We have the following code constructions:



By the orthogonality of Hn , any pair of distinct rows of An must agree in n∕2 places and differ in n∕2 places. Deleting the left column of An (since these bits are all the same and do not contribute anything to the code), the rows of the resulting matrix form a code of length n − 1 called the Hadamard code, denoted An , having n codewords and minimum distance n∕2. This is also known as the simplex code.



By including the binary-complements of all codewords in An , we obtain the code Bn which has 2n codewords of length n − 1 and a minimum distance of n∕2 − 1.



Starting from An again, if we adjoin the binary complements of the rows of An , we obtain a code with code length n, 2n codewords, and minimum distance n∕2. This code is denoted C.

This book does not treat many nonlinear codes. However, if any of these codes are constructed using a Paley matrix with n > 8, then the codes are nonlinear. (The linear span of the nonlinear code is a quadratic residue code.) Interestingly, if the Paley Hadamard matrix is used in the construction of An or Bn , then the codes are cyclic, but not necessarily linear. If the codes are constructed from Hadamard matrices constructed using the Sylvester construction, the codes are linear.

8.3

Reed–Muller Codes

Reed–Muller codes were among the first codes to be deployed in space applications, being used in the deep space probes flown from 1969 to 1977 [483, p. 149]. They were probably the first family of codes to provide a mechanism for obtaining a desired minimum distance. And, while they have been largely displaced by Reed–Solomon codes in volume of practice, they have a fast maximum likelihood

8.3 Reed–Muller Codes

377

decoding algorithm which is still very attractive. They are also used as components in several other systems. Furthermore, there are a variety of constructions for Reed–Muller codes which has made them useful in many theoretical developments. 8.3.1

Boolean Functions

Reed–Muller codes are closely tied to functions of Boolean variables and can be described as multinomials over the field GF(2) [362]. Consider a Boolean function of m variables, f (𝜈1 , 𝜈2 , . . . , 𝜈m ), which is a mapping from the vector space Vm of binary m-tuples to the binary numbers {0, 1}. Such functions can be represented using a truth table, which is an exhaustive listing of the input/output values. Boolean functions can also be written in terms of the variables. Example 8.15

The table below is a truth table for two functions of the variables 𝜈1 , 𝜈2 , 𝜈3 , and 𝜈4 . 𝜈4 𝜈3 𝜈2 𝜈1

= = = =

0 0 0 0

0 0 0 1

0 0 1 0

0 0 1 1

0 1 0 0

0 1 0 1

0 1 1 0

0 1 1 1

1 0 0 0

1 0 0 1

1 0 1 0

1 0 1 1

1 1 0 0

1 1 0 1

1 1 1 0

1 1 1 1

f1 = 0 1 1 0 1 0 0 1 1 0 0 1 0 1 1 0 f2 = 1 1 1 1 1 0 0 1 1 0 1 0 1 1 0 0 It can be verified (using, e.g., methods from elementary digital logic design) that f1 (𝜈1 , 𝜈2 , 𝜈3 , 𝜈4 ) = 𝜈1 + 𝜈2 + 𝜈3 + 𝜈4 and that f2 (𝜈1 , 𝜈2 , 𝜈3 , 𝜈4 ) = 1 + 𝜈1 𝜈4 + 𝜈1 𝜈3 + 𝜈2 𝜈3 .



The columns of the truth table can be numbered from 0 to 2m − 1 using a base-2 representation with 𝜈1 as the least-significant bit. Then without ambiguity, the functions can be represented simply using their bit strings. From Example 8.15, 𝐟1 = (0110100110010110) 𝐟2 = (1111100110101100). The number of distinct Boolean functions in m variables is the number of distinct binary sequences of m length 2m , which is 22 . The set M of all Boolean functions in m variables forms a vector space that has a basis {1, 𝜈1 , 𝜈2 , . . . , 𝜈m , 𝜈1 𝜈2 , 𝜈1 𝜈3 , . . . , 𝜈m−1 𝜈m , . . . , 𝜈1 𝜈2 𝜈3 · · · 𝜈m }. Every function f in this space can be represented as a linear combination of these basis functions: f = a0 1 + a1 𝜈1 + a2 𝜈2 + · · · + am 𝜈m + a12 𝜈1 𝜈2 + · · · + a12 . . . m 𝜈1 𝜈2 · · · 𝜈m . Functional and vector notation can be used interchangeably. Here are some examples of some basic functions and their vector representations: 1 ↔ 𝟏 = 1111111111111111 𝜈1 ↔ 𝝂 1 = 0101010101010101 𝜈2 ↔ 𝝂 2 = 0011001100110011 𝜈3 ↔ 𝝂 3 = 0000111100001111

378

Other Important Block Codes 𝜈4 ↔ 𝝂 4 = 0000000011111111 𝜈1 𝜈2 ↔ 𝝂 1 𝝂 2 = 0001000100010001 𝜈1 𝜈2 𝜈3 𝜈4 ↔ 𝝂 1 𝝂 2 𝝂 3 𝝂 4 = 0000000000000001. As this example demonstrates, juxtaposition of vectors represents the corresponding Boolean ”and” function, element by element. A vector representing a function can be written as 𝐟 = a0 𝟏 + a1 𝝂 1 + a2 𝝂 2 + · · · + am 𝝂 m + a12 𝝂 1 𝝂 2 + · · · + a12 . . . m 𝝂 1 𝝂 2 · · · 𝝂 m .

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

genrm.cc

8.3.2

Definition of the Reed–Muller Codes

Definition 8.16 [483, p. 151] The binary Reed–Muller code RM(r, m) of order r and length 2m consists of all linear combinations of vectors 𝐟 associated with Boolean functions f that are monomials of degree ≤ r in m variables. ◽

Example 8.17 ciated vectors

The RM(1, 3) code has length 23 = 8. The monomials of degree ≤ 1 are {1, 𝑣1 , 𝑣2 , 𝑣3 }, with asso-

1↔𝟏= 𝑣3 ↔ 𝐯3 = 𝑣2 ↔ 𝐯2 = 𝑣1 ↔ 𝐯1 =

(1 (0 (0 (0

1 0 0 1

1 0 1 0

1 0 1 1

1 1 0 0

1 1 0 1

1 1 1 0

1) 1) 1) 1).

It is natural to describe the code using a generator matrix having these vectors as rows. ⎡1 ⎢0 G=⎢ 0 ⎢ ⎣0

1 0 0 1

1 0 1 0

1 0 1 1

1 1 0 0

1 1 0 1

1⎤ 1⎥ . 1⎥ ⎥ 1⎦

1 1 1 0

(8.6)

This is an (8, 4, 4) code; it is single-error correcting and double-error detecting. This is also the extended Hamming code (obtained by adding an extra parity bit to the (7, 4) Hamming code). ◽

Example 8.18 The RM(2, 4) code has length 16 and is obtained by linear combinations of the monomials up to degree 2, which are {1, 𝑣1 , 𝑣2 , 𝑣3 , 𝑣4 , 𝑣1 𝑣2 , 𝑣1 𝑣3 , 𝑣1 𝑣4 , 𝑣2 𝑣3 , 𝑣2 𝑣4 , 𝑣3 𝑣4 } with the following corresponding vector representations: 𝟏 = (1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

1)

𝐯4 𝐯3 𝐯2 𝐯1

= = = =

(0 (0 (0 (0

0 0 0 1

0 0 1 0

0 0 1 1

0 1 0 0

0 1 0 1

0 1 1 0

0 1 1 1

1 0 0 0

1 0 0 1

1 0 1 0

1 0 1 1

1 1 0 0

1 1 0 1

1 1 1 0

𝐯3 𝐯4 𝐯2 𝐯4 𝐯2 𝐯3 𝐯1 𝐯4 𝐯1 𝐯3 𝐯1 𝐯2

= = = = = =

(0 (0 (0 (0 (0 (0

0 0 0 0 0 0

0 0 0 0 0 0

0 0 0 0 0 1

0 0 0 0 0 0

0 0 0 0 1 0

0 0 1 0 0 0

0 0 1 0 1 1

0 0 0 0 0 0

0 0 0 1 0 0

0 1 0 0 0 0

0 1 0 1 0 1

1 0 0 0 0 0

1 0 0 1 1 0

1 1) 1 1) 1 1) 0 1) 0 1) 0 1).

This is a (16, 11) code, with minimum distance 4.

1) 1) 1) 1)



8.3 Reed–Muller Codes

379

In general, for an RM(r, m) code, the dimension is ( ) ( ) ( ) m m m k =1+ + +···+ . 1 2 r The codes are linear, but not cyclic. As the following lemma states, we can recursively construct an RM(r + 1, m + 1) code — twice the length — from an RM(r, m) and RM(r + 1, m) code. In this context, the notation (𝐟 , 𝐠) means the concatenation of the vectors 𝐟 and 𝐠. Lemma 8.19 RM(r + 1, m + 1) = {(f, f + g) for all f ∈ RM(r + 1, m) and g ∈ RM(r, m)}. Proof The codewords of RM(r + 1, m + 1) are associated with Boolean functions in m + 1 variables of degree ≤ r + 1. If c(𝑣1 , . . . , 𝑣m+1 ) is such a function (i.e., it represents a codeword) we can write it as c(𝑣1 , . . . , 𝑣m+1 ) = f (𝑣1 , . . . , 𝑣m ) + 𝑣m+1 g(𝑣1 , . . . , 𝑣m ), where f is a Boolean function in m variables of degree ≤ r + 1, and hence represents a codeword in RM(r + 1, m), and g is a Boolean function in m variables with degree ≤ r, representing a Boolean function in RM(r, m). The corresponding functions 𝐟 and 𝐠 are thus in RM(r + 1, m) and RM(r, m), respectively. Now let f̃ (𝜈1 , 𝜈2 , . . . , 𝜈m+1 ) = f (𝜈1 , 𝜈2 , . . . , 𝜈m ) + 0 ⋅ 𝜈m+1 represent a codeword in RM(r + 1, m + 1) and let g̃ (𝜈1 , 𝜈2 , . . . , 𝜈m+1 ) = 𝜈m+1 g(𝜈1 , 𝜈2 , . . . , 𝜈m ) represent a codeword in RM(r + 1, m + 1). The associated vectors, which are codewords in RM(r + 1, m + 1), are 𝐟̃ = (𝐟 , 𝐟 ) and

𝐠̃ = (0, 𝐠).

Their linear combination (𝐟 , 𝐟 + 𝐠) must therefore also be a codeword in RM(r + 1, m + 1).



We now use this lemma to compute the minimum distance of an RM(r, m) code. Theorem 8.20 The minimum distance of RM(r, m) is 2m−r . Proof By induction. When m = 1, the RM(0, 1) code is built from the basis {1}, giving rise to two codewords: it is the length-2 repetition code. In this case, dmin = 2. The RM(1, 1) code is built upon the basis vectors {1, 𝑣1 } and has four codewords of length two: 00, 01, 10, 11. Hence dmin = 1. As an inductive hypothesis, assume that up to m and for 0 ≤ r ≤ m the minimum distance is 2m−r . We will show that dmin for RM(r, m + 1) is 2m−r+1 . Let 𝐟 and 𝐟 ′ be in RM(r, m) and let 𝐠 and 𝐠′ be in RM(r − 1, m). By Lemma 8.19, the vectors 𝐜1 = (𝐟 , 𝐟 + 𝐠) and 𝐜2 = (𝐟 ′ , 𝐟 ′ + 𝐠′ ) must be in RM(r, m + 1). If 𝐠 = 𝐠′ then d(𝐜1 , 𝐜2 ) = d((𝐟 , 𝐟 + 𝐠), (𝐟 ′ , 𝐟 ′ + 𝐠)) = d((𝐟 , 𝐟 ), (𝐟 ′ , 𝐟 ′ )) = 2d(𝐟 , 𝐟 ′ ) ≥ 2 ⋅ 2m−r by the inductive hypothesis. If 𝐠 ≠ 𝐠′ then d(𝐜1 , 𝐜2 ) = 𝑤(𝐟 − 𝐟 ′ ) + 𝑤(𝐠 − 𝐠′ + 𝐟 − 𝐟 ′ ). Claim: 𝑤(𝐱 + 𝐲) ≥ 𝑤(𝐱) − 𝑤(𝐲). Proof: Let 𝑤xy be the number of places in which the nonzero digits of 𝐱 and 𝐲 overlap. Then 𝑤(𝐱 + 𝐲) = (𝑤(𝐱) − 𝑤xy ) + (𝑤(𝐲) − 𝑤xy ). But since 2𝑤(𝐲) ≥ 2𝑤xy , the result follows. By this result, d(𝐜1 , 𝐜2 ) ≥ 𝑤(𝐟 − 𝐟 ′ ) + 𝑤(𝐠 − 𝐠′ ) − 𝑤(𝐟 − 𝐟 ′ ) = 𝑤(𝐠 − 𝐠′ ). But 𝐠 − 𝐠′ ∈ RM(r − 1, m), so that 𝑤(𝐠 − 𝐠′ ) ≥ 2m−(r−1) = 2m−r+1 .



380

Other Important Block Codes The following theorem is useful in characterizing the duals of RM codes. Theorem 8.21 For 0 ≤ r ≤ m − 1, the RM(m − r − 1, m) code is dual to the RM(r, m) code. Proof [483, p. 154] Let 𝐚 be a codeword in RM(m − r − 1, m) and let 𝐛 be a codeword in RM(r, m). Associated with 𝐚 is a polynomial a(𝜈1 , 𝜈2 , . . . , 𝜈m ) of degree ≤ m − r − 1; associated with 𝐛 is a polynomial b(𝜈1 , 𝜈2 , . . . , 𝜈m ) of degree ≤ r. The product polynomial has degree ≤ m − 1, and thus corresponds to a codeword in the RM(m − 1, m) code, with vector representation 𝐚𝐛. Since the minimum distance of RM(m − r − 1, m) is 2r−1 and the minimum distance of RM(m − 1, m) is 2m−r , the codeword 𝐚𝐛 must have even weight. Thus, 𝐚 ⋅ 𝐛 ≡ 0 (mod 2). From this, RM(m − r − 1, m) must be a subset of the dual code to RM(r, m). Note that dim(RM(r, m)) + dim(RM(m − r − 1, m)) ( ) ( ) ( ) ( ) ( ) m m m m m =1+ +···+ +1+ + +···+ 1 r 1 2 m−r−1 ( ) ( ) ( ) ( ) ( ) ( ) m m m m m m =1+ +···+ + + + +···+ 1 r m m−1 m−2 r+1 =

m ( ) ∑ m i=0

i

= 2m .

By the theorem regarding the dimensionality of dual codes, Theorem 2.64, RM(m − r − 1, m) must be the dual to RM(r, m). ◽ It is clear that the weight distribution of RM(1, m) codes is A0 = 1, A2m = 1, and A2m−1 = 2m+1 − 2. Beyond these simple results, the weight distributions are more complicated. 8.3.3

Encoding and Decoding Algorithms for First-Order RM Codes

In this section, we describe algorithms for encoding and decoding RM(1, m) codes, which are (2m , m + 1, 2m−1 ) codes. In Section 8.3.4, we present algorithms for more general RM codes. 8.3.3.1

Encoding RM(1, m) Codes

Consider the RM(1, 3) code generated by ⎡𝟏⎤ ⎢𝐯 ⎥ 𝐜 = (c0 , c1 , . . . , c7 ) = 𝐦G = (m0 , m3 , m2 , m1 ) ⎢ 3 ⎥ 𝐯 ⎢ 2⎥ ⎣𝐯1 ⎦ ⎡1 ⎢0 = (m0 , m3 , m2 , m1 ) ⎢ ⎢0 ⎢ ⎣0

1 0 0 1

1 0 1 0

1 0 1 1

1 1 0 0

1 1 0 1

1 1 1 0

1⎤ 1⎥⎥ . 1⎥ ⎥ 1⎦

The columns of G consist of the numbers (1, 0, 0, 0) through (1, 1, 1, 1) in increasing binary counting order (with the least-significant bit on the right). This sequence of bit values can thus be obtained using a conventional binary digital counter. A block diagram of an encoding circuit embodying this idea is shown in Figure 8.1.

8.3 Reed–Muller Codes

381 m3

m0

m2

m1

× ×

+

c0, c1,..., c7

× ×

1

1

1

1

3-bit binary digital counter

Figure 8.1: An encoder circuit for a RM(1, 3) code.

8.3.3.2

Decoding RM(1, m) Codes

The idea behind the decoder is to compare the received sequence 𝐫 with every codeword in RM(1, m) by means of correlation, then to select the codeword with the highest correlation. As we shall show, because of the structure of the code these correlations can be computed using a Hadamard transform. The existence of a fast Hadamard transform algorithms makes this an efficient decoding algorithm. Let 𝐫 = (r0 , r1 , . . . , r2m −1 ) be the received sequence, and let 𝐜 = (c0 , c1 , . . . , c2m −1 ) be a codeword. We note that 2m −1



2m −1 ri

ci

(−1) (−1) =

i=0

∑ i=0

2m −1

(−1)

ri +ci

=



(−1)ri ⊕ci

i=0

2m −1

=



(−1)d(ri ,ci ) = 2m − 2d(𝐫, 𝐜),

(8.7)

i=0

where ⊕ denotes addition modulo 2 and d(ri , ci ) is the Hamming distance between the arguments. A sequence which minimizes d(𝐫, 𝐜) has the largest number of positive terms in the sum on the right of (8.7) and therefore maximizes the sum. Let F(𝐫) be the transformation that converts binary {0, 1} elements of 𝐫 to binary ±1 values of a vector 𝐑 according to F(𝐫) = F(r0 , r1 , . . . , r2m −1 ) = 𝐑 = ((−1)r0 , (−1)r1 , . . . , (−1)r2m −1 ). We refer to 𝐑 as the bipolar representation of 𝐫. Similarly, define F(𝐜) = 𝐂 = (C0 , C1 , . . . , C2m −1 ). We define the correlation function 2m −1

T = cor(𝐑, 𝐂) = cor((R0 , R1 , . . . , R2m −1 ), (C0 , C1 , . . . , C2m −1 )) =



Ri Ci .

i=0

By (8.7), the codeword 𝐜 which minimizes d(𝐫, 𝐜) maximizes the correlation cor(𝐑, 𝐂). The decoding algorithm is summarized as: Compute Ti = cor(𝐑, 𝐂i ), where 𝐂i = F(𝐜i ) for each of the 2m+1 codewords, then select that codeword for which cor(𝐑, 𝐂i ) is the largest. The simultaneous computation of all the correlations can be represented as a matrix. Let 𝐂i be represented as a column vector and let [ ] H = 𝐂0 𝐂1 . . . 𝐂2m+1 −1 .

382

Other Important Block Codes Then all the correlations can be computed by [ 𝐓 = T0 T1

] . . . T2m+1 −1 = 𝐑H.

Recall that the generator matrix for the RM(1, m) code can be written as ⎡ 𝟏 ⎤ ⎢ 𝝂m ⎥ G = ⎢𝝂 m−1 ⎥ . ⎢ ⎥ ⎢ ⋮ ⎥ ⎣ 𝝂1 ⎦ We actually find it convenient to deal explicitly with those codewords formed as linear combinations of only the vectors 𝝂 1 , 𝝂 1 , . . . , 𝝂 m , since 𝟏 + 𝐜 complements all the elements of 𝐜, which corresponds to negating the elements of the transform 𝐂. We therefore deal with the 2m × 2m matrix H2m . Let us examine one of these matrices in detail. For the RM(1, 3) code with G as in (8.6), the matrix H8 can be written as ⎡ ⎢ ⎢ ⎢ H8 = ⎢ ⎢ ⎢ ⎢ ⎢ ⎣

0

1

2

3

4

5

6

7

1 1 1 1 1 1 1

−1 1 −1 1 −1 1 −1

1 −1 −1 1 1 −1 −1

−1 −1 1 1 −1 −1 1

1 1 1 −1 −1 −1 −1

−1 1 −1 −1 1 −1 1

1 −1 −1 −1 −1 1 1

−1 −1 1 −1 1 1 −1

⎤ ⎥ ⎥ ⎥ ⎥. ⎥ ⎥ ⎥ ⎥ ⎦

(8.8)

Examination of this reveals that, with this column ordering, column 1 corresponds to F(𝐯1 ), column 2 corresponds to F(𝐯2 ), and column 4 corresponds to F(𝐯3 ). In general, the ith column corresponds to the linear combination of i1 𝐯1 + i2 𝐯2 + i3 𝐯3 , where i has the binary representation i = i1 + 2i2 + 4i3 . We write the binary( representation as i = (i3 , i2 , i1 )2 . In the general case, for an 2m × 2m Hadamard ) ∑m matrix, we place F j=1 ij 𝐯j in the ith column, where i = (im , im−1 , . . . , i1 )2 . The computation 𝐑H is referred to as the Hadamard transform of 𝐑. The decoding algorithm can be described as follows: Algorithm 8.1 Decoding for RM(1, m) Codes 1 2 3 4 5 6 7 8 9 10 11 12 13

Input: 𝐫 = (r0 , r1 , … , r2m −1 ). ̂ Output: A maximum-likelihood codeword 𝐜. Begin Find the bipolar representation 𝐑 =  (𝐫). Compute the Hadamard transform 𝐓 = 𝐑H2m = (t0 , t1 , … , t2m −1 ) Find the coordinate ti with the largest magnitude Let i have the binary expansion (im , im−1 , … , i1 )2 . (i1 LSB) if(ti > 0) (1 is not sent) ∑m 𝐜̂ = j=1 ij 𝐯j else (1 is sent – complement all the bits) ∑m 𝐜̂ = 1 + j=1 ij 𝐯j end (if) End

8.3 Reed–Muller Codes Example 8.22

383

For the RM(1, 3) code, suppose the received vector is

α α2 α3 α4 α5 α6 α7 α8 α9

[ ] 𝐫 = 1, 0, 0, 1, 0, 0, 1, 0 .

+

01010

The steps of the algorithm follow:

α10 α11 α12 α13 α14 α15 α16

rmdecex.m

1. Compute the transform: 𝐑 = [−1, 1, 1, −1, 1, 1, −1, 1]. 2. Compute 𝐓 = 𝐑H = [2, −2, 2, −2, −2, 2, −2, −6]. 3. The maximum absolute element occurs at t7 = −6, so i = 7 = (1, 1, 1)2 . 4. Since t7 < 0, 𝐜 = 𝟏 + 𝐯1 + 𝐯2 + 𝐯3 = [1, 0, 0, 1, 0, 1, 1, 0]. ◽

8.3.3.3

Expediting Decoding Using the Fast Hadamard Transform

The main step of the algorithm is the computation of the Hadamard transform 𝐑H. This can be considerably expedited by using a fast Hadamard transform, applicable to Hadamard matrices obtained via the Sylvester construction. This transform is analogous to the fast Fourier transform (FFT), but is over the set of numbers ±1. It is built on some facts from linear algebra. As we have seen, Sylvester Hadamard can be built by H2m = H2 ⊗ H2m−1 .

(8.9)

This gives the following factorization. Theorem 8.23 The matrix H2m can be written as (2) (m) H2m = M2(1) m M2m · · · M2m ,

(8.10)

where M2(i)m = I2m−i ⊗ H2 ⊗ I2i−1 , and where Ip is a p × p identity matrix. Proof By induction. When m = 1 the result holds, as may be easily verified. Assume, then, that (8.10) holds for m. We find that M (i)m+1 = I2m+1−i ⊗ H2 ⊗ I2i−1 2

= (I2 ⊗ I2m−i ) ⊗ H2 ⊗ I2i−1

(by the structure of the identity matrix)

= I2 ⊗ (I2m−i ⊗ H2 ⊗ I2i−1 )

(associativity)

= I2 ⊗ M2(i)m

(definition).

Furthermore, by the definition, M m+1 = H2 ⊗ I2m . We have m+1 2

H2m+1 = M (1) M (2) · · · M (m+1) 2m+1 2m+1 2m+1 )( ) ( ) ( I2 ⊗ M2(2) · · · I2 ⊗ M2(m) (H2 ⊗ I2m ) = I2 ⊗ M2(1) m m m ) ( (2) (m) (Kronecker product theorem 8.2) = (I2m H2 ) ⊗ M2(1) m M2m · · · M2m = H2 ⊗ H2m .

00000



384

Other Important Block Codes Example 8.24

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

hadex.m

By the theorem, we have the factorization H8 = M8(1) M8(2) M8(3) = (I22 ⊗ H2 ⊗ I20 )(I21 ⊗ H2 ⊗ I21 )(I20 ⊗ H2 ⊗ I22 ).

Straightforward substitution and multiplication show that this gives the matrix H8 in (8.8). Let 𝐑 = [R0 , R1 , . . . , R7 ]. Then the Hadamard transform can be written 𝐓 = 𝐑H8 = 𝐑(M8(1) M8(2) M8(3) ) = 𝐑(I22 ⊗ H2 ⊗ I20 )(I21 ⊗ H2 ⊗ I21 )(I20 ⊗ H2 ⊗ I22 ). The matrices involved are

M8(1)

M8(2)

⎡ ⎢ ⎢ ⎢ ⎢ = I2 ⊗ H2 ⊗ I2 = ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣

M8(3)

α α2 α3 α4 α5 α6 α7 α8 α9

⎡ 1 ⎢ 1 ⎢ ⎢ ⎢ = I4 ⊗ H2 = ⎢ ⎢ ⎢ ⎢ ⎢ ⎣

1 0 1 0

⎡ 1 ⎢ ⎢ ⎢ ⎢ = H2 ⊗ I4 = ⎢ 1 ⎢ ⎢ ⎢ ⎢ ⎢ ⎣

1 −1 1 1

1 −1 1 1

1 −1 1 1

0 1 0 1

1 0 −1 0

0 1 0 −1 1 0 1 0

0 1 0 1

1 0 −1 0

1 1

1 1

1 1 −1

1

−1 1

⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ 1 ⎥ ⎥ −1 ⎦

−1 1

0 1 0 −1

⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦

⎤ ⎥ ⎥ ⎥ 1 ⎥ ⎥. ⎥ ⎥ ⎥ −1 ⎥ ⎥ ⎦

Let 𝐏[0] = 𝐑 and let 𝐏[k+1] = 𝐏[k] M8(k+1) . The stages of transform can be written

+

01010

00000

𝐓 = (𝐑M8(1) )M8(2) M8(3) = (𝐏(1) M8(2) )M8(3) = 𝐏(2) M8(3) = 𝐏(3) .

α10 α11 α12 α13 α14 α15 α16

testfht.cc fht.cc fht.m

Figure 8.2 shows the flow diagram corresponding to the matrix multiplications, where arrows indicate the direction of flow, arrows incident along a line imply addition, and the coefficients −1 along the horizontal branches indicate the gain along their respective branch. At each stage, the two-point Hadamard transform is apparent. (At the first stage, the operations of H2 are enclosed in the box to highlight the operation.) The interleaving of the various stages by virtue of the Kronecker product is similar to the “butterfly” pattern of the FFT. ◽

8.3 Reed–Muller Codes

385 [1]

R0

H2

P0

R1

−1

P1

R2

H2

P2

R3

−1

P3

R4

H2

P4

R5

−1

P5

R6

H2

P6

R7

−1

P7

[2]

[1]

[2]

[2]

−1

P2

−1

P3

[1]

[3]

P2 = T2

[2]

[3]

P 3 = T3

[2]

P4

[1]

[3]

−1

P4 = T4

−1

P5 = T5

−1

P6 = T6

−1

P7 = T7

[2]

P5

[1] [1]

[3]

P1 = T1

P1

[1] [1]

[3]

P0 = T0

P0

[3]

[2]

−1

P6

−1

P7

[3]

[2]

[3]

Figure 8.2: Signal flow diagram for the fast Hadamard transform. The conventional computation of the Hadamard transform 𝐑H2m produces 2m elements, each of which requires 2m addition/subtraction operations, for a total complexity of (2m )2 . The fast Hadamard transform has m stages, each of which requires 2m addition/subtraction operations, for a total complexity of m2m . This is still exponential in m (typical for maximum likelihood decoding), but much lower than brute force evaluation. Furthermore, as Figure 8.2 suggests, parallel/pipelined hardware architectures are possible. The RM(1, m) decoding algorithm employing the fast Hadamard transform is referred to as the “Green machine,” after its developer at the Jet Propulsion Laboratory for the 1969 Mariner mission [483]. 8.3.4

The Reed Decoding Algorithm for RM(r, m) Codes, r ≥ 1

Efficient decoding algorithms for general RM codes rely upon the concept of majority logic decoding, in which multiple estimates of a bit value are obtained, and the decoded value is that value which occurs in the majority of estimates. We demonstrate this first for a RM(2, 4) code, then develop a notation to extend this to other RM(r, m) codes. 8.3.4.1

Details for an RM(2, 4) Code

Let us write the generator for the RM(2, 4) code as

We partition the 11 input bits to correspond to the rows of this matrix as 𝐦 = (m0 , m4 , m3 , m2 , m1 , m34 , m24 , . . . , m12 ) = (𝐦0 , 𝐦1 , 𝐦2 ).

386

Other Important Block Codes Thus, the bits in 𝐦0 are associated with the zeroth-order term, the 𝐦1 bits are associated with the first-order terms, and the second-order terms are associated with 𝐦2 . The encoding operation is ⎡ G0 ⎤ 𝐜 = (c0 , c1 , c2 , . . . , c15 ) = 𝐦G = [𝐦0 , 𝐦1 , 𝐦2 ] ⎢ G1 ⎥ = 𝐦0 G0 + 𝐦1 G1 + 𝐦2 G2 . ⎥ ⎢ ⎣ G2 ⎦

(8.11)

The general operation of the algorithm is as follows: Given a received vector 𝐫, estimates are first obtained for the highest-order block of message bits, 𝐦2 . Then 𝐦2 G2 is subtracted off from 𝐫, leaving only lower-order codewords. Then the message bits for 𝐦1 are obtained, then are subtracted, and so forth. The key to finding the message bits comes from writing multiple equations for the same quantity and taking a majority vote. Selecting coded bits from (8.11), we have c0 = m0 c1 = m0 + m1 c2 = m0 + m2 c3 = m0 + m1 + m2 + m12 . Adding these code bits together (modulo 2), we obtain an equation for the message bit m12 : c0 + c1 + c2 + c3 = m12 . We can similarly obtain three other equations for the message bit m12 , c4 + c5 + c6 + c7 = m12 c8 + c9 + c10 + c11 = m12 c12 + c13 + c14 + c15 = m12 . Given the code bits c0 , . . . , c15 , we could compute m12 four independent ways. However, the code bits are not available at the receiver, only the received bits 𝐫 = (r0 , r1 , . . . , r15 ). We use this in conjunction with the equations above to obtain four estimates of m12 : = r0 + r1 + r2 + r3 m ̂ (1) 12 m ̂ (2) = r4 + r5 + r6 + r7 12 m ̂ (3) = r8 + r9 + r10 + r11 12 m ̂ (4) = r12 + r13 + r14 + r15 . 12 Expressions such as this, in which the check sums all yield the same message bit, are said to be orthogonal1 on the message bit. From these four orthogonal equations, we determine the value of m12 by majority vote. Given m ̂ (i) , i = 1, 2, 3, 4, the decoded value m ̂ 12 is 12 (2) (3) (4) m ̂ 12 = maj(m ̂ (1) ,m ̂ 12 ,m ̂ 12 ,m ̂ 12 ), 12

where maj(· · · ) returns the value that occurs most frequently among its arguments. If errors occur such that only one of the m ̂ (i) is incorrect, the majority vote gives the correct answer. 12 If two of them are incorrect, then it is still possible to detect the occurrence of errors.

1

This is a different usage from orthogonality in the vector-space sense.

8.3 Reed–Muller Codes

387

It is similarly possible to set up multiple equations for the other second-order bits m13 , m14 , . . . , m34 . m13 = c0 + c1 + c4 + c5

m14 = c0 + c1 + c8 + c9

m13 = c2 + c3 + c6 + c7

m14 = c2 + c3 + c10 + c11

m13 = c8 + c9 + c12 + c13

m14 = c4 + c5 + c12 + c13

m13 = c10 + c11 + c14 + c15

m14 = c6 + c7 + c14 + c15

m23 = c0 + c2 + c4 + c6

m24 = c0 + c2 + c8 + c10

m23 = c1 + c3 + c5 + c7

m24 = c1 + c3 + c9 + c11

m23 = c8 + c10 + c12 + c14

m24 = c4 + c6 + c12 + c14

m23 = c9 + c11 + c13 + c15

m24 = c5 + c7 + c13 + c15

m34 m34 m34 m34

= c0 + c4 + c8 + c12 = c1 + c5 + c9 + c13 = c2 + c6 + c10 + c14 = c3 + c7 + c11 + c15

(8.12)

Based upon these equations, majority logic decisions are computed for each element of the second-order block. These decisions are then stacked up to give the block ̂ 2 = (m 𝐦 ̂ 34 , m ̂ 24 , m ̂ 14 , m ̂ 23 , m ̂ 13 , m ̂ 12 ). We then “peel off” these decoded values to get ̂ 2 G2 . 𝐫′ = 𝐫 − 𝐦 Now we repeat for the first-order bits. We have eight orthogonal check sums on each of the first-order message bits. For example, m1 = c0 + c1

m1 = c2 + c3

m1 = c4 + c5

m1 = c6 + c7

m1 = c8 + c9

m1 = c10 + c11

m1 = c12 + c13

m1 = c14 + c15

We use the bits of 𝐫 ′ to obtain eight estimates, m1(1) = r0′ + r1′

m1(2) = r2′ + r3′

m1(3) = r4′ + r5′

m1(4) = r6′ + r7′

m1(5) = r8′ + r9′

′ ′ m1(6) = r10 + r11

′ ′ m1(7) = r12 + r13

′ ′ m1(8) = r14 + r15

then make a decision using majority logic, m ̂ 1 = maj(m1(1) , m1(2) , . . . , m1(8) ). Similarly, eight orthogonal equations can be written on the bits m2 , m3 , and m4 , resulting in the estimate ̂ 1 = (m ̂ 1, m ̂ 2, m ̂ 3, m ̂ 4 ). 𝐦 ̂ 1 , we strip it off from the received signal, Having estimated 𝐦 ̂ 1 G1 𝐫 ′′ = 𝐫 ′ − 𝐦 and look for m0 . But if the previous decodings are correct, 𝐫 ′′ = m0 𝟏 + 𝐞.

388

Other Important Block Codes Then m0 is obtained simply by majority vote: ′′ m ̂ 0 = maj(r0′′ , r1′′ , . . . , r15 ).

If at any stage there is no clear majority, then a decoding failure is declared. Example 8.25 The computations described here are detailed in the indicated file. Suppose 𝐦 = (00001001000), so the codeword is 𝐜 = 𝐦G = (0101011001010110). Suppose the received vector is 𝐫 = (0101011011010110). The bit message estimates are = r0 + r1 + r2 + r3 = 0 m ̂ (1) 12

m ̂ (2) = r4 + r5 + r6 + r7 = 0 12

= r8 + r9 + r10 + r11 = 1 m ̂ (3) 12

m ̂ (4) = r12 + r13 + r14 + r15 = 0 12

We obtain m ̂ 12 = maj(0, 0, 1, 0) = 0. We similarly find m ̂ 13 = maj(0, 0, 1, 0) = 0

m ̂ 14 = maj(1, 0, 0, 0) = 0

m ̂ 23 = maj(1, 1, 0, 1) = 1

̂ 34 = maj(1, 0, 0, 0) = 0, m ̂ 24 = maj(1, 0, 0, 0) = 0 m ̂ 2 = (001000). Removing this decoded value from the received vector, we obtain so that 𝐦 ⎡𝐯3 𝐯4 ⎤ ⎢𝐯 𝐯 ⎥ ⎢ 2 4⎥ ⎢𝐯 𝐯 ⎥ ̂ 2 G2 = 𝐫 ′ − 𝐦 ̂ 2 ⎢ 2 3 ⎥ = (0101010111010101). 𝐫′ = 𝐫 − 𝐦 ⎢𝐯1 𝐯4 ⎥ ⎢𝐯1 𝐯3 ⎥ ⎢ ⎥ ⎣𝐯1 𝐯2 ⎦ At the next block, we have ̂ 2 = maj(0, 0, 0, 0, 1, 0, 0, 0) = 0 m ̂ 1 = maj(1, 1, 1, 1, 0, 1, 1, 1) = 1 m ̂ 4 = maj(1, 0, 0, 0, 0, 0, 0, 0) = 0 m ̂ 3 = maj(0, 0, 0, 0, 1, 0, 0, 0) = 0 m ̂ 2 = (0001). We now remove this decoded block so that 𝐦 ⎡𝐯4 ⎤ ⎢𝐯 ⎥ ̂ 2 G1 = 𝐫 ′ − 𝐦 ̂ 2 ⎢ 3 ⎥ = (0000000010000000). 𝐫 ′′ = 𝐫 ′ − 𝐦 ⎢𝐯2 ⎥ ⎢ ⎥ ⎣𝐯1 ⎦ α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

The majority decision is m ̂ 0 = 0. The overall decoded message is

00000

α10 α11 α12 α13 α14 α15 α16

̂ = (m ̂ 1, 𝐦 ̂ 2 ) = (00001001000). 𝐦 ̂ 0, 𝐦 This matches the message sent.



rmdecex2.m

8.3.4.2

A Geometric Viewpoint

Clearly, the key to employing majority logic decoding on an RM(r, m) code is to find a description of the equations which are orthogonal on each bit. Consider, for example, the orthogonal equations for m34 , as seen in (8.12). Writing down the indices of the checking bits, we create the check set S34 = {{0, 4, 8, 12}, {1, 5, 9, 13}, {2, 6, 10, 14}, {3, 7, 11, 15}}.

8.3 Reed–Muller Codes

389

Now represent the indices in 4-bit binary, S34 = {{(0000), (0100), (1000), (1100)}, {(0001), (0101), (1001), (1101)}, {(0010), (0110), (1010), (1110)}, {(0011), (0111), (1011), (1111)}}. Within each subset there are pairs of binary numbers which are adjacent, differing by a single bit. We can represent this adjacency with a graph that has a vertex for each of the numbers from 0000 to 1111, with edges between those vertices that are logically adjacent. The graph for a code with n = 3 is shown in Figure 8.3(a). It forms a conventional three-dimensional cube. The graph for a code with n = 4 is shown in Figure 8.3(b); it forms a four-dimensional hypercube. The check set S34 can be represented as subsets of the nodes in the graph. Figure 8.4 shows these sets by shading the “plane” defined by each of the fours check subsets. Similarly, the check sets for each the bits m12 , m13 , etc., form a set of planes.

0110

0010 110

1110

1010

111

0111

0100

010

011 0000

101

1011 1000

001

000

1100

0011

100

0101

1101

1001

0001

(a)

1111

(b)

Figure 8.3: Binary adjacency relationships in (a) three and (b) four dimensions.

0110

0010

1010

1110

0111

0100 T

0011 0000

1100 1011

1000

S

1111

0001

0101

1101

1001

Figure 8.4: Planes shaded to represent the equations orthogonal on bit m34 .

390

Other Important Block Codes With these observations, let us now develop the notation to describe the general situation. For a codeword 𝐜 = (c0 , c1 , . . . , cn−1 ), let the coordinate ci be associated with the binary m-tuple Pi obtained by complementing the binary representation of the index i. For example, c0 is associated with P0 = (1111) (since 0 = (0000)2 ) and c6 is associated with P6 = (1001) (since 6 = (0110)2 ). We think of the Pi as points on the adjacency graph such as those shown in Figure 8.3. Each codeword 𝐜 in RM(r, m) forms an incidence vector for a subset of the graph, selecting points in the graph corresponding to 1 bits in the codeword. For example, the codeword 𝐜 = (0101011001010110) is an incidence vector for the subset containing the points {P1 , P3 , P5 , P6 , P9 , P11 , P13 , P14 }. Let I = {1, 2, . . . , m}. We represent the basis vectors for the RM(r, m) code as subsets of I. For example, the basis vector 𝐯1 is represented by the set {1}. The basis vector 𝐯2 𝐯3 is represented by the set {2, 3}. The basis vector 𝐯2 𝐯3 𝐯4 is represented by {2, 3, 4}. With this notation, we now define the procedure for finding the orthogonal check sums for the vector 𝐯i1 𝐯i2 · · · 𝐯ip [483, p. 160]. 1. Let S = {S1 , S2 , . . . , S2m−p } be the subset of points associated with the incidence vector 𝐯i1 𝐯i2 · · · 𝐯ip . 2. Let {j1 , j2 , . . . , jm−p } be the set difference I − {i1 , i2 , . . . , ip }. Let T be the subset of points associated with the incidence vector 𝐯j1 𝐯j2 · · · 𝐯jm−p . The set T is called the complementary subspace to S. 3. The first check sum consists of the sum of the coordinates specified by the points in T. 4. The other check sums are obtained by “translating” the set T by the points in S. That is, for each Si ∈ S, we form the set T + Si . The corresponding check sum consists of the sum of the coordinates specified by this set.

Example 8.26

Checksums for RM(2, 4). Let us find checksums for 𝐯3 𝐯4 = (0000000000001111).

1. The subset for which 𝐯3 𝐯4 is an incidence vector contains the points S = {P12 , P13 , P14 , P15 } = {(0011)(0010)(0001)(0000)}. In Figure 8.4, the set S is indicated by the dashed lines. 2. The difference set is {j1 , j2 } = {1, 2, 3, 4} − {3, 4} = {1, 2}, which has the associated vector 𝐯1 𝐯2 = (0001000100010001). This is the incidence vector for the set T = {P3 , P7 , P11 , P15 } = {(1100)(1000)(0100)(0000)}. In Figure 8.4, the set T is the darkest of the shaded regions. 3. T represents the checksum m34 = c12 + c8 + c4 + c0 . 4. The translations of T by the nonzero elements of S are: by P12 = (0011) → {(1111)(1011)(0111)(0011)} = {P0 , P4 , P8 , P12 } by P13 = (0010) → {(1110)(1010)(0110)(0010)} = {P1 , P5 , P9 , P13 } by P14 = (0001) → {(1101)(1001)(0101)(0001)} = {P2 , P6 , P10 , P14 }. These correspond to the checksums m34 = c15 + c11 + c7 + c3 m34 = c14 + c10 + c6 + c2 m34 = c13 + c9 + c5 + c1 , which are shown in the figure as shaded planes.

8.3 Reed–Muller Codes

391

Figure 8.5 indicates the check equations for all of the second-order vectors for the RM(2, 4) code.

0110

0010

0110

1110 0010

1010

0111

0100

1000

0101

1101

0000

1001

0001

0010

0010 0111

0100

0000

0101

1101

1001

0001

(d)

0000

0111

0100

0101

1101

0000

1001

(e)

1111 1100

0011

1011

0001

1110

1010

1111 1100

1000

1101

1001

0110

0010 0111

0011

0101

(c)

0100

1011 1000

1000

0001

1110

1010

1111 1100

0011

0000

1001

0110

1110

1010

1101

1011

(b)

0110

1111 1100

0011

0101

0001

(a)

0111

0100

1011 1000

1110

1010

1111 1100

0011

1011

0000

0111

0100

1100

0011

0010

1010

1111

0110

1110

1011 1000

0101

1101

1001

0001

(f)

Figure 8.5: Geometric descriptions of parity check equations for second-order vectors of the RM(2, 4) code. (a) 𝐯3 𝐯4 ; (b) 𝐯2 𝐯4 ; (c) 𝐯2 𝐯3 ; (d) 𝐯1 𝐯4 ; (e) 𝐯1 𝐯3 ; (f) 𝐯1 𝐯2 .

Now let us examine check sums for the first-order vectors. 1. For the vector 𝐯4 = (0000000011111111) the set S is S = {P8 , P9 , P10 , P11 , P12 , P13 , P14 , P15 } = {(0111), (0110), (0101), (0100), (0011), (0010), (0001), (0000)}. These eight points are connected by the dashed lines in Figure 8.6(a). 2. The difference set is {1, 2, 3, 4} − {4} = {1, 2, 3}, which has the associated vector 𝐯1 𝐯2 𝐯3 = (0000000100000001), which is the incidence vector for the set T = {P7 , P15 } = {(1000), (0000)}. The corresponding equation is m4 = c8 + c0 . The subset is indicated by the widest line in Figure 8.6(a). 3. There are eight translations of T by the points in S. These are shown by the other wide lines in the figure. ◽

392

Other Important Block Codes 0110

0010

0110

1110 0010

1010

0111

0100

1000

1101

0000

1011 1000

1001

0010 0111

0100

0000

0111

0100

0101

1101

0000

1001

1011 1000

0101

1101

1001

0001

(c)

1111 1100

0011

1011

0001

1110

1010

1111 1100

1000

1001

0110

1110

0011

1101

(b)

0110

1010

0101

0001

(a)

0010

1111 1100

0011

0101

0001

0111

0100

1011

0000

1010

1111 1100

0011

1110

(d)

Figure 8.6: Geometric descriptions of parity check equations for first-order vectors of the RM(2, 4) code. (a) 𝐯4 ; (b) 𝐯3 ; (c) 𝐯2 ; (d) 𝐯1 . 8.3.5

Other Constructions of Reed–Muller Codes

The |𝐮|𝐮 + 𝐯| Construction The |𝐮|𝐮 + 𝐯| introduced in Exercise 3.29 may be used to construct Reed–Muller codes. In fact, RM(r, m) = {[𝐮|𝐮 + 𝐯] ∶ 𝐮 ∈ RM(r, m − 1), 𝐯 ∈ RM(r − 1, m − 1)} having generator matrix ] [ GRM (r, m − 1) GRM (r, m − 1) . GRM (r, m) = 0 GRM (r − 1, m − 1) [

A Kronecker Construction Let G(2,2)

] 1 1 = . Define the m-fold Kronecker product of G(2,2) as 0 1

G(2m ,2m ) = G(2,2) ⊗ G(2,2) ⊗ · · · ⊗ G(2,2) , ⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟ m operands

8.4 Building Long Codes from Short Codes: The Squaring Construction which is a 2m × 2m matrix. Then the generator for the RM(r, m) code is obtained by selecting from G(2m ,2m ) those rows with weight greater than or equal to 2m−r .

8.4

Building Long Codes from Short Codes: The Squaring Construction

There are several ways of combining short codes together to obtain codes with different properties. Among these are the [𝐮|𝐮 + 𝐯] construction (outlined in Exercise 3.29) and concatenation (described in Section 10.6). In this section, we present another one, called the squaring construction [272, 292]. We begin by examining partitions of codes into cosets by subcodes. Let 0 =  be a binary linear (n, k0 ) block code with generator G and let 1 ⊂ 0 be a (n, k1 ) subcode of 0 . That is, 1 is a subgroup of 0 . Recall that a coset of 1 is a set of the form 𝐜l + 1 = {𝐜l + 𝐜 ∶ 𝐜 ∈ 1 }, where 𝐜l ∈ 0 is a coset leader. We will take the nonzero coset leaders in ∖1 . From Section 2.2.5, recall that 0 ∕1 forms a factor group, partitioning 0 into 2k0 −k1 disjoint subsets each containing 2k1 codewords. Each of these subsets can be represented by a coset leader. The set of coset leaders is called the coset representative space. The coset representative for the coset 1 is always chosen to be 0. Denote this coset representative space by [0 ∕1 ]. The code 1 and the set [∕1 ] share only the zero vector in common, 1 ∩ [∕1 ] = 0. Without loss of generality, let G0 = G be expressed in a form that k1 rows of G0 can be selected as a generator G1 for 1 . The 2k0 −k1 codewords generated by the remaining k0 − k1 rows of G0 ∖G1 can be used as to generate representatives for the cosets in ∕1 . Let G0∖1 = G0 ∖G1 (that is, the set difference, thinking of the rows as individual elements of the set). The 2k0 −k1 codewords generated by G0∖1 form a (n, k − k1 ) subcode of 0 . Every codeword in  can be expressed as the sum of a codeword in 1 and a vector in [0 ∕1 ]. We denote this as 0 = 1 ⊕ [0 ∕1 ] = {𝐮 + 𝐯 ∶ 𝐮 ∈ 1 , 𝐯 ∈ [0 ∕1 ]}. The set-operand sum ⊕ is called the direct sum. Example 8.27 While the squaring construction can be applied to any linear block code, we demonstrate it here for a Reed–Muller code. Consider the RM(1, 3) code with ⎡1 ⎢0 G = G0 = ⎢ 0 ⎢ ⎣0

1 0 0 1

1 1 0 0

1 1 0 1

1 0 1 0

1 0 1 1

1 1 1 0

1⎤ 1⎥ . 1⎥ ⎥ 1⎦

Let 1 be the (8, 3) code generated by the first k1 = 3 rows of the generator G0 , ⎡1 1 1 1 1 1 1 1⎤ G1 = ⎢0 0 1 1 0 0 1 1⎥ . ⎢ ⎥ ⎣0 0 0 0 1 1 1 1⎦ The cosets in ∕1 are

[0, 0, 0, 0, 0, 0, 0, 0] + 1 , [0, 1, 0, 1, 0, 1, 0, 1] + 1 .

The coset representatives are [0 ∕1 ] = {[0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 1, 0, 1, 0, 1]}

393

394

Other Important Block Codes generated by the rows of the matrix G0∖1 = G0 ∖G1 = [0, 1, 0, 1, 0, 1, 0, 1].



One-level squaring is based on 1 and the partition 0 ∕1 . Let |0 ∕1 |2 denote the code ℭ0∕1 of length 2n obtained by the squaring construction, defined as ℭ0∕1 = |0 ∕1 |2 = {(𝐚 + 𝐱, 𝐛 + 𝐱) ∶ 𝐚, 𝐛 ∈ 1 and 𝐱 ∈ [0 ∕1 ]}.

(8.13)

Since there are 2k0 −k1 vectors in [0 ∕1 ] and 2k1 choices each for 𝐚 and 𝐛, there are 2k0 −k1 2k1 2k1 = 2k0 +k1 codewords in ℭ0∕1 . The code ℭ0∕1 is thus a (n, k0 + k1 ) code. Let 𝐦 = [m1,0 , m1,1 , . . . , m1,k1 −1 , m2,0 , m2,1 , . . . , m2,k1 −1 , m3,0 , m3,1 , . . . , m3,k0 −k1 −1 ] be a message vector. A coded message of the form 𝐜 = (𝐚 + 𝐱, 𝐛 + 𝐱) from (8.13) can be obtained by 0 ⎤ ⎡ G1 G1 ⎥ ≜ 𝐦𝔊0∕1 𝐜 = 𝐦⎢ 0 ⎢ ⎥ ⎣G0∖1 G0∖1 ⎦ so the matrix 𝔊0∕1 is the generator for the code. The minimum weight for the code is d0∕1 = min(2d0 , d1 ). We can express the generator matrix 𝔊0∕1 in the following way. For two matrices M1 and M2 having the same number of columns, let M1 ⊕ M2 denote the stacking operation [ ] M1 M1 ⊕ M2 = . M2 This is called the matrix direct sum. Let I2 be the 2 × 2 identity. Then [ ] G1 0 I2 ⊗ G1 = , 0 G1 where ⊗ is the Kronecker product. We also have [ ] [ ] 1 1 ⊗ G0∖1 = G0∖1 G0∖1 . Then we can write

Example 8.28

[ ] 𝔊0∕1 = I2 ⊗ G1 ⊕ 1 1 ⊗ G0∖1 .

Continuing the previous example, the generator for the code |0 ∕1 |2 is

𝔊0∕1

⎡ ⎢ ⎢ ⎢ ⎢ =⎢ ⎢ ⎢ ⎢ ⎢ ⎣

1 0 0

1 0 0

1 1 0

1 1 0

1 0 1

1 0 1

1 1 1

1 1 1

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

1 0 0

1 0 0

1 1 0

1 1 0

1 0 1

1 0 1

1 1 1

1 1 1

0

1

0

1

0

1

0

1

0

1

0

1

0

1

0

1

⎤ ⎥ ⎥ ⎥ ⎥ ⎥. ⎥ ⎥ ⎥ ⎥ ⎦



We can further partition the cosets as follows. Let 2 be a (n, k2 ) subcode of 1 with generator G2 , with 0 ≤ k2 ≤ k1 . Then each of the 2k0 −k1 cosets 𝐜l + 1 in the partition 0 ∕1 can be partitioned into 2k1 −k2 cosets consisting of the following codewords 𝐜l + 𝐝p + 2 = {𝐜l + 𝐝p + 𝐜 ∶ 𝐜 ∈ 2 }

8.4 Building Long Codes from Short Codes: The Squaring Construction

395

for each l in 0, 1, 2, . . . , 2k−k1 and each p in 1, 2, . . . , 2k1 −k2 , where 𝐝p is a codeword in 1 but not in 2 . This partition is denoted as ∕1 ∕2 . We can express the entire code as the direct sum 0 = [∕1 ] ⊕ [1 ∕2 ] ⊕ 2 . Let G1∖2 denote the generator matrix for the coset representative space [1 ∕2 ]. Then G1∖2 = G1 ∖G2 . Example 8.29

Let 2 be generated by the first two rows of G1 , so [ 1 1 1 1 1 1 1 G2 = 0 0 1 1 0 0 1

] 1 . 1

There are two cosets in 1 ∕2 , [0, 0, 0, 0, 0, 0, 0, 0] + 2 , [0, 0, 0, 0, 1, 1, 1, 1] + 2 . The set of coset representatives [1 ∕2 ] is generated by [ G1∖2 = G1 ∖G2 = 0 0

0

0

1

1

1

] 1 .



Two-level squaring begins by forming the two one-level squaring construction codes ℭ0∕1 = |0 ∕1 |2 and ℭ1∕2 = |1 ∕2 |2 , with generators 𝔊0∕1 and 𝔊1∕2 , respectively, given by ⎡ G1 𝔊0∕1 = ⎢ 0 ⎢ ⎣G0∖1

0 ⎤ G1 ⎥ ⎥ G0∖1 ⎦

⎡ G2 𝔊1∕2 = ⎢ 0 ⎢ ⎣G1∖2

0 ⎤ G2 ⎥ . ⎥ G1∖2 ⎦

(8.14)

Note that ℭ1∕2 is a subcode (subgroup) of ℭ0∕1 . The coset representatives for ℭ0∕1 ∕ℭ1∕2 , which are denoted by [ℭ0∕1 ∕ℭ1∕2 ], form a linear code. Let 𝔊ℭ0∕1 ∖ℭ1∕2 denote the generator matrix for the coset representatives [ℭ0∕1 ∕ℭ1∕2 ]. Then form the code ℭ0∕1∕2 = |0 ∕1 ∕2 |4 by ℭ0∕1∕2 = |0 ∕1 ∕2 |4 = {(𝐚 + 𝐱, 𝐛 + 𝐱) ∶ 𝐚, 𝐛 ∈ ℭ1∕2 and 𝐱 ∈ [ℭ0∕1 ∕ℭ1∕2 ]}. That is, it is obtained by the squaring construction of ℭ0∕1 and ℭ0∕1 ∕ℭ1∕2 . The generator matrix for ℭ0∕1∕2 is 0 ⎤ ⎡ 𝔊1∕2 0 𝔊1∕2 ⎥ . 𝔊0∕1∕2 = ⎢ ⎥ ⎢𝔊 ⎣ ℭ0∕1 ∖ℭ1∕2 𝔊ℭ0∕1 ∖ℭ1∕2 ⎦ This gives a (4n, k0 + 2k1 + k2 ) linear block code with minimum distance d0∕1∕2 = min(4d0 , 2d1 , d2 ). Writing 𝔊0∕1 and 𝔊1∕2 as in (8.14), rearranging rows and columns and performing some simple row operations, we can write 𝔊0∕1∕2 as

𝔊0∕1∕2

⎡ G2 ⎢ 0 ⎢ ⎢ 0 ⎢ 0 =⎢ ⎢G0∖1 ⎢G ⎢ 1∖2 ⎢ 0 ⎢ ⎣ 0

0 G2 0 0 G0∖1 G1∖2 0 G1∖2

0 0 G2 0 G0∖1 G1∖2 G1∖2 0

0 ⎤ 0 ⎥ ⎥ 0 ⎥ G2 ⎥ ⎥, G0∖1 ⎥ G1∖2 ⎥⎥ G1∖2 ⎥ ⎥ G1∖2 ⎦

396

Other Important Block Codes which can be expressed as [ 𝔊0∕1∕2 = I4 ⊗ G2 ⊕ 1 1 1 1

]

⎡1 1 1 1⎤ ⊗ G0∖1 ⊕ ⎢0 0 1 1⎥ ⊗ G1∖2 . ⎢ ⎥ ⎣0 1 0 1⎦

Note that [

1

1

] 1

1

and

⎡1 1 1 1⎤ ⎢0 0 1 1⎥ ⎢ ⎥ ⎣0 1 0 1⎦

are the generator matrices for the zeroth- and first-order Reed–Muller codes of length 4. More generally, let 1 , 2 , . . . , m be a sequence of linear subcodes of  = 0 with generators Gi , minimum distance di , and dimensions k1 , k2 , . . . , km satisfying C0 ⊇ C1 ⊇ · · · ⊇ Cm k ≥ k1 ≥ · · · ≥ km ≥ 0. Then form the chain of partitions 0 ∕1 , 0 ∕1 ∕2 , . . . , 0 ∕1 ∕ · · · ∕m , such that the code can be expressed as the direct sum 0 = [∕1 ] ⊕ [1 ∕2 ] ⊕ · · · ⊕ [q−1 ∕m ]. Assume that the generator matrix is represented in a way that G0 ⊇ G1 · · · ⊇ Gm . Let Gi∖i+1 denote the generator matrix for the coset representative space [i ∕i+1 ], with rank(Gi∖i+1 ) = rank(Gi ) − rank(Gi+1 ) and Gi∖i+1 = Gi ∖Gi+1 . Then higher-level squaring is performed recursively. From the codes 𝔊0∕1∕···∕m−1 ≜ |0 ∕1 ∕ · · · ∕m−1 |2

m−1

and the code 𝔊1∕2∕···∕m ≜ |1 ∕2 ∕ · · · ∕m |2

m−1

,

form the code m

𝔊0∕1∕···∕m ≜ [0 ∕1 ∕ · · · ∕m ]2

= {(𝐚 + 𝐱, 𝐛 + 𝐱) ∶ 𝐚, 𝐛 ∈ 𝔊1∕2∕···∕m , 𝐱 ∈ [𝔊0∕1∕···∕m−1 ∕𝔊1∕2∕···∕m ]}. The generator matrix can be written (after appropriate rearrangement of the rows) as 𝔊0∕1∕ . . . ∕m = I2m ⊗ Gm ⊕

m ∑

⊕GRM (r, m) ⊗ Gr∖r+1 ,

r=0

where GRM (r, m) is the generator of the RM(r, m) code of length 2m .

8.5 Quadratic Residue Codes

8.5

397

Quadratic Residue Codes

Quadratic residue codes are codes of length p, where p is a prime with coefficients in GF(s), where s is a quadratic residue of p. They have rather good distance properties, being among the best codes known of their size and dimension. We begin the construction with the following notation. Let p be prime. Denote the set of quadratic residues of p by Qp and the set of quadratic nonresidues by Np . Then the elements in GF(p) are partitioned into sets as GF(p) = Qp ∪ Np ∪ {0}. As we have seen GF(p) is cyclic. This gives rise to the following observation: Lemma 8.30 A primitive element of GF(p) must be a quadratic nonresidue. That is, it is in Np . Proof Let 𝛾 be a primitive element of GF(p). We know 𝛾 p−1 = 1, and p − 1 is the smallest such power. Suppose 𝛾 is a quadratic residue. Then there is a number 𝜎 (square root) such that 𝜎 2 = 𝛾. Taking powers of 𝜎, we have 𝜎 2(p−1) = 1. Furthermore, the powers 𝜎, 𝜎 2 , 𝜎 3 , . . . , 𝜎 2(p−1) can be shown to all be distinct. But this contradicts the order p of the field. ◽ So a primitive element 𝛾 ∈ GF(p) satisfies 𝛾 e ∈ Qp if and only if e is even, and 𝛾 e ∈ Np if and only if e is odd. The elements of Qp correspond to the first (p − 1)∕2 consecutive powers of 𝛾 2 ; that is, Qp is a cyclic group under multiplication modulo p, generated by 𝛾 2 . The quadratic residue codes are designed as follows. Choose a field GF(s) as the field for the coefficients, where s is a quadratic residue modulo p. We choose an extension field GF(sm ) so that it has a primitive pth root of unity; from Lemma 5.54 we must have p|sm − 1. (It can be shown [292, p. 519] that if s = 2, then p must be of the form p = 8k ± 1.) Let 𝛽 be a primitive pth root of unity in GF(sm ). Then the conjugates with respect to GF(s) are 2

3

𝛽 1, 𝛽 s, 𝛽 s , 𝛽 s , . . . The cyclotomic coset is {1, s, s2 , s3 , . . . , }. Since s ∈ Qp and Qp is a group under multiplication modulo p, Qp is closed under multiplication by s. So all of the elements in the cyclotomic coset are in Qp . Thus, Qp is a cyclotomic coset or the union of cyclotomic cosets. Example 8.31 Let p = 11, which has quadratic residues Qp = {1, 3, 4, 5, 9}. Let s = 3. A field having a primitive 11th root of unity is GF(35 ). Let 𝛽 ∈ GF(35 ) be a primitive 11th root of unity. The conjugates of 𝛽 are: 𝛽, 𝛽 3 , 𝛽 9 , 𝛽 27 = 𝛽 5 , 𝛽 81 = 𝛽 4 , so the cyclotomic coset is {1, 3, 9, 5, 4}, ◽

which is identical to Qp .

Now let 𝛽 be a primitive pth root of unity in GF(sm ). Because of the results above, ∏ q(x) = (x − 𝛽 i ) i∈Qp

is a polynomial with coefficients in GF(s). Furthermore, ∏ (x − 𝛽 i ) n(x) = i∈Np

(8.15)

398

Other Important Block Codes also has coefficients in GF(s). We thus have the factorization xp − 1 = q(x)n(x)(x − 1). Let R be the ring GF(s)[x]∕(xp − 1). Definition 8.32 [292, p. 481] For a prime p, the quadratic residue codes of length Q, Q, N, and N are the cyclic codes (or ideals of R) with generator polynomials q(x),

(x − 1)q(x),

n(x),

(x − 1)n(x),

respectively. The codes Q and N have dimension 12 (p + 1); the codes Q and N have dimension 12 (p − 1).

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

testQR.cc

The codes Q and N are sometimes called augmented QR codes, while Q and N are called expurgated QR codes. ◽

Example 8.33 Let p = 17 and s = 2. The field GF(28 ) has a primitive 17th root of unity, which is 𝛽 = 𝛼 15 . The quadratic residues modulo 17 are {1, 2, 4, 8, 9, 13, 15, 16}. Then q(x) = (x − 𝛽)(x − 𝛽 2 )(x − 𝛽 4 )(x − 𝛽 8 )(x − 𝛽 9 )(x − 𝛽 13 )(x − 𝛽 15 )(x − 𝛽 16 ) = 1 + x + x2 + x4 + x6 + x7 + x8 n(x) = (x − 𝛽 3 )(x − 𝛽 5 )(x − 𝛽 6 )(x − 𝛽 7 )(x − 𝛽 10 )(x − 𝛽 11 )(x − 𝛽 12 )(x − 𝛽 14 ) = 1 + x3 + x4 + x5 + x8 .



QR codes tend to have rather good distance properties. Some binary QR codes are the best codes known for their particular values of n and k. A bound on the distance is provided by the following. Theorem 8.34 The minimum distance d of the codes Q or N satisfies d2 ≥ p. If, additionally, p = 4l − 1 for some l, then d2 − d + 1 ≥ p. The proof relies on the following lemma. Lemma 8.35 Let q̃ (x) = q(xn ), where n ∈ Np (where the operations are in the ring R). Then the roots of q̃ (x) are in the set {𝛼 i , i ∈ Np }. That is, q̃ (x) is a scalar multiple of n(x). Similarly, n(xn ) is a scalar multiple of q(x). Proof Let 𝜌 be a generator of the nonzero elements of GF(p). From the discussion around Lemma 8.30, Q is generated by even powers of 𝜌 and Np is generated by odd powers of 𝜌. ∏ Write q̃ (x) = i∈Qp (xn − 𝛼 i ). Let m ∈ Np . Then for any m ∈ Np , q̃ (𝛼 m ) =



(𝛼 mn − 𝛼 i ).

i∈Qp

But since m ∈ Np and n ∈ Np , mn ∈ Qp (being both odd powers of 𝜌). So 𝛼 i = 𝛼 mn for some value of i, so 𝛼 m is a root of q̃ (x). ◽ The effect of evaluation at q(xn ) is to permute the coefficients of q(x). Proof of Theorem 8.34. [292, p. 483]. Let a(x) be a codeword of minimum nonzero weight d in Q. Then by Lemma 8.35, the polynomial ã(x) = a(xn ) is a codeword in N. Since the coefficients of ã(x)

8.6 Golay Codes

399 Table 8.1: Extended Quadratic Residue Codes Q [292, 483]

n

k

d

n

k

d

n

k

d

8 18 24 32 42 48 72

4 9 12 16 21 24 36

4* 6* 8* 8* 10* 12* 12

74 80 90 98 104 114 128

37 40 45 49 52 57 64

14 16* 18* 16 20* 12–16 20

138 152 168 192 194 200

69 76 84 96 97 100

14–22 20 16–24 16–28 16–28 16–32

*Indicates that the code is as good as the best known for this n and k.

are simply a permutation (and possible scaling) of those of a(x), ã(x) must be a codeword of minimum weight in N. The product a(x)ã(x) must be a multiple of the polynomial ∏ q∈Qp

(x − 𝛼 q )

∏ n∈Np

(x − 𝛼 n ) =

p−1 p−1 ∏ xp − 1 ∑ j (x − 𝛼 i ) = x. = x−1 i=1 j=0

Thus, a(x)ã(x) has weight p. Since a(x) has weight d, the maximum weight of a(x)ã(x) is d2 . We obtain the bound d2 ≥ p. If p = 4k − 1 then n = −1 is a quadratic nonresidue. In the product a(x)ã(x) = a(x)a(x−1 ) there are ◽ d terms equal to 1, so the maximum weight of the product is d2 − d + 1. Table 8.1 summarizes known distance properties for some augmented binary QR codes, with indications of best known codes. In some cases, d is expressed in terms of upper and lower bounds. While general decoding techniques have been developed for QR codes, we present only a decoding algorithm for a particular QR code, the Golay code presented in the next section. Decoding algorithms for other QR codes are discussed in [292, 366, 365], and [108].

8.6

Golay Codes

Of these codes it was said, “The Golay code is probably the most important of all codes, for both practical and theoretical reasons.” [292, p. 64]. While the Golay codes have not supported the burden of applications this alleged importance would suggest, they do lie at the confluence of several routes of theoretical development and are worth studying. Let us take p = 23 and form the binary QR code. The field GF(211 ) has a primitive 23rd root of unity. The quadratic residues are Qp = {1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18} and the corresponding generators for Q and N are ∏ q(x) = (x − 𝛽 i ) = 1 + x + x5 + x6 + x7 + x9 + x11 i∈Qp

n(x) = 1 + x2 + x4 + x5 + x6 + x10 + x11 . This produces a (23,12,7) code, the Golay code G23 . It is straightforward to verify that this code is a perfect code: the number of points out to a distance t = 3 is equal to ( ) ( ) ( ) ( ) 23 23 23 23 V2 (23, 3) = + + + = 211 . 0 1 2 3

400

Other Important Block Codes This satisfies the Hamming bound. It is said that Golay discovered this code by observing this relationship in Pascal’s triangle. Adding an extra overall parity bit to the (23, 12) code produces the Golay G24 (24,12,8) code. The binary Golay code also has some interesting generator matrix representations. The generator matrix for the G24 code can be written as

(8.16) We note the following facts about G24 .



In this representation, the 11 × 11 AT11 matrix on the upper right is obtained from the transpose of 12 × 12 Hadamard matrix of Paley type (8.5) by removing the first row and column of H12 , then replacing −1 by 1 and 1 by 0. Since the rows of H12 differ in six places, the rows of AT11 differ by six places. Because of the identity block, the sum of any two rows of G has weight 8.



If 𝐮 and 𝐯 are rows of G (not necessarily distinct), then wt(𝐮 ⋅ 𝐯) ≡ 0 (mod 2). So every row of G is orthogonal to every other row. Therefore, G is also the parity check matrix H of G24 . Also G is dual to itself: G24 = G⟂ 24 . Such a code is call self-dual.



Every codeword has even weight: If there were a codeword 𝐮 of odd weight then wt(𝐮 ⋅ 𝐮) = 1. Furthermore, since every row of the generator has weight divisible by 4, every codeword is even.



The weight distributions for the (23, 12) and the (24, 12) codes are shown in Table 8.2.

8.6.1

Decoding the Golay Code

We present here two decoding algorithms for the Golay code. The first decoder, due to [108], is algebraic and is similar in spirit to the decoding algorithms used for BCH and Reed–Solomon codes. The second decoder is arithmetic, being similar in spirit to the Hamming decoder presented in Section 1.9.1. Table 8.2: Weight Distributions for the G23 and G24 Codes [483] G23 :

i: 0 Ai : 1

G24 :

7 253

i: 0 Ai : 1

8 506

8 759

11 1288

12 2576

12 1288

16 759

15 506

24 1

16 253

23 1

8.6 Golay Codes 8.6.1.1

401

Algebraic Decoding of the G23 Golay Code

The algebraic decoder works similar to those we have seen for BCH codes. An algebraic syndrome is first computed, which is used to construct an error-locator polynomial. The roots of the error-locator polynomial determine the error locations, which for a binary code is sufficient for the decoding. Having minimum distance 7, G23 is capable of correcting up to three errors. Let 𝛽 be a primitive 23rd root of unity in GF(211 ). Recall that the quadratic residues modulo 23 are Qp = {1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18} and the generator polynomial is ∏ g(x) = (x − 𝛽 i ). i∈Qp

Thus, 𝛽, 𝛽 3 , and 𝛽 9 are all roots of g(x), and hence of any codeword c(x) = m(x)g(x). Let c(x) be the transmitted codeword, and let r(x) = c(x) + e(x) be the received polynomial. We define the syndrome as si = r(𝛽 i ) = e(𝛽 i ). If there are no errors, then si = 0 for i ∈ Qp . Thus, for example, if s1 = s3 = s9 = 0, no errors are detected. If there is a single error, e(x) = xj1 , then s1 = 𝛽 j1 ,

s3 = 𝛽 3j1 ,

s9 = 𝛽 9j1 .

When this condition is detected, single-error correction can proceed. Suppose there are two or three errors, e(x) = xj1 + xj2 + xj3 . Let z1 = 𝛽 j1 , z2 = 𝛽 j2 , and z3 = 𝛽 j3 be the error locators, where z3 = 0 in the case of only two errors. The syndromes in this case are si = zi1 + zi2 + zi3 . Define the error-locator polynomial as L(x) = (x − z1 )(x − z2 )(x − z3 ) = x3 + 𝜎1 x2 + 𝜎2 x + 𝜎3 , where, by polynomial multiplication, 𝜎1 = z1 + z2 + z3 𝜎2 = z1 z2 + z1 z3 + z2 z3 𝜎3 = z1 z2 z3 . The problem now is to compute the coefficients of the error-locator polynomial using the syndrome values. By substitution of the definitions, it can be shown that s9 + s91 = 𝜎2 s7 + 𝜎3 s23

s7 = s1 s23 + 𝜎2 s5 + 𝜎3 s41

s5 = s51 + 𝜎2 s3 + 𝜎3 s21

s31 + s3 = 𝜎3 + 𝜎2 s1 .

By application of these equivalences (see golaysimp.m), it can be shown that D ≜ (s31 + s3 )2 +

s91 + s9 s31 + s3

= (𝜎2 + s21 )3 .

(8.17)

The quantity D thus has a cube root in GF(211 ). From (8.17), we obtain 𝜎2 = s21 + D1∕3 ; similarly for 𝜎3 . Combining these results, we obtain the following equations: 𝜎1 = s1

𝜎2 =

s21

1∕3

+D

𝜎3 = s3 + s1 D

An example of the decoder is shown in testGolay.cc.

1∕3

.

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

golaysimp.m

402

Other Important Block Codes

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

testGolay.cc

8.6.1.2

Arithmetic Decoding of the G24 Code

In this section, we present an arithmetic coder, which uses the weight structure of the syndrome to determine the error patterns. Besides the generator matrix representation of (8.16), it is convenient to employ a systematic generator. The generator matrix can be written in the form ⎡ 1 ⎢ 1 ⎢ 1 ⎢ 1 ⎢ ⎢ ⎢ G=⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ ] [ = I12 B .

1 1 1 1 1 1 1 1

1 1 1 1 1 1 1 1 1 1 1

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

1 ⎤ ⎥ ⎥ 1 ⎥ 1 ⎥ ⎥ ⎥ 1 ⎥ 1 ⎥ 1 ⎥ ⎥ ⎥ ⎥ ⎥ 1 ⎥⎦

It may be observed that B is orthogonal, BT B = I. Let 𝐫 = 𝐜 + 𝐞 and let 𝐞 = (𝐱, 𝐲), where 𝐱 and 𝐲 are each vectors of length 12. Since the code is capable of correcting up to three errors, there are only a few possible weight distributions of 𝐱 and 𝐲 to consider: wt(𝐱) ≤ 3 wt(𝐲) = 0 wt(𝐱) ≤ 2 wt(𝐲) = 1 wt(𝐱) ≤ 1 wt(𝐲) = 2 wt(𝐱) = 0 wt(𝐲) = 3. Since the code is self-dual, the generator matrix is also the parity check matrix. We can compute a syndrome by 𝐬 = G𝐫 T = G(𝐞T ) = G[𝐱, 𝐲]T = 𝐱T + B𝐲T . If 𝐲 = 0, then 𝐬 = 𝐱T . If 𝐬 has weight ≤ 3, we conclude that 𝐲 = 0. The error pattern is 𝐞 = (𝐱, 0) = (𝐬T , 0). Suppose now that wt(𝐲) = 1, where the error is in the ith coordinate of 𝐲 and that wt(𝐱) ≤ 2. The syndrome in this case is 𝐬 = 𝐱T + 𝐛i , where 𝐛i is the ith column of B. The position i is found by identifying the position such that wt(𝐬 + 𝐛i ) = wt(𝐱) ≤ 2. Having thus identified i, the error pattern is 𝐞 = ((𝐬 + 𝐛i )T , 𝐲i ). Here, the notation 𝐲i is the vector of length 12 having a 1 in position i and zeros elsewhere. If wt(𝐱) = 0 and wt(𝐲) = 2 or 3, then 𝐬 = 𝐛i + 𝐛j or 𝐬 = 𝐛i + 𝐛j + 𝐛k . Since B is an orthogonal matrix, BT 𝐬 = BT (B𝐲T ) = 𝐲T . The error pattern is 𝐞 = (0, (BT 𝐬)T ).

8.7 Exercises

403

Finally, if wt(𝐱) = 1 and wt(𝐲) = 2, let the nonzero coordinate of 𝐱 be at index i. Then BT 𝐬 = BT (𝐱T + B𝐲T ) = BT 𝐱T + BT B𝐲T = 𝐫iT + 𝐲T , where 𝐫i is the ith row of B. The error pattern is 𝐞 = (𝐱i + 𝐫i ). Combining all these cases together, we obtain the following decoding algorithm.

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

, (BT 𝐬)T

α10 α11 α12 α13 α14 α15 α16

golayarith.m

Algorithm 8.2 Arithmetic Decoding of the Golay 24 Code (This presentation is due to Wicker [483]) 1 Input: 𝐫 = 𝐞 + 𝐜, the received vector 2 Output: 𝐜, the decoded vector 3 Compute 𝐬 = G𝐫 (compute the syndrome) 4 if wt(𝐬) ≤ 3 5 𝐞 = (𝐬T , 0) 6 else if wt(𝐬 + 𝐛 ) ≤ 2 for some column vector 𝐛 i i 7 𝐞 = ((𝐬 + 𝐛i )T , 𝐲i ) 8 else 9 Compute BT 𝐬 10 if wt(BT 𝐬) ≤ 3 11 𝐞 = (0, (BT 𝐬)T ) 12 else if wt(BT 𝐬 + 𝐫iT ) ≤ 2 for some row vector 𝐫i 13 𝐞 = (𝐱i , (BT 𝐬)T + 𝐫i ) 14 else 15 Too many errors: declare uncorrectable error pattern and stop. 16 end 17 end 18 𝐜 = 𝐫 + 𝐞

A MATLAB implementation that may be used to generate examples is in golayarith.m

8.7 8.1 8.2

8.3 8.4 8.5

Exercises Verify items 1, 9, and 10 of Theorem 8.3. Show that [ ] 1 1 ⊗ Hn 1 −1

and

Hn ⊗

[ 1 1

] 1 −1

are Hadamard matrices. Prove the first two parts of Theorem 8.8. Compute the quadratic residues modulo 19. Compute the values of the Legendre symbol 𝜒 19 (x) for x = 1, 2, . . . , 18. The Legendre symbol has the following properties. Prove them. In all cases, take (a, p) = 1 and (b, p) = 1. (a) 𝜒 p (a) ≡ a(p−1)∕2 (mod p). Hint: if 𝜒 p (a) = 1 then x2 ≡ a (mod p) has a solution, say x0 . p−1 Then a(p−1)∕2 ≡ x0 . Then use Fermat’s theorem.

404

Other Important Block Codes (b) 𝜒 p (a)𝜒 p (b) = 𝜒 p (ab). (c) a ≡ b (mod p) implies that 𝜒 p (a) = 𝜒 p (b). (d) 𝜒 p (a2 ) = 1. 𝜒 p (a2 b) = 𝜒 p (b). 𝜒 p (1) = 1. 𝜒 p (−1) = (−1)(p−1)∕2 . 8.6 8.7 8.8 8.9 8.10 8.11 8.12 8.13

Construct a Hadamard matrix of order 20. Construct the Hadamard codes A20 , B20 , and C20 . Which of these are linear codes? Which are cyclic? Construct a generator and parity check matrix for RM(2, 4). Show that RM(r, m) is a subcode of RM(r + 1, m). Show that RM(0, m) is a repetition code. Show that RM(m − 1, m) is a simple parity check code. Show that if 𝐜 ∈ RM(r, m), then (𝐜, 𝐜) ∈ RM(r, m + 1). For each of the following received sequences received from RM(1, 3) codes, determine the transmitted codeword 𝐜. (a) 𝐫 = [1, 0, 1, 0, 1, 1, 0, 1]. (b) 𝐫 = [0, 1, 0, 0, 1, 1, 1, 1].

8.14 Prove that all codewords in RM(1, m) have weight 0, 2m−1 or 2m . Hint: By induction. 8.15 Show that the RM(1, 3) and RM(2, 5) codes are self-dual. Are there other self-dual RM codes? 8.16 For the RM(1, 4) code: (a) (b) (c) (d) (e) (f)

Write the generator G. Determine the minimum distance. ̂ 3, m ̂ 2, m ̂ 1 and m ̂ 0. Write down the parity checks for m ̂ 4, m Decode the received vector 𝐫 = [0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1], if possible. Decode the received vector 𝐫 = [1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0], if possible. Decode the received vector 𝐫 = [1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0], if possible.

8.17 Verify the parity check equations for Figure 8.6(a) and (b). 8.18 For the inverse Hadamard transform: (a) Provide a matrix decomposition analogous to (8.10). (b) Draw the signal flow diagram for the fast inverse Hadamard computation. (c) Implement your algorithm in MATLAB. 8.19 Construct the generator for a RM(2, 4) code using the [𝐮|𝐮 + 𝐯] construction. 8.20 Using the [𝐮|𝐮 + 𝐯] construction show that the generator for a RM(r, m) code can be constructed as GRM (r, m − 2) ⎡GRM (r, m − 2) ⎢ 0 GRM (r − 1, m − 2) G=⎢ 0 0 ⎢ 0 0 ⎣

GRM (r, m − 2) 0 GRM (r − 1, m − 2) 0

GRM (r, m − 2) ⎤ GRM (r − 1, m − 2)⎥ . GRM (r − 1, m − 2)⎥ ⎥ GRM (r − 2, m − 2)⎦

8.21 Construct the generator for a RM(2, 4) code using the Kronecker construction. ⎡1 1 0 1 0 0 0⎤ ⎢0 1 1 0 1 0 0⎥ 8.22 Let G = ⎢ be the generator for a (7, 4) Hamming code 0 . Let G1 be 0 0 1 1 0 1 0⎥ ⎢ ⎥ ⎣0 0 0 1 1 0 1⎦ formed from the first two rows of G. Let G2 be formed from the first row of G. (a) Identify G0∖1 and the elements of [0 ∕1 ].

8.8 Exercises (b) Write down the generator 𝔤0∕1 for the code 0∕1 . (c) Write down the generator 𝔤1∕2 for the code 1∕2 . (d) Write down the generator 𝔤0∕1∕2 for the code 0∕1∕2 . 8.23 Quadratic residue code designs. (a) Find the generator polynomials for binary quadratic residue codes of length 7 and dimensions 4 and 3. Also, list the quadratic residues modulo 7 and compare with the cyclotomic coset for 𝛽. (b) Are there binary quadratic residue codes of length 11? Why or why not? (c) Find the generator polynomials for binary quadratic residue codes of length 23 and dimensions 12 and 11. Also, list the quadratic residues modulo 23 and compare with the cyclotomic coset for 𝛽. 8.24 8.25 8.26 8.27 8.28

Find quadratic residue codes with s = 3 of length 11 having dimensions 5 and 6. Show that n(x) defined in (8.15) has coefficients in GF(s). In decoding the Golay code, show that the cube root of D may be computed by finding x = D1365 . Show that the Golay (24,12) code is self-dual. Let 𝐫 = [1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1] be a received vector from a Golay (24, 12) code. Determine the transmitted codeword using the arithmetic decoder. 8.29 Let 𝐫 = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1] be the received vector from a Golay (23, 12) code. Determine the transmitted codeword using the algebraic decoder.

8.8

References

This chapter was developed largely out of course notes based on [483] and closely follows it. The discussion of Hadamard matrices and codes is drawn from [292] and [483]. A more complete discussion on quadratic residues may be found in [325]. Considerably more detail about Reed–Muller codes is available in [292]. The Reed–Muller codes were first described in [321], with work by Reed immediately following [362] which reinforced the Boolean function idea and provided an efficient decoding algorithm. The graph employed in developing the Reed–Muller decoding orthogonal equations is an example of a Euclidean geometry EG(m, r), a finite geometry. This area is developed and explored, giving rise to generalizations of the majority logic decoding, in [271, Chapter 8]. Chapter 7 of [271] also develops majority logic decoding for cyclic codes. Majority logic decoding can also be employed on convolutional codes [271, Chapter 13], but increasing hardware capabilities has made this less-complex alternative to the Viterbi algorithm less attractive. The Golay codes are covered in lavish and fascinating detail in Chapters 2, 16, and 20 of [292]. Interesting connections between G24 and the 24-dimensional Leech lattice are presented in [79]. Another type of decoder which has worked well for the Golay code is an error trapping decoder. Such decoders employ the cycle structure of the codes, just as the Meggitt decoders do, but they simplify the number of syndromes the decoder must recognize. A thorough discussion of error trapping decoders is in [271] and [244]. Other Golay code decoders include conventional coset decoding; a method due to Berlekamp [458, p. 35]; and majority logic decoding [292].

405

Chapter 9

Bounds on Codes Let  be an (n, k) block code with minimum distance d over a field with q elements with redundancy r = n − k. There are relationships that must be satisfied among the code length n, the dimension k, the minimum distance dmin , and the field size q. We have already met two of these: the Singleton bound of Theorem 3.15, d ≤ n − k + 1 = r + 1, and the Hamming bound of Theorem 3.17, r ≥ logq Vq (n, t), where Vq (n, t) is the number of points in a Hamming sphere of radius t = ⌊(d − 1)∕2⌋, Vq (n, t) =

t ( ) ∑ n i=0

i

(q − 1)i

(9.1)

(see (3.9)). In this chapter, we present other bounds which govern the relationships among the parameters defining a code. In this, we are seeking theoretical limits without regard to the feasibility of a code for any particular use, such as having efficient encoding or decoding algorithms. This chapter is perhaps the most mathematical of the book. It does not introduce any good codes or decoding algorithms, but the bounds introduced here have been of both historical and practical importance, as they have played a part in motivating the search for good codes and helped direct where the search should take place. Definition 9.1 Let Aq (n, d) be the maximum number of codewords in any code over GF(q) of length n with minimum distance d. ◽ For a linear code the dimension of the code is k = logq Aq (n, d). Consider what happens as n gets long in a channel with probability of symbol error equal to pc . The average number of errors in a received vector is npc . Thus, in order for a sequence of codes to be asymptotically effective, providing capability to correct the increasing number of errors in longer codewords, the minimum distance must grow at least as fast as 2npc . We will frequently be interested in the relative distance and the rate k∕n as n → ∞. Definition 9.2 For a code with length n and minimum distance d, let 𝛿 = d∕n be the relative distance of the code. ◽ For a code with relative distance 𝛿, the distance is d ≈ ⌊𝛿n⌋ = 𝛿n + O(1).

Error Correction Coding: Mathematical Methods and Algorithms, Second Edition. Todd K. Moon © 2021 John Wiley & Sons, Inc. Published 2021 by John Wiley & Sons, Inc. Companion website: www.wiley.com/go/Moon/ErrorCorrectionCoding

408

Bounds on Codes Definition 9.3 Let1

1 𝛼q (𝛿) = lim sup logq Aq (n, ⌊𝛿n⌋). n→∞ n

(9.2)

For a linear code, logq Aq (n, d) is the dimension of the code and 1n logq Aq (n, d) is the code rate, so 𝛼q (𝛿) is the maximum possible code rate that an arbitrarily long code can have while maintaining a relative distance 𝛿. We call this the asymptotic rate. ◽ The functions Aq (n, d) and 𝛼q (𝛿) are not known in general, but upper and lower bounds on these functions can be established. For example, the Singleton bound can be expressed in terms of these functions as Aq (n, d) ≤ qn−d+1 and, asymptotically, 𝛼q (𝛿) ≤ 1 − 𝛿. +

01010

00000

α10 α11 α12 α13 α14 α15 α16

plotbds.m mcelieceg.m

Many of the bounds presented here are expressed in terms of 𝛼q (𝛿). A lower bound is the Gilbert–Varshamov bound (sometimes called the Varshamov–Gilbert bound). As upper bounds on 𝛼q (𝛿), we also have the Hamming and Singleton bounds, the Plotkin bound, the Elias bound, and the McEliece–Rodemich–Rumsey–Welch bound (in two forms). Figure 9.1 shows a comparison of the lower bound and these upper bounds. Codes exist which fall between the lower bound and the smallest of the upper bounds, that is, in the shaded region. A function which figures into some of the bounds is the entropy function. The binary entropy function H2 was introduced in Section 1.3. Here we generalize that definition. Definition 9.4 Let 𝜌 = (q − 1)∕q. Define the entropy function Hq on [0, 𝜌] by Hq (x) = xlogq (q − 1) − xlogq x − (1 − x)logq (1 − x),

x ∈ (0, 𝜌] ◽

and Hq (0) = 0.

1 McEliece−Rodemich− Rumsey−Welch

0.8

𝛼q(𝛿)

α α2 α3 α4 α5 α6 α7 α8 α9

0.6

Elias Plotkin Hamming Singleton

0.4 0.2 Gilbert− Varshamov 0 0 0.2

0.4

0.6

0.8

1

𝛿

Figure 9.1: Comparison of lower bound (Gilbert–Varshamov) and various upper bounds. 1 The lim sup is the least upper bound of the values that its argument function returns to infinitely often. Initially it may be helpful to think of lim sup as “sup” or “max.”

Bounds on Codes

409

BOX 9.1: O and o Notation The O and o notation are used to represent “orders” of functions. Roughly, these may be interpreted as follows:

• • • •

f = O(1) as x → x0 ⇔ f is bounded as x → x0 . f = o(1) as x → x0 ⇔ f → 0 as x → x0 . f = O(g) as x → x0 ⇔ f ∕g is bounded as x → x0 . f = o(g) as x → x0 ⇔ f ∕g → 0 as x → x0 .

It follows from the definitions that o(1) + o(1) = o(1) and O(xn ) + O(xm ) = O(xmax(n,m) ). The entropy and the number of points in the Hamming sphere are asymptotically related. Lemma 9.5 Let 0 ≤ x ≤ 𝜌 = (q − 1)∕q. Then lim

n→∞

1 log V (n, ⌊xn⌋) = Hq (x). n q q

(9.3)

Proof First we need a way to approximate the binomial. Stirling’s formula (see Exercise 9.10) says that(see Box 9.1 for the o notation) √ n! = 2𝜋 nnn e−n (1 + o(1)). Thus,

) ( √ 1 log n − n + o(1) = n log n − n + O(log n). 2𝜋 + n + 2 Now let m = ⌊xn⌋. Then by (9.1), log n! = log

Vq (n, m) =

m ( ) ∑ n i=0

i

(9.4)

(q − 1)i .

The last term of the summation is the largest over this index range. Also, it is clear that m−1 (

( ) ∑ n) n (q − 1)i ≤ m (q − 1)m . i m i=0

( ) ( ) n n (q − 1)m ≤ Vq (n, m) ≤ (1 + m) (q − 1)m . m m Take logq throughout and divide through by n to obtain ( ) m 1 1 n + logq (q − 1) ≤ logq Vq (n, m) logq n n n m ( ) m 1 1 n + logq (q − 1). ≤ logq (1 + m) + logq n n n m

We thus obtain

As n → ∞, 1n logq (1 + m) → 0. Using the fact that

m n

= x + o(1), we obtain ( ) m 1 1 n + logq (q − 1) lim logq Vq (n, m) = lim logq n→∞ n n→∞ n n m ( ) 1 n + xlogq (q − 1) + o(1). = lim logq n→∞ n m

(9.5)

410

Bounds on Codes Using (9.4), we have 1 log V (n, m) n q q = logq n − 𝛿logq m − (1 − 𝛿)logq (n − m) + 𝛿logq (q − 1) + o(1)

lim

n→∞

= logq n − 𝛿logq 𝛿 − 𝛿logq n − (1 − 𝛿)logq (1 − 𝛿) − (1 − 𝛿)logq n + 𝛿logq (q − 1) + o(1) = −𝛿logq 𝛿 − (1 − 𝛿)logq (1 − 𝛿) + 𝛿logq (q − 1) + o(1) = Hq (𝛿) + o(1).

9.1



The Gilbert–Varshamov Bound

The Gilbert–Varshamov bound is a lower bound on Aq (n, d). Theorem 9.6 For natural numbers n and d, with d ≤ n, Aq (n, d) ≥

qn . Vq (n, d − 1)

(9.6)

Proof [458] Let  be a code of length n and distance d with the maximum number of codewords. Then of all the qn possible n-tuples, there is none with distance d or more to some codeword in . (Otherwise, that n-tuple could be added to the code and  would not have had the maximum number of codewords.) Thus, the Hamming spheres of radius d − 1 around the codewords cover all the n-tuples, so that the sum of their volumes is ≥ the number of points. That is, ||Vq (n, d − 1) ≥ qn . ◽

This is equivalent to (9.6). For a linear code, the Gilbert–Varshamov bound can be manipulated as follows: logq Aq (n, d) ≥ n − logq Vq (n, d − 1) or n − logq Aq (n, d) ≤ logq Vq (n, d − 1).

The correction capability satisfies d = 2t + 1. The redundancy can be written as r = n − k = n − logq Aq (n, d). We obtain (9.7) r ≤ logq Vq (n, 2t). The Gilbert–Varshamov bound can thus be viewed as an upper bound on the necessary redundancy for a code: there exists a t-error correcting q-ary code with redundancy r bounded as in (9.7). The Gilbert–Varshamov bound also has an asymptotic form. Theorem 9.7 If 0 ≤ 𝛿 ≤ 𝜌 = (q − 1)∕q, then 𝛼q (𝛿) ≥ 1 − Hq (𝛿). Proof [458] Using (9.2), (9.6), and Lemma 9.5, we have ( ) 1 1 𝛼q (𝛿) = lim sup logq Aq (n, ⌊𝛿n⌋) ≥ lim 1 − logq Vq (n, 𝛿n) = 1 − Hq (𝛿). n→∞ n n→∞ n



9.2 The Plotkin Bound

411

The Gilbert–Varshamov bound is a lower bound: it should be possible to do at least as well as this bound predicts for long codes. However, for many years it was assumed that 𝛼q (𝛿) would, in fact, be equal to the lower bound for long codes, since no families of codes were known that were capable of exceeding the Gilbert–Varshamov bound as the code length increased. In 1982, a family of codes based on algebraic geometry was reported, however, which exceeded the lower bound [450]. Unfortunately, algebraic geometry codes fall beyond the scope of this book. (See [449] for a comprehensive introduction to algebraic geometry codes, or [457] or [351]. For mathematical background of these codes, see [421].)

9.2

The Plotkin Bound

Theorem 9.8 Let  be a q-ary code of length n and minimum distance d. Then if d > 𝜌n, Aq (n, d) ≤

d , d − 𝜌n

(9.8)

where 𝜌 = (q − 1)∕q. Proof [458] Consider a code  with M codewords in it. Form a list with the M codewords as the rows, and consider a column in this list. Let qj denote the number of times that the jth symbol in the code ∑q−1 alphabet, 0 ≤ j < q, appears in this column. Clearly, j=0 qj = M. Let the rows of the table be arranged so that the q0 codewords with the 0th symbol are listed first and call that set of codewords R0 , the q1 codewords with the 1st symbol are listed second and call that set of codewords R1 , and so forth. Consider the Hamming distance between all M(M − 1) pairs of codewords, as perceived by this selected column. For pairs of codewords within a single set Ri , all the symbols are the same, so there is no contribution to the Hamming distance. For pairs of codewords drawn from different sets, there is a contribution of 1 to the Hamming distance. Thus, for each of the qj codewords drawn from set Rj , there is a total contribution of M − qj to the Hamming distance between the codewords in Rj and all the other sets. Summing these up, the contribution of this column to the sum of the distances between all pairs of codewords is q−1 ∑ j=0

qj (M − qj ) = M

q−1 ∑ j=0

qj −

q−1 ∑ j=0

q2j = M 2 −

q−1 ∑

q2j .

j=0

Now use the Cauchy–Schwartz inequality (see Box 9.2 and Exercise 9.6) to write ( q−1 )2 ( ) q−1 ∑ 1 ∑ 1 2 2 qj (M − qj ) ≤ M − q =M 1− . q j=0 j q j=0 Now total this result over all n columns. There are M(M − 1) pairs of codewords, each a distance at least d apart. We obtain ( ) 1 M(M − 1)d ≤ n 1 − M 2 = n𝜌M 2 q or d M≤ . d − n𝜌 Since this result holds for any code, since the  was arbitrary, it must hold for the code with Aq (n, d) codewords. ◽ Equivalently,

n𝜌M . M−1 The Plotkin bound provides an upper bound on the distance of a code with given length n and size M. d≤

412

Bounds on Codes

BOX 9.2: The Cauchy–Schwartz Inequality For our purposes, the Cauchy-Schwartz inequality can be expressed as follows (see [319] for extensions and discussions): Let 𝐚 = (a1 , a2 , . . . , an ) and 𝐛 = (b1 , b2 , . . . , bn ) be sequences of real or complex numbers. Then n n n |∑ |2 ∑ ∑ | | |ai |2 |bi |2 . | ai bi | ≤ | | i=1 i=1 | i=1 |

9.3

The Griesmer Bound

Theorem 9.9 For a linear block (n, k) q-ary code  with minimum distance d, n≥

k−1 ∑ ⌈d∕qi ⌉. i=0

Proof Let N(k, d) be the length of the shortest q-ary linear code of dimension k and minimum distance d. Let  be an (N(k, d), k, d) code and let G be a generator matrix of the code. Assume (without loss of generality, by row and/or column interchanges and/or row operations) that G is written with the first row as follows: 1

G=

1

...

1

0

0

G1

...

0

G2

,

where G1 is (k − 1) × d and G2 is (k − 1) × (N(k, d) − d). Claim: G2 has rank k − 1. Otherwise, it would be possible to make the first row of G2 equal to 0 (by row operations). Then an appropriate input message could produce a codeword of weight < d, by canceling one of the ones from the first row with some linear combination of rows of G1 , resulting in a codeword of minimum distance < d. Let G2 be the generator for an (N(k, d) − d, k − 1, d1 ) code  ′ . We will now determine a bound on d1 . Now let [𝐮0 |𝐯] (the concatenation of two vectors) be a codeword in , [𝐮0 |𝐯] = 𝐦0 G, where 𝐯 ∈  ′ has weight d1 and where 𝐦0 is a message vector with 0 in the first position, 𝐦0 = [0, m2 , . . . , mk ]. Let z0 be the number of zeros in 𝐮0 , z0 = d − wt(𝐮0 ). Then we have wt(𝐮0 ) + d1 = (d − z0 ) + d1 ≥ d. Let 𝐦i = [i, m2 , . . . , mk ], for i ∈ GF(q), and let 𝐮i = 𝐦i G. Let zi be the number of zeros in 𝐮i , zi = d − wt(𝐮i ). As i varies over the elements in GF(q), eventually every element in 𝐮i will be set to 0. Thus, q−1 ∑ i=0

zi = d.

9.4 The Linear Programming and Related Bounds

413

Writing down the weight equation for each i, we have d1 + d − z0 ≥ d d1 + d − z1 ≥ d ⋮ d1 + d − zq−1 ≥ d. Summing all these equations, we obtain qd1 + qd − d ≥ qd or d1 ≥ ⌈d∕q⌉. G2 therefore generates an (N(k, d) − d, k − 1) code with a minimum Hamming distance which is at least ⌈d∕q⌉. We conclude that N(k − 1, ⌈d∕q⌉) ≤ N(k, d) − d. Now we simply proceed inductively: N(k, d) ≥ d + N(k − 1, ⌈d∕q⌉) ≥ d + ⌈d∕q2 ⌉ + N(k − 2, ⌈d∕q2 ⌉) ⋮ ≥

k−2 ∑ ⌈d∕qi ⌉ + N(1, ⌈d∕2k−1 ⌉)



∑ ⌈d∕qi ⌉.

i=0 k−1 i=0

Since N(k, d) < n for any actual code, the result follows.

9.4



The Linear Programming and Related Bounds

The linear programming bound takes the most development to produce of the bounds introduced so far. However, the tools introduced are interesting and useful in their own right. Furthermore, the programming bound technique leads to one of the tightest bounds known. We introduce first what is meant by linear programming, then present the main theorem, still leaving some definitions unstated. This is followed by the definition of Krawtchouk polynomials, the character (needed for a couple of key lemmas), and finally the proof of the theorem. Let 𝐱 be a vector, 𝐜 a vector, 𝐛 a vector, and A a matrix. A problem of the form maximize 𝐜T 𝐱 subject to A𝐱 ≤ 𝐛

(9.9)

(or other equivalent forms) is said to be a linear programming problem. The maximum quantity 𝐜T 𝐱 from the solution is said to be the value of the linear programming problem. Example 9.10

Consider the following problem: maximize x1 + x2 subject to x1 + 2x2 ≤ 10 15 x + 5x2 ≤ 45 2 1 x1 ≥ 0

x2 ≥ 0.

414

Bounds on Codes x2 15 2 x1

+ 5x2 = 45

x1 = x2 increasing Maximizing value

x1 Feasible region

x1 + 2x2 = 10

Figure 9.2: A linear programming problem.

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

simplex1.m pivottableau.m reducefree.m restorefree.m

Figure 9.2 illustrates the geometry. The shaded region is the feasible region, the region where all the constraints are satisfied. The function x1 + x2 increases in the direction shown, so that the point in the feasible region maximizing x1 + x2 is as shown. The solution to this problem is x1 = 4, x2 = 3 and the value is x1 + x2 = 7.



The feasible region always forms a polytope and, due to the linear nature of the function being optimized, the solution is always found on a vertex or along an edge of the feasible region. Linear programming problems arise in a variety of applied contexts and algorithmic methods of solving them are well established. (See [319] and the references therein for an introduction.) The linear programming bound applies to both linear and nonlinear codes. Definition 9.11 Let  be a q-ary code with M codewords of length n. Define Ai =

1 |{(𝐜, 𝐝)|𝐜 ∈ , 𝐝 ∈ , dH (𝐜, 𝐝) = i}|. M

That is, Ai is the (normalized) number of codewords in the code at a distance i from each other. The sequence (A0 , A1 , . . . , An ) is called the distance distribution of the code. For a linear code, the distance distribution is the weight distribution. ◽ In Section 9.4.1, we will introduce a family of polynomials Kk (x), known as Krawtchouk polynomials. The theorem is expressed in terms of these polynomials as follows. Theorem 9.12 [458] (Linear programming bound) For a q-ary code of length n and minimum distance d Aq (n, d) ≤ M, where M is value of the linear programming problem maximize

n ∑

Ai

i=0

subjectto A0 = 1, Ai = 0for 1 ≤ i < d

9.4 The Linear Programming and Related Bounds n ∑

415

Ai Kk (i) ≥ 0 for k ∈ {0, 1, . . . , n}

i=0

Ai ≥ 0 for i ∈ {0, 1, . . . , n}. Furthermore, if q = 2 and d is even, then we may take Ai = 0 for odd i. Solution of the linear programming problem in this theorem not only provides a bound on Aq (n, d), but also the distance distribution of the code. Let us begin with an example that demonstrates what is involved in setting up the linear programming problem, leaving the Kk (i) functions still undefined. Example 9.13 Determine a bound on A2 (14, 6) for binary codes. Since q = 2, we have A1 = A3 = A5 = A7 = A9 = A11 = A13 = 0. Furthermore, since the minimum distance is ∑n 6, we also have A2 = A4 = 0. All the Ai are ≥ 0. The condition i=0 Ai Kk (i) ≥ 0 in the theorem becomes K0 (0) + K0 (6)A6 + K0 (8)A8 + K0 (10)A10 + K0 (12)A12 + K0 (14)A14 ≥ 0 K1 (0) + K1 (6)A6 + K1 (8)A8 + K1 (10)A10 + K1 (12)A12 + K1 (14)A14 ≥ 0 K2 (0) + K2 (6)A6 + K2 (8)A8 + K2 (10)A10 + K2 (12)A12 + K2 (14)A14 ≥ 0

(9.10)

⋮ K14 (0) + K14 (6)A6 + K14 (8)A8 + K14 (10)A10 + K14 (12)A12 + K14 (14)A14 ≥ 0. ◽

This problem can clearly be expressed in the form (9.9).

9.4.1

Krawtchouk Polynomials

The Krawtchouk polynomials mentioned above and used in the linear programming bound are now introduced. Definition 9.14 The Krawtchouk polynomial Kk (x; n, q) is defined by ( )( ) k ∑ x n−x (−1)j Kk (x; n, q) = (q − 1)k−j , j k − j j=0 ( ) x(x − 1) · · · (x − j + 1) x = j j!

where

for x ∈ ℝ.

Usually, Kk (x; n, q) is used in the context of a fixed n and q, so the abbreviated notation Kk (x) is used. ◽

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

Some of the important properties of Krawtchouk polynomials are developed in Exercise 9.16.

krawtchouk.m

9.4.2

Character

We also need the idea of a character. Definition 9.15 Let ⟨G, +⟩ be a group and let ⟨T, ⋅⟩ be the group of complex numbers which have absolute value 1, with multiplication as the operation. A character is a homomorphism 𝜒 ∶ G → T. That is, for all g1 , g2 ∈ G, (9.11) 𝜒(g1 + g2 ) = 𝜒(g1 )𝜒(g2 ). If 𝜒(g) = 1 for every g ∈ G, then 𝜒 is called the principal character.



416

Bounds on Codes It is straightforward to show that 𝜒(0) = 1, where 0 is the identity of G. The lemma which we need for our development is the following. Lemma 9.16 If 𝜒 is a character for ⟨G, +⟩, then { ∑ |G| if 𝜒 is the principal character 𝜒(g) = 0 otherwise. g∈G Proof If 𝜒 is principal, the first part is obvious. Let h be an arbitrary element of G. Then ∑ ∑ ∑ 𝜒(g) = 𝜒(h + g) = 𝜒(k), 𝜒(h) g∈G

g∈G

k∈G

where the first equality follows from the homomorphism (9.11) and the second equality follows since h + g sweeps out all elements of G as g sweeps over all the elements of G. We thus have ∑ 𝜒(g) = 0. (𝜒(h) − 1) g∈G

Since h was arbitrary, then ∑ if 𝜒 is not principal it is possible to choose an h ∈ G such that 𝜒(h) ≠ 1. ◽ We must therefore have g∈G 𝜒(g) = 0.

Example 9.17 The character property of Lemma 9.16 is actually familiar from signal processing, under a slightly different guise. Let G = ℤn , the set of integers modulo n, and let 𝜒 l (g) = e−j2𝜋lg∕n for g ∈ G. Then { n−1 ∑ ∑ n l ≡ 0 (mod n) −j2𝜋lg∕n 𝜒 l (g) = e = 0 otherwise. g∈G i=0 Thus, 𝜒 l (g) is principal if l = 0, and not principal if l ≠ 0.



Now let G be the additive group in GF(q). Let 𝜔 = ej2𝜋∕q be a primitive qth root of unity in ℂ. We want to define a character by 𝜒(g) = 𝜔g for g ∈ GF(q), but g is not an integer. However, for the purposes of defining a character, we can interpret the elements of GF(q) as integers. The only property we need to enforce is the homomorphism property, 𝜔g1 +g2 = 𝜔g1 𝜔g2 , which will follow if we make the integer interpretation. Thus, we can interpret the alphabet over which a q-ary code exists as ℤ∕qℤ ≅ ℤq , which we will denote as Q. Note that this character is not principal, so that, by Lemma 9.16, ∑ 𝜔g = 0 g∈Q

or



𝜔g = −1.

(9.12)

𝜔0g = (q − 1).

(9.13)

g∈Q∖{0}

Also note that

∑ g∈Q∖{0}

Let ⟨𝐱, 𝐲⟩ be the conventional inner product (see Definition 2.60) ⟨𝐱, 𝐲⟩ =

n ∑ i=1

xi yi .

9.4 The Linear Programming and Related Bounds 9.4.3

417

Krawtchouk Polynomials and Characters

We now present an important lemma which relates Krawtchouk polynomials and characters. Lemma 9.18 [458] Let 𝜔 be a primitive qth root of unity in ℂ and let x ∈ Qn be a fixed codevector with weight i. Then ∑ 𝜔⟨x,y⟩ = Kk (i). y ∈ Qn ∶ wt(y) = k

Proof Assume that all the weight of 𝐱 is in the first i positions, 𝐱 = [x1 , x2 , . . . , xi , 0, 0, . . . , 0], with x1 through xi not equal to 0. In the vector 𝐲 of weight k, choose positions h1 , h2 , . . . , hk such that 0 < h1 < h2 < · · · < hj ≤ i < hj+1 < · · · < hk ≤ n with yhk ≠ 0. That is, the first j nonzero positions of 𝐲 overlap the nonzero positions of 𝐱. Let D be the set of all words of weight k that have their nonzero coordinates at these k fixed positions. Then ∑



𝜔⟨𝐱,𝐲⟩ =

𝐲∈D

𝜔

xh yh +···+xh yh +0+···+0 1

j

1

𝐲∈D



= 1

=



···

yh ∈Q∖{0}



j

𝜔

yh ∈Q∖{0} k

𝜔

xh yh 1

1

1

j

1



···

yh ∈Q∖{0}

xh yh +···+xh yh +0+···+0

𝜔

j

j

yh

j

j

= (−1) (q − 1)

n−j



xh yh

yh ∈Q∖{0}

1

j

j+1



𝜔0 · · ·

∈Q∖{0}

𝜔0

yh ∈Q∖{0} k

,

where the last equality follows from (9.12) ) and (9.13). ( )( i n−i different ways for each fixed position j. We have The set D may be chosen in j k−j ∑

𝜔

⟨𝐱,𝐲⟩

=

𝐲 ∈ Qn ∶ wt(𝐲) = k

=





Different choices of D

𝐲∈D

𝜔

⟨𝐱,𝐲⟩

( )( ) i n − i ∑ ⟨𝐱,𝐲⟩ = 𝜔 j k − j 𝐲∈D j=0 k ∑

( )( ) i n−i (−1)j (q − 1)k−j = Kk (i), j k − j j=0

k ∑

by the definition of the Krawtchouk polynomial.



The final lemma we need in preparation for proving the linear programming bound is the following. Lemma 9.19 Let {A0 , A1 , . . . , An } be the distance distribution of a code  of length n with M codewords. Then n ∑ Ai Kk (i) ≥ 0. i=0

418

Bounds on Codes Proof From Lemma 9.18 and the definition of Ai , we can write n ∑

Ai Kk (i) =

i=0

=

n ∑ 1 M i=0





𝜔⟨𝐱−𝐲,𝐳⟩

(𝐱,𝐲) ∈  2 ∶ 𝐳 ∈ ∶ d(𝐱,𝐲) = i wt(𝐳) = k



1 ∑ M 𝐳 ∈ ∶

wt(𝐳) = k

1 ∑ = M 𝐳 ∈ ∶

(𝐱,𝐲) ∈  2

(

wt(𝐳)=k

𝜔⟨𝐱−𝐲,𝐳⟩



)( 𝜔⟨𝐱,𝐳⟩

𝐱∈



) 𝜔−⟨𝐲,𝐳⟩

𝐲∈

2 1 ∑ ||∑ ⟨𝐱,𝐳⟩ || = | 𝜔 | ≥ 0. | M 𝐳 ∈ ∶ ||𝐱∈ | wt(𝐳) = k



We are now ready to prove the linear programming bound. Proof of Theorem 9.12 Lemma 9.19 shows that the distance distribution must satisfy n ∑

Ai Kk (i) ≥ 0.

i=0

Clearly the Ai are nonnegative, A0 = 1 and the Ai = 0 for 1 ≤ i < d to obtain the distance properties. ∑ By definition, we also have ni=0 Ai = M, the number of codewords. Hence, any code must have its ∑ number of codewords less than the largest possible value of ni=0 Ai = M. For binary codes with d even we may take the Ai = 0 for i odd, since any codeword with odd weight can be modified to a codeword with even weight by flipping 1bit without changing the minimum distance of the code. ◽ Example 9.20 Let us return to the problem of Example 9.13. Now that we know about the Krawtchouk polynomials, the inequalities in (9.10) can be explicitly computed. These become k = 0 ∶ 1 + A6 + A8 + A10 + A12 + A14 ≥ 0 k = 1 ∶ 14 + 2A6 − 2A8 − 6A10 − 10A12 − 14A14 ≥ 0 k = 2 ∶ 91 − 5A6 − 5A8 + 11A10 + 43A12 + 91A14 ≥ 0 k = 3 ∶ 364 − 12A6 + 12A8 + 4A10 − 100A12 − 364A14 ≥ 0 k = 4 ∶ 1001 + 9A6 + 9A8 − 39A10 + 121A12 + 1001A14 ≥ 0 k = 5 ∶ 2002 + 30A6 − 30A8 + 38A10 − 22A12 − 2002A14 ≥ 0 k = 6 ∶ 3003 − 5A6 − 5A8 + 27A10 − 165A12 + 3003A14 ≥ 0 k = 7 ∶ 3432 − 40A6 + 40A8 − 72A10 + 264A12 − 3432A14 ≥ 0. α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

ipboundex.m

The other inequalities are duplicates, by symmetry in the polynomials. Also note that the k = 0 inequality is implicit in maximizing A6 + A8 + · · · + A14 , so it does not need to be included. Solving the linear programming problem, we obtain the solution A6 = 42 Hence A2 (14, 6) ≤ 1 + 42 + 7 + 14 = 64.

A8 = 7 A10 = 14

A12 = 0 A14 = 0. ◽

9.5 The McEliece–Rodemich–Rumsey–Welch Bound

9.5

419

The McEliece–Rodemich–Rumsey–Welch Bound

The McEliece–Rodemich–Rumsey–Welch bound is a bound that is quite tight. It applies only to binary codes and is derived using the linear programming bound. Theorem 9.21 (The McEliece–Rodemich–Rumsey–Welch Bound) For a binary code, ( ) 1 1 √ 𝛼2 (𝛿) ≤ H2 − 𝛿(1 − 𝛿) , 0 ≤ 𝛿 ≤ . 2 2

(9.14)

The proof is subtle and makes use of some properties of Krawtchouk polynomials introduced in Exercise 9.16, an extension of the linear programming bound in Exercise 9.17, as well as other properties that follow from the fact that Krawtchouk polynomials are orthogonal. What is provided here is a sketch; the reader may want to fill in some of the details. Proof [303, 458] Let t be an integer in the range 1 ≤ t ≤ n2 and let a be a real number in [0, n]. Define the polynomial 1 𝛼(x) = (9.15) (K (a)Kt+1 (x) − Kt+1 (a)Kt (x))2 . a−x t Because the {Kk (x)} form an orthogonal set, they satisfy the Christoffel–Darboux formula ( ) ∑ K (x)K (y) Kk+1 (x)Kk (y) − Kk (x)Kk+1 (y) 2 n i i = ( ) n y−x k + 1 k i=0 k

(9.16)

i

(see [319, p. 224] for an introduction, or a reference on orthogonal polynomials such as [427]). Using (9.16), Equation (9.15) can be written as 𝛼(x) =

t ( ) ∑ Kk (a)Kk (x) 2 n (Kt (a)Kt+1 (x) − Kt+1 (a)Kt (x)) ( ) . n t+1 t k=0

(9.17)

k

Since Kk (x) is a polynomial in x of degree k, it follows that 𝛼(x) is a polynomial in x of degree 2t + 1. Then 𝛼(x) can also be written as a series in {Kk (x)} as ∑

2t+1

𝛼(x) =

𝛼k Kk (x)

k=0

∑ (since both are polynomials of degree 2t + 1). Let 𝛽(x) = 𝛼(x)∕𝛼0 = 1 + 2t+1 k=1 𝛽k Kk (x). We desire to choose a and t such that 𝛽(x) satisfies the conditions of the theorem in Exercise 9.17: 𝛽k ≥ 0 and 𝛽(j) ≤ 0 for j = d, d + 1, . . . , n. Note that if a ≤ d, then by (9.15) 𝛼(j) ≤ j for j = d, d + 1, . . . , n, so the only thing to be verified is whether 𝛼i ≥ 0, i = 1, . . . , n and 𝛼0 > 0. This is established using the interlacing property of the roots of the Krawtchouk polynomials: Kt+1 (x) has t + 1 distinct real zeros on (0, n). Denote these as (t+1) (t+1) , with x1(t+1) < x2(t+1) < · · · < xt+1 . Similarly, Kt (x) has t distinct real zeros on (0, n), x1(t+1) , . . . , xt+1 (t) (t) which we denote as x1 , . . . , xt and order similarly. These roots are interlaced: (t+1) 0 < x1(t+1) < x1(t) < x2(t+1) < x2(t) < · · · < xt(t) < xt+1 < n.

(This interlacing property follows as a result of the orthogonality properties of the Krawtchouk polynomials; see [427].)

420

Bounds on Codes So choose t such that x1(t) < d and choose a between x1(t+1) and x1(t) in such a way that Kt (a) = ∑ −Kt+1 (a) > 0. Then 𝛼(x) can be written ( )in the form 𝛼(x) = ckl Kk (x)Kl (x) where all the ckl ≥ 0. Thus, 2 n Kt (a)Kt+1 (a) > 0. the 𝛼i ≥ 0. It is clear that 𝛼0 = − t+1 t Thus, the theorem in Exercise 9.17 can be applied: Aq (n, d) ≤ 𝛽(0) =

𝛼(0) (n + 1) ( n ) . = 𝛼0 2a(t + 1) t

(9.18)

We now invoke another result about orthogonal polynomials. It is known (see √ [427]) that as n → ∞, if t → ∞ in such a way that t∕n → 𝜏 for some 0 < 𝜏 < 12 , then x1(t) ∕n → 12 − 𝜏(1 − 𝜏). So let n → ∞ √ and d∕n → 𝛿 in such a way that t∕n → 12 − 𝛿(1 − 𝛿). Taking the logarithm of (9.18), dividing by n, and using the results of Lemma 9.5, the result follows. ◽ In [303], a slightly stronger bound is given (which we do not prove here): 𝛼2 (𝛿) ≤

min {1 + g(u2 ) − g(u2 + 2𝛿u − 2𝛿)},

0≤u≤1−2𝛿

where g(x) = H2 ((1 −

(9.19)

√ 1 − x)∕2).

The bound (9.14) actually follows from this with u = 1 − 2𝛿. For 0.273 ≤ 𝛿 ≤ 12 , the bounds (9.14) and (9.19) coincide, but for 𝛿 < 0.273, (9.19) gives a slightly tighter bound. Another bound is obtained from (9.19) when u = 0. This gives rise to the Elias bound for binary codes. In its more general form, it can be expressed as follows. Theorem 9.22 (Elias Bound) For a q-ary code, 𝛼q (𝛿) ≤ 1 − Hq (𝜌 −



𝜌(1 − 𝛿))

0 ≤ 𝛿 ≤ 𝜌,

where 𝜌 = (q − 1)∕q. The Elias bound also has a nonasymptotic form: Let r ≤ 𝜌n, where 𝜌 = (q − 1)∕q. Then [458, p. 65] Aq (n, d) ≤

qn 𝜌nd . r2 − 2𝜌nr + 𝜌nd Vq (n, r)

The value of r can be adjusted to find the tightest bound.

9.6 9.1

Exercises Using the Hamming and Gilbert–Varshamov bounds, determine lower and upper bounds on the redundancy r = n − k for the following codes: (a) (b) (c) (d) (e) (f)

A single-error correcting binary code of length 7. A single-error correcting binary code of length 15. A triple-error correcting binary code of length 23. A triple-error correcting ternary code of length 23. A triple-error correcting 4-ary code of length 23. A triple-error correcting 16-ary code of length 23.

9.6 Exercises 9.2

421

With H2 (x) = −xlog2 x − (1 − x)log2 (1 − x), show that 2−nH2 (𝜆) = 𝜆n𝜆 (1 − 𝜆)n(1−𝜆) .

9.3

With H2 (x) as defined in the previous exercise, show that for any 𝜆 in the range 0 ≤ 𝜆 ≤ 12 , 𝜆n ( ) ∑ n i=0

9.4

i

≤ 2nH2 (𝜆) .

Hint: Use the binomial expansion on (𝜆 + (1 − 𝜆))n = 1; truncate the sum up to 𝜆n and use the fact that (𝜆∕(1 − 𝜆))i < (𝜆∕(1 − 𝜆))𝜆n . [45] Prove: In any set of M distinct nonzero binary words of length n having weight at most n𝜆, the sum of the weights W satisfies W ≥ n𝜆(M − 2nH2 (𝜆) ) ∑ (n) nH2 (𝜆) . Then argue that for each 𝜆 there for any 𝜆 ∈ (0, 1∕2). Hint: Use the fact that 𝜆n k=0 k ≤ 2

9.5 9.6

9.7 9.8 9.9

are at least M − 2nH2 (𝜆) words of weight exceeding n𝜆. For binary codes, show that A2 (n, 2l − 1) = A2 (n + 1, 2l). Using the Cauchy–Schwartz inequality, show that ( q−1 )2 q−1 ∑ 1 ∑ 2 qi ≥ q . q i=0 i i=0 What is the largest possible number of codewords of a ternary (q = 3) code having n = 13 and d = 9? Examine the proof of Theorem 9.8. Under what condition can equality be obtained in (9.8)? Prove the asymptotic Plotkin bound: 𝛼q (𝛿) = 0 if 𝜌 ≤ 𝛿 ≤ 1 𝛼q (𝛿) ≤ 1 − 𝛿∕𝜌 if 0 ≤ 𝛿 ≤ 𝜌.

Hint: For the second part, let n′ = ⌊(d − 1)∕𝜌⌋. Show that 1 ≤ d − 𝜌n′ ≤ 1 + 𝜌. Shorten a code of length n with M codewords to a code of length n′ with M ′ codewords, both having the same ′ distance. Show that M ′ ≥ qn −n M and apply the Plotkin bound. 9.10 [41, 183] Stirling’s formula. In this problem, you will derive the approximation √ n! ≈ nn e−n 2𝜋n.

(a) Show that

n

∫1

log x dx = n log n − n + 1.

(b) Use the trapezoidal rule, as suggested by Figure 9.3(a), to show that n

∫1

log x dx ≥ log n! −

1 log n, 2

where the overbound is the area between the function log x and its trapezoidal approximation. Conclude that √ n! ≤ nn e−n ne.

Bounds on Codes 2

2

1.5

1.5

1

1 log (x)

log (x)

422

0.5

0.5

0

0

−0.5

−0.5

−1 0.5

1

1.5

2

2.5

3 x

3.5

4

4.5

5

−1 0.5

5.5

1

1.5

2

2.5

(a)

3 x

3.5

4

4.5

5

5.5

(b)

Figure 9.3: Finding Stirling’s formula. (a) Under bound; (b) over bound.

(c) Using the integration regions suggested in Figure 9.3(b), show that n

log x dx ≤ log n! +

∫1 and that

1 1 − log n 8 2

√ n! ≥ nn e−n ne7∕8 .

In this integral approximation, for the triangular region, use a diagonal line tangent to the function at 1. In the approximation √ (9.20) n! ≈ nn e−n nC, we thus observe that It will turn out that C = Define the function



e7∕8 ≤ C ≤ e. 2𝜋 works as n → ∞. To see this, we take a nonobvious detour. 𝜋∕2

Ik =

∫0

cosk x dx.

(d) Show that Ik = Ik−2 − Ik ∕(k − 1), and hence that Ik = (k − 1)Ik−2 ∕k for k ≥ 2. Hint: cosk (x) = cosk−2 (x)(1 − sin2 x). (e) Show that I0 = 𝜋∕2 and I1 = 1. (f) Show that I2k−1 > I2k > I2k+1 , and hence that 1> (g) Show that 1 > 2k

[

I2k I2k−1

>

I2k+1 . I2k−1

3 2k − 1 2k − 3 ··· 2k 2k − 2 2

(h) Show that

[ 1 > 𝜋k

(2k)! k

k

2 k!2 k!

]2

]2 >

2k 𝜋 > . 2 2k + 1

2k . 2k + 1

9.6 Exercises

423

(i) Now substitute (9.20) for each factorial to show that ] √ [ √ √ (2k)2k e−2k 2kC 1 > 1− 1 > 𝜋k . √ √ 2k k −k 2k +1 k −k 2 k e kCk e kC √ (j) Show that C → 2𝜋 as k → ∞. 9.11 Show that asymptotically, the Hamming bound can be expressed as 𝛼q (𝛿) ≤ 1 − Hq (𝛿∕2). 9.12 Use the Griesmer bound to determine the largest possible value of k for a (13, k, 5) binary code. 9.13 Use the Griesmer bound to determine the largest possible value of k for a (14, k, 9) ternary code. 9.14 Find a bound on the length of shortest triple-error correcting (d = 7) binary code of dimension 5. 9.15 Let d = 2k−1 . Determine N(k, d). Is there a code that reaches the bound? (Hint: simplex.) 9.16 Properties of Krawtchouk polynomials. (a) Show that for x ∈ {0, 1, . . . , k}, ∞ ∑

Kk (x)zk = (1 + (q − 1)z)n−x (1 − z)x .

k=0

(9.21) ( )

(b) Show that Kk (x) is an orthogonal polynomial with weighting function n ( ) ∑ n i=0

i

(q − i)i Kk (i)Kl (i) = 𝛿kl

n i

(q − 1)i , in that

( ) n (q − 1)k qn . k

(9.22)

(c) Show that for q = 2, Kk (x) = (−1)k Kk (n − x). (d) Show that (q − 1)i

( ) ( ) n n Kk (i) = (q − 1)k Ki (k). i k

(e) Use this result to show another orthogonality relation: n ∑

Kl (i)Kk (i) = 𝛿lk qn .

i=0

(f) One way to compute the Krawtchouk polynomials is to use the recursion (k + 1)Kk+1 (x) = (k + (q − 1)(n − k) − qx)Kk (x) − (q − 1)(n − k + 1)Kk−1 (x), which can be initialized with K0 (x) = 1 and K1 (x) = n(q − 1) − qx from the definition. Derive this recursion. Hint: Differentiate both sides of (9.21) with respect to z, multiply both sides by (1 + (q − 1)z)(1 − z) and match coefficients of zk . (g) Another recursion is Kk (x) = Kk (x − 1) − (q − 1)Kk−1 (x) − Kk−1 (x − 1). Show that this is true. Hint: In (9.21), replace x by x − 1.

424

Bounds on Codes ∑ 9.17 [458] (Extension of Theorem 9.12) Theorem: Let 𝛽(x) = 1 + nk=1 𝛽k Kk (x) be a polynomial with 𝛽i ≥ 0 for 1 ≤ k ≤ n such that 𝛽(j) ≤ 0 for j = d, d + 1, . . . , n. Then Aq (n, d) ≤ 𝛽(0). Justify the following steps of the proof: ∑n (a) i=d Ai 𝛽(i) ≤ 0. ∑ ∑ ∑ (b) − ni=d Ai ≥ nk=1 𝛽k ni=d Ak Kk (i) ∑n ∑n ∑n (c) k=1 𝛽k i=d Ak Kk (i) ≥ − k=1 𝛽k Kk (0). ∑ (d) − nk=1 𝛽k Kk (0) = 1 − 𝛽(0). (e) Hence conclude that Aq (n, d) ≤ 𝛽(0). (f) Now put the theorem to work. Let q = 2, n = 2l + 1 and d = l + 1. Let 𝛽(x) = 1 + 𝛽1 K1 (x) + 𝛽2 K2 (x). a. b. c. d.

Show that 𝛽(x) = 1 + 𝛽1 (n − 2x) + 𝛽2 (2x2 − 2nx + 12 n(n − 1)). Show that choosing to set 𝛽(d) = 𝛽(n) = 0 leads to 𝛽1 = (n + 1)∕2n and 𝛽2 = 1∕n. Show that the conditions of the theorem in this problem are satisfied. Hence conclude that Aq (2l + 1, l + 1) ≤ 2l + 2.

9.18 Let 𝜒 be a character, 𝜒 ∶ G → T. Show that 𝜒(0) = 1, where 0 is the identity of G. 9.19 Show that (9.17) follows from (9.15) and (9.16).

9.7

References

Extensive discussions of bounds appear in [292]. Our discussion has benefited immensely from [458]. The Hamming bound appears in [181]. The Singleton bound appears in [412]. The Plotkin bound appears in [342]. The linear programming bound was developed in [92]. An introduction to orthogonal polynomials is in [319]. More extensive treatment of general facts about orthogonal polynomials is in [427].

Chapter 10

Bursty Channels, Interleavers, and Concatenation 10.1 Introduction to Bursty Channels The coding techniques introduced to this point have been appropriate for channels with independent random errors, such as a memoryless binary symmetric channel, or an AWGN channel. In such channels, each transmitted symbol is affected independently by the noise. We refer to the codes that are appropriate for such channels as random error correcting codes. However, in many channels of practical interest, the channel errors tend to be clustered together in “bursts.” For example, on a compact disc (CD), a scratch on the media may cause errors in several consecutive bits. On a magnetic medium such as a hard disk or a tape, a blemish on the magnetic surface may introduce many errors. A wireless channel may experience fading over several symbol times, or a stroke of lightning might affect multiple digits. In a concatenated coding scheme employing a convolutional code as the inner code, a single incorrect decoding decision might give rise to a burst of decoding errors. Using a conventional random error-correcting block code in a bursty channel leads to inefficiencies. A burst of errors may introduce several errors into a small number codewords, which therefore need strong correction capability, while the majority of codewords are not subjected to error and therefore waste error-correction capabilities. In this chapter, we introduce techniques for dealing with errors on bursty channels. The straightforward but important concept of interleaving is presented. The use of Reed–Solomon codes to handle bursts of bit errors is described. We describe methods of concatenating codes. Finally, Fire codes are introduced, which is a family of cyclic codes specifically designed to handle bursts of errors. Definition 10.1 In a sequence of symbols, a burst of length l is a sequence of symbols confined to l consecutive symbols of which the first and last are in error. ◽ For example, in the error vector 𝐞 = (0 0 0 0 1 1 0 1 0 1 1 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 ) ⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟ is a burst of length 15. A code  is said to have burst-error-correcting capability l if it is capable of correcting all bursts up to length l.

10.2 Interleavers An interleaver takes a sequence of symbols and permutes them. At the receiver, the sequence is permuted back into the original order by a deinterleaver. Interleavers are efficacious in dealing with bursts of errors because, by shuffling the symbols at the receiver, a burst of errors appearing in close proximity may be broken up and spread around, thereby creating an effectively random channel. A common way to interleave is with a block interleaver. This is simply an N × M array which can be read and written in different orders. Typically the incoming sequence of symbols is written into the Error Correction Coding: Mathematical Methods and Algorithms, Second Edition. Todd K. Moon © 2021 John Wiley & Sons, Inc. Published 2021 by John Wiley & Sons, Inc. Companion website: www.wiley.com/go/Moon/ErrorCorrectionCoding

426

Bursty Channels, Interleavers, and Concatenation

Interleaver x0, x1, x2,..., x11 Write across rows

x0

x1

x2

x3

x4

x5

x6

x7

x8

x9

x10 x11 Read down columns x0, x4, x8, x1, x5, x9,..., x11

Deinterleaver

Write down columns x0

x1

x2

x3

x4

x5

x6

x7

x8

x9

x10 x11

Read across rows

x0, x1, x2,..., x11

Figure 10.1: A 3 × 4 interleaver and deinterleaver.

interleaver in row order and read out in column order. Figure 10.1 shows a 3 × 4 interleaver. The input sequence x0 , x1 , . . . , x11 is read into the rows of array, as shown, and read off as the sequence x0 , x4 , x8 , x1 , x5 , x9 , x2 , x6 , x10 , x3 , x7 , x11 . Frequently, the width M is chosen to be the length of a codeword.

c1,1

c1,2

c1,3

...

c1,255

c2,1

c2,2

c2,3

...

c2,255

...

...

...

...

...

Example 10.2 We present here an application of interleaving. The UDP (user datagram protocol) internet protocol is one of the TCP/IP protocol suite which does not guarantee packet delivery, but experiences lower network latency. In this protocol, each packet carries with it a sequential packet number, so that any missing packets may be identified. Because of its lower latency, UDP is of interest in near real-time internet applications. Error correction coding can be used to significantly increase the probability that all packets are correctly received with only a moderate decrease in rate. The data stream is blocked into message blocks of 249 bytes and the data are encoded with a (255, 249) Reed–Solomon code having dmin = 7 and capable of correcting six erasures. The codewords are written into an N × 255 matrix as rows and read out in column order. Each column is transmitted as a data packet of length N. Suppose that the third packet associated with this interleaving block, corresponding to the third column, is lost in transmission, as suggested by the shading on this matrix.

cN,1

cN,2

cN,3

...

cN,255

By the protocol, the fact that the third packet is lost is known. When the data are written into the matrix for deinterleaving, the missing column is left blank and recorded as an erasure. Then each of the N codewords is

10.2 Interleavers

427

subjected to erasure decoding, recovering the lost byte in each codeword. For this code, up to six packets out of N may be lost in transmission (erased) and fully recovered at the receiver. ◽

In general, when a burst of length l is deinterleaved, it causes a maximum of ⌈l∕N⌉ errors to occur among the received codewords. If the code used can correct up to t errors, a decoding failure may occur if the burst length exceeds Nt. The efficiency 𝛾 of an interleaver can be defined as the ratio of the length of the smallest burst of errors that exceeds the correction capability of a block code to the amount of memory in the interleaver. Based on our discussion, for the block interleaver the efficiency is t Nt + 1 ≈ . NM M Another kind of interleaver is the cross interleaver or convolutional interleaver [361], which consists of a bank of delay lines of successively increasing length. An example is shown in Figure 10.2. This figure shows the input stream, and the state of the interleaver at a particular time as an aid in understanding its operation. The cross interleaver is parameterized by (M, D), where M is the number of delay lines and D the number of samples each delay element introduces. It is clear that adjacent symbols input to the interleaver are separated by MD symbols. If M is chosen to be greater than or equal to the length of the code, then each symbol in the codeword is placed on a different delay line. If a burst error of length l occurs, then ⌈l∕(MD + 1)⌉ errors may be introduced into the deinterleaved codewords. For a t-error-correcting code, a decoding failure may be possible when l exceeds (MD + 1)t. The total memory of the interleaver is (0 + 1 + 2 + · · · + (M − 1))D = DM(M − 1)∕2. The efficiency is thus 𝛾=

𝛾=

(MD + 1)t + 1 2t ≈ . DM(M − 1)∕2 M − 1

Interleaver Input: ..., x0, x1, x2, x3, x4, x5, x6, ... x0

x1

Interleaver input

x2

x3

D D D

x−3

x−2

x−1

D D

Interleaver output

x−6

x−5

D

x−9

Output: ..., x0, x−3, x−6, x−9, x4, x1, x−2, x−5, x8, x5, x2, x−1,...

Input: ..., x0, x−3, x−6, x−9, x4, x1, x−2, x−5, x8, x5, x2, x−1,... x12

x9

Deinterleaver input

x6

D D D

x8

x5

D D

x4

D

x0

x1

Deinterleaver output

x2

x3 Output: ..., x0, x1, x2, x3, x4, x5, x6, ...

Figure 10.2: A cross interleaver and deinterleaver system.

428

Bursty Channels, Interleavers, and Concatenation Comparison with the block interleaver shows that cross interleavers are approximately twice as efficient as block interleavers. While block interleaving can be accomplished in a straightforward way with an array, for cyclic codes there is another approach. If  is a cyclic code of length n with generator g(x), then the code obtained after interleaving with an M × n interleaver matrix is also cyclic, with generator polynomial g(xM ). The encoding and syndrome computation can thus be implemented in conventional fashion using shift registers [271, p. 272].

10.3 An Application of Interleaved RS Codes: Compact Discs By far the most common application of RS codes is to compact discs. The data on the CD is protected with a Reed–Solomon code, providing resilience to scratches on the surface of the disc. (Of course, scratches may still impede CD playback, as many listeners can attest, but these problems are more often the result of tracking problems with the read laser and not the problem of errors in the decoding stream read.) Because this medium is so pervasive, it is important to be familiar with the data representation. This presentation also brings out another important point. The error correction is only one aspect of the data processing that takes place. To be of practical value, error-correction coding must work as a component within a larger system design. In addition to basic error correction provided by Reed–Solomon codes, protection against burst errors due to scratches on the disc is provided by the use of a cross interleaver. The data stream is also formatted with an eight-to-fourteen modulation (EFM) code which prevents excessively long runs of ones. This is necessary because the laser motion control system employs a phase-locked loop (PLL) which is triggered by bit transitions. If there is a run of ones that is too long, the PLL may drift. Details on the data stream and the EFM code are in [211]. Our summary here follows [483].

Control and display information

Synch. bits

left Sampling A/D

Sound source right

44.1 K samples/s 16 bits/sample/channel

Error correction coding

1.4112 Mbps

MUX

1.88 Mbps

8–14 modulation (EFM) 1.94 Mbps

MUX

Optical recording channel

4.32 Mbps

Figure 10.3: The CD recording and data formatting process. The overall block diagram of the CD recording process is shown in Figure 10.3. The 1.41 Mbps sampled data stream passes through an error-correction system resulting in a data rate of 1.88 Mbps. The encoder system, referred to as CIRC, uses two interleaved, shortened, Reed–Solomon codes, 1 and 2 . Both codes are built on codes over GF(256). The 8-bit symbols of the field fit naturally with the 16-bit samples used by the A/D converter. However, the codes are significantly shortened: 1 is a (32, 28) code and 2 is a (28, 24) code. For every 24 input symbols there are 32 output symbols, resulting in a rate R = 24∕32 = 3∕4. Both codes have minimum distance 5. Encoding The outer code, 2 , uses 12 16-bit samples to create 24 8-bit symbols as its message word. The 28-symbol codeword is passed through a (28, 4) cross interleaver. The resulting 28 interleaved symbols are passed through the code 1 , resulting in 32 coded output symbols, as shown in Figure 10.4.

10.4 Product Codes

429

1D 2D C2 (28, 24, 5) shortened RS encoder

3D

Input: 24 8-bit symbols obtained from 12 16-bit samples

26 D

Output: 28 8-bit symbols

27 D

C1 (32, 28, 5) shortened RS encoder

Output: 32 8-bits ymbols

M = 28 D=4 cross interleaver

Figure 10.4: The error-correction encoding in the compact disc standard.

Decoding At the decoder (in the CD player), the data that reaches the CIRC decoder first passes through the outer decoder 1 . Since 1 has minimum distance 5, it is capable of correcting two errors, or correcting one error and detecting two errors in the codeword, or detecting up to four errors. If the 1 decoder detects a high-weight error pattern, such as a double error pattern or any error pattern causing a decoder failure, the decoder outputs 28 erased symbols. The deinterleaver spreads these erased symbols over 28 2 codewords, where the erasure correction capability of the 2 code can be used. The 2 decoder can correct any combination of e errors and f erasures satisfying 2e + f < 5. Since 1 is likely to be able to produce error-free output, but will declare erasures when there seem to be too many errors for it to correct, the 2 decoder is frequently built to be an erasures-only decoder. Since erasure decoding involves only straightforward linear algebra (e.g., Forney’s algorithm) and does not require finding an error-locator polynomial, this can be a low-complexity decoder. In the rare event that a vector is presented having more than four erasures that 2 is not able to correct the 2 decoder outputs 24 erasures. In this case, the playback system uses an “error concealment” system which either mutes the corresponding 12 samples of music or performs some kind of interpolation. The performance specifications for this code are summarized in Table 10.1.

10.4 Product Codes Let 1 be an (n1 , k1 ) linear block code and let 2 be an (n2 , k2 ) linear block code over GF(q). An (n1 n2 , k1 k2 ) linear code called the product code, denoted 1 × 2 , can be formed as diagrammed in Figure 10.5. A set of k1 k2 symbols is written into a k2 × k1 array. Each of the k2 rows of the array is (systematically) encoded using code 1 , forming n1 columns. Each of the n1 columns of the array is then (systematically) encoded using code 2 , forming an n2 × n1 array. Because of linearity, it does not matter whether the encoding procedure is reversed (2 encoding followed by 1 encoding). Theorem 10.3 If 1 has minimum distance d1 and 2 has minimum distance d2 , then the product code 1 × 2 has minimum distance d1 d2 .

430

Bursty Channels, Interleavers, and Concatenation Table 10.1: Performance Specification of the Compact Disc Coding System [211, p. 57] Maximum completely correctable burst length Maximum interpolatable burst length in the worst case Sample interpolation rate Undetected error samples (producing click in output) Code rate Implementation

≈ 4000 bits (≈ 2.5 mm track length) ≈ 12,300 data bits (≈ 7.7 mm track length) One sample every 10 hours at a bit error rate (BER) of 10−4 . 1000 samples per minute at BER of 10−3 Less than one every 750 hours at BER=10−3 . Negligible at BER≤ 10−4 R = 3∕4 One LSI chip plus one random-access memory of 2048 bytes.

n1 k1

k2

Information symbols

Checks on rows

Checks on columns

Checks on checks

n2

Figure 10.5: The product code 1 × 2 . Proof The minimum weight cannot be less than d1 d2 : Each nonzero row of the matrix in Figure 10.5 must have weight ≥ d1 and there must be at least d2 nonzero rows. To show that there is a codeword of weight d1 d2 , let 𝐜1 be a codeword in 1 of minimum weight and let 𝐜2 be a codeword in 2 of minimum weight. Form an array in which all columns corresponding ◽ to zeros in 𝐜1 are zeros and all columns corresponding to ones in 𝐜1 are 𝐜2 . Using the Chinese remainder theorem it is straightforward to show [292, p. 570] that if 1 and 2 are cyclic and (n1 , n2 ) = 1, then 1 × 2 is also cyclic. The product code construction can be iterated. For example, the code 1 × 2 × 3 can be produced. It may be observed that by taking such multiple products, codes with large distance can be obtained. However, the rate of the product code is the product of the rates, so that the product code construction produces codes with low rate. We do not present a detailed decoding algorithm for product codes here. However, decoding of codes similar to product codes is discussed in Chapter 14. Product codes are seen there to be an instance of turbo codes, so a turbo code decoding algorithm is used. However, we discuss here the burst-error-correction capability of product codes [271, p. 275]. Suppose 1 has burst-error-correction capability l1 and 2 has burst- error-correction capability l2 . Suppose that the code is transmitted out of the matrix row by row. At the receiver the data are written back into an array in row-by-row order. A burst of length n1 l2 or less can affect no more than l2 + 1 consecutive rows, since when the symbols are arranged into the array, each column is affected by a burst of length at most l2 . By decoding first

10.6 Concatenated Codes

431

on the columns, the burst can be corrected, so the burst correction capability of the product code is at least n1 l2 . Similarly, it can be shown that bursts of length at least n2 l1 can be corrected, so the overall burst-error-correction capability of the code is max(n1 l2 , n2 l1 ).

10.5 Reed–Solomon Codes Reed–Solomon codes and other codes based on larger-than-binary fields have some intrinsic ability to correct bursts of binary errors. For a code over a field GF(2m ), each coded symbol can be envisioned as a sequence of m bits. Under this interpretation, a (n, k) Reed–Solomon code over GF(2m ) is a binary (mn, mk) code. The RS code is capable of correcting up to t symbols of error. It does not matter that a single symbol might have multiple bits in error — it still is a single symbol error from the perspective of the RS decoder. A single symbol might have up to m bits in error. Under the best of circumstances then, when all the errors affect adjacent bits of a symbol, a RS code may correct up to mt bits in error. This means that RS codes are naturally effective for transmitting over bursty binary channels: since bursts of errors tend to cluster together, there may be several binary errors contributing to a single erroneous symbol. As an example [271, p. 278], a burst of length 3m + 1 cannot affect more than four symbols, so a RS code capable of correcting 4 errors can correct any burst of length 3m + 1. Or any burst of length m + 1 cannot affect more than 2 bytes, so the four-error-correcting code could correct up to two bursts of length m + 1. In general, a t-error-correcting RS code over GF(2m ) can correct any combination of t 1 + ⌊l + m − 2)∕m⌋ or fewer bursts of length l, or correcting a single burst up to length (t − 1)m + 1. And, naturally, it also corrects any combination of t or fewer random errors.

10.6 Concatenated Codes Concatenated codes were proposed by Forney [126] as a means of obtaining long codes (as required by the Shannon channel coding theorem for capacity-approaching performance) with modest decoding complexity. The basic concatenation coding scheme is shown in Figure 10.6. The inner code is conventionally a binary code. The outer code is typically a (n2 , k2 ) Reed–Solomon code over GF(2k ). The outer code uses k2 k-tuples of bits from the inner code as the message sequence. In the encoding, the outer code takes kk2 bits divided into k-tuples which are employed as k2 symbols in GF(2k ) and encodes them as a Reed–Solomon codeword (c0 , c1 , . . . , cn2 ). These symbols, now envisioned as k-tuples of binary numbers, are encoded by the inner encoder to produce a binary sequence transmitted over the channel. The inner code is frequently a convolutional code. The purpose of the inner code is to improve the quality of the “superchannel” (consisting of the inner encoder, the channel, and the inner decoder) that

“Superchannel”

Outer encoder

Inner encoder

Channel

Inner decoder

Encoder

Outer decoder

Decoder

Figure 10.6: A concatenated code.

432

Bursty Channels, Interleavers, and Concatenation the outer RS code sees so that the RS code can be used very effectively. When the Viterbi decoder (the inner decoder) makes a decoding error, it typically involves a few consecutive stages of the decoding trellis, which results in a short burst of errors. The bursts of bit errors which tend to be produced by the inner decoder are handled by the RS decoder with its inherent burst-error-correction capability. Example 10.4 [483, p. 432] Figure 10.7 shows the block diagram of a concatenated coding system employed by some NASA deep-space missions. The outer code is a (255, 223) Reed–Solomon code followed by a block interleaver. The inner code is a rate 1∕2 convolutional code, where the generator polynomials are Rate 1/2: dfree = 10

g0 (x) = 1 + x + x3 + x4 + x6

g1 (x) = 1 + x3 + x4 + x5 + x6 .

The RS code is capable of correcting up to 16 8-bit symbols. The dfree path through the trellis traverses seven branches, so error bursts most frequently have length seven, which in the best case can be trapped by a single RS code symbol. To provide for the possibility of decoding bursts exceeding 16 × 8 = 128 bits, a symbol interleaver is placed between the RS encoder and the convolutional encoder. Since it is a symbol interleaver, burst errors which occupy a single byte are still clustered together. But bursts crossing several bytes are randomized. Block interleavers holding from 2 to 8 Reed–Solomon codewords have been employed. By simulation studies [177], it is shown that to achieve a bit error rate of 10−5 , with interleavers of sizes of 2, 4, and 8, respectively, an Eb ∕N0 of 2.6, 2.45, and 2.35dB, respectively, are required. Uncoded BPSK performance would require 9.6dB; using only the rate 1∕2 convolutional code would require 5.1 dB, so the concatenated system provides approximately 2.5 dB of gain compared to the convolutional code alone.

(255, 233) RS encoder

Block interleaver

Inner encoder: R = 1/2 convolutional encoder Channel

RS decoder

Block deinterleaver

Inner decoder: Viterbi algorithm with softdecision decoding

Figure 10.7: Deep-space concatenated coding system. ◽

10.7 Fire Codes 10.7.1

Fire Code Definition

Fire codes, named after their inventor [124], are binary cyclic codes designed specifically to be able to correct a single burst of errors. They are designed as follows. Let p(x) be an irreducible polynomial of degree m over GF(2). Let 𝜌 be the smallest integer such that p(x) divides x𝜌 + 1. 𝜌 is called the period of p(x). Let l be a positive integer such that l ≤ m and 𝜌| 2l − 1. Let g(x) be the generator polynomial defined by (10.1) g(x) = (x2l−1 + 1)p(x). Observe that the factors p(x) and x2l−1 + 1 are relatively prime. The length n of the code is the least common multiple of 2l − 1 and the period: n = LCM(2l − 1, 𝜌) and the dimension of the code is k = n − m − 2l + 1.

10.7 Fire Codes

433

Example 10.5 Let p(x) = 1 + x + x4 . This is a primitive polynomial, so 𝜌 = 24 − 1 = 15. Let l = 4 and note that 2l − 1 = 7 is not divisible by 15. The Fire code has generator g(x) = (x7 + 1)(1 + x + x4 ) = 1 + x + x4 + x7 + x8 + x11 with length and dimension n = LCM(7, 15) = 105 and

k = 94.



The burst-error-correction capabilities of the Fire code are established by the following theorem. Theorem 10.6 The Fire code is capable of correcting any burst up to length l. Proof [271, p. 262] We will show that bursts of different lengths reside in different cosets, so they can be employed as coset leaders and form correctable error patterns. Let a(x) and b(x) be polynomials of degree l1 − 1 and l2 − 1, representing bursts of length l1 and l2 , respectively, a(x) = 1 + a1 x + a2 x2 + · · · + al1 −2 xl1 −2 + xl1 −1 b(x) = 1 + b1 x + b2 x2 + · · · + bl2 −2 xl2 −2 + xl2 −1 , with l1 ≤ l and l2 ≤ l. Since a burst error can occur anywhere within the length of the code, we represent bursts as xi a(x) and xj b(x), where i and j are less than n and represent the starting position of the burst. Suppose (contrary to the theorem) that xi a(x) and xj b(x) are in the same coset of the code. Then the polynomial 𝑣(x) = xi a(x) + x j b(x) must be a code polynomial in the code. We show that this cannot occur. Without loss of generality, take i ≤ j. By the division algorithm, dividing j − i by 2l − 1, we obtain j − i = q(2l − 1) + b

(10.2)

for some quotient q and remainder b, with 0 ≤ b < 2l − 1. Using this, we can write 𝑣(x) = xi (a(x) + xb b(x)) + xi+b b(x)(xq(2l−1) + 1).

(10.3)

Since (by our contrary assumption) 𝑣(x) is a codeword, it must be divisible by g(x) and, since the factors of g(x) in (10.1) are relatively prime, 𝑣(x) must be divisible by x2l−1 + 1. Since xq(2l−1) + 1 is divisible by x2l−1 + 1, it follows that a(x) + xb b(x) is either divisible by x2l−1 + 1 or is 0. Let us write a(x) + xb b(x) = d(x)(x2l−1 + 1)

(10.4)

for some quotient polynomial d(x). Let 𝛿 be the degree of d(x). The degree of d(x)(x2l−1 + 1) is 𝛿 + 2l − 1. The degree of a(x) is l1 − 1 < 2l − 1, so the degree of a(x) + xb b(x) must be established by the degree of xb b(x). That is, we must have b + l2 − 1 = 2l − 1 + 𝛿. Since l1 ≤ l and l2 ≤ l, subtracting l2 from both sides of (10.5), we obtain b ≥ l1 + 𝛿. From this inequality we trivially observe that b > l1 − 1

and

b > d.

(10.5)

434

Bursty Channels, Interleavers, and Concatenation Writing out a(x) + xb b(x), we have a(x) + xb b(x) = 1 + a1 x + a2 x2 + · · · + al1 −2 xl1 −2 + xl1 −1 + xb (1 + b1 x + b2 x2 + · · · + bl2 −2 xl2 −2 + xl2 −1 ) so that xb is one of the terms in a(x) + xb b(x). On the other hand, since 𝛿 < b < 2l − 1, the expression d(x)(x2l−1 + 1) from (10.4) does not have the term xb , contradicting the factorization of (10.4). We must therefore have d(x) = 0 and a(x) + xb b(x) = 0. In order to cancel the constant terms in each polynomial, we must have b = 0, so we conclude that a(x) = b(x). Since b must be 0, (10.2) gives j − i = q(2l − 1).

(10.6)

Substituting this into (10.3), we obtain 𝑣(x) = xi b(x)(xj−i + 1). Now the degree b(x) is l2 − 1 < l, so deg(p(x)) < m = deg(p(x)). But since p(x) is irreducible, b(x) and p(x) must be relatively prime. Therefore, since 𝑣(x) is (assumed to be) a code polynomial, xj−i + 1 must be divisible by p(x) (since it cannot be divisible by x2l−1 + 1). Therefore, j − i must be a multiple of 𝜌. By (10.6), j − i must also be multiple of 2l − 1. So j − i must be a multiple of the least common multiple of 2l − 1 and m. But this least common multiple is n. We now reach the contradiction which leads to the final conclusion: j − i cannot be a multiple of n, since j and i are both less than n. We conclude, therefore, that 𝑣(x) is not a codeword, so the bursts xi a(x) and xj b(x) are in different cosets. Hence they are correctable error patterns. ◽

10.7.2

Decoding Fire Codes: Error Trapping Decoding

There are several decoding algorithms which have been developed for Fire codes. We present here the error trapping decoder. Error trapping decoding is a method which works for many different cyclic codes, but is particularly suited to the structure of Fire codes. Let r(x) = c(x) + e(x) be a received polynomial. Let us recall that for a cyclic code, the syndrome may be computed by dividing r(x) by g(x), e(x) = q(x)g(x) + s(x). The syndrome is a polynomial of degree up to n − k − 1, s(x) = s0 + s1 x + · · · + sn−k−1 xn−k−1 . Also recall that if r(x) is cyclically shifted i times to produce r(i) (x), the syndrome s(i) (x) may be obtained either by dividing r(i) (x) by g(x), or by dividing xi s(x) by g(x). Suppose that an l burst-error-correcting code is employed and that the errors occur in a burst confined to the l digits, e(x) = en−k−l xn−k−l + en−k−l+1 xn−k−l+1 + · · · + en−k−1 xn−k−1 . Then the syndrome digits sn−k−l , sn−k−l+1 , . . . , sn−k−1 match the error values and the syndrome digits s0 , s1 , . . . , sn−k−l−1 are zeros. If the errors occur in a burst of l consecutive positions at some other location, then after some number i of cyclic shifts, the errors are shifted to the positions xn−k−l , xn−k−l+1 , . . . , xn−k−1 of r(i) (x). Then the corresponding syndrome s(i) (x) of r(i) (x) matches the errors at positions xn−k−l , xn−k−l+1 , . . . , xn−k−1 of r(i) (x) and the digits at positions x0 , x1 , . . . , xn−k−l−1 are zeros. This fact allows us to “trap” the errors: when the condition of zeros is detected among the lower syndrome digits, we conclude that the shifted errors are trapped in the other digits of the syndrome register.

10.7 Fire Codes

435

An error trapping decoder is diagrammed in Figure 10.8. The operation is as follows [271]: 1. With gate 1 and gate 2 open, the received vector r(x) is shifted into the syndrome register, where the division by g(x) takes place by virtue of the feedback connections, so that when r(x) has been shifted in, the syndrome register contains s(x). r(x) is also simultaneously shifted into a buffer register. 2. Successive shifts of the syndrome register occur with gate 2 still open. When the left n − k − l memory elements contain only zeros, the right l stages are deemed to have “trapped” the burst error pattern and error correction begins. The exact correction actions depends upon how many shifts were necessary.







If the n − k − l left stages of the syndrome register are all zero after the ith shift for 0 ≤ i ≤ n − k − l, then the errors in e(x) are confined only to the parity check positions of r(x), so that the message bits are error free. There is no need to correct the parity bits. In this case, gate 4 is open, and the buffer register is simply shifted out. If, for this range of shifts the n − k − l left stages are never zero, then the error burst is not confined to the parity check positions of r(x). If the n − k − l left stages of the syndrome register are all zero after the (n − k − l + i)th shift, for 1 ≤ i ≤ l, then the error burst is confined to positions xn−i , . . . , xn−1 , x0 , . . . , xl−i−1 or r(x). (This burst is contiguous in a cyclic sense.) In this case, l − i right digits of the syndrome buffer register match the errors at the locations x0 , x1 , . . . , xl−i−1 of r(x), which are parity check positions and the next i stages of the syndrome register match the errors at locations xn−i , . . . , xn−2 , xn−1 , which are message locations. The syndrome register is shifted l − i times with gate 2 closed (no feedback) so that the errors align with the message digits in the buffer register. Then gate 3 and gate 4 are opened and the message bits are shifted out of the buffer register, being corrected by the error bits shifted out of the syndrome register. If the n − k − l left stages of the syndrome register are never all zeros by the time the syndrome register has been shifted n − k times, the bits are shifted out of the buffer register with gate 4 open while the syndrome register is simultaneously shifted with gate 2 open. In the event that the n − k − l left stages of the syndrome register become equal to all zeros, the digits in the l right stages of the syndrome register match the errors of the next l message bits. Gate 3 is then opened and the message bits are corrected as they are shifted out of the buffer register.

g (x) connections

r (x)

gate 1

+

gate 2

Syndrome register (n – k stages)

Test for all 0s (n – k – l stages) gate 3 Buffer register

gate 4

+

Corrected output

Figure 10.8: Error trapping decoder for burst-error-correcting codes [271, p. 260].

436

Bursty Channels, Interleavers, and Concatenation



If the n − k − l left stages of the syndrome register are never all zeros by the time the k message bits have been shifted out of the buffer, an uncorrectable error burst has been detected.

10.8 Exercises 10.1 Let an (n, k) cyclic code with generator g(x) be interleaved by writing its codewords into an M × n array, then reading out the columns. The resulting code is an (Mn, kn) code. Show that this code is cyclic with generator g(xM ). ⨂ 10.2 Let G1 and G2 be generator matrices for 1 and 2 , respectively. Show that G1 G2 is a ⨂ is the Kronecker product introduced in Chapter 8. generator matrix for 1 × 2 , where 10.3 Let 1 and 2 be cyclic codes of length n1 and n2 , respectively, with (n1 , n2 ) = 1. Form the product code 1 × 2 . In this problem you will argue that 1 × 2 is also cyclic. Denote the codeword represented by the matrix ⎡ c00 c01 · · · c0 n2 −1 ⎤ ⎢ c c11 · · · c1 n2 −1 ⎥ ⎥ ⎢ 10 𝐜=⎢ ⋮ ⎥ ⎢cn −1 0 cn −1 1 · · · cn −1 n −1 ⎥ 1 1 2 ⎥ ⎢ 1 ⎦ ⎣ by the polynomial ∑ ∑

n1 −1 n2 −1

c(x, y) =

cij xi yj ,

i=0 j=0

where c(x, y) ∈ 𝔽 [x]∕(xn1 − 1, yn2 − 1). That is, in the ring where xn1 = 1 and yn2 = 1 (since the codes are cyclic). Thus, xc(x, y) and yc(x, y) represent cyclic shifts of the rows and columns, respectively, of 𝐜. (a) Show that there exists a function I(i, j) such that for each pair (i, j) with 0 ≤ i < n1 and 0 ≤ j < n2 , I(i, j) is a unique integer in the range 0 ≤ I(i, j) < n1 n2 , such that I(i, j) ≡ i (mod n1 ) I(i, j) ≡ j (mod n2 ). (b) Using I(i, j), rewrite c(x, y) in terms of a single variable z = xy by replacing each xi yj by zI(i,j) to obtain the representation d(z). (c) Show that the set of code polynomials d(z) so obtained is cyclic. 10.4 For each of the following (M, D) pairs, draw the cross interleaver and the corresponding deinterleaver. Also, for the sequence x0 , x1 , x2 , . . . , determine the interleaved output sequence. (a) (M, D) = (2, 1) (b) (M, D) = (2, 2)

(c) (M, D) = (3, 2) (d) (M, D) = (4, 2)

10.5 [271] Find a generator polynomial of a Fire code capable of correcting any single error burst of length 4 or less. What is the length of the code? Devise an error trapping decoder for this code.

10.9 Exercises

10.9 References The topic of burst-error-correcting codes is much more fully developed in [271] where, in addition to the cyclic codes introduced here, several other codes are presented which were found by computer search. Fire codes were introduced in [124]. The cross interleaver is examined in [361]. Error trapping decoding is also fully developed in [271]; it was originally developed in [246, 247, 314, 315, 388]. The CD system is described in [210]. A thorough summary is provided in [211]. Product codes are discussed in [292, chapter 18]. Application to burst error correction is described in [271]. Cyclic product codes are explored in [53, 274]. The interpretation of Reed–Solomon codes as binary (mn, mk) codes is discussed in [292]. Decoders which attempt binary-level decoding of Reed–Solomon codes are in [315] and references therein.

437

Chapter 11

Soft-Decision Decoding Algorithms 11.1 Introduction and General Notation Most of the decoding methods described to this point in the book have been based on discrete field values, usually bits obtained by quantizing the output of the matched filter. However, the actual value of the matched filter output might be used, instead of just its quantization, to determine the reliability of the bit decision. For example, in BPSK modulation if the matched filter output is very near to zero, then any bit decision made based on only that output would have low reliability. A decoding algorithm which takes into account reliability information or uses probabilistic or likelihood values rather than quantized data is called a soft-decision decoding algorithm. Decoding which uses only the (quantized) received bit values and not their reliabilities is referred to as hard-decision decoding. As a general rule of thumb, soft-decision decoding can provide as much as 3 dB of gain over hard-decision decoding. In this short chapter, we introduce some of the most commonly used historical methods for soft-decision decoding, particularly for binary codes transmitted using BPSK modulation over the AWGN channel. Some modern soft-decision decoding techniques are discussed in the context of turbo codes (Chapter 14) and LDPC codes (Chapter 15) and polar codes (Chapter 17). Some clarification in the terminology is needed. The algorithms discussed in the chapter actually provide hard output decisions. That is, the decoded values are provided without any reliability information. However, they rely on “soft” input decisions — matched filter outputs or reliabilities. They should thus be called soft-input hard-output algorithms. A soft-output decoder would provide decoded values accompanied by an associated reliability measure, or a probability distribution for the decoded bits. Such decoders are called soft-input, soft-output decoders. The turbo and LDPC decoders provide this capability. Let  be a code and let a codeword 𝐜 = (c0 , c1 , . . . , cn−1 ) ∈  be modulated as the vector 𝐜̃ = (̃c0 , c̃ 1 , . . . , c̃ n−1 ) (assuming for convenience a one-dimensional signal space; modifications for two- or higher-dimensional signal spaces are straightforward). We will denote the operation of modulation — mapping into the signal space for transmission — by M, so that we can write 𝐜̃ = M(𝐜). The modulated signal 𝐜̃ is passed through a memoryless channel to form the received vector 𝐫 = (r0 , r1 , . . . , rn−1 ). For example, for an AWGN ri = c̃ i + ni , where ni ∼  (0, 𝜎 2 ), with 𝜎 2 = N0 ∕2. The operation of “slicing” the received signal into signal constellation values, the detection problem, can be thought of as an “inverse” modulation. We denote the “sliced” values as 𝑣i . Thus, 𝑣i = M−1 (ri ) or 𝐯 = M−1 (𝐫). Error Correction Coding: Mathematical Methods and Algorithms, Second Edition. Todd K. Moon © 2021 John Wiley & Sons, Inc. Published 2021 by John Wiley & Sons, Inc. Companion website: www.wiley.com/go/Moon/ErrorCorrectionCoding

440

Soft-Decision Decoding Algorithms If the ri values are sliced into discrete detected values 𝑣i and only this information is used by the decoder, then hard-decision decoding occurs. Example 11.1 For example, if BPSK modulation is used, then c̃ i = M(ci ) = signal point. The received values are sliced into detected bits by { 0 ri < 0 ∈ {0, 1}. 𝑣i = M−1 (ri ) = 1 ri ≥ 0



Ec (2ci − 1) is the modulated

In this case, as described in Section 1.5.7, there is effectively a BSC model between transmitter and receiver. Figure 11.1 illustrates the signal labels.

c



Modulate

c

Channel

(matched filter v∼ or outputs) v (remodulated) r Slice Reliability

Zi

Figure 11.1: Signal labels for soft-decision decoding. ◽

It is possible to associate with each sliced value 𝑣i a reliability zi , which indicates the quality of the decision. The reliabilities are ordered such that zi > zj if the ith symbol is more reliable — capable of producing better decisions — than the jth symbol. If the channel is AWGN, then zi = |ri | ∈ ℝ can be used as the reliability measure, since the absolute log likelihood ratio α α2 α3 α4 α 5 α6 α 7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

utiltkm.cc utiltkm.h

| | |log p(ri |ci = 1) | | p(ri |ci = 0) || |

(11.1)

is proportional to |ri |. Associated with each channel is a distance measure. For the BSC, the appropriate distance measure is the Hamming distance,

dH (𝐯, 𝐜) =

n−1 ∑

[𝑣i ≠ ci ].

i=0

However, for soft-decision decoding over the AWGN, the Euclidean metric between the (soft) received vector and the transmitted codeword dE (𝐫, 𝐜̃) = dE (𝐫, M(𝐜)) =∥𝐫 − 𝐜̃∥2 is more appropriate. (See Section 1.5 for further discussion.) A key idea in soft-decision decoding is sorting the symbols in order of reliability. When the symbols are sorted in order of decreasing reliability, then the first sorted symbols are more likely to be correct than the last sorted symbols. The set of symbols which have the highest reliability (appearing first in the sorted list) are referred to as being in the most reliable positions and the set of symbols appearing at the end of the sorted list are said to be in the least reliable positions.

11.2 Generalized Minimum Distance Decoding

11.2 Generalized Minimum Distance Decoding Recall (see Section 3.8) that an erasure-and-error decoder is capable of correcting twice as many erasures as errors. That is, a code with minimum Hamming distance dmin can simultaneously correct e errors and f erasures provided that 2e + f ≤ dmin − 1. The generalized minimum distance (GMD) decoder devised by Forney [127] makes use of this fact, deliberately erasing symbols which have the least reliability, then correcting them using an erasure-and-error decoder. The GMD decoding considers all possible patterns of up to f ≤ dmin − 1 erasures in the least reliable positions. The decoder operates as follows. Algorithm 11.1 Generalized Minimum Distance (GMD) Decoding Initialize: For the (soft) received sequence 𝐫 = (r0 , r1 , … , rn−1 ), form the hard-decision vector 𝐯 = (𝑣0 , 𝑣1 , … , 𝑣n−1 ) and the reliabilities zi = |ri |. Sort the reliabilities to find the dmin − 1 least reliable positions if dmin is even: for j = 1 to dmin − 1 by 2 erase the j least reliable symbols in 𝐯 to form amodified vector 𝐯̂ Decode and Select the best codeword else if dmin is odd: for j = 0 to dmin − 1 by 2 erase the j least reliable symbols in 𝐯 to form amodified vector 𝐯̂ Decode and Select the best codeword end if Decode and Select the best codeword Decode 𝐯̂ using an erasures-and-errors decoding algorithmto obtain a codeword 𝐜. Compute the soft-decision (e.g., Euclidean) distance betweenM(𝐜) and 𝐫, dE (𝐫, M(𝐜)), and select the codeword with the best distance.

As discussed in Exercise 11.2, the correlation discrepancy 𝜆 can be computed instead of the distance. Since hard-decision values are actually used in the erasures-and-errors decoding algorithm, an algebraic decoder can be used, if it exists for the particular code being decoded. (The Chase algorithms described below are also compatible with algebraic decoding algorithms.) Note that for dmin either even or odd, there are only ⌊(dmin + 1)/2⌋ different vectors that must be decoded. While the GMD algorithm is straightforward and conceptually simple, justifying it in detail will require a bit of work, which follows in the next section.

11.2.1 Distance Measures and Properties

There are two theorems which will establish the correctness of the GMD algorithm, which will require some additional notation. We first generalize the concept of Hamming distance. In defining the Hamming distance between elements of a codeword 𝐜 and another vector 𝐯, there were essentially two "classes" of outcomes, those where $v_i$ matches $c_i$, and those where $v_i$ does not match $c_i$. We generalize this by introducing J reliability classes C1, C2, ..., CJ, each of which has associated with it two parameters 𝛽cj and 𝛽ej such that 0 ≤ 𝛽cj ≤ 𝛽ej ≤ 1. We also introduce the weight of the class, 𝛼j = 𝛽ej − 𝛽cj. Then 𝛽ej is the "cost" when $v_i$ is in class j and $v_i \ne c_i$, and 𝛽cj is the "cost" when $v_i$ is in class j and $v_i = c_i$. It is clear that 0 ≤ 𝛼j ≤ 1. We write
$$d_G(v_i, c_i) = \begin{cases} \beta_{ej} & v_i \in C_j \text{ and } v_i \ne c_i \\ \beta_{cj} & v_i \in C_j \text{ and } v_i = c_i. \end{cases}$$

Then the generalized distance dG(𝐯, 𝐜) is defined as
$$d_G(\mathbf{v}, \mathbf{c}) = \sum_{i=0}^{n-1} d_G(v_i, c_i). \qquad (11.2)$$

(Note that this is not a true distance, since it is not symmetric in 𝐯 and 𝐜.) Now let ncj be the number of symbols received correctly (i.e., $v_i = c_i$) and put into class Cj. Let nej be the number of symbols received incorrectly (so that $v_i \ne c_i$) and put into class Cj. Then (11.2) can be written as
$$d_G(\mathbf{v}, \mathbf{c}) = \sum_{j=1}^{J} (\beta_{cj} n_{cj} + \beta_{ej} n_{ej}). \qquad (11.3)$$

Example 11.2 For conventional, errors-only decoding, there is only one reliability class, C1 , having 𝛽c1 = 0 and 𝛽e1 = 1, so 𝛼1 = 1. In this case, the generalized distance specializes to the Hamming distance. Introducing erasures introduces a second reliability class C2 having 𝛽c2 = 𝛽e2 = 0, or 𝛼2 = 0. Symbols in the erasure class can be considered to be equally distant from all transmitted symbols. ◽

A class for which 𝛼j = 1 is said to be fully reliable. The first theorem provides a basis for declaring when the decoding algorithm can declare a correct decoded value.

Theorem 11.3 [127] For a code having minimum distance dmin, if 𝐜 is sent and ncj and nej are such that
$$\sum_{j=1}^{J} [(1 - \alpha_j) n_{cj} + (1 + \alpha_j) n_{ej}] < d_{\min}, \qquad (11.4)$$
then dG(𝐯, 𝐜) < dG(𝐯, 𝐜′) for all codewords 𝐜′ ≠ 𝐜. That is, 𝐯 is closer to 𝐜 in the dG measure than to all other codewords.

Note that for errors-only decoding, this theorem specializes as follows: if 2ne1 < dmin, then dH(𝐫, 𝐜) < dH(𝐫, 𝐜′), which is the familiar statement regarding the decoding capability of a code, with ne1 the number of correctable errors. For erasures-and-errors decoding, the theorem specializes as follows: if 2ne1 + (nc2 + ne2) < dmin, then dG(𝐯, 𝐜) < dG(𝐯, 𝐜′). Letting f = nc2 + ne2 be the number of erasures, this recovers the familiar condition for erasures-and-errors decoding.

Proof Let 𝐜′ be any codeword not equal to 𝐜. Partition the n symbol locations into sets such that
$$i \in \begin{cases} S_0 & \text{if } c_i = c'_i \\ S_{cj} & \text{if } c_i \ne c'_i \text{ and } v_i = c_i \text{ and } v_i \in C_j \\ S_{ej} & \text{if } c_i \ne c'_i \text{ and } v_i \ne c_i \text{ and } v_i \in C_j. \end{cases}$$

Clearly we must have |Scj | ≤ ncj and |Sej | ≤ nej .

• For i ∈ S0: $d_G(v_i, c'_i) \ge 0 = d_H(c_i, c'_i)$.
• For i ∈ Scj: $d_G(v_i, c'_i) = \beta_{ej} = d_H(c'_i, c_i) - 1 + \beta_{ej}$.
• For i ∈ Sej: $d_G(v_i, c'_i) \ge \beta_{cj} = d_H(c'_i, c_i) - 1 + \beta_{cj}$.

Summing both sides of these equalities/inequalities over j = 1, ..., J, we obtain
$$d_G(\mathbf{v}, \mathbf{c}') \ge d_H(\mathbf{c}, \mathbf{c}') - \sum_{j=1}^{J} [(1 - \beta_{ej})|S_{cj}| + (1 - \beta_{cj})|S_{ej}|] \ge d_{\min} - \sum_{j=1}^{J} [(1 - \beta_{ej})n_{cj} + (1 - \beta_{cj})n_{ej}].$$
If, as stated in the hypothesis of the theorem,
$$d_{\min} > \sum_{j=1}^{J} [(1 - \beta_{ej} + \beta_{cj})n_{cj} + (1 - \beta_{cj} + \beta_{ej})n_{ej}],$$
then
$$d_G(\mathbf{v}, \mathbf{c}') > \sum_{j=1}^{J} (n_{cj}\beta_{cj} + n_{ej}\beta_{ej}) = d_G(\mathbf{v}, \mathbf{c}). \qquad ◽$$

This theorem allows us to draw the following conclusions:



• If generalized distance dG is used as a decoding criterion, then no decoding error will be made when ncj and nej are such that (11.4) is satisfied. Let us say that 𝐜 is within the minimum distance of 𝐯 if (11.4) is satisfied.

• The theorem also says that there can be at most one codeword within the minimum distance of any received word 𝐯.

Thus, if by some means a codeword 𝐜 can be found within the minimum distance of the received word 𝐯, then it can be concluded that this is the decoded value. The next theorem will determine the maximum number of different decoded values necessary and suggest how to obtain them. Let the classes be ordered according to decreasing reliability (or weight), so that 𝛼j ≥ 𝛼k if j < k. Let 𝜶 = (𝛼1, 𝛼2, ..., 𝛼J) be the vector of all class weights. Let Rb = {0, 1, 2, ..., b} and Eb = {b + 1, b + 2, b + 3, ..., J}. Let
$$\boldsymbol{\alpha}_b = (\underbrace{1, 1, \ldots, 1}_{b \text{ ones}}, 0, 0, \ldots, 0).$$

Theorem 11.4 [127] Let the weights be ordered according to decreasing reliability. If
$$\sum_{j=1}^{J} [(1 - \alpha_j)n_{cj} + (1 + \alpha_j)n_{ej}] < d_{\min},$$
then there is some integer b such that
$$2\sum_{j=1}^{b} n_{ej} + \sum_{j=b+1}^{J} (n_{cj} + n_{ej}) < d_{\min}.$$

Since erased symbols are associated with a class having 𝛼j = 0, which would occur last in the ordered reliability list, the import of the theorem is that if there is some codeword such that the inequality (11.4) is satisfied, then there must be some assignment in which (only) the least reliable classes are erased which will enable an erasures-and-errors decoder to succeed in finding that codeword.

Proof Let
$$f(\boldsymbol{\alpha}) = \sum_{j=1}^{J} [(1 - \alpha_j)n_{cj} + (1 + \alpha_j)n_{ej}].$$
Note that $f(\boldsymbol{\alpha}_b) = 2\sum_{j=1}^{b} n_{ej} + \sum_{j=b+1}^{J} (n_{cj} + n_{ej})$. The proof is by contradiction. Suppose (contrary to the theorem) that $f(\boldsymbol{\alpha}_b) \ge d_{\min}$ for every b. Let
$$\lambda_0 = 1 - \alpha_1, \qquad \lambda_b = \alpha_b - \alpha_{b+1}, \quad 1 \le b \le J - 1, \qquad \lambda_J = \alpha_J.$$
By the ordering, $0 \le \lambda_b \le 1$ for $0 \le b \le J$, and $\sum_{b=0}^{J} \lambda_b = 1$. Now let
$$\boldsymbol{\alpha} = \sum_{b=0}^{J} \lambda_b \boldsymbol{\alpha}_b.$$
Then
$$f(\boldsymbol{\alpha}) = f\left(\sum_{b=0}^{J} \lambda_b \boldsymbol{\alpha}_b\right) = \sum_{b=0}^{J} \lambda_b f(\boldsymbol{\alpha}_b) \ge d_{\min}\sum_{b=0}^{J} \lambda_b = d_{\min}.$$

Thus, if $f(\boldsymbol{\alpha}_b) \ge d_{\min}$ for all b, then $f(\boldsymbol{\alpha}) \ge d_{\min}$. But the hypothesis of the theorem is that $f(\boldsymbol{\alpha}) < d_{\min}$, which is a contradiction. Therefore, there must be at least one b such that $f(\boldsymbol{\alpha}_b) < d_{\min}$. ◽

Let us examine conditions under which an erasures-and-errors decoder can succeed. The decoder can succeed if there are apparently no errors and dmin − 1 erasures, or one error and dmin − 3 erasures, and so forth up to t0 errors and dmin − 2t0 − 1 erasures, where t0 is the largest integer such that 2t0 ≤ dmin − 1. These possibilities are exactly those examined by the statement of the GMD decoder in Algorithm 11.1.

11.3 The Chase Decoding Algorithms

In [60], three other soft-decision decoding algorithms were presented, now referred to as Chase-1, Chase-2, and Chase-3. These algorithms provide varying levels of decoder capability for varying degrees of decoder effort.

In the Chase-1 algorithm, all patterns of up to dmin − 1 errors are added to the received signal vector 𝐯 to form 𝐰 = 𝐯 + 𝐞. Then 𝐰 is decoded, if possible, and compared using a soft-decision metric to 𝐫. The decoded codeword closest to 𝐫 is selected. However, since all patterns of up to dmin − 1 errors are used, this is very complex for codes of appreciable length or distance, so Chase-1 decoding has attracted very little interest.

In the Chase-2 algorithm, the ⌊dmin/2⌋ least reliable positions are identified from the sorted reliabilities. The set E consisting of all errors in these ⌊dmin/2⌋ least reliable positions is generated. Then the Chase-2 algorithm can be summarized as follows.


Algorithm 11.2 Chase-2 Decoder
Initialize: For the (soft) received sequence 𝐫 = (r0, r1, ..., rn−1), form the hard-decision vector 𝐯 = (v0, v1, ..., vn−1) and the reliabilities zi = |ri|. Identify the ⌊dmin/2⌋ least reliable positions. Generate the set E of error vectors (one at a time, in practice).
for each 𝐞 ∈ E
    Decode 𝐯 + 𝐞 using an errors-only decoder to produce the codeword 𝐜
    Compute dE(𝐫, M(𝐜)) and select the candidate codeword with the best metric.
end for
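A sketch of the Chase-2 test-pattern loop is shown below. The helper errors_only_decode stands for the algebraic errors-only decoder of the code being decoded, and soft_distance is as in the GMD sketch above; both are assumed interfaces, not library routines. The loop enumerates all error patterns confined to the least reliable positions.

#include <vector>

bool errors_only_decode(const std::vector<int>& y, std::vector<int>& cw);        // assumed
double soft_distance(const std::vector<double>& r, const std::vector<int>& cw);  // as before

// 'least_reliable' holds the floor(dmin/2) least reliable positions.
bool chase2_decode(const std::vector<double>& r, const std::vector<int>& v,
                   const std::vector<int>& least_reliable, std::vector<int>& best) {
    int L = (int)least_reliable.size();
    double best_d = 1e300;
    bool found = false;
    for (unsigned long pattern = 0; pattern < (1UL << L); pattern++) {
        std::vector<int> y = v;                       // test vector v + e
        for (int k = 0; k < L; k++)
            if (pattern & (1UL << k)) y[least_reliable[k]] ^= 1;
        std::vector<int> cw;
        if (!errors_only_decode(y, cw)) continue;
        double d = soft_distance(r, cw);
        if (d < best_d) { best_d = d; best = cw; found = true; }
    }
    return found;
}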

The Chase-3 decoder has lower complexity than the Chase-2 algorithm, and is similar to the GMD.

Algorithm 11.3 Chase-3 Decoder
Initialize: For the (soft) received sequence 𝐫 = (r0, r1, ..., rn−1), form the hard-decision vector 𝐯 = (v0, v1, ..., vn−1) and the reliabilities zi = |ri|. Sort the reliabilities to find the dmin − 1 least reliable positions.
Generate a list of at most ⌊dmin/2 + 1⌋ sequences by modifying 𝐯:
    If dmin is even, modify 𝐯 by complementing no symbols, then the least reliable symbol, then the three least reliable symbols, ..., the dmin − 1 least reliable symbols.
    If dmin is odd, modify 𝐯 by complementing no symbols, then the two least reliable symbols, then the four least reliable symbols, ..., the dmin − 1 least reliable symbols.
Decode each modified 𝐯 into a codeword 𝐜 using an errors-only decoder.
Compute dE(𝐫, M(𝐜)) and select the codeword that is closest.

11.4 Halting the Search: An Optimality Condition

The soft-decision decoding algorithms presented so far require searching over an entire list of candidate codewords to select the best. It is of interest to know if a given codeword is the best that is going to be found, without having to complete the search. In this section, we discuss an optimality condition, appropriate for binary codes transmitted using BPSK, which can be tested to establish exactly this.

Let 𝐫 be the received vector, with hard-decision values 𝐯. Let 𝐜 = (c0, c1, ..., cn−1) be a binary codeword in the code and let 𝐜̃ = (c̃0, c̃1, ..., c̃n−1) be its bipolar representation (i.e., for BPSK modulation), with c̃i = 2ci − 1. In this section, whenever we indicate a codeword 𝐜, its corresponding bipolar representation 𝐜̃ is also implied. We define the index sets D0(𝐜) and D1(𝐜) with respect to a codevector 𝐜 by
$$D_0(\mathbf{c}) = \{i : 0 \le i < n \text{ and } c_i = v_i\} \quad\text{and}\quad D_1(\mathbf{c}) = \{i : 0 \le i < n \text{ and } c_i \ne v_i\}.$$
Let n(𝐜) = |D1(𝐜)|; this is the Hamming distance between the codeword 𝐜 and the hard-decision vector 𝐯. It is clear from the definitions that $r_i \tilde{c}_i < 0$ if and only if $v_i \ne c_i$. In Exercise 11.2, the correlation discrepancy $\lambda(\mathbf{r}, \tilde{\mathbf{c}})$ is defined as
$$\lambda(\mathbf{r}, \tilde{\mathbf{c}}) = \sum_{i : r_i \tilde{c}_i < 0} |r_i|.$$

... and a common multiple $x^i$ from G(x). That is, it is not possible to write $G(x) = x^i \tilde{G}(x)$ for any $\tilde{G}(x)$.

12.1.1 The State

A convolutional encoder is a state machine. For both encoding and decoding purposes, it is frequently helpful to think of the state diagrams of the state machines, that is, a representation of the temporal relationships between the states portraying state/next state relationships as a function of the inputs and the outputs. For an implementation with 𝜈 memory elements, there are $2^\nu$ states in the state diagram. (Box 12.1 summarizes basic definitions for graphs.)

Another representation which is extremely useful is a graph representing the connections from states at one time instant to states at the next time instant. The graph is a bipartite graph, that is, a graph which contains two subsets of nodes, with edges running only between the two subsets. By stacking these bipartite graphs to show several time steps, one obtains a graph known as a trellis, so-called because of its resemblance to a trellis that might be used for decorative purposes in landscaping. It may be convenient for some implementations to provide a table indicating the state/next state information explicitly; such a table is constructed in the code sketch following Example 12.5 below. From the state/next state table the state/previous state information can be extracted.

Box 12.1: Graphs: Basic Definitions

Graph concepts are used throughout the remainder of the book. The following definitions summarize some graph concepts [486]. A graph G is a pair (V, E), where V is a nonempty finite set of vertices or nodes often called the vertex set and E is a finite family of unordered pairs of elements of V called edges. (A family is like a set, but some elements may be repeated.) In the graph here, the vertex set is V = {a, b, c, d} and the edge family is E = {{a, a}, {a, b}, {a, b}, {a, b}, {a, c}, {b, c}, {c, d}, {d, d}}. A loop is an edge joining a vertex to itself. If the edge family is in fact a set (no repeated elements) and there are no loops, then the graph is a simple graph.

[Figures for Box 12.1: a graph on vertices {a, b, c, d}; a bipartite graph with vertex sets V1 and V2; a directed graph on {a, b, c, d}.]

Two vertices of a graph G are adjacent if there is an edge joining them. We also say that adjacent nodes are neighbors. The two vertices are said to be incident to such an edge. Two distinct edges of a graph are adjacent if they have at least one vertex in common. The degree of a vertex is the number of edges incident to it. The vertex set V of a bipartite graph can be split into two disjoint sets V = V1 ∪ V2 in such a way that every edge of G joins a vertex of V1 to a vertex of V2 . A walk through a graph G = (V, E) is a finite sequence of edges {𝑣0 , 𝑣1 }, {𝑣1 , 𝑣2 }, . . . , {𝑣m−1 , 𝑣m }, where each 𝑣i ∈ V and each {𝑣i , 𝑣j } ∈ E. The length of a walk is the number of edges in it. If all the edges are distinct, the walk is a trail; if all the vertices are also distinct (except possibly 𝑣0 = 𝑣m ), the walk is a path. A path or trail is closed if 𝑣0 = 𝑣m . A closed path containing at least one edge is called a circuit. The girth of a graph is the number of edges in its shortest circuit. A graph is connected if there is a path between any pair of vertices 𝑣, 𝑤 ∈ V. A connected graph which contains no circuits is a tree. A node of a tree is a leaf if its degree is equal to 1. A directed graph (digraph) G is a pair (V, E), where E is a finite family of ordered pairs of elements of V. Such graphs are frequently represented using arrows to represent edges. In the directed graph here, the edge set is E = {(a, a), (a, b), (b, a), (b, a), (a, c), (b, c), (c, d), (d, d)}.

Example 12.4 Consider again the convolutional encoder of Example 12.1 with transfer function matrix G(x) = [1 + x², 1 + x + x²]. A realization and its corresponding state diagram are shown in Figure 12.5(a, b). The state is indicated as a pair of bits, with the first bit representing the least significant bit (lsb). The branches along the state diagram indicate input/output values. One stage of the trellis (corresponding to the transition between two time instants) is shown in Figure 12.5(c). Three trellis stages are shown in Figure 12.5(d). ◽

Example 12.5 For the rational systematic encoder with matrix transfer function
$$G(x) = \begin{bmatrix} 1 & 0 & \dfrac{x}{1+x^3} \\[4pt] 0 & 1 & \dfrac{x^2}{1+x^3} \end{bmatrix}, \qquad (12.4)$$
with the circuit realization of Figure 12.4, the state diagram and trellis are shown in Figure 12.6. In the state diagram, the states are represented as integers from 0 to 7 using the numbers (a, b, c) corresponding to the registers shown on the circuit. (That is, the lsb is on the right of the diagram this time.) Only the state transitions are shown, not the inputs or outputs, as that would excessively clutter the diagram. The corresponding trellis is also shown, with the branches' input/output information listed on the left, with the order of the listing corresponding to the sequence of branches emerging from the corresponding state in top-to-bottom order. ◽
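To make the state/next state table concrete, the following short program (a sketch written for illustration, not one of the book's listings) tabulates, for the encoder G(x) = [1 + x², 1 + x + x²] of Example 12.4, the next state and the two output bits for every state and input, using the state (u_{k−1}, u_{k−2}) held in the two delay elements with the most recent input bit as the least significant bit.

#include <cstdio>

int main() {
    // G(x) = [1 + x^2, 1 + x + x^2]:  c1 = u + u2,  c2 = u + u1 + u2,
    // where u is the current input and (u1, u2) = (u_{k-1}, u_{k-2}) is the state.
    // The state is numbered as u1 + 2*u2.
    std::printf("state  input -> next state   output c1 c2\n");
    for (int state = 0; state < 4; state++) {
        int u1 = state & 1, u2 = (state >> 1) & 1;
        for (int u = 0; u < 2; u++) {
            int c1 = u ^ u2;
            int c2 = u ^ u1 ^ u2;
            int next = u + 2 * u1;     // shift register: new (u_{k-1}, u_{k-2}) = (u, u1)
            std::printf("  %d      %d   ->     %d            %d %d\n",
                        state, u, next, c1, c2);
        }
    }
    return 0;
}

Running the sketch reproduces the transitions seen in the trellis of Figure 12.5 (for example, state 0 with input 1 moves to state 1 with output 11).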

12.2 Definition of Codes and Equivalent Codes

Having now seen several examples of codes, it is time to formalize the definition and examine some structural properties of the codes. It is no coincidence that the code sequences (c⁽¹⁾(x), c⁽²⁾(x)) are the same for Examples 12.1 and 12.2. The sets of sequences that lie in the range of the transfer function matrices Ga(x) and Gb(x) are identical: even though the encoders are different, they encode to the same code. (This is analogous to having different generator matrices to represent the same block code.) We formally define a convolutional code as follows:

Definition 12.6 [[397], p. 94] A rate R = k/n convolutional code over the field of rational Laurent series 𝔽[[x]] over the field 𝔽 is the image of an injective linear mapping of the k-dimensional Laurent series 𝐦(x) ∈ 𝔽[[x]]ᵏ into the n-dimensional Laurent series 𝐜(x) ∈ 𝔽[[x]]ⁿ. ◽

In other words, the convolutional code is the set {𝐜(x)} of all possible output sequences as all possible input sequences {𝐦(x)} are applied to the encoder. The code is the image set or (row) range of the linear operator G(x), not the linear operator G(x) itself.

Figure 12.5: Encoder, state diagram, and trellis for G(x) = [1 + x², 1 + x + x²]. (a) Encoder; (b) state diagram; (c) one stage of trellis; (d) three stages of trellis.

Example 12.7

Let
$$G_2(x) = \begin{bmatrix} 1 & x^2 & x \\ x & 1 & 0 \end{bmatrix} \qquad (12.5)$$
and note that
$$G_2(x) = \begin{bmatrix} 1 & x^2 \\ x & 1 \end{bmatrix} G_1(x) = T_2(x)G_1(x)$$
(where G1(x) was defined in (12.2)) and consider the encoding operation
$$m(x)G_2(x) = m(x)\begin{bmatrix} 1 & x^2 \\ x & 1 \end{bmatrix} G_1(x) = m'(x)G_1(x), \qquad\text{where}\quad m'(x) = m(x)\begin{bmatrix} 1 & x^2 \\ x & 1 \end{bmatrix} = m(x)T_2(x).$$
Corresponding to each m(x) there is a unique m′(x), since T2(x) is invertible. Hence, as m′(x) varies over all possible input sequences, m(x) also varies over all possible input sequences. The set of output sequences {c(x)} produced is the same for G2(x) as G1(x): that is, both encoders produce the same code. Figure 12.7 shows a schematic representation of this encoder. Note that the implementation of both G1(x) (of Figure 12.4) and G2(x) have three 1-bit memory elements in them.

The contents of these memory elements may be thought of as the state of the devices. Since these are binary circuits, there are 2³ = 8 distinct states in either implementation. ◽

Figure 12.6: State diagram and trellis for a rate R = 2/3 systematic encoder. (a) State diagram; (b) state/next state information; (c) trellis.

Figure 12.7: A feedforward R = 2/3 encoder.

Figure 12.8: A less efficient feedforward R = 2/3 encoder.

Example 12.8

Another encoder for the code of Example 12.1 is
$$G_3(x) = \begin{bmatrix} 1+x & 1+x^2 & x \\ x+x^2 & 1+x & 0 \end{bmatrix}, \qquad (12.6)$$
where, it may be observed,
$$G_3(x) = \begin{bmatrix} 1+x & 1+x^2 \\ x+x^2 & 1+x \end{bmatrix} G_1(x) = T_3(x)G_1(x).$$

The encoding operation m(x)G3 (x) = m(x)T3 (x)G1 (x) = m′ (x)G1 (x), with m′ (x) = m(x)T3 (x), again results in the same code, since T3 (x) is invertible. The schematic for G3 (x) in Figure 12.8 would require more storage blocks than either G1 (x) or G2 (x): it is not as efficient in terms of hardware. ◽

These examples motivate the following definition:

Definition 12.9 Two transfer function matrices G(x) and G′(x) are said to be equivalent if they generate the same convolutional code. Two transfer function matrices G(x) and G′(x) are equivalent if G(x) = T(x)G′(x) for an invertible matrix T(x). ◽

These examples also motivate other considerations: For a given code, is there always a feedforward transfer matrix representation? Is there always a systematic representation? What is the "minimal" representation, requiring the least amount of memory? As the following section reveals, another question is whether the representation is catastrophic.

12.2.1 Catastrophic Encoders

Besides the hardware inefficiency, there is another fundamental problem with the encoder G3(x) of (12.6). Suppose that the input is
$$\mathbf{m}(x) = \begin{bmatrix} 0 & \dfrac{1}{1+x} \end{bmatrix},$$
where, expanding the formal series by long division,
$$\frac{1}{1+x} = 1 + x + x^2 + x^3 + \cdots.$$
The input sequence thus has infinite Hamming weight. The corresponding output sequence is
$$\mathbf{c}(x) = \mathbf{m}(x)G_3(x) = \begin{bmatrix} x & 1 & 0 \end{bmatrix},$$
a sequence with total Hamming weight 2. Suppose now that 𝐜(x) is passed through a channel and that two errors occur at precisely the locations of the nonzero code elements. Then the received sequence is exactly zero, which would decode (under any reasonable decoding scheme) to $\hat{\mathbf{m}}(x) = [\,0 \;\; 0\,]$. Thus, a finite number of errors in the channel result in an infinite number of decoder errors. Such an encoder is called a catastrophic encoder. It may be emphasized, however, that the problem is not with the code but the particular encoder, since G1(x), G2(x), and G3(x) all produce the same code, but G1(x) and G2(x) do not exhibit catastrophic behavior. Letting wt(c(x)) denote the weight of the sequence c(x), we have the following definition:

Definition 12.10 [[397], p. 97] An encoder G(x) for a convolutional code is catastrophic if there exists a message sequence 𝐦(x) such that wt(𝐦(x)) = ∞ and the weight of the coded sequence wt(𝐦(x)G(x)) < ∞. ◽

To understand more of the nature of catastrophic codes, we introduce the idea of a right inverse of a matrix.

Definition 12.11 Let k < n. A right inverse of a k × n matrix G is an n × k matrix G⁻¹ such that GG⁻¹ = I_{k,k}, the k × k identity matrix. (This is not the same as the inverse, which cannot exist when G is not square.) A right inverse of G can exist only if G is full rank. ◽

Example 12.12 For G1(x) of (12.2), a right inverse is
$$G_1(x)^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}.$$
For G2(x) of (12.5), a right inverse is
$$G_2(x)^{-1} = \begin{bmatrix} 1 & x \\ x & 1+x^2 \\ x^2 & 1+x+x^3 \end{bmatrix}.$$
For G3(x) of (12.6), a right inverse is
$$G_3(x)^{-1} = \frac{1}{1+x+x^3+x^4}\begin{bmatrix} 1+x & 1+x^2 \\ x+x^2 & 1+x \\ 0 & 0 \end{bmatrix}. \qquad ◽$$


Note that G1 (x)−1 and G2 (x)−1 are polynomial matrices — they have only polynomial entries — while the right inverse G3 (x)−1 has non-polynomial entries — some of its entries involve rational functions. Example 12.13 It ] should be observed that right inverses are not necessarily unique. For example, the matrix [ 1 + x2 , 1 + x + x2 has the right inverses [

1+x x

]T

and

[

x x+1

]T 1 .

Of these, one has all polynomial elements.



Definition 12.14 A transfer function matrix with only polynomial entries is said to be a polynomial encoder (i.e., it uses FIR filters). More briefly, such an encoder is said to be polynomial. A transfer function matrix with rational entries is said to be a rational encoder (i.e., it uses IIR filters), or simply rational. ◽

For an encoder G(x) with right inverse G(x)⁻¹, the message may be recovered (in a theoretical sense when there is no noise corrupting the code; this is not a decoding algorithm!) by
$$\mathbf{c}(x)G(x)^{-1} = \mathbf{m}(x)G(x)G(x)^{-1} = \mathbf{m}(x). \qquad (12.7)$$

Now suppose that 𝐜(x) has finite weight, but 𝐦(x) has infinite weight: from (12.7) this can only happen if one or more elements of the right inverse G(x)⁻¹ has an infinite number of coefficients, that is, they are rational functions. It turns out that this is a necessary and sufficient condition:

Theorem 12.15 A transfer function matrix G(x) is not catastrophic if and only if it has a right inverse G(x)⁻¹ having only polynomial entries.

From the right inverses in Example 12.12, we see that G1(x) and G2(x) have polynomial right inverses, while the right inverse of G3(x) has non-polynomial entries, indicating that G3(x) is a catastrophic generator.

Definition 12.16 A transfer function matrix G(x) is basic if it is polynomial and has a polynomial right inverse. ◽

G2(x) is an example of a basic transfer function matrix.

Example 12.17 Another example of a transfer function matrix for the code is
$$G_4(x) = \begin{bmatrix} 1+x+x^2+x^3 & 1+x & x \\ x & 1 & 0 \end{bmatrix}. \qquad (12.8)$$
The invariant factor decomposition (presented below) can be used to show that this is basic. However, for sufficiently small matrices finding a right inverse may be done by hand. We seek a polynomial matrix such that
$$\begin{bmatrix} 1+x+x^2+x^3 & 1+x & x \\ x & 1 & 0 \end{bmatrix}\begin{bmatrix} a & d \\ b & e \\ c & f \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$
Writing out the implied equations, we have
$$a(1+x+x^2+x^3) + b(1+x) + cx = 1, \qquad ax + b = 0,$$
$$d(1+x+x^2+x^3) + e(1+x) + fx = 0, \qquad dx + e = 1.$$

From the second, we obtain b = ax; substituting this into the first, we find a(1 + x³) + cx = 1. By setting c = x² and a = 1, we can solve this using polynomials. From the fourth equation, we obtain e = 1 + dx, so that from the third equation d(1 + x + x² + x³) + (1 + dx)(1 + x) + fx = 0, or d(1 + x³) + fx = 1 + x. This yields d = 1 and f = x² + 1. This gives a polynomial right inverse, so G4(x) is basic. Note that the encoder requires four memory elements in its implementation. ◽
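The hand computation above can be checked mechanically. The following sketch (written for illustration, not taken from the book's software) stores GF(2) polynomials as bit masks, with bit i holding the coefficient of x^i; it multiplies G4(x) by the right inverse just found and confirms that the product is the 2 × 2 identity.

#include <cstdio>
#include <cstdint>

// Multiply two GF(2) polynomials stored as bit masks (bit i = coefficient of x^i).
uint32_t pmul(uint32_t a, uint32_t b) {
    uint32_t p = 0;
    for (int i = 0; i < 8; i++)
        if ((b >> i) & 1) p ^= a << i;
    return p;
}

int main() {
    // G4(x) = [1+x+x^2+x^3  1+x  x ; x  1  0]   (Example 12.17)
    uint32_t G4[2][3]   = {{0xF, 0x3, 0x2}, {0x2, 0x1, 0x0}};
    // Right inverse found by hand: (a,b,c) = (1, x, x^2), (d,e,f) = (1, 1+x, 1+x^2)
    uint32_t Ginv[3][2] = {{0x1, 0x1}, {0x2, 0x3}, {0x4, 0x5}};

    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++) {
            uint32_t s = 0;
            for (int k = 0; k < 3; k++) s ^= pmul(G4[i][k], Ginv[k][j]);
            std::printf("entry (%d,%d) = 0x%X\n", i, j, s);  // expect 1 0 / 0 1
        }
    return 0;
}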

Two basic encoders G(x) and G′(x) are equivalent if and only if G(x) = T(x)G′(x), where
1. T(x) is not only invertible (as required by mere equivalence),
2. but also det(T(x)) = 1, so that when the right inverse is computed all the elements remain polynomial.

12.2.2 Polynomial and Rational Encoders

We show in this section that every rational encoder has an equivalent basic encoder. The implication is that it is sufficient to use only feedforward (polynomial) encoders to represent every code. There is, however, an important caveat: there may not be an equivalent basic (or even polynomial) systematic encoder. Thus, if a systematic coder is desired, it may be necessary to use a rational encoder. This is relevant because the very powerful behavior of turbo codes relies on good systematic convolutional codes.

Our results make use of the invariant factor decomposition of a matrix [[215], Section 3.7]. Let G(x) be a k × n polynomial matrix. Then² G(x) can be written as G(x) = A(x)Γ(x)B(x), where A(x) is a k × k polynomial matrix and B(x) is an n × n polynomial matrix and where det(A(x)) = 1, det(B(x)) = 1 (i.e., they are unimodular matrices); and Γ(x) is the k × n diagonal matrix
$$\Gamma(x) = \begin{bmatrix} \mathrm{diag}(\gamma_1(x), \gamma_2(x), \gamma_3(x), \ldots, \gamma_k(x)) & \mathbf{0}_{k,n-k} \end{bmatrix}.$$
The nonzero elements 𝛾i(x) of Γ(x) are polynomials and are called the invariant factors of G(x). (If any of the 𝛾i(x) are zero, they are included in the zero block, so k is the number of nonzero elements.) Furthermore, the invariant factors satisfy $\gamma_i(x) \mid \gamma_{i+1}(x)$. (Since we are expressing a theoretical result here, we won't pursue the algorithm for actually computing the invariant factor decomposition³; it is detailed in [215].)

2 The invariant factor decomposition has a technical requirement: The factorization in the ring must be unique, up to ordering and units. This technical requirement is met in our case, since the polynomials form a principal ideal domain, which implies unique factorization. See, e.g., [[144], chapter 32]. 3 The invariant factor decomposition can be thought of as a sort of singular value decomposition for modules.

Extending the invariant factor theorem to rational matrices, a rational matrix G(x) can be written as G(x) = A(x)Γ(x)B(x), where A(x) and B(x) are again polynomial unimodular matrices and Γ(x) is diagonal with rational entries $\gamma_i(x) = \alpha_i(x)/\beta_i(x)$, such that $\alpha_i(x) \mid \alpha_{i+1}(x)$ and $\beta_{i+1}(x) \mid \beta_i(x)$.

Let G(x) be a rational encoding matrix, with invariant factor decomposition G(x) = A(x)Γ(x)B(x). Let us decompose B(x) into the blocks
$$B(x) = \begin{bmatrix} G'(x) \\ B_2(x) \end{bmatrix},$$
where G′(x) is k × n. Then, since the last n − k columns of Γ(x) are zero, we can write
$$G(x) = A(x)\begin{bmatrix} \frac{\alpha_1(x)}{\beta_1(x)} & & & \\ & \frac{\alpha_2(x)}{\beta_2(x)} & & \\ & & \ddots & \\ & & & \frac{\alpha_k(x)}{\beta_k(x)} \end{bmatrix} G'(x) \triangleq A(x)\Gamma'(x)G'(x).$$
Since A(x) and Γ′(x) are nonsingular matrices, G(x) and G′(x) are equivalent encoders: they describe the same convolutional code. But G′(x) is polynomial (since B(x) is polynomial) and since B(x) is unimodular (and thus has a polynomial inverse) it follows that G′(x) has a polynomial right inverse. We have thus proved the following:

Theorem 12.18 Every rational encoder has an equivalent basic encoder.

The proof of the theorem is constructive: To obtain a basic encoding matrix from a rational transfer function G(x), compute the invariant factor decomposition G(x) = A(x)Γ(x)B(x) and take the first k rows of B(x).

12.2.3 Constraint Length and Minimal Encoders

Comparing the encoders for the code we have been examining, we have seen that the encoders for G1(x) or G2(x) use three memory elements, while the encoder G3(x) uses four memory elements. We investigate in this section aspects of the question of the smallest amount of memory that a code requires of its encoder. Let G(x) be a basic encoder (so that the elements of G(x) are polynomials). Let
$$\nu_i = \max_j \deg(g_{ij}(x))$$
denote the maximum degree of the polynomials in row i of G(x). This is the number of memory elements necessary to store the portion of a realization (circuit) of the encoder corresponding to input i. The number
$$\nu = \sum_{i=1}^{k} \nu_i \qquad (12.9)$$
represents the total amount of storage required for all inputs. This quantity is called the constraint length of the encoder.

Note: In other sources (e.g., [483]), the constraint length is defined as the maximum number of bits in a single output stream that can be affected by any input bit (for a polynomial encoder). This is taken as the highest degree of the encoder plus one: $\nu = 1 + \max_{i,j} \deg(g_{i,j}(x))$. The reader should be aware that different definitions are used. Ours suits the current purposes.
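Using the bit-mask representation of GF(2) polynomials from the earlier sketch, the row degrees and the constraint length (12.9) are easy to compute. The following fragment is a sketch with illustrative names (not from the book's program files); it evaluates 𝜈 for G2(x) and G3(x), giving 3 and 4 as stated above.

#include <algorithm>
#include <cstdio>
#include <cstdint>
#include <vector>

int pdeg(uint32_t p) {          // degree of a GF(2) polynomial bit mask (-1 for the zero polynomial)
    int d = -1;
    for (int i = 0; i < 32; i++) if ((p >> i) & 1) d = i;
    return d;
}

int constraint_length(const std::vector<std::vector<uint32_t>>& G) {
    int nu = 0;
    for (const auto& row : G) {
        int nui = 0;
        for (uint32_t g : row) nui = std::max(nui, pdeg(g));
        nu += nui;              // nu_i = max_j deg g_ij ;  nu = sum_i nu_i   (12.9)
    }
    return nu;
}

int main() {
    std::vector<std::vector<uint32_t>> G2 = {{0x1, 0x4, 0x2}, {0x2, 0x1, 0x0}}; // [1 x^2 x ; x 1 0]
    std::vector<std::vector<uint32_t>> G3 = {{0x3, 0x5, 0x2}, {0x6, 0x3, 0x0}}; // [1+x 1+x^2 x ; x+x^2 1+x 0]
    std::printf("nu(G2) = %d, nu(G3) = %d\n", constraint_length(G2), constraint_length(G3));
    return 0;
}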

We make the following definition:

Definition 12.19 A minimal basic encoder is a basic encoder that has the smallest constraint length among all equivalent basic encoders. ◽

Typically, we are interested in minimal encoders: they require the least amount of hardware to build and they have the fewest evident states. We now explore the question of when an encoder is minimal basic. The first theorem involves a particular decomposition of the encoder matrix. We demonstrate first with some examples. Let G(x) = G2(x) from (12.5). Write
$$G_2(x) = \begin{bmatrix} 1 & x^2 & x \\ x & 1 & 0 \end{bmatrix} = \begin{bmatrix} x^2 & 0 \\ 0 & x \end{bmatrix}\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} + \begin{bmatrix} 1 & 0 & x \\ 0 & 1 & 0 \end{bmatrix}. \qquad (12.10)$$
As another example, when G(x) = G4(x) from (12.8),
$$G_4(x) = \begin{bmatrix} 1+x+x^2+x^3 & 1+x & x \\ x & 1 & 0 \end{bmatrix} = \begin{bmatrix} x^3 & 0 \\ 0 & x \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix} + \begin{bmatrix} 1+x+x^2 & 1+x & x \\ 0 & 1 & 0 \end{bmatrix}. \qquad (12.11)$$
In general, given a basic encoder G(x), we write it as
$$G(x) = \begin{bmatrix} x^{\nu_1} & & & \\ & x^{\nu_2} & & \\ & & \ddots & \\ & & & x^{\nu_k} \end{bmatrix} G_h + \tilde{G}(x) = \Lambda(x)G_h + \tilde{G}(x), \qquad (12.12)$$
where Gh is a binary matrix with a 1 indicating the position where the highest degree term $x^{\nu_i}$ occurs in row i, and each row of G̃(x) contains all the terms of degree less than $\nu_i$. Using this notation, we have the following:

Theorem 12.20 [230] Let G(x) be a k × n basic encoding matrix with constraint length 𝜈. The following statements are equivalent:
(a) G(x) is a minimal basic encoding matrix.
(b) The maximum degree 𝜇 among the k × k subdeterminants of G(x) is equal to the overall constraint length 𝜈.
(c) Gh is full rank.
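Condition (c) can also be checked mechanically. The sketch below (again using the illustrative bit-mask representation, not the book's software) forms Gh for a polynomial encoder, placing a 1 wherever the maximal row degree 𝜈i is attained, and computes its rank over GF(2); for G2(x) it reports rank 2 (full rank) and for G4(x) rank 1 (not minimal).

#include <algorithm>
#include <cstdio>
#include <cstdint>
#include <vector>

int pdeg(uint32_t p) { int d = -1; for (int i = 0; i < 32; i++) if ((p >> i) & 1) d = i; return d; }

// Build Gh (one bit per column, packed into a mask per row) and return its GF(2) rank.
int rank_Gh(const std::vector<std::vector<uint32_t>>& G) {
    std::vector<uint32_t> Gh;
    for (const auto& row : G) {
        int nui = 0;
        for (uint32_t g : row) nui = std::max(nui, pdeg(g));
        uint32_t h = 0;
        for (int j = 0; j < (int)row.size(); j++)
            if (pdeg(row[j]) == nui) h |= 1u << j;
        Gh.push_back(h);
    }
    int rank = 0;                               // Gaussian elimination over GF(2)
    for (int bit = 0; bit < 32; bit++) {
        int pivot = -1;
        for (int r = rank; r < (int)Gh.size(); r++)
            if ((Gh[r] >> bit) & 1) { pivot = r; break; }
        if (pivot < 0) continue;
        std::swap(Gh[rank], Gh[pivot]);
        for (int r = 0; r < (int)Gh.size(); r++)
            if (r != rank && ((Gh[r] >> bit) & 1)) Gh[r] ^= Gh[rank];
        rank++;
    }
    return rank;
}

int main() {
    std::vector<std::vector<uint32_t>> G2 = {{0x1, 0x4, 0x2}, {0x2, 0x1, 0x0}};
    std::vector<std::vector<uint32_t>> G4 = {{0xF, 0x3, 0x2}, {0x2, 0x1, 0x0}};
    std::printf("rank(Gh) for G2: %d\n", rank_Gh(G2));   // 2: full rank, minimal basic
    std::printf("rank(Gh) for G4: %d\n", rank_Gh(G4));   // 1: rank deficient, not minimal
    return 0;
}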

To illustrate this theorem, consider the decomposition of G4(x) in (12.11). The 2 × 2 subdeterminants of G4(x) are obtained by taking the determinants of the 2 × 2 submatrices of G4(x), for example
$$\det\begin{bmatrix} 1+x+x^2+x^3 & 1+x \\ x & 1 \end{bmatrix} \qquad\text{and}\qquad \det\begin{bmatrix} 1+x & x \\ 1 & 0 \end{bmatrix},$$
the maximum degree of which is 3. Also, we note that Gh is not full rank. Hence, we conclude that G4(x) is not a minimal basic encoding matrix.

Proof To show the equivalence of (b) and (c): Observe that the degree of a subdeterminant of G(x) is determined by the k × k submatrices of Λ(x)Gh (which have the largest degree terms) and not by G̃(x). The degrees of the determinants of the k × k submatrices of G(x) are then determined by the subdeterminants of Λ(x) and the k × k submatrices of Gh. Since det Λ(x) ≠ 0, if Gh is full rank, then at least one of its k × k submatrices has nonzero determinant, so that at least one of the determinants of the k × k submatrices of Λ(x)Gh has degree 𝜇 equal to deg(det(Λ(x))) = 𝜈. On the other hand, if Gh is rank deficient, then none of the determinants of the submatrices of Λ(x)Gh can be equal to the determinant of Λ(x).

To show that (a) implies (b): Assume that G(x) is minimal basic. Suppose that rank(Gh) < k. Let the rows of G(x) be denoted by $\mathbf{g}_1, \mathbf{g}_2, \ldots, \mathbf{g}_k$, let the rows of Gh be denoted by $\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_k$, and let the rows of G̃(x) be denoted by $\tilde{\mathbf{g}}_1, \tilde{\mathbf{g}}_2, \ldots, \tilde{\mathbf{g}}_k$. Then the decomposition (12.12) is $\mathbf{g}_i = x^{\nu_i}\mathbf{h}_i + \tilde{\mathbf{g}}_i$. By the rank-deficiency there is a linear combination of rows of Gh such that $\mathbf{h}_{i_1} + \mathbf{h}_{i_2} + \cdots + \mathbf{h}_{i_d} = 0$ for some d ≤ k. Assume (without loss of generality) that the rows of G(x) are ordered such that $\nu_1 \ge \nu_2 \ge \cdots \ge \nu_k$. The ith row of Λ(x)Gh is $x^{\nu_i}\mathbf{h}_i$. Adding $x^{\nu_{i_1}}[\mathbf{h}_{i_2} + \mathbf{h}_{i_3} + \cdots + \mathbf{h}_{i_d}]$ to the $i_1$st row of Λ(x)Gh (which is $x^{\nu_{i_1}}\mathbf{h}_{i_1}$) reduces it to an all-zero row. Note that
$$x^{\nu_{i_1}}[\mathbf{h}_{i_2} + \mathbf{h}_{i_3} + \cdots + \mathbf{h}_{i_d}] = x^{\nu_{i_1}-\nu_{i_2}}x^{\nu_{i_2}}\mathbf{h}_{i_2} + x^{\nu_{i_1}-\nu_{i_3}}x^{\nu_{i_3}}\mathbf{h}_{i_3} + \cdots + x^{\nu_{i_1}-\nu_{i_d}}x^{\nu_{i_d}}\mathbf{h}_{i_d}.$$
Now consider computing G′(x) = T(x)G(x), where T(x) is the invertible matrix with an identity on the diagonal and additional entries $x^{\nu_{i_1}-\nu_{i_j}}$ in row $i_1$, columns $i_j$, j = 2, ..., d. This has the effect of adding $x^{\nu_{i_1}-\nu_{i_2}}\mathbf{g}_{i_2} + x^{\nu_{i_1}-\nu_{i_3}}\mathbf{g}_{i_3} + \cdots + x^{\nu_{i_1}-\nu_{i_d}}\mathbf{g}_{i_d}$ to the $i_1$st row of G(x), which reduces the highest degree of the $i_1$st row of G(x) (because the term $x^{\nu_{i_1}}\mathbf{h}_{i_1}$ is eliminated) but leaves other rows of G(x) unchanged. But G′(x) is an equivalent transfer function matrix. We thus obtain a basic encoding matrix G′(x) equivalent to G(x) with an overall constraint length less than that of G(x). This contradicts the assumption that G(x) is minimal basic, which implies that Gh must be full rank. From the equivalence of (b) and (c), 𝜇 = 𝜈.

Conversely, to show (b) implies (a): Let G′(x) be a basic encoding matrix equivalent to G(x). Then G′(x) = T(x)G(x), where T(x) is a k × k polynomial matrix with det T(x) = 1. The maximum degree 𝜇 among the k × k subdeterminants of G′(x) is equal to that of G(x) (since det T(x) = 1). Hence, 𝜇 is invariant over all equivalent basic encoding matrices. Since 𝜇 is less than or equal to the overall constraint length, if 𝜇 = 𝜈, it follows that G(x) is minimal basic. ◽

The proof is essentially constructive: given a non-minimal basic G(x), a minimal basic encoder can be constructed by finding rows of Gh such that
$$\mathbf{h}_{i_1} + \mathbf{h}_{i_2} + \mathbf{h}_{i_3} + \cdots + \mathbf{h}_{i_d} = 0,$$

where the indices are ordered such that $\nu_{i_d} \ge \nu_{i_j}$, $1 \le j < d$, then adding
$$x^{\nu_{i_d}-\nu_{i_1}}\mathbf{g}_{i_1} + \cdots + x^{\nu_{i_d}-\nu_{i_{d-1}}}\mathbf{g}_{i_{d-1}} \qquad (12.13)$$
to the $i_d$th row of G(x).

Example 12.21

Let G(x) = G4(x), as before. Then
$$\mathbf{h}_1 = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \qquad \mathbf{h}_2 = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix},$$
so that 𝐡1 + 𝐡2 = 0. We have i1 = 2 and i2 = 1. We thus add
$$x^{3-1}\mathbf{g}_2 = x^2\begin{bmatrix} x & 1 & 0 \end{bmatrix} = \begin{bmatrix} x^3 & x^2 & 0 \end{bmatrix}$$
to row 1 of G(x) to obtain the transfer function matrix
$$G_5(x) = \begin{bmatrix} 1+x+x^2 & 1+x+x^2 & x \\ x & 1 & 0 \end{bmatrix},$$
an equivalent minimal basic encoder. ◽

Comparing G5(x) with G2(x), we make the observation that minimal basic encoders are not unique. As implied by its name, the advantage of a basic minimal encoder is that it is "smallest" in some sense. It may be built in such a way that the number of memory elements in the device is the smallest possible and the number of states of the device is the smallest possible. There is another advantage to minimal encoders: it can be shown that a minimal encoder is not catastrophic.

12.2.4 Systematic Encoders

Given an encoder G(x), it may be turned into a systematic encoder by identifying a full-rank k × k submatrix T(x). Then form G′(x) = T(x)⁻¹G(x). Then G′(x) is of the form (perhaps after column permutations)
$$G'(x) = \begin{bmatrix} I_{k,k} & P_{k,n-k}(x) \end{bmatrix},$$
where $P_{k,n-k}(x)$ is a (generally) rational matrix. The outputs produced by $P_{k,n-k}$, that is, the non-systematic part of the generator, are frequently referred to as the parity bits, or check bits, of the coded sequence.

Example 12.22

[230] Suppose
$$G(x) = \begin{bmatrix} 1+x & x & 1 \\ x^2 & 1 & 1+x+x^2 \end{bmatrix}$$
and T(x) is taken as the first two columns:
$$T(x) = \begin{bmatrix} 1+x & x \\ x^2 & 1 \end{bmatrix} \qquad T^{-1}(x) = \frac{1}{1+x+x^3}\begin{bmatrix} 1 & x \\ x^2 & 1+x \end{bmatrix}.$$
Then
$$G'(x) = T^{-1}(x)G(x) = \begin{bmatrix} 1 & 0 & \dfrac{1+x+x^2+x^3}{1+x+x^3} \\[4pt] 0 & 1 & \dfrac{1+x^2+x^3}{1+x+x^3} \end{bmatrix}. \qquad ◽$$


Historically, polynomial encoders (i.e., those implemented using FIR filters) have been much more commonly used than systematic encoders (employing IIR filters). However, there are some advantages to using systematic codes. First, it can be shown that every systematic encoding matrix is minimal. Second, systematic codes cannot be catastrophic (since the data appear explicitly in the codeword). For a given constraint length, the set of systematic codes with polynomial transfer matrices has generally inferior distance properties compared with the set of systematic codes with rational transfer matrices. In fact, it has been observed [[471], p. 252] that for large constraint lengths 𝜈, the performance of a polynomial systematic code of constraint length K is approximately the same as that of a nonsystematic code of constraint length K(1 − k∕n). (See Table 12.4) For example, for a rate R = 1∕2 code, polynomial systematic codes have about the performance of nonsystematic codes of half the constraint length, while requiring exactly the same optimal decoder complexity. Because of these reasons, recent work in turbo codes has relied almost exclusively on systematic encoders.

12.3 Decoding Convolutional Codes

12.3.1 Introduction and Notation

Several algorithms have been developed for decoding convolutional codes. The one most commonly used is the Viterbi algorithm, which is a maximum likelihood sequence estimator (MLSE). A variation on the Viterbi algorithm, known as the soft-output Viterbi algorithm (SOVA), which provides not only decoded symbols but also an indication of the reliability of the decoded values, is presented in Section 12.11 in conjunction with turbo codes. Another decoding algorithm is the maximum a posteriori (MAP) decoder frequently referred to as the BCJR algorithm, which computes probabilities of decoded bits. The BCJR algorithm is somewhat more complex than the Viterbi algorithm, without significant performance gains compared to Viterbi decoders for convolutional codes. It is, however, ideally suited for decoding turbo codes, and so is also detailed in Chapter 14. It is also shown there that the BCJR and the Viterbi are fundamentally equivalent at a deeper level. Suboptimal decoding algorithms are also occasionally of interest, particularly when the constraint length is large. These provide most of the performance of the Viterbi algorithm, but typically have substantially lower computational complexity. In Section 12.8, the stack algorithm (also known as the ZJ algorithm), Fano’s algorithm, and the M-algorithm are presented as instances of suboptimal decoders. To set the stage for the decoding algorithm, we introduce some notation for the stages of processing. Consider the block diagram in Figure 12.9. The time index is denoted by t, which indexes the times at which states are distinguished in the state diagram; there are thus k bits input to the encoder and n bits output from the encoder at each time step t.



• The information (message) data may have a sequence appended to drive the state to 0 at the end of some block of input. At time t there are k input bits, denoted as $m_t^{(i)}$ or $x_t^{(i)}$, i = 1, 2, ..., k. The set of k input bits at time t is denoted as $\mathbf{m}_t = (m_t^{(1)}, m_t^{(2)}, \ldots, m_t^{(k)})$ and those with the (optional) appended sequence are 𝐱t. An input sequence consisting of L blocks is denoted as 𝐱: 𝐱 = {𝐱0, 𝐱1, ..., 𝐱L−1}.

• The corresponding coded output bits are denoted as $c_t^{(i)}$, i = 1, 2, ..., n, or collectively at time t as 𝐜t. The entire coded output sequence is 𝐜 = {𝐜0, 𝐜1, ..., 𝐜L−1}.

• The coded output sequence is mapped to a sequence of M symbols selected from a signal constellation with Q points in some signal space, with Q = 2^p. We must have $2^{nL}$ (the number of coded bits in the sequence) equal to $2^{pM}$, so that M = nL/p. For convenience in notation, we assume that p = 1 (e.g., BPSK modulation), so that M = nL. The mapped signals at time t are denoted as $a_t^{(i)}$, i = 1, 2, ..., n. The entire coded sequence is 𝐚 = {𝐚0, 𝐚1, ..., 𝐚L−1}.


Figure 12.9: Processing stages for a convolutional code: the message 𝐦 (optionally with a zero-state forcing sequence appended to give 𝐱) enters the convolutional encoder (R = k/n) producing 𝐜, which is mapped by the signal mapper (e.g., BPSK) to 𝐚, which passes through the channel (noise 𝐧) to give 𝐫.



• The symbols 𝐚t pass through a channel, resulting in a received symbol $r_t^{(i)}$, i = 1, 2, ..., n, or a block 𝐫t. We consider explicitly two channel models: an AWGN and a BSC. For the AWGN, we have
$$r_t^{(i)} = a_t^{(i)} + n_t^{(i)}, \qquad\text{where } n_t^{(i)} \sim \mathcal{N}(0, \sigma^2) \text{ and } \sigma^2 = \frac{N_0}{2}.$$
For the AWGN the received data are real- or complex-valued. For the BSC, the mapped signals are equal to the coded data, 𝐚t = 𝐜t. The received signal is
$$r_t^{(i)} = c_t^{(i)} \oplus n_t^{(i)}, \qquad\text{where } n_t^{(i)} \sim \mathcal{B}(p_c),$$
where ⊕ denotes addition modulo 2, $p_c$ is the channel crossover probability, and $\mathcal{B}(p_c)$ indicates a Bernoulli random variable. For both channels it is assumed that the $n_t^{(i)}$ are mutually independent for all i and t, resulting in a memoryless channel.

• We denote the likelihood function for these channels as $f(\mathbf{r}_t \mid \mathbf{a}_t)$. For the AWGN channel,
$$f(\mathbf{r}_t \mid \mathbf{a}_t) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2\sigma^2}\bigl(r_t^{(i)} - a_t^{(i)}\bigr)^2\right] = (\sqrt{2\pi}\,\sigma)^{-n}\exp\left[-\frac{1}{2\sigma^2}\|\mathbf{r}_t - \mathbf{a}_t\|^2\right],$$
where $\|\cdot\|^2$ denotes the usual squared Euclidean distance,
$$\|\mathbf{r}_t - \mathbf{a}_t\|^2 = \sum_{i=1}^{n}\bigl(r_t^{(i)} - a_t^{(i)}\bigr)^2.$$
For the BSC,
$$f(\mathbf{r}_t \mid \mathbf{a}_t) = \prod_{i=1}^{n} p_c^{[r_t^{(i)} \ne a_t^{(i)}]}(1-p_c)^{[r_t^{(i)} = a_t^{(i)}]} = p_c^{d_H(\mathbf{r}_t,\mathbf{a}_t)}(1-p_c)^{n-d_H(\mathbf{r}_t,\mathbf{a}_t)} = \left(\frac{p_c}{1-p_c}\right)^{d_H(\mathbf{r}_t,\mathbf{a}_t)}(1-p_c)^{n},$$
where $[r_t^{(i)} \ne a_t^{(i)}]$ returns 1 if the indicated condition is true and $d_H(\mathbf{r}_t, \mathbf{a}_t)$ is the Hamming distance. Since the sequence of inputs uniquely determines the sequence of outputs, mapped outputs, and states, the likelihood function can be equivalently expressed as f(𝐫|𝐜) or f(𝐫|𝐚) or f(𝐫|{Ψ0, Ψ1, ..., ΨL}).

Maximizing the likelihood is obviously equivalent to minimizing the negative log likelihood. We deal with negative log-likelihood functions and throw away terms and/or factors that do not depend upon the conditioning values. For the Gaussian channel, since √ 1 − log f (𝐫t |𝐚t ) = +n log 2𝜋𝜎 + 2 ||𝐫t − 𝐚t ||2 , 2𝜎

12.3 Decoding Convolutional Codes

473

we use ||𝐫t − 𝐚t ||2

(12.14)

as the “negative log-likelihood” function. For the BSC, since − log f (𝐫t |𝐚t ) = −dH (𝐫t , 𝐚t ) log

pc − n log(1 − pc ), 1 − pc

we use dH (𝐫t , 𝐚t )

(12.15)

as the “negative log-likelihood” (since log(pc ∕(1 − pc )) < 0). More generally, the affine transformation a[− log f (𝐫t |𝐚) − b]

(12.16)

provides a function equivalent for purposes of detection to the log- likelihood function for any a > 0 and any b. The parameters a and b can be chosen to simplify computations.



The state at time t in the trellis of the encoder is denoted as Ψt . States are represented with integer values in the range 0 ≤ Ψt < 2𝜈 , where 𝜈 is the constraint length for the encoder. (We use 2𝜈 since we are assuming binary encoders for convenience. For a q-ary code, the number of states is q𝜈 .) It is always assumed that the initial state is Ψ0 = 0.

• •

A sequence of symbols such as {𝐱0 , 𝐱1 , . . . , 𝐱l } is denoted as 𝐱0l .



A sequence of states (Ψ0 , Ψ1 , . . . , Ψt−1 ), representing a path through the trellis, is denoted as Ψt−1 0 . Such a sequence of states is assumed to always denote a valid path through the trellis for the encoder. As suggested by Figure 12.10, quantities associated with the transition from state p to state q are denoted with (p,q) . For example, the input which causes the transition from state Ψt = p to the state Ψt+1 = q is denoted as 𝐱(p,q) . (If the trellis had different structure at different times, one (p,q) might use the notation 𝐱t .) The code bits output sequence as a result of this state transition is (p,q) and the mapped symbols are 𝐚(p,q) . 𝐜 Time t

ψt = p

Time t + 1

x(p,q)/a(p,q) ψt+ 1 = q rt

Figure 12.10: Notation associated with a state transition.



Generalizing the idea of 𝐱(p,q) , a sequence of states Ψt0 corresponds to a sequence of inputs 𝐱0 , 𝐱1 , . . . , 𝐱t−1 , which takes the encoder through that sequence of states. This sequence of states is denoted as 𝐱(Ψt0 ).



Similarly, the sequence of coded outputs, and its corresponding sequence of modulated outputs produced along that sequence of states is denoted as 𝐜(Ψt0 ) and 𝐚(Ψt0 ), respectively.

474

Convolutional Codes 12.3.2

The Viterbi Algorithm

The Viterbi algorithm was originally proposed by Andrew Viterbi [467], but its optimality as a maximum likelihood sequence decoder was not originally appreciated. In [128], it was established that the Viterbi algorithm computes the maximum likelihood code sequence given the received data. The Viterbi algorithm is essentially a shortest path algorithm, roughly analogous to Dijkstra’s shortest path algorithm (see, e.g., [[400], p. 415]) for computing the shortest path through the trellis associated with the code. The Viterbi algorithm has been applied in a variety of other communications problems, including maximum likelihood sequence estimation in the presence of intersymbol interference [134] and optimal reception of spread-spectrum multiple access communication (see, e.g., [466]). It also appears in many other problems where a “state” can be defined, such as in hidden Markov modeling (see, e.g., [91]). See also [319] for a survey of applications. The decoder takes the input sequence 𝐫 = {𝐫0 , 𝐫1 , . . . } and determines an estimate of the transmitted data {𝐚0 , 𝐚1 , . . . } and from that an estimate of the sequence of input data {𝐱0 , 𝐱1 , . . . }. The basic idea behind the Viterbi algorithm is as follows. A coded sequence {𝐜0 , 𝐜1 , . . . }, or its signal-mapped equivalent {𝐚0 , 𝐚1 , . . . }, corresponds to a path through the encoder trellis through some sequence of states Ψ0 , Ψ1 , Ψ2 , . . . . Due to noise in the channel, the received sequence 𝐫 = (𝐫0 , 𝐫1 , 𝐫2 , . . . ) may not correspond exactly to a path through the trellis. The decoder finds a path through the trellis which is closest to the received sequence, where the measure of “closest” is determined by the likelihood function appropriate for the channel. In light of (12.14), for an AWGN channel the maximum likelihood path corresponds to the path through the trellis which is closest in Euclidean distance to 𝐫. In light of (12.15), for a BSC the maximum likelihood path corresponds to the path through the trellis which is closest in Hamming distance to 𝐫. Naively, one could find the maximum likelihood path by computing separately the path lengths of all of the possible paths through the trellis. This, however, is computationally intractable. The Viterbi algorithm organizes the computations in an efficient recursive form. For an input 𝐱t the output 𝐜t depends on the state of the encoder Ψt , which in turn depends upon previous inputs. This dependency among inputs means that optimal decisions cannot be made based upon a likelihood function for a single time f (𝐫t |𝐱t ). Instead, optimal decisions are based upon an entire received sequence of symbols. The likelihood function to be maximized is thus f (𝐫|𝐱), where f (𝐫|𝐱) = f (𝐫0L−1 |𝐱0L−1 ) = f (𝐫0 , 𝐫1 , . . . , 𝐫L−1 |𝐱0 , 𝐱1 , . . . , 𝐱L−1 ) =

L−1 ∏

f (𝐫t |𝐱t ).

t=0

The fact that the channel is assumed to be memoryless is used to obtain the last equality. It is convenient to deal with the log- likelihood function, log f (𝐫|𝐱) =

L−1 ∑

log f (𝐫t |𝐱t ).

t=0

Consider a putative estimated information sequence 𝐱̂ 0t−1 = (𝐱̂ 0 , 𝐱̂ 1 , . . . , 𝐱̂ t−1 ), with corresponding modulated sequence 𝐚̂ t−1 , resulting from the path (Ψ0 , Ψ1 , . . . , Ψt ) = Ψt0 which leaves the encoder in 0 state Ψt = p at time t. The log- likelihood function for this sequence is log f (𝐫0t−1 |𝐚̂ t−1 0 )=

t−1 ∑

log f (𝐫i |𝐚̂ i ).

i=0

Let

Mt−1 (p) = − log f (𝐫0t−1 |𝐚̂ t−1 0 )

(12.17)

denote the path metric for the path Ψt0 through the trellis defined by the information sequence 𝐱̂ 0t−1 = 𝐱(Ψt0 ) and terminating in state Ψt = p. The negative sign in this definition means that we seek to minimize the path metric (to maximize the likelihood).

12.3 Decoding Convolutional Codes

475

Other notations are also used for the path metric. The path metric could also be denoted M(𝐫0t−1 |Ψt0 ), where Ψt = p, which specifically indicates that the metric depends upon the path. Alternatively, the ̂ t0 ), denoting that the metric depends on the sequence of path metric could be denoted as M(𝐫0t−1 |𝐱(Ψ t inputs along the path Ψ0 . All of these represent the negative log likelihood (or simplifications discussed below). For the moment these latter notations are unduly heavy (although it will be used below), and the definition in (12.17) is used here. Now let the sequence 𝐱̂ 0t = (𝐱̂ 0 , 𝐱̂ 1 , . . . , 𝐱̂ t ) be obtained by appending the input 𝐱̂ t to 𝐱̂ 0t−1 and suppose the input 𝐱̂ t is such that the state at time t + 1 is Ψt+1 = q. The path metric for this longer sequence is Mt (q) = −

t ∑

log f (𝐫i |𝐚̂ i ) = −

i=0

t−1 ∑

log f (𝐫i |𝐚̂ i ) − log f (𝐫t |𝐚̂ t ) = Mt−1 (p) − log f (𝐫t |𝐚̂ t ).

i=0

Let 𝜇t (𝐫t , 𝐱̂ t ) = − log f (𝐫t |𝐚̂ t ) = − log f (𝐫t |𝐚(p,q) ) denote the negative log likelihood for this input, where 𝐚̂ t is the output resulting from the input 𝐱̂ t when the encoder is in state p at time t. As pointed out in (12.16), we could equivalently use 𝜇t (𝐫t , 𝐱̂ t ) = a[− log f (𝐫t |𝐚̂ (p,q) ) − b],

(12.18)

for any a > 0. The quantity 𝜇t (𝐫t , 𝐱̂ t ) is called the branch metric for the decoder. Since 𝐱̂ t moves the trellis from state p at time t to state q at time t + 1, we can write 𝜇t (𝐫t , 𝐱̂ t ) as 𝜇t (𝐫t , 𝐱̂ (p,q) ). Then Mt (q) =

t−1 ∑

𝜇i (𝐫i , 𝐱̂ i ) + 𝜇t (𝐫t , 𝐱̂ t ) = Mt−1 (p) + 𝜇t (𝐫t , 𝐱̂ (p,q) ).

i=0

That is, the path metric along a path to state q at time t is obtained by adding the path metric to the state p at time t − 1 to the branch metric for an input which moves the encoder from state p to state q. (If there is no such input, then the branch metric is ∞.) With this notation, we now come to the crux of the Viterbi algorithm: What do we do when paths merge? Suppose Mt−1 (p1 ) is the path metric of a path ending at state p1 at time t and Mt−1 (p2 ) is the path metric of a path ending at state p2 at time t. Suppose further that both of these states are connected to state q at time t + 1, as suggested in Figure 12.11. The resulting path metrics to state q are Mt−1 (p1 ) + 𝜇t (𝐫t , 𝐱̂ (p1 ,q) )

and

Mt−1 (p2 ) + 𝜇t (𝐫t , 𝐱̂ (p2 ,q) ).

Path through trellis leading to state Ψt = p1 Time t Mt–1 (p1)

Time t + 1

μt (rt ,xˆ (p1,q)) Mt (q) = Mt–1(p1) + μt (rt, xˆ (p1,q)) OR

Mt (q) = Mt–1(p2) + μt (rt, xˆ (p2,q)) Mt–1 (p2)

μt (rt ,xˆ (p2,q))

Path through trellis leading to state Ψt = p2

Figure 12.11: The Viterbi step: select the path with the best metric.

476

Convolutional Codes The governing principle of the Viterbi algorithm is this: To obtain the shortest path through the trellis, the path to state q must be the shortest possible. Otherwise, it would be possible to find a shorter path through the trellis starting from state q by finding a shorter path to state q. (This is Bellman’s principle of optimality; see, e.g., [28].) Thus, when the two or more paths merge, the path with the shortest path metric is retained and the other path is eliminated from further consideration. That is, Mt (q) = min{Mt−1 (p1 ) + 𝜇t (𝐫t , 𝐱̂ (p1 ,q) ), Mt−1 (p2 ) + 𝜇t (𝐫t , 𝐱̂ (p2 ,q) )} and the path with minimal length becomes the path to state q. This is called the survivor path. Since it is not known at time t < L which states the final path passes through, the paths to each state are found for each time. The Viterbi algorithm thus maintains the following data:

• •

A path metric to each state at time t. A path to each state at time t. (This may be efficiently represented by storing at each state and each time the state which is previous to it on the path leading to that state.)

Alternatively, the Viterbi algorithm could maintain the following data:

• •

A path metric to each state at time t. ̂ t0 ) along the path leading to that state. The sequence of inputs 𝐱(Ψ

The Viterbi algorithm is thus summarized as follows: 1. Initialize the path metrics M−1 (p). (For states p which can be valid starting states set M−1 (p) = 0. For states p which are invalid starting states set M−1 (p) = ∞; path extensions from those states will never be accepted by the Viterbi algorithm.) 2. For t = 0, 1, . . . , L − 1: For each state q at time t + 1, find the path metric for each path to state q by adding the path metric Mt−1 (p) of each survivor path to state p at time t to the branch metric 𝜇t (𝐫t , 𝐱̂ (p,q) ). 3. The survivor path to q is selected as that path to state q which has the smallest path metric. 4. Store the path and path metric to each state q. 5. Increment t and repeat until complete. In the event that the path metrics of merging paths are equal, a random choice can be made with no negative impact on the likelihood. More formally, there is the description in Algorithm 12.1. In this description, the path to state q is specified by listing for each state its predecessor in the graph. Other descriptions of the path are also possible. The algorithm is initialized reflecting the assumption that the initial state is 0 by setting the path metric at Ψ0 = 0 to 0 and all other path metrics to ∞ (i.e., some large number), representing the assumption that the decoder starts in state 0. In this algorithm, Πp denotes the path to state p. The operations of extending and pruning that constitute the heart of the Viterbi algorithm are summarized as: (12.19) Mt (q) = min[ Mt−1 (p) + 𝜇t (𝐫t , 𝐱̂ (p,q) ) ]. p ⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟ Extend all paths at time t to state q . . .

⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟ Then choose smallest cost

12.3 Decoding Convolutional Codes

477

Algorithm 12.1 The Viterbi Algorithm

1 2 3 4 5 6 7 8 9

10 11 12 13 14 15 16

17

18

Input: A sequence 𝐫0 , 𝐫1 , … , 𝐫L−1 Output: The sequence 𝐱̂ 0 , 𝐱̂ 1 , … , 𝐱̂ L−1 which maximizes the likelihood f (𝐫0L−1 |𝐱̂ 0L−1 ). Initialize: Set M(0) = 0 and M(p) = ∞ for p = 1, 2, … , 2𝜈 − 1 (initial path costs) Set Πp = ∅ for p = 0, 1, … , 2𝜈 − 1 (initial paths) Set t = 0 Begin For each state q at time t + 1 Find the path metric for each path to state q: For each pi connected to state q corresponding to input 𝐱̂ (pi ,q) , compute mi = M(pi ) + 𝜇t (𝐫t , 𝐱̂ (pi ,q) )). Select the smallest metric M(q) = mini mi and the corresponding predecessor state p. Extend the path to state q: Πq = [Πp q] end (for) t =t+1 if t < L − 1, goto line 6. Termination: If terminating in a known state (e.g. 0) Return the sequences of inputs along the path to that known state If terminating in any state Find final state with minimal metric; Return the sequence of inputs along that path to that state. End
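The following compact program is a sketch of the ideas in Algorithm 12.1 for the 4-state encoder G(x) = [1 + x², 1 + x + x²] over the BSC, using the Hamming branch metric and traceback from the best final state. It is written for this discussion and is not one of the book's program files; run on the received sequence used in Example 12.23 below, it should recover the transmitted data bits.

#include <cstdio>
#include <limits>
#include <vector>

int main() {
    // Encoder G(x) = [1+x^2, 1+x+x^2]; state = (u_{k-1}, u_{k-2}), lsb = u_{k-1}.
    auto step = [](int state, int u, int out[2]) {
        int u1 = state & 1, u2 = (state >> 1) & 1;
        out[0] = u ^ u2;            // c1 = u + u_{k-2}
        out[1] = u ^ u1 ^ u2;       // c2 = u + u_{k-1} + u_{k-2}
        return u + 2 * u1;          // next state
    };

    // Received hard-decision pairs (contains two channel errors)
    std::vector<std::vector<int>> r = {{1,1},{1,0},{0,0},{1,0},{1,1},{0,1},{0,0},{0,1}};
    int L = (int)r.size();
    const int NS = 4, INF = std::numeric_limits<int>::max() / 2;

    std::vector<int> M(NS, INF); M[0] = 0;                     // path metrics; start in state 0
    std::vector<std::vector<int>> prev(L, std::vector<int>(NS, -1));
    std::vector<std::vector<int>> inbit(L, std::vector<int>(NS, 0));

    for (int t = 0; t < L; t++) {
        std::vector<int> Mnext(NS, INF);
        for (int p = 0; p < NS; p++) {
            if (M[p] >= INF) continue;
            for (int u = 0; u < 2; u++) {
                int out[2];
                int q = step(p, u, out);
                int mu = (out[0] != r[t][0]) + (out[1] != r[t][1]);  // Hamming branch metric
                if (M[p] + mu < Mnext[q]) {                          // extend and prune, cf. (12.19)
                    Mnext[q] = M[p] + mu;
                    prev[t][q] = p;
                    inbit[t][q] = u;
                }
            }
        }
        M = Mnext;
    }

    // Traceback from the state with the smallest final metric.
    int q = 0;
    for (int s = 1; s < NS; s++) if (M[s] < M[q]) q = s;
    std::vector<int> xhat(L);
    for (int t = L - 1; t >= 0; t--) { xhat[t] = inbit[t][q]; q = prev[t][q]; }

    for (int t = 0; t < L; t++) std::printf("%d", xhat[t]);
    std::printf("\n");
    return 0;
}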

Example 12.23

Consider the encoder ] [ G(x) = x2 + 1 x2 + x + 1

of Example 12.1, whose realization and trellis diagram are shown in Figure 12.5, passing the data through a BSC. When the data sequence [ ] 𝐱 = 1, 1, 0, 0, 1, 0, 1, 0, . . . [ ] = x0 , x1 , x2 , x3 , x4 , x5 , x6 , x7 , . . . is applied to the encoder, the coded output bit sequence is [ ] 𝐜 = 11, 10, 10, 11, 11, 01, 00, 01, . . . [ ] = 𝐜0 , 𝐜1 , 𝐜2 , 𝐜3 , 𝐜4 , 𝐜5 , 𝐜6 , 𝐜7 , . . . . For the BSC, we take the mapped data the same as the encoder output 𝐚t = 𝐜t . The output sequence and corresponding states of the encoder are shown here, where Ψ0 = 0 is the initial state. t

Input xk

Output 𝐜t

State Ψt+1

0 1 2 3 4 5 6 7

1 1 0 0 1 0 1 0

11 10 10 11 11 01 00 01

1 3 2 0 1 2 1 2

478

Convolutional Codes 0 1 2 3 t=0

t=1

t=2

t=3

t=4

t=5

t=6

t=7

Figure 12.12: Path through trellis corresponding to true sequence.

The sequence of states through the trellis for this encoder is shown in Figure 12.12; the solid line shows the state sequence for this sequence of outputs. The path is Ψ70 = (0, 1, 3, 2, 0, 1, 2, 1, 2) and the sequence of inputs is 𝐱(Ψ70 ) = (1, 1, 0, 0, 1, 0, 1, 0). The coded output sequence passes through a channel, producing the received sequence ] [ ] [ 𝐫 = 11 10 00 10 11 01 00 01 . . . = 𝐫0 , 𝐫1 , 𝐫2 , 𝐫3 , 𝐫4 , 𝐫5 , 𝐫6 , 𝐫7 , . . . . The two underlined bits are flipped by noise in the channel. The algorithm proceeds as follows: t = 0: The received sequence is 𝐫0 = 11. We compute the metric to each state at time t = 1 by finding the (Hamming) distance between 𝐫0 and the possible transmitted sequence 𝐜0 along the branches of the first stage of the trellis. Since state 0 was known to be the initial state, we end up with only two paths, with path metrics 2 and 0, as shown here:

[Trellis stage for 𝐫_0 = 11: survivor metric 2 at state 0 (input sequence [0]) and metric 0 at state 1 (input sequence [1]); states 2 and 3 are not yet reached.]

At each state (with a path to it), these figures show the sequence of inputs leading to that state in brackets.


t = 1: The received sequence is 𝐫1 = 10. Again, each path at time t = 1 is simply extended, adding the path metric to each branch metric:

[Trellis stage for 𝐫_1 = 10: survivor metrics 3, 3, 2, 0 at states 0, 1, 2, 3, with input sequences [00], [01], [10], [11], respectively.]

The memory of this encoder is m = 2. By t = m = 2, the set of paths has reached all states in the trellis. t = 2: The received sequence is 𝐫2 = 00. Each path at time t = 2 is extended, adding the path metric to each branch metric of each path.

[Trellis stage for 𝐫_2 = 00: every survivor at time t = 2 is extended along both of its branches, so each state at time t = 3 is now reached by two candidate paths.]

There are now multiple paths to each node at time t + 1 = 3. We select the path to each node with the best metric and eliminate the other paths. This gives the diagram as follows:

[Survivors at t = 3 after pruning: metrics 3, 2, 1, 1 at states 0, 1, 2, 3, with input sequences [000], [101], [110], [111], respectively.]

In general, for a convolutional code with memory m, the paths at time t + 1 = m + 1 begin to merge on the states, and the Viterbi algorithm begins to make its decisions about paths. Another observation that can be made is that for a state p, the last m = 2 input bits along paths leading to the state are the binary representation of p. For example, for state p = 1 = (01)2 , all input bit sequences on paths leading to this state have their last 2 bits as 01. For the state p = 3 = (11)2 , all input bit sequences on paths leading to this state have their last 2 bits as 11. This fact is true for all times in this example. It is true in general for feedforward (FIR) encoders, and may not be true for recursive (IIR) encoders.

t = 3: The received sequence is 𝐫_3 = 10. Each path at time t = 3 is extended, adding the path metric to each branch metric of each path.

[Trellis stage for 𝐫_3 = 10: both extensions of every survivor are formed, giving two candidate paths into each state at t = 4.]

Again, the best path to each state is selected. We note that in selecting the best paths, some of the paths to some states at earlier times have no successors; these orphan paths are deleted now in our portrayal:

[Survivors at t = 4 after pruning: metrics 2, 2, 1, 2 at states 0, 1, 2, 3, with input sequences [1100], [1101], [1110], [1011], respectively.]

t = 4: The received sequence is 𝐫4 = 11. Each path at time t = 4 is extended, adding the path metric to each branch metric of each path.

[Trellis stage for 𝐫_4 = 11: all survivors are extended; at states 2 and 3 the two candidate paths have equal metrics.]

In this case, we note that there are multiple paths into state 3 which both have the same path metric; also there are multiple paths into state 2 with the same path metric. Since one of the paths must be selected, the choice can be made arbitrarily (e.g., at random). After selecting and pruning of orphan paths, we obtain:

[Survivors at t = 5 after pruning: metrics 1, 2, 3, 3 at states 0, 1, 2, 3, with input sequences [11100], [11001], [11010], [11011], respectively.]

t = 5: The received sequence is 𝐫5 = 01.

[Trellis stage for 𝐫_5 = 01: all survivors are extended into the states at t = 6.]

After selecting and pruning:

[Survivors at t = 6 after pruning: metrics 2, 2, 2, 3 at states 0, 1, 2, 3, with input sequences [111000], [111001], [110010], [110111], respectively.]

t = 6: The received sequence is 𝐫6 = 00.

[Trellis stage for 𝐫_6 = 00: all survivors are extended into the states at t = 7.]

After selecting and pruning:

[Survivors at t = 7 after pruning: metrics 2, 2, 3, 3 at states 0, 1, 2, 3, with input sequences [1110000], [1100101], [1110010], [1110011], respectively.]

t = 7: The received sequence is 𝐫7 = 01.

[Trellis stage for 𝐫_7 = 01: all survivors are extended into the states at t = 8.]

After selecting and pruning:

[Survivors at t = 8 after pruning: metrics 3, 3, 2, 3 at states 0, 1, 2, 3. The best survivor, with metric 2 at state 2, carries the input sequence [11001010].]

The decoding is finalized at the end of the transmission (the 16 received data bits) by selecting the state at the last stage having the lowest cost, traversing backward along the path so indicated to the beginning of the trellis, then traversing forward again along the best path, reading the input bits and decoded output bits along the path. This is shown with the solid line below; input/output pairs are indicated on each branch.

[Final traceback: the winning path passes through the states 0, 1, 3, 2, 0, 1, 2, 1, 2, with branch labels (input/output) 1/11, 1/10, 0/10, 0/11, 1/11, 0/01, 1/00, 0/01.]

Note that the path through the trellis is the same as in Figure 12.12 and that the recovered input bit sequence is the same as the original bit sequence. Thus, out of this sequence of 16 bits, 2 bit errors have been corrected. ◽
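Using the illustrative decoder sketched after Algorithm 12.1 (the name vitdemo and its data layout are assumptions of that sketch, not the book's own program), the decoding of this example can be reproduced as follows; the expected result is the original input sequence, as stated above.

r = [1 1; 1 0; 0 0; 1 0; 1 1; 0 1; 0 0; 0 1];   % received pairs of Example 12.23
xhat = vitdemo(r)                                % expected: 1 1 0 0 1 0 1 0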

12.3.3 Some Implementation Issues

12.3.3.1 The Basic Operation: Add-Compare-Select

The basic operation of the Viterbi algorithm is Add-Compare-Select (ACS): Add the branch metric to the path metric for each path leading to a state; compare the resulting path metrics at that state; and select the better metric. A schematic of this idea appears in Figure 12.13. High-speed operation can be obtained in hardware by using a bank of 2^ν such ACS units in parallel. A variation on this theme, the compare-select-add (CSA) operation, is capable of somewhat improving the speed for some encoders. The algorithm employing CSA is called the differential Viterbi algorithm; see [143] for a description.

Figure 12.13: Add-compare-select operation.
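The ACS step for a single state can be written in a few lines. The following MATLAB fragment is an illustrative sketch only (the function name and argument layout are assumptions): pathmetric holds the survivor metrics at time t, branchmetric holds the branch metric from each predecessor into the state of interest, and preds lists the indices of the predecessor states.

function [Mq, psel] = acs(pathmetric, branchmetric, preds)
% One add-compare-select step for a single state q.
cand    = pathmetric(preds) + branchmetric(preds);  % ADD
[Mq, i] = min(cand);                                % COMPARE / SELECT
psel    = preds(i);                                 % surviving predecessor state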

12.3.3.2 Decoding Streams of Data: Windows on the Trellis

In convolutional codes, data are typically encoded in a stream. Once the encoding starts, it may continue indefinitely, for example, until the end of a file or until the end of a data transmission session. If such a data stream is decoded using the Viterbi algorithm as described above, the paths through the trellis would have to have as many stages as the code is long. For a long data stream, this could amount to an extraordinary amount of data to be stored, since the decoder would have to store 2𝜈 paths whose lengths grow longer with each stage. Furthermore, this would result in a large decoding latency: strictly speaking, it would not be possible to output any decoded values until the maximum likelihood path is selected at the end of the file. Fortunately, it is not necessary to wait until the end of transmission. Consider the paths in Example 12.23. In this example, by t = 4, there is a single surviving path in the first two stages of the trellis. Regardless of how the Viterbi algorithm operates on the paths as it continues through the trellis, those first two stages could be unambiguously decoded. In general, with very high probability there is a single surviving path some number of stages back from the “current” stage of the trellis. The initial stages of the survivor paths tend to merge if a sufficient decoding delay is allowed. Thus, it is only necessary to keep a “window” on the trellis consisting of the current stage and some number of previous stages. The number of stages back that the decoding looks to make its decision is called the decoding depth, denoted by Γ. At time t the decoder outputs a decision on the code bits 𝐜t−Γ . While it is possible to make an incorrect decoding decision on a finite decoding depth, this error, called the truncation error, is typically very small if the decoding depth is sufficiently large. It has been found (see, e.g., [129,197]) that if a decoding depth of about five to ten constraint lengths is employed, then there is very little loss of performance compared to using the full length due to truncation error.

It is effective to implement the decoding window using a circular queue of length Γ to hold the current window. As the window is “shifted,” it is only necessary to adjust the pointers to the beginning and end of the window. As the algorithm proceeds through the stream of data, the path metrics continue to accumulate. Overflow is easily avoided by periodically subtracting from all path metrics an equal amount (for example, the smallest path metric). The path metrics then show the differential qualities of each path rather than the absolute metrics (or their approximations), but this is sufficient for decoding purposes.
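A minimal sketch of this renormalization, assuming the path metrics are held in a vector M as in the earlier decoder sketch and that METRIC_LIMIT is an assumed threshold chosen for the implementation:

METRIC_LIMIT = 1e6;             % assumed threshold at which renormalization is applied
if min(M) > METRIC_LIMIT
  M = M - min(M);               % subtract the smallest path metric from all survivors
end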

12.3.3.3 Output Decisions

When a decision about the output 𝐜_{t−Γ} is to be made at time t, there are a few ways that this can be accomplished [483]:

•  Output 𝐜_{t−Γ} on a randomly selected survivor path;
•  Output 𝐜_{t−Γ} on the survivor path with the best metric;
•  Output the 𝐜_{t−Γ} that occurs most often among all the survivor paths;
•  Output 𝐜_{t−Γ} on any path.

In reality, if Γ is sufficiently large that all the survivor paths have merged Γ decoding stages back, then the performance difference among these alternatives is very small. When the decision is to be output, a survivor path is selected (using one of the methods just mentioned). Then it is necessary to determine the 𝐜_{t−Γ}. There are a couple of ways of accomplishing this: the register exchange and the traceback.

In the register exchange implementation, an input register at each state contains the sequence of input bits associated with the surviving path that terminates at that state. A register capable of storing the kΓ bits is necessary for each state. As the decoding algorithm proceeds, for the path selected from state p to state q, the input register at state q is obtained by copying the input register for state p and appending the k input bits resulting in that state transition. (Double buffering of the data may be necessary, so that the input registers are not lost in copying the information over.) When an output is necessary, the first k bits of the register for the terminating state of the selected path can be read immediately from its input register.

Example 12.24  For the decoding sequence of Example 12.23, the registers for the register exchange algorithm are shown here for the first five steps of the algorithm.

Register contents (the input sequence stored at each state) after each of the first five steps:

State    t = 1    t = 2    t = 3    t = 4    t = 5
  0      0        00       000      1100     11100
  1      1        01       101      1101     11001
  2      —        10       110      1110     11010
  3      —        11       111      1011     11011



The number of initial bits all the registers have in common is the number of branches of the path that are shared by all paths. (By t = 5, all the registers begin with 11, the merged portion of the survivors.) In some implementations, it may be of interest to output only those bits corresponding to path branches which have merged.

In the traceback method, the path is represented by storing the predecessor to each state. This sequence of predecessors is traced back Γ stages. The state transition Ψ_{t−Γ} to Ψ_{t−Γ+1} determines the output 𝐜_{t−Γ} and its corresponding input bits 𝐱_{t−Γ}.

Example 12.25  The table below shows the previous-state/input traceback table which would be built up by the decoding of Example 12.23; each entry gives the predecessor state and the input bit of the surviving branch into that state at that time.

                                t
State    1     2     3     4     5     6     7     8
  0     0/0   0/0   0/0   2/0   2/0   0/0   0/0   2/0
  1     0/1   0/1   2/1   2/1   0/1   0/1   2/1   0/1
  2      —    1/0   3/0   3/0   1/0   1/0   1/0   1/0
  3      —    1/1   3/1   1/1   1/1   3/1   1/1   3/1

For example, at time t = 8, the predecessor of state 0 is 2, the predecessor of state 1 is 0, and so forth. Starting from state 2 (having the lowest path cost), the sequence of states can be read off in reverse order from this table: 2 → 1 → 2 → 1 → 0 → 2 → 3 → 1 → 0. Thus, the first state transition is from state 0 to state 1 and the input at that time is a 1. The inputs for the entire sequence can also be read off, starting at the right, 11001010.



In the traceback method, it is necessary to trace backward through the trellis once for each output. (As a variation on the theme, the predecessor to a state could be represented by the input bits that lead to that state.) In comparing the requirements for these two methods, we note that the register exchange method requires shuffling registers among all the states at each time. In contrast, the traceback method requires no such shuffling, but it does require working back through the trellis to obtain a decision. Which

is more efficient depends on the particular hardware available to perform the tasks. Typically, the traceback is regarded as faster but more complicated.
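As a small illustrative sketch of the traceback method (the array names prev and inbit follow the assumed layout of the earlier decoder sketch and are not from the text): prev(q+1, t) holds the predecessor of state q at time t, inbit(q+1, t) holds the input bit on the surviving branch, and qfinal is the (0-based) state from which the traceback starts.

function x = traceback(prev, inbit, qfinal, L)
% Trace back L stages from state qfinal, reading off the decoded input bits.
x = zeros(1, L);
q = qfinal;
for t = L:-1:1
  x(t) = inbit(q+1, t);   % input bit on the surviving branch into state q
  q    = prev(q+1, t);    % step back to the predecessor state
end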

12.3.3.4 Hard and Soft Decoding; Quantization

Example 12.23 presents an instance of hard-decision decoding. If the outputs of a Gaussian channel had been used with the Euclidean distance as the branch metric, then soft-decision decoding could have been obtained. Comparing soft-decision decoding using BPSK over an AWGN channel with hard-decision decoding over a BSC, in which received values are converted to binary values with a probability of error of p_c = Q(\sqrt{2E_b/N_0}) (see Section 1.5.6), it has been determined that soft-decision decoding provides 2 to 3 dB of gain over hard-decision decoding.

For a hard-decision metric, 𝜇_t(𝐫_t, 𝐱^{(p,q)}) can be computed and stored in advance. For example, for an n = 2 binary code, there are four possible received values, 00, 01, 10, 11, and four possible transmitted values. The metric could be stored in a 4 × 4 array, such as the following:

𝜇_t(𝐫_t, 𝐱̂^{(p,q)})        𝐫_t = 00    01    10    11
𝐱^{(p,q)} = 00                   0     1     1     2
𝐱^{(p,q)} = 01                   1     0     2     1
𝐱^{(p,q)} = 10                   1     2     0     1
𝐱^{(p,q)} = 11                   2     1     1     0

Soft-decision decoding typically requires more expensive computation than hard-decision decoding. Furthermore, soft-decision decoding cannot exactly precompute these values to reduce the ongoing decoding complexity, since 𝐫_t takes on a continuum of values. Despite these disadvantages, it is frequently desirable to use soft-decision decoding because of its superior performance. A computational compromise is to quantize the received value to a reasonably small set of values, then precompute the metrics for each of these values. By converting these metrics to small integer quantities, it is possible to efficiently accumulate the metrics. It has been found [196] that quantizing each r_t^{(i)} into 3 bits (eight quantization levels) results in a loss in coding gain of only around 0.25 dB. It is possible to trade metric computation complexity for performance, using more bits of quantization to reduce the loss.

As noted above, if a branch metric 𝜇 is modified by 𝜇̃ = a𝜇 + b for any a > 0 and any real b, an equivalent decoding algorithm is obtained; this simply scales and shifts the resulting path metrics. In quantizing, it may be convenient to find scale factors which make the arithmetic easier. A widely used quantizer is presented in Section 12.4.

Example 12.26  A 2-bit quantizer. In a BPSK-modulated system, the transmitted signal amplitudes are a = 1 or a = −1. The received signal r_t is quantized by a quantization function Q[·] to obtain quantized values q_t = Q[r_t] using quantization thresholds at ±1 and 0, as shown in Figure 12.14, where the quantized values are denoted as 00, 01, 10, and 11. These thresholds determine the regions R_q. That is,

q_t = Q[r_t] = 00  if r_t ∈ R_00 = (−∞, −1]
               01  if r_t ∈ R_01 = (−1, 0]
               10  if r_t ∈ R_10 = (0, 1]
               11  if r_t ∈ R_11 = (1, ∞).

(This is not an optimal quantizer, merely convenient.) For each quantization bin, we can compute the likelihood that r_t falls in that region, given a particular input, as

P(q_t | a) = \int_{R_{q_t}} f_R(r | a) \, dr.


Figure 12.14: A 2-bit quantization of the soft-decision metric.

For example,

P(00 | a = −1) = \int_{-\infty}^{-1} f_R(r | -1) \, dr = \frac{1}{2}.

Suppose the likelihoods for all quantized points are computed as follows.

P(q_t | a)     q_t = 00     01     10     11
a = 1               0.02   0.14   0.34   0.5
a = −1              0.5    0.34   0.14   0.02

The −log probabilities are

−log P(q_t | a)    q_t = 00       01       10       11
a = 1                   3.9120   1.9661   1.0788   0.6931
a = −1                  0.6931   1.0788   1.9661   3.9120

The logarithms of these probabilities can be approximated as integer values by computing a transformation a(log P(q_t | a) − b). We use −1.619(log P(q_t | a) − log P(00 | a = −1)), where the magnitude 1.619 was found by a simple computer search and the shift b = log P(00 | a = −1) = log(1/2) was chosen to make the smallest value 0, resulting in the following metrics:

concodequant.m

a(log P(q_t | a) − b)    q_t = 00      01      10      11
a = 1                         5.211   2.061   0.624   0
a = −1                        0       0.624   2.061   5.211

which can be rounded to

round a(log P(q_t | a) − b)    q_t = 00   01   10   11
a = 1                               5    2    1    0
a = −1                              0    1    2    5

Although the signal is quantized into 2 bits, the metric requires 3 bits to represent it. With additional loss of coding gain, this could be reduced to 2 bits of metric (reducing the hardware required to accumulate the path metrics). For example, the first row of the metric table could be approximated as 3, 2, 1, 0. Note that, by the symmetry of the pdf and constellation, both rows of the table have the same values, so that in reality only a single row would need to be saved in an efficient hardware implementation. ◽
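The arithmetic of this example can be reproduced in a few lines of MATLAB. This is an illustrative sketch only (it is not the listing of concodequant.m); the bin likelihoods are the values assumed in the example above.

P = [0.02 0.14 0.34 0.50;        % P(q_t | a = +1) for bins 00, 01, 10, 11
     0.50 0.34 0.14 0.02];       % P(q_t | a = -1)
a  = -1.619;                     % scale factor found by search (as in the text)
b  = log(0.5);                   % shift chosen so the smallest metric is 0
mu = a*(log(P) - b)              % approx. [5.211 2.061 0.624 0; 0 0.624 2.061 5.211]
round(mu)                        % integer metrics [5 2 1 0; 0 1 2 5]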

12.3.3.5 Synchronization Issues

The decoder must be synchronized with the stream of incoming data. If the decoder does not know which of the n symbols in a block initiates a branch of the trellis, then the data will be decoded with a very large number of errors. Fortunately, the decoding algorithm can detect this. If the data are correctly aligned with the trellis transitions, then with high probability, one (or possibly two) of the path metrics are significantly smaller than the other path metrics within a few stages of decoding. If this does not occur, the data can be shifted relative to the decoder and the decoding re-initialized. With at most n tries, the decoder can obtain symbol synchronization. Many carrier tracking devices employed in communication systems experience a phase ambiguity. For a BPSK system, it is common that the phase is determined only up to ±𝜋, resulting in a sign change. For QPSK or QAM systems, the phase is often known only up to a multiple of 𝜋∕2. The decoding algorithm can possibly help determine the absolute phase. For example, in a BPSK system if the all ones sequence is not a codeword, then for a given code sequence 𝐜, 𝟏 + 𝐜 cannot be a codeword. In this case, if decoding seems to indicate that no path is being decoded correctly (i.e., no path seems to emerge as a strong candidate compared to the other paths), then the receiver can complement all of its zeros and ones and decode again. If this decodes correctly, the receiver knows that it has the phase off by 𝜋. For a QPSK system, four different phase shifts could be examined to see when correct decoding behavior emerges.

12.4 Some Performance Results

Bit error rate characterization of convolutional codes is frequently accomplished by simulation and approximation. In this section, we present performance as a function of quantization, constraint length, window length, and codeword size. These results generally follow [196], but have been recomputed here.

Quantization of the metric was discussed in the previous section. In the results here, a simpler quantized metric is used. Assume that BPSK modulation is employed and that the transmitted signal amplitude a is normalized to ±1. The received signal r is quantized to m bits, resulting in M = 2^m different quantization levels, using uniformly spaced quantization thresholds. The distance between quantization thresholds is Δ. Figure 12.15 shows four-level quantization with Δ = 1 and eight-level quantization using Δ = 0.5 and Δ = 1/3. Rather than computing the log likelihood of the probability of falling in a decision region, in this approach the quantized value q itself is used as the branch metric if the signal amplitude 1 is sent, or the complement M − q − 1 is used if −1 is sent. The resulting integer branch metric is computed as shown in Table 12.1. This branch metric is clearly suboptimal, not being an affine transformation of the log likelihood. However, simulations have shown that it performs very close to optimal and it is widely used. Obviously, the performance depends upon the quantization threshold Δ employed. For the eight-level quantizer, the value Δ = 0.4 is employed. For the 16-level quantizer, Δ = 0.25 is used, and for the four-level quantizer, Δ = 1 is used. We demonstrate below the dependence of the bit error rate upon Δ.

Figure 12.15: Quantization thresholds for four- and eight-level quantization.


Table 12.1: Quantized Branch Metrics Using Linear Quantization

                            Quantization Level
Signal Amplitude      0    1    2    3    4    5    6    7
                            Branch Metric 𝜇
      −1              7    6    5    4    3    2    1    0
       1              0    1    2    3    4    5    6    7
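A sketch of this uniform quantizer and branch metric in MATLAB follows. It assumes (following Figure 12.15) that the level numbering runs from level M − 1 in the most negative bin down to level 0 in the most positive bin, so that the metric behaves like a distance to be minimized; the variable names are illustrative only.

r = 0.3;                                 % example received value
M = 8;  Delta = 0.5;                     % eight levels with threshold spacing Delta
edges = ((1:M-1) - M/2) * Delta;         % thresholds: -1.5, -1, ..., 1.5
q = sum(r < edges);                      % quantization level, 0..M-1 (level 0 = most positive bin)
mu_plus1  = q;                           % branch metric if amplitude +1 was sent
mu_minus1 = M - 1 - q;                   % branch metric if amplitude -1 was sent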

Figure 12.16(a) shows the bit error rate as a function of SNR for codes with constraint lengths (here employing K = 1 + max deg(g_j) as the constraint length) of K = 3, K = 5, and K = 7 using eight-level uniform quantization with Δ = 0.42. The generators employed in this and the other simulations are the following:

K    g_1(x)                          g_2(x)                              d_free
3    1 + x^2                         1 + x + x^2                         5
4    1 + x + x^3                     1 + x + x^2 + x^3                   6
5    1 + x^3 + x^4                   1 + x + x^2 + x^4                   7
6    1 + x^2 + x^4 + x^5             1 + x + x^2 + x^3 + x^5             8
7    1 + x^2 + x^3 + x^5 + x^6       1 + x + x^2 + x^3 + x^6             10
8    1 + x^2 + x^5 + x^6 + x^7       1 + x + x^2 + x^3 + x^4 + x^7       10

The Viterbi decoder uses a window of 32 bits. As this figure demonstrates, the performance improves with the constraint length. Figure 12.16(b) shows 1-bit (hard) quantization for K = 3 through 8.


Figure 12.16: Bit error rate as a function of Eb ∕N0 of R = 1∕2 convolutional codes with 32 bit window decoding (following [196]). (a) Eight-level quantization, K = 3, 5, 7. (b) 1-bit (hard) quantization, K = 3 through 8.

Comparisons of the effect of the number of quantization levels and the decoding window are shown in Figure 12.17. In part (a), the performance of a code with K = 5 is shown with 2, 4, 8, and 16 quantization levels. As the figure demonstrates, there is very little improvement from 8 to 16 quantization levels; 8 is frequently chosen as an adequate performance/complexity tradeoff. In part (b), again a K = 5 code is characterized. In this case, the effect of the length of the decoding window is shown for two different quantization levels. With a decoding window of length 32, most of the achievable performance is attained.


Figure 12.17: Bit error rate as a function of Eb ∕N0 of R = 1∕2, K = 5 convolutional code with different quantization levels and decoding window lengths (following [196]). (a) 2-, 4-, 8-, and 16-level quantization, decoding window 32; (b) Decoding window length W = 8, 16, and 32; quantization levels Q = 2 and 8.

Figure 12.18 shows the effect of the quantizer threshold spacing Δ on the performance for an eight-level quantizer with a K = 5, R = 1∕2 code and a K = 5, R = 1∕4 code. The plots are at SNRs of 3.3, 3.5, and 3.7 dB (reading from top to bottom) for the R = 1∕2 code and 2.75, 2.98, and 3.19 dB (reading from top to bottom) for the R = 1∕4 code. (These latter SNRs were selected to provide roughly comparable bit error rate for the two codes.) These were obtained by simulating, counting 20,000 bit errors at each point of data. A convolutional code can be employed as a block code by simply truncating the sequence at a block length N (see Section 12.9). This truncation results in the last few bits in the codeword not having the same level of protection as the rest of the bits, a problem referred to as unequal error protection. The shorter the block length N, the higher the fraction of unequally protected bits, resulting in a higher bit error rate. Figure 12.19 shows BER for maximum likelihood decoding of convolutional codes truncated to blocks of length N = 200, N = 2000, as well as the “conventional” mode in which the codeword simply streams.


Figure 12.18: Viterbi algorithm bit error rate performance as a function of quantizer threshold spacing.

Figure 12.19: BER performance as a function of truncation block length, N = 200 and N = 2000, for two- and eight-level quantization.

12.5 Error Analysis for Convolutional Codes

While for block codes it is conventional to determine (or estimate) the probability of decoding a block incorrectly, the performance of convolutional codes is largely determined by the rate and the constraint length. It is not very meaningful to determine the probability of a block in error, since the block may

be very long. It is more useful to explore the probability of bit error, or the bit error rate, which is the average number of message bits in error in a given sequence of bits divided by the total number of message bits in the sequence. We shall denote the bit error rate by P_b. In this section, we develop an upper bound for P_b [471].

Consider how errors can occur in the decoding process. The decision mechanism of the Viterbi algorithm operates when two paths join together. If two paths join together and the path with the lower (better) metric is actually the incorrect path, then an incorrect decision is made at that point. We call such an error a node error and say that the error event occurs at the place where the paths first diverged. We denote the probability of a node error as P_e. A node error, in turn, could lead to a number of input bits being decoded incorrectly.

Since the code is linear, it suffices to assume that the all-zero codeword is sent: with d_H(𝐫, 𝐜) the Hamming distance between 𝐜 and 𝐫, we have d_H(𝐫, 𝐜) = d_H(𝐫 + 𝐜, 𝐜 + 𝐜) = d_H(𝐫 + 𝐜, 0).

Consider the error events portrayed in Figure 12.20. The horizontal line across the top represents the all-zero path through the trellis. Suppose the path diverging from the all-zero path at a has a lower (better) metric when the paths merge at a′. This gives rise to an error event at a. Suppose that there are error events also at b and d. Now consider the path diverging at c: even if the metric is lower (better) at c′, the diverging path from c may not ultimately be selected if its metric is worse than the path emerging at b. Similarly, the path emerging at d may not necessarily be selected, since the path merging at e may take precedence. This overlapping of decision paths makes the exact analysis of the bit error rate difficult. We must be content with bounds and approximations.


Figure 12.20: Error events due to merging paths.

The following example illustrates some of these issues.

Example 12.27

Consider again the convolutional code from Example 12.1, with G(x) = [1 + x^2, 1 + x + x^2].

Suppose that the input sequence is 𝐱 = [0, 0, 0, 0, ...] with the resulting transmitted sequence 𝐜 = [00, 00, 00, 00, ...], but that the received sequence after transmission through a BSC is 𝐫 = [11, 01, 00, ...]. A portion of the decoding trellis for this code is shown in Figure 12.21. After three stages of the trellis when the paths merge, the metric for the lower path (shown as a dashed line) is lower than the metric for the all-zero path (the solid line). Accordingly, the Viterbi algorithm selects the erroneous path, resulting in a node error at the first node. However, while the decision results in three incorrect branches on the path through the trellis, the input sequence corresponding to this selected path is [1, 0, 0], so that only 1 bit is incorrectly decoded due to this decision.

As the figure shows, there is a path of metric 5 which deviates from the all-zero path. The probability of incorrectly selecting this path is denoted as P_5. This error occurs when the received sequence has three or more errors (1s) in it. In general, we denote

P_d = probability of a decoding error on a path of metric d.


Figure 12.21: Two decoding examples. (a) Diverging path of distance 5; (b) diverging path of distance 6.

For a deviating path of odd weight, there will be an error if more than half of the bits are in error. The probability of this event for a BSC with crossover probability p_c is

P_d = \sum_{i=(d+1)/2}^{d} \binom{d}{i} p_c^i (1 - p_c)^{d-i}    (with d odd).    (12.20)

Suppose now the received signal is 𝐫 = [10, 10, 10, 00]. Then the trellis appears as in Figure 12.21(b). In this case, the path metrics are equal; one-half of the time the decoder chooses the wrong path. If the incorrect path is chosen, the decoded input bits would be [1, 1, 0, 0], with 2 bits incorrectly decoded. The probability of the event of choosing the incorrect path in this case is P_6, where

P_d = \frac{1}{2} \binom{d}{d/2} p_c^{d/2} (1 - p_c)^{d/2} + \sum_{i=d/2+1}^{d} \binom{d}{i} p_c^i (1 - p_c)^{d-i}    (with d even)    (12.21)

is the probability that more than half of the bits are in error, plus 1/2 times the probability that exactly half the bits are in error. ◽
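Equations (12.20) and (12.21) are easy to evaluate numerically. The following MATLAB function is an illustrative sketch (the name Pd_bsc is an assumption, not one of the book's program files):

function P = Pd_bsc(d, pc)
% Probability that a weight-d error path is (incorrectly) preferred over the
% all-zero path on a BSC with crossover probability pc; see (12.20), (12.21).
b = @(i) nchoosek(d, i) * pc^i * (1 - pc)^(d - i);   % binomial probability term
P = 0;
for i = floor(d/2) + 1 : d      % more than half of the d bits in error
  P = P + b(i);
end
if mod(d, 2) == 0               % d even: ties are broken at random
  P = P + 0.5 * b(d/2);
end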

We can glean the following information from this example:

•  Error events can occur in the decoding algorithm when paths merge together. If the erroneous path has a lower (better) path metric than the correct path, the algorithm selects it.

•  Merging paths may be of different lengths (number of stages).

•  This trellis has a shortest path of metric 5 (three stages long) which diverges from the all-zero path then remerges. We say there is an error path of metric 5. There is also an error path of metric 6 (four stages long) which deviates then remerges.

•  When an error path is selected, the number of input bits that are erroneously decoded depends on the particular path.

•  The probability of a particular error event can be calculated and is denoted as P_d.

•  The error path of metric 5 was not disjoint from the error path of metric 6, since they both share a branch.

Figure 12.22: The state diagram and graph for diverging/remerging paths. (a) State diagram; (b) split state 0; (c) output weight i represented by D^i.

In the following sections, we first describe how to enumerate the paths through the trellis. Then bounds on the probability of node error and the bit error rate are obtained by invoking the union bound.

12.5.1 Enumerating Paths Through the Trellis

In computing (or bounding) the overall probability of decoder error, it is expedient to have a method of enumerating all the paths through the trellis. This initially daunting task is aided somewhat by the observation that, for the purposes of computing the probability of error, since the convolutional code is linear it is sufficient to consider only those paths which diverge from the all-zero path then remerge. We develop a transfer function method which enumerates all the paths that diverge from the all-zero path then remerge. This transfer function is called the path enumerator. We demonstrate the technique for the particular code we have been examining.

Example 12.28  Figure 12.22(a) shows the state diagram for the encoder of Example 12.1. In Figure 12.22(b), the 00 state has been “split,” or duplicated. Furthermore, the transition from state 00 to state 00 has been omitted, since we are interested only in paths which diverge from the 00 state. Any path through the graph in Figure 12.22(b) from the 00 node on the left to the 00 node on the right represents a path which diverges from the all-zero path then remerges. In Figure 12.22(c), the output codeword of weight i along each edge is represented using D^i. (For example, an output of 11 is represented by D^2; an output of 10 is represented by D^1 = D and an output of 00 is represented by D^0 = 1.) For convenience the state labels have been removed.


Figure 12.23: Rules for simplification of flow graphs. (Blocks in series: H followed by G is equivalent to HG. Blocks in parallel: H and G combine to H + G. Feedback configuration: H with feedback G is equivalent to H/(1 − HG).)

Figure 12.24: Steps simplifying the flow graph for a convolutional code.

The labels on the edges in the graph are to be thought of as transfer functions. We now employ the conventional rules for flow graph simplification as summarized in Figure 12.23.

•  Blocks in series multiply the transfer functions.
•  Blocks in parallel add the transfer functions.
•  Blocks in feedback configuration employ the rule “forward gain over 1 minus loop gain.”

(For a thorough discussion on more complicated flow graphs, see [294].) For the state diagram of Figure 12.22, we take each node as a summing node. The sequence of steps obtained by successively applying the simplification rules is shown in Figure 12.24. Simplifying the final diagram, we find

T(D) = D^2 \cdot \frac{\dfrac{D}{1-D}}{1 - \dfrac{D}{1-D}} \cdot D^2 = \frac{D^5}{1 - 2D}.

To interpret this, we use the formal series expansion (check by long division)

\frac{1}{1-D} = 1 + D + D^2 + D^3 + \cdots.

(A formal series is an infinite series that is obtained by symbolic manipulation, without particular regard to convergence.) Expanding T(D), we find

T(D) = D^5 \left(1 + 2D + (2D)^2 + (2D)^3 + \cdots\right) = D^5 + 2D^6 + 4D^7 + \cdots + 2^k D^{k+5} + \cdots.

Interpreting this, we see that we have:

•  one diverging/remerging error path at metric 5 from the all-zero path;
•  two error paths of metric 6;
•  four error paths of metric 7, etc.

Furthermore, the shortest error path has metric 5. ◽
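As a quick numerical sanity check of this expansion (an illustrative sketch only), the closed form and the first terms of the formal series can be compared at a small value of D:

D = 0.1;
T = D^5 / (1 - 2*D);                          % closed form; equals 1.25e-5 here
series = sum(2.^(0:20) .* D.^(5 + (0:20)));   % D^5 + 2 D^6 + 4 D^7 + ... (first 21 terms)
[T series]                                    % the two values agree to high precision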

Definition 12.29  The minimum metric of a path diverging from then remerging to the all-zero path is called the free distance of the convolutional code, and is denoted as d_free. The number of paths at that metric is denoted as a(d_free) = N_free. The number of paths at a distance d is denoted as a(d). ◽

In general, we write

T(D) = \sum_{d = d_{free}}^{\infty} a(d) D^d,

where a(d_free) = N_free.

Additional information about the paths in the trellis can be obtained with a more expressive transfer function. We label each path with three variables: D^i, where i is the output code weight; N^i, where i is the input weight; and L, to account for the length of the branch.

Example 12.30  Returning to the state diagram of Figure 12.22, we obtain the labeled diagram in Figure 12.25. Using the same rules for block diagram simplification as previously, the transfer function for this diagram is

T(D, N, L) = \frac{D^5 L^3 N}{1 - DLN(1 + L)}.    (12.22)

Expanding this as a formal series, we have

T(D, N, L) = D^5 L^3 N + D^6 L^4 (1 + L) N^2 + D^7 L^5 (1 + L)^2 N^3 + \cdots,    (12.23)

which has the following interpretation:



•  There is one error path of metric 5 which is three branches long (from L^3) and one input bit is 1 (from N^1) along that path.

•  There are two error paths of metric 6: one of them is four branches long and the other is five branches long. Along both of them, there are two input bits that are 1.

•  There are four error paths of metric 7. One of them is five branches long; two of them are six branches long; and one is seven branches long. Along each of these, there are three input bits that are 1.

•  Etc. ◽

Clearly, we can obtain the simpler transfer function by T(D) = T(D, N, L)|_{N=1, L=1}. When we don't care to keep track of the number of branches, we write T(D, N) = T(D, N, L)|_{L=1}.

Figure 12.25: State diagram labeled with output weight, input weight, and branch length.

12.5.1.1 Enumerating on More Complicated Graphs: Mason's Rule

Some graphs are more complicated than the three rules introduced above can accommodate. A more general approach is Mason’s rule [294]. Its generality leads to a rather complicated notation, which we summarize here and illustrate by example (not by proof) [483]. We will enumerate all paths from the 0 state to the 0 state in the state diagram shown in Figure 12.26.


Figure 12.26: A state diagram to be enumerated.

A loop is a sequence of states which starts and ends in the same state, but otherwise does not enter any state more than once. We will say that a forward loop is a loop that starts and stops in state 0. A set of loops is nontouching if they have no vertices in common. Thus, {0, 1, 2, 4, 0} is a forward loop. The loop {3, 6, 5, 3} is a loop that does not touch this forward loop. The set of all forward loops in the graph is denoted as L = {L_1, L_2, ...}; the corresponding set of path gains is denoted as F = {F_1, F_2, ...}. The loops in the graph that do not contain the vertex 0 are denoted C_1, C_2, ..., and their path gains are written with the same symbols. (Determining all the loops requires some care.) For the graph in Figure 12.26, the forward loops and their gains are

L_1: {0, 1, 3, 7, 6, 5, 2, 4, 0}    F_1 = D^8
L_2: {0, 1, 3, 7, 6, 4, 0}          F_2 = D^6
L_3: {0, 1, 3, 6, 5, 2, 4, 0}       F_3 = D^10
L_4: {0, 1, 3, 6, 4, 0}             F_4 = D^8
L_5: {0, 1, 2, 5, 3, 7, 6, 4, 0}    F_5 = D^8
L_6: {0, 1, 2, 5, 3, 6, 4, 0}       F_6 = D^10
L_7: {0, 1, 2, 4, 0}                F_7 = D^6

and the other loops and their gains are

C_1:  {1, 3, 7, 6, 5, 2, 4, 1}      C_1 = D^4
C_2:  {1, 3, 7, 6, 4, 1}            C_2 = D^2
C_3:  {1, 3, 6, 5, 2, 4, 1}         C_3 = D^6
C_4:  {1, 3, 6, 4, 1}               C_4 = D^4
C_5:  {1, 2, 5, 3, 7, 6, 4, 1}      C_5 = D^4
C_6:  {1, 2, 5, 3, 6, 4, 1}         C_6 = D^6
C_7:  {1, 2, 4, 1}                  C_7 = D^2
C_8:  {2, 5, 2}                     C_8 = D^2
C_9:  {3, 7, 6, 5, 3}               C_9 = D^2
C_10: {3, 6, 5, 3}                  C_10 = D^4
C_11: {7, 7}                        C_11 = D^2

We also need to identify the pairs of nontouching loops among the C_i, the triples of nontouching loops, etc., and their corresponding product gains. There are ten pairs of nontouching loops:

(C_2, C_8)     C_2 C_8 = D^4        (C_3, C_11)    C_3 C_11 = D^8
(C_4, C_8)     C_4 C_8 = D^6        (C_4, C_11)    C_4 C_11 = D^6
(C_6, C_11)    C_6 C_11 = D^8       (C_7, C_9)     C_7 C_9 = D^4
(C_7, C_10)    C_7 C_10 = D^6       (C_7, C_11)    C_7 C_11 = D^4
(C_8, C_11)    C_8 C_11 = D^4       (C_10, C_11)   C_10 C_11 = D^6

There are two triplets of nontouching loops,

(C_4, C_8, C_11)    C_4 C_8 C_11 = D^8        (C_7, C_10, C_11)    C_7 C_10 C_11 = D^8

but no sets of four or more nontouching loops. With these sets collected, we define the graph determinant Δ as

\Delta = 1 - \sum_i C_i + \sum_{(i,j)} C_i C_j - \sum_{(i,j,k)} C_i C_j C_k + \cdots,

where the first sum is over all the loops, the second sum is over all pairs of nontouching loops, the third sum is over all triplets of nontouching loops, and so forth. We also define the graph cofactor of the forward path L_i, denoted as Δ_i, which is similar to the graph determinant except that all loops touching L_i are removed from the summations. This can be written as

\Delta_i = 1 - \sum_{(L_i, C_j)} C_j + \sum_{(L_i, C_j, C_k)} C_j C_k - \sum_{(L_i, C_j, C_k, C_l)} C_j C_k C_l + \cdots.

With this notation we finally can express Mason's rule for the transfer function through the graph:

T(D) = \frac{\sum_l F_l \Delta_l}{\Delta},    (12.24)

where the sum is over all forward paths. For our example, we have Δ_1 = Δ_5 = 1 (since there are no loops that do not touch the forward paths L_1 and L_5), Δ_3 = Δ_6 = 1 − C_11 = 1 − D^2 (since L_3 and L_6 do not cross vertex 7 and so do not touch loop C_11), Δ_2 = 1 − C_8 = 1 − D^2 (since L_2 does not touch C_8 but it does touch all other loops), and

\Delta_4 = 1 - (C_8 + C_11) + C_8 C_11 = 1 - 2D^2 + D^4
\Delta_7 = 1 - (C_9 + C_10 + C_11) + C_10 C_11 = 1 - 2D^2 - D^4 + D^6.

The graph determinant is

\Delta = 1 - (D^4 + D^2 + D^6 + D^4 + D^4 + D^6 + D^2 + D^2 + D^2 + D^4 + D^2) + (D^4 + D^8 + D^6 + D^6 + D^8 + D^4 + D^6 + D^4 + D^4 + D^6) - (D^8 + D^8) = 1 - 5D^2 + 2D^6.

Finally, using (12.24), we obtain

T(D) = \frac{2D^6 - D^{10}}{1 - 5D^2 + 2D^6}.

12.5.2 Characterizing Node Error Probability P_e and Bit Error Rate P_b

We now return to the question of the probability of error for convolutional codes. Let P_j denote the set of all error paths that diverge from node j of the all-zero path in the trellis then remerge, and let 𝐩_{i,j} ∈ P_j be one of these paths. Let ΔM(𝐩_{i,j}, 0) denote the difference between the metric accumulated along path 𝐩_{i,j} and the all-zero path. An error event at node j occurs due to path 𝐩_{i,j} if ΔM(𝐩_{i,j}, 0) < 0. Letting P_e(j) denote the probability of an error event at node j, we have

P_e(j) \leq \Pr\left[ \bigcup_i \{\Delta M(\mathbf{p}_{i,j}, 0) \leq 0\} \right],    (12.25)

where the inequality follows since an error might not occur when ΔM(𝐩_{i,j}, 0) = 0. The paths 𝐩_{i,j} ∈ P_j are not all disjoint since they may share branches, so the events {ΔM(𝐩_{i,j}, 0) ≤ 0} are not disjoint. This makes (12.25) very difficult to compute exactly. However, it can be upper bounded by the union bound (see Box 1.1 in Chapter 1) as

P_e(j) \leq \sum_i \Pr(\Delta M(\mathbf{p}_{i,j}, 0) \leq 0).    (12.26)

Each term in this summation is now a pairwise event between the paths 𝐩_{i,j} and the all-zero path. We develop expressions or bounds on the pairwise events in (12.26) for the case that the channel is memoryless. (For example, we have already seen that for the BSC, the probability P_d developed in (12.20) and (12.21) is the probability of the pairwise events in question.) For a memoryless channel, ΔM(𝐩_{i,j}, 0) depends only on those branches for which 𝐩_{i,j} is nonzero. Let d be the Hamming weight of 𝐩_{i,j} and let P_d be the probability of the event that this path has a lower (better) metric than the all-zero path. Let a(d) be the number of paths at a distance d from the all-zero path. The probability of a node error event can now be written as follows:

P_e(j) \leq \sum_{d=d_{free}}^{\infty} \Pr(\text{error caused by any of the } a(d) \text{ incorrect paths at distance } d) = \sum_{d=d_{free}}^{\infty} a(d) P_d.    (12.27)

Any further specification on Pe (j) requires characterization of Pd . We show below that bounds on Pd can be written in the form Pd < Z d (12.28) for some channel-dependent function Z and develop explicit expressions for Z for the BSC and AWGN channel. For now, we simply express the results in terms of Z. With this bound, we can write Pe (j)
P 5 .

12.5.3



A Bound on Pd for Discrete Channels

In this section, we develop a bound on Pd for discrete channels such as the BSC [[483], Section 12.3.1]. Let p(ri |1) denote the likelihood of the received signal ri , given that the symbol corresponding to 1 was sent through the channel; similarly p(ri |0). Then Pd = P(ΔM(𝐩i,j , 0) ≤ 0 = P[

d ∑

and

log P(ri |1) −

i=1

d ∑

dH (𝐩i,j , 0) = d)

log P(ri |0) ≥ 0],

i=1

where {r1 , r2 , . . . , rd } are the received signals at the d coordinates where 𝐩i,j is nonzero. Continuing, [ d ] [ d ] ∑ ∏ p(ri |1) p(ri |1) log ≥0 =P ≥1 . Pd = P p(ri |0) p(ri |0) i=1 i=1 Let R′ be the set of vectors of elements 𝐫 = (r1 , r2 , . . . , rd ) such that d ∏ p(ri |1) i=1

p(ri |0)

≥ 1.

(12.33)

(For example, for d = 5 over the BSC, R′ is the set of vectors for which 3 or 4 or 5 of the ri are equal ∏ to 1.) The probability of any one of these elements is di=1 p(ri |0), since we are assuming that all zeros ∏ are sent. Thus, the probability of any vector in R′ is di=1 p(ri |0). The probability Pd can be obtained by summing over all the vectors in R′ : Pd =

d ∑∏

p(ri |0).

𝐫∈R′ i=1

Since the left-hand side of (12.33) is ≥ 1, we have Pd ≤

d ∑∏ 𝐫∈R′ i=1

[

p(ri |1) p(ri |0) p(ri |0)

]s

for any s such that 0 ≤ s < 1. The tightest bound is obtained by minimizing with respect to s: [ ] d d ∑∏ ∑∏ p(ri |1) s p(ri |0) = min p(ri |0)1−s p(ri |1)s . Pd ≤ min 0≤s 0 for all codes of rate R < 1. The cumulative path length bias for a path of nni bits is nni (1 − R): the path length bias increases linearly with the path length. For the BSC, the branch metric is { if rj = aj log2 (1 − pc ) − log2 12 − R = log2 2(1 − pc ) − R 𝜇(rj , aj ) = (12.47) log2 pc − log2 12 − R = log2 2pc − R if rj ≠ aj . Example 12.39 Let us contrast the Fano metric with the ML metric for the data in Example 12.38, assuming that pc = 0.1. Using the Fano metric, we have M([00], 𝐫) = log2 (1 − pc ) + log2 pc + 2(1 − 1∕2) = −2.479 M([11, 10, 10, 11, 11, 01], 𝐫) = 10log2 (1 − pc ) + 2log2 pc + 12(1 − 1∕2) = −2.164. Thus, the longer path has a better (higher) Fano metric than the shorter path. Example 12.40



Suppose that R = 1∕2 and pc = 0.1. Then from (12.47), { 0.348 rj = aj 𝜇(rj , aj ) = −2.82 rj ≠ aj .

It is common to scale the metrics by a constant so that they can be closely approximated by integers. Scaling the metric by 1∕0.348 results in the metric { 1 rj = aj 𝜇(rj , aj ) = −8 rj ≠ aj . Thus, each bit aj that agrees with rj results in a +1 added to the metric. Each bit aj that disagrees with rj results in −8 added to the metric. ◽

A path with only a few errors (the correct path) tends to have a slowly increasing metric, while an incorrect path tends to have a rapidly decreasing metric. Because the metric decreases so rapidly, incorrect paths are not extended far before being effectively rejected.

514

Convolutional Codes For BPSK transmission through an AWGN , the branch metric is 𝜇(rj , aj ) = log2 p(rj |aj ) − log2 p(rj ) − R, where p(rj |aj ) is the PDF of a Gaussian r.v. with mean aj and variance 𝜎2 = N0 ∕2 and p(rj ) = 12.8.3

p(rj |aj = 1) + p(rj |aj = −1) 2

.

The Stack Algorithm

Let 𝐚̃ (i) represent a path through the tree and let M(̃𝐚(i) , 𝐫) represent the Fano metric between 𝐚̃ (i) and the received sequence 𝐫. These are stored together as a pair (M(̃𝐚(i) , 𝐫), 𝐚̃ (i) ) called a stack entry. In the stack algorithm, an ordered list of stack entries is maintained which represents all the partial paths which have been examined so far. The list is ordered with the path with the largest (best) metric on top, with decreasing metrics beneath. Each decoding step consists of pulling the top stack entry off the stack, computing the 2k successor paths and their path metrics to that partial path, then rearranging the stack in order of decreasing metrics. When the top partial path consists of a path from the root node to a leaf node of the tree, then the algorithm is finished.

Algorithm 12.2 The Stack Algorithm 1 2 3 4 5 6

7

Input: A sequence 𝐫0 , 𝐫1 , … , 𝐫L−1 Output: The sequence 𝐚0 , 𝐚1 , … , 𝐚L−1 Initialize: Load the stack with the empty path with Fano path metric 0: S = (∅, 0). Compute the metrics of the successors of the top path in the stack Delete the top path from the stack Insert the paths computed in step 4 into the stack, and rearrange the stack in order of decreasing metric values. If the top path in the stack terminates at a leaf node of the tree, Stop. Otherwise, goto step 4.

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

Example 12.41

The encoder and received data of Example 12.23 are used in the stack algorithm. We have ] [ 𝐫 = 11 10 00 10 11 01 00 01 . . . .

α10 α11 α12 α13 α14 α15 α16

teststack.m stackalg.m fanomet.m

Figure 12.30 shows the contents of the stack as the algorithm progresses. After 14 steps of the algorithm, the algorithm terminates with the correct input sequence on top. (The metrics here are not scaled to integers.) ◽

A major part of the expense of the stack algorithm is the need to sort the metrics at every iteration of the algorithm. A variation on this algorithm due to Jelinek [219] known as the stack bucket algorithm avoids some of this complication. In the stack bucket algorithm, the range of possible metric values (e.g., for the data in Figure 12.30, the range is from 0.7 to −9.2) is partitioned into fixed intervals, where each interval is allocated a certain number of storage locations called a bucket. When a path

12.8 Suboptimal Decoding Algorithms for Convolutional Codes Step 1 0.7 [1] –5:6 [0]

Step 3 –1:1 [111] –1:1 [110] –4:9 [10] –5:6 [0]

Step 2 1.4 [11] –4:9 [10] –5:6 [0]

–1:1 –2:2 –2:2 –4:9 –5:6 –6 –6:7

Step 6 [110] [111001] [111000] [10] [0] [11101] [1111]

–3:6 –3:6 –3:9 –3:9 –4:6 –4:6 –4:9 –5:6 –6 –6:7 –7:8

Step 10 [1100] [1101] [11100000] [11100001] [1110011] [1110010] [10] [0] [11101] [1111] [1110001]

–1:5 –3:6 –3:9 –3:9 –4:6 –4:6 –4:9 –5:6 –6 –6:7 –7:8 –7:8 –8:5 –9:2

Step 13 [1100101] [1101] [11100001] [11100000] [1110010] [1110011] [10] [0] [11101] [1111] [1110001] [1100100] [110011] [11000]

–2:2 –2:2 –3:6 –3:6 –4:9 –5:6 –6 –6:7

Step 7 [111000] [111001] [1101] [1100] [10] [0] [11101] [1111]

–2:9 –3:6 –3:9 –3:9 –4:6 –4:6 –4:9 –5:6 –6 –6:7 –7:8 –9.2

Step 11 [11001] [1101] [11100001] [11100000] [1110010] [1110011] [10] [0] [11101] [1111] [1110001] [11000]

–0:77 –3:6 –3:9 –3:9 –4:6 –4:6 –4:9 –5:6 –6 –6:7 –7:1 –7:8 –7:8 –8:5 –9:2

Step 14 [11001010] [1101] [11100000] [11100001] [1110011] [1110010] [10] [0] [11101] [1111] [11001011] [1110001] [1100100] [110011] [11000]

Step 4 –0:39 [1110] –1:1 [110] –4:9 [10] –5:6 [0] –6:7 [1111] –1:5 –2:2 –3:6 –3:6 –4:9 –5:6 –6 –6:7 –7:8

Step 8 [1110000] [111001] [1100] [1101] [10] [0] [11101] [1111] [1110001] –2:2 –3:6 –3:9 –3:9 –4:6 –4:6 –4:9 –5:6 –6 –6:7 –7:8 –8:5 –9:2

Step 12 [110010] [1101] [11100000] [11100001] [1110011] [1110010] [10] [0] [11101] [1111] [1110001] [110011] [11000]

515 Step 5 0.31 [11100] –1:1 [110] –4:9 [10] –5:6 [0] –6 [11101] –6:7 [1111] Step 9 –2:2 [111001] –3:6 [1101] –3:6 [1100] –3:9 [11100001] –3:9 [11100000] –4:9 [10] –5:6 [0] –6 [11101] –6:7 [1111] –7:8 [1110001]

Figure 12.30: Stack contents for stack algorithm decoding example: metric, [input list]. is extended, it is deleted from its bucket and a new path is inserted as the top item in the bucket containing the metric interval for the new metric. Paths within buckets are not reordered. The top path in the nonempty bucket with the highest metric interval is chosen as the path to be extended. Instead of sorting, it only becomes necessary to determine which bucket new paths should be placed in. Unfortunately, the bucket approach does not always choose the best path, but only a “very good” path, to extend. Nevertheless, if there are enough buckets that the quantization into metric intervals is not too coarse, and if the received signal is not too noisy, then the top bucket contains only the best path. Any degradation from optimal is minor.

516

Convolutional Codes Another practical problem is that the size of the stack must necessarily be limited. For long codewords, there is always the probability that the stack fills up before the correct codeword is found. This is handled by simply throwing away the paths at the bottom of the stack. Of course, if the path that would ultimately become the best path is thrown away at an earlier stage of the algorithm, the best path can never be found. However, if the stack is sufficiently large, the probability that the correct path will be thrown away is negligible. Another possibility is to simply throw out the block where the frame overflow occurs and declare an erasure. 12.8.4

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

The Fano Algorithm

While the stack algorithm moves around the decoding tree, and must therefore save information about each path still under consideration, the Fano algorithm retains only one path and moves through the tree only along the edges of the tree. As a result, it does not require as much memory as the stack algorithm. However, some of the nodes are visited more than once, requiring recomputation of the metric values. The general outline of the decoder is as follows. A threshold value T is maintained by the algorithm. The Fano decoder moves forward through the tree as long as the path metric at the next node (the “forward” node), denoted by MF , exceeds the threshold and the path metric continues to increase (improve). The algorithm is thus a depth-first search. When the path metric would drop below the threshold if a forward move were made, the decoder examines the preceding node (the “backward” node, with path metric MB ). If the path metric at the backward node does not exceed the current threshold, then the threshold is reduced by Δ, and the decoder examines the next forward node. If the path metric at the previous node does exceed the threshold, then the decoder backs up and begins to examine other paths from that node. (This process of moving forward, then backward, then adjusting the threshold, and moving forward again is why nodes may be visited many times.) If all nodes forward of that point have already been examined, then the decoder once again considers backing up. Otherwise, the decoder moves forward on one of the remaining nodes. Each time a node is visited for the first time (and if it is not at the end of the tree), the decoder “tightens” the threshold by the largest multiple of Δ such that the adjusted threshold does not exceed the current metric. (As an alternative, tightening is accomplished in some algorithms by simply setting T = MF .) Since the Fano algorithm does backtracking, the algorithm needs to keep the following information at each node along the path that it is examining: the path metric at the previous node, MB ; which of the 2k branches it has taken; the input at that node; and also the state of the encoder, so that next branches in the tree can be computed. A forward move consists of adding this information to the end of the path list. A backward move consists of popping this information from the end of the path list. The path metric at the root node is set at MB = −∞; when the decoder backs up to that node, the threshold is always reduced and the algorithm moves forward again. Figure 12.31 shows the flowchart for the Fano algorithm.

fanoalg.m

The number i indicates the length of the current path. At each node, all possible branches might be taken. The metric to each next node is stored in sorted order in the array P. At each node along the path, the number ti indicates which branch number has been taken. When ti = 0, the branch with the best metric is chosen, when ti = 1 the next best metric is chosen, and so forth. The information stored about each node along the path includes the input, the state at that node, and ti . (Recomputing the metric when backtracking could be avoided by also storing P at each node.) The threshold is adjusted by the quantity Δ. In general, the larger Δ is, the fewer the number of computations are required. Ultimately, Δ must be below the likelihood of the maximum likelihood path, and so must be lowered to that point. If Δ is too small, then many iterations might be required to get to that point. On the other hand, if Δ is lowered in steps that are too big, then the threshold might be

12.8 Suboptimal Decoding Algorithms for Convolutional Codes

517

Initialize: i = 0, T = 0, M–1 = –∞, M0 = 0

ti = 0 (to choose best node) A Compute and sort metrics for 2k paths from current node. Store in P

Look forward to next unexamined node and find metric Mi+1 = P(ti)

Mi+1 ≥ T?

(Examine next unexamined node)

Yes (No more paths ahead of this node to examine)

No

Yes Move forward: save ti, inputi, statei. i=i+1

Look back: Mi–1 ≥ T ?

Yes

(See if there are any unexamined No nodes forward of this node)

ti = 2k ?

ti = ti + 1

Move back: i=i–1

No Loosen threshold T=T–∆

Yes End of search? i=L

No No

STOP

First visit to this node?

Yes Tighten threshold

Figure 12.31: Flowchart for the Fano algorithm. set low enough that other paths which are not the maximum likelihood path also exceed the threshold and can be considered by the decoder. Based on simulation experience [271], Δ should be in the range of (2, 8) if unscaled metrics are used. If scaled metrics are used, then Δ should be scaled accordingly. The value of Δ employed should be explored by thorough computer simulation to ensure that it gives adequate performance.

518

Convolutional Codes Example 12.42 Suppose the same code and input sequence as in Example 12.23 is used. The following traces the execution of the algorithm when the scaled (integer) metric of Example 12.40 is used with Δ = 10. The step number n is printed every time the algorithm passes through point A in the flow chart. 1. n = 1: T = 0 P = [2 − 16] t0 = 0 2. Look forward: MF = 2 3. MF ≥ T. Move forward 4. First visit: Tighten T 5. MF = 2 MB = 0 Node=[1] 6. M = 2 T = 0

1. n = 2: T = 0 P = [4 − 14] t1 = 0 2. Look forward: MF = 4 3. MF ≥ T. Move forward 4. First visit: Tighten T 5. MF = 4 MB = 2 Node=[11] 6. M = 4 T = 0

1. n = 3: T = 0 P = [−3 − 3] t2 = 0 2. Look forward: MF = −3 3. MF < T: Look back 4. MB = 2 5. MB ≥ T: Move back 6. All forward nodes not yet tested. t1 = 1 7. MF = −3 MB = 2 Node=[1] 8. M = 2 T = 0

1. n = 4: T = 0 P = [4 − 14] t1 = 1 2. Look forward: MF = −14 3. MF < T: Look back 4. MB = 0 5. MB ≥ T: Move back 6. All forward nodes not yet tested. t0 = 1 7. MF = −14 MB = 0 Node=[] 8. M = 0 T = 0

1. n = 5: T = 0 P = [2 − 16] t0 = 1 2. Look forward: MF = −16 3. MF < T: Look back 4. MB = −∞ 5. MB < T: T = T − Δ

6. MF = −16 MB = −∞ Node=[] 7. M = 0 T = −10

1. n = 6: T = −10 P = [2 − 16] t0 = 0 2. Look forward: MF = 2 3. MF ≥ T. Move forward 4. MF = 2 MB = 0 Node=[1] 5. M = 2 T = −10

1. n = 7: T = −10 P = [4 − 14] t1 = 0 2. Look forward: MF = 4 3. MF ≥ T. Move forward 4. MF = 4 MB = 2 Node=[11] 5. M = 4 T = −10

1. n = 8: T = −10 P = [−3 − 3] t2 = 0 2. Look forward: MF = −3 3. MF ≥ T. Move forward 4. First visit: Tighten T 5. MF = −3 MB = 4 Node=[111] 6. M = −3 T = −10

1. n = 9: T = −10 P = [−1 − 19] t3 = 0 2. Look forward: MF = −1 3. MF ≥ T. Move forward 4. First visit: Tighten T 5. MF = −1 MB = −3 Node=[1110] 6. M = −1 T = −10

1. n = 10: T = −10 P = [1 − 17] t4 = 0 2. Look forward: MF = 1 3. MF ≥ T. Move forward 4. First visit: Tighten T 5. MF = 1 MB = −1 Node=[11100] 6. M = 1 T = 0

1. n = 11: T = 0 P = [−6 − 6] t5 = 0 2. Look forward: MF = −6 3. MF < T: Look back


4. MB = −1 5. MB < T: T = T − Δ 6. MF = −6 MB = −1 Node=[11100] 7. M = 1 T = −10

1. n = 12: T = −10 P = [−6 − 6] t5 = 0 2. Look forward: MF = −6 3. MF ≥ T. Move forward 4. First visit: Tighten T 5. MF = −6 MB = 1 Node=[111001] 6. M = −6 T = −10

1. n = 13: T = −10 P = [−13 − 13] t6 = 0 2. Look forward: MF = −13 3. MF < T: Look back 4. MB = 1 5. MB ≥ T: Move back 6. All forward nodes not yet tested. t5 = 1 7. MF = −13 MB = 1 Node=[11100] 8. M = 1 T = −10

1. n = 14: T = −10 P = [−6 − 6] t5 = 1 2. Look forward: MF = −6 3. MF ≥ T. Move forward 4. First visit: Tighten T 5. MF = −6 MB = 1 Node=[111000] 6. M = −6 T = −10

1. n = 15: T = −10 P = [−4 − 22] t6 = 0 2. Look forward: MF = −4 3. MF ≥ T. Move forward 4. First visit: Tighten T 5. MF = −4 MB = −6 Node=[1110000] 6. M = −4 T = −10

1. n = 16: T = −10 P = [−11 − 11] t7 = 0 2. Look forward: MF = −11 3. MF < T: Look back 4. MB = −6 5. MB ≥ T: Move back

6. All forward nodes not yet tested. t6 = 1 7. MF = −11 MB = −6 Node=[111000] 8. M = −6 T = −10

1. n = 17: T = −10 P = [−4 − 22] t6 = 1 2. Look forward: MF = −22 3. MF < T: Look back 4. MB = 1 5. MB ≥ T: Move back 6. No more forward nodes 7. MB = −1 8. MB ≥ T: Move back 9. All forward nodes not yet tested. t4 = 1 10. MF = −22 MB = −1 Node=[1110] 11. M = −1 T = −10

1. n = 18: T = −10 P = [1 − 17] t4 = 1 2. Look forward: MF = −17 3. MF < T: Look back 4. MB = −3 5. MB ≥ T: Move back 6. All forward nodes not yet tested. t3 = 1 7. MF = −17 MB = −3 Node=[111] 8. M = −3 T = −10

1. n = 19: T = −10 P = [−1 − 19] t3 = 1 2. Look forward: MF = −19 3. MF < T: Look back 4. MB = 4 5. MB ≥ T: Move back 6. All forward nodes not yet tested. t2 = 1 7. MF = −19 MB = 4 Node=[11] 8. M = 4 T = −10

1. n = 20: T = −10 P = [−3 − 3] t2 = 1 2. Look forward: MF = −3 3. MF ≥ T. Move forward 4. First visit: Tighten T 5. MF = −3 MB = 4 Node=[110] 6. M = −3 T = −10


1. n = 21: T = −10 P = [−10 − 10] t3 = 0 2. Look forward: MF = −10 3. MF ≥ T. Move forward 4. First visit: Tighten T 5. MF = −10 MB = −3 Node=[1101] 6. M = −10 T = −10

1. n = 22: T = −10 P = [−17 − 17] t4 = 0 2. Look forward: MF = −17 3. MF < T: Look back 4. MB = −3 5. MB ≥ T: Move back 6. All forward nodes not yet tested. t3 = 1 7. MF = −17 MB = −3 Node=[110] 8. M = −3 T = −10

1. n = 23: T = −10 P = [−10 − 10] t3 = 1 2. Look forward: MF = −10 3. MF ≥ T. Move forward 4. First visit: Tighten T 5. MF = −10 MB = −3 Node=[1100] 6. M = −10 T = −10

1. n = 24: T = −10 P = [−8 − 26] t4 = 0 2. Look forward: MF = −8 3. MF ≥ T. Move forward 4. First visit: Tighten T 5. MF = −8 MB = −10 Node=[11001] 6. M = −8 T = −10

1. n = 25: T = −10 P = [−6 − 24] t5 = 0 2. Look forward: MF = −6 3. MF ≥ T. Move forward 4. First visit: Tighten T 5. MF = −6 MB = −8 Node=[110010] 6. M = −6 T = −10

1. n = 26: T = −10 P = [−4 − 22] t6 = 0 2. Look forward: MF = −4 3. MF ≥ T. Move forward


Table 12.6: Performance of Fano Algorithm on a Particular Sequence as a Function of Δ

Δ     Number of decoding steps    Correct decoding
1     158                         Yes
2     86                          Yes
3     63                          No
4     47                          No
5     40                          Yes
6     33                          No
7     31                          No
8     31                          No
9     31                          No
10    27                          Yes
11    16                          No
12    16                          No

4. First visit: Tighten T 5. MF = −4 MB = −6 Node=[1100101] 6. M = −4 T = −10

1. n = 27: T = −10 P = [−2 − 20] t7 = 0 2. Look forward: MF = −2 3. MF ≥ T. Move forward 4. First visit: Tighten T 5. MF = −2 MB = −4 Node=[11001010] 6. M = −2 T = −10

For this particular set of data, the value of Δ has a tremendous impact both on the number of steps the algorithm takes and on whether it decodes correctly. Table 12.6 shows that the number of decoding steps typically decreases as Δ gets larger, but that the decoding might be incorrect for some values of Δ. ◽

In comparing the stack algorithm and the Fano algorithm, we note the following.

• The stack algorithm visits each node only once, but the Fano algorithm may revisit nodes.

• The Fano algorithm does not have to manage the stack (e.g., resort the metrics).

Despite its complexity, when the noise is low, the Fano algorithm tends to decode faster than the stack algorithm. However, as the noise increases, more backtracking might be required and the stack algorithm has the advantage. Overall, the Fano algorithm is usually selected when sequential decoding is employed.

12.8.5 Other Issues for Sequential Decoding

We briefly introduce some issues related to sequential decoding, although space precludes a thorough treatment. References are provided for interested readers.

12.8.5.1 Computational complexity

The computational complexity is a random variable, and so is described by a probability distribution. Discussions of the performance of the decoder appear in [129, 214, 218, 396].

12.8.5.2 Code design

The performance of a code decoded using the Viterbi algorithm is governed largely by the free distance dfree . For sequential decoding however, the codewords must have a distance that increases as rapidly as

possible over the first few symbols of the codeword (i.e., the code must have a good column distance function) so that the decoding algorithm can make good decisions as early as possible. A large free distance and a small number of nearest neighbors are also important. A code having an optimum distance profile is one in which the column distance function over the first constraint length is better than that of all other codes of the same rate and constraint length. Tables of codes having optimum distance profiles are provided in [231]. Further discussion and description of the algorithms for finding these codes appear in [57, 225–229].

12.8.5.3 Variations on sequential decoding algorithms

In the interest of reducing the statistical variability of the decoding, or improving the decoder performance, variations on the decoding algorithms have been developed. In [69], a multiple stack algorithm is presented. This operates like the stack algorithm, except that the stack size is limited to a fixed number of entries. If the stack fills up before decoding is complete, the top paths are transferred to a second stack and decoding proceeds using these best paths. If this stack also fills up before decoding is complete, a third stack is created using the best paths, and so forth. In [112] and [220], hybrid algebraic/sequential decoding was introduced in which algebraic constraints are imposed across frames of sequentially decoded data. In [173], a generalized stack algorithm was proposed, in which more than one path in the stack can be extended at one time (as in the Viterbi algorithm) and paths merging together are selected as in the Viterbi algorithm. Compared to the stack algorithm, the generalized stack algorithm does not have buffer size variations as large and the error probability is closer to that of the Viterbi algorithm.

12.8.6 A Variation on the Viterbi Algorithm: The M Algorithm

For a trellis with a large number of states at each time instant, the Viterbi algorithm can be very complex. Furthermore, since there is only one correct path, most of the computations are expended in propagating paths that will not be used, but must be maintained to ensure the optimality of the decoding procedure. The M algorithm (see, e.g., [347]) is a suboptimal, breadth-first decoding algorithm whose complexity is parametric, allowing for more complexity in the decoding algorithm while decoding generally closer to the optimum. A list of M paths is maintained. At each time step, these M paths are extended to M·2^k paths (where k is the number of input bits), and the path metric along each of these paths is computed just as for the Viterbi algorithm. The path metrics are sorted, then the best M paths are retained in preparation for the next step. At the end of the decoding cycle, the path with the best metric is used as the decoded value. A sketch of one decoding step appears below.

While the underlying graphical structure for the Viterbi algorithm is a trellis, in which paths merge together, the underlying graphical structure for the M algorithm is a tree: the merging of paths is not explicitly represented, but better paths are retained by virtue of the sorting operation. The M algorithm is thus a cross between the stack algorithm (but using all paths of the same length) and the Viterbi algorithm. If M is equal to the number of states, the M algorithm is nearly equivalent to the Viterbi algorithm. However, it is possible for M to be significantly less than the number of states with only modest loss of performance.

Another variation is the T algorithm. It starts just like the M algorithm. However, instead of retaining only the best M paths, in the T algorithm all paths which are within a threshold T of the best path at that stage are retained.
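To make the extend–sort–retain structure concrete, the following is a minimal sketch of one M-algorithm step for the rate-1/2 encoder G(x) = [1 + x², 1 + x + x²] used in the examples, with BPSK outputs on an AWGN channel. The state packing, path representation, and function names are our own choices, not part of the text; only the breadth-first pruning is the point.

```cpp
// Minimal sketch of one step of the M algorithm (k = 1 input bit per step) for
// G(x) = [1 + x^2, 1 + x + x^2].  Extend every retained path by both inputs,
// compute the same metric the Viterbi algorithm would use, then keep the best M.
#include <vector>
#include <algorithm>
#include <cstdint>

struct MPath {
    int state = 0;             // (u_{t-1}, u_{t-2}) packed: bit0 = u_{t-1}, bit1 = u_{t-2}
    double metric = 0.0;       // accumulated squared Euclidean distance (smaller = better)
    std::vector<uint8_t> bits; // input bits decided so far
};

static int nextState(int state, int u) { return ((state & 1) << 1) | u; }

static double branchMetric(int state, int u, double r0, double r1) {
    int u1 = state & 1, u2 = (state >> 1) & 1;      // u_{t-1}, u_{t-2}
    int c0 = u ^ u2;                                // output of 1 + x^2
    int c1 = u ^ u1 ^ u2;                           // output of 1 + x + x^2
    double a0 = 2.0 * c0 - 1.0, a1 = 2.0 * c1 - 1.0;
    return (r0 - a0) * (r0 - a0) + (r1 - a1) * (r1 - a1);
}

// One M-algorithm step on the received pair (r0, r1).
void mAlgorithmStep(std::vector<MPath>& paths, double r0, double r1, std::size_t M) {
    std::vector<MPath> ext;
    ext.reserve(2 * paths.size());
    for (const MPath& p : paths)
        for (int u = 0; u <= 1; ++u) {
            MPath q = p;
            q.metric += branchMetric(p.state, u, r0, r1);
            q.state = nextState(p.state, u);
            q.bits.push_back(static_cast<uint8_t>(u));
            ext.push_back(std::move(q));
        }
    std::size_t keep = std::min(M, ext.size());
    std::partial_sort(ext.begin(), ext.begin() + keep, ext.end(),
                      [](const MPath& a, const MPath& b) { return a.metric < b.metric; });
    ext.resize(keep);
    paths = std::move(ext);    // no path merging is performed: this is a tree search
}
```

With M equal to the number of states (4 here) this step retains essentially the same survivors the Viterbi algorithm would; smaller M trades a modest loss in performance for fewer metric computations.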

12.9 Convolutional Codes as Block Codes and Tailbiting Codes

Convolutional codes may be used as block codes, in which a fixed number of bits is taken as the input bits, producing a fixed number of output bits. For a rate R = k∕n convolutional code in an encoder with memory 𝜈, there are various ways that this can be accomplished.




• (Direct truncation) Starting from the all-zero initial state, a sequence of Lk bits can be input to the encoder, and Ln output bits are taken as the codeword. This produces an (Ln, Lk) block code. The encoder is left in a state determined by the input sequence, unknown in advance at the decoder. This straightforward method is referred to as direct truncation. The problem is that the full effect of the last 𝜈 bits on the codeword is not seen, since the output is simply truncated. The result is that these bits do not have the same error protection as earlier bits.



• (Zero tail) In order to let the latter bits propagate through the encoder and experience full code protection, a sequence of 𝜈 zero inputs can be fed to the encoder after the Lk input bits are provided, taking (L + 𝜈)n output bits from the encoder. All the bits have the same error protection. The encoder is left in the all-zero state, information which is used by the decoder. But the rate of the code is now
\[ \frac{kL}{(L+\nu)n} = \frac{k}{n} - \frac{k}{n}\,\frac{\nu}{L+\nu}. \]
The rate is reduced by the term 𝜈∕(L + 𝜈), which is called the rate loss.

A third alternative is to use a tailbiting code, in which the encoder starts and ends in the same state. While this state may not be known in advance by the decoder, knowing that the starting and ending states are the same can be used by the decoder. Consider a sequence of L = kh bits x1, x2, . . . , xL to be encoded. Tailbiting encoding can be achieved as follows (a sketch appears after this description):

1. Starting with the encoder in the all-zero state, encode the last 𝜈 bits xL−𝜈+1, . . . , xL and discard the encoder output. This places the encoder in some state Ψ.

2. Encode the entire sequence of bits x1, x2, . . . , xL, and use the resulting sequence of hn coded bits as the codeword. As the last 𝜈 bits are encoded, the encoder returns to state Ψ.

The Viterbi algorithm decoder can exploit the fact that both the starting state and the ending state are the same (although not yet known). There are different ways that the decoder can exploit this fact. The optimum way is to use the Viterbi algorithm to decode 2^𝜈 times, starting successively in each state, and taking the path metric of the path ending in the starting state. The path metric that is smallest out of all these trials determines the decoded sequence. This is quite expensive, requiring 2^𝜈 runs of the Viterbi algorithm over the entire codeword. A less expensive approach, referred to as the Bar–David approach [285], operates as follows.

1. Choose an arbitrary state.

2. Decode using the Viterbi algorithm, selecting the state at the end with the best path metric.

3. If this ending state is the same as the starting state, then decoding is complete.

4. Otherwise, use the end state determined in step 2 as the new starting state. If this starting state has not been tried before, return to step 2; otherwise, return to step 1.

At signal-to-noise ratios high enough to produce usably low probability of error, the Bar–David approach may have less than a dB of penalty compared with the multiple Viterbi approach.
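As a concrete illustration, here is a minimal sketch of tailbiting encoding for the rate-1/2 feedforward encoder G(x) = [1 + x², 1 + x + x²] with memory ν = 2. The helper names are ours, not from the text; the point is the two-pass structure: a throwaway pass over the last ν bits to find the state Ψ, then one full encoding pass starting from Ψ.

```cpp
// Minimal sketch of tailbiting encoding for G(x) = [1 + x^2, 1 + x + x^2]
// (rate 1/2, memory nu = 2).  Pass 1 runs the encoder over the last nu bits
// (output discarded) to find the state Psi; pass 2 encodes the whole block
// starting from Psi, so the encoder starts and ends in the same state.
#include <vector>
#include <cstdint>

static const int NU = 2;

// One encoder step: state holds (u_{t-1}, u_{t-2}); out receives the two coded bits.
static void step(int& state, int u, int out[2]) {
    int u1 = state & 1, u2 = (state >> 1) & 1;
    out[0] = u ^ u2;                       // 1 + x^2
    out[1] = u ^ u1 ^ u2;                  // 1 + x + x^2
    state = ((state & 1) << 1) | u;
}

std::vector<uint8_t> tailbitingEncode(const std::vector<uint8_t>& x) {
    int out[2];
    // Pass 1: encode the last nu bits from the all-zero state, discard output.
    // (Assumes x.size() >= NU.)
    int state = 0;
    for (std::size_t i = x.size() - NU; i < x.size(); ++i)
        step(state, x[i], out);
    // 'state' is now Psi, the common starting/ending state.
    // Pass 2: encode the entire block starting from Psi.
    std::vector<uint8_t> code;
    code.reserve(2 * x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        step(state, x[i], out);
        code.push_back(static_cast<uint8_t>(out[0]));
        code.push_back(static_cast<uint8_t>(out[1]));
    }
    // state == Psi again here, since a feedforward encoder's state depends only
    // on the last nu inputs, which repeat those of pass 1.
    return code;
}
```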

12.10 A Modified Expression for the Path Metric

In this section, we modify the expression for the path metric defined in (12.17), expressing the branch metric in terms of likelihood ratios. Looking ahead to the SOVA (Section 12.11), we will also incorporate prior probabilities into the path metric.


For the rate R = k∕n encoder, we assume here that k = 1, and that BPSK modulation is employed. The input bit x_t produces the coded sequence 𝐜_t ∈ {0, 1}^n, which can be represented as signed quantities 𝐜̃_t = 2𝐜_t − 1 ∈ {−1, 1}^n. The corresponding modulated symbols, modulated with signal amplitude a, are 𝐚_t = a(2𝐜_t − 1) = a𝐜̃_t. For an input bit x_t ∈ {0, 1}, denote x̃_t = 2x_t − 1 ∈ {−1, 1}. The received signal is 𝐫_t = 𝐚_t + 𝐧_t, where each 𝐧_{t,i} ∼ 𝒩(0, σ²), independently. For a received sequence 𝐫_0^{t−1} = (𝐫_0, 𝐫_1, . . . , 𝐫_{t−1}), the path metric along the path determined by the state sequence Ψ_0^t = (Ψ_0, Ψ_1, . . . , Ψ_t), which terminates in state Ψ_t, is
\[ M(\mathbf{r}_0^{t-1} \mid \Psi_0^t) = -\log\bigl[f(\mathbf{r}_0^{t-1} \mid \mathbf{a}(\Psi_0^t))\, P(x(\Psi_0^t))\bigr]. \tag{12.48} \]
It is similar to the path metric defined in (12.17), except that it includes the factor P(x(Ψ_0^t)), which is the prior probability of the sequence of inputs x(Ψ_0^t). It is assumed that each bit is independent, so that P(x(Ψ_0^t)) = P(x(Ψ_0^1))P(x(Ψ_1^2)) · · · P(x(Ψ_{t−1}^t)). Assuming a memoryless channel,
\[ M(\mathbf{r}_0^{t-1} \mid \Psi_0^t) = -\sum_{\ell=0}^{t-1} \Bigl[\log f(\mathbf{r}_\ell \mid \mathbf{a}(\Psi_\ell^{\ell+1})) + \log P(x(\Psi_\ell^{\ell+1}))\Bigr]. \tag{12.49} \]

The path metric M(𝐫_0^{t−1} | Ψ_0^t) can be extended along the path to the state Ψ_{t+1} at the next time as
\[
\begin{aligned}
M(\mathbf{r}_0^{t} \mid \Psi_0^{t+1}) &= -\sum_{\ell=0}^{t} \Bigl[\log f(\mathbf{r}_\ell \mid \mathbf{a}(\Psi_\ell^{\ell+1})) + \log P(x(\Psi_\ell^{\ell+1}))\Bigr] \\
&= -\Biggl[\sum_{\ell=0}^{t-1} \Bigl(\log f(\mathbf{r}_\ell \mid \mathbf{a}(\Psi_\ell^{\ell+1})) + \log P(x(\Psi_\ell^{\ell+1}))\Bigr) + \log f(\mathbf{r}_t \mid \mathbf{a}(\Psi_t^{t+1})) + \log P(x(\Psi_t^{t+1}))\Biggr] \\
&= M(\mathbf{r}_0^{t-1} \mid \Psi_0^{t}) + \mu(\mathbf{r}_t, \Psi_t, \Psi_{t+1}),
\end{aligned}
\]
where
\[ \mu(\mathbf{r}_t, \Psi_t, \Psi_{t+1}) = -\bigl[\log f(\mathbf{r}_t \mid \mathbf{a}(\Psi_t^{t+1})) + \log P(x(\Psi_t^{t+1}))\bigr] \]
is the branch metric, incorporating information along the state transition from Ψ_t to Ψ_{t+1} and the log probability of the information symbol x(Ψ_t^{t+1}) which causes the transition. The branch metric can be further decomposed, assuming a memoryless channel, as
\[ \mu(\mathbf{r}_t, \Psi_t, \Psi_{t+1}) = -\Biggl[\sum_{j=1}^{n} \log f(\mathbf{r}_{t,j} \mid \mathbf{a}(\Psi_t^{t+1})_j) + \log P(x(\Psi_t^{t+1}))\Biggr]. \]

The posterior probability of the sequence 𝐚(Ψ_0^{t+1}), given the sequence 𝐫_0^t, can be expressed in terms of this metric as
\[ P(\mathbf{a}(\Psi_0^{t+1}) \mid \mathbf{r}_0^{t}) = \frac{f(\mathbf{r}_0^{t} \mid \mathbf{a}(\Psi_0^{t+1}))\, P(\mathbf{a}(\Psi_0^{t+1}))}{f(\mathbf{r}_0^{t})} = \frac{\exp(-M(\mathbf{r}_0^{t} \mid \Psi_0^{t+1}))}{f(\mathbf{r}_0^{t})} \]
or, more simply,
\[ P(\mathbf{a}(\Psi_0^{t+1}) \mid \mathbf{r}_0^{t}) \propto \exp(-M(\mathbf{r}_0^{t} \mid \Psi_0^{t+1})). \]
The metric (12.48) can be modified to be expressed in terms of log probability ratios. Recalling that a metric may be transformed by affine transformations (recall (12.16), with a > 0), a new metric is created by multiplying the metric M by 2 and subtracting off a constant that does not depend on the path. Define the constants applicable at time t
\[ C_{r,t}^{j} = -\bigl[\log f(\mathbf{r}_{t,j} \mid \tilde{\mathbf{c}}_{t,j} = +1) + \log f(\mathbf{r}_{t,j} \mid \tilde{\mathbf{c}}_{t,j} = -1)\bigr] \]
and
\[ c_{u,t} = -\bigl[\log P(\tilde{x}_t = +1) + \log P(\tilde{x}_t = -1)\bigr]. \]
These constants do not depend on the branch in the path taken, since they depend on all possible outputs across any branch. At time t = 0, assuming initial path metrics are 0, the path metric is simply the branch metric,
\[ M(\mathbf{r}_0^{0} \mid \Psi_0^{1}) = -\Biggl[\sum_{j=1}^{n} \log f(\mathbf{r}_{0,j} \mid \mathbf{a}(\Psi_0^{1})_j) + \log P(x(\Psi_0^{1}))\Biggr]. \]
Form a new metric M^*(𝐫_0^0 | Ψ_0^1) by multiplying M(𝐫_0^0 | Ψ_0^1) by 2 and subtracting off the sum of the constants,
\[ M^{*}(\mathbf{r}_0^{0} \mid \Psi_0^{1}) = 2M(\mathbf{r}_0^{0} \mid \Psi_0^{1}) - \sum_{j=1}^{n} C_{r,0}^{j} - c_{u,0}. \]
Consider the jth term involving the likelihoods in this expression, 2 log f(𝐫_{0,j} | 𝐚(Ψ_0^1)_j) + C_{r,0}^j. There are two cases to consider, when 𝐜̃(Ψ_0^1)_j = +1 and 𝐜̃(Ψ_0^1)_j = −1. In the first case, we find
\[ 2\log f(\mathbf{r}_{0,j} \mid \mathbf{a}(\Psi_0^{1})_j) + C_{r,0}^{j} = \log f(\mathbf{r}_{0,j} \mid \tilde{\mathbf{c}}_{0,j} = +1) - \log f(\mathbf{r}_{0,j} \mid \tilde{\mathbf{c}}_{0,j} = -1) = \tilde{\mathbf{c}}(\Psi_0^{1})_j \log\frac{f(\mathbf{r}_{0,j} \mid \tilde{\mathbf{c}}_{0,j} = +1)}{f(\mathbf{r}_{0,j} \mid \tilde{\mathbf{c}}_{0,j} = -1)}. \]
In the case that 𝐜̃(Ψ_0^1)_j = −1,
\[ 2\log f(\mathbf{r}_{0,j} \mid \mathbf{a}(\Psi_0^{1})_j) + C_{r,0}^{j} = \log f(\mathbf{r}_{0,j} \mid \tilde{\mathbf{c}}_{0,j} = -1) - \log f(\mathbf{r}_{0,j} \mid \tilde{\mathbf{c}}_{0,j} = +1) = \tilde{\mathbf{c}}(\Psi_0^{1})_j \log\frac{f(\mathbf{r}_{0,j} \mid \tilde{\mathbf{c}}_{0,j} = +1)}{f(\mathbf{r}_{0,j} \mid \tilde{\mathbf{c}}_{0,j} = -1)}. \]
In light of this, define
\[ L_r(\mathbf{r}_{t,j}) = \log\frac{f(\mathbf{r}_{t,j} \mid \tilde{\mathbf{c}}_{t,j} = +1)}{f(\mathbf{r}_{t,j} \mid \tilde{\mathbf{c}}_{t,j} = -1)}. \]
Then
\[ 2\log f(\mathbf{r}_{0,j} \mid \mathbf{a}(\Psi_0^{1})_j) + C_{r,0}^{j} = \tilde{\mathbf{c}}(\Psi_0^{1})_j\, L_r(\mathbf{r}_{0,j}). \]
Similarly, for the term involving the prior probabilities,
\[ 2\log P(x(\Psi_0^{1})) + c_{u,0} = \log\frac{P(\tilde{x}(\Psi_0^{1}) = +1)}{P(\tilde{x}(\Psi_0^{1}) = -1)} = \tilde{x}(\Psi_0^{1})\, L_x(\tilde{x}(\Psi_0^{1})), \]
where
\[ L_x(\tilde{x}) = \log\frac{P(\tilde{x} = +1)}{P(\tilde{x} = -1)}. \]
The modified metric at t = 0 can thus be expressed as
\[ M^{*}(\mathbf{r}_0^{0} \mid \mathbf{a}(\Psi_0^{1})) = -\Biggl[\sum_{j=1}^{n} \tilde{\mathbf{c}}(\Psi_0^{1})_j\, L_r(\mathbf{r}_{0,j}) + \tilde{x}(\Psi_0^{1})\, L_x(\tilde{x}(\Psi_0^{1}))\Biggr]. \]

Extending this forward in time, we find
\[ M^{*}(\mathbf{r}_0^{t} \mid \mathbf{a}(\Psi_0^{t+1})) = M^{*}(\mathbf{r}_0^{t-1} \mid \mathbf{a}(\Psi_0^{t})) - \Biggl[\sum_{j=1}^{n} \tilde{\mathbf{c}}(\Psi_t^{t+1})_j\, L_r(\mathbf{r}_{t,j}) + \tilde{x}(\Psi_t^{t+1})\, L_x(\tilde{x}(\Psi_t^{t+1}))\Biggr]. \]
This metric, expressed now in terms of log-likelihood ratios instead of likelihoods, can be used in the regular Viterbi algorithm. If it is assumed that each L_x(x̃) = 0 (equal prior probabilities), then this will give detection performance identical to the Viterbi algorithm using the previously defined metric. As an example of the computation of L_r(𝐫_{t,j}), for an AWGN channel,
\[ L_r(r) = \log\frac{\exp\bigl(-\tfrac{1}{2\sigma^2}(r-a)^2\bigr)}{\exp\bigl(-\tfrac{1}{2\sigma^2}(r+a)^2\bigr)} = \frac{2a}{\sigma^2}\, r. \]
The constant L_c = 2a∕σ² is referred to as the channel reliability factor. Using this, the metric can be expressed as
\[ M^{*}(\mathbf{r}_0^{t} \mid \mathbf{a}(\Psi_0^{t+1})) = M^{*}(\mathbf{r}_0^{t-1} \mid \mathbf{a}(\Psi_0^{t})) - \Biggl[\sum_{j=1}^{n} \tilde{\mathbf{c}}(\Psi_t^{t+1})_j\, L_c\, \mathbf{r}_{t,j} + \tilde{x}(\Psi_t^{t+1})\, L_x(\tilde{x}(\Psi_t^{t+1}))\Biggr]. \]
As above, the posterior probability can be expressed as
\[ P(\mathbf{a}(\Psi_0^{t+1}) \mid \mathbf{r}_0^{t}) \propto \exp\Bigl(-\tfrac{1}{2} M^{*}(\mathbf{r}_0^{t} \mid \Psi_0^{t+1})\Bigr). \]
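To see how the log-likelihood-ratio form of the metric is used in practice, here is a minimal sketch of the branch metric computation for the BPSK/AWGN case just described. The function names and the scaffolding are ours; the sign convention (smaller metric = more likely path) follows the development above, with Lc = 2a/σ² the channel reliability factor.

```cpp
// Minimal sketch: branch metric in log-likelihood-ratio form for BPSK on AWGN.
// The branch contribution is  -[ sum_j c~_j * Lc * r_j  +  x~ * Lx ],
// so that smaller metrics correspond to more likely paths.
#include <vector>
#include <cmath>

// Channel reliability factor Lc = 2a / sigma^2 for amplitude a and noise variance sigma^2.
double channelReliability(double a, double sigma2) { return 2.0 * a / sigma2; }

// Prior log ratio Lx = log P(x~ = +1) - log P(x~ = -1); zero for equal priors.
double priorLogRatio(double pPlus) { return std::log(pPlus) - std::log(1.0 - pPlus); }

// Branch metric for one trellis branch: codeBits are the n coded bits (0/1) on the
// branch, r the n received values, inputBit the information bit causing the transition.
double branchMetricLLR(const std::vector<int>& codeBits, const std::vector<double>& r,
                       int inputBit, double Lc, double Lx) {
    double sum = 0.0;
    for (std::size_t j = 0; j < codeBits.size(); ++j) {
        double cTilde = 2.0 * codeBits[j] - 1.0;   // map {0,1} -> {-1,+1}
        sum += cTilde * Lc * r[j];
    }
    double xTilde = 2.0 * inputBit - 1.0;
    sum += xTilde * Lx;
    return -sum;                                   // negate: smaller = more likely
}
```

Adding this branch value to the predecessor's M* reproduces the recursion above; with Lx = 0 it reduces to the usual soft-decision (correlation) Viterbi metric.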

12.11 Soft Output Viterbi Algorithm (SOVA)

The SOVA is an alternative to the BCJR algorithm (introduced in Section 14.3.1) which accepts prior input probabilities and produces a sequence of estimated bits along with reliability information about them. It can be implemented so that it has lower computational complexity than the BCJR algorithm.

The key to SOVA is to keep track of the reliability of bits along the paths through the trellis as the Viterbi algorithm proceeds. Consider two paths through the trellis, Ψ_0^t and Ψ'_0^t, where Ψ_t = Ψ'_t = p, that is, both paths terminate at the same state p, with their corresponding metrics M*(𝐫_0^{t−1} | Ψ_0^t) and M*(𝐫_0^{t−1} | Ψ'_0^t). The putative sequences of information bits along these paths are x̂(Ψ_0^t) and x̂(Ψ'_0^t), which we will denote as (x̂_0, x̂_1, . . . , x̂_{t−1}) and (x̂'_0, x̂'_1, . . . , x̂'_{t−1}). The Viterbi algorithm will select one of these two paths, depending on which of M*(𝐫_0^{t−1} | Ψ_0^t) or M*(𝐫_0^{t−1} | Ψ'_0^t) is smaller. Of the two paths, let Ψ_0^t denote the correct path. Then if M*(𝐫_0^{t−1} | Ψ_0^t) < M*(𝐫_0^{t−1} | Ψ'_0^t), the Viterbi algorithm selects the correct path.

Example 12.43 In the Viterbi decoding Example 12.23, at time step t = 3, the first time step when paths converge and decisions have to be made, the two paths leading to state 2 have state sequences Ψ_0^3 = (0, 1, 3, 2) and Ψ'_0^3 = (0, 0, 1, 2) (shown in solid lines in Figure 12.32), with respective path metrics M(𝐫_0^2 | Ψ_0^3) = 1 and M(𝐫_0^2 | Ψ'_0^3) = 4, and corresponding bit sequences x̂(Ψ_0^3) = (1, 1, 0) and x̂(Ψ'_0^3) = (0, 1, 0). These two sequences agree on the last 2 bits (no information on these paths to distinguish between them), but they disagree on the first bit. ◽

At times j such that x̂_j = x̂'_j, as far as either path is able to determine, this must be the value of the bit, x_j = x̂_j. On the other hand, at times j such that x̂_j ≠ x̂'_j, the presumption is that x̂_j is a better estimate of x_j than x̂'_j. SOVA is based on the idea that the larger the value of M*(𝐫_0^{t−1} | Ψ'_0^t) − M*(𝐫_0^{t−1} | Ψ_0^t), the more reliable the estimate. Define the metric difference between the paths as
\[ \Delta_{t-1}(p) = \tfrac{1}{2}\bigl(M^{*}(\mathbf{r}_0^{t-1} \mid \Psi_0'^{\,t}) - M^{*}(\mathbf{r}_0^{t-1} \mid \Psi_0^{t})\bigr) \]
or
\[ \Delta_{t-1}(p) = M(\mathbf{r}_0^{t-1} \mid \Psi_0'^{\,t}) - M(\mathbf{r}_0^{t-1} \mid \Psi_0^{t}). \]

Figure 12.32: Computations of SOVA at t = 3 (see Example 12.23).

Between the two paths Ψ_0^t and Ψ'_0^t (with Ψ_t = Ψ'_t = p), where Ψ_0^t is the correct path, the probability of choosing the correct path is proportional to exp(−½ M*(𝐫_0^{t−1} | Ψ_0^t)). More explicitly,
\[
P(\text{choose correct path to state } p)
= \frac{\exp\bigl(-\tfrac{1}{2}M^{*}(\mathbf{r}_0^{t-1}\mid\Psi_0^{t})\bigr)}
       {\exp\bigl(-\tfrac{1}{2}M^{*}(\mathbf{r}_0^{t-1}\mid\Psi_0^{t})\bigr)
        + \exp\bigl(-\tfrac{1}{2}M^{*}(\mathbf{r}_0^{t-1}\mid\Psi_0'^{\,t})\bigr)}
\tag{12.50}
\]
\[
= \frac{\exp(\Delta_{t-1}(p))}{1 + \exp(\Delta_{t-1}(p))} = \pi_{t-1}(p),
\tag{12.51}
\]
where the second form follows by multiplying numerator and denominator by \(\exp\bigl(\tfrac{1}{2}M^{*}(\mathbf{r}_0^{t-1}\mid\Psi_0'^{\,t})\bigr)\). Similarly,
\[ P(\text{choose incorrect path to state } p) = 1 - \pi_{t-1}(p) = \frac{1}{1 + \exp(\Delta_{t-1}(p))}. \]

The subscript on π_{t−1}(p) indicates that this probability affects the bits x_0, x_1, . . . , x_{t−1}. In the SOVA algorithm, at each time step of the Viterbi algorithm a reliability probability p_{t−1}(x_i; p) is assigned to each bit x_i along the survivor path, i ∈ Ψ_0^t, for the survivor path to each state p. The reliability probability is based upon (12.51) for bits that differ. These reliabilities are determined initially at the time that the Viterbi algorithm begins making decisions, when paths first merge. Let the memory of the convolutional encoder be 𝜈. As observed in Example 12.23, the first decision in the Viterbi algorithm results from the paths extending to time t = 𝜈 + 1. For the sequence of estimated bits x̂(Ψ_0^𝜈) = (x̂_0, x̂_1, . . . , x̂_𝜈) on the surviving path to state p, the initial reliability probability is defined as follows:
\[
p_{\nu}(\hat{x}_i; p) =
\begin{cases}
1 & \text{if } \hat{x}_i = \hat{x}_i', \\[4pt]
\dfrac{\exp(\Delta_{\nu}(p))}{1 + \exp(\Delta_{\nu}(p))} & \text{if } \hat{x}_i \neq \hat{x}_i',
\end{cases}
\qquad i = 0, 1, \ldots, \nu.
\]
For the entire sequence of bits on the surviving path to state p there is a reliability probability vector
\[ \mathbf{p}_{\nu}(p) = (p_{\nu}(\hat{x}_0; p), p_{\nu}(\hat{x}_1; p), \ldots, p_{\nu}(\hat{x}_{\nu}; p)), \]
for each state p = 0, 1, . . . , 2^𝜈 − 1.

It will also be convenient to express this in terms of log probability ratios. Let
\[ L_{t-1}(\hat{x}_i; p) = \log\frac{p_{t-1}(\hat{x}_i; p)}{1 - p_{t-1}(\hat{x}_i; p)}, \qquad i = 0, 1, \ldots, t-1. \]
This can be turned around with
\[ p_{t-1}(\hat{x}_i; p) = \frac{\exp(L_{t-1}(\hat{x}_i; p))}{1 + \exp(L_{t-1}(\hat{x}_i; p))}, \qquad
   1 - p_{t-1}(\hat{x}_i; p) = \frac{1}{1 + \exp(L_{t-1}(\hat{x}_i; p))}. \tag{12.52} \]
Initially, that is, at t = 𝜈 + 1,
\[
L_{\nu}(\hat{x}_i; p) =
\begin{cases}
\Delta_{\nu}(p) & \text{if } \hat{x}_i \neq \hat{x}_i', \\
\infty & \text{if } \hat{x}_i = \hat{x}_i',
\end{cases}
\qquad i = 0, 1, \ldots, \nu.
\]

The reliability (likelihood) vector for the sequence of bits on the path to state p is 𝐋_𝜈(p) = (L_𝜈(x̂_0; p), L_𝜈(x̂_1; p), . . . , L_𝜈(x̂_𝜈; p)).

Example 12.44

Return to t = 2 in Example 12.43,
\[ \Delta_2 = M(\mathbf{r}_0^{2} \mid \Psi_0'^{\,3}) - M(\mathbf{r}_0^{2} \mid \Psi_0^{3}) = 3. \]
For the paths to state p = 2, the reliability probability is
\[ p_2(2) = \frac{e^{3}}{1 + e^{3}} = 0.9526. \]
The reliability probability vector along the surviving path is 𝐩_2(2) = (0.9526, 1, 1), and the corresponding reliability vector is 𝐋_2(2) = (3, ∞, ∞). In the actual computation of the SOVA, similar computations are done for the paths to each state. ◽



For later times, t > 𝜈 + 1, the reliability is determined by combining the previous reliability for a bit with the new reliability stemming from the new decision. Let Ψ_0^t be the surviving path to state q at time t, and let the previous state be Ψ_{t−1} = p.

• For bits x̂_i on this surviving path with x̂_i ≠ x̂'_i, the updated reliability probability is the probability that the correct path is chosen times the current reliability, plus the probability that the incorrect path is chosen times the complement of the current reliability:
\[ p_{t-1}(\hat{x}_i; q) = p_{t-2}(\hat{x}_i; p)\,\pi_{t-1}(q) + (1 - p_{t-2}(\hat{x}_i; p))(1 - \pi_{t-1}(q)), \]
\[ i = 0, 1, \ldots, t-2 \ \text{such that}\ \hat{x}_i \neq \hat{x}_i', \qquad q = 0, 1, \ldots, 2^{\nu} - 1. \]

Using (12.52) at time t − 2, this becomes
\[ p_{t-1}(\hat{x}_i; q) = \frac{\exp(L_{t-2}(\hat{x}_i; p) + \Delta_{t-1}(q)) + 1}
   {1 + \exp(\Delta_{t-1}(q)) + \exp(L_{t-2}(\hat{x}_i; p)) + \exp(L_{t-2}(\hat{x}_i; p) + \Delta_{t-1}(q))}, \]
so that
\[ L_{t-1}(\hat{x}_i; q) = \log\frac{p_{t-1}(\hat{x}_i; q)}{1 - p_{t-1}(\hat{x}_i; q)}
   = \log\frac{1 + \exp(L_{t-2}(\hat{x}_i; p) + \Delta_{t-1}(q))}{\exp(\Delta_{t-1}(q)) + \exp(L_{t-2}(\hat{x}_i; p))}, \]
\[ i = 0, 1, \ldots, t-2 \ \text{such that}\ \hat{x}_i \neq \hat{x}_i', \qquad q = 0, 1, \ldots, 2^{\nu} - 1. \]

This function can be approximated as Lt−1 (̂xi ; q) = min(Lt−2 (̂xi ; p), Δt−1 (q)).
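The exact update and its min approximation are easy to compare numerically. The following is a minimal sketch (the function names are ours, not from the text) that applies the update to the bits on a survivor path, using the min form actually employed in Algorithm 12.3, with the exact logarithmic form available for comparison.

```cpp
// Minimal sketch of the SOVA reliability update for the bits on a survivor path.
// exactUpdate implements  L <- log[(1 + e^{L+Delta}) / (e^Delta + e^L)];
// minUpdate implements the approximation  L <- min(L, Delta).
#include <vector>
#include <cmath>
#include <algorithm>

double exactUpdate(double L, double Delta) {
    return std::log1p(std::exp(L + Delta)) - std::log(std::exp(Delta) + std::exp(L));
}

double minUpdate(double L, double Delta) {
    return std::min(L, Delta);
}

// Update the reliabilities of the bits on the survivor path to a state whose metric
// difference to the competitor path is Delta; only positions where the survivor and
// competitor bit sequences differ are changed.
void sovaUpdate(std::vector<double>& L, const std::vector<int>& survivorBits,
                const std::vector<int>& competitorBits, double Delta) {
    for (std::size_t i = 0; i < L.size(); ++i)
        if (survivorBits[i] != competitorBits[i])
            L[i] = minUpdate(L[i], Delta);   // or exactUpdate(L[i], Delta)
}
```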

Figure 12.33: Computations of SOVA at t = 4.



• For bits on the surviving path with x̂_i = x̂'_i, the reliability remains unchanged:
\[ L_{t-1}(\hat{x}_i; q) = L_{t-2}(\hat{x}_i; p), \qquad i = 0, 1, \ldots, t-2 \ \text{such that}\ \hat{x}_i = \hat{x}_i', \qquad q = 0, 1, \ldots, 2^{\nu} - 1. \]

• For the reliability of x̂_{t−1} (the extension to the latest time), the reliability is as it was for the initial bits:
\[
L_{t-1}(\hat{x}_{t-1}; q) =
\begin{cases}
\Delta_{t-1}(q) & \hat{x}_{t-1} \neq \hat{x}_{t-1}', \\
\infty & \hat{x}_{t-1} = \hat{x}_{t-1}'.
\end{cases}
\]

Example 12.45 Continuing Example 12.43, at time step t = 4 the two paths to state q = 0 are Ψ_0^4 = (0, 1, 3, 2, 0) and Ψ'_0^4 = (0, 0, 0, 0, 0), with respective path metrics M(𝐫_0^3 | Ψ_0^4) = 2 and M(𝐫_0^3 | Ψ'_0^4) = 4, and corresponding bit sequences x̂(Ψ_0^4) = (1, 1, 0, 0) and x̂(Ψ'_0^4) = (0, 0, 0, 0). These two sequences agree on the last 2 bits, but disagree on the first two bits. See Figure 12.33.
\[ \Delta_3 = M(\mathbf{r}_0^{3} \mid \Psi_0'^{\,4}) - M(\mathbf{r}_0^{3} \mid \Psi_0^{4}) = 2. \]
For the paths to state q = 0, the reliability probability is
\[ p_3(0) = \frac{e^{2}}{1 + e^{2}} = 0.881. \]
The reliability vector along the surviving path is
\[ \mathbf{L}_3(0) = \bigl(\min(L_2(\hat{x}_0; 2), \Delta_3), \min(L_2(\hat{x}_1; 2), \Delta_3), \infty, \infty\bigr) = (\min(3, 2), \min(3, 2), \infty, \infty) = (2, 2, \infty, \infty). \] ◽



At each time step t of the Viterbi algorithm, the reliability of each bit along the path to each state must be updated. As t increases, this results in a large computational burden. But recall that as the Viterbi algorithm progresses, early branches in the path merge together (see, e.g., the trellis in Example 12.23, starting at time t = 4). As discussed with regard to running the VA on a window, taking a window length of approximately five constraint lengths is sufficient that with high probability paths will have merged, and it is not necessary to update the reliability of the bits earlier than this length. Let W denote the number of bits in a window that are used in the SOVA. Also recall that for feedforward encoders with memory 𝜈, the last 𝜈 bits in each putative bit sequence are the same, so it is not necessary to do an update computation over those bits. Regardless, let 𝜈 ′ be the number of bits in putative bit sequences that agree in the range t − 𝜈 ′ to t. In the SOVA algorithm, the following data structures are employed:

• A path metric M to each state at each time.

• For each state p, the sequence of input bits x̂_0^t(p) on the survivor path to that state. (For a windowed Viterbi algorithm, the sequence of bits is x̂_{t−W}^t; where t − W < 0, this is intended to denote x̂_0^t.)

• For each state p, the reliability sequence 𝐋(p) of the bits on the survivor path to state p. For a windowed Viterbi algorithm, the sequence of reliability values is for the bits in the window: 𝐋(p) = (L(x̂_{t−W}, q), L(x̂_{t−W+1}, q), . . . , L(x̂_t, q)). In the algorithm below, the reliability L(x̂_t, q) is denoted as L_t(q).

Algorithm 12.3 The Soft Output Viterbi Algorithm

1  Input: A sequence 𝐫_0, 𝐫_1, . . . , 𝐫_{L−1}
2  Output: The sequence x̂_0, x̂_1, . . . , x̂_{L−1} which maximizes the likelihood f(𝐫_0^{L−1} | x̂_0^{L−1}), and a reliability measure of these bit estimates.
3  Initialize: Set M(0) = 0 and M(p) = ∞ for p = 1, 2, . . . , 2^𝜈 − 1 (initial path costs)
4  Set x̂_0^{−1}(p) = ∅ for p = 0, 1, . . . , 2^𝜈 − 1
5  Set t = 0
6  Begin
7    For each state q at time t + 1
8      Find the path metric for each path to state q:
9        For each p_i connected to state q corresponding to input x_(p_i,q), compute m_i = M(p_i) + 𝜇_t(𝐫_t, x_(p_i,q))
10       Select the smallest metric: i_min = arg min_i m_i, and the largest metric: i_max = arg max_i m_i
11       Set M(q) = m_{i_min}
12       Δ = m_{i_max} − m_{i_min}
13       Extend the sequence of input bits on the survivor path and the other path to state q:
           x̂_{t−W}^t(q) = (x̂_{t−W}^{t−1}(p_{i_min}), x_(p_{i_min},q))   (survivor path)
           x̂″_{t−W}^t(q) = (x̂_{t−W}^{t−1}(p_{i_max}), x_(p_{i_max},q))   (other path)
14       Let L_t(q) = +∞
15       For j = t − W to t − 𝜈′   (loop over the bit times in the window)
16         If (x̂(j) ≠ x̂″(j))   (update the reliability)
17           L_j(q) = min(L_j(p_{i_min}), Δ)
18         End
19       end (for)
20     end (for)
21   t = t + 1
22   if t < L − 1, goto line 6
23   Termination:
24   If terminating in a known state (e.g., 0)
25     Return the sequence of inputs x̂_{t−W}^t(0) along the path to that known state and the reliability vector 𝐋_{t−W}^t(0) to that state
26   If terminating in any state
27     Find the final state with minimal metric
28     Return the sequence of inputs along the path to that state and the reliability vector to that state.
29 End


12.12 Trellis Representations of Block and Cyclic Codes

In this section, we take a dual perspective to that of the previous section: we describe how linear block codes can be represented in terms of a trellis. Besides theoretical insight, the trellis representation can also be used to provide a means of soft-decision decoding that does not depend upon any particular algebraic structure of the code. These decoding algorithms can make block codes “more competitive with convolutional codes” [[273], p. 3].

12.12.1 Block Codes

We demonstrate the trellis idea with a (7, 4, 3) binary Hamming code. Let
\[
H = \begin{bmatrix} 1 & 1 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 & 0 & 1 \end{bmatrix}
  = \begin{bmatrix} \mathbf{h}_1 & \mathbf{h}_2 & \mathbf{h}_3 & \mathbf{h}_4 & \mathbf{h}_5 & \mathbf{h}_6 & \mathbf{h}_7 \end{bmatrix}
\tag{12.53}
\]

be the parity check matrix for the code. Then a column vector 𝐱 is a codeword if and only if 𝐬 = H𝐱 = 0; that is, the syndrome 𝐬 must satisfy
\[
\mathbf{s} = \begin{bmatrix} \mathbf{h}_1 & \mathbf{h}_2 & \cdots & \mathbf{h}_7 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_7 \end{bmatrix}
= \sum_{i=1}^{7} \mathbf{h}_i x_i = 0.
\]
We define the partial syndrome by
\[ \mathbf{s}_{r+1} = \sum_{i=1}^{r} \mathbf{h}_i x_i = \mathbf{s}_r + \mathbf{h}_r x_r, \]

with 𝐬_1 = 0. Then 𝐬_{n+1} = 𝐬. A trellis representation of a code is obtained by using 𝐬_r as the state, with an edge between a state 𝐬_r and 𝐬_{r+1} if 𝐬_{r+1} = 𝐬_r (corresponding to x_r = 0) or if 𝐬_{r+1} = 𝐬_r + 𝐡_r (corresponding to x_r = 1). Furthermore, the trellis is terminated at the state 𝐬_{n+1} = 0, corresponding to the fact that a valid codeword has a syndrome of zero. The trellis has at most 2^{n−k} states in it. Figure 12.34 shows the trellis for the parity check matrix of (12.53). Horizontal transitions correspond to x_i = 0 and diagonal transitions correspond to x_i = 1. Only those paths which end up at 𝐬_8 = 0 are retained. As may be observed, the trellis for a block code is “time-varying” — it has different connections for each section of the trellis. The number of states “active” at each section of the trellis also varies. A small sketch of this construction appears below. For a general code, the trellis structure is sufficiently complicated that it may be difficult to efficiently represent in hardware. There has been recent work, however, on families of codes whose trellises have a much more regular structure. These are frequently obtained by recursive constructions (e.g., based on Reed–Muller codes). Interested readers can consult [273].
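The following is a minimal sketch of this construction for the parity check matrix of (12.53): each trellis section simply records, for every reachable partial syndrome, the two possible transitions. The packing of syndrome columns into integers and the container layout are implementation choices of ours, not part of the text.

```cpp
// Minimal sketch: build the syndrome trellis of the (7,4) Hamming code of (12.53).
// States at depth r are the reachable partial syndromes s_r (packed into 3-bit ints);
// from state s the two edges go to s (code bit = 0) and s ^ h_r (code bit = 1).
#include <vector>
#include <set>
#include <cstdio>

int main() {
    const int n = 7;
    const int h[n] = {0b111, 0b110, 0b101, 0b011, 0b100, 0b010, 0b001}; // columns of H
    std::vector<std::set<int>> reachable(n + 1);
    reachable[0].insert(0);                        // s_1 = 0
    for (int r = 0; r < n; ++r)
        for (int s : reachable[r]) {
            reachable[r + 1].insert(s);            // x_{r+1} = 0
            reachable[r + 1].insert(s ^ h[r]);     // x_{r+1} = 1
        }
    // A valid codeword must end at syndrome 0; pruning states that cannot reach 0
    // (a backward pass) would give the trellis actually drawn in Figure 12.34.
    for (int r = 0; r <= n; ++r)
        std::printf("depth %d: %zu reachable states\n", r, reachable[r].size());
    return 0;
}
```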

12.12.2 Cyclic Codes

An alternative formulation of a trellis is available for a cyclic code. Recall that a cyclic code can be encoded using a linear feedback shift register as a syndrome computer. The sequence of possible states in this encoder determines a trellis structure which can be used for decoding. We demonstrate the idea again using a (7, 4, 3) Hamming code, this time represented as a cyclic code with generator polynomial g(x) = x³ + x + 1.

Figure 12.34: The trellis of a (7, 4) Hamming code. (Sections are labeled by the columns h₁ = [111], h₂ = [110], h₃ = [101], h₄ = [011], h₅ = [100], h₆ = [010], h₇ = [001]; the states s₁, . . . , s₈ range over the eight possible partial syndromes 000–111.)

Figure 12.35 shows a systematic encoder. For the first k = 4 clock instants, switch 1 is closed (enabling feedback) and switch 2 is in position ‘a’. After the systematic part of the data has been clocked through, switch 1 is opened and switch 2 is moved to position ‘b’. The state contents then shift out as the coefficients of the remainder polynomial. Figure 12.36 shows the trellis associated with this encoder. For the first k = 4 bits, the trellis state depends upon the input bit. The coded output bit is equal to the input bit. For the last n − k = 3 bits, the next state is determined simply by shifting the current state. There are no input bits, so the output is equal to the bit that is shifted out of the registers.


Figure 12.35: A systematic encoder for a (7, 4, 3) Hamming code.

12.12.3 Trellis Decoding of Block Codes

Once a trellis for a code is established by either of the methods described above, the code can be decoded with a Viterbi algorithm. The time-varying structure of the trellis makes the indexing in the Viterbi algorithm perhaps somewhat more complicated, but the principles are the same. For example, if BPSK modulation is employed, so that the transmitted symbols are ai = 2ci − 1 ∈ {±1}, and that the channel is AWGN, the branch metric for a path taken with input xi is (ri − (2xi − 1))2 . Such soft-decision decoding can be shown to provide up to 2 dB of gain compared to hard decision decoding (see, for example, [[397], pp. 222–223]). However, this improvement does not come without a cost: for codes of any appreciable size, the number of states 2n−k can be so large that trellis-based decoding is infeasible.
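As an illustration, the sketch below runs a soft-decision Viterbi search over the syndrome trellis of the (7, 4) Hamming code of (12.53), using the squared Euclidean branch metric (r_i − (2x_i − 1))² for BPSK on an AWGN channel. The data layout and function name are ours; the point is only that the standard Viterbi recursion applies once the (time-varying) trellis states are taken to be partial syndromes.

```cpp
// Minimal sketch: soft-decision Viterbi decoding of the (7,4) Hamming code over its
// syndrome trellis.  States are partial syndromes 0..7; a valid path ends at syndrome 0.
#include <vector>
#include <array>
#include <limits>

std::array<int, 7> decodeHamming74(const std::array<double, 7>& r) {
    const int h[7] = {0b111, 0b110, 0b101, 0b011, 0b100, 0b010, 0b001}; // columns of H in (12.53)
    const double INF = std::numeric_limits<double>::infinity();
    std::vector<std::array<double, 8>> cost(8);      // cost[i][s]: best metric to syndrome s at depth i
    std::vector<std::array<int, 8>> prevBit(7), prevState(7);
    cost[0].fill(INF);
    cost[0][0] = 0.0;                                // start at syndrome 0
    for (int i = 0; i < 7; ++i) {
        cost[i + 1].fill(INF);
        for (int s = 0; s < 8; ++s) {
            if (cost[i][s] == INF) continue;
            for (int x = 0; x <= 1; ++x) {           // code bit x_i = 0 or 1
                int sNext = x ? (s ^ h[i]) : s;
                double a = 2.0 * x - 1.0;            // BPSK symbol
                double m = cost[i][s] + (r[i] - a) * (r[i] - a);
                if (m < cost[i + 1][sNext]) {
                    cost[i + 1][sNext] = m;
                    prevBit[i][sNext] = x;
                    prevState[i][sNext] = s;
                }
            }
        }
    }
    std::array<int, 7> xhat{};
    int s = 0;                                       // valid codewords end at syndrome 0
    for (int i = 6; i >= 0; --i) {
        xhat[i] = prevBit[i][s];
        s = prevState[i][s];
    }
    return xhat;                                     // maximum likelihood codeword estimate
}
```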

Figure 12.36: A trellis for a cyclically encoded (7,4,3) Hamming code. (During the first k = 4 sections the message is shifted in and the input bit equals the output bit; during the last n − k = 3 sections the parity is shifted out and there are no input bits.)

Programming Laboratory 9: Programming Convolutional Encoders

Objective

In this lab, you are to create a program structure to implement both polynomial and systematic rational convolutional encoders.

Background

Reading: Sections 12.1 and 12.2.

Since both polynomial and systematic rational encoders are “convolutional encoders,” they share many attributes. Furthermore, when we get to the decoding operations, it is convenient to employ one decoder which operates on data from either kind of encoder. As a result, it is structurally convenient to create a base class BinConv, then create two derived classes, BinConvFIR and BinConvIIR. Since the details of the encoding operation and the way the state is determined differ, each of these classes employs its own encoder function. To achieve this, a virtual function encode is declared in the base class, which is then realized separately in each derived class.6 Also, virtual member functions getstate and setstate are used for reading and setting the state of the encoder. These can be used for testing purposes; they are also used to build information tables that the decoder uses. The declaration for the BinConv.h base class is shown here.

Algorithm 12.4 Base Class for Binary Convolutional Encoder File: BinConv.h

6 This is a tradeoff between flexibility and speed. In operation, the virtual functions are called via a pointer, so there is a pointer-lookup overhead associated with them. This also means that virtually called functions cannot be inline, even if they are very small. However, most of the computational complexity associated with these codes is associated with the decoding operation, which takes advantage of precomputed operations. So for our purposes, the virtual function overhead is not too significant.
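The actual BinConv.h declaration is supplied with the book's software and is not reproduced here; the following is only a rough sketch of what such a base class might look like, to make the role of the virtual encode, getstate, and setstate functions concrete. All member names and signatures below are our guesses rather than the book's interface.

```cpp
// Hypothetical sketch of a base class for binary convolutional encoders.
// The real BinConv.h distributed with the text may differ in names and signatures;
// only the idea of virtual encode/getstate/setstate is illustrated here.
#include <vector>

class BinConv {
public:
    BinConv(int k_in, int n_out) : k(k_in), n(n_out) {}
    virtual ~BinConv() {}

    // Encode one block of k input bits, returning n output bits and
    // advancing the internal encoder state.
    virtual std::vector<unsigned char> encode(const std::vector<unsigned char>& in) = 0;

    // Read and set the encoder state (used for testing and for building
    // the state/next-state tables the decoder needs).
    virtual unsigned int getstate() const = 0;
    virtual void setstate(unsigned int state) = 0;

    int numInputs()  const { return k; }
    int numOutputs() const { return n; }

protected:
    int k;  // input bits per time step
    int n;  // output bits per time step
};
```

Derived classes such as BinConvFIR and BinConvIIR would then hold the generator description and implement encode for the feedforward and recursive cases, respectively.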


The derived classes BinConvFIR and BinConvIIR are outlined here.

Algorithm 12.5 Derived classes for FIR and IIR Encoders File: BinConvFIR.h BinConvIIR.h BinConvFIR.cc BinConvIIR.cc

Programming Part

1. Write a class BinConvFIR that implements convolutional encoding for a general polynomial encoder. That is, the generator matrix is of the form in (12.1), where each g(i,j)(x) is a polynomial. The class should have an appropriate constructor and destructor. The class should implement the virtual functions encode, getstate, and setstate, as outlined above. Test your encoder as follows:

   (a) Using the encoder with transfer function
   \[ G(x) = \begin{bmatrix} 1 + x^2 & 1 + x + x^2 \end{bmatrix}, \]
   verify that the impulse response is correct, that the getstate and nextstate functions work as expected, and that the state/nextstate table is correct. Use Figure 12.5. The program testconvenc.cc may be helpful.

   Algorithm 12.6 Test program for convolutional encoders File: testconvenc.cc

   (b) The polynomial transfer function
   \[ G(x) = \begin{bmatrix} x^2 + x + 1 & x^2 & 1 + x \\ x & 1 & 0 \end{bmatrix} \tag{12.54} \]
   has the state diagram and trellis shown in Figure 12.37. Verify that for this encoder the impulse response is correct, that the getstate and nextstate functions work as expected, and that the state/nextstate table is correct.

   Figure 12.37: (a) State diagram and (b) trellis for the encoder in (12.54).

2. Write a class BinConvIIR that implements a systematic encoder (possibly employing IIR filters). The generator matrix is of the form
   \[ G(x) = \left[\, I \;\middle|\; \begin{matrix} p_1(x)/q_1(x) \\ \vdots \\ p_k(x)/q_k(x) \end{matrix} \,\right] \]
   for polynomials p_i(x) and q_i(x). Test your class using the recursive systematic encoder of (12.2), checking as for the first case. (You may find it convenient to find the samples of the impulse response by long division.)

Programming Laboratory 10: Convolutional Decoders: The Viterbi Algorithm



Objective

You are to write a convolutional decoder class that decodes using both hard and soft metrics with the Viterbi algorithm.

Background

Reading: Section 12.3.

While there are a variety of ways that the Viterbi algorithm can be structured in C++, we recommend using a base class Convdec.h that implements the actual Viterbi algorithm and using a virtual function metric to compute the metric. This is used by derived classes BinConvdec01 (for binary 0–1 data) and BinConvdecBPSK (for BPSK-modulated data), where each derived class has its own metric function.

The base class Convdec

This class is a base class for all of the Viterbi-decoded objects.

Algorithm 12.7 The Base Decoder Class Declarations File: Convdec.h Convdec.cc

In this class, an object of type BinConv (which could be either an FIR or IIR convolutional encoder, if you have used the class specification in Lab 9) is passed in. The constructor builds appropriate data arrays for the Viterbi algorithm, placing them in the variables prevstate and inputfrom. A virtual member function metric is used by derived classes to compute the branch metric. The core of the algorithm is in the member function viterbi, which is called by the derived classes. Some other functions are declared:

• showpaths — You may find it helpful while debugging to dump out information about the paths. This function (which you write) should do this for you.

• getinpnow — This function decodes the last available branch in the set of paths stored, based on the best most recent metric. If adv is asserted, the pointer to the end of the branches is incremented. This can be used for dumping out the decisions when the end of the input stream is reached.

• buildprev — builds the state/previous state array, which indicates the connections between states of the trellis.

The derived class BinConvdec01

The first derived class is BinConvdec01.h, for binary 0–1 decoding using the Hamming distance as the branch metric.

Algorithm 12.8 Convolutional decoder for binary (0,1) data File: BinConvdec01.h BinConvdec01.cc

This class provides member data outputmat, which can be used for direct lookup of the output array given the state and the input. Since the output is, in general, a vector quantity, this is a three-dimensional array. It is recommended that space be allocated using CALLOCTENSOR defined in matalloc.h. The member variable data is used by the metric function, as shown. The class description is complete as shown here, except for the function buildoutputmat, which is part of the programming assignment.

The derived class BinConvdecBPSK

The next derived class is BinConvdecBPSK.h, for decoding BPSK-modulated convolutionally coded data using the Euclidean distance as the branch metric.

Algorithm 12.9 Convolutional decoder for BPSK data File: BinConvdecBPSK.h BinConvdecBPSK.cc

As for the other derived class, space is provided for outputmat and data; the class is complete as presented here except for the function buildoutputmat.
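To make the role of the virtual metric function concrete, here is a rough sketch of what the two derived-class branch metrics might look like. The member names described above (outputmat, data) are the book's, but the signatures and surrounding scaffolding here are our own guesses rather than the actual class interface.

```cpp
// Hypothetical sketch of the two branch metrics used by the derived decoder classes.
// BinConvdec01: Hamming distance between received hard bits and the branch output.
// BinConvdecBPSK: squared Euclidean distance between received values and BPSK symbols.
#include <vector>

// Hamming-distance metric on hard-decision (0/1) data.
double metric01(const std::vector<int>& branchOutput, const std::vector<int>& receivedBits) {
    double d = 0.0;
    for (std::size_t j = 0; j < branchOutput.size(); ++j)
        d += (branchOutput[j] != receivedBits[j]) ? 1.0 : 0.0;
    return d;
}

// Euclidean-distance metric on soft BPSK data, with output bit b mapped to 2b - 1.
double metricBPSK(const std::vector<int>& branchOutput, const std::vector<double>& received) {
    double d = 0.0;
    for (std::size_t j = 0; j < branchOutput.size(); ++j) {
        double a = 2.0 * branchOutput[j] - 1.0;
        d += (received[j] - a) * (received[j] - a);
    }
    return d;
}
```

In the class structure described above, each derived class would look up branchOutput in its outputmat table (indexed by state and input) and take the received data from its data member before evaluating the appropriate distance.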

Programming Part

1. Finish the functions in Convdec.cc.

2. Test the binary (0,1) BinConvdec01 decoder for the encoder G(x) = [1 + x², 1 + x + x²] by reproducing the results in Example 12.23. The program testconvdec can help.

Algorithm 12.10 Test the convolutional decoder File: testconvdec.cc

3. Test the convolutional decoder BinConvdecBPSK by modulating the {0, 1} data. Again, testconvdec can help.

4. Determine the performance of the encoder G(x) = [1 + x², 1 + x + x²] by producing an error curve on the AWGN channel using BPSK modulation with a soft metric. Compare the soft metric performance with hard metric performance, where the BSC is modeled as having crossover probability p_c = Q(√(2E_c∕N_0)). Compare the two kinds of coded performances with uncoded BPSK modulation. Also, plot the bound (12.41) and the approximation (12.43) on the same graph. How much coding gain is achieved? How do the simulation results compare with the theoretical bounds/approximations? How do the simulations compare with the theoretically predicted asymptotic coding gain? How much better is the soft-decoding than the hard-decoding?

5. Repeat the testing, but use the catastrophic code with encoder G(x) = [1 + x, 1 + x²]. How do the results for the noncatastrophic encoder compare with the results for the catastrophic encoder?

12.13 Exercises

12.1

For the R = 1∕2 convolutional encoder with
\[ G(x) = \begin{bmatrix} 1 + x^2 + x^3 & 1 + x + x^3 \end{bmatrix} \tag{12.55} \]
(a) Draw a hardware realization of the encoder.
(b) Determine the convolutional generator matrix G.
(c) For the input sequence 𝐦 = [1, 0, 1, 1, 0, 1, 1] determine the coded output sequence.
(d) Draw the state diagram. Label the branches of the state diagram with input/output values.
(e) Draw the trellis.
(f) What is the constraint length of the code?
(g) Determine the State/Next State table.
(h) Determine the State/Previous State table.
(i) Is this a catastrophic realization? Justify your answer.
(j) Determine the weight enumerator T(D, N).
(k) What is dfree?
(l) Determine upper and lower bounds on Pb for a BSC using (12.37) and (12.38) and an approximation using (12.31). Plot as a function of the signal-to-noise ratio, where p_c = Q(√(2E_c∕N_0)). Compare the bounds to uncoded performance.



(m) Determine upper and lower bounds on Pb for an AWGN channel using (12.41) and (12.42) and plot as a function of the signal-to-noise ratio.
(n) Determine the theoretical asymptotic coding gain for the BSC and AWGN channels. Compare with the results from the plots. Also, comment on the difference (in dB) between the hard and soft metrics.
(o) Express G(x) as a pair of octal numbers using both leading 0 and trailing 0 conventions.
(p) Suppose the output of a BSC is 𝐫 = [11, 11, 00, 01, 00, 00, 10, 10, 10, 11]. Draw the trellis for the Viterbi decoder and indicate the maximum likelihood path through the trellis. Determine the maximum likelihood estimate of the transmitted codeword and the message bits. According to this estimate, how many bits of 𝐫 are in error?

12.2

For the R = 1∕3 convolutional coder with [ ] G(x) = 1 + x 1 + x2 1 + x + x2

(a) (b) (c) (d) (e) (f) (g) (h) (i) (j) (k) (l)

12.3

12.4 12.5

12.6

12.7

Draw a hardware realization of the encoder. Determine the convolutional generator matrix G. For the input sequence 𝐦 = [1, 0, 1, 1, 0, 1, 1] determine the coded output sequence. Draw the state diagram. Label the branches of the state diagram with input/output values. Draw the trellis. What is the constraint length of the code? Determine the State/Next State table. Determine the State/Previous State table. Is this a catastrophic realization? Justify your answer. Determine the weight enumerator T(D, N). What is dfree ? Express G(x) as a triplet of octal numbers. [ ] Find a catastrophic encoder equivalent to G(x) = 1 + x2 1 + x + x2 and determine an infinite-weight message m(x) that results in a finite-weight codeword for this catastrophic encoder. Show that G4 (x) defined in (12.8) is equivalent to G2 (x) of (12.5). Let G(x) be the transfer function matrix of a basic convolutional code. Show that G(x) is equivalent to a basic transfer function matrix G′ (x) if and only if G′ (x) = T(x)G(x), where T(x) is a polynomial unimodular matrix. Determine a polynomial systematic encoder transfer function matrix equivalent to [ ] 1+x x 1 G(x) = . x2 1 1 + x + x2 For the transfer function matrices [ ] 1+x x 1 G(x) = x2 1 1 + x + x2

[ and

G′ (x) =

] 1+x x 1 1 + x2 + x3 1 + x + x2 + x3 0

(a) Show that G(x) is equivalent to G′ (x). (b) Show that G(x) is a minimal basic encoder matrix. (c) Show that G′ (x) is not a minimal basic encoder matrix.

540

Convolutional Codes (d) Using the procedure described in association with (12.13), determine a transfer function matrix G′′ (x) which is a minimal basic encoding matrix equivalent to G′ (x), but different from G(x). 12.8

For the code generated by

[ G(x) =

12.9

1 x 1 1+x+x2 1+x3 1+x3 x2 1 1 1+x3 1+x3 1+x3

] ,

use elementary row operations to convert the generator matrix to systematic form. Draw a circuit realization of the systematic encoder. Catastrophic codes. (a) For a rate R = 1∕2 code, let g1 (x) = 1 + x, g2 (x) = 1 + x2 . Show that when m(x) = the transmitted sequence has finite weight. Determine GCD(g1 (x), g2 (x)). (b) Motivated by this result, prove the following: For a rate 1∕n code, if

1 1+x

GCD[g1 (x), g2 (x), . . . , gn (x)] = 1, then the code is noncatastrophic. 12.10 For the catastrophic code with generators g1 (x) = 1 + x, g2 (x) = 1 + x2 : (a) (b) (c) (d)

Draw the state diagram. Determine the weight enumerator T(D, N) for the code. What is the minimum free distance of the code? How is the catastrophic nature of the code evidenced in the weight enumerator? [ ] 12.11 For the generator G(x) = 1 + x2 , 1 + x + x2 + x3 : (a) Find the GCD of the generator polynomials. (b) Find an infinite-weight message sequence that generates a codeword of finite weight. 12.12 Prove that dfree is independent of the encoder realization, so that it is a property of the code and not of a particular encoder for the code. 12.13 Show that the formal series expansion 1 = 1 + x + x2 + x3 + · · · 1−x is correct. Show the formal series expansions of 1 1 − 2D

and

D5 L3 N . 1 − DLN(1 + L)

12.14 Show that the expressions for can be by Pd < [4pc (1 − pc )]d∕2 . ( P)d in (12.20) and (12.21) ( bounded ) ∑ ∑d d∕2 Hint: Show that i=(d+1)∕2 di pic (1 − pc )d−i < di=(d+1)∕2 di pc (1 − pc )d∕2 . √ 12.15 For a BSC where Z = 4pc (1 − pc ), show that (12.28) can be replaced by Pd < Z d+1 when d is odd. Using this result, show that (12.29) can be replaced by Pe (j)
500 8 64 6.02 316 8 128 6.02 124 8 8 5.27 764 8 16 5.27 316 8 32 5.27 124 8 64 5.27 60

[? ] [? ] [? ] [454] [? ] [? ] [? ] [56] [56] [56] [56]

13.8 Multidimensional TCM Example: The V.34 Modem Standard In this section, we discuss the error-correction coding which is used for the V.34 modem standard. This modem is capable of transmitting up to 33.6 kB/second over the standard telephone system (on some lines). There are many technical aspects to this modem; space permits detailing only those related to the error-correction coding. A survey and pointers to the literature appears in [136]. However, we briefly summarize some of the aspects of the modem:



3

The modem is adaptive in the symbol rate it employs and the size of the constellation. It is capable of sending at symbol rates of 2400, 2743, 2800, 3000, 3200, or 3429 symbols/second. (These peculiar-looking choices are rational multiples of the basic rate of 2400 symbols/second.)

A very clear derivation of this appears in [180].

13.8 Multidimensional TCM Example: The V.34 Modem Standard

579

The symbol rate is selected by a line probing sequence employed during initialization which determines the available bandwidth.



The modem is capable of transmitting variable numbers of bits per symbol. At the highest rate, 8.4 bits/symbols are carried. Rate is established at link initialization, and rate adjustments can occur during data transmission.



Adaptive precoding [113, 139, 191, 450] and decision feedback equalization is also employed to compensate for channel dispersion.



Shaping via shell mapping is employed, which provides modest gains in addition to the coding gains.



Adaptive trellis coding is employed. The main code is a 16-state four-dimensional trellis code (to be described below). This code provides 4.66 dB of gain and is rotationally invariant. However, two other trellis codes are also included: a 32-state four-dimensional code with 4.5 dB of gain and a 64-state four- dimensional code providing 4.7 dB of gain. These codes are not described here.

Given the complexity of the modem, it is a technological marvel that they are so readily affordable and effective! The trellis encoder for the modem is shown in Figure 13.24, with the corresponding trellis diagram in Figure 13.25.

I 7n+1

Z 7n +1

I 6n+1

Z 6n +1

I 5n+1

Z 5n +1

I 4n+1

Z 4n +1

I 3n+1

Z 3n +1

I 2n+1

4D block encoder

I 1n+1

(Table 13.10)

Z 2n +1 Z 3n Z 2n

I 7n

Z 7n

I 6n

Z 6n

I 5n

Z 5n

I 4n I 3n I 2n I 1n

Z 4n Differential encoder

I 3′n

Z 1n +1

I 2′n

W 2n + 2

W 3n+2

2D

+ W 2n

2D

W 4n + 2

W 1n + 2

2D

+

2D

Bit converter

Z 0n +1

(Table13.9)

Z 0n

Y 0n

Z 1n

W 4n

W 3n

Figure 13.24: 16-State trellis encoder for use with V.34 standard [? ].

The bit converter in the encoder supports the rotational invariance and is outlined in Table 13.9. The 4D block converter supports the shell shaping, controlling the selection of “inner” and “outer” constellation points. The operation is detailed below.

580

Trellis-Coded Modulation Current state: W 1n W2n W3n W4n (LSB = W4)

4D subset

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

0213 4657 2031 6475 1302 5746 3120 7654 2031 6475 0213 4657 3120 7564 1302 5746

Next state: W 1n+2 W 2n+2 W 3n+2 W 4n+2

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Figure 13.25: Trellis diagram of V.34 encoder [? ]. Table 13.9: Bit Converter: Sublattice Partition of 4D Rectangular Lattice [? ] 4D Sublattice (Subset) 0 1 2 3 4 5 6 7

Y0n

I1n

I2n′

I3n′

4D Types

Z0n

Z1n

Z0n+1

Z1n+1

0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1

0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1

0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

(A, A) (B, B) (C, C) (D, D) (A, B) (B, A) (C, D) (D, C) (A, C) (B, D) (C, B) (D, A) (A, D) (B, C) (C, A) (D, B)

0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 0 1 1 0 0 1 1 1 1 0 0 1 1 0 0

0 1 0 1 1 0 1 0 0 1 1 0 1 0 0 1

To transmit 𝜂 information bits per signaling interval using 2N = 4-dimensional modulation, N = 2 signaling intervals are required. For uncoded transmission, 2𝜂N points in the signal constellation are necessary. For coded transmission, 2𝜂N+1 points are necessary. The V.34 standard carries 𝜂 = 7 bits per symbol, so 215 points in the signal constellation are necessary. The V.34 standard does this using

13.8 Multidimensional TCM Example: The V.34 Modem Standard

581

an interesting constellation. The four-dimensional constellation consists of the cross product of the 192-point constellation shown in Figure 13.26. The 192-point constellation contains a 128-point cross constellation in the inner points, plus an additional 64 outer points. The inner points can be used to transmit seven uncoded bits per symbol. The outer points are selected as close to the origin as possible (outside of the inner constellation) to minimize energy. Each of the A, B, C, and D sets has the same number of points. Also, a rotation of an outer point yields another outer point.

c

yn or yn + 1 c b

b

b

a d 101110 c b

14 a d a d 100110 100010 c b c b

a d 101010 c b

a d 100100 c b

a d 011110 c b

10 a d a d 010110 8 010010 c b c b

a d 011010 c b

a d 100000 c b

b

a d 010100 c b

a d 001110 c b

6 a d a d 000110 4 000010 c b c b

a d 001010 c b

a d 010000 c b

a 100101 c b

a d 101100 c 14 b

a d 011100 c 10 b

a d 001100 8 c 6 b

a d 000100 4 c 2 b

a d a d a d 000000 001000 011000 c 2 b 4 c 6 b 8 c 10 b

a d 101000 c 14 b

c

b d c

2

2

c

b

c

a d 101101 b

a d 011101 c b

a d 001101 c b

a d a d 000101 4 000001 c b c b

a d 001001 c b

a d 011011 c b

a d 101001 c

d

a d 010101 c b

a d 001111 c b

6 a d a d 000111 8 000011 c b c b

a d 001011 c b

a d 010001 c b

a 100001

a d 101111

a d 011111 c b

10 a d a d 010111 010011 c b c b

a d 011011 c b

a d 101011

14 a d a d 100111 100011

xn or xn + 1

d0

Number beneath each point: Z 2n+i Z 3n +i Z 4n+i Z 5n+i Z 6n+ i Z 7n+i (i = 0 or 1)

Figure 13.26: The 192-point two-dimensional constellation employed in the V.34 standard. The 215 points in the constellation are obtained by concatenating a pair of 192-point constellations (which would result in a 1922 point constellation, but 1922 > 215 ), excluding those 4D points whose corresponding pair of two-dimensional points are both outer points. There are thus 1922 − 642 = 215 points in this constellation. The inner points are used three-fourths of the time. By using the inner constellation more often, the average power is reduced compared to other (more straightforward) constellations. The average power of the constellation can be shown to be 28.0625d02 , where d02 is the minimum squared Euclidean distance (MSED) of the constellation. The peak power (which is also the peak power of the inner constellation) is 60.5d02 .

582

Trellis-Coded Modulation The partition of the constellation proceeds through a sequence of steps which are illustrated in Figure 13.27.

Λ (2D constellation) 2

A∪B

(MSED = 2d0) 2

(MSED = 4d0)

A

Λ (2D constellation)

C∪D B

C

(MSED = 4d20) (A, A) (A, B) (A, C)

A∪B D

A

2D families

C∪D B

C

D

2D sublattices

(D, D) 4D types

(B, B)

(A, A), (B, B)(C, C), (D, D)(A, B), (B, A)(C, D), (D, C)(A, C), (B, D)(C, B), (D, A)(A, D), (B, C)(C, A), (D, B) 0 1 2 3 4 5 6 7

4D sublattices

2

(MSED = 4d0) 2

(MSED = 2d0)

{0, 1, 2, 3}

{4, 5, 6, 7}

4D families

Figure 13.27: Partition steps for the V.34 signal constellation.



Each constituent two-dimensional rectangular lattice is partitioned into two families A ∪ B and C ∪ D, where the sublattice A is composed of those points in the constellation of Figure 13.26 labeled with the letter ‘a,’ and similarly for B, C, and D. The MSED between these two families is 2d02 .



The two-dimensional families are further partitioned into four sublattices A, B, C, and D, with MSED 4d02 . The sublattices have the property that under a 90∘ counterclockwise rotation, sublattice A rotates to sublattice D. Collectively the sublattices rotate as A → D → B → C → A.

(13.11)



Sixteen four-dimensional types are defined by concatenating all pairs of two-dimensional sublattices. These types are (A, A), (A, B), . . . , (D, D). The MSED between the types is 4d02 , the same as for the sublattices, since two of the same sublattices can be used in a type.



The 16 types are grouped into 8 four-dimensional sublattices, denoted by 0, 1, . . . , 7, as denoted in Figure 13.27 and Table 13.9. The MSED between these sublattices is still 4d02 , which may be verified as follows. The two first constituent two-dimensional sublattices in each four-dimensional sublattice are in A ∪ B or C ∪ D, and likewise for the second two two-dimensional sublattices. Each of these thus have the minimum squared distance of the two-dimensional families, 2d02 . Since there are two independent two-dimensional components, the MSED is 2d02 + 2d02 . It can be verified that the four-dimensional sublattices are invariant under 180∘ rotation. ⋃ ⋃ The eight sublattices are further grouped into two four-dimensional families 3i=0 i and 7i=4 i, with MSED 2d02 .



Combining the trellis of Figure 13.25 with the decomposition of Figure 13.27, the assignments of Table 13.9 satisfy the following requirements:

13.8 Multidimensional TCM Example: The V.34 Modem Standard

d

a

d

000110

b

c

b

d

a

d

d 4 2

000100

4 c

2 b

d

a

d

000101

c

d 001010

c

b

c

a

d

a

000000

b

b

a

000010

b

c 2 4

2

a

001000

b d

4

Number beneath each point: Z 2n+i Z 3n+i Z 4n+i Z 5n+i Z 6n+i Z 7n+i (i = 0 or 1)

a

000001

c

b

Figure 13.28: Orbits of some of the points under rotation: all points in an orbit are assigned the same bit pattern.



The 4D sublattices associated with a transition from a state or to a state are all different, but all ⋃ ⋃ belong to the same family 3i=0 i or 7i=4 i.



The MSED between any two allowed sequences in the trellis is greater than 4d02 . In combination with the first requirement, this means that the free distance of the code is established by the MSED of each 4D sublattice, 4d02 . Compared to uncoded 128-point cross signal constellation with average energy 20.5d02 , the asymptotic coding gain is ( ) 4d02 d02 = 4.66 dB. ∕ 10 log10 28.0625d02 20.5d02



The assignment makes a rotationally invariant code (using just a linear trellis encoder). For a valid transition in the trellis from a state i to a state j, let X be the four-dimensional subset associated with the transition. Let Y be the four-dimensional subset that is obtained by rotating X by 90∘ . Then Y is associated with the valid transition from the state F(i) to the next state F(j) for some function F. For this particular code, the function is: F ∶ W1p W2p W3p W4p → W1p W2p W3p W4p , where the overbar denotes binary complementation. In combination with the fact that the sublattices are invariant with respect to 180∘ rotations, this makes the code rotationally invariant with respect to multiples of 90∘ rotations.

The encoding operation is now summarized. Fourteen bits, representing the information for two symbol periods, are presented to the encoder. These fourteen input bits are denoted by (I1n , . . . , I7n ) and (I1n+1 , . . . , I7n+1 ), where the subscript n denotes bits or symbols associated with even-numbered symbols and n + 1 denotes bits or symbols associated with odd-numbered symbols. We refer to the pair of symbol intervals at time n and time n + 1 as a coding epoch. From these 14 bits, two symbols in the coding epoch are selected according to the following steps:



The encoded bits Y0n , I1n , and I2′n select one of the eight four-dimensional sublattices. Then the nontrellis-encoded information bit I3′n selects one of the two four-dimensional types within the sublattice. This is done in such a way that the system is transparent to phase ambiguities of multiples of 90∘ . To see this, consider a bit pattern Y0n I1n I2′n I3′n and let X denote the associated

583

584

Trellis-Coded Modulation 4D type from Table 13.9. Let S21 S31 , S22 S32 , S23 S33 denote the bit pairs obtained when the bit pair I2′n I3′n is advanced in the circular sequence 00 → 11 → 01 → 10 → 00.

(13.12)

Let X1 , X2 , and X3 denote the types obtained by rotating X counterclockwise by successive multiples of 90∘ . Then the 4D types associated with the bit patterns Y0n I1n S21 S31 , Y0n I1n S22 S32 , and Y0n I1n S23 S33 are X1 , X2 , and X3 , respectively. Example 13.14 Let Y0n I1n I2′n I3′n = 0010. Then (from Table 13.9) the 4D type transmitted is X = (C, C). When this is rotated 90∘ , the type is (see (13.11)) (A, A), corresponding to a transmitted sequence Y0n I1n I2′n I3′n = 0000. Note that the last 2 bits correspond to the succession of (13.12). ◽

To obtain rotational invariance, the bits I2n I3′n are obtained as the output of a differential encoder, just as for the V.32 and V.33 standards presented in Section 13.6. The pair (I3n , I2n ) is converted to a number modulo 4 (I2n is the LSB), and the differential representation (13.6) is employed.



The 4D block encoder serves the purpose of selecting points in the inner and outer constellation, ensuring that the outer constellation is not used for both the symbols in the coding epoch. The 4D block encoder takes the input bits I1n+1 , I2n+1 and I3n+1 and generates two pairs of output bits (Z2n , Z3n ) and (Z2n+1 , Z3n+1 ) according to Table 13.10. The outputs (00) and (01) correspond to points in the inner constellation and the output 10 corresponds to points in the outer constellation. (See the labeling in Figure 13.26.) Each bit pair can be 00, 01, or 10, but they cannot both be 10. Table 13.10: 4D Block Encoder [? ] I1n+1

I2n+1

I3n+1

Z2n

Z3n

Z2n+1

Z3n+1

0 0 0 0 1 1 1 1

0 0 1 1 0 0 1 1

0 1 0 1 0 1 0 1

0 0 0 0 1 1 0 0

0 0 0 1 0 0 1 1

0 0 1 1 0 0 0 0

0 1 0 0 0 1 0 1



There are 16 points in the outer group of a 2D lattice (such as A) or in either half of the inner part of a 2D subset. These sixteen points are indexed by the bits Z4p Z5p Z6p Z7p , where p = n or n + 1.



The bits Z2p Z3p Z4p Z5p Z6p Z7p (p = n or n + 1) select from a set of four points in the signal constellation. To ensure rotational invariance, the four rotations of a point are all assigned the same set of bits. Figure 13.28 shows the “orbits” of some of the points under rotation. The a, b, c, and d points in the orbit are all assigned the same label Z2p Z3p Z4p Z5p Z6p Z7p ; then one of the points in the orbit is selected by Z1p Z0p . Since the bits Z2p Z3p Z4p Z5p Z6p Z7p are rotationally invariant by labeling, and the bits Z1p Z0p are invariant by differential encoding, the overall code is rotationally invariant.

The bits Z0p Z1p . . . Z7p (p = n or n + 1) are used to select two points in the signal constellation, corresponding to the two symbols sent in the coding epoch.

13.9 Exercises

585

13.9 Exercises 13.1 Verify the energy per symbol Es and the energy per bit Eb for BPSK signaling from Table 13.1. Repeat for 16-QAM and 32-cross signaling. 13.2 For each of the following signal constellations, determine a signal partition. Compute the minimum distance between signal points at each level of the tree.

d0 = 1

−7

−5

−3

−1

1

3

5

A0 = 8-QAM 1

7

2

13.3 [483] The simple TCM encoder shown here c1

m0

D

c0

Select subset and point

Signal point

is used with the 8-PAM signal constellation shown in Exercise 13.2. (a) Determine the trellis for the encoder. (b) Determine a signal partitioning scheme which transmits 2 bits/symbol. (c) Determine the squared minimum free distance for the coded system. (d) Compute the asymptotic coding gain in dB for this system compared with an uncoded 4-PAM system. (e) Determine the output transition matrix B(x) for this code and determine the components D(x), P(x), and M(x). 13.4 For the encoder shown here m2

m1

c2 + D

D

c1 c0

Select subset and point

Signal point

employed with an 8-PSK constellation partitioned as in Figure 13.4: (a) Draw the trellis for the convolutional encoder. (b) Draw the trellis for the trellis encoder, labeling the state transitions with the subsets from the constellation. (c) Determine the minimum free distance between paths which deviate from the all-zero path, both for non-parallel paths and for parallel paths. Determine the minimum free distance for the code. Assume Es = 1. (d) Determine the coding gain of the system, compared with 4-PSK transmission. 13.5 The sequence of data (m2,t , m1,t ) consisting of the pairs (0, 0), (1, 0), (1, 1), (1, 1), (0, 0), (0, 1), (0, 1) is applied to the encoder of Figure 13.16.

586

Trellis-Coded Modulation (a) Determine the differentially encoded sequence (m ̃ 2,t , m ̃ 1,t ). Assume that the differential encoder starts with previous input 0. (b) The sequence of inputs (m4,t , m3,t , m2,t , m1,t ) consisting of the 4-tuples (0, 1, 0, 0), (1, 1, 1, 0), (0, 1, 1, 1), (1, 0, 1, 1), (0, 0, 0, 0), (1, 1, 0, 1), (1, 0, 0, 1) is presented to the encoder of Figure 13.16. Determine the sequence of output subsets and plot the corresponding path on the trellis, as in Example 13.6. Assume the encoder starts in state 0. (c) Now take this sequence of output signals and rotate them by 𝜋∕2. Determine the sequence of received signal points. (d) Determine the state sequence decoded at the receiver. Assume the decoder is able to determine that state 7 is the starting state. (e) Determine the sequence of input bits corresponding to this decoded sequence. (f) Run the input bits through a differential decoder and verify that the decoded bits match the original sequence of transmitted bits. 13.6 The signal constellation below has 16 points, with the points on a hexagonal lattice with minimum distance between points equal to 1. Adjust the center location and the lattice points so that the constellation has minimum average signal energy. Compute the average signal energy Es . Compare this average signal energy with a 16-QAM constellation having equal minimum distance. How much energy advantage is there for the hexagonal lattice (in dB)? What practical disadvantages might the hexagonal constellation have?

Programming Laboratory 11: Trellis-Coded Modulation Encoding and Decoding

Programming Part 1) Construct an encoder to implement the trellis code for a four-state trellis with an 8-PSK signal constellation. Verify that it works as expected. 2) Construct a Viterbi decoder for the trellis code. Verify that it works as expected.

Objective In this laboratory, you will create an encoder and decoder for a particular TCM code. Background

3) Make a plot of P(e) as a function of SNR for the code. Compare P(e) with P(e) for uncoded 4-PSK. Plot the theoretical P(e). Is the theoretical coding gain actually achieved?

Reading: Section 13.3.

13.10 References The idea of combining coding and modulation can be traced at least back to 1974 [298]. It was developed into a mature technique by Ungerböck [456–459]. Important theoretical foundations were

13.10 Exercises later laid by Forney [132–134, 140]. Rotationally invariant codes are described in [475, 476]. Additional work in this area appears in [339, 341, 478]. It was also mentioned in [454]. Rotational invariance also appears in [29, 451] Our bound on the performance in Section 13.4 follows [397] very closely. See also [500]. A random coding bound is also presented there. Other analyses of the code performance appear in [40]. A thorough treatment of TCM appears in [272]. Theoretical foundations of coset codes appear in [130, 131]. TCM using lattices is described in [56]. An example of multidimensional TCM using an eight-dimensional lattice is described in [55]. Issues related to packing points in higher dimensional spaces and codes on lattices are addressed in [77–79, 81, 82]. The definitive reference related to sphere packings is [79]. The sphere-packing advantage as applied to data compression (vector quantization) is described in [278]. A trellis code in six dimensions based on the E6 lattice is described in [313]. Lattices also have use in some computer algebra and cryptographic systems; in this context an important problem is finding the shortest vector in the lattice. For discussions and references, see [472, Chapter 16]. A concise but effective summary of lattice coding is presented in [11]. Extensive design results appear in [272, 344, 345].

587

Part IV

Iteratively Decoded Codes

Chapter 14

Turbo Codes “Tell me how you decode and I’ll be able to understand the code.” When you have no particular gift for algebra, . . . then think about the decoding side before the encoding one. Indeed, for those who are more comfortable with physics than with mathematics, decoding algorithms are more accessible than coding constructions, and help to understand them. — Claude Berrou [37]

14.1 Introduction Shannon’s channel coding theorem implies strong coding behavior for random codes as the code block length increases, but increasing block length typically implies an exponentially increasing decoding complexity. Sequences of codes with sufficient structure to be easily decoded as the length increases were, until fairly recently, not sufficiently strong to approach the limits implied by Shannon’s theorem. However, in 1993, an approach to error-correction coding was introduced which provided for very long codewords with only (relatively) modest decoding complexity. These codes were termed turbo codes by their inventors [38, 39]. They have also been termed parallel concatenated codes [194, 397]. Because the decoding complexity is relatively small for the dimension of the code, very long codes are possible, so that the bounds of Shannon’s channel coding theorem become, for all practical purposes, achievable. Codes which can operate within a fraction of a dB of channel capacity are now possible. Since their announcement, turbo codes have generated considerable research enthusiasm leading to a variety of variations, such as turbo decoding of block codes and combined turbo decoding and equalization, which are introduced in this chapter. Actually, the turbo coding idea goes back somewhat earlier than the original turbo code announcement; the work of [276] and [277] also present the idea of parallel concatenated coding and iterative decoding algorithms. The turbo code encoder consists of two (or more) systematic block codes which share message data via permuters (sometimes referred to as interleavers). In its most conventional realization, the codes are obtained from recursive systematic convolutional (RSC) codes — but other codes can be used as well. A key development in turbo codes is the iterative decoding algorithm. In the iterative decoding algorithm, decoders for each constituent encoder take turns operating on the received data. Each decoder produces an estimate of the probabilities of the transmitted symbols. The decoders are thus soft output decoders. Probabilities of the symbols from one encoder known as extrinsic probabilities are passed to the other decoder (in the symbol order appropriate for the encoder), where they are used as prior probabilities for the other decoder. The decoder thus passes probabilities back and forth between the decoders, with each decoder combining the evidence it receives from the incoming prior probabilities with the parity information provided by the code. After some number of iterations, the decoder converges to an estimate of the transmitted codeword. Since the output of one decoder is fed to the input of the next decoder, the decoding algorithm is called a turbo decoder: it is reminiscent of turbo charging an automobile engine using engine-heated air at the air intake. Thus, it is not really the code which is “turbo,” but rather the decoding algorithm which is “turbo.” As an example of what turbo codes can achieve, Figure 14.1 shows the performance of a turbo code employing two RSC encoders with parity-producing transfer functions G(x) =

1 + x4 1 + x + x2 + x3 + x4

Error Correction Coding: Mathematical Methods and Algorithms, Second Edition. Todd K. Moon © 2021 John Wiley & Sons, Inc. Published 2021 by John Wiley & Sons, Inc. Companion website: www.wiley.com/go/Moon/ErrorCorrectionCoding

(14.1)

592

Turbo Codes 100

10−1

2

ra

tio

0

s

10−7

on

Rate 1/2 capacity bound

10−6

ati

10−5

ns

ter

s ation 6 iter

10−4

10−8

ite

3i

10−3

tions 10 itera s 18 iteration

Probability of bit error

ation

Waterfall or cliff region

10−2

Uncoded

1 iter

Error floor region

dfree boun

d

0.5

1

1.5 Eb/N0 (dB)

2

2.5

Figure 14.1: Decoding results for a (37, 21, 65536) code. in a rate R = 1∕2 turbo code (i.e., it is punctured) with block length N = 65536 and permuter (with the permutation selected essentially at random). (The numerator and denominator polynomials are represented using the octal numbers 21 = 10 001 and 37 = 11 111, respectively, so this code is sometimes referred to as a (37, 21, 65536) code.) The decoding performance for up to 18 decoding iterations is shown. Beyond 18 iterations, little additional coding gain is achieved. (These results were obtained by counting up to 100 bits in error.) We note that with 18 iterations of decoding, performance within about 0.5 dB of the BAWGN capacity limit is achieved by this code, at least for SNRs up to about 0.6 dB. However, an interesting phenomenon is observed at higher SNRs: while the decoding is still good, it fails to improve as dramatically as a function of SNR. At a certain SNR, the error curves nearly level off, so the improvement with increasing SNR is very modest. This phenomenon is referred to as the error floor and is discussed in Section 14.4. Briefly, it is due to the presence of low-weight codewords in the code. A bound due to the free distance of the convolutional coders is also shown in the plot, which indicates the slope of the error floor. The portion of the plot where the error plot drops steeply down as a function of SNR is referred to as the waterfall or cliff region. In this chapter, we discuss the structure of the encoder, present various algorithms for decoding, and provide some indication of the structure of the codes that leads to their good performance and the error floor. We also introduce the idea of turbo equalization and the concept of EXIT analysis for the study of the convergence of the decoding algorithm.

14.2 Encoding Parallel Concatenated Codes

593

14.2 Encoding Parallel Concatenated Codes The conventional arrangement for the (unpunctured) turbo encoder is shown in Figure 14.2. It consists of two transfer functions representing the non-systematic components of RSC encoders called the constituent encoders, and a permuter, which permutes the input symbols prior to input to the second constituent encoder. (It is also possible to use more than two encoder blocks [98], but the principles remain the same, so for the sake of specific notation we restrict attention here to only two constituent encoders.) As discussed in Chapter 12, systematic convolutional codes typically work best when the encoder is a feedback (IIR) encoder, so the transfer function of each convolutional encoder is the rational function h(x) . G(x) = g(x) Strictly speaking, there is no reason that both constituent transfer functions must be the same. However, it is conventional to use the same transfer function in each branch; research to date has not provided any reason to do otherwise. vt(0)

xt

Recursive systematic convolutional encoder

vt(1)

Recursive systematic convolutional encoder

vt(2)

Permuter Π

xʹ = Π(x)

Figure 14.2: Block diagram of a turbo encoder. A block of input symbols 𝐱 = {x0 , x1 , . . . , xN−1 } is presented to the encoder, where each xi is in some alphabet  with || elements in it. These input symbols may include an appended zero-state forcing sequence, as in Figure 14.6, or it may simply be a message sequence, 𝐱 = 𝐦 = {m0 , m1 , . . . , mN−1 }. In the encoder, the input sequence 𝐱 is used three ways. First, it is copied directly to the output to produce the systematic output sequence 𝑣t(0) = xt , t = 0, 1, . . . , N − 1. Second, the input sequence runs through the first RSC encoder with transfer function G(x), resulting in a parity sequence (1) {𝑣0(1) , 𝑣2(1) , . . . , 𝑣N−1 }. The combination of the sequence {𝑣t(0) } and the sequence {𝑣t(1) } results in a rate R = 1∕2 (neglecting the length of the zero-forcing tail, if any) systematically encoded convolutionally encoded sequence. Third, the sequence 𝐱 is also passed through a permuter or interleaver of length N, denoted by Π, which produces the permuted output sequence 𝐱′ = Π(𝐱). The sequence 𝐱′ is passed through another convolutional encoder with transfer function G(x) which produces the output sequence (2) 𝐯(2) = {𝑣0(2) , 𝑣(2) , . . . , 𝑣N−1 }. The three output sequences are multiplexed together to form the output 1 sequence (0) (1) (2) 𝐯 = {(𝑣(0) , 𝑣0(1) , 𝑣0(2) ), (𝑣(0) , 𝑣(1) , 𝑣1(2) ), . . . , (𝑣N−1 , 𝑣N−1 , 𝑣N−1 )}, 0 1 1 resulting in an overall rate R = 1∕3 linear, systematic, block code. The code has two sets of parity information, 𝐯(1) and 𝐯(2) which, because of the interleaving, are fairly independent. In an ideal setting, the sets of parity bits would be exactly independent.

594

Turbo Codes Frequently, in order to obtain higher rates, the filter outputs are punctured before multiplexing, as shown in Figure 14.3. Puncturing operates only on the parity sequences — the systematic bits are not punctured. The puncturing is frequently represented by a matrix, such as [ ] 1 0 P= . 0 1 The first column indicates which bits are output at the even output instants and the second column indicates which bits are output at the odd output instants. For example, this puncture matrix alternately selects the outputs of the encoding filters. vt(0) Recursive systematic convolutional encoder

xt

v∼t(1)

Permuter Π

vt(1)

Puncture (optional) Recursive systematic convolutional encoder

xʹ = Π(x)

v∼t(2)

Figure 14.3: Block diagram of a turbo encoder with puncturing. 1 Example 14.1 Consider the transfer function G(x) = 1+x incorporated in the turbo encoder of Figure 14.4(a), 2 with the trellis stage shown in Figure 14.4(b). Let the permuter be described by

Π = {8, 3, 7, 6, 9, 0, 2, 5, 1, 4}. Then, for example, x0′ = x8 , x1′ = x3 , etc. Let the input sequence be 𝐱 = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1] = 𝐯(0) .

xt

vt(0) D

D

vt(1) 0 1

Permute {8, 3, 7, 6, 9, 0, 2, 5, 1, 4}

2

xʹt

0/0 1/0

1/1 0/1

0/0 1/1 1/0

3 D

D

vt(2)

(a)

Figure 14.4: Example turbo encoder with G(x) = 1∕(1 + x2 ).

0/1

(b)

14.3 Turbo Decoding Algorithms

595

Then the output of the first encoder is 𝐯(1) = [1, 1, 1, 1, 0, 1, 1, 1, 0, 0],

(14.2)

and the first encoder happens to be left in state 0 at the end of this sequence. The permuted bit sequence is 𝐱′ = [1, 0, 0, 1, 1, 1, 0, 0, 1, 1] and the output of the second encoder is 𝐯(2) = [1, 0, 1, 1, 0, 0, 0, 0, 1, 1];

(14.3)

the second encoder is left in state 3. When the 3 bit streams are multiplexed together, the bit stream is 𝐯 = [1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1]. If the encoded bits are punctured, the underlined parity bits of (14.2) and (14.3) are retained. The resulting rate R = 1∕2 encoded bit sequence is 𝐯 = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1].



It should be pointed out that there are also serially concatenated codes with iterative decoders. One such code is the repeat accumulate (RA) code, which is introduced in Section 16.2.

14.3 Turbo Decoding Algorithms The multiplexed and encoded data 𝐯 are modulated and transmitted through a channel, whose output is the received vector 𝐫. The received data vector 𝐫 is demultiplexed into the vectors 𝐫 (0) (corresponding to 𝐯(0) ), 𝐫 (1) (corresponding to 𝐯(1) ), and 𝐫 (2) (corresponding to 𝐯(2) ). The general operation of the turbo decoding algorithm is as follows, as summarized in Figure 14.5. The data (𝐫 (0) , 𝐫 (1) ) associated with the first encoder are fed to Decoder I. This decoder initially uses uniform priors on the transmitted bits and produces probabilities of the bits conditioned on the observed data. These probabilities are called the extrinsic probabilities, as described below. The output probabilities of Decoder I are permuted and passed to Decoder II, where they are used as “prior” probabilities in the decoder, along with the data associated with the second encoder, which is 𝐫 (0) (permuted) and 𝐫 (2) . The extrinsic output probabilities of Decoder II are depermuted and passed back to become prior probabilities to Decoder I. The process of passing probability information back and forth continues until the decoder determines (somehow) that the process has converged, or until some maximum number of iterations is reached. “Priors"

r(0) r(1) r(0)

De-permuter Π–1

Decoder I

Extrinsic information

Permuter Π

Extrinsic information “Priors" Decoder II

Permuter Π

r(2)

Figure 14.5: Block diagram of a turbo decoder.

De-permuter Π–1

xˆt

596

Turbo Codes The heart of the decoding algorithm is a soft-decision decoding algorithm which provides estimates of the posterior probabilities of each input bit. The algorithm most commonly used for the soft-decision decoding algorithm is the MAP algorithm, also commonly known as the BCJR algorithm. In Section 14.3.1, we describe this algorithm for the case of a general convolutional code. Then in Section 14.3.10, we describe modifications to the algorithm that apply to systematic codes, which sets the stage for the iterative turbo decoding algorithm. The MAP algorithm can also be expressed in a log-likelihood setting, as described in Section 14.3.12. A lower-complexity implementation of the MAP algorithm is discussed in Section 14.3.15. Another decoding algorithm, called the soft-output Viterbi algorithm (SOVA) is described in Section 12.11, which has even lower computational complexity (but slightly worse performance). 14.3.1

The MAP Decoding Algorithm

The maximum a posteriori (MAP) decoding algorithm suitable for estimating bit and/or state probabilities for a finite-state Markov system is frequently referred to as the BCJR algorithm, after Bahl, Cock, Jelenik, and Raviv who proposed it originally in [20]. The BCJR algorithm computes the posterior probability of symbols from Markov sources transmitted through discrete memoryless channels. Since the output of a convolutional coder passed through a memoryless channel (such as an AWGN channel or a BSC) forms a Markov source, the BCJR algorithm can be used for maximum a posteriori probability decoding of convolutional codes. In many respects, the BCJR algorithm is similar to the Viterbi algorithm. However, the Viterbi algorithm computes hard decisions — even if it is employing soft branch metrics — since a single path is selected to each state at each time. This result in an overall decision on an entire sequence of bits (or codeword) at the end of the algorithm, and there is no way of determining the reliability of the decoder decisions on the individual bits. Furthermore, the branch metric is based upon log-likelihood values; no prior information is incorporated into the decoding process. The BCJR algorithm, on the other hand, computes soft outputs in the form of posterior probabilities for each of the message bits. While the Viterbi algorithm produces the maximum likelihood message sequence (or codeword), given the observed data, the BCJR algorithm produces the a posteriori most likely sequence of message bits, given the observed data. (Interestingly, the sequence of bits produced by the MAP algorithm may not actually correspond to a continuous path through the trellis.) In terms of actual performance on convolutional codes, the distinction between the Viterbi algorithm and the BCJR algorithm is frequently insignificant, since the performance of the BCJR algorithm is usually comparable to that of the Viterbi algorithm and any incremental improvement offered by the BCJR algorithm is offset by its higher computational complexity. However, there are instances where the probabilities produced by the BCJR are important. For example, the probabilities can be used to estimate the reliability of decisions about the bits. This capability is exploited in the decoding algorithms of turbo codes. As a result, the BCJR algorithm lies at the heart of most turbo decoding algorithms. We first express the decoding algorithm in terms of probabilities then, in Section 14.3.12, we present analogous results for likelihood ratio decoding. The probabilistic description is more general, being applicable to the case of nonbinary alphabets. However, it also requires particular care with normalization. Furthermore, there are approximations that can be made in association with the likelihood ratio formulation that can reduce the computational burden somewhat. 14.3.2

Notation

We present the BCJR algorithm here in the context of a R = k∕n convolutional coder. Consider the block diagram of Figure 14.6. The encoder accepts message symbols mi coming from an alphabet , . . . , m(k−1) ]. It  — most frequently,  = {0, 1} — which are grouped into k-tuples 𝐦i = [m(0) i i is frequently convenient to employ convolutional encoders which terminate in a known state. To accomplish this, Figure 14.6 portrays the input sequence 𝐦 = [𝐦0 , 𝐦1 , . . . , 𝐦L−1 ] passing through a system that appends a sequence of 𝐱L , 𝐱L+1 , . . . , 𝐱L+𝜈−1 , where 𝜈 is the constraint length (or memory)

14.3 Turbo Decoding Algorithms

m

Append zero-state forcing sequence (optional)

x

597

Convolutional encoder

Channel c

R = k/n

Signal mapper (e.g. BPSK)

a

r n

Figure 14.6: Processing stages for BCJR algorithm. of the convolutional coder, which is used to drive the state of the encoder to 0. (For a polynomial encoder, the padding bits would be all zeros, but for the recursive encoder, the padding bits are a function of the state of the encoder after the last message bit mL−1 enters the encoder.) The sequence [ ] 𝐱 = 𝐦, 𝐱L , 𝐱L+1 , . . . , 𝐱L+𝜈−1 forms the input to the R = k∕n convolutional encoder. We denote the actual length of the input sequence by N, so N = L + 𝜈 if the appended sequence is used, or N = L if not. Each block 𝐱i is in k . The output of the encoder is the sequence of blocks of symbols [ ] 𝐯 = 𝐯0 , 𝐯1 , . . . , 𝐯N−1 , where each block 𝐯t contains the n output bits of the encoder for the tth input: [ ] 𝐯t = 𝑣t(0) , 𝑣t(1) , . . . , 𝑣(n−1) . t The encoder symbols 𝐯t are mapped to a signal constellation (such as BPSK) to produce the output symbols 𝐚t . The dimension of the 𝐚t depends on√the dimension of the signal constellation. For example, (0) (1) (n−1) ] if BPSK is employed, we might have a(i) t ∈ {± Ec }, where REb = Ec , with 𝐚t = [at , at , . . . , at and √ Ec (2𝑣(i) i = 0, 1, . . . , n − 1. (14.4) a(i) t = t − 1), We also use the notation

(i) 𝑣̃(i) t = 2𝑣t − 1 √ √ (i) to indicate the ±1 modulated signals without the Ec scaling, so a(i) Ec 𝑣̃t . t = The sequence of output symbols 𝐚 = [𝐚0 , 𝐚1 , . . . , 𝐚N−1 ] passes through an additive white Gaussian noise (AWGN) channel to form the received symbol sequence

𝐫 = [𝐫0 , 𝐫1 , . . . , 𝐫N−1 ], where 𝐫t = 𝐚t + 𝐧t ,

t = 0, 1, . . . , N − 1,

and where 𝐧i is a zero-mean Gaussian noise signal with variance 𝜎 2 = N0 ∕2 in each component. We denote the discrete time index as t. We denote the state of the encoder at time t by Ψt . There are Q = 2𝜈 possible states, where 𝜈 is the constraint length of the encoder, which we denote as integers in the range 0 ≤ Ψt < 2𝜈 . We assume that the encoder starts in state 0 (the all-zero state) before any message bits are processed, so that Ψt = 0 for t ≤ 0. When the zero-forcing sequence is appended, the encoder terminates in state 0, so that ΨN = 0. Otherwise, it is assumed that the encoder could terminate in any state with equal probability. The sequence of states associated with the input sequence is {Ψ0 , Ψ1 , . . . , ΨN }. A portion of the trellis associated with the encoder is shown in Figure 14.7, portraying a state Ψt = p at time t transitioning to a state Ψt+1 = q at time t + 1. The unique input 𝐱t which causes this transition is denoted by 𝐱(p,q) . The corresponding mapped symbols 𝐚t produced by this state transition are denoted by 𝐚(p,q) , with elements a(0,p,q) , a(1,p,q) , . . . , a(n−1,p,q) .

598

Turbo Codes Time t

ψt = p

Time t + 1

xt = at ψt +1 = q

rt

Figure 14.7: One transition of the trellis for the encoder. Notationally, quantities with the time-index subscript t are often random variables or their realizations, (e.g., 𝐚t , at(0) , or 𝐱t ), whereas quantities without the time-index subscript are usually not random variables. 14.3.3

Posterior Probability

It is clear that the convolutional code introduces dependencies among the symbols {𝐚t }. An optimal decoder should exploit these dependencies, examining the entire sequence of data to determine its estimates of the probabilities of the input bits. The goal of the decoder is thus: Determine the a posteriori probabilities of the input P(𝐱t = 𝐱|𝐫), that is, the probability that the input takes on some value 𝐱 conditioned upon the entire received sequence 𝐫. The BCJR algorithm provides an efficient way to compute these probabilities. The first step is to determine the probabilities of state transitions; once these are determined, finding the probabilities of the bits is straightforward. The convolutional code introduces a Markov property into the probability structure: Knowledge of the state at time t + 1 renders irrelevant knowledge of the state at time t or previous times. To exploit this Markovity, we partition the observations into three different sets, 𝐫 = 𝐫t , where 𝐫t = {𝐫l : l > t} is the set of the future observations. Then the posterior probability of the transition (Ψt = p, Ψt+1 = q) given the observed sequence 𝐫 is P(Ψt = p, Ψt+1 = q|𝐫) = p(Ψt = p, Ψt+1 = q, 𝐫)∕p(𝐫) = p(Ψt = p, Ψt+1 = q, 𝐫t )∕p(𝐫),

(14.5)

where P denotes a probability mass function and p denotes a probability density function. We now employ the conditioning factorization P(Ψt = p, Ψt+1 = q|𝐫) = p(Ψt = p, Ψt+1 = q, 𝐫t |Ψt = p, Ψt+1 = q, 𝐫t |Ψt = p, Ψt+1 = q, 𝐫t |Ψt+1 = q),

(14.7)

since knowledge of the state at time t + 1 renders irrelevant knowledge about prior states or received data. The first factor in (14.6) can be factored further and the Markovity can exploited again: p(Ψt = p, Ψt+1 = q, 𝐫 0 b̂ i = (15.12) 1 if Li→out < 0

15.4 Decoding LDPC Codes

649

then seeing if the detected codeword satisfies parity, Ĥ𝐛 = 0. If so, then the algorithm can stop with this decoded codeword. Another stopping condition is to set a maximum number of iterations and to stop when a maximum number of iterations is reached. The best estimate of the codeword obtained using (15.12) is returned, which may be correct in many positions. 15.4.1.8

Summary of Decoding: The Belief Propagation Algorithm

The steps of passing messages from check nodes to variable nodes and from variable nodes to check nodes are combined together for the overall LDPC decoding algorithm. At each check node, the “leave one out” computation is performed, in succession, leaving out the message from each neighboring variable node in turn. At each variable node, the “leave one out” computation is performed, in succession, leaving out the message from each variable node in turn. This is established in the following formal description. The notation is as follows:

• • • • • • • • • •

N: number of variable nodes (length of code). K: number of input bits (dimension of code). M: number of check nodes = N − K. n: index on variable nodes. m: index on check nodes.  (m): set of variable nodes (the set of n indices) which are neighbors to check node m. (n): set of check nodes (the set of m indices) which are neighbors to variable node n.

α α2 α3 α4 α5 α6 α7 α8 α9

 (m) − {n}: the neighbors to Vm , excluding n.

01010

Lm′ →n : The message (log probability ratio) from check node

m′

to variable node n.

Ln′ →m : The message (log probability ratio) from variable node n′ to check node m.

Algorithm 15.1 Log-Likelihood Belief Propagation Decoding Algorithm for Binary LDPC Codes Input: Description of the parity check matrix using  (m) and (n). The channel log likelihoods Ln , Maximum # of iterations, MAXITER Output: Estimate of codeword Initialization: For each n, and for each m ∈ (n), set Ln→m = Ln Check Node to Variable Node Step (horizontal step): for each check node m for each variable node n ∈  (m) Compute the message from Cm to Vn by ( Lm→n = 2 tanh−1



n′ ∈ (m)−{n}

end for end for Variable Node to Check Node Step (vertical step) for each variable node n

) tanh(Ln′ →m ∕2)

+

00000

α10 α11 α12 α13 α14 α15 α16

ldpcdecoder.cc decode() lldecode()

650

Low-Density Parity-Check Codes: Introduction, Decoding, and Analysis for each check node m ∈ (n) Compute the message from Vn to Cm by



Ln→m = Ln +

Lm′ →n .

m′ ∈(n)−{m}

Also compute the output likelihoods Ln→out = Ln +



Lm′ →n .

m′ ∈(n)

end for end for for each n, decide ĉ n = 1 if Ln→out < 0. Check Parity: ̂ if H 𝐜̂ = 0 then return 𝐜. Otherwise, if # iterations < MAXITER, goto Check Node to Variable Node Step Else return 𝐜̂ and an indication of coding failure.

[ ]T Example 15.9 The message vector 𝐦 = 1 0 1 0 1 is encoded using a systematic generator G derived from (15.1) to obtain the code vector [ ]T (15.13) 𝐛= 0 0 0 1 0 1 0 1 0 1 . Then 𝐛 is mapped to a signal constellation with amplitude a = −2 to obtain the vector [ ]T 𝐭 = −2 −2 −2 2 −2 2 −2 2 −2 2 , which is transmitted through an AWGN channel with 𝜎 2 = 2. The vector [ ]T 𝐲 = −0.63 −0.83 −0.73 −0.04 0.1 0.95 −0.76 0.66 −0.55 0.58 is received. Using (15.10), it is found that the channel posterior probabilities are [ ]T P(𝐜 = 1|𝐲) = 0.22 0.16 0.19 0.48 0.55 0.87 0.18 0.79 0.25 0.76 .

(15.14)

If, ignoring the fact that the information is coded, these posterior probabilities were converted to a binary vector by thresholding the probabilities in (15.14) at 0.5, the estimated vector ] [ 𝐜̂ = 0 0 0 0 1 1 0 1 0 1 would be obtained, which differs from the original code vector at the two underlined locations. However, note that at the error locations, the channel posterior probability is only slightly different than the threshold 0.5, so that the bits only “weakly” decode to the values 0 and 1, respectively. Other bits more strongly decode to their true values. These weak and strong indications are exploited by a soft-decision decoder. Converting the received values to log-likelihood ratios, use the channel reliability Lc = 2a∕𝜎 2 = 2 to find the channel log likelihoods [ ] 𝐋n = (−Lc )𝐲 = 1.3 1.7 1.5 0.08 −0.2 −1.9 1.5 −1.3 1.1 −1.2 . The initial messages Ln→m (expressed as a matrix) is

Ln→m

⎡1.3 1.7 1.5 −1.9 1.5 −1.2⎤ ⎢1.3 ⎥ 1.5 −0.2 −1.9 −1.3 1.1 ⎢ ⎥ 1.5 0.08 −0.2 1.5 1.1 −1.2⎥ . =⎢ ⎢ 1.7 0.08 −0.2 −1.9 −1.3 −1.2⎥ ⎢1.3 1.7 ⎥ 0.08 1.5 −1.3 1.1 ⎣ ⎦

15.4 Decoding LDPC Codes

651

(Note that there are only values computed for elements where the parity check matrix is nonzero.) Iteration 1: After the first iteration, the messages Lm→n (expressed as a matrix) and the messages Ln→m and Ln→out are

Lm→n

Ln→m

Ln→out

⎡ 0.21 0.17 0.19 −0.16 0.18 −0.22 ⎤ ⎢−0.027 ⎥ −0.024 0.15 0.02 0.026 −0.03 ⎢ ⎥ 0.0013 0.021 −0.0083 0.0013 0.0017 −0.0016⎥ =⎢ ⎢ 0.0018 0.03 −0.012 −0.0016 −0.0021 −0.0023⎥ ⎢ −0.01 −0.0083 ⎥ −0.14 −0.0088 0.0097 −0.011 ⎣ ⎦ ⎡1.2 ⎢1.5 ⎢ =⎢ ⎢ ⎢1.4 ⎣ [ = 1.4

1.7 1.4 −1.9 1.5 −1.2⎤ ⎥ 1.6 −0.22 −2.1 −1.3 1.1 ⎥ 1.6 −0.031 −0.064 1.7 1.1 −1.4⎥ 1.8 −0.041 −0.06 −2 −1.3 −1.4⎥ ⎥ 1.8 0.13 1.7 −1.3 1.1 ⎦ ] 1.8 1.6 −0.011 −0.072 −2 1.7 −1.3 1.1 −1.4 .

The estimated bits 𝐜̂ and corresponding parity H 𝐜̂ are [ ] 𝐜̂ = 0 0 0 1 1 1 0 1 0 1

[ ] 𝐳= 0 1 1 1 0 .

Since the parities do not all check, two more iterations are required: Iteration 2:

Lm→n

⎡ 0.2 0.16 0.18 −0.15 0.18 −0.21 ⎤ ⎢ ⎥ ⎢−0.033 ⎥ −0.03 0.19 0.027 0.036 −0.041 ⎢ ⎥ =⎢ −0.0002 0.0085 0.0042 −0.00019 −0.00027 0.00022⎥ ⎢ ⎥ −0.00032 0.011 0.0077 0.0003 0.0004 0.00038⎥ ⎢ ⎢−0.018 −0.016 ⎥ −0.17 −0.016 0.02 −0.023 ⎣ ⎦

⎡1.2 1.6 ⎢1.5 ⎢ Ln→m = ⎢ ⎢ 1.8 ⎢1.4 1.8 ⎣ [ Ln→out = 1.4 1.8 [ 𝐜̂ = 0 0 0

1.4 −1.9 1.5 −1.2⎤ ⎥ 1.6 −0.19 −2.1 −1.3 1.1 ⎥ 1.6 −0.083 −0.0055 1.7 1 −1.4⎥ −0.085 −0.009 −2 −1.3 −1.4⎥ ⎥ 0.1 1.7 −1.3 1.1 ⎦ ] 1.6 −0.074 −0.0013 −2 1.7 −1.3 1 −1.4 ] [ ] 1 1 1 0 1 0 1 𝐳= 0 1 1 1 0 .

Iteration 3:

Lm→n

⎡ 0.2 0.16 0.18 −0.15 0.17 −0.21 ⎤ ⎢ ⎥ −0.028 −0.025 0.18 0.022 0.03 −0.035 ⎢ ⎥ =⎢ −4.4e − 05 0.0007 0.011 −4.2e − 05 −6.1e − 05 4.9e − 05⎥ ⎢ ⎥ ⎢ −9.9e − 05 0.0017 0.016 9.2e − 05 0.00013 0.00012 ⎥ ⎢ ⎥ −0.17 −0.012 0.015 −0.017 ⎣−0.014 −0.012 ⎦

⎡1.2 1.6 ⎢1.5 Ln→m = ⎢ ⎢ 1.8 ⎢ ⎣1.4 1.8 [ Ln→out = 1.4 1.8 [ 𝐜̂ = 0 0 0

1.4 −1.9 1.5 −1.2⎤ 1.6 −0.17 −2 −1.3 1.1 ⎥ 1.6 −0.087 −0.0029 1.7 1 −1.4⎥ ⎥ −0.088 −0.008 −2 −1.3 −1.4⎥ ⎦ 0.082 1.7 −1.3 1.1 ] 1.6 −0.086 0.0077 −2 1.7 −1.3 1 −1.4 ] [ ] 1 0 1 0 1 0 1 𝐳= 0 0 0 0 0 .

At this point, all the parities check and the codeword is corrected decoded.



652

Low-Density Parity-Check Codes: Introduction, Decoding, and Analysis 15.4.1.9

Messages on the Tanner Graph

As pointed out above, the decoding algorithm can be viewed as passing messages on the Tanner graph associated with the code. This view can be emphasized by unfolding the bipartite graph as suggested in Figure 15.6. If the Tanner graph could actually be unfolded this way, as a tree, then the belief propagation decoding algorithm would be optimal. However, as shown in Figure 15.7, using V1 as an example node, there exists a cycle in the graph: V1 → C1 → V2 → C5 → V1 . This means that as information propagates through the graph through multiple iterations, information emerging from a variable node (such as V1 ) will eventually come back to V1 . This means that the estimated probabilities are not independent of information coming from other nodes. As the decoding algorithm develops, the information will be biased by information previously at the node, not formed by collecting independently from other nodes in the graph. Since independence was an important assumption in deriving the decoding at both the check nodes and the variable nodes, the decoding at both check and variable nodes becomes biased. Another view of the iterative nature of the decoding algorithm is shown in Figure 15.8. Duplicating the Tanner graph in multiple stages to show the flow of information, in the iterative decoding information passes from the bit (or variable) nodes) to the check nodes, then from check nodes to bit nodes, then to check nodes, and so forth. As indicated by the dashed line in this portrayal, information emerging from the first bit eventually (after two full iterations) finds its way back to the first bit node. 15.4.2

Decoding Using Probabilities

While the log-likelihood approach to decoding is frequently used in implementations of LDPC decoding algorithm, there is insight and understanding to be gained by also considering the decoding from a probabilistic standpoint. There may also be some computational advantages. Use these bits At these parity checks To get information about these bits. Then use these bits At these parity checks To get information about these bits.

Cm Vn

Figure 15.6: A parity check tree associated with the Tanner graph. V10

V9

V8

V7

V6

C5

V5

C2

V4

V3

V2

C1

V1

Figure 15.7: A portion of the Tanner graph for the example code, with node for V1 as a root, showing a cycle.

15.4 Decoding LDPC Codes

Bits

653

Checks

Bits

Checks

Bits

Direction of processing

Figure 15.8: Processing information through the graph determined by H. The dashed line illustrates a cycle of length 4. 15.4.2.1

Probability of Even Parity: Decoding at the Check Nodes

Just as the likelihood ratio-based decoding started by considering the probability of the sum of bits, so does the probability-based decoding. Lemma 15.10 (Gallager’s Lemma) Let b1 , b2 , . . . , bd be independent bits, with P(bi = 1) = pi . Then ∏ 1 − di=1 (1 − 2pi ) P(b1 ⊕ b2 ⊕ · · · ⊕ bd = 1) = . 2 and P(b1 ⊕ b2 ⊕ · · · ⊕ bd = 0) =

1+

∏d

i=1 (1

2

− 2pi )

.

Proof The proof is inductive. The formulas are easily verified with d = 1. Assume the result is true for d − 1, that is, ∏ 1 − d−1 i=1 (1 − 2pi ) P(b1 ⊕ b2 ⊕ · · · ⊕ bd−1 = 1) = . 2 Denote this quantity as Wd−1 . Then P(b1 ⊕ b2 ⊕ · · · ⊕ bd = 1) = Wd−1 P(bd = 0) + (1 − Wd−1 )P(bd = 1) Wd−1 (1 − pd ) + (1 − Wd−1 )pd = Wd−1 (1 − 2pd ) + pd ∏ ∏ (1 − 2pd ) + di=1 (1 − 2pi ) 2pd 1 − di=1 (1 − 2pi ) = + = . 2 2 2



Gallager’s lemma is used for decoding at a check node. To repeat a previous example, suppose that at a check node V1 ⊕ V2 ⊕ V3 ⊕ V4 = 0 with p2 = P(V2 = 1), p3 = P(V3 = 1), and p4 = P(V4 = 1) given. Then since V1 = V2 ⊕ V3 ⊕ V4 , using Gallager’s lemma, P(V1 = 1) = P(V2 ⊕ V3 ⊕ V4 = 1) =

1 − (1 − 2p2 )(1 − 2p3 )(1 − 2p4 ) . 2

654

Low-Density Parity-Check Codes: Introduction, Decoding, and Analysis Cm

+

Pm→n Vn

(m) – {n}

Set of messages Pnʹ →m , nʹ ∈

Figure 15.9: Probabilistic message from check node m to variable nodes n. More generally, let Pn→m (b) denote the message from variable node Vn to check node Cm , representing the probability that the bit is equal to b (conditioned on channel measurement and ultimately parity constraints): Pn→m (b) ≈ P(Vn = b). The message that check node m sends to variable node Vn is computed by the probability that Vn is 1, based on the parity constraint and the probabilities of other variable nodes connected to Cm , as shown in Figure 15.9 By Gallager’s lemma, this is ∏ 1 − n′ ∈ (m)−{n} (1 − 2Pn′ →m (1)) , Pm→n (1) = 2 ∏ 1 + n′ ∈ (m)−{n} (1 − 2Pn′ →m (1)) Pm→n (0) = . 2 One way to implement this is as follows. Let 𝛿qm,n′ = Pn′ →m (0) − Pn′ →m (1) = 1 − 2Pn′ →m (1). Compute the leave-one-out product



𝛿rmn =

𝛿qm,n′ .

n′ ∈ (m)−{n}

Then Pm→n (1) = 15.4.2.2

1 − 𝛿rmn 2

Pm→n (0) =

1 + 𝛿rmn . 2

Probability of Independent Observations Decoding at a Variable Node

Consider now that d independent observations 𝐲 = (y1 , . . . , yd ) are available about a bit b. Then p(𝐲|b = 1)P(b = 1) p(𝐲|b = 1) = p(𝐲) p(𝐲|b = 0)P(b = 0) + p(𝐲|b = 1)P(b = 1) ∏d i=1 p(yi |b = 1) = p(𝐲|b = 0)P(b = 0) + p(𝐲|b = 1)P(b = 1)

P(b = 1|𝐲) =

and similarly, ∏d p(𝐲|b = 0)P(b = 0) i=1 p(yi |b = 1) P(b = 0|𝐲) = = . p(𝐲) p(𝐲|b = 0)P(b = 0) + p(𝐲|b = 1) The denominators in P(b = 1|𝐲) and P(b = 0|𝐲) are identical. As a result, assuming 0 and 1 are equiprobable, the probability can be computed by first computing the numerators as q′ (1) =

d ∏ i=1

p(yi |b = 1)

and

q′ (0) =

d ∏ i=1

p(yi |b = 0)

15.4 Decoding LDPC Codes

655

then forming the normalizing factor 𝛼=

q′ (1)

1 + q′ (0)

to that P(b = 1|𝐲) = 𝛼q′ (1)

P(b = 0|𝐲) = 𝛼q′ (0).

and

As applied to decoding at a variable node, let Pm′ →n (b) be messages (bit probabilities) sent from check node m′ to variable node n, and let Pn (b) be the channel information there. The probability message that variable node n sends to check node m is determined by the product of all the bit probabilities from check nodes adjacent to variable node n, except for the check node m to which the message is being sent. That is, ∏ Pm′ →n (1) Pn→m (1) = 𝛼Pn (1) m′ ∈(n)−{m}



Pn→m (0) = 𝛼Pn (0)

Pm′ →n (0),

m′ ∈(n)−{m}

where 𝛼 is chosen as a normalization constant such that Pn→m (0) + Pn→m (1) = 1. Also at variable node n, a message incorporating all the information from all check nodes and the channel information is computed: ∏ Pm′ →n (b), b = 0, 1, Pn→out (b) = 𝛼Pn (b) m′ ∈(n)

where again 𝛼 is a normalization constant such that Pn→out (0) + Pn→out (1) = 1. The inputs to the probability decoding algorithm are the channel probabilities, Pn (1) =

1 . 1 + eLn

For the AWGN channel and BPSK modulation, Pn (1) = Lc yn

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

with Lc = 2a∕𝜎 2 .

ldpcdecoder.cc probdecode()

Algorithm 15.2 Probability Belief Propagation Decoding Algorithm for Binary LDPCCodes Input: Description of the parity check matrix using  (m) and (n). The channel log likelihoods Pn (1), Maximum # of iterations, MAXITER Output: Estimate of codeword Initialization: For each n, and for each m ∈ (n), set Pn→m (1) = Pn (1), Pn→m (0) = 1 − Pn (1). Check Node to Variable Node Step (horizontal step): for each check node m for each variable node n ∈  (m)

656

Low-Density Parity-Check Codes: Introduction, Decoding, and Analysis Compute the message from Cm to Vn : ∏

𝛿rmn =

Pn′ →m (0) − Pn′ →m (1)

n′ ∈ (m)−{n}

Pm→n (1) =

1 − 𝛿rmn 2

Pm→n (0) =

1 + 𝛿rmn . 2

end for end for Variable Node to Check Node Step (vertical step): for each variable node n for each check node m ∈ (n) Compute the message from Vn to Cm by



Pn→m (1) = 𝛼Pn (1)

Pm′ →n (1).

m′ ∈(n)−{m}



Pn→m (0) = 𝛼Pn (0)

Pm′ →n (0),

m′ ∈(n)−{m}

where 𝛼 is such that Pn→m (1) + Pn→m (0) = 1. Also compute the output probabilities Pn→out (1) = 𝛼Pn (1)



Pm′ →n (1).

m′ ∈(n)

Pn→out (0) = 𝛼Pn (0)



Pm′ →n (0),

m′ ∈(n)

where 𝛼 is such that Pn→out (0) + Pn→out (1) = 1. end for end for for each n, decide ĉ n = 1 if Pn→out > 12 . Check Parity: ̂ if H 𝐜̂ = 0 then return 𝐜. Otherwise, if # iterations < MAXITER, goto Check Node to Variable Node Step Else return 𝐜̂ and an indication of coding failure.

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

ldpc.m ldpcdecode.m ldpclogdecode.m

This algorithm is properly called the sum–product algorithm, since it decodes using only sums and products. The log- likelihood-based algorithm is also sometimes referred to as the sum–product algorithm, because it is (ideally) numerically equivalent to this algorithm. A fundamental difference between the log-likelihood ratio-based decoder and the probability decoder is that the probability decoder requires computing probabilities for both 0 and 1 outcomes. This requires essentially twice the amount of memory, and twice as many computations (at the level of the loops in the algorithm). However, no complicated functions such as logarithms or tanh need to be computed. Further variations on decoding are developed which provide other tradeoffs in complexity. Example 15.11

For the parity check matrix (15.1) of Example 15.9, the decoding proceeds as follows:

Initialization: Set Pn→m (b) = pn (b) from (15.14) Pn→m ⎡0.22 0.16 0.19 0.87 0.18 0.76⎤ ⎢ ⎥ ⎢0.22 ⎥ 0.19 0.55 0.87 0.79 0.25 ⎢ ⎥ 0.19 0.48 0.55 0.18 0.25 0.76 ⎢ ⎥. ⎢ ⎥ 0.16 0.48 0.55 0.87 0.79 0.76⎥ ⎢ ⎢0.22 0.16 ⎥ 0.48 0.18 0.79 0.25 ⎣ ⎦

15.4 Decoding LDPC Codes

657

Iteration 1: Check Node to Variable Node Computation: Pm→n (1) ⎡0.45 0.46 0.45 0.54 0.45 0.56⎤ ⎢ ⎥ ⎢0.51 ⎥ 0.51 0.46 0.49 0.49 0.51 ⎢ ⎥ 0.5 0.49 0.5 0.5 0.5 0.5 ⎥ . ⎢ ⎢ ⎥ 0.5 0.49 0.5 0.5 0.5 0.5 ⎥ ⎢ ⎢ 0.5 0.5 ⎥ 0.54 0.5 0.5 0.5 ⎣ ⎦ Iteration 1: Variable Node to Check Node Computation: Pn→m (1) ⎡0.23 0.16 0.19 0.87 0.18 0.76⎤ ⎢ ⎥ ⎢0.19 ⎥ 0.16 0.56 0.89 0.79 0.25 ⎢ ⎥ 0.17 0.51 0.52 0.16 0.26 0.8 ⎥ ⎢ ⎢ ⎥ 0.14 0.51 0.51 0.88 0.78 0.8 ⎥ ⎢ ⎢0.19 0.14 ⎥ 0.47 0.15 0.79 0.26 ⎣ ⎦ (1) P n→out [ ] 0.19 0.14 0.17 0.5 0.52 0.88 0.16 0.78 0.26 0.8 [ ] [ ] 𝐜̂ = 0 0 0 1 1 1 0 1 0 1 𝐳= 0 1 1 1 0 .

(15.15)

At the end of the first iteration, the parity check condition is not satisfied. The algorithm runs through two more iterations (not shown here). At the end, the decoded value [ ] 𝐜̂ = 0 0 0 1 0 1 0 1 0 1 is obtained, which exactly matches the transmitted code vector 𝐜 of (15.13).

15.4.2.3



Computing the Leave-One-Out Product

The leave-one-out computation is an important computation in both the likelihood and probabilistic formulation. In this section, we consider how to efficiently compute these products in a numerically stable way. Given real numbers a, b, c, d, and e, it is desired to compute the products bcde

acde

abde

abce

abcd.

An obvious efficient way to compute this is to compute the overall product abcde (4 multiplications) then compute successive quotients abcde a

abcde b

abcde c

abcde d

abcde , e

an additional five operations, for a total of 9 operations. In general, with n operands, there will be 2n − 1 multiplications/divisions. Computing this way introduces difficulties if a factor is near 0 — the overall product is near 0, and division by a small number is numerically problematic. If one of the factors is 0, then this method fails. A superior approach does not require any division operations (but it does require a few more computations). This is done by using a table called the forward/backward table (not to be confused with the forward/backward algorithm used in turbo code decoding). The forward table is formed by successive products in the forward direction, and the backward table is formed by successive products in the reverse direction, as shown below.

658

Low-Density Parity-Check Codes: Introduction, Decoding, and Analysis

Forward Products f0 = a f1 = ab f2 = abc f3 = abcd

Backward Products b0 = e b1 = de b2 = cde b3 = bcde

Then the leave-one-out products are computed as bcde = b3

acde = f0 b2

abde = f1 b1

abcd = b3 .

With n operands, there are 2(n − 2) products to form the forward/backward table, and n − 2 additional products, for a total of 3n − 6. No divisions are required. This method can also be used for a leave-one-out sum, or, significantly, for a leave-one-out min operation (used for the min-sum algorithm below). 15.4.2.4

Sparse Memory Organization

Operations in the decoding algorithm only rely on data corresponding to nonzero elements of the parity check matrix H, which is sparse. In general, the parity check matrix may be very sparse, such as 3 nonzero elements in each column of a 1000 × 6000 parity check matrix. It makes no sense in terms of memory efficiency to allocate space for all the elements of H, since most of this would be unused space. In this section, a method is described that exploits the sparseness of the matrix. Let 𝑤c denote the maximum column weight of H. The number of elements to be stored is no more than 𝑤c N. In the representation here, think of all the elements “floating” to the top of a matrix. For example, for a check-to-variable message matrix, Lm→n −0.16

0.17 0.19 0.18 −0.22 ⎤ ⎡ 0.21 ⎢ ⎥ −0.024 0.15 0.02 0.026 −0.03 ⎢−0.027 ⎥ ⎢ 0.0013 0.021 −0.0083 0.0013 0.0017 −0.0016⎥ ⎢ ⎥ ⎢ 0.0018 0.03 −0.012 −0.0016 −0.0021 −0.0023⎥ ⎢ ⎥ ⎣ −0.01 −0.0083 ⎦ −0.14 −0.0088 0.0097 −0.011 the floated matrix is L̃ m→n

⎡ 0.21 0.17 0.19 0.021 0.15 −0.16 0.18 0.026 −0.03 −0.22 ⎤ ⎢ ⎥ 0.02 0.0013 −0.0021 0.0017 −0.0016⎥ ⎢−0.027 0.0018 −0.024 0.03 −0.0083 ⎢ ⎥. ⎢ −0.01 −0.0083 0.0013 −0.14 −0.012 −0.0016 −0.0088 0.0097 −0.011 −0.0023⎥ ⎢ ⎥ ⎣ ⎦ In this representation:

• •

Column operations (variable node to check node step) are easy: just work down the column. Row operations (check node to variable node step) requires more care. An array na is introduced with an integer value for each column n. This denotes the “number of elements above” the element being indexed that have been used in previous computations. As the elements in a column are used, the row index skips down to row na[n]. The counter is incremented as it is used:

15.4 Decoding LDPC Codes

659

// Check node to variable node computation for(m = 0; m < M; m++) { // for each check // work over nonzero elements of this row for(l = 0; l < Nmlen[m]; l++) { n = Nm[m][l]; tanhstuff[l] = tanh(Lntom[na[n]][n]/2); } // compute leave-one-out products on tanh stuff forbackprobs(tanhstuff,Nmlen[m],poutprobsLL); for(l = 0; l < Nmlen[m]; l++) { // for each n on this row n = Nm[m][l]; Lmton[n][na[n]++] = 2*atanh(poutprobsLL[l]); // ∧ // Note the increment } }

15.4.3

Variations on Decoding Algorithms: The Min-Sum Decoder

The complexity of the check node to variable node computation, with its ⊞ operations, has motivated several variations in an attempt to reduce computational complexity. (By contrast, the variable node to check node computation, being only additions, remain unchanged in these lower-complexity variations.) 15.4.3.1

The ⊞ Operation and the 𝜙(x) Function

In this section, we introduce the function 𝜙(x), which provides yet another representation of the ⊞ operation. This will be instrumental in several of the variations on the decoding algorithm. We consider the ⊞ operator for two messages, L1 and L2 : L1 ⊞ L2 = 2tanh−1 (tanh(L1 ∕2) tanh(L2 ∕2)). Noting that tanh and tanh−1 are odd functions, for any real x, tanh(x) = sign(x) tanh(|x|)

and

tanh−1 (x) = sign(x)tanh−1 (|x|).

This observation motivates representing L1 and L2 in terms of sign and magnitude, L1 = 𝛼1 𝛽1

L2 = 𝛼2 𝛽2 ,

where 𝛼i = sign(Li )

𝛽i = |Li |,

i = 1, 2.

Using the odd symmetry of the functions, we can now write L1 ⊞ L2 = 𝛼1 𝛼2 2tanh−1 (tanh(𝛽1 ∕2) tanh(𝛽2 ∕2)). Now interpose the composed functions log−1 (log(⋅)) (where log−1 (x) is just an alias of ex ). Then L1 ⊞ L2 = (𝛼1 𝛼2 )2tanh−1 log−1 (log(tanh(𝛽1 ∕2) tanh(𝛽2 ∕2))) = (𝛼1 𝛼2 )2tanh−1 log−1 (log(tanh(𝛽1 ∕2)) + log(tanh(𝛽2 ∕2))). Define the function

𝜙(x) = − log(tanh(x∕2)) = log( (ex + 1)∕(ex − 1) ).        (15.16)


Figure 15.10: The function 𝜙(x).

ldpcdecoder.cc: phidecode() qphidecode()

This function is plotted in Figure 15.10. Since the function is symmetric across the line y = x, 𝜙(x) is its own inverse: 𝜙(𝜙(x)) = x, so that 𝜙(x) = 𝜙−1 (x) = 2tanh−1 (log−1 (−x)). Substituting these expressions for 𝜙(x) in (15.16), we obtain

L1 ⊞ L2 = (𝛼1 𝛼2 )𝜙(𝜙(𝛽1 ) + 𝜙(𝛽2 )).        (15.17)

If the 𝜙 function is able to be rapidly evaluated (such as by table lookup), this reduces the ⊞ operation to table lookups, addition, another table lookup, and keeping track of signs (which can be done using bit arithmetic). This thus has reduced complexity compared to computing the ⊞ operator using the log, tanh, and tanh−1 functions, at the expense of storing the table for 𝜙(x). Additional simplifications are possible. Since 𝜙(x) is rapidly decreasing with x, in the sum 𝜙(𝛽1 ) + 𝜙(𝛽2 ) the term with the smallest value of 𝛽i will be the largest and may largely dominate the sum, so that 𝜙(𝛽1 ) + 𝜙(𝛽2 ) ≈ 𝜙(min(𝛽1 , 𝛽2 )). Using this approximation in (15.17), L1 ⊞ L2 ≈ (𝛼1 𝛼2 )𝜙(𝜙(min(𝛽1 , 𝛽2 ))) = 𝛼1 𝛼2 min(𝛽1 , 𝛽2 ). That is, the ⊞ operation can be approximated by the message Li having the least reliability, min(|L1 |, |L2 |). Recalling that the ⊞ operation provides information about the parity of bits represented by L1 and L2 , this makes sense: the less certainty there is about the value of a bit in a parity computation, the less certainty there can be in the information about the parity. This insight is now applied to the message Lm→n . Each of the messages Ln′ →m , n′ ∈ 𝒩(m), is represented in sign/magnitude form using 𝛼n′ m = sign(Ln′ →m ) and 𝛽n′ m = |Ln′ →m |.


Then (using sums and products over n′ as an abbreviation for n′ ∈ 𝒩(m) − {n}), and using the notations above,

Lm→n = (∏_{n′} 𝛼n′ m ) 2tanh−1 log−1 ( ∑_{n′} log tanh(𝛽n′ m ∕2) )
      = (∏_{n′} 𝛼n′ m ) 𝜙( ∑_{n′} 𝜙(𝛽n′ m ) ).        (15.18)

Now approximating the sum as above,

∑_{n′} 𝜙(𝛽n′ m ) ≈ 𝜙( min_{n′} 𝛽n′ m ),

we obtain the min-sum decoder rule:

Lm→n,min sum = (∏_{n′} 𝛼n′ m ) 𝜙(𝜙(min_{n′} 𝛽n′ m )) = (∏_{n′} 𝛼n′ m ) min_{n′} 𝛽n′ m .        (15.19)

ldpcdecoder.cc: minsumdecode()

When this formulation is used, the decoder is called a min-sum decoder: it computes a min at the check nodes and sum at the variable nodes.
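A concrete rendering of the min-sum check-node rule (15.19) is sketched below. The two-smallest-magnitudes trick avoids recomputing a leave-one-out minimum for every edge; the function and container names are illustrative, not the book's minsumdecode() code.

   // Min-sum check node update: outgoing magnitude = min |L| over the other neighbors,
   // outgoing sign = product of the other signs.
   #include <vector>
   #include <cmath>
   #include <limits>

   std::vector<double> minsum_check_node(const std::vector<double>& Lin) {
       int deg = (int)Lin.size(), argmin = -1, sign_all = 1;
       double min1 = std::numeric_limits<double>::infinity();   // smallest |L|
       double min2 = std::numeric_limits<double>::infinity();   // second smallest |L|
       for (int i = 0; i < deg; i++) {
           double m = std::fabs(Lin[i]);
           if (Lin[i] < 0) sign_all = -sign_all;
           if (m < min1) { min2 = min1; min1 = m; argmin = i; }
           else if (m < min2) { min2 = m; }
       }
       std::vector<double> Lout(deg);
       for (int i = 0; i < deg; i++) {
           double mag = (i == argmin) ? min2 : min1;             // leave-one-out min
           int s = (Lin[i] < 0) ? -sign_all : sign_all;          // leave-one-out sign
           Lout[i] = s * mag;
       }
       return Lout;
   }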

15.4.3.2 Attenuated and Offset Min-Sum Decoders

Figure 15.11(a) shows a scatter plot of the message Lm→n computed using (15.6) compared with using (15.19). As the figure shows, the min-sum decoder message tends to overestimate its values compared with the LL decoder. The approximation can be improved by scaling back the values using

Lm→n,min sum scaled = (∏_{n′} 𝛼n′ m ) cscale min_{n′} 𝛽n′ m ,


where cscale is a number less than 1. It may be found, for example, by generating Q messages (e.g., Q ≈ 10,000), denoted here as Lm→n (i) and Lm→n,min sum (i) for message numbers i = 1, 2, . . . , Q, by decoding (typically using the all-0 codeword) and choosing cscale to minimize

∑_{i=1}^{Q} (Lm→n (i) − cscale Lm→n,min sum (i))² ,

which leads to

cscale = ( ∑_{i=1}^{Q} Lm→n (i) Lm→n,min sum (i) ) ∕ ( ∑_{i=1}^{Q} Lm→n,min sum (i)² ).

Figure 15.11: Comparison between LL-based computing and min-sum computing for check to variable node computation. (a) Original comparison; (b) scaled comparison.

ldpcdecoder.cc: minsumcorrectdecode()

Figure 15.11(b) shows the scatter plot comparison after scaling, where cscale = 0.6248 when the signal-to-noise ratio is 1 dB. In hardware implementations, cscale may be set to 0.5 to make the multiplication simpler. Another way to modify the min-sum decoder is to subtract an offset coffset from each message magnitude, while imposing the constraint that the magnitude must be nonnegative. This results in the message

Lm→n,min sum, offset = (∏_{n′} 𝛼n′ m ) max[ min_{n′} 𝛽n′ m − coffset , 0 ].
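A minimal sketch of how these two corrections modify min-sum check-node outputs is shown below; cscale and coffset are the tunable constants described above, and the function name is ours, not the book's code. Setting coffset = 0 gives the pure attenuated (scaled) decoder, and cscale = 1 gives the pure offset decoder.

   // Apply attenuation and/or offset to the outputs of a min-sum check-node update.
   #include <algorithm>
   #include <cmath>
   #include <vector>

   void apply_minsum_correction(std::vector<double>& Lout, double c_scale, double c_offset) {
       for (double& L : Lout) {
           double s   = (L < 0) ? -1.0 : 1.0;
           double mag = std::fabs(L);
           mag = c_scale * std::max(mag - c_offset, 0.0);   // offset then scale the magnitude
           L = s * mag;
       }
   }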

15.4.4 Variations on Decoding Algorithms: Min-Sum with Correction

The min-sum decoder loses performance compared to the sum-product algorithm. In this section, we consider better approximations to the ⊞ operator which still have low computational difficulty. Define the function max∗ (x, y) = log(ex + ey ). Consider this function when x > y: max∗ (x, y) = log(ex (1 + ey−x )) = x + log(1 + ey−x ). As x increases (compared to y), this becomes closer and closer to x. Similarly, if y > x, max∗ (x, y) = y + log(1 + ex−y ). In either case, we have that max∗ (x, y) ≈ max(x, y). The exact value can be seen to be max∗ (x, y) = max(x, y) + log(1 + e−|x−y| ).

(15.20)

We now consider expressing the ⊞ operation in terms of the max∗ function and related functions. From (15.4), L1 ⊞ L2 = log(1 + eL1 +L2 ) − log(eL1 + eL2 ) = max∗ (0, L1 + L2 ) − max∗ (L1 , L2 ). Using (15.20), it follows that L1 ⊞ L2 = max(0, L1 + L2 ) − max(L1 , L2 ) + s(L1 , L2 ), where s(x, y) is a correction term, s(x, y) = log(1 + e−|x+y| ) − log(1 + e−|x−y| ).

(15.21)


Observe that s(x, y) is odd in both its arguments, so that s(x, y) = sign(x)sign(y)s(|x|, |y|). By checking the various cases (such as L2 > 0, L1 > L2 ; L1 > 0, L2 > 0; L1 < 0, L2 < L1 ; etc.), it is straightforward to show that max(0, L1 + L2 ) − max(L1 , L2 ) = sign(L1 )sign(L2 ) min(|L1 |, |L2 |). Using these facts in (15.21), we obtain L1 ⊞ L2 = sign(L1 )sign(L2 ) min(|L1 |, |L2 |) + s(L1 , L2 ) = sign(L1 )sign(L2 )[min(|L1 |, |L2 |) + s(|L1 |, |L2 |)]. This is an exact expression, but would require computing s(x, y), which is no easier than computing the ⊞ function, so an approximation is desirable. Figure 15.12(a) shows a plot of s(x, y). There is a cross-shaped ridge region where s(x, y) takes its highest values, from which it slopes down. Figure 15.12(b) shows an approximation s̃(x, y), with the same qualitative features, where

s̃(x, y) = { c    if |x + y| < 2 and |x − y| > 2|x + y|
          { −c   if |x − y| < 2 and |x + y| > 2|x − y|
          { 0    otherwise.

In this representation, c = 0.5. In this computation, the numbers involved are powers of 2, so that multiplications can be performed using simple bit shift operations. The output is quantized to only two nonzero values, with the range determined by a simple computation. Figure 15.12(c) shows the function L1 ⊞ L2 computed using the exact s(x, y), and 15.12(d) shows the approximation computed using the approximation s̃(x, y). These plots show that the same qualitative shape is obtained in the approximation, but with some effects of quantization.
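The following sketch evaluates L1 ⊞ L2 as the min term plus the two-level correction s̃(x, y) with c = 0.5, applied to the signed messages per the first exact form above. It is our own illustration of the idea, not the book's decoder routine.

   // Approximate box-plus: sign(L1)sign(L2)*min(|L1|,|L2|) + s~(L1,L2).
   #include <algorithm>
   #include <cmath>

   double s_tilde(double x, double y) {
       const double c = 0.5;
       double sum = std::fabs(x + y), diff = std::fabs(x - y);
       if (sum < 2.0 && diff > 2.0 * sum) return  c;
       if (diff < 2.0 && sum > 2.0 * diff) return -c;
       return 0.0;
   }

   double boxplus_approx(double L1, double L2) {
       double sgn = ((L1 < 0) != (L2 < 0)) ? -1.0 : 1.0;
       return sgn * std::min(std::fabs(L1), std::fabs(L2)) + s_tilde(L1, L2);
   }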

15.4.4.1 Approximate min∗ Decoder

Another complexity-reducing computation at the check nodes is called the approximate min∗ decoder, or a-min∗ decoder. It is based on the following idea. Suppose that a check node is connected to 𝑤c variable nodes. To compute a single message Lm→n by (15.6) requires (𝑤c − 1) ⊞ operations. Sending a separate message to each of the 𝑤c neighbors requires 𝑤c (𝑤c − 1) ⊞ operations. (The forward/backward computation does not reduce this.) The a-min∗ decoder provides a method in which only 𝑤c ⊞ operations are needed at each check node. Recall from Section 15.4.3.1 that the smallest reliability has the dominant effect on the ⊞ operation. That is, larger reliabilities have a less significant effect on the ⊞ operation than smaller reliabilities. Let

nmin = arg min_{n′ ∈ 𝒩(m)−{n}} |Ln′ →m |

denote the variable node with smallest reliability. The message sent to variable node nmin is the usual one:

Lm→nmin = ⊞_{n′ ∈ 𝒩(m)−{nmin}} Ln′ →m

(𝑤c − 2 ⊞ operations). The message sent to the other variable nodes is the ⊞ of all incoming messages (adjusted for their sign):

Lm→n (approximate) = ( ∏_{n′ ∈ 𝒩(m)−{n}} 𝛼n′ m ) | ⊞_{n′ ∈ 𝒩(m)} Ln′ →m |.


ldpcdecoder.cc: aminstardecode()


Figure 15.12: s(x, y), the approximation s̃(x, y), L1 ⊞ L2 , and the approximation using s̃(x, y). (a) s(x, y); (b) s̃(x, y); (c) exact L1 ⊞ L2 ; (d) approximate L1 ⊞ L2 .

The ⊞ in this case is computed over all variable nodes in 𝒩(m); the leave-one-out computation is not done. This is not the correct message, since the message Lm→n should not include the message Ln→m . But since Ln→m is not the smallest (and therefore most significant) message that might be included, the effect of the error is generally small. Since |Lm→n (approximate)| can be computed as |Lm→n (approximate)| = |Lm→nmin ⊞ Lnmin →m |, the message sent to all other (non-minimum) nodes can be computed using just a single additional ⊞ operation. The computations can also be organized in an efficient way which does not require finding nmin first. To explain this, note first that ∞ is an identity for the ⊞ operation: ∞ ⊞ x = x. As an example of this, ∞ ⊞ ∞ = ∞. (In code, some indicator of the ∞ concept has to be represented.) This is used in association with the initialization of the routine. In the implementation of the a-min∗ algorithm, ⊞ operations are computed as the algorithm proceeds. When a message with smaller reliability than any previously seen is encountered (which might not yet be the smallest), it is saved. If it turns out not to be the smallest, then it is ⊞-ed in at the next iteration. If a message does not have smaller reliability than any encountered previously, it is immediately ⊞-ed in.

15.4 Decoding LDPC Codes

665

Algorithm 15.3 a-Min∗ Algorithm for Check Node Computation

Function a-min∗(m, {Ln→m })   % a-min∗ operation at a check node
Inputs: m = this check node index; Ln→m = messages from adjacent variable nodes.
Outputs: {Lm→n }: messages from this check node to adjacent variable nodes.
Initialize: |Lnmin →m | = ∞, |Lm→nmin | = ∞, s = 1
for n ∈ 𝒩(m) do                         % for each neighboring variable node
   s = s ⋅ sign(Ln→m )                    % accumulate all signs
   if |Ln→m | < |Lnmin →m | then            % new smallest incoming message found
      |Lm→nmin | = |Lm→nmin | ⊞ |Lnmin →m |    % ⊞ the saved message (not the min)
      nmin = n                               % save index of least reliable message found so far
      |Lnmin →m | = |Ln→m |                    % save this least reliable message so far
   else
      |Lm→nmin | = |Lm→nmin | ⊞ |Ln→m |        % not the min: ⊞ this message immediately
   end
end
|Lm→∗ | = |Lm→nmin | ⊞ |Lnmin →m |           % ⊞ of all the messages, for the message to all other nodes
for n ∈ 𝒩(m) do                          % for each neighboring variable node
   if n = nmin then
      Lm→n = s ⋅ sign(Ln→m ) ⋅ |Lm→nmin |      % message to the least reliable node
   else
      Lm→n = s ⋅ sign(Ln→m ) ⋅ |Lm→∗ |         % message to all other nodes
   end
end
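A compact C++ rendering of this check-node routine is sketched below. The exact ⊞ on magnitudes is used, with infinity playing the role of the identity element; the function and variable names are ours and are not the aminstardecode() code itself.

   #include <vector>
   #include <cmath>
   #include <limits>

   static double chk(double a, double b) {                // box-plus of magnitudes
       if (std::isinf(a)) return b;
       if (std::isinf(b)) return a;
       return 2.0 * std::atanh(std::tanh(a / 2.0) * std::tanh(b / 2.0));
   }

   std::vector<double> aminstar_check_node(const std::vector<double>& Lin) {
       const double INF = std::numeric_limits<double>::infinity();
       int deg = (int)Lin.size(), nmin = -1, s = 1;
       double Lmin = INF;     // |L_{nmin->m}|: smallest incoming magnitude so far
       double Lleave = INF;   // |L_{m->nmin}|: box-plus of all other messages seen so far
       for (int n = 0; n < deg; n++) {
           double mag = std::fabs(Lin[n]);
           if (Lin[n] < 0) s = -s;
           if (mag < Lmin) { Lleave = chk(Lleave, Lmin); nmin = n; Lmin = mag; }
           else            { Lleave = chk(Lleave, mag); }
       }
       double Lall = chk(Lleave, Lmin);                   // box-plus of all messages
       std::vector<double> Lout(deg);
       for (int n = 0; n < deg; n++) {
           double mag = (n == nmin) ? Lleave : Lall;
           int sn = (Lin[n] < 0) ? -s : s;                // leave-one-out sign
           Lout[n] = sn * mag;
       }
       return Lout;
   }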


ldpcdecoder.cc: rcbpdecode()

15.4.4.2 The Reduced Complexity Box-Plus Decoder

Another quantization of the check node computation is provided by the reduced complexity box-plus (RCBP) decoder. The basic check node computation, operating on incoming messages a and b, is chk(a, b) = a ⊞ b = 2 tanh−1 (tanh(a∕2) tanh(b∕2)). If 6-bit quantization is used, integer values in the range −32 to 31 are obtained. If a quantization interval of Δ = 0.5 is used (which is convenient, as it can be represented with a bit shift), message values a or b are in the range −16 to 15.75. We have seen that the sign information can be factored out, a ⊞ b = sign(a)sign(b) 2 tanh−1 (tanh(|a|∕2) tanh(|b|∕2)). Leaving out the sign bit, the remaining 5 bits of the integer range are 0 to 31, corresponding to values from 0 to 15.5. Let r and c be integer indices used to look up values in a lookup table T representing the ⊞ operator for positive operands. Let Δ be a quantization step size, and define the table as T(r, c) = |quantize(chk(rΔ + Δ∕2, cΔ + Δ∕2))|.

The "quantize" operation may be performed (for example) by dividing by Δ and rounding down. A step size of Δ = 0.5 has been found to give good performance. The 32 × 32 table can be further reduced using some properties of the ⊞ function. In the first place, it is clear that a ⊞ b = b ⊞ a, so the lookup table satisfies the same symmetry T(r, c) = T(c, r). Also the table is monotonic nondecreasing, so that if r′ > r then T(r′ , c) ≥ T(r, c). Also, the ⊞ function is continuous, so that its quantized version in T increases by unit steps where it increases.

Example 15.12 As an example of the computation, consider a 6 × 6 table with Δ = 1.33, whose values (after dividing by Δ and truncating) are in the table below

T = ⎡0 0 0 0 0 0⎤
    ⎢0 0 1 1 1 1⎥
    ⎢0 1 1 2 2 2⎥
    ⎢0 1 2 2 3 3⎥
    ⎢0 1 2 3 3 4⎥
    ⎣0 1 2 3 4 4⎦

Observe that in the first column, there is only one value; in the second column, there is one transition; in the third column, there are two transitions; in the fourth column, there are three transitions, etc. As the column number increases, the number of changes is nondecreasing. ◽

For efficiency of storage, the information in T may be represented using a run-length coding down each column, starting from the diagonal element (since the table is symmetric). The run-length coding for column c is stored in a list of information called 𝐯c . The first element in the list is the value T(c, c). The next element is the number of times that element is repeated (including the diagonal). If there are more elements down the column, then the number of times it is repeated is appended to the list. (There is no need to indicate what the number is, because it will always be exactly one more than the previous number.) Example 15.13

For the T matrix above, the column information is as follows: 𝐯0 = (0, 6) 𝐯1 = (0, 1, 4) 𝐯2 = (1, 1, 3) 𝐯3 = (2, 1, 2) 𝐯4 = (3, 1, 1) 𝐯5 = (4, 1).



Since the number of transitions in a column (generally) increases with column number, but since later columns start further down the column (since they start on the diagonal), this results in an efficient representation. For a full quantizer with a table of size 32 × 32, the column information is

𝐯0 = (0, 32)
𝐯1 = (0, 2, 29)
𝐯c = (c − 1, 3, 29 − c),   2 ≤ c ≤ 29
𝐯c = (c − 1, 2 − 𝛿(c − 31)),   30 ≤ c ≤ 31.


Here, 𝛿(⋅) is the Kronecker 𝛿 function. The structure makes it possible to represent the table with simple if statements. Assuming that r ≥ c (that is, representing the lower half of the table), with d = r − c, the logic can be expressed as follows:

c = 0 :              T(r, c) = 0
c = 1 :              T(r, c) = 0 if d = 0 or 1;  T(r, c) = 1 if d > 1
c = 2, . . . , 29 :  T(r, c) = c − 1 if d = 0, 1, or 2;  T(r, c) = c if d > 2
c = 30 :             T(r, c) = 29
c = 31 :             T(r, c) = 30.

This logic can be summarized as follows in a way amenable to implementation in digital logic.

Algorithm 15.4 Quantize the chk Function

Function T(r, c)   % Quantize the ⊞ function
Enforce: r ≥ c
Output: T(r, c)
   d = r − c;
   if c > 1 then
      T = (c − 1) + (d > 2);
   else   % c = 0 or c = 1
      T = (d > 1) ⋅ (c == 1);
   end

The quantized ⊞ operation can be computed as follows: a⊞b ≈ Δ[T(floor(a∕Δ), floor(b∕Δ))] + Δ∕2. The performance of the quantized ⊞ operation is compared with ⊞ in figure 15.13 for different values of c, along with the percent error. The overall percent error across all different values of c is 2.64%. An efficient decoding can be obtained using the logic of the a-min∗ decoder but replacing the ⊞ computation with the quantized ⊞ operation.
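The table logic and the quantized ⊞ are easy to express in code, as the following sketch shows. It follows Algorithm 15.4 and the formula just given; the function names and the clamping to the 32 × 32 table are our own choices, not the showchktable.m or rcbpdecode() code.

   #include <algorithm>
   #include <cmath>

   int T_rcbp(int r, int c) {
       if (r < c) std::swap(r, c);          // table is symmetric
       int d = r - c;
       if (c > 1) return (c - 1) + (d > 2 ? 1 : 0);
       return (d > 1 && c == 1) ? 1 : 0;    // c = 0 or c = 1
   }

   double boxplus_rcbp(double a, double b, double delta = 0.5) {
       double sgn = ((a < 0) != (b < 0)) ? -1.0 : 1.0;
       int r = (int)std::floor(std::fabs(a) / delta);
       int c = (int)std::floor(std::fabs(b) / delta);
       r = std::min(r, 31); c = std::min(c, 31);           // clamp indices to the 32 x 32 table
       return sgn * (delta * T_rcbp(r, c) + delta / 2.0);  // a [+] b ~= D*T + D/2, with sign restored
   }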


showchktable.m

15.4.4.3 The Richardson–Novichkov Decoder

The Richardson–Novichkov decoder provides another approximation to the 𝜙 function in the check node decoder computation in (15.18), based on integer representations. The magnitude of the check node message is

|Lm→n | = 𝜙( ∑_{n′ ∈ 𝒩(m)−{n}} 𝜙(𝛽n′ m ) ),

where 𝛽 is a real number (such as a message |Ln′ →m |) and where

𝜙(𝛽) = log( (e𝛽 + 1)∕(e𝛽 − 1) ).


Figure 15.13: Comparison of ⊞ and its RCBP quantization, for c = 5, 10, 15, 20, 25, and 30.

For large 𝛽, log(1 ± e−𝛽 ) ≈ ±e−𝛽 , so 𝜙(𝛽) ≈ 2e−𝛽 . This approximation is shown (along with the exact 𝜙(𝛽) function) in Figure 15.14. As is clear from the figure, the approximation is fairly accurate for 𝛽 greater than about 2. Since 𝜙(𝛽) is its own inverse, another approximation to 𝜙(𝛽), denoted as 𝜙̂ −1 , is 𝜙̂ −1 (𝛽) = − ln(𝛽∕2), which is accurate for 𝛽 < ln 2. Now another approximation is introduced which uses integer values. Let 𝛿 be a scale factor (as discussed below) and let b be an integer such that 𝛽 = 𝛿b (approximately). The integer b will represent the value of 𝛽. Define the function

𝜙𝛿 (b) = log( (1 + e−𝛿b )∕(1 − e−𝛿b ) ).

Using the approximation from above, write

𝜙̂ 𝛿 (b) = 2e−𝛿b   and   𝜙̂ 𝛿−1 (b) = −(1∕𝛿) ln(b∕2).        (15.22)

Using the particular value 𝛿 = ln(2), the latter can be expressed as

𝜙̂ 𝛿−1 (b) = −log2 (b∕2).        (15.23)

The check node message is now expressed in terms of 𝜙𝛿 and its approximations. Let bn′ m denote the integerized representation of 𝛽n′ m . Then using (15.22) and (15.23), the check node message can be



Figure 15.14: Approximations to the 𝜙(𝛽) function for the Richardson–Novichkov decoder.

written

|Lm→n | ≈ 𝜙𝛿−1 ( ∑_{n′ ∈ 𝒩(m)−{n}} 𝜙𝛿 (bn′ m ) )
        ≈ −log2 ( ∑_{n′ ∈ 𝒩(m)−{n}} exp(−𝛿bn′ m ) )
        = −log2 ( ∑_{n′ ∈ 𝒩(m)−{n}} exp(− ln(2) bn′ m ) )
        = −log2 ( ∑_{n′ ∈ 𝒩(m)−{n}} 2^(−bn′ m ) ).

The inner part of the computation, 2^(−bn′ m ), can be achieved using bit shift operations. To compensate for the approximations, the algorithm can be tuned by introducing constants C1 and C2 , and expressly noting that integer values are used. Let [⋅] denote the integer part of its argument. Then the integer message is expressed as

|Lm→n | = [ C1 − log2 ( ∑_{n′ ∈ 𝒩(m)−{n}} 2^(−bn′ m ) + C2 ) ].

Further detail is provided in [371].
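A minimal sketch of the integerized check-node magnitude in this style is shown below. The SCALE constant, the clamping, and the function name are our own choices, and the C1, C2 tuning constants are omitted; it only illustrates that the sum of powers of two and the −log2 can be done with shifts.

   #include <vector>
   #include <cstdint>

   int rn_checknode_magnitude(const std::vector<int>& b) {
       const int SCALE = 20;                        // 2^-b represented as (1 << (SCALE - b))
       std::uint64_t acc = 0;
       for (int bi : b)
           acc += (bi < SCALE) ? (std::uint64_t{1} << (SCALE - bi)) : 1;  // accumulate 2^(SCALE-b)
       int log2acc = 0;                             // floor(log2(acc))
       while ((acc >> (log2acc + 1)) != 0) log2acc++;
       return SCALE - log2acc;                      // -log2(sum 2^-b), up to integer truncation
   }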

15.4.5 Hard-Decision Decoding

The decoders above are soft-decision decoders. But there are instances where a hard-decision decoder is of interest, such as in very high-speed communications that can afford to lose some of the error-correction capability of the code, or when the channel is well-modeled as a BSC. We present in this section several approaches to hard-decision decoding.


15.4.5.1 Bit Flipping

We illustrate the notion of bit flipping with an example. Let

H = ⎡1 1 1 0 0 1 1 0 0 1⎤
    ⎢1 0 1 0 1 1 0 1 1 0⎥
    ⎢0 0 1 1 1 0 1 0 1 1⎥
    ⎢0 1 0 1 1 1 0 1 0 1⎥
    ⎣1 1 0 1 0 0 1 1 1 0⎦

be a parity check matrix and let 𝐫 = [0 1 0 1 0 1 0 1 0 1] be a received vector (over a BSC). The syndrome is

𝐬 = 𝐫H T = [1 0 0 1 1]   (computations in GF(2)).

There are three parity conditions (rows of H) which are not satisfied. Now consider for each bit the number of unsatisfied parity checks it is involved in:

bit 1: 2 checks (1st and 5th)

bit 2: 3 checks (1st, 4th, and 5th)

bit 3: 1 check (1st)
bit 4: 2 checks (4th and 5th)

etc.

Summarizing, we get the following number of unsatisfied checks for the received bits:

Number of unsatisfied checks for each bit: [2 3 1 2 1 2 2 2 1 2].

Note that this list can be obtained by interpreting 𝐬 as a vector in ℤ (integers) and computing 𝐬H, now doing the arithmetic over the integers. From this list, the second bit is involved in the most unsatisfied checks. Flip the second bit to produce the modified received vector:

𝐫̂ = [0 0 0 1 0 1 0 1 0 1].

Compute the syndrome again:


ldpcdecoder.c: bitflip1decode()

𝐬 = 𝐫̂ H T mod 2 = [0 0 0 0 0].

In this case, a codeword is obtained. In general, in bit-flipping decoding, syndromes are computed, and the number of unsatisfied checks that each bit is involved in is computed. For bits involved in too many unsatisfied checks, flip the bit. Then repeat. The computations are detailed as follows using both matrix notation and the notation of the data structures associated with the parity check matrix. For an input binary vector 𝐫:

Algorithm 15.5 Bit Flipping Decoder

1. Set 𝐫̂ = 𝐫.
2. Compute the syndrome 𝐬 = 𝐫̂ H T , computations in GF(2). That is, for m = 1, 2, … , M compute
      sm = ⨁_{n ∈ 𝒩(m)} r̂n   (modulo-2 addition).
   If 𝐬 = 0, stop.
3. Compute 𝐟 = 𝐬H, computations in ℤ. That is, for n = 1, 2, … , N compute
      fn = ∑_{m ∈ ℳ(n)} sm   (integer addition).
   This computes the number of unsatisfied checks that each bit is involved in.
4. For those elements of 𝐟 greater than some threshold, flip the corresponding bits in 𝐫̂ . (As a variation on this step, flip the bits involved in the largest number of unsatisfied checks.)
5. Repeat from step 2 (up to some number of times).
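The following sketch implements this loop directly on a dense 0/1 parity check matrix. A dense H is used only to keep the illustration short (a practical implementation would use the sparse row/column lists described earlier), and the function name and threshold parameter are ours, not the bitflip1decode() routine.

   #include <vector>

   bool bitflip_decode(const std::vector<std::vector<int>>& H,
                       std::vector<int>& r, int max_iters, int threshold) {
       int M = (int)H.size(), N = (int)H[0].size();
       for (int it = 0; it < max_iters; it++) {
           std::vector<int> s(M, 0);                     // syndrome s = r H^T over GF(2)
           bool all_zero = true;
           for (int m = 0; m < M; m++) {
               for (int n = 0; n < N; n++) s[m] ^= H[m][n] & r[n];
               if (s[m]) all_zero = false;
           }
           if (all_zero) return true;                    // codeword found
           std::vector<int> f(N, 0);                     // f = s H over the integers
           for (int n = 0; n < N; n++)
               for (int m = 0; m < M; m++) f[n] += s[m] * H[m][n];
           for (int n = 0; n < N; n++)
               if (f[n] >= threshold) r[n] ^= 1;         // flip bits in too many bad checks
       }
       return false;                                     // failed to converge
   }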

15.4.5.2 Gallager's Algorithms A and B


Another hard decoding algorithm is Gallager’s Algorithm A. Let the channel be a BSC(p0 ). Let yn ∈ {0, 1} denote the channel input. Decoding is as follows:


Algorithm 15.6 Gallager Algorithm A Bit Flipping Decoder Initialization Elements in the initial matrix Bn→m corresponding to nonzero elements in the parity check matrix are set to yn . That is, for each n = 0, 1, … , N − 1, Bn→m = yn for m ∈ (n) Check Node to Variable Node: The message Bm→n is the parity of the adjacent variable (bit) nodes, except the node the parity is being sent to, in the usual leave-one-out fashion: ⨁ Bn′ →m (modulo 2 addition) Bm→n = n′ ∈ (m)−{n}

Variable Node Computation: Set Bn→m = yn (the channel input bits) unless all of the incoming messages Bm′ →n disagree with yn . That is, { yn if ∃m′ ∈ (n) − m such that Bm′ →n = yn Bn→m = 1 − yn otherwise (that is, all the Bm′ →n disagree with yn ) Decoding Decision The decision about the bit cn is based on Bn→m . If |(n)| is even, then ĉ n = 1 if the majority of the bits in the set {Bn→m ∶ m ∈ (n)} ∪ {yn } are equal to 1. If |(n)| is odd, then ĉ n = 1 if the majority of the bits in the set {Bn→m ∶ m ∈ (n)} are equal to 1.

Gallager’s Algorithm B is similar to Algorithm A. The check node computation is identical. At the variable node, if among the messages Bm→n there are at least b values that disagree with yn , then the message Bn→m is set to 1 − yn . Otherwise, Bn→m = yn . (Gallager Algorithm A is the same as Gallager Algorithm B with b = 1.) Gallager showed that for a regular (𝑤c , 𝑤𝑣 ) code, the optimum value of b is the smallest integer b such that [ ]2b−d𝑣 +1 1 − p0 1 + (1 − 2p)𝑤c −1 ≤ , p0 1 − (1 − 2p)𝑤c −1 where p0 is the crossover probability for the BSC and p is the message error rate. It has been shown [15, p. 21, Chapter 3] that Gallager Algorithm B is the best possible binary message-passing algorithm for regular LDPC codes. Example 15.14

For the parity check matrix from (15.1), let the received data be 𝐲 = [0 1 0 1 0 1 0 1 0 1].

The initial Bn→m is

Bn→m = ⎡0 1 0 0 0 1 0 0 0 1⎤
       ⎢0 0 0 0 0 1 0 1 0 0⎥
       ⎢0 0 0 1 0 0 0 0 0 1⎥
       ⎢0 1 0 1 0 1 0 1 0 1⎥
       ⎣0 1 0 1 0 0 0 1 0 0⎦

ldpcdecoder.c: galAbitflipdecode()


(Underlined elements correspond to nonzero elements in the parity check matrix.) The message Bm→n is

Bm→n = ⎡1 0 1 0 0 0 1 0 0 0⎤
       ⎢0 0 0 0 0 1 0 1 0 0⎥
       ⎢0 0 0 1 0 0 0 0 0 1⎥
       ⎢0 0 0 0 1 0 0 0 0 0⎥
       ⎣1 0 0 0 0 0 1 0 1 0⎦

The message Bn→m is

Bn→m = ⎡0 0 0 0 0 1 0 0 0 1⎤
       ⎢1 0 0 0 0 0 0 0 0 0⎥
       ⎢0 0 0 0 0 0 1 0 0 0⎥
       ⎢0 0 0 1 0 1 0 1 0 1⎥
       ⎣0 0 0 1 0 0 0 1 0 0⎦


The column sums (used to determine if a majority of messages are 1) are [1 0 0 2 0 2 1 2 0 2].

Based on a majority (out of 3 values), the decoded values are 𝐜̂ = [0 0 0 1 0 1 0 1 0 1]. This is a codeword which satisfies all parities, so decoding is done.

15.4.5.3 Weighted Bit Flipping




Let 𝐜 ∈ {0, 1}N denote a codeword, and let 𝐜̃ ∈ {−1, 1}N denote the bits of the codeword represented as ±1 values, c̃ i = 1 − 2ci . Let the codeword be BPSK modulated and passed through an AWGN channel to produce 𝐲 = 𝐜̃ + 𝐳, where zi ∼ 𝒩(0, 𝜎 2 ). Some reliability of the received bit can be determined by |yn |: smaller values are less reliable. In this ±1 representation of bits, parity can be checked by multiplication. For example,

∏_{n∈𝒩(m)} c̃ n

computes the parity associated with check m. When these bits satisfy the parity check, then this product is equal to 1. Weighted bit flipping (WBF) deals with a function termed an inversion function, formed as follows. Let 𝐱 = (x1 , . . . , xN ) ∈ {−1, 1}N , and let

Δn(WBF) (𝐱) = ∑_{m∈ℳ(n)} 𝛽m ∏_{n′ ∈𝒩(m)} xn′ .

This is the weighted sum of all the parity products over all the parities that bit n is involved in. Here, 𝛽m is the reliability, determined as the smallest absolute received value of the bits associated with check m:

𝛽m = min_{n′ ∈𝒩(m)} |yn′ |.

The function Δn(WBF) (𝐱) can be viewed as a measure of reliability of 𝐱. Smaller values, due either to small 𝛽 or to unsatisfied parities (resulting in the product being negative), indicate less desirable bit configurations than large values.


A variation on this inversion function is used in the modified weighted bit flipping algorithm (MWBF), which employs the received value as

Δn(MWBF) (𝐱) = 𝛼|yn | + ∑_{m∈ℳ(n)} 𝛽m ∏_{n′ ∈𝒩(m)} xn′ .

01010

Using either Δ(WBF) (𝐱) or Δ(MWBF) (𝐱), the bit flipping works by identifying the bit index n having smallest Δn (𝐱) (indicative of the least unreliable bit). The decoding algorithm can be summarized as follows.

00000

α10 α11 α12 α13 α14 α15 α16

ldpcdecoder.cc: weightedbitflip decode()

Algorithm 15.7 Weighted Bit Flipping Decoder

Step 1 Initialization (for BPSK over AWGN): For n = 1, … , N, let xn = sign(yn ) (quantize the received data to bit values). Let 𝐱 = (x1 , … , xN ).
Step 2 If all parities are satisfied, exit. That is: if ∏_{n∈𝒩(m)} xn = 1 for m = 1, … , M, then exit.
Step 3 Find the least reliable bit position: 𝓁 = arg min_{n∈[1,N]} Δn (𝐱). Flip the corresponding bit: x𝓁 → −x𝓁 .
Step 4 If the maximum number of iterations is not exceeded, go to Step 2. Otherwise, return 𝐱 and indicate decoding failure.
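A short sketch of this loop using the Δ(WBF) inversion function is given below. Nm[m] lists the variable nodes attached to check m and x holds ±1 bit values; these names, and the function name, are illustrative assumptions rather than the book's weightedbitflipdecode() code.

   #include <algorithm>
   #include <cmath>
   #include <vector>

   bool wbf_decode(const std::vector<std::vector<int>>& Nm,
                   const std::vector<double>& y, std::vector<int>& x, int max_iters) {
       int M = (int)Nm.size(), N = (int)y.size();
       std::vector<double> beta(M);
       for (int m = 0; m < M; m++) {                 // reliability of each check
           double b = 1e300;
           for (int n : Nm[m]) b = std::min(b, std::fabs(y[n]));
           beta[m] = b;
       }
       for (int it = 0; it < max_iters; it++) {
           std::vector<double> Delta(N, 0.0);
           bool all_ok = true;
           for (int m = 0; m < M; m++) {
               int prod = 1;
               for (int n : Nm[m]) prod *= x[n];     // +1 if parity satisfied, -1 if not
               if (prod < 0) all_ok = false;
               for (int n : Nm[m]) Delta[n] += beta[m] * prod;
           }
           if (all_ok) return true;
           int worst = 0;                            // least reliable bit position
           for (int n = 1; n < N; n++) if (Delta[n] < Delta[worst]) worst = n;
           x[worst] = -x[worst];                     // flip it and iterate
       }
       return false;
   }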

00000

α10 α11 α12 α13 α14 α15 α16

ldpcdecoder.cc: modweighted bitflipdecode() α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

15.4.5.4 Gradient Descent Bit Flipping


Let the bits of a codeword be represented by 𝐱 = (x1 , x2 , . . . , xN ) ∈ {+1, −1}N . Assuming BPSK modulation, let the transmitted signal be sn = axn for some amplitude a. The signal received through an AWGN channel is

yn = sn + 𝜈n .

ldpcdecoder.cc: grad1bitflipdecode()
ldpcdecoder.cc: grad2bitflipdecode()

The likelihood function is

exp[ −( ∑_{n=1}^{N} (yn − axn )² )∕2𝜎² ] = exp[ −( ∑_{n=1}^{N} yn² )∕2𝜎² ] exp[ ( 2a ∑_{n=1}^{N} xn yn )∕2𝜎² ] exp[ −( a² ∑_{n=1}^{N} xn² )∕2𝜎² ].

Taking a logarithm and neglecting noninformative terms, the term

∑_{n=1}^{N} xn yn

may be regarded as the likelihood function. A codeword 𝐱 is sought that maximizes this correlation. To enforce the fact that 𝐱 needs to be a codeword, consider the product ∏_{n∈𝒩(m)} xn for m = 1, . . . , M. This is equal to 1 when the bits satisfy the mth parity constraint. For gradient descent bit flipping, we define an objective function combining both the likelihood consideration and the parity consideration,

f (𝐱) = ∑_{i=1}^{N} xi yi + ∑_{m=1}^{M} ∏_{n′ ∈𝒩(m)} xn′ .        (15.24)

674

Low-Density Parity-Check Codes: Introduction, Decoding, and Analysis The function f (𝐱) is maximized when the likelihood is maximized and when all the constraints are satisfied (the product is equal to 1). Thus, decoding can be posed as the problem 𝐱̂ = arg

max

𝐱∈{−1,1}N

f (𝐱).

(15.25)

This may equivalently be expressed as a minimization problem 𝐱̂ = arg

min

𝐱∈{−1,1}N

−f (𝐱).

(15.26)

Under the gradient-based bit flipping algorithms, the solution is approached using gradient-based information. It is natural to think of maximizing, as in (15.25), in which case the operation is gradient ascent. But the literature frequently refers to gradient descent, in which case the problem (15.26) is solved using gradient descent. With the intent of doing maximization or minimization, the partial derivative of (15.24) is computed as ∏ ∑ 𝜕 f (𝐱) = yn + xn′ . 𝜕xn m∈(n) n′ ∈ (m)∖n The product of xn and this partial derivative is xn

∏ ∑ 𝜕 f (𝐱) = xn yn + xn′ . 𝜕xn m∈(n) n′ ∈ (m)

Let s be a small real number and consider the first-order Taylor expansion f (x1 , . . . , xn + s, . . . , xN ) ≈ f (𝐱) + s It is desired to choose s to increase this function. When

𝜕 f (𝐱). 𝜕xn

𝜕 f (𝐱) > 0, s should be > 0. When 𝜕xn

𝜕 𝜕 f (𝐱) < 0, s should be < 0. In light of this, if xn f (𝐱) < 0, then flipping xn by setting xn → −xn is 𝜕xn 𝜕xn likely to increase the objective function. A reasonable way to determine which bit position n to flip is to choose the position at which the absolute value of the partial derivative is the largest. We formulate the function ∏ ∑ (𝐱) = xn yn + xn′ . Δ(GD) n m∈(n) n′ ∈ (m) α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

ldpcdecoder.cc: DC1decode()

15.4.6 Divide and Concur Decoding


In this section, an alternative approach to decoding is described which differs from the message-passing paradigm of previous sections.

15.4.6.1 Summary of the Divide and Concur Algorithm

The divide and concur (DC) algorithm is a flexible meta-algorithm that has been applied to a variety of problems such as the Boolean satisfiability problem. As shown below, it maps very well onto the problem of decoding. Because of its power and flexibility, it merits broader acquaintance among the general information-processing community. A problem described by N parameters is decomposed into a set of M + 1 constraint subproblems, each of which involves some subset of all the parameters (with overlap among the subsets). (The M + 1 looks ahead to the decoding problem, where there are M parity constraints, and an additional subproblem associated with an energy constraint.) Let N = {0, 1, . . . , N − 1} denote the set of indices associated with the parameters, M = {0, 1, . . . , M − 1} denote the set of indices associated with the

15.4 Decoding LDPC Codes

675

+ subproblems (the parity subproblems in the decoding case), and M = {0, 1, . . . , M} (including the energy constraint subproblem in the decoding case). Let m denote the set of indices of parameters associated with the mth subproblem. In the decoding problem, { for m ∈ M m m = N for m = M.

Let n be the set of indices of subproblems associated with the nth variable. For the decoding problem, n = n ∪ M. Information about the subproblems may be represented in a matrix 𝐫, which contains a replica of each of the parameters for each subproblem. The subset of parameters associated with the mth subproblem is denoted as 𝐫(m) = {rmn , n ∈ m }. These may be thought of as a subset of elements on the mth row of the matrix 𝐫. For each parameter n, let 𝐫[n] denote the set of replicas of variable rn across the subproblems, 𝐫[n] = {rmn , m ∈ n }. These may be thought of as a subset of the elements of the nth column of 𝐫. The operation of DC is illustrated in Figure 15.15. Each subproblem is represented by a constraint set m and a local solution is found for each subproblem by projecting its subset of parameters onto its constraint set m . This is called the divide step, since the parameters as separately projected (divided). In Figure 15.15(a), there are two replicas 𝐫(0) and 𝐫(1) . These replicas are projected onto their constraint sets. The “divide” projection operation associated with the mth constraint, denoted as ∶ ℝ|m | → ℝ|m | for m ∈ M , produces the vector in ℝ|m | that is closest to 𝐫(m) which satisfies the Pm D mth constraint: min ‖̃𝐫(m) − 𝐫(m) ‖2 . 𝐫̃ (m) = Pm D (𝐫(m) ) ≜ arg 𝐫̃ (m) ∶̃𝐫(m) satisfies constraint m

The set of all the divide projections is denoted as PD (𝐫) ∶ ℝ(M+1)×N → ℝ(M+1)×N , which operates as (𝐫 ) as a row vector, then stack into the appropriate columns follows: For each m ∈ M , compute Pm D (m) to form the matrix PD (𝐫). It has been found that using the projected values directly can result in the algorithm getting trapped at parameters that are not the best. Instead, an overshoot is used. As shown in Figure 15.15(a), there is (𝐫 ) − 𝐫(0) (shown as an arrow). The overshoot is computed as a vector Pm D (m) 𝐫(m) + 2[Pm D (𝐫(m) ) − 𝐫(m) ], (𝐫 ) − 𝐫(m) away from 𝐫(m) . The overshoot computation which moves two times the direction of Pm D (m) is shown in Figure 15.15(b). Collectively, each of these overshoot values are collected together in the rows of a matrix (placed into appropriate columns) denoted as 𝐫 + 2[PD (𝐫) − 𝐫]. r(1) r(1) PD1 (r(1)) r(0) PD0 (r(0))

r(1) + 2[P1D (r(1))–r(1)]

(a)

r[t(1)+1] r(0)

0 1

(b)

r[t(0)+1]

PC

PD0 (r(0))

r(0) + 2[P1D (r(0))–r(0)]

0 1

r(1)

PD1 (r(1))

r(0) 0

1

(c)

Figure 15.15: Steps in the DC algorithm. (a) Projection of 𝐫[0] and 𝐫[1] onto 0 and 1 ; (b) 𝐫(m) + (𝐫 ) − 𝐫(m) ) (Overshoot); (c) concur projection, and results of difference map. 2(Pm D (m)

676

Low-Density Parity-Check Codes: Introduction, Decoding, and Analysis After projection and overshoot on each subproblem, there may be different values for the parameters (different values on different rows of column n of the matrix), so the different solutions to the subproblems are combined together in the concur step. For each variable n, the concur projection PnC ∶ ℝ|n | → ℝ associated with the nth variable projects to a value in which all the replicas associated with parameter n (that is, in column n) concur. For each n this results in a scalar value bn = PnC ((𝐫 + 2[PD (𝐫) − 𝐫])[n] ). The set of all concur projections for all n ∈ N , denoted as PC (𝐫) ∶ ℝ(M+1)×N → ℝ(M+1)×N , operates as follows: For each n ∈ N , compute bn , then replicate these to form a column vector with values at indices m′ ∈ n . These columns are stacked to form the matrix PC (𝐫). The concur step is illustrated in figure 15.15(c) as the square block that is the average of the two overshoot points. Following the concur step, the updated parameter value is obtained by removing from the concurrence value in each row the original difference vector P(m) (𝐫(m) ) − 𝐫(m) , resulting in an update d rule known as the difference map [170, 171] 𝐫 [t+1] = PC (𝐫 [t] + 2[PD (𝐫 [t] ) − 𝐫 [t] ]) − [PD (𝐫 [t] ) − 𝐫 [t] ].

(15.27)

This is illustrated in Figure 15.15(c). The update rule in (15.27) is actually a simplification of a more general update rule proposed by [170, 171], 𝐫 [t+1] = 𝐫 [t] + 𝛽[PC (PD (𝐫 [t] ) − PD (fC (𝐫 [t] ))], where fs (𝐫 [t] ) = (1 + 𝛾s )Ps (𝐫 [t] ) − 𝛾s 𝐫 [t] for s = C or D, with 𝛾C = −1∕𝛽 and 𝛾D = 1∕𝛽. While 𝛽 can be tuned for performance, the update rule (15.27) takes 𝛽 = 1. [t+1] After a single iteration of the difference map update, the replicas 𝐫(m) may not satisfy the constraints of the subproblems, so the process is iterated, repeating the divide, concur, and difference map steps until convergence. It can be shown that if there is a fixed point 𝐫 ∗ such that 𝐫 [t+1] = 𝐫 [t] , then it corresponds to a solution to the problem, where the solution is found from PD (𝐫 ∗ ) or PC (𝐫 ∗ + 2[PD (𝐫 ∗ ) − 𝐫 ∗ ]. It has been shown that for many problems the difference map update may be less likely to get caught in non-optimum traps than other update rules, such as updating according to 𝐫 [t+1] = PC (PD (𝐫 [t] )). 15.4.6.2

DC Applied to LDPC Decoding

In applying DC to decoding, there are N parameters which are the decoded bit values, and M + 1 constraints. The first M constraints are the parity constraints associated with the M rows of the parity check matrix. The (M + 1)st constraint is that the codeword should correspond with the received information. In the development here, BPSK modulation is assumed here, but higher order modulation can be accommodated. A codeword 𝐜 = [c0 , c1 , . . . , cN−1 ]T , with ci ∈ {0, 1}, is also represented as 𝐱 = [x0 , x1 , . . . , xN−1 ]T with xn = 1 − 2cn . The transmitted vector is 𝐬 = (s0 , s1 , . . . , sN−1 ), where { a cn = 0 sn = = axn . −a cn = 1 Denote the received data as 𝐲. The log-likelihood ratios are ( ) p(yn |xn = 1) ; Ln = log p(yn |xn = −1)

15.4 Decoding LDPC Codes

677

Ln > 0 indicates that xn = 1 (that is, cn = 0) is more probable. Recall from (15.11) that for the AWGN channel the LLR can be expressed as Ln =

2a y , n ∈ N . 𝜎2 n

These log-likelihood ratios are collected into a vector ]T [ 𝓵 = L0 L1 · · · LN−1 . A negative “energy” term is defined which measures the similarity between estimated ±1 bit values {̂xn } and the log-likelihood ratios N ∑ ̂ (15.28) Ln x̂ n = −𝓵 T 𝐱. E=− n=1

The smaller (more negative) E is, the more similar 𝐱̂ is to 𝐲. In the DC algorithm, one of the constraints is to choose a decoding vector 𝐱̂ to make the energy as negative as possible. More precisely, a threshold is set, and values are selected to make the negative energy less than that threshold. As suggested by [497], rather than using the log-likelihood ratios directly, a sort of signed probability is used as the information for the algorithm. First, compute the bit probabilities according to pn =

eLn ∈ (0, 1) 1 + eLn

and then map these into p̃ n = 2pn − 1 ∈ (−1, 1). If p̃ n > 0, then it is more probable that xn = 1. Example 15.15

For the (10, 5) code with the data in Example 15.9, the signed probability vector is [ ] 𝐩̃ = 0.56 0.68 0.62 0.04 −0.10 −0.74 0.64 −0.58 0.50 0.52 .



Data associated with the algorithm is represented in the (M + 1) × N matrix 𝐫 (or its sparsely represented equivalent) whose nonzero elements are in positions corresponding to the nonzero elements of the parity check matrix for the code, with elements rmn , with an additional row to represent the replicas associated with the energy constraint. As above, let 𝐫(m) denote the (row) vector of data associated with the mth constraint. For m = 0, 1, . . . , M − 1, 𝐫(m) = {rmn , n ∈  (m)}, that is, the data from the mth row of the 𝐫 matrix associated with the variable nodes connected to the mth check. For m = M, the data from the mth row of 𝐫 is associated with the energy constraint. Let 𝐫[n] denote the vector of data associated with the nth variable, 𝐫[n] = {rmn , n ∈ n }. Example 15.16

For the example above, the initial 𝐫 is

𝐫 [0]

⎡0.56 ⎢ ⎢0.56 ⎢ =⎢ ⎢ ⎢ ⎢0.56 ⎢ ⎣0.56

0.68 0.62

−0.74 0.64

0.62

−0.1 −0.74

0.62 0.04 −0.1

0

0.68

0.04 −0.1 −0.74

0.68

0.04

−0.58 0.5 0.64

0.5 −0.58

0.64 −0.58 0.5

0.68 0.62 0.04 −0.1 −0.74 0.64 −0.58 0.5

−0.52⎤ ⎥ ⎥ −0.52⎥⎥ . −0.52⎥ ⎥ ⎥ ⎥ −0.52⎦

(15.29)

678

Low-Density Parity-Check Codes: Introduction, Decoding, and Analysis Then, for example, 𝐫(0) = [0.56, 0.68, −0.74, 0.64, −0.52] is the data associated with the first parity constraint, and 𝐫(M) = [0.56, 0.68, 0.62, 0.04, −0.1, −0.74, 0.64, −0.58, 0.5, −0.52) is associated with the energy constraint, and

𝐫[0]

⎡0.56⎤ ⎥ ⎢ ⎢0.56⎥ =⎢ ⎥ ⎢0.56⎥ ⎢0.56⎥ ⎦ ⎣ ◽

is associated with variable n = 0.

We present first the overall algorithm, making use of the divide projections PD and concur projection PC which are explained below. Algorithm 15.8 Divide and Concur LDPC Decoder 1. Initialization: Given likelihoods Ln , form 𝓁 and compute pn = eLn ∕(1 + eLn ) and p̃ n = 2pn − 1. Fill the [0] = p̃ n for m ∈ n . Let t = 0. replicant matrix 𝐫 [0] according to rmn 2. Compute the overshoot 𝐪[t] = 𝐫 [t] + 2[PD (𝐫 [t] ) − 𝐫 [t] ]

(15.30)

Here, PD (⋅) is the divide projection operator, discussed below. 3. Compute the belief information at each variable node using the concur projection, bn (t) = PnC (𝐪[t] )= [n]

1 ∑ [t] q |n | m∈ mn

n ∈ N .

n

and form a bit from this by

{ ĉ n =

1 0

bn (t) < 0 bn (t) > 0.

From these bits form 𝐜̂[t] . If parities all check, then output 𝐜̂ as the decoded codeword. 4. Otherwise form the matrix PC (𝐪[t] ) by duplicating bn (t) across the rows of the matrix and compute 𝐫 [t+1] = PC (𝐪[t] ) − [PD (𝐫 [t] ) − 𝐫 [t] ]

15.4.6.3

The Divide Projections

Parity Constraints Let cm (𝐫(m) ) = sat denote that the data in 𝐫(m) satisfy the parity constraint. This is determined as follows: Form hmn = sign(rmn ), where sign(r) = 1 if r > 0 and sign(r) = −1 if r < 0. The {hmn }, representing signed bits, satisfy parity if ∏ hmn = 1, n∈ (m)

15.4 Decoding LDPC Codes

679

that is, if there are an even number of −1s among them. Thus, ∏ cm (𝐫(m) ) = sat ↔ sign(rmn ) = 1. n∈ (m)

(𝐫 ) computes a vector 𝐫̃ (m) nearest to 𝐫(m) which For the vector 𝐫(m) , m ∈ M , the projection Pm D (m) satisfies the mth parity constraint, 𝐫̃ (m) = Pm D (𝐫(m) ) ≜ arg

min

𝐫̃ (a) ∶cm (̃𝐫(m) )=sat

‖̃𝐫(m) − 𝐫(m) ‖2 ,

m = 1, 2, . . . , M,

(15.31)

where ‖⋅‖2 is the squared Euclidean norm. The projection operation (15.31) is computed as follows.

• • •

Let hmn = sign(rmn ). Let 𝐡m denote the vector consisting of the elements of hmn . If 𝐡m contains an even number of −1s, then set Pm (𝐫 ) = 𝐡m and return. D (m) If 𝐡m does not contain an even number of −1s (a change is necessary), then let 𝜈 = arg minn∈m |rmn |. It is the index of the smallest element whose flipped value produces a vector satisfying parity. (If there are multiple indices with minimum value, select one at random.) (𝐫 ) = 𝐡m and return. Flip hm𝜈 by hm𝜈 → −hm𝜈 . Then set Pm D (m)

Energy Constraint In the DC algorithm, one of the constraints (goals) will be to make the negative ̂ The constraint is imposed that energy as small as possible, indicating that 𝐲 is similar to 𝐱. −

N ∑

Ln x̂ i sign(a) ≤ Emax ,

(15.32)

n=1

where Emax is a parameter of the decoding algorithm. It is reported [497] that a good choice is ∑ Emax = −(1 + 𝜖) Nn=1 |Ln |, where 0 < 𝜖 ≪ 1. The computed energy can never actually achieve this, but this aspirational bound works well, and in any event achievement of the bound is not used to determine convergence. The projection onto this constraint is done as follows.

• •

If the energy constraint (15.32) is already satisfied by 𝐫(M) , then PM (𝐫 ) = 𝐫(M) . D (M) Otherwise, find the vector 𝐡 which is the closest vector to 𝐫(M) and satisfies the energy constraint: 𝐡 = arg

min

𝐡∶−𝓵 T 𝐡≤Emax

‖𝐫(M) − 𝐡‖22 .

The solution to this inequality-constrained optimization problem is found as follows. Form the Lagrangian  = ‖𝐫(M) − 𝐡‖2 − 2𝜆(𝓵 T 𝐡 − Emax ). Computing the gradient and setting it to 0 yields 𝜕 = −2𝐫(M) + 2𝐡 − 2𝜆𝓵 = 0 𝜕𝐡 so 𝐡 = 𝐫(M) + 𝜆𝓵.

680

Low-Density Parity-Check Codes: Introduction, Decoding, and Analysis The Lagrange multiplier 𝜆 is selected so that −𝐡T 𝓵 = Emax , which leads to

𝓵(𝓵 T 𝐫(M) + Emax )

𝐡 = 𝐫(M) −

𝓵T 𝓵

Thus, in this case, PM D (𝐫(M) ) = 𝐫(M) −

Example 15.17

.

𝓵(𝓵 T 𝐫(M) + Emax ) 𝓵T 𝓵

.

For the 𝐫 [0] in (15.29), the divide step gives ⎡ 1 1 ⎢ ⎢ 1 ⎢ PD (𝐫 [0] ) = ⎢ ⎢ 1 ⎢ 1 1 ⎢ ⎢ ⎣0.91 1.1

−1 ⎤ ⎥ −1 −1 1 1 1 ⎥ 1 1 −1 1 1 −1 ⎥ ⎥. 1 −1 −1 −1 −1 ⎥ ⎥ −1 1 −1 10.0 ⎥ ⎥ 1 0.062 −0.16 −1.3 1.1 −0.95 0.81 −0.85⎦ 1

−1

1



Underlined elements are flipped in the parity projection.

15.4.6.4

The Concur Projection

For each n ∈ N , there are |n | replicas of the information rmn . In the concur projection, these |n | replicas are combined together in a concurrence bn which is closest to all of the replicas: ∑ (r − rmn )2 . bn = arg min r

m∈n

Taking the derivative with respect to r and solving yields 1 ∑ bn = r , |n | m∈ mn n

so the concur projection PnC simply averages across the information in column n. As mentioned above, these projected values (scalars) can be replicated along columns and then stacked to form the projection matrix. Example 15.18 The concur projection is applied to the overshoot 𝐪[t] = 2PD (𝐫 [t] ) − 𝐫 [t] , so this is computed first. The overshoot is ⎡1.4 ⎢1.4 ⎢ 𝐪[t] = ⎢ ⎢ ⎢1.4 ⎢ ⎣1.3

1.3 1.4 1.4 2.1 1.4 2.0 −1.9 1.3 2.0 −1.9 1.3 −2.0 1.6 1.4 0.085 −0.21

−1.3 1.4 −1.3 −1.4 1.4 −1.3 −1.4 1.4 −1.4 −1.8 1.5 −1.3

−1.5⎤ ⎥ 1.5 ⎥ 1.5 −1.5⎥ . −1.5⎥ ⎥ 1.5 ⎥ 1.1 −1.2⎦

15.4 Decoding LDPC Codes

681

Its concur projection is ⎡1.4 ⎢1.4 ⎢ PC (2PD (𝐫 [0] ) − 𝐫 [0] ) = ⎢ ⎢ ⎢1.4 ⎢ ⎣1.3

1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4

−1.4 1.4 −0.48 −1.4 −1.4 0.49 −0.48 1.4 0.49 −0.48 −1.4 −1.4 0.49 1.4 −1.4 0.49 −0.48 −1.4 1.4 −1.4

−1.4⎤ ⎥ 1.4 ⎥ 1.4 −1.4⎥ . −1.4⎥ ⎥ 1.4 ⎥ 1.4 −1.4⎦

This does not satisfy all the parities, so the parameters are updated for the next iteration. The updated parameters are 𝐫 [t+1] = PC (2PD (𝐫 [0] ) − 𝐫 [0] ) − (PD (𝐫 [0] ) − 𝐫 [0] ) 1 −1.1 1 −0.92⎤ ⎡0.96 1.1 ⎢0.96 ⎥ 1 −1.6 −1.1 −0.97 0.91 ⎢ ⎥ 1 −0.47 0.42 1 0.91 −0.92 ⎥. =⎢ 1.1 −0.47 0.42 −1.1 −0.97 −0.92⎥ ⎢ ⎢0.96 1.1 ⎥ 1.5 1 −0.97 0.91 ⎢ ⎥ 0.93 0.98 0.47 −0.42 −0.86 0.97 −1.0 1.1 −1.1 ⎦ ⎣ 1

15.4.6.5



A Message-Passing Viewpoint of DC Decoding

[t] The DC decoding algorithm can be interpreted as a message-passing algorithm. Let rmn be interpreted [t] as a message from variable node n to message node m, dn→m (t). Let the overshoot qmn from (15.30) be interpreted as the message from check node m to variable node n: [t] dn→m (t) = rmn

[t] dm→n (t) = qmn .

Note that PD (𝐫 [t] ) − 𝐫 [t] =

1 [t] [𝐪 − 𝐫 [t] ] 2

so that the updated message is 1 dn→m (t + 1) = bn (t) − [dm→n (t) − dn→m (t)]. 2 15.4.7

Difference Map Belief Propagation Decoding

Difference Map Belief Propagation (DMBP) decoding combines ideas from belief propagation (BP) decoding with the difference map from DC decoding. The decoder described here is a variation on min-sum decoding. Recall that the min-sum decoding messages can be computed using the (by now familiar) leave-one-out rule ∏ ∏ Lm→n = |Ln′ →m | sign(Ln′ →n ). (15.33) n′ ∈ (m)

n′ ∈ (m)

Example 15.19 Consider the use of the min-sum decoding algorithm at a check node m having four adjacent variable nodes, where three of the messages Ln′ →m are positive and one is negative. The outgoing messages Lm→n′ computed by (15.33) will have three negative messages and one positive message. Since the parity check is, in reality, connected to an even number of negative nodes — even parity — the presence of three computed negative messages in some sense overshoots the correct answer. ◽

682

Low-Density Parity-Check Codes: Introduction, Decoding, and Analysis In the standard min-sum decoding rule, the variable node computation uses the sum of the channel information and the messages from check nodes (using leave-one-out computation). For DMBF propagation, the check node computation is more of an average, and includes all incoming messages (no leave-one-out): ( ) ∑ Lm→m (t) . (15.34) bn (t) = Z Ln + m∈n

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

Z is a decoder parameter which may be tuned to optimize decoder performance. Typical values are Z = 0.35 or Z = 0.45. Because of the overshoot, an final step is to adjust the message for the next step by

00000

1 Ln→m (t + 1) = bn (t) − [Lm→n (t) − Ln→m (t)]. 2

α10 α11 α12 α13 α14 α15 α16

ldpcdecoder.cc: DMBPdecode()

(15.35)

The DMBP decoder is summarized below. Algorithm 15.9 Difference Map Belief Propagation Decoder 1. Initialization: Set t = 0. Set Ln→m (t) = Ln for m ∈ n , where Ln is the channel log likelihood. 2. Compute messages from checks to variables: Compute Lm→n using (15.33). 3. Compute bit probabilities and bit estimates: Compute bn (t) using (15.34). Set ĉ n = 1if bn (t) < 0 and ĉ n = 0 if bn (t) > 0. If these bits satisfy all parities (H 𝐜̂ = 0), output 𝐜̂ as the decoded codeword and return. 4. Compute messages from variable nodes to check nodes using (15.35).

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

15.4.8 α10 α11 α12 α13 α14 α15 α16

ldpcdecoder.cc: LP decode()

Linear Programming Decoding

Linear programming decoding is an interesting alternative to other decoders, providing some interesting theoretical insight, although possibly at the expense of some computational complexity. The presentation here is rather leisurely, with examples to illustrate the general concept. 15.4.8.1

Background on Linear Programming

Before turning to the decoding question, we summarize briefly the linear programming (LP) problem. A LP problem has a linear objective function of the vector 𝐱 ∈ ℝn to maximized or minimized, subject to linear inequality constraints minimize 𝐝T 𝐱 subject to 𝐚Ti 𝐱 ≤ bi

i = 1, 2 . . . , m.

A LP problem may also include some linear equality constraints minimize 𝐝T 𝐱 subject to 𝐚Ti 𝐱 ≤ bi 𝐟iT 𝐱 = ei This form can be expressed as

i = 1, 2 . . . , m i = 1, 2, . . . , n.

minimize 𝐝T 𝐱 subject to A𝐱 ⪯ 𝐛 F𝐱 = 𝐞.

15.4 Decoding LDPC Codes

683

(Here ⪯ means element-by-element inequality: each element of A𝐱 is less than the corresponding element of 𝐛.) A LP problem may also have lower and upper bounds on the variables, as in minimize 𝐝T 𝐱 subject to A𝐱 ⪯ 𝐛 F𝐱 = 𝐞 lb ⪯ 𝐱 ⪯ ub,

(15.36)

where the matrix F (typically wide) represents relationships between variables. In the LP decoding problem described below, the linear inequality constraint H𝐱 ⪯ 𝐛 is absent. All of these forms are essentially interchangeable by introduction of appropriate auxiliary variables. Example 15.20

Consider the problem

[ ] [ ] x1 minimize −1 −1.5 x2 subject to x1 + x2 ≤ 3 x1 + 2x2 ≤ 4 x1 ≥ 0

x2 ≥ 0.

The feasible region, points that satisfy the constraints, are shown in the shaded region in Figure 15.16.

x2

Direction of increasing objective function value Maximizing point x1

Feasible region

Figure 15.16: Linear programming problem. ◽

As this example illustrates, the linear constraints cause the feasible region to be a polytope in ℝn (a geometric object with flat sides). The linear objective function means that there a direction along which the function decreases. Typically, the optimal solutions lies on a vertex of the polytope (unless the direction of increase is orthogonal to a face of the polytope, in which case any point on that face is optimum). Once the problem is expressed as an LP problem, such as the 𝐝, H, 𝐛, F, and 𝐞 and lower and upper bounds of (15.36), the problem is typically solved by calling a library function such as the glpk (the Gnu linear programming kit) in C or linprog in MATLAB. LP problems arise in a variety of disciplines, so there is a huge body of literature related to them, and many algorithms for solving LP problems. The simplex algorithm essentially moves from vertex to vertex on the polytope in the direction of increase. For typical problems the simplex algorithm runs with moderate computational complexity, but there are pathological problems whose computational complexity is exponential in the number of variables. There are alternative algorithms known as interior point methods which have guaranteed polynomial performance. These operate (essentially) using a Newton-type algorithm with barrier constraints to ensure the constraints. Background on LP can be found in many places, such as [398] or [318].

684

Low-Density Parity-Check Codes: Introduction, Decoding, and Analysis 15.4.8.2

Formulation of the Basic LP Decoding Algorithm

Let yn be a signal received through a channel, and let cn be the corresponding bit, transmitted through a channel, such as BPSK or a BSC, which is assumed memoryless. The maximum likelihood rule chooses, for each n = 0, 1, . . . , N − 1, the value of the bit ĉ n that maximizes P(yn |cn = ĉ n ). From the difference P(yn |cn = 0) , Ln = log P(yn |cn = 0) − log P(yn |cn = 1) = log P(yn |cn = 1) the ML rule indicates

{ if Ln > 0, decide ĉ n = 0 if Ln < 0, decide ĉ n = 1.

This rule can be written as: choose ĉ n ∈ {0, 1} to minimize Jn = Ln ĉ n : If Ln < 0, then selecting ĉ n = 1 produces a smaller value of Jn ; if Ln > 0, then selecting ĉ n = 0 produces a smaller value of Jn . Extending this across N independent uses of the channel results in the following objective function: ∑

N−1

J=

Ln ĉ n .

n=0

This leads to the (incomplete) optimization problem ∑

N−1

minimize N ̂ 𝐜∈{0,1}

Ln ĉ n .

(15.37)

n=0

Example 15.21 For a BPSK channel where a bit cn = 0 is transmitted with amplitude a and a bit cn = 1 is transmitted with amplitude −a, ) ( exp − 2𝜎12 (yn − a)2 2a Ln = log ) = 2 yn ≜ Lc yn . ( 1 𝜎 exp − 2𝜎 2 (yn + a)2 ∑N−1 Thus, J = Lc n=0 yn ĉ n . For a BSC, if yn = 0, then Ln = log

P(yn = 0|cn = 0) 1−p = log > 0. P(yn = 0|cn = 1) p

Ln = log

P(yn = 1|cn = 0) p = log < 0. P(yn = 1|cn = 0) 1−p

If yn = 1, then

Since these costs are fixed, they can be rescaled to { Ln =

+1 −1

if yn = 0 if yn = 1.



The minimization problem in (15.37) is obviously incomplete, since 𝐜̂ is not yet constrained to be a codeword. For decoding a codeword, the ML principle can be restated as ∑

N−1

minimize

Ln ĉ n

n=0

(15.38)

subject to 𝐜̂ ∈ C, where C is the set of codewords. This is merely a restatement of the ML decoding rule, requiring a search over all codewords, so there is nothing new here yet.

15.4 Decoding LDPC Codes

685

We will now introduce some geometry, however, that turns this decoding rule into a LP problem. The complexity of this linear programming problem is still too high, so the problem is further relaxed to a LP problem that is easier to solve. For a code C, with its 2N codewords, define the codeword polytope as the convex hull of all possible codewords, } { ∑ ∑ 𝜆𝐜 𝐜, 𝜆𝐜 ≥ 0, 𝜆𝐜 = 1 . poly(C) = 𝐰 ∈ ℝN ∶ 𝐰 = 𝐜∈C

𝐜∈C

ℝ3 .

Figure 15.17 illustrates a polytope in The vertices of the polytope poly(C) correspond exactly to ̂ with components determined by the codewords of C. Every point in poly(C) corresponds to a vector 𝐜, ∑ ĉ i = 𝐜 𝜆𝐜 ci . The ML decoding problem can be expressed as ∑

N−1

minimize

Ln ĉ n

n=1

subject to 𝐜̂ ∈ poly(C). This is a linear programming problem. For each codeword 𝐜 ∈ C, let d𝐜 be the scalar ∑

N−1

d𝐜 =

Ln 𝐜n .

n=0

Then the LP problem can be expressed as minimize

∑ 𝐜∈C

𝜆𝐜 d𝐜

subject to 𝜆𝐜 ≥ 0 for each 𝜆𝐜 ∑ 𝜆𝐜 = 1. 𝐜∈C

2N

The vector 𝜆 ∈ ℝ whose elements are the 𝜆𝐜 providing the minimum likelihood is sought. Solution of this LP problem gives the ML solution. However, since there are 2N variables, the computational complexity is of the same order as the ML problem in (15.38). We therefore seek to relax some of the constraints to reduce the complexity. 15.4.8.3

LP Relaxation

In this section, the statement of general principles is accompanied by a specific example based on a Hamming (7, 4) code with parity check and generator matrices ⎡1 0 0 1 1 0 1⎤ H = ⎢0 1 0 1 0 1 1⎥ ⎢ ⎥ ⎣0 0 1 0 1 1 1⎦

⎡1 ⎢1 G=⎢ 0 ⎢ ⎣1

1 0 1 1

0 1 1 1

1 0 0 0

0 1 0 0

0 0 1 0

0⎤ 0⎥ 0⎥ ⎥ 1⎦

(0, 1, 1) (1, 0, 1) (0, 0, 0) (1, 1, 0)

Figure 15.17: Polytope of the codewords (0, 0, 0), (0, 1, 1), (1, 1, 0), (1, 0, 1).

686

Low-Density Parity-Check Codes: Introduction, Decoding, and Analysis

0 +

v0

1 +

v1

v2

v3

2 +

v4

v5

v3

v0 +

v6

v4

v6

v1 +

v5 + v2

Figure 15.18: Tanner graph for the Hamming (7, 4) code.

and Tanner graph shown in Figure 15.18. The variable nodes adjacent to each check node are described by the sets

   𝒩(0) = {0, 3, 4, 6},  𝒩(1) = {1, 3, 5, 6},  𝒩(2) = {2, 4, 5, 6}.

At each check node of the Tanner graph for the code, define a local code, which is the set of binary vectors that have even weight at that check node. For example, for the Hamming code at check node 0, the local code has even parity on the bits in the set 𝒩(0) = {0, 3, 4, 6}. Possible codewords for this local code are (the S notation is described below)

   0 ∗ ∗ 0 0 ∗ 0    S = ∅
   1 ∗ ∗ 1 0 ∗ 0    S = {0, 3}
   1 ∗ ∗ 0 1 ∗ 0    S = {0, 4}
   0 ∗ ∗ 1 1 ∗ 0    S = {3, 4}
   1 ∗ ∗ 0 0 ∗ 1    S = {0, 6}
   0 ∗ ∗ 1 0 ∗ 1    S = {3, 6}
   0 ∗ ∗ 0 1 ∗ 1    S = {4, 6}
   1 ∗ ∗ 1 1 ∗ 1    S = {0, 3, 4, 6}        (15.39)

where ∗ denotes locations where a local codeword can be either 0 or 1. The global code (that is, the original code C) is the intersection of the local codes associated with each check node.

Each check node determines a local codeword polytope as the convex hull of its local codewords. For check node m, the set of adjacent variable nodes is 𝒩(m). Define subsets S ⊂ 𝒩(m) that contain an even number of variable nodes. Each set S corresponds to the set of local codewords determined by setting

   c_n = 1 if n ∈ S,   c_n = 0 if n ∈ 𝒩(m) but n ∉ S.

In (15.39), each set of local codewords is associated with an S set. Collecting together the S sets for each check node, form

   E_m = {S ⊆ 𝒩(m) : |S| even},   m = 0, 1, . . . , M − 1.

For the Hamming code,

   E_0 = {∅, {0, 3}, {0, 4}, {3, 4}, {0, 6}, {3, 6}, {4, 6}, {0, 3, 4, 6}}
   E_1 = {∅, {1, 3}, {1, 5}, {3, 5}, {1, 6}, {3, 6}, {5, 6}, {1, 3, 5, 6}}
   E_2 = {∅, {2, 4}, {2, 5}, {4, 5}, {2, 6}, {4, 6}, {5, 6}, {2, 4, 5, 6}}.

Let 𝜅_m denote the number of S-sets in E_m,

   𝜅_m = ∑_{i=0}^{⌊|𝒩(m)|∕2⌋} \binom{|𝒩(m)|}{2i},    (15.40)

and let 𝜅 = ∑_{j=0}^{M−1} 𝜅_j. In our example, 𝜅_0 = 𝜅_1 = 𝜅_2 = 8 and 𝜅 = 24.


To set up a LP problem, for each S in each E_m, introduce an indicator variable 𝑤_{m,S} which describes a codeword, where 𝑤_{m,S} = 1 indicates that S is the set of bits in 𝒩(m) that are equal to 1. The variable 𝑤_{m,S} can be interpreted as indicating that the codeword satisfies check m using the configuration S. For the Hamming code example the set of these indicator variables is

   m = 0:  𝑤0,∅, 𝑤0,{0,3}, 𝑤0,{0,4}, 𝑤0,{3,4}, 𝑤0,{0,6}, 𝑤0,{3,6}, 𝑤0,{4,6}, 𝑤0,{0,3,4,6}
   m = 1:  𝑤1,∅, 𝑤1,{1,3}, 𝑤1,{1,5}, 𝑤1,{3,5}, 𝑤1,{1,6}, 𝑤1,{3,6}, 𝑤1,{5,6}, 𝑤1,{1,3,5,6}
   m = 2:  𝑤2,∅, 𝑤2,{2,4}, 𝑤2,{2,5}, 𝑤2,{4,5}, 𝑤2,{2,6}, 𝑤2,{4,6}, 𝑤2,{5,6}, 𝑤2,{2,4,5,6}.

As indicator variables, these must satisfy the constraints (bounds)

   0 ≤ 𝑤_{m,S} ≤ 1  for all S ∈ E_m.    (15.41)

There are 𝜅 of these constraints. As an example, suppose that the codeword is 𝐜 = (0, 0, 0, 1, 1, 1, 0). Then the indicator variables associated with this codeword are

   𝑤0,{3,4} = 1,  𝑤1,{3,5} = 1,  𝑤2,{4,5} = 1,

and all other 𝑤_{m,S} = 0. That is, the codeword satisfies check m = 0 because c_3 ⊕ c_4 = 0, check m = 1 since c_3 ⊕ c_5 = 0, and check m = 2 because c_4 ⊕ c_5 = 0. For a given codeword, each parity check m is satisfied by exactly one even-sized subset of nodes in 𝒩(m). This introduces a constraint on the 𝑤_{m,S} variables:

   ∑_{S∈E_m} 𝑤_{m,S} = 1,   m = 0, 1, . . . , M − 1.    (15.42)

(The number of constraints here is M.) For our example problem, these constraints are

   𝑤0,∅ + 𝑤0,{0,3} + 𝑤0,{0,4} + 𝑤0,{3,4} + 𝑤0,{0,6} + 𝑤0,{3,6} + 𝑤0,{4,6} + 𝑤0,{0,3,4,6} = 1
   𝑤1,∅ + 𝑤1,{1,3} + 𝑤1,{1,5} + 𝑤1,{3,5} + 𝑤1,{1,6} + 𝑤1,{3,6} + 𝑤1,{5,6} + 𝑤1,{1,3,5,6} = 1    (15.43)
   𝑤2,∅ + 𝑤2,{2,4} + 𝑤2,{2,5} + 𝑤2,{4,5} + 𝑤2,{2,6} + 𝑤2,{4,6} + 𝑤2,{5,6} + 𝑤2,{2,4,5,6} = 1.

For each m = 0, 1, . . . , M − 1, the set of indicator variables can be stacked to form a vector 𝐰_m of length 𝜅_m, such as, in our example,

   𝐰_0 = [𝑤0,∅, 𝑤0,{0,3}, . . . , 𝑤0,{0,3,4,6}]^T   (8 × 1).

Let ĉ_n denote the variable (bit) node value. As a softened bit value, each must satisfy the constraint (bound)

   0 ≤ ĉ_n ≤ 1,   n = 0, 1, . . . , N − 1.    (15.44)

The bit value ĉ_n must belong to the local codeword polytope associated with check node m, so that for all n ∈ 𝒩(m),

   ĉ_n = ∑_{S∈E_m : n∈S} 𝑤_{m,S}.    (15.45)

(The number of constraints here is ∑_{m=0}^{M−1} |𝒩(m)|.) In our example problem, these constraints are as follows:

m = 0:

   ĉ_0 = ∑_{S∈E_0: 0∈S} 𝑤_{0,S} = 𝑤0,{0,3} + 𝑤0,{0,4} + 𝑤0,{0,6} + 𝑤0,{0,3,4,6}
   ĉ_3 = ∑_{S∈E_0: 3∈S} 𝑤_{0,S} = 𝑤0,{0,3} + 𝑤0,{3,4} + 𝑤0,{3,6} + 𝑤0,{0,3,4,6}
   ĉ_4 = ∑_{S∈E_0: 4∈S} 𝑤_{0,S} = 𝑤0,{0,4} + 𝑤0,{3,4} + 𝑤0,{4,6} + 𝑤0,{0,3,4,6}
   ĉ_6 = ∑_{S∈E_0: 6∈S} 𝑤_{0,S} = 𝑤0,{0,6} + 𝑤0,{3,6} + 𝑤0,{4,6} + 𝑤0,{0,3,4,6}.

m = 1:

   ĉ_1 = ∑_{S∈E_1: 1∈S} 𝑤_{1,S} = 𝑤1,{1,3} + 𝑤1,{1,5} + 𝑤1,{1,6} + 𝑤1,{1,3,5,6}
   ĉ_3 = ∑_{S∈E_1: 3∈S} 𝑤_{1,S} = 𝑤1,{1,3} + 𝑤1,{3,5} + 𝑤1,{3,6} + 𝑤1,{1,3,5,6}
   ĉ_5 = ∑_{S∈E_1: 5∈S} 𝑤_{1,S} = 𝑤1,{1,5} + 𝑤1,{3,5} + 𝑤1,{5,6} + 𝑤1,{1,3,5,6}
   ĉ_6 = ∑_{S∈E_1: 6∈S} 𝑤_{1,S} = 𝑤1,{1,6} + 𝑤1,{3,6} + 𝑤1,{5,6} + 𝑤1,{1,3,5,6}.

m = 2:

   ĉ_2 = ∑_{S∈E_2: 2∈S} 𝑤_{2,S} = 𝑤2,{2,4} + 𝑤2,{2,5} + 𝑤2,{2,6} + 𝑤2,{2,4,5,6}
   ĉ_4 = ∑_{S∈E_2: 4∈S} 𝑤_{2,S} = 𝑤2,{2,4} + 𝑤2,{4,5} + 𝑤2,{4,6} + 𝑤2,{2,4,5,6}
   ĉ_5 = ∑_{S∈E_2: 5∈S} 𝑤_{2,S} = 𝑤2,{2,5} + 𝑤2,{4,5} + 𝑤2,{5,6} + 𝑤2,{2,4,5,6}
   ĉ_6 = ∑_{S∈E_2: 6∈S} 𝑤_{2,S} = 𝑤2,{2,6} + 𝑤2,{4,6} + 𝑤2,{5,6} + 𝑤2,{2,4,5,6}.

There are N + 𝜅 variables, which are the ĉ_n and the 𝜅 set indicator variables. The solution vector 𝐱 (to be found) has the structure

   𝐱 = [𝐜̂; 𝐰_0; 𝐰_1; ⋯; 𝐰_{M−1}]   ((N + 𝜅) × 1).

Collectively a point such as this is denoted (with slight abuse of notation) as (𝐜̂, 𝐰), that is, the codeword information and the parity indicator variables.


Corresponding to 𝐱 is a 𝐝 vector in the LP notation of (15.36),

   𝐝 = [L_c 𝐲; 0_{𝜅_0×1}; 0_{𝜅_1×1}; ⋯; 0_{𝜅_{M−1}×1}].

For the example problem of the (7, 4) Hamming code, there are 7 + 3 × 8 = 31 variables and 3 × 4 + 3 = 15 equality constraints.

The objective function (to be minimized), ∑_{n=0}^{N−1} L_n ĉ_n, can now be expressed as 𝐝^T 𝐱. This leads to the LP expressed as

   minimize 𝐝^T 𝐱
   subject to (15.42)   (M constraints)
              (15.45)   (∑_{m=0}^{M−1} |𝒩(m)| constraints)
              (15.41)   (𝜅 bounds)
              (15.44)   (N bounds).

The lower and upper bounds in (15.36) are 0 and 1 vectors, respectively. Other than the bounds, there are no inequality constraints in (15.36). The equality constraints (15.42) and (15.45) can be represented using the F matrix and corresponding 𝐞 vector

   𝐞 = [0 0 0 0 0 0 0 0 0 0 0 0 1 1 1]^T.

Using this notation the LP for decoding is

   minimize 𝐝^T 𝐱
   subject to F𝐱 = 𝐞,  0 ⪯ 𝐱 ⪯ 1.

The number of indicator variables, 𝑤_{m,S}, is given by 𝜅, which depends on the number of combinations in (15.40). Being combinatorial, this number grows very quickly. Figure 15.19 shows that 𝜅_m increases exponentially with |𝒩(m)|. Thus, while linear programming can be used for arbitrary linear codes, it is practical only for codes having |𝒩(m)| small, such as LDPC codes.

Figure 15.19: Number of constraints 𝜅_m as a function of |𝒩(m)|.

15.4.8.4 Examples and Discussion

In the examples below, assume (for convenience) L_c = 1.

• The received data vector is 𝐲 = [1, 1, 1, −1, −1, −1, 1]^T. Form the vector 𝐝 = [L_c 𝐲; 0_{8×1}; 0_{8×1}; 0_{8×1}]. Using this and the matrix F above in a linear programming function results in the solution vector

   𝐱 = [0 0 0 1 1 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0].

From this, we observe that the decoded codeword is 𝐜̂ = (0, 0, 0, 1, 1, 1, 0), and that 𝑤0,{3,4} = 1, 𝑤1,{3,5} = 1, 𝑤2,{4,5} = 1 (as expected). The value of the objective function is 𝐝^T 𝐱 = −3.

• Introducing an error, let 𝐲 = [1, 1, 1, −1, −1, 1, 1]^T. Using this to form the first 7 elements of 𝐝 (leaving zeros in the other positions of 𝐝) yields the codeword 𝐜̂ = (0, 0, 0, 1, 1, 1, 0) (as expected), with the 𝑤_{m,S} values as before. The value of the objective function is 𝐝^T 𝐱 = −1.
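To make the construction concrete, the following sketch shows how the two examples above could be reproduced with a generic LP solver, here MATLAB's linprog. This is illustrative code only (it is not one of the book's programs and the variable names are hypothetical); it builds the sets E_m, the equality constraints F𝐱 = 𝐞 (ordering the (15.45) constraints before the (15.42) constraints, matching the 𝐞 vector above), and the cost vector 𝐝.

   % Illustrative sketch: LP decoding of the (7,4) Hamming code example via linprog.
   N = 7;  M = 3;  Lc = 1;
   Nm = {[0 3 4 6], [1 3 5 6], [2 4 5 6]};        % neighbor sets N(m), 0-based bits

   E = cell(1, M);                                 % even-sized subsets of each N(m)
   for m = 1:M
     nm = Nm{m};  S = {};
     for b = 0:2^numel(nm)-1
       idx = nm(bitget(b, 1:numel(nm)) == 1);
       if mod(numel(idx), 2) == 0, S{end+1} = idx; end %#ok<SAGROW>
     end
     E{m} = S;
   end
   kappa = cellfun(@numel, E);                     % [8 8 8]; kappa = 24 in total
   nvar  = N + sum(kappa);                         % 31 variables: [c_hat; w0; w1; w2]
   off   = N + [0 cumsum(kappa(1:end-1))];         % offsets of the w_m blocks in x

   F = [];  e = [];
   for m = 1:M                                     % constraints (15.45)
     for n = Nm{m}
       row = zeros(1, nvar);  row(n+1) = 1;
       for s = 1:kappa(m)
         if any(E{m}{s} == n), row(off(m)+s) = -1; end
       end
       F = [F; row];  e = [e; 0];
     end
   end
   for m = 1:M                                     % constraints (15.42)
     row = zeros(1, nvar);  row(off(m)+(1:kappa(m))) = 1;
     F = [F; row];  e = [e; 1];
   end

   y = [1 1 1 -1 -1 -1 1]';                        % first received vector above
   d = [Lc*y; zeros(sum(kappa), 1)];               % cost vector
   x = linprog(d, [], [], F, e, zeros(nvar,1), ones(nvar,1));
   chat = round(x(1:N))'                           % decoded codeword (0,0,0,1,1,1,0)

Rerunning the last four lines with 𝐲 = [1, 1, 1, −1, −1, 2, 1]^T should produce the fractional (nonintegral) solution discussed below.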

Observe that in both of these examples, the solution vector produced integer values for both the elements of 𝐜̂ and the 𝑤_{m,S}. These are examples of an integral point, a point (𝐜̂, 𝐰) in the polytope whose values are all integers. It is straightforward to show that if (𝐜̂, 𝐰) is an integral point, then 𝐜̂ is a codeword. Also, if 𝐜̂ is a codeword, then there is a corresponding 𝐰 representing the parity information about 𝐜̂ such that (𝐜̂, 𝐰) is an integral point.


• Now let the certainty of the erroneous term be even higher by letting 𝐲 = [1, 1, 1, −1, −1, 2, 1]^T. This received vector strongly suggests that ĉ_5 = 0. Calling the LP solver gives the solution

   𝐱 = [0 0 0 0.6667 0.6667 0 0.6667 0 0 0 0.3333 0 0.3333 0.3333 0 0.3333 0 0 0 0 0.6667 0 0 0.3333 0 0 0 0 0.6667 0 0].

This vector does meet all the constraints, but the elements are not integral values. Thresholding this vector would suggest that 𝐜̂ = (0, 0, 0, 1, 1, 0, 1), which is not a codeword. The value of the objective function is 𝐝^T 𝐱 = −0.6667.

This example illustrates an important property about LP decoding. If the solution vector is an integral point, then the 𝐜̂ portion of it is guaranteed to be the ML codeword, and if the solution vector is not an integral point, then 𝐜̂ is not a codeword. This is referred to as the ML certificate property. Finding a nonintegral point can be used to declare decoder failure. This does not mean that an integral point 𝐱 is guaranteed to be the true (transmitted) codeword, since noise can cause the received vector to be in the basin of attraction of another codeword.

In characterizing the performance of a code and decoder using simulation, it is often useful to assume the all-zero codeword is transmitted. It can be shown that the probability that the LP decoder fails is independent of what codeword is transmitted, so that nothing is lost evaluating using the all-zero codeword. The LP decoder is of theoretical interest, but for codes of any appreciable size, the decoding complexity is too large to be practical.

15.4.9 Decoding on the Binary Erasure Channel

The general belief propagation decoder described in Section 15.4.1 or 15.4.2 applies to the BEC, but it is interesting to examine the dynamics of the decoding. The algorithm here is referred to as a peeling decoder, since it peels off one erased bit after the other.

Example 15.22 Consider the (7, 4) Hamming code with Tanner graph shown in Figure 15.20(a). The received signal vector is

   𝐲 = [1 ? 0 ? 0 ? 0],

where ? denotes an erased value. In the figure, solid lines show edges connected to 1, dotted lines show edges connected to 0, and heavy dashed lines show edges connected to erasures. Before formally stating the decoding algorithm, consider the messages at check node 0. There are four incoming messages, with values 0, 0, 1, and an erasure. Since the parity of all messages must be even, it must be that the erased bit at 𝑣_3 is a 1. This 1 can be propagated back through the graph. In light of this observation, here is a statement of the peeling algorithm.

1. Initialization: Send messages from each variable node to its connected message nodes. At the check nodes, record the parity of the unerased messages. Delete the variable nodes of known bits and edges from those nodes to the corresponding check nodes. The result of this initialization is shown in Figure 15.20(b), with edges deleted and parity values at each check node.

2. Decoding: Select, if possible, a check node with only one edge remaining (coming from an erased variable node). Based on the parity of that node, determine the erased bit value, and send the bit value to the variable node. Then delete the check node and its edge. In Figure 15.20(b), the check node with a single remaining edge is circled, check node 0. Based on the parity at check node 0, variable node 𝑣_3 is determined to be a 1. After propagating this value throughout the graph and deleting the edges and variable nodes, the graph appears as in Figure 15.20(c). Note that the parity at check node 1 is now updated to 1. In this figure, the next node with a single edge is also circled, check node 2. Based on the parity at check node 2, variable node 𝑣_5 is determined to be 0. After propagating this value throughout the graph and deleting the edges and variable nodes, the graph appears as in Figure 15.20(d).


Figure 15.20: Hamming code Tanner graph. Solid = 1, dotted = 0, heavy dashed = erasure. (a) Tanner graph; (b) graph with deleted edges; (c) after one erasure corrected; (d) after two erasures corrected.

In Figure 15.20(d), the check node with a single remaining edge (the last check node) is circled, check node 1. From the parity at check node 1, it can be determined that variable node 𝑣_1 = 1. Node 𝑣_1 and the edges connected to it are deleted. The decoded codeword is thus

   𝐜̂ = [1 1 0 1 0 0 0].

3. Termination: If the remaining graph is empty, the codeword has been successfully determined. If the graph is not empty (i.e., there are check nodes that have multiple erased edges still connected to them), then declare a decoding failure.

The peeling algorithm was explained in a sequential fashion, but it can also be expressed in a conventional belief propagation mode. ◽
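The peeling procedure is simple enough that a direct implementation is only a few lines. The following sketch is illustrative only (peel_decode is a hypothetical helper saved as peel_decode.m, not one of the book's programs); it marks erasures with NaN and repeatedly resolves any check with exactly one erased neighbor.

   % Sketch: peeling decoder for the BEC.
   % H is the M x N parity check matrix; y contains 0, 1, or NaN for an erasure.
   function [c, ok] = peel_decode(H, y)
     c = y(:)';  M = size(H, 1);
     while any(isnan(c))
       progress = false;
       for m = 1:M
         idx = find(H(m,:));                       % bits checked by check m
         er  = idx(isnan(c(idx)));                 % erased bits in this check
         if numel(er) == 1                         % exactly one erasure: resolvable
           c(er) = mod(sum(c(idx(~isnan(c(idx))))), 2);  % even parity fills it in
           progress = true;
         end
       end
       if ~progress, break; end                    % stuck: erasures form a stopping set
     end
     ok = ~any(isnan(c));
   end

For the example above, peel_decode applied to the (7, 4) Hamming parity check matrix with y = [1 NaN 0 NaN 0 NaN 0] returns the codeword [1 1 0 1 0 0 0]; if the remaining erasures form a stopping set (Section 15.4.10), ok is returned false.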

15.4.10 BEC Channels and Stopping Sets

The peeling algorithm helps provide insight into the random error correction capability of LDPC codes over the BEC channel. Let G be the Tanner graph associated with a linear block code. Let V be a subset of the variable nodes of G and let C be a subset of the check nodes of G that is a neighbor set of V, that is, each check node in C is connected to at least one variable node in V. A set V of variable nodes is called a stopping set of G if each check node in a neighbor set C of V is connected to at least two variable nodes in V (all of the neighbors of the variable nodes in a stopping set are connected to this set twice) [93]. Figure 15.21(a) illustrates the idea of a stopping set — neighbors of variables connected multiple times — and Figure 15.21(b) illustrates a Tanner graph in which nodes {𝑣0 , 𝑣4 , 𝑣6 } form a stopping set.


Figure 15.21: Stopping sets. (a) Idea of stopping sets; (b) stopping set {𝑣_0, 𝑣_4, 𝑣_6} on a (9, 3) example code [442].

Recall that in the peeling decoder for the BEC channel, when a check node is connected to only one erased bit, then that erased bit can be determined by the parity constraints. (The peeling algorithm is simply one way to organize the message-passing algorithm, so this insight applies more generally to erasure decoding.) In light of this, suppose that a set of erased variables E is equal to a stopping set V; that is, all of the variables associated with the nodes in V are erased. Then since there are multiple connections of the check nodes to these variable nodes, these erased bits cannot be corrected.

It is clear that the union of two or more stopping sets is a stopping set. The union of all stopping sets is said to be the maximum stopping set. Any erasure pattern (of more than one erasure) occurring in the maximum stopping set cannot be recovered. Perhaps more interesting is a set of variable nodes V_ssf which is stopping-set-free, in that neighboring check nodes are not connected to more than one variable node in the stopping-set-free set. Then all patterns of erasures in the stopping-set-free set can be decoded.

A stopping set of minimum size in G is called a minimum stopping set, which we denote here as V_min. Code symbols which are erased corresponding to the variables in V_min are unrecoverable. In order for a code to have large random-erasure correction capability (e.g., capable of correcting any pattern of erasures), it is necessary to have large minimum stopping sets. There is an interesting relationship between minimum stopping sets and the row weight of a regular LDPC code.

Lemma 15.23 Let H be the RC-constrained parity check matrix of a regular (j, k) LDPC code. Then the size of the minimum possible stopping set is j + 1. (That is, there are no stopping sets of size ≤ j.)

Proof Let ℰ = {n_1, n_2, . . . , n_t} be a pattern of t erasures, with 0 ≤ t ≤ j. For an erasure at any location, say n_𝓁 ∈ ℰ, since the column weight of H is j, there are j rows of H that all check location n_𝓁; denote the set of these rows as H_𝓁 = {𝐡_{m_1}, 𝐡_{m_2}, . . . , 𝐡_{m_j}}. Consider another erasure location in ℰ, say, n_𝓁′. By the RC condition, there will not be two (or more) rows in H_𝓁 that also check n_𝓁′. This is true for any 𝓁′ ≠ 𝓁, so the other t − 1 erasures in ℰ can be checked by at most t − 1 rows in H_𝓁. Therefore, there is at least one row in H_𝓁 that checks the one erased symbol n_𝓁 and k − 1 other non-erased symbols. Thus, erasure n_𝓁 can be recovered. Since this applies to any n_𝓁 in ℰ, every erasure in ℰ can be recovered. Hence every erasure pattern of up to j erasures can be recovered, and there is no stopping set of size ≤ j.

Now consider an erasure pattern ℰ = {n_1, n_2, . . . , n_{j+1}} of j + 1 erasures. Consider the j rows in H_𝓁 that check the erased symbol n_𝓁. The erasure pattern could be such that each of the other j erasures is checked separately by the rows of H_𝓁, so that each row of H_𝓁 checks two erasures. In this situation, ℰ corresponds to a stopping set of size j + 1. Therefore, the minimum possible stopping set of a regular (j, k) code is of size j + 1. ◽

It can be shown that for regular (j, k) codes, the sizes of minimum stopping sets for codes with girths 4, 6, and 8 are 2, j + 1, and 2j, respectively [329, 330]. This indicates that cycles of girth 4 are particularly deleterious to code performance, and many of the code design algorithms specifically avoid this. (It is also easy to detect cycles of girth 4, so this makes the design methods feasible.) It has also been shown [443] that a code with minimum distance dmin contains stopping sets of size dmin. So a code having good random erasure capability must have large minimum distance.
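As a small illustration of the definition (again a sketch using a hypothetical helper, not one of the book's programs), a candidate set V of variable nodes can be tested for the stopping-set property directly from H:

   % V is a vector of (1-based) column indices of H; V is a stopping set if every
   % check node that touches V touches it at least twice.
   is_stopping_set = @(H, V) all(sum(H(any(H(:, V), 2), V), 2) >= 2);

Erasure patterns whose support contains such a set cannot be resolved by the peeling decoder of Section 15.4.9.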

15.5 Why Low-Density Parity-Check Codes?

Gallager showed that for random LDPC codes, the minimum distance dmin between codewords increases with N when column and row weights are held fixed [148, p. 5], that is, as they become increasingly sparse. Sequences of LDPC codes as N → ∞ have been proved to reach channel capacity [288]. LDPC codes thus essentially act like the random codes used in Shannon's original proof of the channel coding theorem. The decoding algorithm is tractable, at least on a per-iteration basis. As observed, the decoding algorithm has complexity linearly proportional to the length of the code. Thus, we get the benefit of a random code, but without the exponential decoding complexity usually associated with random codes. These codes fly in the face of the now outdated conventional coding wisdom, that there are "few known constructive codes that are good, fewer still that are practical, and none at all that are both practical and very good" [288, p. 399]. It is the extreme sparseness of the parity check matrix for LDPC codes that makes the decoding particularly attractive. The low-density nature of the parity check matrix thus, fortuitously, contributes both to good distance properties and the relatively low complexity of the decoding algorithm.

For finite length (but still long) codes, excellent coding gains are achievable, as we briefly illustrate. Figure 15.22(a) shows the BPSK probability of error performance for two LDPC codes, a rate 1/2 code with (N, K) = (20,000, 10,000) and a rate 1/3 code with (N, K) = (15,000, 5000) from [286], compared with uncoded BPSK. These plots were made by adding simulated Gaussian noise to a codeword then iterating the message-passing algorithm up to 1000 times. The minimum number of blocks (codewords)

Figure 15.22: Illustration of the decoding performance of LDPC codes and the number of iterations to achieve decoding. (a) Performance for a rate 1/2 and a rate 1/3 code; (b) average number of decoding iterations.


at each SNR was 2000. In all cases, the errors counted in the probability of error are detected errors; in no case did the decoder declare a successful decoding that was erroneous. (This is not always the case. We have found that for very short codes, the decoder may terminate with the condition H𝐜̂ = 0, but 𝐜̂ is erroneous. However, for long codes, decoding success essentially means correct decoding.) For both of these codes the performance is within about 1.5 dB of the Shannon channel capacity.

Figure 15.22(b) shows the average number of iterations for correctly decoded codewords to complete the decoding. (The peak number of iterations, not shown, would reach to the maximum set of 1000 in the case that the codeword is not found.) As the SNR decreases, the number of decoding iterations increases: additional work is required as the noise increases. At lower SNR, the number of iterations is around 100. The high number of iterations suggests a rather high potential decoding complexity, even though each iteration is readily computed. As suggested by EXIT chart analysis, as the decoding threshold is approached, the number of iterations must increase.

There are, of course, some potential disadvantages to LDPC codes. First, the best code performance is obtained for very long codes (as predicted by the channel coding theorem). This long block length, combined with the need for iterative decoding, introduces latency which may be unacceptable in many applications. Second, since the G matrix is not necessarily sparse, the encoding operation may have complexity O(N²). Some progress in reducing complexity is discussed in Section 15.11, and also by using variations such as RA codes. LDPC codes have an error floor, just as turbo codes do. Considerable effort has been devoted to designing codes with low error floors.

15.6 The Iterative Decoder on General Block Codes

There initially seems to be nothing impeding the use of the sum-product decoder for a general linear block code: it simply relies on the parity check matrix. This would mean that there is a straightforward iterative soft decoder for every linear block code. For example, Figure 15.23 shows the use of the soft decoder on a (7, 4) Hamming code. The soft decoding works better than conventional hard decoding by about 1.5 dB.

However, for larger codes a serious problem arises. Given a generator matrix G, the corresponding H matrix that might be found for it is not likely to be very sparse, so the resulting Tanner graph has

Figure 15.23: Comparison of hard-decision Hamming decoding and sum-product (iterative) decoding.


ldpctest1.cc ldpcdec.cc

many cycles in it. In fact, a threshold is reached at some density of the parity check matrix at which the decoder seems to break down completely. Also, the problem of finding the sparsest representation of a matrix is as computationally difficult (NP-complete) as performing ML decoding, so there is not much hope of taking a given H matrix and finding its sparsest equivalent to be used for decoding.

15.7 Density Evolution

Having described LDPC codes and the decoding, we now turn our attention to some analytical techniques associated with the codes. Density evolution is an analytical technique which has been used to understand limits of performance of LDPC decoders. It also provides a tool which can be used in the design of families of LDPC codes, since their performance can be predicted using density evolution much more rapidly than the performance can be simulated. Density evolution analysis introduces the idea of a channel threshold, above which the code performs well and below which the probability of error is non-negligible. This provides a single parameter characterizing code performance which may be used to gauge the performance compared to the ultimate limit of the channel capacity.

In density evolution, we make a key assumption that the block length N → ∞, under which it is assumed there are no cycles in the Tanner graph. Since the code is linear and we have assumed a symmetric channel, it suffices for this analysis to assume that the all-zero codeword 𝐜 = 0 is sent. We also assume that the LDPC code is regular, with |𝒩_n| = 𝑤_c and |𝒩_m| = 𝑤_r for each n and m. The analysis is essentially performed over ensembles of variable and check nodes, as the particular structure is drawn at random (respecting the assumption that there are no cycles in the Tanner graph). Let L^[𝓁]_{n→m} and L^[𝓁]_{m→n} denote the messages at variable and check nodes at decoding iteration 𝓁.

Local Convention: We assume furthermore that a bit of zero is mapped to a signal amplitude of +√Ec (i.e., 0 → 1 and 1 → −1). Based on this convention, the received signal is y_n = √Ec + 𝜈_n, where 𝜈_n ∼ 𝒩(0, 𝜎²). In the log-likelihood decoding algorithm, the initial LLR is

   L^[0]_{n→m} = (2√Ec∕𝜎²) y_n = (2√Ec∕𝜎²)(√Ec + 𝜈_n),

which is Gaussian. The mean and variance of L^[0]_{n→m} are

   m^[0] = E[L^[0]_{n→m}] = 2Ec∕𝜎²,   var(L^[0]_{n→m}) = 4Ec∕𝜎² = 2m^[0].

That is, the variance is equal to twice the mean. Thus,

   L^[0]_{n→m} ∼ 𝒩(m^[0], 2m^[0]).

A Gaussian random variable having the property that its variance is equal to twice its mean is said to be consistent. Consistent random variables are convenient because they are described by a single parameter.

Clearly, the initial L^[0]_{n→m} are Gaussian, but at other iterations the L^[𝓁]_{n→m} are not Gaussian. However, since the variable node computation involves a sum of incoming messages (see (15.7)), a central limit argument can be made that it should tend toward Gaussian. Furthermore, numerical experiments confirm that they are fairly Gaussian. Let m^[𝓁] denote the mean of L^[𝓁]_{n→m}. (This does not depend on the node n, so this is an ensemble average.) The messages sent from check nodes, L^[𝓁]_{m→n}, are non-Gaussian (e.g., computed using the tanh rule), but again numerical experiments confirm that they can be approximately represented by Gaussians.

Let 𝜇^[𝓁] = E[L^[𝓁]_{m→n}] denote the mean of a randomly chosen L^[𝓁]_{m→n} (or the average over an ensemble of codes). Under the assumption that the nodes are randomly chosen and that the code is regular, we also assume that the mean does not depend on m or n. In the interest of analytical tractability, we assume that all messages are not only Gaussian, but consistent, so

   L^[𝓁]_{m→n} ∼ 𝒩(𝜇^[𝓁], 2𝜇^[𝓁])   and   L^[𝓁]_{n→m} ∼ 𝒩(m^[𝓁], 2m^[𝓁]).


Density evolution analysis tracks the parameters of these Gaussian random variables through the decoding process. By the tanh rule (15.6),

   tanh( L^[𝓁]_{m→n} ∕ 2 ) = ∏_{n′∈𝒩(m)∖{n}} tanh( L^[𝓁]_{n′→m} ∕ 2 ).

Taking the expectation of both sides, we have

   E[ tanh( L^[𝓁]_{m→n} ∕ 2 ) ] = E[ ∏_{n′∈𝒩(m)∖{n}} tanh( L^[𝓁]_{n′→m} ∕ 2 ) ].    (15.46)

Now define the function

   Ψ(x) = E[tanh(y∕2)],  where y ∼ 𝒩(x, 2x)
        = (1∕√(4𝜋x)) ∫_{−∞}^{∞} tanh(y∕2) e^{−(y−x)²∕(4x)} dy,

which is plotted in Figure 15.24 compared with the function tanh(x∕2). The Ψ(x) function is monotonic and looks roughly like tanh(x∕2) stretched out somewhat.² Using the Ψ function, we can write (15.46) as

   Ψ(𝜇^[𝓁]) = ( Ψ(m^[𝓁]) )^{𝑤_r−1}.    (15.47)
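For reference, Ψ(x) is easy to evaluate numerically; a one-line sketch (assuming MATLAB's integral function; this is not the book's psifunc.m) is

   % Psi(x) = E[tanh(y/2)] with y ~ N(x, 2x), by numerical quadrature (x > 0).
   Psi = @(x) integral(@(y) tanh(y/2) .* exp(-(y-x).^2./(4*x)) ./ sqrt(4*pi*x), ...
                       -Inf, Inf);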

psifunc.m

Taking expectations of both sides of (15.7), we obtain

   m^[𝓁] = 2Ec∕𝜎² + (𝑤_c − 1)𝜇^[𝓁−1].

Substituting this into (15.47), we obtain

   Ψ(𝜇^[𝓁]) = ( Ψ( 2Ec∕𝜎² + (𝑤_c − 1)𝜇^[𝓁−1] ) )^{𝑤_r−1},

Figure 15.24: The function Ψ(x) compared with tanh(x∕2).

² It has been found [73] that Ψ(x) can be closely approximated by Ψ(x) ≈ 1 − e^{−0.4527 x^{0.86} + 0.0218}.

or

   𝜇^[𝓁] = Ψ^{−1}( ( Ψ( 2Ec∕𝜎² + (𝑤_c − 1)𝜇^[𝓁−1] ) )^{𝑤_r−1} ).    (15.48)

The recursion is initialized with 𝜇^[0] = 0. The dynamics are completely determined by the row weight 𝑤_r, the column weight 𝑤_c, and the SNR Ec∕𝜎². This equation describes how the mean log-likelihood messages from a check node change with time. (Again, under the assumptions that there are no cycles in the Tanner graph, and that the Gaussian assumptions are valid.)

If the decoding algorithm converges to a decoded value, the random variables L^[𝓁]_{m→n} should become more and more certain (since the all-zero codeword is assumed), that is, larger and larger. Then, since the messages are assumed Gaussian, the probability of a message less than 0 (that is, incorrectly decoded) becomes small as the number of iterations increases.

For some values of SNR, the mean 𝜇^[𝓁] converges to a small fixed point. The Gaussian pdf it represents would thus have both positive and negative outcomes, meaning that the messages L^[𝓁]_{m→n} represented by this distribution could have negative values, or that (recalling that the all-zero codeword is assumed) there is a non-negligible probability that there are decoding errors. On the other hand, for some values of SNR, the mean 𝜇^[𝓁] tends to infinity. The pdf has all of its probability on positive values. Thus, for a sufficiently large number of iterations the decoder would decode correctly.


densev1.m densevtest.m Psi.m Psiinv.m plotgauss.m


threshtab.m

Example 15.24 Let 𝑤_c = 4 and 𝑤_r = 6, resulting in a R = 1 − 4∕6 = 1∕3 code. Recall that Ec = REb and 𝜎² = N0∕2, so that Ec∕𝜎² = 2REb∕N0. Let Eb∕N0 = 1.72 dB. Then the iteration (15.48) achieves a fixed point at 𝜇* = lim_{𝓁→∞} 𝜇^[𝓁] = 0.3155. The corresponding density 𝒩(0.3155, 0.631) is shown using dashed lines in Figure 15.25(a). The mean is small enough that there is a high probability that 𝜆^[𝓁] < 0 for any iteration; hence, decoding errors are probable. In Figure 15.25(b), the mean value is shown as a function of iterations. When Eb∕N0 = 1.72 dB, after about 20 iterations the mean has converged to its value of 0.3155.

When Eb∕N0 = 1.764 dB, the mean (15.48) tends to ∞: 𝜇^[𝓁] → ∞ as 𝓁 → ∞. Figure 15.25(a) shows the distributions for iterations 507 through 511. Clearly, for high iteration numbers, the decoder is almost certain to decode correctly. After about 507 iterations, the mean quickly increases. Figure 15.25(b) shows the mean values 𝜇^[𝓁] as a function of the iteration number 𝓁 for various SNRs. For sufficiently small SNR, the mean converges to a finite limit, implying a nonzero probability of error. As the SNR increases, the mean "breaks away" to infinity after some number of iterations, where the number of iterations required decreases with increasing SNR. At 1.72, 1.75, and 1.76 dB, the means converge to finite values. However, for a slight increase of SNR to 1.764 dB, the mean breaks away (after about 500 iterations). For another slight increase to 1.765 dB, the breakaway happens at about 250 iterations. ◽
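A compact way to explore this behavior is to iterate (15.48) directly. The sketch below is illustrative only (it is not the book's densev1.m) and uses the closed-form approximation to Ψ(x) from the footnote to Figure 15.24, so the numbers it produces are close to, but not exactly, those quoted in the example.

   % Sketch: density evolution recursion (15.48) for a regular (wc, wr) LDPC code,
   % using the approximation Psi(x) ~ 1 - exp(-0.4527 x^0.86 + 0.0218) from [73].
   wc = 4;  wr = 6;  R = 1 - wc/wr;
   EbN0dB = 1.764;                          % try values near the threshold
   EcSig2 = 2*R*10^(EbN0dB/10);             % Ec/sigma^2 = 2 R Eb/N0

   Psi    = @(x) 1 - exp(-0.4527*x.^0.86 + 0.0218);
   Psiinv = @(y) ((0.0218 - log(1 - y))/0.4527).^(1/0.86);

   mu = 0;                                   % mu^[0] = 0
   for l = 1:2000
     m  = 2*EcSig2 + (wc - 1)*mu;            % mean of variable-to-check messages
     mu = Psiinv(Psi(m)^(wr - 1));           % mean of check-to-variable messages
     if ~isfinite(mu) || mu > 1e3            % mean "breaks away": above threshold
       fprintf('mean breaks away after %d iterations\n', l);
       break
     end
   end
   if isfinite(mu) && mu <= 1e3
     fprintf('mean converges to a fixed point near %.4f\n', mu);
   end

Sweeping EbN0dB up or down locates the threshold behavior described next.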

As this example shows, there is a value of Eb∕N0 above which reliable decoding can be expected (𝜇^[𝓁] → ∞) and below which it cannot. This is called the threshold of the decoder. Table 15.1 [73] shows thresholds for regular LDPC codes of various rates, as well as channel capacity at that rate. (Note: The recursion (15.48) is sensitive to numerical variation.) The thresholds are shown both in terms of Eb∕N0 and in terms of a channel standard deviation 𝜎_𝜏, where

   2REb∕N0 = 1∕𝜎_𝜏².

As the table shows, there is a tendency toward decrease (improvement) in the Eb∕N0 threshold as the rate of the code decreases. However, even within a given rate, there is variation depending on the values of 𝑤_c and 𝑤_r. It appears that values of 𝑤_c > 3 generally raise the threshold. Note that, since the analysis does not take cycles in the graph into account, this has nothing to do with problems in the decoding algorithm associated with cycles; it is an intrinsic part of the structure of the code.


Figure 15.25: Behavior of density evolution for a R = 1∕3 code. (a) The pdf of L^[𝓁]_{m→n} for Eb∕N0 = 1.72 (final) and Eb∕N0 = 1.764 (various iterations); (b) the mean of the pdf of L^[𝓁]_{m→n} as a function of iteration 𝓁 for different values of Eb∕N0.

Table 15.1: Threshold Values for Various LDPC Codes for the Binary AWGN Channel

   𝑤c  𝑤r  Rate   Threshold 𝜎𝜏  Threshold Eb∕N0 (dB)  Capacity (dB)  Gap (dB)
   3   12  0.75   0.6297        2.2564                 1.6264        0.63
   3    9  2/3    0.7051        1.7856                 1.0595        0.7261
   4   10  0.6    0.7440        1.7767                 0.6787        1.098
   3    6  0.5    0.8747        1.1628                 0.1871        0.9757
   4    8  0.5    0.8323        1.5944                 0.1871        1.4073
   5   10  0.5    0.7910        2.0365                 0.1871        1.8494
   3    5  0.4    1.0003        0.9665                −0.2383        1.2048
   4    6  1/3    1.0035        1.7306                −0.4954        2.226
   3    4  0.25   1.2517        1.0603                −0.7941        1.8544

15.8 EXIT Charts for LDPC Codes

Recall from Section 14.5 that an EXIT chart is a method for representing how the mutual information between the decoder output and the transmitted bits changes over turbo decoding iterations. EXIT charts can also be established for LDPC codes, as we now describe.

Consider the fragments of a Tanner graph in Figure 15.26. In these fragments, there are bit-to-check messages and check-to-bit messages, denoted as 𝜇_{B→C} and 𝜇_{C→B}, respectively, where the messages are the log-likelihood ratios. Let r^[𝓁]_{mn}(x) and q^[𝓁]_{mn}(x) denote the probabilities computed in the horizontal and vertical steps of Algorithm 15.1, respectively, at the 𝓁th iteration of the algorithm. Using the original probability-based decoding algorithm of Algorithm 15.2, the messages from bit nodes (n) to check


Figure 15.26: A portion of a Tanner graph, showing messages from bits to checks and from checks to bits.

nodes (m) or back are

   C → B:  𝜇_{C→B} = log [ r_{mn}(1) ∕ r_{mn}(0) ]
   B → C:  𝜇_{B→C} = log [ q_{mn}(1) ∕ q_{mn}(0) ].

ldpcsim.mat exit1.m loghist.m
exit3.m dotrajectory.m
exit2.m doexitchart.m getinf.m getinfs.m buildexit.m

Let X denote the original transmitted bits. The mutual information (see Section 1.12) between a check-to-bit message 𝜇_{C→B} and the transmitted data symbol for that bit X is denoted as I(X, 𝜇_{C→B}) = I_{C→B}. The iteration number 𝓁 may also be indicated, as in I^[𝓁]_{C→B}. The actual mutual information is computed experimentally as follows. Histograms of the message data 𝜇^[𝓁]_{C→B} are used to estimate the probability distribution. These histograms are obtained from the outputs log r_{mn}(1)∕r_{mn}(0) of all the check nodes in the Tanner graph. (Alternatively a single node could be used, by sending the codeword multiple times through the channel with independent noise.) These histograms are normalized to form estimated probability density functions, here denoted p̂(𝜇), of the random variable 𝜇_{C→B}. Then these estimated density functions are used in the mutual information integral (1.43) wherever p(y| − a) appears. Because of symmetry, the likelihood p(y|a) is computed using p̂(−𝜇). The numerical evaluation of the integral then gives the desired mutual information. In a similar way, the mutual information between a bit-to-check message 𝜇_{B→C} and the transmitted data symbol for that bit X, I(X, 𝜇_{B→C}) = I_{B→C}, is computed from the histograms of the outputs log q_{mn}(1)∕q_{mn}(0) to estimate the densities in (1.43).

The first trace of the EXIT chart is now formed by plotting (I^[𝓁]_{C→B}, I^[𝓁+1]_{B→C}) for values of the iteration number 𝓁 as the decoding algorithm proceeds. The horizontal axis is thus the check-to-bit axis. The second trace of the EXIT chart uses I^[𝓁+1]_{B→C} as the independent variable, but plotted on the vertical axis, with I^[𝓁+1]_{C→B} (that is, the mutual information at the next iteration) on the horizontal axis. The "transfer" of information in the EXIT chart results because the check-to-bit message at the output of the (𝓁+1)th stage becomes the check-to-bit message at the input of the next stage.
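A sketch of the histogram-based estimate described above follows (hypothetical variable names; mu holds samples of check-to-bit LLRs collected while the all-zero codeword, i.e. X = +a, is transmitted, so that the flipped histogram serves as the density conditioned on −a):

   edges = linspace(-50, 50, 401);                     % symmetric bin grid
   dx = edges(2) - edges(1);
   p  = histcounts(mu, edges, 'Normalization', 'pdf'); % estimate of p(mu | +a)
   pm = fliplr(p);                                     % by symmetry, p(mu | -a)
   i1 = p > 0;  i2 = pm > 0;
   I  = 0.5*sum(p(i1) .* log2(2*p(i1) ./(p(i1)+pm(i1))))*dx + ...
        0.5*sum(pm(i2).* log2(2*pm(i2)./(p(i2)+pm(i2))))*dx;

This evaluates the binary-input mutual information integral from the estimated densities; applying the same computation to the bit-to-check LLRs gives I_{B→C}.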

Example 15.25 Figure 15.27 shows the density estimated from the histogram of the log-likelihood ratios L = log rmn (1)∕rmn (0) for a (15,000, 10,000) LDPC code at an SNR of 1.6 dB for various iterations of the algorithm. At iteration 0, the log likelihoods from the received signal data are plotted. At the other iterations, the log likelihoods of the bit-to-check information are plotted. Observe that the histogram has a rather Gaussian appearance (justifying the density evolution analysis of Section 15.7) and that as the iterations proceed, the mean becomes increasingly negative. The decoder thus becomes increasingly certain that the transmitted bits are 0. Figure 15.28 shows the mutual information as a function of decoder iteration number for bit-to-check and check-to-bit information for various SNRs. Perhaps the most interesting is for an SNR of 0.4 dB: after an initial increase in information, the decoder stalls and no additional increases occur. The EXIT chart is essentially obtained by eliminating the iteration number parameter from these two plots and plotting them against each other.


Figure 15.27: Histograms of the bit-to-check information for various decoder iterations at 1.6 dB.

Figure 15.28: Decoder information at various signal-to-noise ratios.


Figure 15.29: EXIT charts at various signal-to-noise ratios.

Figure 15.29 shows the EXIT chart for this code at various SNRs. The solid bold line plots the points (I^[𝓁]_{C→B}, I^[𝓁+1]_{B→C}) and the dashed bold line plots the points (I^[𝓁+1]_{B→C}, I^[𝓁+1]_{C→B}), with the horizontal axis representing I_{C→B} and the vertical axis representing I_{B→C}. The narrow solid line plots the progress of the decoding algorithm: the decoder essentially follows the stair-step pattern between the two traces of the plot. At an SNR of 0.8 dB, the code is fairly close to the decoding threshold, so it takes many iterations for the decoder to pass through the channel. At an SNR of 0.4 dB, the decoder is below the decoding threshold: the information is not able to make it through the channel. At an SNR of 1.6 dB, the channel is open somewhat wider, so fewer decoding iterations are required, and at 1.8 dB the channel is open wider still. ◽

15.9 Irregular LDPC Codes

Regular LDPC codes, good as they may be, can be improved upon by using irregular LDPC codes [283]. An irregular (or nonuniform) LDPC code has a very sparse parity check matrix in which the column weight (resp. row weight) may vary from column to column (resp. row to row). Considering that the results in Table 15.1 suggest that for the same rate, different column/row weights perform differently, the ability to allocate weights flexibly provides potentially useful design capability. In fact, the best known LDPC codes are irregular; gains of up to 0.5 dB compared to regular codes are attainable [282, 373]. In this section, we present some results of the design of irregular codes. This is followed by a sketch of the density evolution analysis which leads to these results.

15.9.1 Degree Distribution Pairs

The distribution of the weights of the columns and rows of the parity check matrix is described as follows. We let 𝜆_i represent the fraction of edges emanating from a variable node of degree i in the Tanner graph for the code and let 𝜌_i represent the fraction of edges emanating from a check node of degree i. Let d_𝑣 denote the maximum number of edges connected to a variable node and let d_c be the maximum number of edges connected to a check node. The polynomial

   𝜆(x) = ∑_{i=2}^{d_𝑣} 𝜆_i x^{i−1}

represents the distribution of variable node weights and

   𝜌(x) = ∑_{i=2}^{d_c} 𝜌_i x^{i−1}

represents the distribution of check node weights. These are called the variable node and check node distributions, respectively. The degree distributions satisfy 𝜆(1) = 1 and 𝜌(1) = 1. The pair (𝜆(x), 𝜌(x)) is called a degree distribution pair. For example, for the (3, 6) regular code, 𝜆(x) = x² and 𝜌(x) = x⁵.

The number of variable nodes of degree i is (see Exercise 15.12)

   N (𝜆_i∕i) ∕ ∫₀¹ 𝜆(x) dx = N (𝜆_i∕i) ∕ ∑_{j≥2} (𝜆_j∕j).    (15.49)

The total number of edges emanating from all nodes is

   E = N ∑_{i≥2} i (𝜆_i∕i) ∕ ∫₀¹ 𝜆(x) dx = N ∕ ∫₀¹ 𝜆(x) dx.    (15.50)

Similarly, the number of check nodes of degree i is

   M (𝜌_i∕i) ∕ ∫₀¹ 𝜌(x) dx = M (𝜌_i∕i) ∕ ∑_{j≥2} (𝜌_j∕j),

and the total number of edges is

   E = M ∕ ∫₀¹ 𝜌(x) dx.    (15.51)

Equating (15.50) and (15.51) we find

   M∕N = ∫₀¹ 𝜌(x) dx ∕ ∫₀¹ 𝜆(x) dx.

Under the assumption that the corresponding check equations are all linearly independent, the rate of the code is

   R(𝜆, 𝜌) = (N − M)∕N = 1 − ∫₀¹ 𝜌(x) dx ∕ ∫₀¹ 𝜆(x) dx.

Example 15.26 Suppose 𝜆_3 = 0.5 and 𝜆_4 = 0.5 and N = 1000. Then 𝜆(x) = 0.5x² + 0.5x³

and there are

   N (0.5∕3) ∕ (0.5∕3 + 0.5∕4) = 571

(rounding) variable nodes of degree 3 and

   N (0.5∕4) ∕ (0.5∕3 + 0.5∕4) = 429

variable nodes of degree 4, for a total of E = 571 ⋅ 3 + 429 ⋅ 4 = 3429 edges in the graph. ◽

Density Evolution for Irregular Codes

We now summarize how the density evolution is described for these irregular codes. We present the highlights of the technique, leaving aside some technicalities which are covered in [? ]. In the current analysis, a more explicit representation of the distribution of the messages is needed than the consistent Gaussian approximation made in Section 15.7. The decoder algorithms can be summarized as ( ) ⎛ ∏ [𝓁−1] ⎞ Ln→m [𝓁] −1 ⎜ ⎟ tanh (15.52) Lm→n = 2 tanh ⎜′ ⎟ 2 n ∈ ∖{n} ⎝ ⎠ m ∑ [𝓁] [𝓁] = Ln[0] + Lm . (15.53) Ln→m ′ →n m′ ∈n ∖{m}

Write (15.52) as L[𝓁] tanh m→n = 2



tanh(

Ln[𝓁−1] ′ →m

n′ ∈m ∖n

2

)

and take the log of both sides. In doing this, we have to be careful about the signs. Therefore, we will separate out the sign, ( ( ) |) | [𝓁] | | Ln[𝓁−1] ∑ | Lm→n ′ →m | | | [𝓁] [𝓁−1] | . | sgn(Lm→n ), log |tanh sgn(Ln′ →m ), log |tanh (15.54) | = | | | 2 2 | | | | n∈m ∖{n} | | We employ here a somewhat different definition of the sgn function: ⎧0 ⎪ ⎪0 sgn(x) = ⎨ ⎪1 ⎪ ⎩1

x>0 with probability with probability

1 2 1 2

if x = 0 if x = 0

x0 G0 (− log tanh(x∕2)) + Ix p′1 . Then the probabilities can be computed as p′1

1

p0 =

1+

p′1 p′0

p1 =

p′0

1+

p′1

This is a stable way of numerically computing the result. If p′1 > p′0 , then the result can similarly be written in terms of the ratio p′0 ∕p′1 .

.

p′0

15.13 Exercise 15.1 15.2

15.3

15.4

Determine a low-density parity check matrix for the (n, 1) repetition code. Show that there are no cycles of girth 4 in the Tanner graph. ( )

Let H be a binary matrix whose columns are formed from the m2 m-tuples of weight 2. Determine the minimum nonzero weight of a code that has H as its parity check matrix. Show that there are no cycles of girth 4 in the Tanner graph. Let h(x) = 1 + x + x3 + x7 by a parity check polynomial. Form a 15 × 15 parity check matrix by the cyclical shifts of the coefficients of h(x). Show that there are no cycles of length 4 in the Tanner graph. What is the dimension of the code represented by this matrix? For the parity check matrix ⎡1 1 0 1 0 0 0⎤ ⎢0 1 1 0 1 0 0⎥ A=⎢ 0 0 1 1 0 1 0⎥ ⎢ ⎥ ⎣0 0 0 1 1 0 1⎦ (a) (b) (c) (d) (e)

15.5

Construct the Tanner graph for the code. Determine the girth of the minimum-girth cycle. Determine the number of cycles of length 6. Determine a generator matrix for this code. Express the  and  lists describing this parity check matrix. P(c =1)

Let {c1 , . . . , cn } be independent bits, ci ∈ {0, 1} and let 𝜆i (c) = log P(ci =0) . Let z = i the parity check of the ci . Let P(z = 1) 𝜆(z) = log P(z = 0)

∑n

i=1 ci

be

be the likelihood ratio of the parity check. (a) Show that

( ( ) ) n ∏ −𝜆i (c) 𝜆(z) tanh − tanh = . 2 2 i=1

This is the tanh rule (with the sign switched from the form in the chapter). Thus, ( n ( )) ∏ −𝜆 (c) i tanh 𝜆(z) = −2 tanh−1 . (15.64) 2 i=1 (b) Let f (x) = log Show that f (f (x)) = x for x > 0.

ex + 1 , ex − 1

x > 0.

15.13 Exercise

713

(c) Plot f (x). ∏ (d) Let 𝜎z = − ni=1 sign(−𝜆i (c)) be the product of the signs of the bit likelihoods. Show that (15.64) can be written as sign(−𝜆(z)) =

n ∏

sign(−𝜆i (c))

i=1

tanh(|𝜆(z)∕2|) =

n ∏

tanh(|𝜆i (c)|∕2).

(15.65)

i=1

(e) Show that (15.65) can be written as 𝜆(z) = 𝜎z f

( n ∑

) f (|𝜆i (c)|) .

i=1 z

Hint: tanh(z∕2) = eez −1 . +1 (f) Show that if ck is equally likely to be 0 or 1, then 𝜆(z) = 0. Explain why this is reasonable. (g) Explain why 𝜆(z) ≈ −(−1)ẑ |𝜆min (c)|, where |𝜆min (c)| = mini |𝜆i (c)|. 15.6

Let 𝐚m be the mth row of a parity check matrix H and let zm be the corresponding parity check. That is, for some received vector 𝐞, zm = 𝐞𝐚Tm . Let 𝐚m be nonzero in positions i1 , i2 , . . . , ip , so that zm = ei1 + ei2 + · · · + eip . Assume that each ei is 1 with probability pc . Let zm (𝑤) be the sum of the first 𝑤 terms in zm and let pz (𝑤) be the probability that zm (𝑤) = 1. (a) Show that pz (𝑤 + 1) = pz (𝑤)(1 − pc ) + (1 − pz (𝑤))pc , with initial condition pz (0) = 0. (b) Show that pz (𝑤) = 12 − 12 (1 − 2pc )𝑤 .

15.7

[288] Hard-decision decoding on the BSC. Let H be the m × n parity check matrix for a code and let 𝐫 be a binary-valued received vector. A simple decoder can be implemented as follows: Set 𝐜̂ = 𝐫 (initial codeword guess) ̂ T (mod 2) (compute checks) [*] Let 𝐳 = 𝐜A If 𝐳 = 0, end. (everything checks — done) Evaluate the vote vector 𝐯 = 𝐳A (not modulo 2), which counts for each bit the number of unsatisfied checks to which it belongs. The bits of 𝐜̂ that get the most votes are viewed as the most likely candidates for being wrong. So flip all bits 𝐜̂ that have the largest vote. Go to [*] (a) Let 𝐫 = [1, 0, 1, 0, 1, 0, 0, 1, 0, 1]. Compute 𝐬, 𝐯, and the updated 𝐜̂ using the parity check matrix in (15.1). Is the decoding complete at this point? (b) Repeat for 𝐫 = [1, 0, 1, 0, 1, 1, 0, 1, 0, 1]. Continue operation until correct decoding occurs. (c) Show that 𝐯 = 𝐬̂ A counts the number of unsatisfied checks. (d) Some analysis of the algorithm. We will develop an expression for the average number of bits changed in an iteration. Let 𝑤r be the row weight of H (assumed fixed) and 𝑤c be the column weight (assumed fixed). Determine the largest possible number of votes 𝑤 a check can be involved in. (e) Let 𝐞 be the (binary) error vector. Let bit l of 𝐞 participate in check zm (that is, Hml = 1). Show that when el = 0, the probability that bit l receives a vote of 𝑤 is a = P(votel = 𝑤|el = 0) = [pz (𝑤r − 1)]𝑤 , where pz (𝑤) is the function defined in Exercise 15.6.

714

Low-Density Parity-Check Codes: Introduction, Decoding, and Analysis (f) Show that when el = 1, the probability that this bit receives the largest possible number of votes 𝑤 is the probability b = P(votel = 𝑤|el = 1) = [1 − pz (𝑤r − 1)]𝑤 . (g) Show that P(el = 0|votel = 𝑤) ∝ a(1 − pc ) and P(el = 1|votel = 𝑤) ∝ bpc , where pc is the crossover probability for the channel. (h) Hence show that the expected change in the weight of the error vector when a bit is changed after one round of decoding is (a(1 − pc ) − bpc )∕(a(1 − pc ) + bpc ). 15.8

15.9

[149] Consider a sequence of m independent bits in which the j th bit is 1 with probability pj . Show ∏ that the probability that an even number of the bits in the sequence are 1 is (1 + m j=1 (1 − 2pj ))∕2. (For an expression when all the probabilities are the same, see Exercise 3.3.) The original Gallager decoder: Let Pil be the probability that in the ith parity check the lth bit of the check is equal to 1, i = 1, 2, . . . , 𝑤c , l = 1, 2, . . . , 𝑤r . Show that ∏𝑤 −1 𝑤c 1 + l=1r (1 − 2Pil ) P(cn = 0|𝐫, {zm = 0, m ∈ n }) 1 − P(cn = 1|𝐫) ∏ . = P(cn = 1|𝐫, {zm = 0, m ∈ n }) P(cn = 1|𝐫) i=1 1 − ∏𝑤r −1 (1 − 2P ) il l=1

Use Exercise .15.8. 15.10 Suppose that each bit is checked by 𝑤c = 3 checks, and that 𝑤r bits are involved in each check. Let p0 = pc be the probability that a bit is received in error. (a) Suppose that rn is received incorrectly (which occurs with probability p0 ). Show that a parity check on this bit is unsatisfied with probability (1 + (1 − 2p0 )𝑤r −1 )∕2. (b) Show that the probability that a bit in the first tier is received in error and then corrected is p0 ((1 + (1 − 2p0 )𝑤r −1 )∕2)2 . (c) Show that the probability that a bit in the first tier is received correctly and then changed because of unsatisfied parity checks is (1 − p0 )((1 − (1 − 2p0 )𝑤r −1 )∕2)2 . (d) Show that the probability of error p1 of a digit in the first tier after applying the decoding process is [ [ ]2 ]2 1 − (1 − 2p0 )𝑤r −1 1 + (1 − 2p0 )𝑤r −1 + (1 − p0 ) p1 = p0 − p0 2 2 and that after i steps the probability of error of processing a digit in the i th tier is [ [ ]2 ]2 1 − (1 − 2pi )𝑤r −1 1 + (1 − 2pi )𝑤r −1 pi+1 = p0 − p0 + (1 − p0 ) . 2 2 15.11 A bound on the girth of a graph. The girth of a graph is the length of the smallest cycle. In this exercise, you will develop a bound on the girth. Suppose a regular LDPC code of length n has m parity checks, with column weight 𝑤c and row weight 𝑤r . Let 2l be the girth of the associated Tanner graph. (a) Argue that for any node, the neighborhood of edges on the graph of depth l − 1 forms a tree (i.e., the set of all edges up to l − 1 edges away), with nodes of odd depth having “out-degree” 𝑤r and nodes of even depth having “out-degree” 𝑤c . (b) Argue that the number of nodes at even depths of the tree should be at most equal to n, and that the number of nodes at odd depths is equal to m. (c) Determine the number of nodes at depth 0 (the root), at depth 2, at depth 4, at depth 6, and so forth. Conclude that the total number of nodes at even depth is 1 + 𝑤c (𝑤r − 1)

[(𝑤c − 1)(𝑤r − 1)]⌊l∕2⌋ − 1 . (𝑤c − 1)(𝑤r − 1) − 1

This number must be less than or equal to n. This yields an upper bound on l.

15.14 Exercise 15.12 (Irregular codes) Show that (15.49) is true. ∑ 1 15.13 Show that j≥2 𝜆j ∕j = ∫0 𝜆(x) dx. 15.14 Draw the Tanner graph for a (4, 1) RA code with three input bits with an interleaver Π = (6, 12, 8, 2, 3, 4, 10, 1, 9, 5, 11, 7).

15.14 References Many references appear throughout this chapter; see especially Section 15.1. Tanner graphs and codes built using graphs are discussed in [432]. On the analysis of LDPC codes, the paper [375] is highly recommended, which uses density evolution. Density evolution is also discussed in [73] (our discussion comes from there) and from [26]. See also [106]. The EXIT chart for turbo codes was presented in [438, 439]. The discussion of irregular LDPC codes comes from [373]. See also [282]. One of the potential drawbacks to LDPC codes is the large number of decoding iterations that may be required, resulting in increased latency and decoding hardware complexity. An approach to reduce the number of iterations is presented in [61, 85], in which the messages around a cycle in the Tanner graph establish an eigenvalue problem or a least-squares problem. Bit flipping was proposed originally in Gallager [149] and has been extensively examined since then [171,221,254,263,373]. The gradient descent bit flipping is presented in [473], from which discussion on WBF and MWBF is drawn. The presentation on many of the decoding algorithms in sections 15.4 and 15.5 is deeply indebted to the extensive survey in William E. Ryan and Shu Lin, Channel Codes: Classical and Modern (Cambridge University Press, 2009), Chapter 5. The reduced complexity bit box-plus decoder is described in M. Viens and W.E. Ryan, “A Reduced-Complexity Box-Plus Decoder for LDPC Codes,” Fifth Interntional Symposium on Turbo Codes and Related Topcics, pp. 151–156, September 2008, and M. Viens, A Reduced-Complexity Iterative Decoder for LDPC Codes, M.S. Thesis, ECE Department, University of Arizona, December 2007. Discussion of the WBF was presented in [254] and MWBF was presented in [505]. These were discussed with gradient descent bit flipping in [473]. See also [171, 222, 263]. The DC algorithm for decoding is discussed in [497]. Divide and concur, in general, appeared in [170, 171]. Linear programming decoding is discussed in [117]. In addition to decoding, this paper discusses pseudocodewords and fractional distance, which provide a way to lower bound the minimum distance of a code. Another approach to decoding is to use multiple decoding stages, with simple, fast decoders used in early stages, and more complicated decoders used for later stages only if the earlier stages produce a decoding failure. This can result in a decoder with low average decoding complexity but excellent performance. Multistage decoding is discussed in [497]. Despite the breadth of coverage of decoding algorithms here, there are still additional algorithms in the literature. For example: “Algorithm E” of [375] is a low complexity algorithm for a discrete alphabet with erasures; a reduced complexity decoder of [62]; an “eigenmessage” approach which accounts for cycles in the Tanner graph [316]. Our discussion of encoding comes from [374]. Another class of approaches is based on iterative use of the decoding algorithm in the encoding process. This could allow for the same hardware to be used for both encoding and decoding. These approaches are described in [179]. Using the sum-product decoder on general matrices is explored in [314]. An excellent resource on material related to turbo codes, LDPC codes, and iterative decoding in general is the February 2001 issue of the IEEE Transactions on Information Theory, which contains many articles in addition to those articles cited here. 
Interesting results on the geometry of the iterative decoding process are in [208]. Stopping sets are described in [93]. RA codes are presented in [223]. Our presentation has benefited from the discussion in [26].

715

Chapter 16

Low-Density Parity-Check Codes: Designs and Variations 16.1 Introduction This chapter builds on the LDPC foundations of the previous chapter to present ideas related to the analysis of LDPC codes and several design methodologies. The chapter begins (Section 16.2) with a description of repeat- accumulate (RA) and irregular repeat-accumulate (IRA) codes, codes which have low encode complexity and good performance. LDPC convolutional codes (LDPCCC) are presented in Section 16.3. These are LDPC codes which have a convolutional structure which provides for low encode complexity. Then several design methodologies are presented, starting with a generalization of cyclic codes known as quasi-cyclic (QC) codes (Section 16.4). Many of the LDPC designs are based on QC codes. These designs provide connections between some of the coding and field theory of earlier chapters to LDPC codes. The chapter continues (Section 16.7) with a description of ensembles of LDPC codes. The presentation here is merely descriptive, but these ensembles are used sometimes in design algorithms. Code design using progressive edge growth (PEG) is then described (Section 16.8) and protograph and multi-edge- type LDPC codes (Section 16.9). Trapping sets have been explored as causing error floors in LDPC codes. Trapping sets are examined briefly in Section 16.10. Attention then turns to importance sampling in Section 16.11. This is a topic not directly related to LDPC codes, but seemed appropriate to include here. Simulation is often used to characterize the performance of codes. For strong codes (such as LDPC codes), doing the simulation can take a very long time. Importance sampling is a method of simulation than can (properly applied) reduce the simulation time by many orders of magnitude. The discussion here presents the basic ideas and methods to apply these to error correction coding. This is still an area of research, so applying the principles may take some creativity. The chapter concludes with an introduction to fountain codes in Section 16.12. These are not, strictly speaking, low- density parity-check codes. But the encoding operations make use of simple parity-type computations, so they are in some ways similar to LDPC codes.

16.2 Repeat-Accumulate Codes A regular RA code encoder consists of three trivial encoding blocks, as shown in Figure 16.1. 1. The outer code is an (n, 1) repetition code. These are the simplest known error correction codes, having good distance properties but very low rate. 2. A pseudorandom interleaver (permuter) Π. 1 3. The inner code is a rate-1 recursive convolutional code with generator G(x) = 1+x . This acts as an accumulator, because the output is the mod-2 sum (accumulation) of the inputs.

Error Correction Coding: Mathematical Methods and Algorithms, Second Edition. Todd K. Moon © 2021 John Wiley & Sons, Inc. Published 2021 by John Wiley & Sons, Inc. Companion website: www.wiley.com/go/Moon/ErrorCorrectionCoding

718

Low-Density Parity-Check Codes: Designs and Variations Permute (n, 1) repetition code Message bits

Accumulator +

Π

z−1

Figure 16.1: A repeat-accumulate encoder. Suppose that a systematic RA code is employed, in which the original message bits are transmitted along with the accumulator output. This linear code would have a parity check matrix, which, in turn, would have a Tanner graph representation. Example 16.1 [[26], p. 623] Consider a systematic RA code on K = 2 message bits with a (3, 1) repetition code. The interleaver (permuter) is Π = (1, 4, 6, 2, 3, 5). The operation of the code is as follows. Two message bits, m1 and m2 , arrive at the encoder and are replicated three times: m1 , m1 , m1 , m2 , m2 , m2 . These bits pass through the interleaver (permuter), which produces the output sequence m1 , m2 , m2 , m1 , m1 , m2 . Then the accumulator produces the running sum output: p1 = m1 ,

p2 = m2 + p1 ,

p3 = m2 + p2 ,

p4 = m1 + p3 ,

···

(16.1) ◽

The pi information (that is, the accumulator outputs) form the codeword 𝐜 = (p1 , p2 , . . . , p6 ). Bits in this codeword are modulated, transmitted, and received. The Tanner graph for such a code is shown in Figure 16.2. The variable nodes of the graph have been split into the (systematic) message bits and the parity bits to help distinguish the structure of the graph. The sequence of accumulator computations (16.1) are reflected in the lower part of the graph (connected to the parity bit nodes).

Message bit nodes

m1

m2

Interleaver Check nodes

Parity bit nodes p1

p2

p3

p4

p5

p6

Figure 16.2: The Tanner graph for a (3, 1) RA code with two input bits. The Tanner graph can be interpreted as follows. Reading left-to-right, the first check node constrains the parity bit to be equal to the first bit, m1 . Each succeeding check node constrains the parity bit to be the sum of the previous parity bit and the next input, where the input sequence is determined by the interleaver. This does not define a regular code, since each message bit is connected to n check nodes and the parity bits are connected to one or two check nodes. Once we observe the structure of the RA code on the Tanner graph, we may also observe that, regardless of the length N of the code, the Tanner graph retains its sparseness. Each parity bit node is connected to no more than two check nodes and each message bit node is connected to n check nodes.

16.2 Repeat-Accumulate Codes

719

Sometimes RA codes are transmitted systematically, in which the original bits are sent as well. In this case, the message bits are sent along with the parity information. The codeword in the example above would be 𝐜 = (m1 , m2 , p1 , p2 , p3 , p4 , p5 , p6 ). The RA encoder may also be understood in terms of parity check and generator matrices. This will be demonstrated using the example above, then generalized to other encoders with other parameters. Writing the accumulator equations (16.1) in vector form, we have ] [ ] [ p1 p2 p3 p4 p5 p6 = m1 m2 + p1 m2 + p2 m1 + p3 m1 + p4 m2 + p5 [ ] [ ] 1 0 0 1 1 0 = m1 m2 0 1 1 0 0 1 ⎡0 ⎢0 ] ⎢0 [ + p1 p2 p3 p4 p5 p6 ⎢ ⎢0 ⎢0 ⎢ ⎣0

1 0 0 0 0 0

0 1 0 0 0 0

0 0 1 0 0 0

0 0 0 1 0 0

0⎤ 0⎥ ⎥ 0⎥ . 0⎥ 1⎥ ⎥ 0⎦

Moving the term on the right involving pi to the other side, we obtain ⎡1 ⎢0 ] ⎢0 [ p1 p2 p3 p4 p5 p6 ⎢ ⎢0 ⎢0 ⎢ ⎣0

1 1 0 0 0 0

0 1 1 0 0 0

0 0 1 1 0 0

0 0 0 1 1 0

0⎤ 0⎥ [ ] ⎥ ] 1 0 0 1 1 0 0⎥ [ m m . = 1 2 0 1 1 0 0 1 0⎥ ⎥ 1 ⎥ 1⎦

Let the matrix on the left be called HpT and the matrix on the right be HmT . Then stacking things up, we obtain [ ] [ ] HmT m1 m2 p1 p2 p3 p4 p5 p6 = 0. HpT The matrix

[ ] H = Hm Hp

is thus observed to be a parity check matrix for the codeword ] [ m1 m2 p1 p2 p3 p4 p5 p6 . A generator matrix corresponding to this parity check matrix is ] [ G = I HmT Hp−T , where Hp−T can be seen to be

⎡1 ⎢0 ⎢ 0 =⎢ ⎢0 ⎢0 ⎢ ⎣0

1⎤ 1⎥ ⎥ 1⎥ Hp−T . 1⎥ 1⎥ ⎥ 1⎦ ] [ The action of the generator on an input message vector m1 m2 is that HuT duplicates the message bits (3 times each) then randomly interleaves them into 6 different positions. The matrix Hp−T computes across its columns the cumulative sum of the sequence of inputs in the row vector applied to its left. [ ] More generally, in the parity check matrix H = Hm Hp , Hm is a nK × K matrix with column weight n and row weight 1, and Hp is a nK × nK matrix with 1s on the diagonal and upper diagonal 1 1 0 0 0 0

1 1 1 0 0 0

1 1 1 1 0 0

1 1 1 1 1 0

720

Low-Density Parity-Check Codes: Designs and Variations ⎡1 ⎢1 Hp = ⎢0 ⎢ ⎢0 ⎣0 16.2.1

0 1 1 0 0

0 0 1 1 0

··· ··· ··· ··· ···

0 0 0 0 1

0⎤ 0⎥ 0⎥ . ⎥ 0⎥ 1⎦

Decoding RA Codes

Decoding of RA codes using message passing, just as for other LDPC codes. Unless the RA code is systematic, the variable nodes in the graph are distinguished as two types: the parity bit nodes, corresponding to the observed channel information, and the message bit nodes, corresponding to the unobserved message bits. The major steps are outlined in Figure 16.3. Message bit nodes

Message bit nodes

Interleaver

Interleaver

Check nodes

Check nodes

Parity bit nodes

Parity bit nodes

Channel information

Channel information (b)

(a) Message bit nodes

Message bit nodes

Interleaver

Interleaver

Check nodes

Check nodes

Parity bit nodes

Parity bit nodes Channel information

Channel information (c)

(d)

Figure 16.3: Decoding steps for an RA code. (a) Initialization/parity bit to check; (b) check to message; (c) message to check; (d) check to parity bit. Algorithm 16.1 Repeat Accumulate Decoding Initialization: The input to the decodingalgorithm is the channel information (just as for LDPC codes), whichis copied as the message to each check node. Check node to message bit messages At the check nodes, theusual check node to variable node computation (e.g., ⊞ operations) is performed to send messages to the message bits. Atthe first check node, involved in only 1 bit, the message to thebit node is simply passed through. Message bit to check bit messages The usual variable nodeleave-one-out computation (e.g., sum of likelihood messages) isperformed at the message bits to pass information back to the checknodes. Check node to parity bit messages The usual check nodecomputation is performed.

16.2 Repeat-Accumulate Codes

721

For systematic RA codes, the decoding is identical to the decoding algorithms described above for LDPC codes. 16.2.2

Irregular RA Codes

The RA encoder structure presented above can be generalized to an irregular repeat-accumulate structure [223]. The structure is shown in Figure 16.4. There are K input bits. Instead of repeating all K bits an equal number of times, we choose fractions f2 , f3 , . . . , fq such that q ∑

fi = 1.

i=2

For a block of a total of K bits, a subset f2 K bits is repeated two times, the next subset of f3 K bits is repeated three times, and so forth. These repeated bits pass through a pseudorandom interleaver and the output is divided into M subsets to form the parity bits. There are d1 bits in the first group, d2 bits in the second group, and so on, up to dM bits in the last group. These are added together to form the parity information. Furthermore, the parity check nodes are generalized so that parity pi connects to a sum of di of the repeated message bit nodes. In this figure, the accumulator structure is on the top of the Tanner graph. Assignment of the code parameters (e.g., the fractions f2 , f3 , . . . , fq ) can be optimized using density evolution, so that the decoding SNR threshold can be minimized. Parity bit nodes

Check nodes

d1

d2

d3

d4

dM

Pseudorandom interleaver Message bit nodes

f2 K nodes repeat 2

f3 K nodes repeat 3

fq K nodes repeat q

Figure 16.4: Tanner graph for an irregular repeat-accumulate code.

Example 16.2 Let K = 2, f2 = 0.5, f3 = 0.5, M = 3 and d1 = 2, d2 = 2, d3 = 1. The Tanner graph for this encoder is shown in Figure 16.5. The operation of the Tanner graph can be expressed as [ ] ] [ ] 1 1 0 ] ⎡0 1 0⎤ [ [ p1 p2 p3 = m1 m2 + p1 p2 p3 ⎢0 0 1⎥ 1 1 1 ⎢ ⎥ ⎣0 0 0⎦ or [

[ ] ] ⎡1 1 0⎤ [ ] 1 1 0 p1 p2 p3 ⎢0 1 1⎥ = m1 m2 1 1 1 ⎢ ⎥ ⎣0 0 1⎦

Calling the matrix on the left Hp and the matrix on the right Hm (as before), these equations can be stacked as [ ] [ ] HmT m1 m2 p1 p2 p3 = 0. HpT



722

Low-Density Parity-Check Codes: Designs and Variations p1

Parity bit nodes

p2

p3

Check nodes

Permuter

Message bit nodes m1

m2

Figure 16.5: Example Tanner graph for an irregular repeat-accumulate code.

In general, the parity check matrix has the same structure as for the RA code, but Hm has column weighs Kf2 , Kf3 , . . . , Kfq , and row weights d1 , d2 , . . . , dM . The matrix Hp representing the accumulator is as for the RA codes. Let K 1∑ q= Kf K j=1 j denote the average repetition rate of the message bits and let d=

M 1 ∑ d M j=1 j

be the average of the degrees. Then for nonsystematic IRA codes, the rate is R=

d q

(in the example above the rate is R = 2∕3). For systematic IRA codes, the rate is R=

d d+q

(in the example above, it the message bits are also sent to make a systematic code the rate is 2∕5). While an irregular RA code is simply a special case of an irregular LDPC code, it has an important advantage: it can be very efficiently encoded. The RA codes thus provide an instance of a code which has linear encode complexity and linear decode complexity, while still achieving excellent decoding performance. 16.2.3

RA Codes with Multiple Accumulators

The flexibility provided by IRA structures can be further enhanced with encoding structures that employ multiple accumulators. It has been shown, for example, that designs with lower error floors can be obtained. A structure known as the irregular repeat-accumulate-accumulate (IRAA) is shown in Figure 16.6, in which the IRA structure is followed by a permutation Π and another accumulator. If the output is

16.3 LDPC Convolutional Codes

723 m b

m

H mT

1 1+x

Π

1 1+x

p

Figure 16.6: An IRAA encoder.

taken to be systematic and both 𝐛 and 𝐩 (the doubly accumulated bits), then the parity check equations are 𝐦HmT + 𝐛HpT = 0 (regular IRA) 𝐛ΠT + 𝐩HpT = 0

(permute parity and accumulate)

That is, the 𝐛 vector is the same as the IRA decoder above. Then these bits are permuted and pass through another accumulator.

16.3 LDPC Convolutional Codes Encoding LDPC codes based on simply multiplying a message vector times a generator matrix has computational complexity O(N 2 ), and the hardware can be complex due to the usual irregularity of the structure of G. LDPCCC, like the more traditional convolutional codes, offer low encode complexity, while still retaining the benefits typically associated with LDPC codes. Let 𝐮i = (ui,1 , ui,2 , . . . , ui,b ) be a block of b input bits. A sequence of input data over t + 1 blocks is [ ] 𝐮[0,t] = 𝐮0 𝐮1 𝐮2 · · · 𝐮t . Each block 𝐮i is encoded to produce an encoded block of length c [ ] 𝐯i = 𝑣i,1 𝑣i,2 𝑣i,3 · · · 𝑣i,c , where each 𝑣i,j is a bit. The encoding is systematic, so that the first b bits are the input bits and the last c − b bits are parity bits. That is, each 𝐯i can be written as ] [ 𝐯i = 𝐯i (s) 𝐯i (p) with 𝐯i (s) = 𝐮i . Considering these are convolutional codes, an infinite stream of bits 𝐮[0,∞] is input and an infinite stream of coded bits 𝐯[0,∞] is produced. Let H[0,∞] be the infinite parity check matrix associated with the convolutional code. The coded sequence must satisfy the parity check equation T 𝐯[0,∞] H[0,∞] = 0.

(16.2)

For an LDPCCC, a section of the parity check matrix has the form

T H[t,t ′]

··· HM (t + M)T ⎤ ⎡H0 (t)T H1 (t + 1)T T T ⎥ ⎢ H0 (t + 1) H1 (t + 2) ··· HM (t + M + 1)T ⎥. ⋱ ⋱ ⋱ =⎢ ⎢ H1 (t′ + 1)T · · · HM (t′ + M + 1)T ⎥ H0 (t′ )T ⎥ ⎢ ⎦ ⎣

724

Low-Density Parity-Check Codes: Designs and Variations Each block Hi (t) has the following properties:

• • •

Hi (t)T is of size c × (c − b). H0 (t)T has full rank for all t. HM (t) for at least one value of t.

Also, the matrix repeats after some number of block rows.

Example 16.3

An example of an H T matrix is given here.

In this case, each block is of size 2 × 1 (c = 2, b = 1). The number of columns along a block row is M + 1 = 5. ◽

The parity check equation (16.2) applied over a block of data leads to 𝐯t H0 (t)T + 𝐯t−1 H1 (t)T + · · · + 𝐯t−M HM (t)T = 0. At time t, the previous blocks 𝐯t−1 , . . . , 𝐯t−M are known. Moving the known blocks to the other side and writing [ ] ] Hs [ 𝐯t H0 (t)T = 𝐮t 𝐯t (p) , Hp where Hs is b × (c − b) and Hp is (c − b) × (c − b), we can write 𝐯t (p)Hp = −𝐮t Hs − 𝐯t−1 H1 (t)T − · · · − 𝐯t−M HM (t)T . Since Hp is full rank, the parity bits can be obtained. To be a low-density parity check matrix, a column of H must be low density, which means that the weight of each column of H should be much less than (c − b)(M + 1). Parity check matrices can be found by computer search, or by stacking parity check matrix blocks from an LDPC code. For example, consider the 5 × 10 parity check matrix on the left of (16.3). This can be partitioned as shown with the line. The lower partition is stacked above the upper partition, as

16.4 Quasi-Cyclic Codes

725

shown on the right of (16.3). This construction can be repeated indefinitely. This gives a convolutional encoder with row weight 4, column weight 2, and period 5 [386, Section 15.5].

(16.3)

16.4 Quasi-Cyclic Codes QC codes are a class of linear codes generalizing the cyclic codes introduced in Chapter 4. QC codes have encoding complexity linear in the size of the code (as opposed to quadratic complexity for typical block codes). While they are of general interest, the particular interest here is due to the fact that some QC codes may have a low-density parity matrix, that is, some QC codes are LDPC codes. Methods of constructing such parity check matrices are described in sections below. 16.4.1

QC Generator Matrices

Let 𝐯1 , 𝐯2 , . . . , 𝐯t be binary row vectors, each of length b, where [ ] 𝐯j = 𝑣j,1 𝑣j,2 𝑣j,3 . . . 𝑣j,b . Let 𝐯(1) denote the vector obtained by cyclically shifting 𝐯j one place to the right: [ ] 𝐯j(1) = 𝑣j,b 𝑣j,1 𝑣j,2 · · · 𝑣j,b−1 . Form the vector 𝐯 of length bt by stacking up the 𝐯j vectors, ] [ 𝐯 = 𝐯1 𝐯2 . . . 𝐯t and let 𝐯(1) denote the stack of the cyclic shifts 𝐯j(1) , ] [ 𝐯(1) = 𝐯1(1) 𝐯2(1) . . . 𝐯t(1) . This vector is called the t-sectionalized cyclic shift of 𝐯. With this notation, we define a quasi-cyclic code qc with parameters b, k, and t as a (tb, k) linear block code if every t-sectionalized cyclic shift of a codeword in qc is also a codeword in qc . Clearly, if t = 1, a conventional cyclic code is obtained. A circulant b × b matrix is one for which each row is a right cyclic shift of the row above it. An example of a circulant matrix is ⎡1 0 1 0 0⎤ ⎢0 1 0 1 0⎥ ⎢0 0 1 0 1⎥ . ⎢ ⎥ ⎢1 0 0 1 0⎥ ⎣0 1 0 0 1⎦ The first row of a circulant matrix is called its generator.

726

Low-Density Parity-Check Codes: Designs and Variations The generator matrix of a QC code is a c × t block matrix, where each block is a b × b circulant matrix. In systematic form, ⎡G1 ⎤ ⎡G1,1 G1,2 ⎢G ⎥ ⎢G G G = ⎢ 2 ⎥ = ⎢ 2,1 2,2 ⋮ ⋮ ⋮ ⎢ ⎥ ⎢ ⎣Gc ⎦ ⎣Gc,1 Gc,2 ] [ = P Icb

· · · G1,t−c · · · G2,t−c ⋱ ⋮ · · · Gc,t−c

I 0 ⋮ 0

0 I ⋮ 0

··· ··· ⋱ ···

0⎤ 0⎥ ⋮⎥ ⎥ I⎦

(16.4)

Each block of G, including each Gi,j and the identities I and the 0s, is a b × b circulant matrix. Example 16.4

[386, p, 134] A generator matrix for a (20, 10) code with b = 5, c = 2, and t = 4 is



Encoding is done using the usual vector/matrix multiplication. Let the message vector 𝐦 be partitioned into c blocks of b bits as [ ] [ ] 𝐦 = 𝐦1 𝐦2 . . . 𝐦c = m1 m2 . . . mcb , where the elements of 𝐦i are

[ ] 𝐦i = m(i−1)b+1 m(i−1)b+2 . . . mib .

The systematically encoded QC codeword is [ ] 𝐜 = 𝐦G = 𝐦1 G1 + 𝐦2 G2 + · · · + mc Gc = 𝐩1 𝐩2 . . . 𝐩t−c 𝐦1 𝐦2 . . . 𝐦c , where 𝐩j = 𝐦1 G1,j + 𝐦2 G2,j + · · · + 𝐦c Gc,j .

(16.5)

This operation can be accomplished efficiently by utilizing the cyclic structure of the generator. For the subblock Gi,j in the generator (16.4), let 𝐠i,j denote its generator (first row), and let 𝐠(𝓁) denote its i,j (0) right cyclic shift, where 𝐠i,j = 𝐠i,j . The vector/matrix product terms in (16.5) can be expressed using generators as (0) (1) 𝐦i Gi,j = m(i−1)b+1 𝐠i,j + m(i−1)b+2 𝐠i,j + · · · + mib 𝐠(b−1) . i,j

This computation can be efficiently realized using hardware as shown in Figure 16.7(a). This hardware is referred to as a cyclic shift register adder accumulator (CSRAA). The hardware works as follows:



The b-bit feedback shift register at the top is loaded with the generator 𝐠c,j i(b−1) , the (b − 1)st right cyclic shift of the generator 𝐠c,j of the circulant Gc,j .

16.4 Quasi-Cyclic Codes

727



The message bits are shifted in, starting at the last message bit mcb . The circuitry forms the , which is placed into the b-bit accumulator (at the bottom). product mcb 𝐠(b−1) c,j



The shift register at the top is shifted left, producing 𝐠(b−2) , and the next message bit mcb−1 is c,j shifted in. The product mcb−1 𝐠(b−2) is accumulated into the accumulator. c,j



This process continues, cyclically shifting the top register while shifting in message bits until the first bit of 𝐦c and accumulating the products into the accumulator, which is m(c−1)b+1 . At this point, the accumulator contains the product 𝐮c Gc,j .



Now the feedback shift register is loaded with the generator 𝐠(b−1) , the (b − 1)-times right-shifted c,j generator and the next b bits are shifted in. After these b shifts, the accumulator contains the product 𝐮c−1 Gc−1,j + 𝐮c Gc,j .



The process continues inputting 𝐜c−2 , 𝐜c−3 , . . . , 𝐜1 , initializing the top buffer, in turn, with , 𝐠(b−1) , . . . , 𝐠(b−1) . 𝐠(b−1) c−2,j c−3,j 1,j



After the process finishes, the register contains 𝐩j from (16.5). The encoding complexity (as measured by “ticks of the clock”) is the length of the message, since the whole message is clocked in.

The operation of computing 𝐩j can be performed in parallel to compute the set 𝐩1 , 𝐩2 , . . . , 𝐩t−c , essentially duplicating the CSRAA for each block. This is shown in Figure 16.7(b). b-bit feedback shift register 1

2

1

b

Input mi

CSRAA 1

p1

CSRAA 2

p2

Input m 1

2

3

b

b-bit accumulator register

(a)

CSRAA t– c

pt –c

(b)

Figure 16.7: Encoding QC codes. (a) cyclic shift register adder accumulator (CSRAA); (b) parallelizing the CRSAA.

16.4.2

Constructing QC Generator Matrices from QC Parity Check Matrices

In the sections below, QC low-density parity check matrices will be constructed. This raises the question: given a QC parity check matrix, how can a QC generator matrix be produced so that the low-complexity encoding techniques described in the previous section can be employed? This question is addressed in this section. Let A0,1 · · · A0,t−1 ⎤ ⎡ A0,0 ⎢ A1,0 A1,1 · · · A1,t−1 ⎥ H=⎢ ⎥ ⋮ ⎥ ⎢ ⎣At−c−1,0 At−c−1,1 · · · At−c−1,t−1 ⎦ be a (t − c)b × tb binary matrix, where each Ai,j is a b × b circulant matrix. A QC code QC is formed by the nullspace of this matrix. It may happen that the rank of H is equal to number of rows (t − c)b; in this case, H is said to be full rank and the code is a (tb, cb) code. If His not full rank, then the dimension of the code is larger than cb.

728

Low-Density Parity-Check Codes: Designs and Variations 16.4.2.1

Full Rank Case

In finding a corresponding generator matrix, we consider first the case that H is full rank. Assume that the columns of H are arranged such that the left (t − c)b × (t − c)b matrix D, A0,1 · · · A0,t−c−1 ⎤ ⎡ A0,0 ⎢ A1,0 A1,1 · · · A1,t−c−1 ⎥ D=⎢ ⎥ ⋮ ⎢ ⎥ ⎣At−c−1,0 At−c−1,1 · · · At−c−1,t−c−1 ⎦ has rank (t − c)b. A generator G corresponding to the parity check matrix H must satisfy HGT = 0. Imposing the condition that G is to be circulant of the form (16.4), let 𝐠i,j be the generator of the circulant block Gi,j . Let 0b = (0, 0, . . . , 0) be an all-zero (row) b-tuple and let 𝐮b = (1, 0, . . . , 0) be the unit (row) b-tuple with 1 in the first position. The first row of the block row Gi of G in (16.4) is [ ] 𝐠i = 𝐠i,0 𝐠i,1 · · · 𝐠i,t−c−1 0b · · · 0b 𝐮b 0b · · · 0b , where the 𝐮b is in block (t − c + i). From the condition that HGT = 0, it follows that H𝐠Ti = 0. Let 𝐳i be the (t − c)b × 1 vector ] [ 𝐳i = 𝐠i,0 𝐠i,1 · · · 𝐠i,t−c−1 . Let Mt−c+i denote the (t − c + i)th column of circulant blocks H,

Mt−c+i

⎡ A0,t−c+i ⎤ ⎥ ⎢ A = ⎢ 1,t−c+i ⎥ , ⋮ ⎥ ⎢ ⎣At−c−1,t−c+i ⎦

where 0 ≤ i < c, so that Mt−c+i is a column of H not included in D. The product H𝐠Ti = 0 can be expressed as (16.6) D𝐳iT + Mt−c+i 𝐮Tb = 0. Since D is a (t − c)b × (t − c)b (square) matrix and full rank, it has an inverse and (16.6) can be solved for 𝐳iT as 𝐳iT = D−1 Mt−c+i 𝐮Tb . Solving this for i = 0, 1, . . . , c − 1 gives 𝐳0 , 𝐳1 , . . . , 𝐳c−1 . Breaking each of these into b-blocks gives the set of generators gi,j of Gi,j of G. 16.4.2.2

Non-Full Rank Case

Let r = rank(H), where r < (t − c)b. Find the number of block columns of H such that these 𝓁 columns form a submatrix having rank r. The columns of H can be reordered to form a (t − c)b × tb matrix H ∗ such that the first 𝓁 block columns of H ∗ form an array D∗ , where the columns are now labeled such that ⎡ A0,0 A0,1 · · · A0,𝓁−1 ⎤ ⎢A A1,1 · · · A1,𝓁−1 ⎥ D∗ = ⎢ 1,0 ⎥. ⋮ ⎥ ⎢ ⎣At−c,0 At−c,1 · · · At−c,𝓁−1 ⎦ Each Ai,j is a b × b circulant matrix. The code that is the nullspace of H ∗ is equivalent to the code what is the nullspace of H (same distance properties). D∗ is a matrix that has rank r, so there are r linearly independent columns and b𝓁 − r linearly dependent columns. The generator is a (tb − r) × tb matrix that has the following form: [ ] Q , (16.7) G= G∗

16.4 Quasi-Cyclic Codes

729

where Q is (𝓁b − r) × tb and G∗ is (t − 𝓁)b × tb that can be expressed in systematic circulant form ⎡ G0,0 G0,1 · · · G0,𝓁−1 I 0 · · · ⎢G G1,1 · · · G1,𝓁−1 0 I · · · G = ⎢ 1,0 ⋮ ⎢ ⎣Gt−𝓁,0 Gt−𝓁,1 · · · Gt−𝓁,𝓁−1 0 0 · · · ∗

0⎤ 0⎥ ⎥. ⎥ I⎦

Here, each Gi,j and I and 0 are b × b, and each Gi,j is circulant with generator 𝐠i,j . As in the full rank case, let ] [ 𝐳i = 𝐠i,0 𝐠i,1 · · · 𝐠i,𝓁−1 . Let

Mt−𝓁+i

⎡ A0,t−𝓁+i ⎤ ⎢ A1,t−𝓁+i ⎥ ⎥, ⋮ =⎢ ⎥ ⎢ ⎢At−c,t−𝓁+i ⎥ ⎦ ⎣

0≤i≤𝓁−1

be column t − 𝓁 + i of H ∗ . The equation H ∗ GT = 0 implies (for the i block row) that D∗ 𝐳i + Mt−𝓁+i 𝐮Tb = 0.

(16.8)

Setting the elements of 𝐳i to 0 that correspond to the linearly dependent columns of D∗ (resulting in r elements of 𝐳i to solve for), (16.8) can be solved for the remaining elements of 𝐳i . This gives the generators of the Gi,j in G∗ . The matrix Q in (16.7) must satisfy H ∗ Q = 0. This can be done by exploiting the structure of the circulant matrices in D∗ . Since D∗ is not full rank, some of the circulant matrices in D∗ will have rank less than b. Let d0 be the number of linearly dependent columns (out of b) in the first block column of D∗ , let d1 be the number of linearly independent columns in the second block column of D∗ , and so on up to d𝓁−1 . Then d0 + d1 + · · · + d𝓁−1 = 𝓁b − r. If a b × b circulant matrix has rank 𝜌 less than b, then any 𝜌 columns of the matrix are linearly independent, and the other b − 𝜌 columns are linearly dependent. Using this fact, take the last b − dj columns of the jth block column of D∗ as linearly independent columns, for j = 0, 1, . . . , 𝓁 − 1. Now the matrix Q can be put in the form Q ··· Q 0 0 ··· 0 Q ⎡ Q0 ⎤ ⎡ Q0,0 Q0,1 · · · Q0,𝓁−1 00,0 00,1 · · · 00,t−𝓁−1 ⎤ ⎢ 1,0 1,1 1,𝓁−1 1,0 1,1 1,t−𝓁−1 ⎥ ⎢ Q ⎥ ⎥. Q=⎢ 1 ⎥=⎢ ⋮ ⋮ ⎢ ⎥ ⎢⎢Q𝓁−1,0 Q𝓁−1,1 · · · Q𝓁−1,𝓁−1 0𝓁−1,0 0𝓁−1,1 · · · 0𝓁−1,t−𝓁−1 ⎥⎥ ⎣Q𝓁−1 ⎦ ⎣ ⎦ Here, each 0i,j is a di × b matrix of zeros and Qi,j is a di × b partial circulant matrix, formed by cyclically shifting its first row di − 1 times. From this matrix form the first row of Qi as 𝐪i = (𝐰i , 0) = (qi,0 , qi,1 , . . . , qi,lb−1 , 0, 0, . . . , 0), where there are (t − 𝓁)b zeros. The 𝓁b bits of 𝐰i correspond to the 𝓁b columns of D∗ . The 𝓁b − r bits that correspond to the linearly dependent columns of D∗ are set to have the form (00 , . . . , 0i−1 , 𝐮i , 0i+1 , . . . , 0𝓁−1 ),

730

Low-Density Parity-Check Codes: Designs and Variations where 0e is a row of de zeros and 𝐮i (here) is the unit row vector 𝐮i = (1, 0, . . . , 0) with di elements. The number of unknown elements in 𝐰i to be determined is thus r. The requirement that H ∗ QT = 0 gives rise to the equation



D

𝐰Ti

A0,1 · · · A0,𝓁−1 ⎤ ⎡ qi,0 ⎤ ⎡ A0,0 ⎢ A1,0 A1,1 · · · A1,𝓁−1 ⎥ ⎢ qi,1 ⎥ =⎢ ⎥ ⎢ ⋮ ⎥ = 0. ⋮ ⎥⎢ ⎥ ⎢ ⎣At−c−1,0 At−c−1,1 · · · At−c−1,𝓁−1 ⎦ ⎣qi,𝓁b−1 ⎦

Finding a nonzero in the nullspace gives the vector 𝐰i , which can be broken up into 𝓁 − 1 components 𝐰i,0 , 𝐰i,1 , . . . , 𝐰i,𝓁−1 , each of length b. Each partial circulant Qi,j of Q is then obtained by shifting 𝐰i,j as its generator and shifting di − 1 times. Once G is obtained, it can be used with a CSRAA circuit as in Figure 16.7 for an efficient encoding operation.

16.5 Construction of LDPC Codes Based on Finite Fields In the early days of LDPC codes, the parity check matrices of LDPC codes were selected at random. More recently, however, several approaches to designing parity-check codes have been developed. Many of these make use of finite field theory, thus providing some interesting interplay between classical field-based codes such as BCH and RS codes and more modern LDPC codes. In this section, we consider a few of these design techniques. In the Galois field GF(q), where q = ps for some prime p, let 𝛼 be primitive. Then all of the nonzero elements of the field can be represented as powers of 𝛼. For each nonzero element of the field 𝛼 i , 0 ≤ i ≤ q − 2, form vector 𝐳(𝛼i ), known as the location vector of 𝛼 i , as 𝐳(𝛼 i ) = (z0 , z1 , . . . , zq−1 ), where the ith element is 1 and all other elements are 0. Clearly if i ≠ j, then 𝐳(𝛼 i ) ≠ 𝐳(𝛼 j ) — the nonzero elements appear in different locations. Let 𝛿 be a nonzero element of GF(q), say 𝛿 = 𝛼 j . The location vector 𝐳(𝛿) is nonzero only at position j. The location vector 𝐳(𝛼𝛿) is nonzero at position j + 1 (modulo q − 1), where the nonzero location wraps around the end of the location vector. In general, 𝐳(𝛼 i 𝛿) is a right cyclic shift of 𝐳(𝛿) by i positions, and 𝐳(𝛼 q−1 𝛿) = 𝐳(𝛿). Now form a matrix with rows ⎡ 𝐳(𝛿) ⎤ ⎢ 𝐳(𝛼𝛿) ⎥ A=⎢ . ⋮ ⎥ ⎢ q−2 ⎥ ⎣𝐳(𝛼 𝛿)⎦ This is a (q − 1) × (q − 1) circulant matrix. Since there is only one 1 on each row, A is a permutation matrix, referred to as a circulant permutation matrix (CPM). The matrix A is also referred to as the (q − 1)-fold matrix dispersion of 𝛿 over GF(2). The matrix A is a function of 𝛿, so it would be more explicit to write A = A(𝛿). It is clear that for different field elements 𝛿 and 𝛾 in GF(q), A(𝛿) ≠ A(𝛾). Consider now a matrix an m × n matrix W over GF(q), defined as ⎡ 𝐰0 ⎤ ⎡ 𝑤0,0 𝑤0,1 . . . 𝑤0,n−1 ⎤ ⎢ 𝐰 ⎥ ⎢ 𝑤 𝑤1,1 . . . 𝑤1,n−1 ⎥ W = ⎢ 1 ⎥ = ⎢ 1,0 ⎥. ⋮ ⋮ ⎥ ⎢ ⎥ ⎢ ⎣𝐰m−1 ⎦ ⎣𝑤m−1,0 𝑤m−1,1 . . . 𝑤m−1,n−1 ⎦

16.5 Construction of LDPC Codes Based on Finite Fields

731

The matrix W is to be designed to satisfy two constraints: 1. Consider the row 𝐰i . The multiplication 𝛼 k 𝐰i produces a row that shuffles the elements in GF(q) around, as does the product 𝛼 𝓁 𝐰i . The first constraint is that for any row, 0 ≤ i ≤ m and any k ≠ 𝓁 with 0 ≤ k, 𝓁 ≤ q − 2, 𝛼 k 𝐰i and 𝛼 𝓁 𝐰i differ in at least n − 1 positions. This constraint precludes the possibility that 𝐰i = 0 in more than one position (because in such zero positions both 𝛼 k 𝐰i and 𝛼 𝓁 𝐰i would be 0). 2. For rows 𝐰i and 𝐰j , 𝛼 k 𝐰i and 𝛼 𝓁 𝐰j differ in at least n − 1 positions for 0 ≤ i, j ≤ m − 1, i ≠ j and 0 ≤ k, 𝓁 ≤ q − 2. This constraint implies that 𝐰i and 𝐰j differ in at least n − 1 positions. Matrices over GF(q) satisfying these two constraints are said to satisfy the 𝛼-multiplied row constraint. A construction of a matrix satisfying the 𝛼-multiplied row constraint is given in Section 16.5.1. For a matrix W as above, a potential parity check matrix is obtained by replacing each nonzero GF(q) element by its (q − 1) × (q − 1) CPM, and replacing each 0 GF(q) element by (q − 1) × (q − 1) matrix of 0s, resulting in ⎡ A0,0 A0,1 · · · A0,n−1 ⎤ ⎢ A A1,1 · · · A1,n−1 ⎥ Hdisp = ⎢ 1,0 ⎥. ⋮ ⎥ ⎢ ⎣Am−1,0 Am−1,1 · · · Am−1,n−1 ⎦ Hdisp is a m(q − 1) × n(q − 1) binary matrix. From the first constraint above, there is at most one zero in each row of W, and so at most one block of zeros in each block row of Hdisp . By the second constraint above, for any two block rows of Hdisp there is at most one place where they have the same cyclic permutation matrix. At any other place on any two rows, the two cyclic permutation matrices are different. It follows that Hdisp satisfies the RC-constraint described in Section 15.3. For any pair of integer (g, r), 1 ≤ g ≤ m and 1 ≤ r ≤ n, let Hdisp (g, r) be a g × r submatrix of Hdisp , forming a g(q − 1) × r(q − 1) matrix over GF(2). This also satisfies the RC constraint. The nullspace of this matrix is a binary linear code of length r(q − 1) with rate at least (r − g)∕r. It can be shown that this code has minimum distance at least g + 2 (for even g) or g + 1 (for odd g). This construction motivates the search for 𝛼-multiplied row-constrained matrices such as W, from which a parity check matrix can be formed by matrix dispersion. Several such methods are developed in Sections 16.5.1 through 16.5.5. The general pattern in each case is the same. A method of constructing a matrix W is obtained, an H matrix is obtained by replacing elements of W with a matrix dispersion, and then a submatrix is obtained. The descriptions are summarized as code construction recipes. The codes constructed by these techniques tend to have good distance properties, such as being close to Shannon channel capacity, or having low error floors. 16.5.1

I. Construction of QC-LDPC Codes Based on the Minimum-Weight Codewords of a Reed–Solomon Code with Two Information Symbols

This method is presented in [386, Section 11.3] and [100]. Consider a (q − 1, 2) RS (q − 1, 2) RS code RS2 over the field GF(q) having minimum distance q − 2 with a generator polynomial g(x) whose roots are 𝛼, 𝛼 2 , . . . , 𝛼 q−3 , where 𝛼 is primitive: 𝛼 q−1 = 1. Since the minimum distance is q − 2, the minimum weight nonzero codeword has weight q − 2. The polynomials c1 (x) = 1 + x + x2 + · · · + xq−2 =

q−2 ∑

xj =

j=0

and c2 (x) = 1 + 𝛼x + 𝛼 2 x2 + · · · + 𝛼 q−2 xq−2 =

xq−1 − 1 xn 1

𝛼 q−1 xq−1 − 1 𝛼x − 1

732

Low-Density Parity-Check Codes: Designs and Variations have 𝛼, 𝛼 2 , . . . , 𝛼 q−2 as roots, so they are both code polynomials in the code. The corresponding codewords are 𝐜1 = (1, 1, . . . , 1) 𝐜2 = (1, 𝛼, . . . , 𝛼 q−2 ), each having weight q − 1. Form the weight-(q − 2) codeword as the difference 𝐰0 = 𝐜2 − 𝐯1 = (0, 𝛼 − 1, 𝛼 2 − 1, . . . , 𝛼 q−2 − 1). Now form the (q − 1) × (q − 1) matrix W (1) by cyclic shifts of 𝐰0 , W (1)

𝛼 − 1 𝛼 2 − 1 · · · 𝛼 q−2 − 1⎤ ⎡ 𝐰0 ⎤ ⎡ 0 q−2 ⎢ 𝐰 ⎥ ⎢𝛼 −1 0 𝛼 − 1 · · · 𝛼 q−3 − 1⎥ =⎢ 1 ⎥=⎢ ⎥. ⋮ ⋮ ⎥ ⎢ ⎢ ⎥ 2 3 0 ⎦ ⎣𝐰q−2 ⎦ ⎣ 𝛼 − 1 𝛼 − 1 𝛼 − 1 · · ·

Each row and column of W (1) contains exactly one 0, with the 0 elements each lying on the main diagonal of W (1) . Since RS2 is cyclic, each of the rows of W (1) is a codeword of RS2 , each of minimum weight. Any two rows of W (1) differ in q − 1 positions. Furthermore, any 𝛼 k 𝐰i and 𝛼 𝓁 𝐰i are also minimal weight codewords, where 0 ≤ i < q − 1, 0 ≤ k < q − 1, 0 ≤ 𝓁 < q − 1 with k ≠ 𝓁. Both such codewords have 0 at position i and so must differ at the other q − 2 positions. The rows of W (1) thus satisfy the 𝛼-multiplied row constraint 1. For rows i and j with i ≠ j, 𝛼 k 𝐰i and 𝛼 𝓁 𝐰k are two different minimum-weight codewords in RS2 which differ in at least q − 2 places. So W (1) also satisfies the 𝛼-multiplied row constraint 2. By replacing each element in W (1) by its (q − 1)-fold matrix dispersion, a (q − 1)2 × (q − 1)2 block circulant matrix is obtained,

H

(1)

⎤ ⎡ 0 A0,1 · · · A0,q−2 ⎢A0,q−2 0 A0,1 · · · A0,q−3 ⎥ =⎢ ⎥. ⎥ ⎢ ⋮ ⎦ ⎣ A0,1 A0,2 · · · 0

From its construction, H (1) has the following properties: 1. Each row (and column) has only one zero block matrix. 2. These zero block matrices lie down the main diagonal. 3. Each block row is a cyclic shift of the block row above it. 4. Each of the cyclic permutation matrices in each row are different from each other. 5. H (1) satisfies the RC constraint and has row weight and column weight equal to q − 2. Now submatrices can be selected to establish codes of different sizes. For a pair of integers (g, r), with 1 ≤ g, r < q, let H (1) (g, r) be a g × r block submatrix of H (1) . It will also satisfy the RC constraint. If the submatrix H (1) (g, r) is chosen to exclude the diagonal of H (1) , then H (1) (g, r) has constant weight g in each column and constant weight r in each row. The nullspace of H (1) (g, r) is a code of rate at least (r − g)∕r. α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

gf257ex.m cyclicshift1.m blockshift1.m HRSg4r32.txt

Example 16.5 [386, p 489] Let q = 257 (prime) and consider the (256, 2) RS code over GF(q). The matrix W (1) is of size 256 × 256. The matrix formed by the matrix dispersion is of size 2562 × 2562 = 65,536 × 65,536. Then for various values of (g, r) codes of different sizes can be obtained. For example, when g = 4, r = 32 the H matrix is of size 1024 × 8192, corresponding to a (8192, 7171) code of rate 0.875. If the submatrix is taken from above or below the diagonal, the code is regular, with column weight 4 and row weight 32. By simulation it can be determined, for example, that at a probability of bit error or Pb = 10−6 , it is approximately 1 dB away from Shannon channel capacity. Taking g = 4 and r = 64 gives a (16384, 15363) code of rate 0.9376, which performs 0.75 dB away from ◽ capacity at Pb = 10−6 .

16.5 Construction of LDPC Codes Based on Finite Fields 16.5.2

733

II. Construction of QC-LDPC Codes Based on a Special Subclass of RS Codes

This method is presented in [386, Section 11.4]. For a finite field GF(q), let 𝛼 be primitive. Let m be the largest prime factor of q − 1, and let q − 1 = cm. Then 𝛽 = 𝛼 c is an element of GF(q) of order m. The set Gm = {𝛽 0 = 1, 𝛽, . . . , 𝛽m−1 } is a cyclic subgroup of the nonzero elements of GF(q). Form the Vandermonde matrix

W (2)

⎡1 1 1 ⎡ 𝐰0 ⎤ ⎢1 𝛽 𝛽2 ⎢ 𝐰 ⎥ ⎢ 𝛽4 = ⎢ 1 ⎥ = ⎢1 𝛽 2 ⋮ ⎥ ⎢⋮ ⎢ ⎣𝐰m−1 ⎦ ⎢ m−1 2(m−1) 𝛽 ⎣1 𝛽

··· 1 ⎤ · · · 𝛽 m−1 ⎥ ⎥ · · · 𝛽 2(m−1) ⎥ . ⎥ 2⎥ (m−1) ··· 𝛽 ⎦

Since 𝛽 m = 1, powers of 𝛽 in this matrix are taken modulo m. Taking any t consecutive rows (1 ≤ t ≤ m) of W (2) produces a matrix that can be considered a parity check matrix for an (m, m − t) RS code, whose generator polynomial has t consecutive powers of 𝛽 as roots. W (2) has the following properties: 1. Except for the first row, all the elements of a row are different, forming all the elements of the cyclic subgroup Gm . Similarly for all the elements in the columns of W (2) (except the first column). 2. Any two rows have only the first entries the same; the elements in the other m − 1 positions are different between any two rows. Similarly for the column structure. Because of these two properties, it can be shown that W (2) satisfies the 𝛼-multiplied row constraints 1 and 2. W (2) can thus be used as the base matrix for array dispersion. Replace each element of W (2) by its multiplicative (q − 1)-fold matrix dispersion to obtain the (q − 1) × (q − 1) matrix of cyclic permutation matrices: ⎡ A0,0 A0,1 · · · A0,m−1 ⎤ ⎢ A A1,1 · · · A1,m−1 ⎥ H (2) = ⎢ 1,0 ⎥. ⋮ ⎥ ⎢ ⎣Am−1,0 Am−1,1 · · · Am−1,m−1 ⎦ H (2) contains no 0-blocks. As before, for any pair integers (g, r) with 1 ≤ g, r ≤ m, a submatrix H (2) (g, r) can be extracted to use as a parity check matrix of size (q − 1)g × (q − 1)r. Example 16.6 [386, 497] For the field GF(27 ), q − 1 = 127 is itself a prime and can be used as the prime factor, so m = 127. The matrix W (2) is of size 127 × 127 and H (2) is of size 1272 × 1272 . For (g, r) = (4, 40), for example, results in a parity check matrix H (2) (4, 40) of size 508 × 5080, which produces a (5080, 4575) code with rate 0.9006. ◽

There is another related way to construct LDPC codes. Suppose q − 1 can be factored as q − 1 = 𝓁tm, where as before m is the largest prime factor of q − 1. Let 𝛽 = 𝛼 𝓁t (an element of order m in GF(q)) and let 𝛿 = 𝛼 𝓁 (an element of order tm). The sets Gm = {𝛽 0 = 1, 𝛽, . . . , 𝛽 m−1 } and

Gtm = {𝛿 0 = 1, 𝛿, . . . , 𝛿 tm−1 }

are cyclic subgroups of the nonzero elements of GF(q), and Gm is a subgroup of Gtm . For an element 𝛿 i ∈ Gtm , define the location vector 𝐳(𝛿 i ) as the binary tm-tuple 𝐳(𝛿 i ) = (z0 , z1 , . . . , ztm−1 ),

734

Low-Density Parity-Check Codes: Designs and Variations where all elements are zero except zi = 1. 𝐳(𝛿 i ) is referred to as the Gtm -location vector of 𝛿 i . Let 𝜎 ∈ Gtm . Form the tm × tm dispersion matrix A∗ over Gtm with the Gtm -location vectors of 𝜎, 𝛿𝜎, . . . , 𝛿 tm−1 𝜎. This is a cyclic permutation matrix. Since Gm is a cyclic subgroup of Gtm , W (2) satisfies 𝛿-multiplied row constraints 1 and 2. Now, instead of replacing elements of W (2) with the (q − 1)-fold dispersions (as above), replace each element of W (2) with is tm-fold dispersion to produce the tm2 × tm2 matrix

H (3)

⎡ A∗0,0 A∗0,1 · · · A∗0,m−1 ⎤ ⎢ A∗ A∗1,1 · · · A∗1,m−1 ⎥⎥ = ⎢ 1,0 . ⎥ ⎢ ⋮ ∗ ∗ ⎥ ⎢A∗ ⎣ m−1,0 Am−1,1 · · · Am−1,m−1 ⎦

As before, for any (g, r), H (3) (g, r) can be formed as a submatrix of size gtm × rtm, having column and row weights g and r, respectively. Example 16.7 [386, p. 499] For the field GF(210 ), factor 210 − 1 = 1023 as 1023 = 3 × 11 × 31. Ler m = 31, l = 11, t = 3. Let 𝛼 be primitive in GF(210 ) and 𝛽 = 𝛼 33 , 𝛿 = 𝛼 11 . Then G31 = {𝛽 0 = 1, 𝛽, . . . , 𝛽 30 } and

G93 = {𝛿 0 = 1, 𝛿, . . . , 𝛿 92 }.

Form W (2) with elements from G31 . Then replace each element by its 93-fold matrix dispersion, resulting in H (3) of size 2883 × 2883. ◽

16.5.3

III. Construction of QC-LDPC Codes Based on Subgroups of a Finite Field

This is based on [[386], Section 11.5] and [418]. Let p be prime and let q = pm . Let 𝛼 be primitive in GF(q). Then (as described in Chapter 5), the elements 𝛼 0 , 𝛼, . . . , 𝛼 m−1 are linearly independent and GF(q) can be expressed as a vector space over these basis vectors so that any 𝛼 i ∈ Gf (q) can be written as 𝛼 i = ci,0 𝛼 0 + ci,1 𝛼 1 + · · · + ci,m−1 𝛼 m−1 with ci,j ∈ GF(p). Now take a subset of these basis elements to form {𝛼 0 , 𝛼, 𝛼 2 , . . . , 𝛼 t−1 } and generate the additive subgroup G1 of GF(q) by linear combinations of this restricted set of basis elements. For every element 𝛽i ∈ G1 , 𝛽i = ci,0 𝛼 0 + ci,1 𝛼 + · · · + ci,t−1 𝛼 t−1 with ci,j ∈ GF(p). Since each coefficient can be chosen independently, the subgroup G1 has pt elements. Write G1 = {𝛽0 = 1, 𝛽1 , . . . , 𝛽pt −1 }. Similarly, form the complementary subset of basis elements {𝛼 t , 𝛼 t+1 , . . . , 𝛼 m−1 } and form the additive subgroup G2 using these elements as the basis. An element 𝛿i ∈ G2 can be expressed as 𝛿i = ci,t 𝛼 t + ci,t+1 𝛼 t+1 + · · · + ci,m−1 𝛼 m−1 . This subgroup has pm−t elements in it. Write G2 = {𝛿0 = 0, 𝛿1 , . . . , 𝛿pm−t −1 }. By this construction the two subgroups satisfy G1 ∩ G2 = {0}. Based on this additive group structure, for 𝛿i ∈ G2 , form the coset with coset leader 𝛿i , 𝛿i + G1 = {𝛿i , 𝛿i + 𝛽1 , . . . , 𝛿i + 𝛽pt −1 }. There are pm−t such cosets which form a partition of GF(q); two such cosets of G1 are disjoint.

16.5 Construction of LDPC Codes Based on Finite Fields

735

These cosets are now used to form a pm−t × pt matrix W (4) by

W (4)

⎡ 𝐰0 ⎤ ⎡ 0 𝛽1 ⎢ 𝐰 ⎥ ⎢ 𝛿 𝛿 1 + 𝛽1 1 ⎥ ⎢ ⎢ 1 𝛿 2 + 𝛽1 = ⎢ 𝐰2 ⎥ = ⎢ 𝛿2 ⎢ ⋮ ⎥ ⎢ ⋮ ⎥ ⎢ ⎢𝐰 ⎣ pm−t −1 ⎦ ⎣𝛿pm−t −1 𝛿pm−t −1 + 𝛽1

⎤ ⎥ ⎥ ⎥. ⎥ · · · 𝛿pm−t −1 + 𝛽pt −1 ⎥⎦ ··· ··· ···

𝛽pt −1 𝛿1 + 𝛽pt −1 𝛿2 + 𝛽pt −1

Since the cosets of G1 are disjoint, different rows 𝐰i and 𝐰j differ in all positions. Based on this, it can be argued that W (4) satisfies the 𝛼-multiplied row constraints 1 and 2. Replace each element of W (4) by its (q − 1)-fold matrix dispersion to construct a matrix H (4) , in the pattern established in the sections above. For any pair (g, r), a submatrix H (4) (g, r) of size (q − 1)g × (q − 1)r can be extracted and used as a parity check matrix. 16.5.4

IV. Construction of QC-LDPC Codes Based on Subgroups of the Multiplicative Group of a Finite Field

The set of integers {0, 1, . . . , p − 1} forms a field GF(p) when p is prime. For an element i ∈ GF(p), its location vector with respect to the additive group is the unit p-tuple 𝐳(i) = (z0 , z1 , . . . , zp−1 ), where all elements are 0 except zi = 1. Location vectors for different elements of GF(p) are obviously different. The location vector for 𝐳(k + 1) is the right cyclic shift of 𝐳(k), with 𝐳(0) the right cyclic shift of 𝐳(p − 1). For a k ∈ GF(p), form a matrix A with the location vectors 𝐳(k), 𝐳(k + 1), . . . , 𝐳(k + p − 1) (operations modulo p). A is a cyclic permutation matrix, called the additive p-fold matrix of the element k. Now form the p × p matrix of elements of GF(p) as

W

(6)

0⋅1 0⋅2 ··· 0 ⋅ (p − 1) ⎤ ⎡ 𝐰0 ⎤ ⎡ 0 ⋅ 0 ⎢ 𝐰1 ⎥ ⎢ 1 ⋅ 0 1⋅1 1⋅2 ··· 1 ⋅ (p − 1) ⎥ =⎢ = ⎥ ⋮ ⎥ ⎢ ⋮ ⎥ ⎢ ⎢ ⎥ ⎣𝐰p−1 ⎦ ⎣(p − 1) ⋅ 0 (p − 1) ⋅ 1 (p − 1) ⋅ 2 · · · (p − 1) ⋅ (p − 1)⎦

(operations modulo p). W (6) has the following structural properties: 1. 𝐰0 has all 0 elements. 2. For any other row, all elements on the row are distinct, forming the p elements of GF(p); similarly for the elements of the columns. 3. Any two distinct rows have the 0 element in the first position, and differ in the other p − 1 positions. Now construct a matrix H (6) by replacing each element of W (6) by its additive p-fold matrix dispersion, resulting in the p2 × p2 matrix

H (6)

⎡ A0,0 A0,1 · · · A0,p−1 ⎤ ⎢ A A1,1 · · · A1,p−1 ⎥ = ⎢ 1,0 ⎥, ⋮ ⎥ ⎢ ⎣Ap−1,0 Ap−1,1 · · · Ap−1,p−1 ⎦

where each Ai,j is a cyclic permutation matrix. For any pair (g, r), let H (6) (g, r) be the g(p − 1) × r(r − 1) submatrix of H (6) .

736

Low-Density Parity-Check Codes: Designs and Variations Example 16.8 [[386], p. 508] Let p = 73. The matrix H (6) composed of the cyclic permutation matrices of dispersions of location vector with respect to the additive group of GF(73) results in a matrix H (6) of size 732 × 732 . ◽ Taking (g, r) = (6, 72) results in a parity check matrix H (6) (g, r) of size 438 × 5256.

16.5.5

Construction of QC-LDPC Codes Based on Primitive Elements of a Field

Let 𝛼 be primitive in GF(q), where q is the power of a prime. Factor q − 1 as powers of primes as k

k

k

q − 1 = p11 p22 · · · pt t . Let K = 𝜙(q − 1) be the number of primitive elements in GF(q), where 𝜙 is the Euler’s totient function. Write these primitive elements as {𝛼 j1 , 𝛼 j2 , . . . , 𝛼 jK }. Let j0 = 0. Form the matrix

W (7)

⎡ 𝛼 j0 −j0 − 1 𝛼 j1 −j0 − 1 · · · 𝛼 jK −j0 − 1 ⎤ ⎢ 𝛼 j0 −j1 − 1 𝛼 j1 −j1 − 1 · · · 𝛼 jK −j1 − 1 ⎥ =⎢ ⎥. ⋮ ⎢ j −j ⎥ ⎣𝛼 0 K − 1 𝛼 j1 −jK − 1 · · · 𝛼 jK −jK − 1⎦

It can be shown that W (7) satisfies the 𝛼-constrained row constraints 1 and 2. Now form H (7) by replacing element of W (7) by its (q − 1)-fold matrix dispersion to obtain an RC-constrained (K + 1)(q − 1) × (K + 1)(q − 1) matrix. Then a parity check matrix can be formed for any (g, r) by taking a g(q − 1) × r(q − 1) submatrix. Example 16.9 [[386], p. 512] For the field GF(26 ), factor 26 − 1 = 63 as 9 × 7. This field has 36 primitive elements. Then H (7) has size 37 ⋅ 63 × 37 ⋅ 63. With (g, r) = (6, 37), a matrix H (7) (6, 37) of size 378 × 2331 is obtained. ◽

16.6 Code Design Based on Finite Geometries A Euclidean geometry (EG) over GF(q) consists of a set of points, Lines, and other geometric objects, where the elements of the points are drawn from GF(q). For an integer m and a field GF(q), the EG over GF(q), denoted as EG(m, q), consists of points, lines, and flats. Basic concepts of EG are defined below with examples from EG(2, 4) to illustrate the concepts and develop intuition. In this section, some elements of EG are presented, followed by an example construction in which the incidence structure of the geometry is exploited. A more exhaustive summary of code designs based on finite geometries is presented in [386]. 16.6.1 16.6.1.1

Rudiments of Euclidean Geometry Points in EG(m, q)

Each point in EG(m, q) is an m-tuple over GF(q), so there are qm points in the geometry. Example 16.10 Points in EG(3, 4) For the field GF(4) with elements 0, 1, 𝛽, 𝛽 2 , where 𝛽 is an element of order 3, and m = 3, there are 44 = 256 points: 𝐩1 = (0, 0, 0, 1) 𝐩2 = (0, 0, 0, 𝛽) 𝐩3 = (0, 0, 0, 𝛽 2 ) 𝐩0 = (0, 0, 0, 0) 𝐩5 = (0, 0, 1, 1) 𝐩6 = (0, 0, 1, 𝛽) 𝐩7 = (0, 0, 1, 𝛽 2 ) 𝐩4 = (0, 0, 1, 0) 𝐩9 = (0, 0, 𝛽, 1) 𝐩10 = (0, 0, 𝛽, 𝛽) 𝐩11 = (0, 0, 𝛽, 𝛽 2 ) 𝐩8 = (0, 0, 𝛽, 0) ⋮ ⋮ ⋮ ⋮ 𝐩252 = (𝛽 2 , 𝛽 2 , 𝛽 2 , 0) 𝐩252 = (𝛽 2 , 𝛽 2 , 𝛽 2 , 1) 𝐩254 = (𝛽 2 , 𝛽 2 , 𝛽 2 , 𝛽) 𝐩255 = (𝛽 2 , 𝛽 2 , 𝛽 2 , 𝛽 2 ) ◽

16.6 Code Design Based on Finite Geometries

737

Example 16.11 Points in EG(2, 4): This is 2-tuples over GF(4). These are listed as 2-tuples and with a suggestive “geometric” representation. 𝛽2 𝛽 1 0

0

1

𝛽

2

𝛽

The 16 points in EG(2, 4) form a vector space. Recall that GF(16) can be constructed as a vector space over GF(4). There is thus an isomorphism between EG(2, 4) and GF(16), which we denote as EG(2, 4) ⇔ GF(16).



More generally, the isomorphism between EG(m, q) and GF(qm ) is denoted as EG(m, q) ⇔ GF(qm ). By this isomorphism the qm points in EG(m, q) can be identified with the points in GF(qm ). Example 16.12 Let GF(24 ) be generated by the polynomial 1 + x + x4 , with 𝛼 primitive, and consider GF(24 ) as an extension field of GF(22 ) = {0, 1, 𝛽, 𝛽 2 }, with 𝛽 = 𝛼 5 . The points in GF(16) are identified with points in EG(2, 4) in the following table.



We will use the terms “point” notation and the “field” notation interchangeably. 16.6.1.2

Lines in EG(m, q)

A line in EG(m, q) is the set of points {𝐩0 + 𝛾𝐩, 𝛾 ∈ GF(q)}, that is, as 𝛾 sweeps over all values in GF(q), where 𝐩 ≠ 0. (That is, in the notation below, 𝛾𝐩 is intended to denote the set of values as 𝛾 sweeps over its field.) Each line in EG(m, q) has q distinct points on it. Example 16.13 Lines through the origin in EG(2, 4). Let the field GF(4) have the values 𝛾 ∈ {0, 1, 𝛽, 𝛽 2 }, where 𝛽 = 𝛼 5 and 𝛼 is primitive in GF(16). In the list below, points are shown both as 2-tuples in GF(4)2 , and as isomorphically labeled points in GF(16).

738

Low-Density Parity-Check Codes: Designs and Variations

The lines defined by these four points 𝐩 cover all the points in EG(2, 4). (Color versions of this page can be viewed at wileywebpage or https://github.com/tkmoon/eccbook/colorpages.) ◽

More generally, lines are of the form {𝐩0 + 𝛽𝐩, 𝛽 ∈ GF(q)}, where 𝐩 ≠ 0. Example 16.14 Lines through 𝐩0 = (1, 1) = 𝛼 4 in EG(2, 4). (Color versions of this page can be viewed at wileywebpage or https://github.com/tkmoon/eccbook/colorpages.)



If 𝐩0 ≠ 0, the lines {𝛽𝐩} and {𝐩0 + 𝛽𝐩} do not have any common points. These may be considered to be parallel lines. Example 16.15 In EG(2, 4), let 𝐩 = (1, 1) = 𝛼 4 and consider the lines defined with different values of 𝐩0 . (Color versions of this page can be viewed at wileywebpage or https://github.com/tkmoon/eccbook/ colorpages.)



In general, there are qm−1 such parallel lines having the same direction 𝐩, with different offsets 𝐩0 . (In the example above, qm−1 = q2−1 = 4.) The set of such parallel lines is referred to as a parallel bundle. There are qm − 1 Kp = q−1 parallel bundles. (In EG(2, 4) there are five parallel bundles.) These will be denoted as 1 , 2 , . . . , Kp , each consisting of qm−1 lines. Each parallel bundle contains all the qm points in EG(m, q). In each parallel bundle, there is one line passing through the origin. We discussed placing a footnote indicating that a color page would be available at some web page

16.6 Code Design Based on Finite Geometries

739

Let 𝐩1 and 𝐩2 be two linearly independent points, that is, points such that 𝛽1 𝐩1 + 𝛽2 𝐩2 ≠ 0 unless 𝛽1 = 𝛽2 = 0. Then the lines {𝐩0 + 𝛽𝐩1 } and {𝐩0 + 𝛽𝐩2 } have only one point in common, the point 𝐩0 . Example 16.16

In EG(2, 4), let 𝐩0 = (𝛽, 𝛽 2 ) = 𝛼 3 , 𝐩1 = (1, 0) = 1 and 𝐩2 = (𝛽, 𝛽) = 𝛼 9 .

The only point in common on these two lines is (𝛽, 𝛽 2 ). (Color versions of this page can be viewed at wileywebpage or https://github.com/tkmoon/eccbook/colorpages.) ◽

Some basic facts about points and lines can be summarized as follows:

• •

Any two points are connected by exactly one line.



Let 𝐩 be a point in EG(m, q), and  a line in EG(m, q). If 𝐩 is a point on , we say that  passes through 𝐩.

• • •

There are (qm − 1)∕(q − 1) lines passing through any point 𝐩.

Two lines either have exactly one point in common (intersect), or they have no points in common (disjoint; parallel).

If two lines have a common point 𝐩, we say they intersect at 𝐩. There are qm−1

qm − 1 q−1

lines in EG(m, q). 16.6.1.3

Incidence vectors in EG∗ (m, q)

EG∗ (m, q) is the subgeometry of EG(m, q) obtained by removing from EG(m, q) the origin and all lines passing through the origin. EG∗ (m, q) has qm − 1 points and (qm−1 − 1)(qm − 1)∕(q − 1) lines. The finite geometry structure will now be endowed with an incidence vector structure, to be able to define sparse matrices. Let  be a line in EG∗ (m, q) (not passing through the origin). Define the incidence vector 𝐯 = (𝑣0 , 𝑣1 , . . . , 𝑣qm −2 ) with elements 0 or 1 corresponding to the qm − 1 nonzero points m −1

(𝛼 0 = 1, 𝛼 1 , 𝛼 2 , . . . , 𝛼 q

),

where 𝛼 is primitive in GF(qm ), where 𝑣i = 1 if and only if 𝛼 i is a point on . The weight of any incidence vector in EG(m, q) is q (that is, there are q points on the line). Example 16.17

For the line

 = {(𝛽, 𝛽 2 ) + 𝛾(1, 1)} = {(𝛽, 𝛽 2 ), (𝛽 2 , 𝛽), (0, 1), (1, 0)} = {𝛼 3 + 𝛾𝛼 4 } = {𝛼 3 , 𝛼 7 , 𝛼, 1} in EG∗ (2, 4), the incidence vector is 𝐯 = (1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0).

We discussed placing a footnote indicating that a color page would be available at some web page



740

Low-Density Parity-Check Codes: Designs and Variations For a line  = (𝛼 j1 , 𝛼 j2 , . . . , 𝛼 jq ), the incidence vector 𝐯 is 1 only at locations ji . The product 𝛼 t  means the product of 𝛼 t with each point of . For  as above, 𝛼 t  = {𝛼 j1 +t , 𝛼 j2 +t , . . . , 𝛼 jq +t } 𝛼 t  is a line for 0 ≤ t ≤ qm − 2. The set of these lines is said to form a cyclic class. The incidence vector 𝐯𝛼t  is the incidence vector 𝐯 shifted to the right by t places. Each cyclic class has qm − 1 lines in it, and there are Kc =

qm − 1 q−1

cyclic classes. These classes are denoted as 1 , 2 , . . . , Kc . Example 16.18

In EG∗ (2, 4), a cyclic class is as follows.

𝛽2 𝛽 1 0

0

𝛽

1

𝛽2

There are 15 lines in this cyclic class, and EG∗ (2, 4) has only this Kc = 1 cyclic class. (Color versions of this page can be viewed at wileywebpage or https://github.com/tkmoon/eccbook/colorpages.) ◽

16.6.2

A Family of Cyclic EG-LDPC Codes

Material in this section is drawn from [[386], chapter 10]. The incidence vectors of a cyclic class i can be used to form a sparse matrix. For each class i of lines in EG∗ (m, q), 1 ≤ i ≤ Kc , form a (qm − 1) × (qm − 1) matrix Hi with incidence vectors 𝐯 , 𝐯𝛼 , . . . , 𝐯𝛼qm−2  as rows. This produces a matrix with row and column weight equal to q. Example 16.19

For the cyclic class in Example 16.18 the H matrix is ⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ Hi = ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣

0 0 0 1 0 0 0 1 0 1 1 0 0 0 0

0 0 0 0 1 0 0 0 1 0 1 1 0 0 0

0 0 0 0 0 1 0 0 0 1 0 1 1 0 0

0 0 0 0 0 0 1 0 0 0 1 0 1 1 0

0 0 0 0 0 0 0 1 0 0 0 1 0 1 1

1 0 0 0 0 0 0 0 1 0 0 0 1 0 1

1 1 0 0 0 0 0 0 0 1 0 0 0 1 0

0 1 1 0 0 0 0 0 0 0 1 0 0 0 1

1 0 1 1 0 0 0 0 0 0 0 1 0 0 0

0 1 0 1 1 0 0 0 0 0 0 0 1 0 0

0 0 1 0 1 1 0 0 0 0 0 0 0 1 0

0 0 0 1 0 1 1 0 0 0 0 0 0 0 1

1 0 0 0 1 0 1 1 0 0 0 0 0 0 0

0 1 0 0 0 1 0 1 1 0 0 0 0 0 0

0 0 1 0 0 0 1 0 1 1 0 0 0 0 0

We discussed placing a footnote indicating that a color page would be available at some web page

⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥. ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦

(16.9)



16.6 Code Design Based on Finite Geometries

(qm

741

Then for some k, 1 ≤ k ≤ Kc stack the Hi for 1 ≤ i ≤ k to form a matrix H of size k(qm − 1) × − 1) ⎡ H1 ⎤ ⎢H ⎥ (1) HEG,k = ⎢ 2 ⎥ . ⋮ ⎥ ⎢ ⎣HKc ⎦

(1) has column weight kq and row weight q. HEG,k (1) correspond Now we can invoke the geometric structure of lines in EG∗ (m, q). The rows of HEG,k to lines, and no two lines have more than one point in common. It follows, therefore, that no two rows (1) (or two columns) have more than one component in common. Thus, HEG,k satisfies the RC constraint and has no four-cycles in its Tanner graph. The nullspace of H is an LDPC code. From the cyclic structure of the lines and the incidence vectors, H is the parity check matrix of a cyclic code. To find (1) the generator polynomial, express the rows of HEG,k as polynomials. For example, the first row in (16.9) has the polynomial x5 + x6 + x8 + x12 . (1) Let h(x) be the greatest common divisor of the row polynomials of HEG,k , and let h∗ (x) be the reciprocal polynomial of h(x). The generator polynomial for the EG-LDPC code is m −1

g(x) =

xq

−1 . h∗ (x)

Example 16.20 As an example of this construction, consider codes built on EG∗ (2, q), which has q2 − 1 points (1) produces a cyclic code of length and q2 − 1 lines. There is a single cyclic class, . The circulant matrix HEG,k q s 2 − 1. If q = 2 , then the code can be shown to have the following parameters:

• • •

s

Length N = 22 − 1. Number of parity bits: 3s − 1. Minimum distance: 2s + 1.

16.6.3



Construction of LDPC Codes Based on Parallel Bundles of Lines

Let Kp be the number of parallel bundles of lines, Kp = (qm − 1)∕(q − 1), each consisting of qm−1 parallel lines. Let  be a line in EG(m, q) (possibly passing through the origin). Define an incidence vector now as 𝐯 = (𝑣−∞ , 𝑣0 , 𝑣1 , . . . , 𝑣qm −2 ) m

representing all the points in EG(m, q), 𝛼 −∞ = 0, 𝛼 0 = 1, 𝛼, . . . , 𝛼 q −2 , where 𝑣j ∈ {0, 1} and 𝑣j = 1 only if 𝛼 j is a point in . The weight of 𝐯 is q. For each parallel bundle Pi of lines in EG(m, q), with 1 ≤ i ≤ Kp form a qm−1 × qm matrix Hi,p whose rows are the incidence vectors for the lines in i . Hi,p has column weight 1 and row weight q. For a number k, 1 ≤ k ≤ Kp stack these Hi,p to form the matrix

(2) HEG,k

⎡H1,p ⎤ ⎢H ⎥ = ⎢ 2,p ⎥ . ⋮ ⎥ ⎢ ⎣Hk,p ⎦

(2) are k and q, respectively. The column and row weights of HEG,k

742

Low-Density Parity-Check Codes: Designs and Variations 16.6.4

Constructions Based on Other Combinatoric Objects

Several LDPC constructions have been reported based on combinatoric objects. Constructions based on Kirkman triple systems are reported in [233, 234], which produce (3, k)-regular LDPC codes whose Tanner graph is free of four-cycles for any k. Constructions based on Latin rectangles are reported in [464], with reportedly low encode and decode complexity. Designs based on Steiner 2-designs are reported in [463], [236], and [388]. See also [465]. High rate LDPC codes based on constructions from unital designs are reported in [235]. Constructions based on disjoint difference sets permutation matrices for use in conjunction with the magnetic recording channel are reported in [417].

16.7 Ensembles of LDPC Codes In the analysis and design of LDPC codes, an important analysis tool is to discuss ensembles of LDPC codes and to examine, for example, average performance these ensembles of codes. Historically this hearkens back to Shannon’s idea of establishing capacity by computing average performance over an ensemble of (block) codes [404], but in the case of LDPC codes some specific theorems have been proven. To describe these theorems, we need some notation. Let K denote some particular LDPC code (examples of such configurations will be given below). Let PNe (𝓁; K) denote the probability of decoding error for a code of length N at the 𝓁th iteration of a message-passing decoding algorithm. Obviously, for different codes the error will be different — some codes will be good and some codes will be bad. The ensemble approach to LDPC code analysis looks at the average of PNe (𝓁; K) over all different configurations of codes: PNe (𝓁) = EK [PNe (𝓁; K)]. The efficacy of this approach is determined by the following theorems [375]. Concentration Theorem The performance of individual codes in the ensemble converges exponentially fast (in N) to PNe (𝓁) as N → ∞. The relevance of this theorem is that when performance of the ensemble is established, pretty much any code selected at random (or by some construction technique) from the ensemble will have that performance. Convergence to the Cycle-free case It has been discussed that cycles in the Tanner graph affect the decoding by introducing biases into what would otherwise be an ML detector. Defining P∞ e (𝓁) as the probability of error given that there are no cycles of length 2𝓁 or less (e.g., the cycles have no effect up to iteration 𝓁), then this theorem says that PNe (𝓁) converges to P∞ e as N → ∞. Practically what this means is that for analysis of long codes (ideally, as the block length N → ∞), the effect of cycles can be neglected. Density Evolution The probability P∞ e (𝓁) (on ensembles of codes) can be calculated through a process called density evolution. This means that aspects of performance of codes can be analyzed by density evolution instead of by simulation, which leads to insight and significant computational savings. Density evolution is discussed in Section 15.7. A Tanner graph ensemble is a description of conventional LDPC codes in terms of connectivity at variable and check nodes. These are described here in terms of increasing generality: regular ensembles; irregular ensembles; and ensembles of multiple-edge types.

16.7.1 Regular Ensembles

A regular ensemble describes sets of regular LDPC codes. A (j, k) ensemble is the set of all Tanner graphs of length N where each variable node is of degree j and each check node is of degree k. Figure 16.8 illustrates the idea. There are N variable nodes, each having j sockets to which edges can be connected, for a total of jN sockets (edges). There are jN∕k check nodes, each having k sockets. Any variable node socket can connect to any check node socket. A Tanner graph is formed by connecting each variable node socket to exactly one check node socket, so there are (jN)! different Tanner graphs, each representable by a permutation on jN elements. Multiple graphs might correspond to the same parity check matrix, and many parity check matrices might represent the same code, so the relationship between the permutations and the codes is complicated.
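As a concrete illustration of drawing a member of such an ensemble, the following MATLAB-style sketch (not one of the text's programs; the sizes j, k, and N are illustrative) pairs variable-node sockets with check-node sockets through a random permutation:

% Sketch: draw one member of a (j, k)-regular ensemble by connecting
% variable-node sockets to check-node sockets through a random permutation.
j = 3; k = 6; N = 12;             % illustrative sizes; jN/k must be an integer
M = j*N/k;                        % number of check nodes
perm = randperm(j*N);             % random pairing of the jN sockets
H = zeros(M, N);
for e = 1:j*N
  v = ceil(e/j);                  % variable node owning socket e
  c = ceil(perm(e)/k);            % check node owning the matched socket
  H(c, v) = H(c, v) + 1;          % parallel edges can occur in the ensemble
end
% Row sums of H are k and column sums are j (counting parallel edges).

Parallel edges (entries of H larger than 1) can occur in a graph drawn this way, which is one reason the map from socket permutations to parity check matrices, and from parity check matrices to codes, is many-to-one.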

Figure 16.8: (3, 6) Ensemble. N variable nodes, each with j sockets, connect through a permutation of the jN sockets to M = jN/k check nodes, each with k sockets.

16.7.2 Irregular Ensembles

In an irregular ensemble, different variable and check nodes may have different numbers of sockets. The concept is illustrated with an example. Figure 16.9 illustrates N = 12 variable nodes and M = 8 check nodes. Six of the variable nodes have two sockets (that is, will connect to two edges), four of the variable nodes have three sockets, and the remaining two have four sockets. In this figure, any socket from a variable node can connect to any socket from a check node.

Figure 16.9: Illustrating an irregular ensemble, with N = 12 variable nodes and M = 8 check nodes.

It is common to describe the allocation of the different degrees using fractions. Let Λ_i denote the fraction of variable nodes (out of N = 12 variable nodes) that have i sockets and let P_i denote the fraction of check nodes (out of M = 8 check nodes) that have i sockets; for the example of Figure 16.9, Λ_2 = 1/2, Λ_3 = 1/3, Λ_4 = 1/6 and P_3 = 3/8, P_4 = 1/4, P_5 = 3/8. This description is said to provide a node perspective definition. Similarly, let 𝜆_i denote the fraction of edges (out of a total of 32 edges) which connect to variable nodes of degree i and let 𝜌_i denote the fraction of edges which connect to check nodes of degree i; here 𝜆_2 = 3/8, 𝜆_3 = 3/8, 𝜆_4 = 1/4 and 𝜌_3 = 9/32, 𝜌_4 = 1/4, 𝜌_5 = 15/32. This is said to provide an edge perspective definition of the Tanner graph.


It is also common to summarize this information using a generating function representation (think z-transform), using the polynomial

Λ(x) = Σ_i Λ_i x^(i−1)

to represent the distribution of variable node degrees,

P(x) = Σ_i P_i x^(i−1)

to represent the distribution of check node degrees,

𝜆(x) = Σ_i 𝜆_i x^(i−1)

to represent the fractions of edges connected to variable nodes, and

𝜌(x) = Σ_i 𝜌_i x^(i−1)

to represent the fractions of edges connected to check nodes. For the example above,

Λ(x) = (1/2)x + (1/3)x^2 + (1/6)x^3        P(x) = (3/8)x^2 + (1/4)x^3 + (3/8)x^4
𝜆(x) = (3/8)x + (3/8)x^2 + (1/4)x^3        𝜌(x) = (9/32)x^2 + (1/4)x^3 + (15/32)x^4.

The total number of edges is

E = N Σ_i i Λ_i = M Σ_i i P_i.

The average number of edges per variable node and the average number of edges per check node are

d_v = E/N = 1 / Σ_i (𝜆_i/i)    and    d_c = E/M = 1 / Σ_i (𝜌_i/i),

respectively. The relationships between the node perspective and the edge perspective for variable and check nodes are

Λ_i = d_v 𝜆_i / i        P_i = d_c 𝜌_i / i.

Observe that

∫_0^1 𝜆(x) dx = Σ_i 𝜆_i/i    and    ∫_0^1 𝜌(x) dx = Σ_i 𝜌_i/i,

so that

d_v = 1 / ∫_0^1 𝜆(x) dx,        d_c = 1 / ∫_0^1 𝜌(x) dx,

and

Λ_i = (𝜆_i/i) / ∫_0^1 𝜆(x) dx,        P_i = (𝜌_i/i) / ∫_0^1 𝜌(x) dx,        E = N / ∫_0^1 𝜆(x) dx = M / ∫_0^1 𝜌(x) dx.
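The conversions above are easy to check numerically. The following MATLAB-style sketch (not one of the text's programs) uses the degree fractions of the running example, as reconstructed here, to move between the node and edge perspectives and to recover E, d_v, and d_c:

% Sketch: node perspective <-> edge perspective for the example ensemble.
N = 12; M = 8;
dv_deg = [2 3 4];  Lambda = [1/2 1/3 1/6];   % variable node degrees, fractions
dc_deg = [3 4 5];  P      = [3/8 1/4 3/8];   % check node degrees, fractions
E  = N*sum(dv_deg.*Lambda)                   % total edges (= 32 = M*sum(dc_deg.*P))
dv = E/N;  dc = E/M;                         % average degrees
lambda = N*dv_deg.*Lambda/E                  % edge-perspective fractions [3/8 3/8 1/4]
rho    = M*dc_deg.*P/E                       % edge-perspective fractions [9/32 1/4 15/32]
% Recover the node perspective: Lambda_i = dv*lambda_i/i, P_i = dc*rho_i/i.
Lambda_back = dv*lambda./dv_deg;
P_back      = dc*rho./dc_deg;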

16.7.3 Multi-Edge-Type Ensembles

Multi-edge-type (MET) ensembles generalize the notion of irregular ensembles by introducing the notion of a socket type. These provide a finer degree of specification on the nature of the ensemble. MET ensembles are general enough that they capture most interesting code sets related to LDPC codes, such as regular and irregular codes, RA and IRA codes, and protograph codes. In a MET ensemble, the sockets at the variable nodes and check nodes are each assigned a type, and edges can be placed only on sockets of the same type. As an example of such an ensemble, consider Figure 16.10. In this ensemble, variable nodes have either two sockets, three sockets, or eight sockets, occurring with probabilities 12 , 13 , and 16 , respectively. Check nodes have either six or seven sockets, occurring with probabilities 13 and 23 , respectively. Furthermore, the ensemble is restricted so that degree-3 variable nodes only connect to degree-7 check nodes. This is done by introducing another edge type. In the figure, there is an edge type that can connect circular sockets to each other, and another edge type that connects triangular sockets to each other.

Figure 16.10: An example MET ensemble. Variable nodes occur with fractions 1/2, 1/3, and 1/6; check nodes occur with fractions 1/3 and 2/3; two edge types (circular and triangular sockets) restrict which connections are allowed.

To describe ensembles with multiple edge types, a representation similar to the generating function 𝜆(x) above is used. Let m_t denote the number of edge types, and let x_1, x_2, . . . , x_{m_t} be corresponding variables, collected as 𝐱 = (x_1, x_2, . . . , x_{m_t}). The MET degree type of a check node is a vector of nonnegative integers of length m_t, 𝐝 = (d_1, d_2, . . . , d_{m_t}). The notation 𝐱^𝐝 is used to represent ∏_{i=1}^{m_t} x_i^{d_i}. Let R_𝐝 denote the fraction of check nodes (out of N) that have MET degree 𝐝. The ensemble of check nodes is described by the multinomial

R(𝐱) = Σ_𝐝 R_𝐝 𝐱^𝐝.

In Figure 16.10, the MET degrees are (6, 0) and (3, 4). The ensemble of check nodes is described by

R(𝐱) = (1/3) x_1^6 + (2/3) x_1^3 x_2^4.

For variable nodes in a MET ensemble, yet another degree of generalization is employed. The bits associated with a variable node may be modeled as having been received from different kinds of channels, such as erasure channels, AWGN channels, or the BSC, or as having been punctured. Let m_r denote the number of different channels in the communications model. The convention in the literature is to let channel 0 denote the puncturing channel (which may be thought of as an erasure channel with probability of erasure equal to 1). Let 𝐛 = (b_0, b_1, . . . , b_{m_r}) denote the received degree for each channel type and let 𝐫 = (r_0, r_1, . . . , r_{m_r}) denote the variables corresponding to the received distribution.


The notation 𝐫^𝐛 is used to denote ∏_{i=0}^{m_r} r_i^{b_i}. Let L_{𝐛,𝐝} denote the fraction of variable nodes (out of N) that have MET degree 𝐝 and received degree 𝐛. Then the ensemble of MET variable nodes is described by

L(𝐫, 𝐱) = Σ_{𝐛,𝐝} L_{𝐛,𝐝} 𝐫^𝐛 𝐱^𝐝.

In Figure 16.10, assuming there is only one channel type, m_r = 1, associated with the variable r, the ensemble of variable nodes is described by

L(r, 𝐱) = (1/2) r x_1^2 + (1/3) r x_2^3 + (1/6) r x_1^8.

Designs based on MET are described in [377].

16.8 Constructing LDPC Codes by Progressive Edge Growth (PEG)

PEG is a method of building Tanner graphs with large girth. PEG is described in [203, 494] and [386, Section 12.6]. In general, PEG starts with a set of variable nodes and a set of check nodes, with no edges between them. Working sequentially through the variable nodes, the algorithm adds edges to a variable node until that node reaches some specified degree, at each step choosing a check node as far as possible from the variable node in the current graph, so that any cycle created is as long as possible. Let G = (V_v, V_c) be a bipartite graph, where V_v = {v_0, v_1, . . . , v_{N−1}} is the set of variable nodes and V_c = {c_0, c_1, . . . , c_{M−1}} is the set of check nodes. The length of the shortest cycle passing through v_n is called the local girth of v_n; the girth of the entire graph is, of course, the smallest of these local girths over 0 ≤ n < N.

A modification of PEG also takes into account how well connected the cycles that are created are to the rest of the graph. Consider a cycle S passing through variable nodes with degrees d_1, d_2, . . . , d_t. If a variable node on the cycle has degree d_i > 2, then that node is also connected to other nodes outside the cycle. For such a cycle, form the quantity

𝜖 = Σ_{i=1}^{t} (d_i − 2).

This is the total number of edges that connect this cycle S to other nodes in the Tanner graph. The quantity is called the approximate cycle extrinsic message degree (ACE). When ACE is larger, there


are more messages coming from outside the cycle into the cycle to ameliorate the effect of the cycle. It has been observed that larger ACE tends to improve the performance of LDPC codes by lowering the error floor. The modification to the PEG algorithm seeks to increase ACE. In PEG, when 𝒩_n^(ℓ) ≠ ∅ but 𝒩_n^(ℓ+1) = ∅ (so that adding an edge necessarily creates a cycle), the modification applies if there is more than one candidate check node of minimal degree. In this case, for each such candidate check node, the ACE of the cycle that would be produced by adding it to the Tanner graph is computed, and the node producing the largest ACE is the one selected.
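As a small illustration of the ACE computation, the following MATLAB-style sketch (not from the text; the matrix and the cycle are made up for illustration) evaluates 𝜖 = Σ(d_i − 2) for the variable nodes on a candidate cycle:

% Sketch: ACE of a cycle from the degrees of its variable nodes.
H = [1 1 0 1; 1 1 1 0; 0 1 1 1];   % small illustrative parity check matrix
cycle_vars = [1 2];                 % v1 and v2 lie on a 4-cycle through c1, c2
deg = sum(H(:, cycle_vars), 1);     % degrees of the cycle's variable nodes
ACE = sum(deg - 2)                  % here deg = [2 3], so ACE = 1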

16.9 Protograph and Multi-Edge-Type LDPC Codes

Protograph LDPC codes are a way of imposing some structure on the design of LDPC parity check matrices while still maintaining considerable flexibility. The main idea is that a small prototype graph is established, referred to as a protograph. Then nodes in the graph are expanded by a factor of Q, that is, each node is repeated Q times. Edges between repeated nodes are also duplicated, not directly, but using a permuting structure. The idea is explained by means of an example. Begin with an adjacency matrix for the protograph having two check nodes and three variable nodes,

B = [ 2 0 1
      1 1 1 ].                                                          (16.10)

This matrix has a feature not seen in other adjacency matrices, having an element greater than 1. This corresponds to multiple edges in the graph. The graph corresponding to this adjacency matrix is shown in Figure 16.13(a). Figure 16.13(b) shows the concept of expansion by Q. In this figure, each node is duplicated Q = 3 times. The bold lines are suggestive of the repetition of connections.

Figure 16.13: Graphs related to protograph description of LDPC code. (a) Initial protograph from adjacency matrix in (16.10); (b) repeating nodes Q = 3 times, and permutations of repeated edges; (c) detailed Tanner graph derived from protograph.


The edges between the repeated variable nodes and the repeated check nodes are permuted by a Q × Q permutation (or circulant) matrix, whose column sum is the number of edges between that variable node and check node in the original protograph. Figure 16.13(c) fills in the sketch of the expansion with the detailed Tanner graph. The parity check matrix corresponding to this construction is

H = [ 1 1 0 0 0 0 0 0 1
      0 1 1 0 0 0 1 0 0
      1 0 1 0 0 0 0 1 0
      0 0 1 0 1 0 1 0 0
      1 0 0 0 0 1 0 1 0
      0 1 0 1 0 0 0 0 1 ].

This has two block rows (from the original 2 check nodes) and three block columns (from the original 3 variable nodes), each block of which has a permutation or circulant matrix. The column sum of the (1,1) block of H is two, corresponding to the 2 at the corresponding position in B. A general approach to the design of protograph codes is as follows. Select the desired rate and the number of nodes Np in the protograph. This determines the number of check nodes in the protograph. Determine a degree distribution for the protograph. Choose an expansion factor. Then choose cyclic permutation matrices (more or less at random) to connect the expanded variable nodes with the expanded check nodes. When there are multiple edges (as in the example above), the connection matrix may be viewed as a sum of that many cyclic permutation matrices. Protograph architecture can lead to some potential parallel operations in hardware-based decoders. See the references at the end of the chapter.
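The expansion step can be sketched in a few lines. The following MATLAB-style fragment (not one of the text's programs) expands the base matrix B of (16.10) by a factor Q using randomly chosen, distinct cyclic shifts of the Q × Q identity; circulants chosen this way are one option consistent with the description above, not the only one:

% Sketch: expand a protograph base matrix B into a parity check matrix H
% using distinct randomly shifted Q x Q circulant permutation matrices.
B = [2 0 1; 1 1 1];              % base (adjacency) matrix from (16.10)
Q = 3;                           % expansion factor
[mb, nb] = size(B);
H = zeros(mb*Q, nb*Q);
I = eye(Q);
for r = 1:mb
  for c = 1:nb
    blk = zeros(Q);
    shifts = randperm(Q) - 1;    % distinct cyclic shifts
    for e = 1:B(r,c)             % one circulant per edge in the protograph
      blk = blk + circshift(I, [0 shifts(e)]);
    end
    H((r-1)*Q+1:r*Q, (c-1)*Q+1:c*Q) = blk;
  end
end
% Column sums of each block equal B(r,c), matching the protograph.

When B(r, c) = 0 the block is zero, and when B(r, c) > 1 the block is a sum of that many distinct circulants, which matches the column-sum property noted above.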

16.10 Error Floors and Trapping Sets

The concept of error floors was described for turbo codes in Section 14.4. In many modern communication and storage systems, probabilities of word error below 10^-12 are desired, so the problem of error floors becomes increasingly important. In [289], error floors were attributed to "near codewords" in the graph, that is, configurations that message-passing decoding algorithms converge to (or return to periodically) that are not codewords, and hence are erroneous. As an example of how this can happen, consider the subgraph of a Tanner graph in Figure 16.14(a). In this figure, check nodes which are satisfied are indicated with ◽. Each of these is connected to two variable nodes (degree-2 check nodes), which, in turn, are connected to an unsatisfied check node, indicated with ◾. These unsatisfied check nodes are connected to only a single variable node. Suppose that the all-zero codeword is sent, but that all of the variable nodes in the subgraph are (erroneously) equal to 1; then the check nodes ◽ are all satisfied. In the message-passing algorithm, there is no message sent from the unsatisfied check nodes ◾ ("leave-one-out"), so the erroneous bits never change. The decoding algorithm thus converges to, and gets stuck at, a situation that is not a codeword (since not all checks are satisfied). The other subfigures in Figure 16.14 give other examples of trapping set graphs.

This leads to the definition of a trapping set. A (w, v) trapping set is a set of w variable nodes associated with a subgraph of a Tanner graph that has v odd-degree check nodes (for which there are v unsatisfied checks, indicated with ◾) and an arbitrary number of even-degree check nodes (indicated with ◽). Trapping sets are associated with near codewords. The performance of the message-passing decoder in the error floor region is determined by the number of different trapping sets, and their structure, rather than by the minimum distance of the code.

Actually, the situation with trapping sets is somewhat more complicated than suggested by the discussion above. It has been found that there are different kinds of trapping behavior.

1. A stable trapping set is one for which, once reached, the decoder remains fixed there, no matter how many iterations it performs.


Figure 16.14: Subgraphs associated with trapping sets. ∘ = variable nodes. ◾ = unsatisfied check nodes. ◽ = incorrectly satisfied check nodes. (a) (12, 4) trapping set; (b) (14, 4) trapping set; (c) (5, 5) trapping set; (d) (5, 7) trapping set.

For example, with a stable (12, 4) trapping set the decoder has 12 bits in error, no matter how long the decoder runs.

2. A periodically oscillating trapping set is one in which the decoding algorithm cycles deterministically through a number of different error patterns. Once the message-passing decoder falls into such a cycle, it cannot escape. For example, there may be a (6, 18) and (7, 18) trapping set pair, and the decoder alternates between these on successive decoding iterations.

3. More complicated, in an aperiodic oscillating trapping set, as the message-passing decoder proceeds the number of erroneous bits varies in a random fashion, but no matter how many decoding iterations are computed, a correct solution is never reached.
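Given a parity check matrix, a candidate set of variable nodes can be labeled as a (w, v) trapping set by counting the check nodes that connect to the set an odd number of times. The following MATLAB-style sketch (not from the text; H and the candidate set are illustrative) does this:

% Sketch: classify a set of variable nodes as a (w, v) trapping set by
% counting the check nodes connected to the set an odd number of times.
H  = [1 1 0 0; 0 1 1 0; 0 0 1 1; 1 0 0 1];   % illustrative parity check matrix
vs = [1 2];                                   % candidate set of variable nodes
touch = sum(H(:, vs), 2);                     % how many set members each check touches
w = numel(vs);
v = sum(mod(touch, 2) == 1)                   % odd-degree (unsatisfied) check nodes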

16.11 Importance Sampling

Determining the probability of error for a coded system by Monte Carlo (MC) methods (that is, by simulation) can be challenging, especially when the probability of error is very small. For a probability of error of P_b, approximately 10/P_b bits must be simulated to achieve a probability of bit error estimate accurate to about 10%. For example, if a code has a bit error probability of 10^-8, at least 10^8 bits must be simulated (on average) before even a single error may be seen. To reduce the variance of the estimate, it is advisable to simulate an order of magnitude (or more) beyond this. For codes having complicated or iterative decoding algorithms, this can be very time-consuming. The problem only becomes more difficult as system requirements push word error rates below 10^-12. Importance sampling (IS) is a method of reducing the complexity of conventional MC simulation by reducing the variance of the


estimated probabilities of error. Extensive simulation may still be necessary, but in some instances, when IS is properly applied, the simulation runtime may be reduced by many orders of magnitude. IS is not, strictly speaking, specific to LDPC codes; it is relevant in communications generally and has been broadly applied. But this discussion is placed in this chapter for two reasons. First, many LDPC codes are designed to have very low error probabilities, at rates below which their performance can feasibly be simulated by conventional means. Second, the structure of LDPC codes provides mechanisms for the application of IS. This section introduces the general principles of IS and applies them to communication systems. There has been much work in applying IS to coded systems, but this is still an area of research; this chapter points to some of the work in this area.

16.11.1 Importance Sampling: General Principles

Let g(𝐱) be a function to be integrated over ℝ^n, where for convenience we assume g(𝐱) ≥ 0 for all 𝐱 ∈ ℝ^n, let f(𝐱) be a probability density function, and consider the following integral:

P = ∫_{ℝ^n} g(𝐱) f(𝐱) d𝐱 = E_f[g(𝐱)].                                   (16.11)

(The subscript on the expectation E indicates the distribution with respect to which the expectation is computed.) In one dimension (n = 1), numerical evaluation is often straightforward using standard numerical integration routines. As the dimension n increases, the definite integral may become much more difficult to evaluate. More generally, an integral ∫_Ω g(𝐱) d𝐱 over a domain Ω ⊂ ℝ^n may be put in the form (16.11) by introducing a pdf u(𝐱) which is uniform over Ω, so that

∫_Ω g(𝐱) d𝐱 = vol(Ω) ∫_{ℝ^n} g(𝐱) u(𝐱) d𝐱 = vol(Ω) E_u[g(𝐱)].

Returning to (16.11), let 𝐱_1, 𝐱_2, . . . , 𝐱_L be samples drawn according to the distribution f(𝐱), which we denote as 𝐱_i ∼ f, i = 1, 2, . . . , L. The integral (16.11) is approximated as the sample average

P̂_MC = (1/L) Σ_{i=1}^{L} g(𝐱_i).

By the law of large numbers, this sample average converges to the expectation, so P̂_MC → P as L → ∞. This is said to be the Monte Carlo (MC) approximation to the integral. Even though the integral (16.11) is deterministic, the approximation is a random variable (due to the random samples). Its mean and variance can be computed as follows:

E_f[P̂_MC] = (1/L) Σ_{i=1}^{L} E_f[g(𝐱_i)] = (1/L) L E_f[g(𝐱)] = P.

(That is, P̂_MC is an unbiased estimate.)

E_f[P̂_MC^2] = (1/L^2) E_f[ Σ_{i=1}^{L} g(𝐱_i)^2 + Σ_{i≠j} g(𝐱_i) g(𝐱_j) ] = (1/L^2)[ L E_f[g(𝐱)^2] + (L^2 − L) E_f[g(𝐱)]^2 ].

Then

var(P̂_MC) = E_f[P̂_MC^2] − E_f[P̂_MC]^2 = (1/L)[ E_f[g(𝐱)^2] − P^2 ].            (16.12)
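As a small numerical illustration of the MC estimate and of (16.12), the following MATLAB-style sketch (not one of the text's programs; the density and the event are illustrative) estimates P = E_f[g(x)] when g is the indicator of a tail event:

% Sketch: plain Monte Carlo estimate of P = E_f[g(x)], g an indicator.
sigma = 0.3; L = 100000;
x = -1 + sigma*randn(L, 1);          % x_i ~ f = N(-1, sigma^2)
g = (x > 0);                         % g(x) = indicator of the tail event
P_hat = mean(g)                      % Monte Carlo estimate of P
var_hat = (P_hat - P_hat^2)/L;       % variance predicted by (16.12), since g^2 = g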

IS is used to evaluate integrals using averages of random samples, as in the MC method above, but introduces another degree of freedom by using a different density q(𝐱), known as the sample distribution or the proposal distribution. This may result in a reduction of the variance of the approximation (compared to var(P̂_MC)), or q may simply be easier to draw samples from. Let q(𝐱) be a pdf over ℝ^n (the domain of integration) and write the integral (16.11) as

P = ∫_{ℝ^n} g(𝐱) f(𝐱) (q(𝐱)/q(𝐱)) d𝐱 = ∫_{ℝ^n} g(𝐱) (f(𝐱)/q(𝐱)) q(𝐱) d𝐱 = ∫_{ℝ^n} 𝑤(𝐱) g(𝐱) q(𝐱) d𝐱 = E_q[𝑤(𝐱) g(𝐱)],

where

𝑤(𝐱) = f(𝐱)/q(𝐱)

is the weight function. Again the integral is expressed as an expectation, but of the weighted function 𝑤(𝐱)g(𝐱) and with respect to the density q. The integral may be approximated as follows: draw 𝐱̃_i ∼ q, i = 1, 2, . . . , L, and form the sample average

P̂_IS = (1/L) Σ_{i=1}^{L} 𝑤(𝐱̃_i) g(𝐱̃_i).

Again by the law of large numbers, this sample average converges to the expectation, P̂_IS → P as L → ∞. The mean and variance of this approximation are E[P̂_IS] = P (the approximation is unbiased) and, reasoning as in the MC case,

var(P̂_IS) = (1/L)[ E_q[𝑤(𝐱)^2 g(𝐱)^2] − E_q[𝑤(𝐱) g(𝐱)]^2 ]
           = (1/L)[ ∫_{ℝ^n} 𝑤(𝐱)^2 g(𝐱)^2 q(𝐱) d𝐱 − P^2 ] = (1/L)[ ∫_{ℝ^n} 𝑤(𝐱) g(𝐱)^2 f(𝐱) d𝐱 − P^2 ].

The variance can be estimated by

v̂ar(P̂_IS) = (1/L)[ (1/L) Σ_{i=1}^{L} 𝑤(𝐱̃_i)^2 g(𝐱̃_i)^2 − P̂_IS^2 ].

∫ℝn g(𝐱)f (𝐱) d𝐱

[ 1 var(P̂ MS ) = N

g(𝐱)f (𝐱) g(𝐱)f (𝐱) = . P ∫ℝn g(𝐱)f (𝐱) d𝐱

∫ℝn

g(𝐱)

(16.13)

, resulting in a variance

∫ℝn g(𝐱)f (𝐱) d𝐱 g(𝐱)

] 2

2

g(𝐱) f (𝐱) d𝐱 − P

= 0.

754

Low-Density Parity-Check Codes: Designs and Variations The optimal sample distribution q∗ is a theoretical concept, since it requires knowing the value of P, which is unknown, being the value of the integral which is desired in the first place. Even if q∗ could be obtained, it may be difficult to sample from. But the optimal distribution is of interest, because it teaches that a good (if not optimal) sample distribution should draw samples where g(𝐱)f (𝐱) is large. This is sensible, since in approximating an integral with a finite number of points it is prudent to apply those points where the function being integrated makes the largest contribution to the integral. The number of samples needed to achieve a precision 𝜖 = 𝜎P̂ ∕P is ⌉ ⌈ 𝑣 L= 2 2 , 𝜖 P where 𝑣 = var[1E (x)𝑤(x)], and where 1E (x) is the indicator function defined below. Selecting a good sample distribution q is the “art” of IS [414]. A good q can provide significant speedup, while a poorly chosen q can actually result in the simulation runtime being longer. For nonoptimal sampling, the IS variance gain 𝛾 is the ratio of the variance for the MC approximation and the IS approximation, with the same number of points N, 𝛾=

Ef [g(𝐱)2 ] − P2 var(P̂ MS ) . = ∫ℝn 𝑤(𝐱)g(𝐱)2 f (𝐱) d𝐱 − P2 var(P̂ IS )

Regardless of the sampling distribution selected, the estimate is unbiased. Even suboptimal sampling distributions can be practically effective, resulting in a significantly reduced variance, and a corresponding reduction in the number of samples needed to obtain a given variance. 16.11.2

Importance Sampling for Estimating Error Probability

In this section, we apply the general principles of IS from above to problems of estimating probability of error. This is done by a series of example. Example 16.22 In this first example, we consider the problem of uncoded binary communication, in which a signal s = ±1 is sent in AWGN. Suppose, to be specific that s = −1 transmitted. The received signal is y = s + n,

(16.14)

where n ∼  (0, 𝜎 2 ). Assuming equally probable symbols there is a decision threshold 𝜏0 = 0 and the optimal decision rule is { 1 if y > 𝜏0 ŝ = −1 if y < 𝜏0 . Given that s = −1 is sent, the error region where an incorrect decision is made is E = {y ∈ ℝ ∶ y > 𝜏0 }. The probability of detection error, PE , can be expressed as PE = P(error|s = −1) = P(y ∈ E|s = −1) = Let 1E (y) denote the indicator function

{ 1E (y) =

Then

1 0

∫E

fy (y|s = −1) dy,

y∈E y ∉ E.



PE =

∫−∞

1E (y)fy (y|s = −1) dy.

16.11 Importance Sampling

755

This integral is analogous to the integral in (16.11), where the function is g(y) = 1E (y) and is recognized as an expectation, PE = Ef (y|s=−1) [1E (y)].

−2

−1

0 y

1

2

3

5 4.5 4 3.5 3 2.5 2 1.5 1 0.5 0 −3

fy(y |s = −1), q

0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 −3

q* (y1)

fy(y |s = −1)

(This is an application of the principle that the expectation of an indicator function is a probability.) Because of the additive signal model, fy (y|s = −1) = fn (y + 1) =  (−1, 𝜎 2 ). The situation is indicated in Figure 16.15(a) (where 𝜎 2 = 0.3). The probability of error is the shaded region under the Gaussian curve. The error region E is also shaded.

−2

−1

(a)

0 y

1

2

3

0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 −3

(b)

fy(y|s = −1) q

−2

−1

0 y

1

2

3

(c)

Figure 16.15: Probability of error for a BPSK detection problem using MC and IS sampling. (a) Distribution for Monte Carlo integration; (b) optimal sampling distribution for this problem; (c) distribution for IS integration.

Conventional Sampling (MC) Let y1 , y2 , . . . , yL be samples of a signal drawn according to the distribution fy (y|s = −1) =  (−1, 𝜎 2 ). Then the expectation can be approximated by the sum L 1∑ 1 1 (y ) = (number of yi greater than 0). P̂ E,MC ≈ L i=1 E i L

This is simply a description of the conventional way of estimating the probability of error by a MC simulation. This can be described as follows: 1. Generate noise samples n1 , n2 , . . . , nL , where each ni ∼  (0, 𝜎 2 ). 2. Produce the “received” signals yi = s + ni (with s = −1 in this example). 3. Count how many of the yi are greater than 0, and divide by the count by L. By the law of large numbers, P̂ E,MC → PE as N → ∞. Figure 16.15(a) illustrates the practical problem with this approach. When a point yi is drawn, with high probability it falls in the unshaded region (correct), so it contributes nothing to the error count. To get a low variance (accurate) estimate many samples must be counted, which means that L must be very large. Using g(y) = 1E (y) and g(y)2 = 1E (y), using (16.12) the variance of the MC estimate is var(P̂ E,MS ) =

1 (P − P2E ). L E

756

Low-Density Parity-Check Codes: Designs and Variations Importance Sampling (IS) We will now express this problem using IS. Let q denote a sample distribution, and write the probability of error as ∞ ∞ f (y|s = −1) q(y) y 1E (y)fy (y|s = −1) dy = 1E (y)q(y) dy PE = ∫−∞ ∫−∞ q(y) q(y) ∞

=

∫−∞

𝑤(y)1E (y)q(y dy = Eq [1E (y)𝑤(y)].

Here, q(y) is a sampling distribution and 𝑤(y) =

fy (y|s = −1) q(y)

is a weight function. How should the sample distribution q be selected? From (16.13), the optimal sample distribution is q∗ (y) =

1E (y)f (y|s = −1) ∞

∫−∞ 1E (y′ )f (y′ |s = −1) dy′

=

1E (y)f (y|s = −1) , PE

which would give var(P̂ E,IS ) = 0, if it could be computed, which it can’t since it requires knowing PE . This optimal sampling distribution is shown in Figure 16.15(b). While it would not be difficult to approximate q∗ (for example, using an exponential function), a common approach that generalizes well is as follows. What is observed from Figure 16.15(b) is that the optimal sampling distribution places more weight in the E region than the original density fy (y|s = −1). Given the additive nature of the signal (16.14), a readily obtainable sampling distribution is obtained by shifting the noise toward the decision boundary. Let ñ be a “shifted’,’ or “mean translated,” noise ñ = n + 𝜁 , so ñ ∼  (𝜁 , 𝜎 2 ) for some shift 𝜁 . Then fñ (y) = fn (y − 𝜁). This shifted noise will be referred to as the IS noise. Let the observation with this shifted noise be ỹ = s + ñ. The pdf of ỹ (given s = −1) is fỹ (̃y|s = −1) = fñ (̃y − s)|s=−1 = fn (̃y − 𝜁 − s)|s=−1 = fn (̃y − 𝜁 + 1). This is used as the sample distribution: q(y) = fñ (y − s)|s=−1 = fn (y − 𝜁 + 1). The original distribution f (y|s = −1) =  (−1, 𝜎 2 ) has been replaced by a sample distribution q(y) =  (𝜁 − 1, 𝜎 2 ). Figure 16.15(c) shows the sampling distribution q with the shift 𝜁 = 1. In this figure, q is shifted so that its mean lies at the point of the decision threshold 𝜏0 . Half of the samples drawn using q fall in the region E. This is an instance of mean translation , in which the sample distribution is obtained by translating the mean of the original pdf in the integral. (Another alternative would be to use a sample distribution with larger variance, but mean translation is more common in communication problems.) In this example, in which f (y|s = −1) =  (−1, 𝜎 2 ) is replaced by  (𝜁 − 1, 𝜎 2 ), we say that 𝜁 is a new bias point, and the shift amount 𝜁 − 1 is the bias. The selection 𝜁 = 1 is a dominant point [387]. Using this sample density, the IS estimate of PE is computed as follows: 1. Generate noise samples ñ1 , ñ2 , . . . , ñN , with each ñi ∼  (𝜁 , 𝜎 2 ). 2. Produce the IS received signals ỹ i = s + ñi (with s = −1 in this example). 3. Compute the weight function 𝑤(̃yi ) =

 (̃yi ; 0, 𝜎 2 )  (̃yi ; 𝜁 − 1, 𝜎 2 )

=

exp[− 2𝜎12 ỹ 2i ] exp[− 2𝜎12 (̃yi − 𝜁 + 1)2 ]

,

i = 1, 2, . . . , N.

16.11 Importance Sampling

757

4. Compute

N 1 ∑ P̂ E,IS = 1 (̃y )𝑤(̃yi ), N i=1 E i

that is, count how many IS received signals fall in the error region, weighted by the weight function , then divide by N. It is evident from Figure 16.15(c) that there will be many more ỹ i that fall in the error region E compared to drawing yi ∼ fy (y|s = −1), but each of these error counts is weighted by 𝑤(̃yi ), which is a small number, so the net contribution to PE works out to give an unbiased estimate. The variance of this estimator is ] ] [ ∞ [ ∞ 1 1 𝑤(y)2 1E (y)q(y) dy − P2E = 𝑤(y)fy (y|s = −1) dy − P2E . var(P̂ E,IS ) = L ∫−∞ L ∫−∞ This can be estimated by

[ 1 ̂ P̂ E,IS ) = var( L

] L 1∑ 2 2 𝑤(̃yi ) 1E (̃yi ) − P̂ E,IS . L i=1

The variance gain is 𝛾=

var(P̂ E,MS ) var(P̂ E,IS )

=

PE − P2E ∞ ∫−∞

𝑤(y)fy (y|s = −1) − P2E

P̂ E,IS − P̂ 2E,IS

≈ 1 N

N ∑ i=1

𝑤2 (̃yi )1E (̃yi )



.

(16.15)

P̂ 2E,IS

A simulation was performed to demonstrate some aspects of this problem (see Table 16.1). For different values of shift 𝜁, N = 1, 00,000 symbols were generated and the probability of error was estimated using MC and IS methods. This was repeated 10 times. The variance of the different estimates using MC and IS methods were computed. These estimated variances are displayed, along with the value of 𝛾 computed using (16.15). In this simulation 𝜎 2 = 0.1. The true probability of error is PE = Q(2∕(2𝜎)) = 0.00782. Some observations from this experiment:



The IS method is much more accurate than the MC method, as seen by comparing the %error columns. The % error for the IS in almost every instance less than 1%, while for the MC method the errors are much higher, ranging up to 20%.

Table 16.1: IS Simulation Results 𝜁 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2

P̂ E,MC

%ErrMC

P̂ E,IS

0.00094 0.00073 0.00097 0.00081 0.00086 0.00088 0.00082 0.00091 0.00071 0.00071 0.00096 0.00084 0.00084 0.00077 0.00078 0.00086

−20.0969 6.73324 −23.9298 −3.48778 −9.87591 −12.4312 −4.7654 −16.264 9.28849 9.28849 −22.6522 −7.32066 −7.32066 1.62273 0.345104 −9.87591

0.000794419 −1.49707 6.88444e−09 1.48479e−10 0.000781498 0.153668 7.05e−09 5.15492e−11 0.000778427 0.546093 8.71667e−09 1.34966e−11 0.000779367 0.425995 1.45e−09 1.4428e−11 0.000781537 0.148749 1.78178e−08 1.75635e−11 0.00077407 1.10276 5.06778e−09 2.07852e−11 0.00077946 0.414091 3.08444e−09 1.28797e−11 0.000777432 0.673217 1.30711e−08 2.1317e−11 0.000777043 0.722907 2.57333e−09 4.031e−11 0.000775506 0.919295 8.91222e−09 1.00685e−10 0.000779201 0.447142 5.67111e−09 2.01995e−10 0.000783985 −0.164008 1.085e−08 1.15386e−10 0.000788395 −0.727431 1.004e−08 2.86059e−10 0.000773905 1.12386 9.97889e−09 3.30281e−10 0.000797485 −1.88882 8.35556e−09 9.59689e−10 0.000759019 3.02565 6.96556e−09 1.218e−09

%ErrIS

Var(P̂ E,MC )

Var(P̂ E,IS )

Var(P̂ E,MC ) Var(P̂ E,IS ) 46.3666 136.762 645.84 100.499 1014.48 243.816 239.481 613.177 63.8385 88.5157 28.0755 94.032 35.0977 30.2133 8.70653 5.71885

𝛾 64.3651 110.176 173.329 246.936 315.327 357.96 354.914 313.076 245.394 168.927 108.621 63.0629 33.0292 15.4904 7.11057 2.8325

758

Low-Density Parity-Check Codes: Designs and Variations



The ratio of the estimated variances var(P̂ E,MC )∕var(P̂ E,IS ) (where the variances are estimated over 10 different estimates) is not exactly the same as 𝛾, but they are generally within an order of magnitude of each other.



Both of these measures achieve their highest value with the shift 𝜁 = 1. 𝛾 = 358 there — the variance using IS is approximately 350 times better than using MC.

These result suggest that the shift 𝜁 should be selected so that the shifted pdf q has its mode (peak value) at the decision threshold. This is portrayed in Figure 16.15(c). Figure 16.16(a) shows the best IS gain 𝛾 (for the best 𝜁) as a function of the noise variance. As the noise variance 𝜎 2 decreases (SNR increases), 𝛾 increases. The figure also shows the probability of error PE . There is a fortuitous benefit indicated: as PE decreases, it would take more iterations to estimate PE well; the increase in 𝛾 offsets the number of iterations that would be required. Figure 16.16(b) illustrates another aspect of IS. As the noise variance increases, the shift 𝜁 increases. Intuitively, as the noise distribution broadens, the sample distribution is pushed further into the error region E.

108

1.3

100

𝛾 (left) PE (right)

1.25

10−1

6

10

𝛾

104

𝜁 (best)

1.2 10−2

1.15

−3

10 102

100

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

1.1

10−4

1.05

10−5 0.8

1

0

0.1

0.2

0.3

0.4

𝜎

𝜎

(a)

(b)

2

0.5

0.6

0.7

0.8

2

Figure 16.16: Best gain and shift 𝜁 for an importance sampling example. (a) 𝛾 and PE as a function of 𝜎 2 . PE decreases when 𝛾 increases; (b) z as a function of 𝜎 2 . ◽ Example 16.23 Now consider the situation where there are three possible signals, s ∈ {−1, 0, 1}, and assume that s = 0 is sent. As before, y = s + n and n ∼  (0, 𝜎 2 ). The decision regions are ⎧−1 ⎪ ŝ = ⎨1 ⎪0 ⎩

if y < 𝜏0 = −.5 if y > 𝜏1 = 0.5 if 𝜏0 < y < 𝜏1 .

With s = 0 sent, the error region is E = {y ∈ ℝ ∶ y < 𝜏0 or y > 𝜏1 }. ∞ ∫−∞

1E (y)fy (y|s = 0) dy. The region of integration is shown in Figure 16.17(a). The probability of error is PE = The optimal sampling distribution q∗ is shown in Figure 16.17(b). It puts all of its weight in the region E. As before, computing this and drawing from this would require knowing PE . Instead, a tractable sample distribution is obtained by a Gaussian mixture ) ( ) ( 1 1 2 , 𝜎 + 0.5 − , 𝜎 2 , ñ ∼ 0.5 2 2 as shown in Figure 16.17(c).

−2

x −1

x 0 y

x 1

2

3

2 1.8 1.6 1.4 1.2 1 0.8 0.6 0.4 0.2 0 −3

fy(y|s = 0)

0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 −3

759

q* (y)

fy(y|s = 0)

16.11 Importance Sampling

−2

−1

(a)

0 y

1

2

3

0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 −3

fy(y|s = 0) q

−2

x −1

(b)

x 0 y

x 1

2

3

(c)

Figure 16.17: Probability of error for three-symbols detection problem using MC and IS sampling. (a) Distribution for Monte Carlo integration; (b) optimal sampling distribution q∗ for this problem; (c) distribution for IS integration. ◽ Before proceeding with the next communications-related example, consider the description of hyperspherical surfaces. Let K be a d × d covariance matrix (symmetric, positive definite), thought of as the covariance of AWGN of an observation 𝐱 with density  (𝐱; 0, K). Let K̃ = K∕n, so the covariance K̃ decreases with n. The density corresponding to K̃ is  (𝐱; 0; K∕n). Let R(𝐱) = 12 𝐱T K −1 𝐱. The set of points S = {𝐱 ∈ ℝd ∶ 12 𝐱T K −1 𝐱 = c} is a hyperellipsoid centered at 0 (think, roughly, of the shape of American football). Such a surface could represent the set of points where a zero-mean Gaussian with covariance K takes on a constant value: {𝐱 ∈ ℝd ∶  (𝐱; 0, K) = c′ }. Moving constants to the other side and taking a logarithm, this set can be expressed in terms of the representation S. With this background, consider the way of generating a random outcome for points near the set S. 1. Pick a point uniformly at random on a hypersphere of radius 𝜌. This may be done by drawing 𝐮 ∼  (0, Id ), then normalize by 𝐮 ← 𝜌𝐮∕∥𝐮∥. ̃ 2. Add a Gaussian random number with covariance K, ̃ 𝐧̃ ∼  (0, K) ̃ 𝐲 = 𝐮 + 𝐧.

2

0 y2

−2

−4 −4

(a)

−2

0 y1

2

4

4 2

0 y2

−2

−4 −4

(b)

−2

0 y1

2

4

random data, qn (y1,y2)

4

random data, qn (y1,y2)

random data, qn (y1,y2)

Figure 16.18 shows the draws (the dots in the plane), where K = 𝜎 2 I2 with 𝜎 2 = 1 for n = 0.5, 1, and 2. As n increases, the random samples cluster more tightly around the circle of radius 𝜌. To be useful in IS, the pdf of data so generated is needed. This is described as follows.

4 2

0 y2

−2

−4 −4

−2

0

2

4

y1

(c)

Figure 16.18: Points chosen at random on a circle of radius 𝜌 = 2 with Gaussian noise added of covariance K̃ = K∕n, where K = 𝜎 2 I2 with 𝜎 2 = 1. (a) n = 0.5; (b) n = 1; (c) n = 2.

760

Low-Density Parity-Check Codes: Designs and Variations

A S

𝜌

Figure 16.19: An example constraint set A.

Let A be an arbitrary closed set (for example, the set A shown in Figure 16.19). Let r = min R(𝐱). 𝐱∈A

If, as in the figure, A is the region outside a circle (or hypersphere) of radius 𝜌, and K = 𝜎 2 Id , then r=

1 2 𝜌 . 2𝜎 2

It has been shown [51] that the pdf of the data generated around the hypersphere of radius 𝜌 is √

qn (𝐲) =  (𝐲 n; 0, K)nd∕2 exp(−nr)

Γ

( )√ d 2

d

22

√ d2 −1 r

−1

√ I d −1 (n 2r ∥K −(1∕2) 𝐲∥) 2

d

(n ∥K −1∕2 𝐲∥) 2 −1

.

(16.16)

The pdfs are also plotted in Figure 16.18. They have the form of the “rim of a volcano” of radius 𝜌, with the steepness and concentration of the rim depending on n. Example 16.24 [51] Let X = (X1 , X2 ) ∼  (0, 𝜎 2 I) be a two-dimensional random variable. Let A be the exterior to a disk of radius 𝜌 = 10. Then 1 1 r = min 𝐱T K −1 𝐱 = 2 𝜌2 = 50. 𝐱∈A 2 2𝜎 The probability that X ∈ A can be computed as ( 2) ∞ z z exp − dz = exp(−50) = 1.93 × 10−22 . P= ∫10 2 If this were determined using MC techniques by picking points at random using conventional MC estimation, on the order of 1024 instances would need to be generated at random to obtain accurate results. However, this can be determined using IS with far fewer samples by translating the mean to a point chosen at random on a circle of radius 10, as described by the pdf (16.16). Let 𝐮 be uniform on the circle of radius 10, let 𝐧 ∼  (0, 𝜎 2 I2 ) and 𝐲̃ = 𝐮 + 𝐧. Using n = 1, 𝜎 2 = 1, d = 2, and K = 𝜎 2 I2 , using (16.16) the weight function is 𝑤(𝐲) =

 (𝐲; 0, 𝜎 2 I2 ) exp(50) = , q1 (𝐲) I0 (10 ∥𝐲∥)

where I0 (⋅) is the modified Bessel function of order 0. Using this to form the estimate L exp(50) 1∑ P̂ = 1 , L i=1 ∥̃𝐲i ∥>10 I0 (10 ∥𝐲̃ i ∥)

results in an accurate estimate with relatively small L. For example, on a run with L = 10,000, a value P̂ = 1.919×−22 was obtained, an error of about 0.5%. ◽

16.11 Importance Sampling

761

The point of this is that when there are multiple possible outcomes which could occur at some distance from the transmitted point s, it may be possible to use this circular distribution idea to do a mean translation. Example 16.25 We now return to a communication example. In this example, the constellation is in d dimensions with 2d + 1 points consisting of the 𝐬0 = 0 point and points of the form ⎡0⎤ ⎢ ⎥ ⎢⋮⎥ ±𝐬i = ⎢±1⎥ , ⎢ ⎥ ⎢⋮⎥ ⎢ ⎥ ⎣0⎦

i = 1, 2, . . . , d,

where the ±1 amplitude occurs in coordinate i. The constellation is portrayed in d = 2 dimensions in Figure 16.20(a), where the points are [ ] [ ] [ ] [ ] 1 −1 0 0 . 0 0 1 −1 Observations are of the form 𝐲 = 𝐬 + 𝐧, where 𝐧 ∼  (0, 𝜎 Id ). Given that 𝐬0 = 0 is sent, an error occurs if the absolute value of any coordinate exceeds 0.5. The true probability of error given 𝐬0 is sent is 2

PE = 1 − (1 − 2Q(1∕(2𝜎))d .

(a)

(b)

(c)

Figure 16.20: Example constellation. (a) Constellation; (b) error region; (c) spherical offset. For IS there are some options to consider for a sample density q.



(a) One potential sample density shifts the noise in one dimension, always in the positive direction, 𝐧̃ ∼  (𝜁𝐞1 , 𝜎 2 ), where 𝐞1 is the unit vector with 1 in the first component and 𝜁 is an offset size, e.g., 𝜁 = 1.



(b) Another potential sample density uses a Gaussian mixture density in each of the 2d directions 𝐧̃ ∼

d 1 ∑  (+𝜁 𝐬i , 𝜎 2 Id ) +  (−𝜁 𝐬i , 𝜎 2 ). 2d i=1

This shifts the mean in the direction of one of the actual signal points.



(c) A third potential sample density picks a point at random on the d-dimensional hypersphere of radius 𝜁, then uses this as the mean of a d-dimensional Gaussian density, as described above to produce the noise ̃ signal 𝐧.

The IS received signal is now ̃ 𝐲̃ = 𝐬0 + 𝐧. A critical aspect of the IS is the weight function. For a sample 𝐲̃ i , this is computed for each of the three different cases above as follows.

762

Low-Density Parity-Check Codes: Designs and Variations



(a) One-dimensional shift: 𝑤(𝐲̃ i ) =



 (𝐲̃ i − 𝐬0 ; 𝜁 𝐬1 , 𝜎 2 Id )

.

(b) Gaussian mixture over the dimensions: 𝑤(𝐲̃ i ) =



 (𝐲̃ i − 𝐬0 ; 𝐬0 , 𝜎 2 Id )

1 2d

 (𝐲̃ i − 𝐬; 𝜎 2 Id )

∑d i=1

 (𝐲̃ i − 𝐬i − 𝐬; 𝜁 𝐬i , 𝜎 2 Id ) +  (𝐲̃ i + 𝐬i − 𝐬; −𝜁𝐬i , 𝜎 2 Id )

(c) Circular selection: 𝑤(𝐲̃ i ) =

.

 (𝐲̃ i − 𝐬; 𝜎 2 Id ) , qn (𝐲̃ i )

where qn (𝐲̃ i ) is in (16.16). α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

is5.m

A simulation was established which draws L = 500 samples and estimates the probability of error using MC and each of the three IS approaches described above. Each of these simulations was repeated 10 times so that the variance of PE could be estimated. The ratio of these variances, as well as the IS gain 𝛾, is computed for each of the IS methods. In the simulation, various values of 𝜁 can be tried. A value of 𝜁 = 0.5 is generally selected as producing the highest variance ratio. A sample of results is shown in Table 16.2 for d = 2 dimensions and n = 1. The PE estimates are taken from the first of the 10 estimates produced. As the table shows, the MC estimates of the probability of error are far from the true values (because of the small number of iterations), while the IS methods are better. The IS1 method (bias in one direction only) did not fare as well as the method that use the Gauss mixture or the spherical model. The Gauss mixture model appeared to have lower variance, and hence better variance ratios. ◽ Example 16.26 (See, for example, [474].) Let  be an (N, K) linear code with minimum distance dmin . Assume the coded bits ci are mapped via BPSK modulation to signal components si = 2ci − 1 ∈ {−1, 1}. We will call the n-dimensional symbols obtained from code words as the coded symbols. The all-zero codeword thus maps to the coded symbol signal 𝐬0 = (−1, −1, . . . , −1). The squared minimum Euclidean distance between modulated codewords is 4dmin . Let 𝐲 = (y1 , y2 , . . . , yn ) be a vector in the received signal space. If 𝐲 = 𝐬 = (s1 , s2 , . . . , sN ) is the vector corresponding to a coded symbol, then 𝐲 lies on the N-dimensional hypersphere S1 centered at the origin satisfying y21 + y22 + · · · + y2N = N.

(16.17)

The points 𝐲 corresponding to codewords of minimum Hamming distance from the coded symbol 𝐬0 lie on the N-dimensional hypersphere S0 centered at 𝐬0 satisfying (y1 + 1)2 + (y2 + 1)2 + · · · + (yN + 1)2 = 4dmin .

(16.18)

Coded symbols closest to 𝐬0 lie on the intersection of S0 and S1 , which is an (N − 1)-dimensional hypersphere, as suggested by Figure 16.21(a), designated as S0∩1 . Figure 16.21(b) illustrates the same concept from a side view. The idea for IS is that points lying on the sphere S0∩1 are reasonable points to use as translated means in an IS setting. Since those points may not in general be known, an effective approach to IS may be to use the hypersphere sample idea above to select points at random in S0∩1 . Expanding (16.17) and (16.18) and subtracting yields the equation y1 + y2 + · · · + yN = 2dmin − N.

(16.19)

Equation (16.19) describes the (hyper)plane upon which the intersection S0∩1 lies. Let 𝟏 = (1, 1, . . . , 1)T ∈ ℝN . Equation (16.19) can be written as 𝟏T (𝐲 − 𝐩) = 0,

16.11 Importance Sampling

763

Table 16.2: MC and IS Comparison for Digital Communication for Various IS Mean Translation Schemes 𝜎2 0.02 0.05

𝜁 0.5 0.5

PE (true) 8.1 × 10−4 0.05

PE (MC) 0.002 0.036

PE (IS1) 1.9 × 10−4 0.036

PE (IS2) 8.5 × 10−4 0.047

PE (IS3) 8.3 × 10−4 0.047

Var (MC) 1.9 × 10−6 1.7 × 10−4

Var (IS1) 3.1 × 10−10 5.5 × 10−4

Var (IS2) 2.7 × 10−9 5.2 × 10−6

Var (IS3) 8.6 × 10−9 1.9 × 10−6

Var (MC)/ Var (IS1) 6 0.3

Var (MC)/ Var (IS2) 730 33

Var (MC)/ Var (IS3) 226 88

𝛾 (IS1) 1200 0.13

𝛾 (IS2) 328 7

𝛾 (IS3) 160 7

764

Low-Density Parity-Check Codes: Designs and Variations S1

z3

S0 s0

z1 p

p

p

z2

s0 S0∩1 (a)

(b)

(c)

Figure 16.21: Geometry of coded symbol points. (a) The sphere of codewords around 0, and the sphere of codewords around 𝐬0 ; (b) side view of two spheres; (c) plane in ℝN−1 on which intersecting hypersphere lies. where

⎤ ⎡ 2dmin ⎢ N − 1⎥ ⎥ ⎢ ⎢ 2dmin − 1⎥ ⎥. 𝐩=⎢ N ⎥ ⎢ ⋮ ⎥ ⎢ ⎥ ⎢ 2d min ⎢ − 1⎥ ⎦ ⎣ N

⎡ y1 ⎤ ⎢y ⎥ 𝐲 = ⎢ 2⎥ ⋮ ⎢ ⎥ ⎣yN ⎦

𝟏 is the normal to the plane, and 𝐩 is a point on the plane from which the normal vector emerges. Let 𝐲̃ = 𝐲 − 𝐩 represent points on the plane, with 𝐩 as the origin. Further algebraic manipulation of (16.17) and (16.18) shows that ( ( ( ))2 ))2 ) ( ( 2dmin 2dmin d + · · · + yN − = 4dmin 1 − min . −1 −1 (16.20) y1 − N N N ( ) d . This is an equation for the (hyper)sphere S0∩1 centered at 𝐩 = (2dmin ∕N − 1)𝟏 with squared radius 4dmin 1 − min N To use this geometry for IS, randomly selected points on the hypersphere S0∩1 are used as the location to which the means are shifted. This is done as follows. 1. An orthogonal coordinate system with axes z1 , z2 , . . . , zN is established, where the z1 axis lies in the (1, 1, . . . , 1) direction, and the z2 , . . . , zN directions lie on the plane (16.19). This may be done as follows. In N = 3 dimensions form a matrix ⎡1 1 0⎤ T = ⎢1 0 1⎥ . ⎢ ⎥ ⎣1 0 0⎦ This generalizes in obvious ways to larger values of N. Do Gram–Schmidt processes to obtain orthonormalized columns √ ⎤ ⎡1∕ N q 12 q13 ⎥ ⎢ √ Q = ⎢1∕ N q22 q23 ⎥ . √ ⎥ ⎢1∕ N q ⎣ 32 q33 ⎦ For a point 𝐲̃ on the plane centered at 𝐩, let ⎡z1 ⎤ 𝐲̃ = Q ⎢z2 ⎥ ⎢ ⎥ ⎣z3 ⎦

16.11 Importance Sampling

765

or ̃ 𝐳 = QT 𝐲. For example, in N = 3 dimensions, √ √ √ ⎡z1 ⎤ ⎡1∕ N 1∕ N 1∕ N ⎤ ⎡ỹ 1 ⎤ T ⎢z ⎥ = ⎢ q q22 q23 ⎥ ⎢ỹ 2 ⎥ = Q 𝐫̃ . ⎢ 2 ⎥ ⎢ 21 ⎥⎢ ⎥ z y ̃ q32 q33 ⎦ ⎣ 3 ⎦ ⎣ 3 ⎦ ⎣ q31 ⎡1⎤ The vector 𝐳 = ⎢0⎥ (in z coordinates) corresponds to the vector √1 𝟏 (in ỹ coordinates). The vectors N ⎢ ⎥ ⎣0⎦ ⎡0⎤ ⎡0⎤ 𝐳 = ⎢1⎥ and 𝐳 = ⎢0⎥ corresponds to vectors which are orthogonal. ⎢ ⎥ ⎢ ⎥ ⎣0⎦ ⎣1⎦ Thus, the variables z2 , z3 , . . . , zN are coordinates on the plane. √ d , generate points (z2 , z3 , . . . , zN ) on the hypersphere (16.20) at random 2. For a radius 2dmin 1 − min N (z2 , z3 , . . . , zN ) ∼  (0, IN−1 ) √ d (z2 , z3 ) ← 2dmin 1 − min (z2 , z3 )∕ ∥(z2 , z3 )∥ . n 3. Transform this back to 𝐲̃ coordinates. 𝐲̃ = Q𝐳 and then to 𝐲 coordinates, 𝐲 = 𝐲̃ + 𝐩. 4. Add noise on to this: 𝐧̃ = 𝐲 +  (0, K∕n), where K = 𝜎 2 In . 5. The received signal is ̃ 𝐲̃ = 𝐬0 + 𝐧. The sample distribution to be used in computing the weight function for IS is ) ( N √ N 1 1 ∑ 2 exp(−nr) n exp − n (z ) qn (z1 , . . . , zN ) = 2𝜎 2 i=1 i (2𝜋)N∕2 det (K)1∕2 ) ( √ √ ∑N 2 √ (N−3)∕2 I 2r z ∕𝜎 n N−1 (N−3)∕2 i=2 i Γ( 2 ) 2 × ( )(N−3)∕2 , √ (N−3)∕2 √ ∑N 2 r n∕𝜎 i=2 zi where K = 𝜎 2 IN , r =

16.11.3

1 2 𝜌, 2𝜎 2

√ and 𝜌 is near the size of the radius 2 dmin (1 − dmin ∕N).



Importance Sampling for Tanner Trees

In this section, another set of points to use for mean translation is discussed that provides an approach suitable for codes defined on Tanner trees. The work begins with single parity-check codes, then combines these together for more general codes. See [493].

766

Low-Density Parity-Check Codes: Designs and Variations 16.11.3.1

Single Parity-Check Codes

Consider a single (N, N − 1) parity-check code with codewords satisfying c1 ⊕ c2 ⊕ · · · ⊕ cN = 0. By symmetry, we will study the situation when the all-zero codeword is sent. In this section, this is mapped to the symbol 𝐬0 = (s1 , s2 , . . . , sN ) = (1, 1, . . . , 1). Let yn = sn + 𝜈n , where 𝜈n is AWGN with variance 𝜎 2 . By symmetry, decoding at variable node 1 is considered. Decoding can be accomplished using one iteration of a message-passing algorithm using( the tanh rule. Let)𝓁n = Lc yn , where Lc is the ∏N channel parameter 2∕𝜎 2 . Let M(𝓁2 , . . . , 𝓁N ) = 2tanh−1 n=2 tanh(𝓁n ∕2) . The message at variable node 1 is computed as m1 = 𝓁1 + M(𝓁2 , . . . , 𝓁N ). Having computed m1 , a decision is made as

{ 0 ĉ 1 = 1

m1 > 0 m1 < 0.

Points (y1 , y2 , . . . , yN ) such that 𝓁1 + M(𝓁2 , . . . , 𝓁N ) < 0 form an error set E, and points such that 𝓁1 + M(𝓁2 , . . . , 𝓁N ) = 0 form the decision boundary, denoted as 𝜕E. Points lying below the boundary decode to 1 (error); points lying above the boundary decode to 0 (correct). Figure 16.22 shows this decision boundary for N = 3. The boundary forms a saddle surface consisting of nearly planar surfaces delimited by the planes y2 ± y3 = 0. This nearly planar structure follows since, as discussed in Section 15.4.3.1, N ∏ sign(𝓁n )min(|𝓁2 |, . . . , |𝓁N |). M(𝓁2 , . . . , 𝓁N ) ≈ n=2

Of particular interest are the two surfaces closest to the transmitted point (1, 1, 1), denoted as 𝜕E(2) and 𝜕E(3) , where 𝜕E(2) = 𝜕E ∩ {|y2 | ≤ y3 } and E(3) = 𝜕E ∩ {|y3 | ≤ y2 }. The boundary 𝜕E(2) can be approximated by points such that 𝓁1 + sign(𝓁2 )sign(𝓁3 )|𝓁2 | = 𝓁1 + 𝓁2 = 0. The + sign appears because sign(𝓁3 ) = 1 on 𝜕E(2) . Similarly the boundary 𝜕E(2) can be approximated by 𝓁1 + 𝓁3 = 0. The error region E is partitioned into two regions E(2) = E ∩ {y2 ≤ y3 } and

E(3) = E ∩ {y3 ≤ y2 },

suggested by the shading in Figure 16.22. Let PE(2) and PE(3) be the error probabilities in the two regions E(2) and E(3) , PE(2) = P((y1 , y2 , y2 ) ∈ E and y2 ≥ y3 ) PE(3) = P((y1 , y2 , y2 ) ∈ E and y2 ≤ y3 ) and the overall probability of error is PE = PE(2) + PE(3) . By symmetry, only PE(2) needs to be computed, and PE = 2PE(2) .

(16.21)

16.11 Importance Sampling

767 1

y1

0.5 0 b2

∂E(2)

∂E

(3)

−0.5 b3

−1 1

0 y2 −1 −1

−0.5

y3

0

0.5

1

Figure 16.22: Decision boundary for (3, 2) single parity check coding (𝜎 = 0.1). Squares denote codewords. For IS, the points on the two planar surfaces closest to the transmitted point (1, 1, 1) are used as the dominant points, where 𝐛2 = (0, 0, 1) ∈ 𝜕E(2)

and

𝐛3 = (0, 1, 0) ∈ E(3) .

That is, the mean is shifted in the IS simulation so that instead of transmitting 𝐬0 either the point 𝐛2 or the point 𝐛3 is transmitted. (That is, the mean is shifted by 𝐛2 − 𝐬0 .) The densities in the IS weight function are ̃ ∼  (𝐲; ̃ 𝐬0 ; 𝜎 2 I) f (𝐲) so that ̃ = 𝑤(𝐲)

̃ ∼  (𝐲; ̃ 𝐛2 ; 𝜎2 I) f̃ (𝐲)

] [ ] [ ∥𝐲̃ − 𝐬0 ∥2 − ∥𝐲̃ − 𝐛2 ∥2 ̃ f (𝐲) 1 (̃ y + y ̃ − 1) . = exp = exp − 1 2 2𝜎 2 𝜎2 ̃ f̃ (𝐲)

(16.22)

The IS estimate of the probability of error is as follows. Select a number of iterations L. For each iteration i, draw 𝐲̃ i = 𝐛2 + 𝐧, where 𝐧 ∼  (0, 𝜎 2 I). Then compute the log-likelihood ratios 𝓁 = 𝐲̃ Lc , where Lc = 2∕𝜎 2 . Compute m1 = 𝓁1 + M(𝓁2 , . . . , 𝓁N ). If m1 is < 0 (decoding error) and ỹ 2 < ỹ 3 (so the error falls in E(2) ), accumulate the weighted error. The error estimate P̂ E(2) is thus L 1∑ P̂ E(2) = 1 (2) (𝐲̃ i )𝑤(𝐲̃ i ). L i=1 E Then by (16.21),

P̂ E = 2P̂ E(2) .

From Figure √ 16.22, it can be seen that the distance from the codeword (1, 1, . . . , 1) to the dominant point 𝐛2 is 2. By the union bound, an upper bound on the probability of error can thus be obtained as (√ ) 2 PE ≤ (N − 1)Q . (16.23) 𝜎 IS can be generalized to single parity checks in N dimensions. The error region E is defined as E = {𝐲 ∈ ℝN ∶ 𝓁1 + M(𝓁2 , . . . , 𝓁N ) < 0}.

768

Low-Density Parity-Check Codes: Designs and Variations Table 16.3: IS Simulation Results for Single Parity-Check Codes of Length N. (See [493]) P̂ b (MC) P̂ b (IS) P̂ b (bound) N Eb ∕N0 (dB) 𝜎 −7 −7 3 3 0.2739 2.2 × 10 2.41 × 10 2.41 × 10−7 −60 3 20 0.0866 — 7.3 × 10 6.04 × 10−60 20 10 0.2294 — 6.30 × 10−9 6.72 × 10−9 −83 20 20 0.0725 — 1.22 × 10 1.18 × 10−83

This is partitioned into disjoint regions E(k) = E ∩ {𝐲|𝓁k ≤ 𝓁i ∀i = 2, . . . , N}, k = 2, . . . , N. The bias points for each of these regions is given by 𝐛k = (0, 1, 1, 0k th , 1, . . . , −1),

k = 2, 3, . . . , N

and the importance weight function is the same as (16.22). By symmetry, it is still sufficient to estimate only P̂ E(2) . The overall probability of error estimate is then P̂ E = (N − 1)P̂ E(2) ,

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

and the error bound is given by (16.23). Some results of applying this to single parity-check codes of length N = 3 and N = 20 are shown in Table 16.3. The MC result (conventional simulation) is obtained using 108 codewords simulated. MC results are computed only for the N = 3 code at Eb ∕N0 = 10 dB, since the probabilities of error are far too small in the other cases in the table. The IS results were obtained with 10,000 codewords at N = 3 and Eb ∕N0 = 10 dB, and 2000 iterations otherwise. (There was, of course, some variance in the IS results, so different results are obtained with every simulation run. At longer lengths and higher SNR, the MC methods tend to overestimate the probability of bit error, dominated by a small number of received vectors with large weight.)

spcxiaryan1.m

16.11.3.2

Symmetric Tanner Trees

The methods presented above are now extended to symmetric Tanner trees, which are trees in which all nodes on the same layer have identical connection degrees. The notation (d𝑣 , dc , 𝓁) is used, where 𝓁 is the number of layers (with root at layer 0), dc is the check node degree, and d𝑣 is the variable node degree, except that root has degree d𝑣 − 1 and the leaves have degree 1. Figure 16.23(a) shows 1

L0

L1

2

4

3

L2

1

L0 5

L1

2

4

3

5

L2 6 7 8 9

6 7 8 9 (a)

(b)

Figure 16.23: A (3, 2, 2) tree and labeling for a particular boundary. (a) Tree; (b) indication of variable nodes for the 𝜕E(2,4,6,8,14,16) boundary.

16.11 Importance Sampling

769

a (3, 3, 2) degree. Trees such as this can be obtained by “unfolding” a bipartite graph representing a code, then eliminating nodes duplicated by this unfolding process. At node 𝑣1 the decision boundary is described by 𝓁1 + M(𝓁2 , 𝓁3 ) + M(𝓁4 , 𝓁5 ) = 0. Define a boundary set 𝜕E(2,4) = 𝜕E ∩ {|𝓁2 | ≤ 𝓁3 } ∩ {|𝓁4 | ≤ 𝓁5 }. As for the single parity-check code case this set is approximated for large SNR by the plane 𝓁1 + 𝓁2 + 𝓁4 = 0. The dominant point associated with this boundary is 𝐛(2,4) = (0, 0, 1, 0, 1). The error subset corresponding to 𝐛(2,4) is E(2,4) = E ∩ {𝓁2 ≤ 𝓁3 } ∩ {𝓁4 ≤ 𝓁5 }. Observe that by symmetry, other error regions that could have been chosen are E(2,5) , E(3,4) , E(3,5) , each yielding the same performance. The selected subset E(2,4) thus has a multiplicity of 4. In actual decoding, the channel messages 𝓁2 through 𝓁5 are replaced by decoded messages passed up from the layer below, which we denote as 𝓁̂2 through 𝓁̂5 , where 𝓁̂2 = 𝓁2 + M(𝓁6 , 𝓁7 )

𝓁̂4 = 𝓁4 + M(𝓁14 , 𝓁17 ).

Thinking in terms of the variables used in each of these tanh rules, for sufficiently large SNR, there is a linear approximation to a decision boundary given by 𝓁1 + 𝓁2 + 𝓁4 + 𝓁6 + 𝓁8 + 𝓁14 + 𝓁16 = 0,

(16.24)

which corresponds to a boundary region 𝜕E(2,4,6,8,14,16) = 𝜕EP (2, 4) ∩ {|𝓁6 | ≤ 𝓁7 } ∩ {|𝓁8 | ≤ 𝓁9 } ∩ {|𝓁14 | ≤ 𝓁15 } ∩ {|𝓁16 | ≤ 𝓁17 } (The subset E(2,4,6,8,14,16) has a multiplicity of 16.) The dominant point associated with the boundary 𝜕E(2,4,6,8,14,16) is denoted as 𝐛(2,4,6,8,14,16) in which bk is 0 if 𝓁k appears in (16.24), and is 1 otherwise, for k = 1, . . . , 21. The nodes with bk = 0 are said to be biased. The biased variable nodes associated with this boundary are indicated in Figure 16.23(b) with shading. For brevity in the discussion below, 𝜕E(2,4,6,8,14,16) will be denoted simply as E′ . In general, to do IS simulation for a (d𝑣 , dc , l) tree: 1. Find the set of biased (colored) variable nodes. (a) Shade variable node 1 (the root of the tree). (b) For each shaded variable node in the current layer, locate all check nodes connected from below, and for each of these check nodes, shade the left variable node in the next layer. (c) Move down to the next layer, and repeat step (b) until the last layer is reached. (d) Form a bias vector with 0 in the biased variable positions. The number of bias (shaded) variables is nb = 1 + (d𝑣 − 1) + (d𝑣 − 1)2 + · · · + (d𝑣 − 1)l . The number of symmetric error subsets is NE = (dc − 1)nb −1 .

770

Low-Density Parity-Check Codes: Designs and Variations Table 16.4: Eb ∕N0 (dB) 3 4 6 8 10 12

IS Simulation Results for a (3, 6, 3) Decoding Tree. 𝜖 = 10% [493] P̂ b (MC) P̂ b (MC) Pb (bound) −5 8.6 × 10 8.75 × 10−5 0.0137 −7 −7 6.9 × 10 6.76 × 10 2.5 × 10−5 — 8.75 × 10−14 4.36 × 10−13 −25 — 1.56 × 10 2.53 × 10−25 −44 — 1.07 × 10 1.168 × 10−44 — 2.94 × 10−75 3.02 × 10−75

L̂ 𝜖 105 27,000 5300 2300 2000 2400

The error region E′ is given by E′ = ∩𝑣≠𝑣1 {𝓁̂𝑣 ≤ 𝓁̂𝑣′ , ∀𝑣′ ∈ C(𝑣)} ∩ E, where C(𝑣) is the set of variable nodes having the same parent as 𝑣. For example, in Figure 16.23, C(𝑣2 ) = {𝑣2 , 𝑣3 }. With these preliminaries, P̂ E′ can be obtained from an IS simulation for E′ , and the error probability is computed from P̂ E = NE P̂ E′ . √ Since the distance from 𝐬0 to E′ is nb , the probability of error can also be bounded, as in the single parity case, by (√ ) nb . (16.25) PE ≤ Ne Q 𝜎 Table 16.4 (copied from [493]) illustrates performance for a (3, 6, 3) code. The column P̂ b (MC) shows the results for conventional probability of error simulations, which can only be computed for relatively high bit error rates. The column Pb (bound) shows the bound computed using (16.25). For higher SNRs, this provides a good estimate of the error. Also shown is the estimated number of IS codewords to be generated to estimate the probability of error within 𝜖 = 10% error. 16.11.3.3

16.11.3.3 General Trees

As the previous subsections illustrated, if a tree is symmetric then the error probability can be estimated using only one error subset, since all error subsets have equivalent error performance by symmetry. While general trees do not have the same global symmetries, local symmetries may be used to reduce complexity. Consider the tree shown in Figure 16.24(a), with a subset E(2,4) represented by the shaded variable nodes. Equivalent subsets correspond to E(2,5), E(3,4), and E(3,5), so E(2,4) has multiplicity 4, since nodes 2 and 3 and nodes 4 and 5 can be swapped to produce isomorphic trees.


Figure 16.24: Trees which are not symmetric. (a) Variable nodes 2 and 3 and nodes 4 and 5 isomorphically interchangeable; (b) variable nodes 2 and 3 isomorphically interchangeable.


Selecting E(2,4) as the representative, it can be further partitioned into E(2,4,6) and E(2,4,7), each having multiplicity 8. By contrast, in Figure 16.24(b), a subtree has been added to variable node 4, so that 4 and 5 cannot be isomorphically interchanged. In this case, the four subsets E(2,4), E(2,5), E(3,4), and E(3,5) are not symmetric. However, the sets E(2,4) and E(2,5) may both be selected, each having multiplicity 2. These can be further partitioned into subsets E(2,4,6,10) and E(2,5,6), having multiplicities eight and four, respectively. In general, as this example illustrates, the procedure for dealing with an arbitrary tree is to identify isomorphisms among variable nodes connected above to the same check node, then determine representative subsets with their multiplicities. This proceeds from the top layer down to the bottom layer.

16.11.3.4 Importance Sampling for LDPC Codes

To apply these ideas to an LDPC code (or other linear block code), the Tanner graph is unfolded into a tree, continuing the unfolding process until all variable and check nodes have been considered. Figure 16.25(a) illustrates a Tanner graph for a (9, 3) code which (since one of the check equations is redundant) represents a rate 4/9 code. The initial unfolding of this Tanner graph using 𝑣1 as the root is the tree shown in Figure 16.25(b). Because the code is finite, as the unfolding proceeds, some of the variable nodes are duplicated. In this example, nodes 𝑣2, 𝑣3, 𝑣6, and 𝑣8 are duplicated. These duplicated nodes are an indication of cycles in the Tanner graph. Since the IS technique developed here is based on analysis on trees, it does not take the cycles into account. To apply the analysis, the duplicates must be removed, which results in a performance underestimate (that is, it estimates worse performance than actually occurs). When a duplicated variable node is removed from a tree, the check nodes connected to it must also be removed. The tree pruned based on these principles, in which four duplicated variable nodes on the right and their connected check nodes are removed, is shown in Figure 16.25(c). This tree happens to be identical to the tree in Figure 16.24. The probability of error can be estimated to be 8PE(2,4,6), where PE(2,4,6) is the probability for the error subset E(2,4,6), which has multiplicity 8. For high SNR, the performance can be bounded as PE ≤ 8Q(√4 / 𝜎), where the 4 is the number of biased variable nodes.


Figure 16.25: Tanner graph for (9, 3) regular code, and its unfolding into a tree. (a) Tanner graph for the code; (b) unfolded tree; (c) pruned unfolded tree.

While this procedure was fairly straightforward for this short example code, complications arise for longer codes. Longer codes have more ways that structural complexity can occur in the tree. Also, the performance is underestimated due to removal of cycles. Despite this, the IS method can be used to obtain effective upper bounds on performance.

16.12 Fountain Codes

A digital fountain is an analogy drawn from a sprinkling water fountain. When you fill a cup from a water fountain, you do not care which particular drops of water fall in, but only that the cup fills with


enough water. With a digital fountain, data packets may be obtained from one or more servers, and once enough packets are obtained (from whatever source) to reconstruct the original file, transmission can end. Which particular packets are obtained does not matter [312]. In an ideal case that is not quite attained in practice, for a message that would require receipt of k packets of data, a digital fountain receiver can receive any k packets and be able to quickly reconstruct the desired message. In practice, some overhead is incurred, so that it is necessary to receive (1 + 𝜖)k packets. A fountain code is a means of representing the data in a way that provides the capabilities of a digital fountain. A digital fountain can be used, for example, to create efficient multicast, in which a number of users desire to download the same file at roughly the same time. (Think of streaming a newly released movie, or a software update.) If packets are sent uncoordinated (not in a fountain sense) and packets are subject to loss, when users request retransmission (as in an ARQ protocol), there could be an explosion of feedback. Using fountain-coded data can prevent the need for retransmission: each user simply collects enough packets to reconstruct its data. There are several ways of implementing digital fountains. They are not all, strictly speaking, LDPC codes, but many of the ideas are related to Tanner graphs.

16.12.1 Conventional Erasure Correction Codes

Erasure correction codes, such as Reed–Solomon codes, can be used to implement a digital fountain. A code of minimum distance d can recover up to d − 1 erasures. The RS codes are MDS (see Section 6.2.4). Encoding/decoding is conceptually straightforward, but can incur undesirable computational complexity. The decode (and encode) complexity is O(k²), which may be too much for moderate to long values of k. The codes below trade off some rate (they are not minimum-distance separable) for lower computational complexity.

16.12.2 Tornado Codes

Tornado codes are based on Tanner-like graphs. A tornado code encoder is a cascade of m + 1 rate 1/(1 + 𝛽) LDPC encoders, followed by a rate 1 − 𝛽 erasure correcting code (such as a Reed–Solomon code). Let B0 be the bipartite graph shown in Figure 16.26(a). There are K message bits on the left and 𝛽K, 0 < 𝛽 < 1, check bits on the right. While this looks like a Tanner graph for an LDPC code, it operates somewhat differently. Associated with this graph is a code C(B), in which each check bit on the right is computed simply as the modulo-2 (⊕) sum of its neighboring bits on the left. For this stage there are K bits in, and K + 𝛽K = K(1 + 𝛽) bits out, with the message bits appearing systematically, to produce the codeword (x1, x2, . . . , xK, c1, . . . , c𝛽K). If there is an erasure on a message bit, it is recovered using the code structure suggested by Figure 16.26(b). Knowing the other message bits and the check bits, a bit can be recovered, as for bit x3 in the figure. In general, decoding of message bits can be accomplished using an algorithm analogous to the peeling algorithm of Section 15.4.9. To protect the check bits, an additional layer of check bits can be added, as shown in Figure 16.26(c). Decoding of bits not in the last check layer proceeds from right to left. In general, construct a family of codes C(B0), C(B1), . . . , C(Bm) from a family of graphs B0, . . . , Bm, where Bi has 𝛽^i K left nodes and 𝛽^(i+1) K right nodes. The sequence of coded bits terminates with an erasure correcting code C of rate 1 − 𝛽, with 𝛽^(m+1) K message bits input and K𝛽^(m+2)/(1 − 𝛽) check bits, for which it is known how to recover from the random loss of a fraction of up to 𝛽 of its bits with high probability. (This final erasure code may be, for example, a Reed–Solomon code.) This is portrayed in Figure 16.26(d). The overall code C(B0, B1, . . . , Bm, C) has K message bits and

∑_{i=1}^{m+1} 𝛽^i K + 𝛽^(m+2) K/(1 − 𝛽) = K𝛽/(1 − 𝛽)



Figure 16.26: Graph defining Tornado codes. (a) One stage of code graph; (b) decoding an erased value; (c) two stages of check bits; (d) overall Tornado code with m + 1 stages.

check bits. The overall rate is

R = K / (K + K𝛽/(1 − 𝛽)) = 1 − 𝛽.
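To make the single-stage operation suggested by Figure 16.26(a) and (b) concrete, the following minimal Python sketch (an illustration, not code from the text) computes one layer of check bits as modulo-2 sums over a small, hypothetical bipartite graph and then recovers erased message bits by peeling: any check all of whose message neighbors except one are known determines the remaining one.

    # One stage of a tornado-like code over a small, hypothetical bipartite graph.
    # checks[j] lists the message-bit indices summed (mod 2) into check bit j.
    checks = [[0, 1, 2], [1, 3], [0, 2, 3]]

    def encode(msg):
        # Each check bit is the XOR of its message-bit neighbors.
        return [sum(msg[i] for i in nbrs) % 2 for nbrs in checks]

    def peel(received, check_bits):
        # received[i] is the message bit value, or None if erased.
        msg = list(received)
        changed = True
        while changed:
            changed = False
            for nbrs, c in zip(checks, check_bits):
                unknown = [i for i in nbrs if msg[i] is None]
                if len(unknown) == 1:          # exactly one erased neighbor: solve for it
                    i = unknown[0]
                    known_sum = sum(msg[k] for k in nbrs if k != i) % 2
                    msg[i] = (c - known_sum) % 2
                    changed = True
        return msg

    msg = [1, 0, 1, 1]
    c = encode(msg)                  # check bits
    rx = [1, None, 1, None]          # message bits with two erasures
    print(peel(rx, c))               # recovers [1, 0, 1, 1]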

Encoding proceeds left to right, computing checks, then checks on checks, and so on, then through the erasure correcting code. Decoding proceeds right to left, first decoding using the erasure correcting code to correct any missing check bits at the last stage, then propagating back through the layers of checks until the message bits are decoded. The key to an effective Tornado code is to select degree distributions for each graph B0, . . . , Bm that make it possible to correct a fraction 𝛽(1 − 𝜖) of erasures with high probability. This is described


in [282]. The connections between the left and right nodes of each graph are chosen sparsely, so that the encode and decode complexity is linear in the size of the code. The end result is that this construction is able to recover a message from a random set of K(1 + 𝜖) coordinate locations (codeword bits) with high probability, and with a decode complexity that is linear in the number of message bits.

16.12.3 Luby Transform Codes

Luby transform (LT) codes are rateless, which means practically that they more nearly achieve the idea of collecting drops of water (packets) until the cup (the message) is full. “The decoder can recover an exact copy of the transmitted data from any set of the generated encoding symbols that are in aggregate only slightly longer in length than the original data. No matter what the loss is on the erasure channel, encoding symbols can be generated as needed and sent over the erasure channel until a sufficient number have arrived at the decoder in order to recover the data” [281]. There is a degree distribution associated with the message nodes which ensures the properties described above. The encoding process for LT codes is as follows. To generate an encoding symbol:

• Randomly choose the degree d of the encoding symbol from a degree distribution.
• Choose d random distinct input symbols.
• The parity packet is the modulo-2 sum of the d neighbors.
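The following minimal Python sketch (an illustration, not code from the text) generates LT encoding symbols according to these three steps. The particular degree distribution used here is a simple placeholder, not the robust soliton distribution designed in [281].

    import random

    def lt_encode_symbol(data, degree_dist, rng):
        # data: list of input symbols (here, small integers standing in for bit blocks)
        # degree_dist: list of (degree, probability) pairs; a placeholder distribution
        r, acc = rng.random(), 0.0
        degree = degree_dist[-1][0]
        for d, p in degree_dist:
            acc += p
            if r <= acc:
                degree = d
                break
        neighbors = rng.sample(range(len(data)), degree)   # d distinct input symbols
        value = 0
        for i in neighbors:
            value ^= data[i]                                # modulo-2 (XOR) sum
        return neighbors, value

    rng = random.Random(12345)   # a shared seed lets the decoder rebuild the neighbor sets
    data = [0b1011, 0b0110, 0b1110, 0b0001]
    dist = [(1, 0.2), (2, 0.5), (3, 0.3)]                   # placeholder degree distribution
    for _ in range(6):
        print(lt_encode_symbol(data, dist, rng))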

To do the decoding, the decoder needs to know the degree and set of neighbors of each encoding symbol. This might be accomplished, for example, by employing the same random number generator initialized with the same seed at both the encoder and the decoder. Design of degree distributions is described in [281].

16.12.4 Raptor Codes

Raptor codes [410] introduce some pre-coding to the LT code which reduces the encode–decode complexity. The basic idea is as follows:

• K input symbols are encoded by an (N, K) linear block code 𝒞.
• These N symbols are then encoded with an LT code, having a degree distribution represented by Ω(x) on the N symbols.

The raptor code is characterized by the parameters (K, 𝒞, Ω(x)). Raptor codes relax the LT code requirement that all K input symbols are recoverable, requiring that only a constant fraction of the input symbols be recoverable.

16.13 References

RA codes were introduced in [94]. IRA codes were introduced in [223]. See also [373]. IRA codes having structure to reduce hardware complexity, such as QC structure in Hm, have been investigated; see, e.g., [433, 506, 508]. IRAA codes are described in [503], where it is shown that the error floor may be lower while the waterfall region is worse. A variation on multiple accumulator codes is the family of accumulate-repeat-accumulate codes [3, 4]. These encoders add an accumulator to a subset of the input bits. These accumulated bits (or some puncturing of them) are then passed into the usual permuter/accumulator stage. Encoding hardware of QC codes is discussed in [267], which also discusses how to find the generator from a QC parity check matrix and cites literature on QC LDPC codes. More on QC LDPC codes can be found in [63, 64, 258, 259]. Discussion of matrix dispersions was drawn from [386], which, in turn, draws from [66, ?].

EG-LDPC codes can be traced back to the 1960s [107, 384, 385]. More recently, see [254]. For a discussion of polynomial theory related to these computations, see [246]. PEG was presented in [203]. The improvement to produce larger ACE was proposed in [494]. An alternative to PEG for creating Tanner graphs is a trellis-based method for removing short cycles from a Tanner graph, presented in [256]. There are many other algorithms for LDPC design. For example, a superposition approach has been investigated which replaces each element of a matrix satisfying the RC constraint with other matrices (analogous to replacing elements of a GF(q) matrix with a matrix dispersion). These approaches are described in [495, 496]. A family of approaches is based on combinatorial objects known as combinatorial designs [10, 232, 257]. Several LDPC constructions have been reported based on combinatorial objects. Constructions based on Kirkman triple systems are reported in [233, 234], which produce (3, k)-regular LDPC codes whose Tanner graph is free of four-cycles for any k. Constructions based on Latin rectangles are reported in [464], with reportedly low encode and decode complexity. Designs based on Steiner 2-designs are reported in [463], [236], and [388]. See also [465]. High-rate LDPC codes based on constructions from unital designs are reported in [235]. Constructions based on disjoint difference sets permutation matrices for use in conjunction with the magnetic recording channel are reported in [417]. LDPCCC are presented in [118, 355]. Protograph designs were introduced in [370, 441] and were further developed in [96] and [95]. Discussions of efficient decoder architectures appear in [237, 372]. A generalization of protograph codes is the MET codes. In these codes, groups of repeated nodes may be combined to share permutation matrices on edges emanating from them. These are discussed in [368, 377]. For more recent work, see, e.g., [217]. For some design work using MET, see [198]. Discussion of error floors follows [386, Section 15.3] and [183]. Because of the importance of low error floors, there has been a lot of research studying the problem and seeking code designs that mitigate the effect. [369] describes some experimental ways of looking for error floors and some analytic tools for estimating the effect of error floors. Because the error floor phenomenon is associated with low error rates, it is difficult to evaluate using software decoders, so decoding hardware is used for some of these studies [504, 507]. Design studies are in [503]. Analysis of trapping sets is in [308]. There has also been some work on modified decoding algorithms to avoid the problem of trapping sets [186]. An algorithm for approximately identifying trapping sets is given in [199]. Another analysis technique is based on absorbing sets, which are combinatorial and hence do not depend on the particular decoder [101]. These have been used with IS to obtain good probability of error estimates in the error floor region. Importance sampling in communication is surveyed generally in [213]. The excellent survey paper [414] provides intuitive explanations and a large number of references, placing the mean translation methods described here in context with other methods such as variance scaling. [280] is “improved” using mean translation techniques. [51] describes efficient estimators using the Gaussian mixture and spherical bias idea. Using this spherical idea for block codes is described in [474].
Another mean translation idea for block codes is described in [492], where it is applied to single parity codes then generalized to more general linear block codes. See also [419], [387]. The paper [75] goes into some depth on using trapping sets as dominant points for IS. This may be the recommended method for IS simulation of LDPC codes. Tornado codes are detailed in [282], including how to design the degree distributions for each bipartite graph to achieve high decoding probability.


Part V

Polar Codes

Chapter 17

Polar Codes

17.1 Introduction and Preview

Polar codes, invented by Erdal Arıkan, are the first family of codes proven to achieve Shannon channel capacity. The codes have low encoding and decoding complexity: for a code of length N, both the encoding and the decoding complexity is O(N log N). Unlike many modern codes such as turbo codes or LDPC codes, decoding is noniterative. There are explicit code design algorithms and there is great flexibility in selection of the rate for virtually any rate below the channel capacity. As an introduction, Figure 17.1 highlights some aspects of the probability of error performance of polar codes. In Figure 17.1(a), frame error rates and bit error rates for a rate R = 1/2 code of a “moderate” length of N = 2048 are shown. The red line shows the performance of the basic “successive cancellation” (SC) decoder, the O(N log N) decoder originally proposed. This is described in Section 17.4.7. Also shown is the result of list decoding for list sizes of L = 2, 16, and 32. List decoding provides some improvement over SC decoding, particularly at low SNRs. However, at larger SNRs, the performance is essentially the same for all list sizes. List decoding can be easily concatenated with a CRC code, producing modified polar codes. As the figure shows, the performance of the modified polar code improves with list size, reaching a frame error rate (FER) of about 10⁻⁵ at only 2 dB. The list decoding complexity, using the algorithm described in Section 17.9, is O(LN log N). Figure 17.1(b) provides some more results, comparing polar code decoding with a rate R = 0.84 (2048, 1723) LDPC code used in the 10GBASE-T standard (black line). With SC decoding of a (2048, 1723) polar code (blue line), the performance is inferior to the LDPC code. Comparable results can be obtained using a longer (32768, 27568) code. Because polar decoding is noniterative, this may be comparable to LDPC decoding in terms of complexity. Using list decoding (magenta line), even the much shorter (2048, 1723) polar code performs at the same level as the LDPC code.

We present in this introductory section the basic idea of polarization. The ideas touched on here are fleshed out in the remainder of this chapter. Let W denote a channel that accepts binary inputs. In Figure 17.2(a), the channel W is used twice, once with input u0 and a second time with input u1. (These uses may be in succession, such as sending multiple bits down a single channel.) The channel outputs are denoted y0 and y1. The mutual information between u0 and y0 is denoted I⁰ and the mutual information between u1 and y1 is denoted as I¹. Since the same channel is used for each transmission, we have I⁰ = I¹ = I(W), where I(W) is the mutual information associated with the channel W. In this case, the two uses of the channel are equally reliable. In Figure 17.2(b), we use the W twice again, but over the first W we send u0 ⊕ u1 and over the second W we send u1. Let Ic⁰ now denote the mutual information between u0 and the collective output (y0, y1) in this coded scenario and let Ic¹ denote the mutual information between u1 and the collective output (y0, y1, u0). u1 appears in (influences) both y0 and y1, so that we have a little bit more information about u1 than we do about u0. And since u1 ⊕ u0 scrambles u0 by u1, there is a little less information about u0 than in the original transmission setting. Hence, with details to be developed below, Ic⁰ < I(W) < Ic¹.
By this simple coding scheme, we have artificially created a channel (with mutual information Ic¹) which is better than the channel was without coding, while at the same time creating a channel (with mutual information Ic⁰) which is worse than it was without coding.



Figure 17.1: Error correction performance of polar codes compared with an LDPC code of the same rate. (a) Results for a (2048,1024) polar code, with list decoding (see [431]); (b) comparison of R = 0.84 polar codes with LDPC code of same rate (see [391]). (Color versions of the page can be found at wileywebpage or https://github.com/tkmoon/eccbook/colorpages).


Figure 17.2: Basic ideas of polarization. (a) Two uses of the channel W; (b) simple coding and two channel uses; (c) recursively applying the coding construction, N = 4; (d) recursively applying the coding construction, N = 8 with some bits frozen.


The coding construction can be applied recursively, as in Figure 17.2(c) which shows N = 4 (note the particular input order, which is discussed below), and in Figure 17.2(d) which shows N = 8. As this recursive construction is repeated and N becomes large, some of the bit channels become a lot better, eventually becoming so good they are nearly perfect. At the same time, some of the channels become a lot worse, becoming so bad they cannot carry any reliable information. This is what is meant by polarizing. Significantly, the proportion of the nearly perfect channels approaches the channel capacity of the channels W, which is I(W). In polar coding, to transmit at rate R = K∕N, select the K best channels to transmit information and “freeze” the remaining N − K bits to 0. In Figure 17.2(d), the frozen bits are indicated with gray, so u3 , u5 , u6 , and u7 are unfrozen “information” bits. This chapter develops the basic ideas of polarization, describes how to do encoding (both for nonsystematic and systematic representations), how to design polar codes, how to do successive cancellation decoding and simplified successive cancellation decoding, and how to do efficient list decoding and modified polar codes. The basic theorems describing how polarization happens are stated and proved. As the chapter begins, the development of the polar coding idea is deliberately leisurely, presenting major ideas first for codes of length N = 2 to build some concrete understanding, then generalizing to codes of larger length.

17.2 Notation and Channels

A sequence x0, x1, . . . , xN−1 may be written as x_0^(N−1). Note that the convention of 0-based indexing is employed throughout this chapter. (Readers interested in one-based indexing may refer to [16].) The subsequence xi, xi+1, . . . , xj may be written as x_i^j. If j < i, then x_i^j is an empty sequence. The notation x_(i,even)^j denotes the subsequence with even indices, and x_(i,odd)^j denotes the subsequence with odd indices. For example,

x_0^3 = (x0, x1, x2, x3)     x_(0,even)^3 = (x0, x2)     x_(0,odd)^3 = (x1, x3).

The notation x_¬j denotes the sequence x_0^(N−1) (the range understood by context) with the jth element removed: x_¬j = (x0, x1, . . . , xj−1, xj+1, . . . , xN−1). If 𝒜 is a subset of {0, 1, . . . , N − 1}, then u_𝒜 denotes the subsequence (ui : i ∈ 𝒜). To say that a sequence u_0^(N−1) is equal to a sequence û_0^(N−1) means that they are equal element-by-element:

u_0^(N−1) = û_0^(N−1)  ↔  ui = ûi  ∀i ∈ {0, 1, . . . , N − 1}.

Random variables are denoted with upper-case letters, such as U1 or X1. Particular outcome values of random variables are denoted by corresponding lower-case letters, such as u1 or x1. A discrete memoryless channel accepts inputs from a discrete alphabet 𝒳 and produces outputs from a discrete alphabet 𝒴. The channel is memoryless if the conditional probability distribution of the output y given some input x, p(y|x), at some time depends only on the input at that time and is conditionally independent of channel inputs or outputs at other times. A channel which is discrete and memoryless is denoted as a DMC. A channel for which 𝒳 = {0, 1} (or some other binary-valued alphabet) is referred to as a binary DMC, or B-DMC. In this chapter, work on polar codes is largely restricted to B-DMCs. Let W : 𝒳 → 𝒴 denote a B-DMC, where the input alphabet is 𝒳 = {0, 1} and 𝒴 is an output alphabet depending on the channel. For example, for a binary erasure channel (BEC), 𝒴 = {0, 1, ?}. Channel transition probabilities are denoted as W(y|x) for x ∈ 𝒳, y ∈ 𝒴. The notation W^N (with superscripted N) is used to denote N uses of (or copies of) the channel, with W^N : 𝒳^N → 𝒴^N. Due to the memoryless nature of the channels,

W^N(y_0^(N−1) | x_0^(N−1)) = ∏_{i=0}^{N−1} W(yi | xi).

Definition 17.1 A B-DMC W : 𝒳 → 𝒴 is symmetric if there exists a permutation 𝜋 : 𝒴 → 𝒴 such that 𝜋 = 𝜋⁻¹ and W(y|0) = W(𝜋(y)|1) for all y ∈ 𝒴. ◽

Examples of symmetric channels include the binary symmetric channel and the BEC. For a symmetric channel W : 𝒳 → 𝒴, with X ∈ 𝒳 and Y ∈ 𝒴, denote the mutual information I(X; Y) as I(W). For a channel with equiprobable inputs P(X = 0) = P(X = 1) = 1/2, the mutual information is (see (1.38))

I(W) = ∑_{y∈𝒴} ∑_{x∈𝒳} (1/2) W(y|x) log [ W(y|x) / ((1/2)W(y|0) + (1/2)W(y|1)) ].     (17.1)

Recall that for symmetric channels with equiprobable inputs — as in this case — this mutual information is the channel capacity (see (1.41)).

The uncoded input (random) bits will be denoted as (U0, . . . , UN−1) = U_0^(N−1), with realizations (u0, . . . , uN−1) = u_0^(N−1), which are coded using a linear code to produce the bits (x0, . . . , xN−1) = x_0^(N−1). These code bits are conveyed through the channel to produce measurements (y0, . . . , yN−1) = y_0^(N−1). The input bits are assumed to be binary, independent, identically distributed with equal probability.
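As a quick numerical check of (17.1), the following minimal Python sketch (an illustration, not code from the text) evaluates I(W) with equiprobable inputs for two symmetric channels, the BSC with crossover probability p and the BEC with erasure probability p; the expected results are 1 − H₂(p) and 1 − p, respectively.

    from math import log2

    def mutual_information(W):
        # W: dict mapping each output y to the pair (W(y|0), W(y|1));
        # equiprobable inputs are assumed, as in (17.1)
        I = 0.0
        for w0, w1 in W.values():
            q = 0.5 * w0 + 0.5 * w1            # output probability
            for w in (w0, w1):
                if w > 0:
                    I += 0.5 * w * log2(w / q)
        return I

    p = 0.1
    bsc = {0: (1 - p, p), 1: (p, 1 - p)}
    bec = {0: (1 - p, 0.0), '?': (p, p), 1: (0.0, 1 - p)}
    print(mutual_information(bsc))   # about 0.531 = 1 - H2(0.1)
    print(mutual_information(bec))   # 0.9 = 1 - p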

17.3 Channel Polarization, N = 2 Channels

Channel polarization constructs out of N independent copies of a B-DMC W (or N independent uses of a single channel) another set of N channels that exhibit a polarization effect, which is that as N becomes large the set of capacities of the synthesized channels tend either toward 0 or 1 for almost all of the synthetic channels. In this section, we consider in detail the case when N = 2. Despite its apparent simplicity, this case uncovers many of the issues that surface for larger values of N. Specifically, we consider the following aspects of the coding problem for the N = 2 case:

• Encoding and the encoding matrix (Section 17.3.1).
• The channels synthesized by the polarization process via channel combining and channel splitting, and the mutual information associated with these channels (Section 17.3.2).
• The transition probabilities associated with these channels (Section 17.3.3).
• An example using BECs (Section 17.3.4).
• Successive cancellation decoding (Section 17.3.5).
• Using the polarized channels for improved coded performance (Section 17.3.7).

In later sections these issues will be extended to the case that N > 2.

17.3.1 Encoding

With N = 2 copies of W, the input bits are u0 and u1. The input bits are encoded into bits x0 and x1 by (see Figure 17.3(a))

x0 = u0 ⊕ u1
x1 = u1,     (17.2)


Figure 17.3: First stage of the construction of the polar transform. (a) The basic building block; (b) representation in terms of the channels W2(0) and W2(1).

where ⊕ denotes addition in GF(2). This transformation is invertible, since given x0 and x1,

u0 = x0 ⊕ x1
u1 = x1.

The codeword (x0, x1) is transmitted through the channels W², with channel outputs (y0, y1). This structure, comprising coding and channel transmission, creates the channel W2 : u0, u1 → y0, y1. (The subscript on W2 denotes the synthesized channel.) The encoding operation in (17.2) can be expressed in terms of an 𝔽2 generator matrix

G2 = ⎡1 0⎤
     ⎣1 1⎦

with encoding as 𝐱 = 𝐮G2, where 𝐮 = [u0  u1] is a row vector. For later use, this matrix will also be referred to as F,

F = ⎡1 0⎤
    ⎣1 1⎦.     (17.3)

17.3.2 Synthetic Channels and Mutual Information

Since X0 and X1 are independent, the mutual information between the joint channel input X0, X1 and the joint channel output Y0, Y1 is

I(X0, X1; Y0, Y1) = 2I(X0; Y0) = 2I(W).     (17.4)

Since X0, X1 is in one-to-one invertible correspondence with U0, U1,

I(X0, X1; Y0, Y1) = I(U0, U1; Y0, Y1).     (17.5)

Recall that the chain rule for information is (1.40)

I(X1, . . . , Xn; Y) = ∑_{i=1}^{n} I(Xi; Y | Xi−1, Xi−2, . . . , X1).

As applied to (17.5), the chain rule indicates that

I(U0, U1; Y0, Y1) = I(U0; Y0, Y1) + I(U1; Y0, Y1 | U0).     (17.6)

Recalling that (see (1.39)) I(U1; Y0, Y1 | U0) = H(U1 | U0) − H(U1 | Y0, Y1, U0), since U0 and U1 are independent, the second term of (17.6) can be written as I(U1; Y0, Y1, U0), so that

I(U0, U1; Y0, Y1) = I(U0; Y0, Y1) + I(U1; Y0, Y1, U0).     (17.7)

Combining this with (17.4),

2I(W) = I(U0; Y0, Y1) + I(U1; Y0, Y1, U0),     (17.8)

where the first term on the right is associated with the W2(0) channel and the second with the W2(1) channel. The first term on the right represents the mutual information on some channel having input U0 and output (Y0, Y1). The second term on the right represents the mutual information on some channel having input U1 and output (Y0, Y1, U0). The first of these channels is denoted as W2(0) and the second of these channels is denoted as W2(1). That is,

W2(0) : U0 → (Y0, Y1)
W2(1) : U1 → (Y0, Y1, U0).     (17.9)

This is suggested by Figure 17.3(b). The mutual information for these channels is denoted I(W2(0)) and I(W2(1)), where I(W2(0)) = I(U0; Y0, Y1) and I(W2(1)) = I(U1; Y0, Y1, U0). The channels W2(0) and W2(1) are “synthetic” channels, produced as a result of the linear operations on the inputs and the actual channels used. These channels are sometimes referred to as bit channels. W2(0) and W2(1) are “split” from the combined channel W², so the process of obtaining these channels from W² is called the splitting phase. It should be understood that the two diagrams in Figure 17.3 are simply two representations for exactly the same thing, based on the equivalence in (17.7). Thus, the encoding of the W2(0) and W2(1) channels is performed as in Figure 17.3(a). The result of this encoding can be considered to result in the channels in Figure 17.3(b).

In terms of input and output spaces, recall that 𝒳 = {0, 1} denotes the input space, and 𝒴 denotes the output space, with W : 𝒳 → 𝒴. Then from (17.9),

W2(0) : 𝒳 → 𝒴²
W2(1) : 𝒳 → 𝒴² × 𝒳.

The outputs of the channels are considered to be vector outputs, from 𝒴² in the case of W2(0), and from 𝒴² × 𝒳 in the case of W2(1). As the number of channels N increases, the size of the vector output increases as well. The mutual information of interest between channel input and channel output is between the scalar input ui ∈ 𝒳 and the vector output of the respective channel.

Notation note: Later, when we generalize to more than N = 2 channels, we will talk about the synthetic channels WN(i), where 0 ≤ i < N. When i is thought of as an integer, it will be enclosed in parentheses. It will also be convenient at times to express i in binary, as in 3 = 011₂. When the upper index is expressed in binary, it is not enclosed in parentheses. We may thus write W8(3) = W8^011. When the binary index is used, the subscript N may not be used, since it can be inferred by the number of bits n in the upper index, and N = 2ⁿ. Obviously for N = 2 and i = 0 or i = 1 there is no distinction between the integer representation and the binary notation.


The W channel has I(W) = I(U1; Y1). The W2(1) channel with I(W2(1)) = I(U1; Y0, Y1, U0) has input U1 and output Y1, and it has additional outputs U0, Y0, which cannot decrease the amount of information conveyed by the channel. Thus,

I(W2(1)) ≥ I(W).     (17.10)

In light of (17.8), it must be the case that I(W2(0)) ≤ I(W). Combining these together,

I(W2(0)) ≤ I(W) ≤ I(W2(1)).     (17.11)

This inequality embodies the concept of channel polarization: one of the synthesized bit channels has higher mutual information than the other one. As we will see, this polarization tendency is amplified as the basic encoding building block is recursively applied. The nature of these synthetic channels merits some discussion. The variables y0 and y1 are the measured outputs of real channels. The channel W2(0) : U0 → (Y0, Y1) is a “bonafide” channel: u0 is input to a channel, and observations Y0 = y0 and Y1 = y1 are available from the channel. By contrast, the channel W2(1) : U1 → Y0, Y1, U0 is conceptually more problematic, since U0 = u0 is not actually observed at the receiver (being an input to the coding system). The W2(1) channel can be synthesized, however, by first processing the output of W2(0) to obtain an estimate û0 of u0. This is used to synthesize the channel W2(1) with outputs y0, y1, û0. How does the use of û0 in this synthetic receiver compare with the performance obtained if u0 were actually known in forming the W2(1) channel? To explore this, consider two receivers, a “genie-aided receiver,” that is, a conceptual receiver with more information than is physically possible, and an “implementable” receiver, which only uses received values y0 and y1. We will show that these two receivers have identical block error performance.

The genie-aided receiver works as follows: The receiver first computes an estimate ũ0 as a function 𝜙0 of the W2(0) channel outputs (for some appropriate decoder function 𝜙0): ũ0 = 𝜙0(y0, y1). Following this, the genie provides the true value u0, and the genie-aided receiver then computes an estimate ũ1 as a function of the W2(1) channel outputs: ũ1 = 𝜙1(y0, y1, u0). In determining the probability of error, the detected symbol ũ0 and the genie-aided detected symbol ũ1 are used. The implementable receiver, on the other hand, first estimates u0 as û0 = 𝜙0(y0, y1). The implementable receiver then estimates u1 by û1 = 𝜙1(y0, y1, û0). The symbols û0 and û1 are used to determine the probability of error. The probability of decoding a block correctly is the probability P((ũ0, ũ1) = (u0, u1)) in the genie-aided case and P((û0, û1) = (u0, u1)) in the implementable case. Since the implementable receiver uses only an estimate of u0, it would seem the genie-aided receiver should exhibit better performance. But as we show, they exhibit the same probability of block decoder error.

• Suppose the genie-aided receiver correctly detects u0 (so ũ0 = u0). Then the probability of a block decoder error P((ũ0, ũ1) ≠ (u0, u1)) is the probability P(ũ1 ≠ u1). In the case that the implementable receiver correctly detects u0, so û0 = u0, the probability of the block error for the implementable receiver is the probability P(û1 ≠ u1).

• If the genie-aided receiver incorrectly estimates u0, so ũ0 ≠ u0, then a block error is already committed, so P((ũ0, ũ1) ≠ (u0, u1)) = P(ũ0 ≠ u0), the probability of this incorrect estimate. The same holds for the implementable receiver.

Thus, the synthesized channels W2(0) and W2(1) can be regarded as true channels for the purposes of analysis. For implementation, of course, the implementable receivers are used.

17.3.3 Synthetic Channel Transition Probabilities

The synthetic channel transition probabilities are the transition probabilities for the synthetic channels W2(0) and W2(1), or, more generally, for the channels WN(i), 0 ≤ i < N. Understanding these probabilities will help explain the channel polarization, and will be used in the likelihood functions used for decoding. Collectively the pair of channels (W2(0), W2(1)) is denoted as W2(y0, y1 | u0, u1). The transition probability for this joint channel can be written as

W2(y0, y1 | u0, u1) = W²(y0, y1 | (u0, u1)G2) = W²(y0, y1 | (u0 ⊕ u1, u1)) = W(y0 | u0 ⊕ u1) W(y1 | u1),     (17.12)

where W(y|x) denotes the transition probability for the channel W, and where the last equality follows because the channel W is memoryless. The synthetic channel probabilities are W2(0)(y0, y1 | u0) and W2(1)(y0, y1, u0 | u1). These transition probabilities are computed using the laws of probability (assuming that the bits are equally likely to be 0 or 1) as

W2(0)(y0, y1 | u0) = ∑_{u1∈𝒳} W2(y0, y1 | u0, u1) P(u1) = (1/2) ∑_{u1∈𝒳} W2(y0, y1 | u0, u1)
W2(1)(y0, y1, u0 | u1) = W2(y0, y1 | u0, u1) P(u0) = (1/2) W2(y0, y1 | u0, u1),     (17.13)

which can be written using (17.12) as

W2(0)(y0, y1 | u0) = (1/2) ∑_{u1∈𝒳} W(y0 | u0 ⊕ u1) W(y1 | u1)
W2(1)(y0, y1, u0 | u1) = (1/2) W(y0 | u0 ⊕ u1) W(y1 | u1).     (17.14)

Abstractly, a pair of channels (W, W) has been transformed by (17.14) to a pair of new channels (W2(0), W2(1)). This transformation is denoted as

(W, W) → (W2(0), W2(1)).     (17.15)
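To make the channel-splitting formulas concrete, here is a minimal Python sketch (an illustration, not code from the text) that evaluates (17.14) for a channel W given as a table of transition probabilities W(y|x); it returns the transition probabilities of the synthetic channels W2(0) and W2(1).

    def split_channel(W, Y):
        # W[(y, x)] = W(y|x) for x in {0,1}, y in the output alphabet Y.
        # Returns the synthetic channel transition probabilities of (17.14).
        W0 = {}   # W0[(y0, y1, u0)] = W2(0)(y0, y1 | u0)
        W1 = {}   # W1[(y0, y1, u0, u1)] = W2(1)(y0, y1, u0 | u1)
        for y0 in Y:
            for y1 in Y:
                for u0 in (0, 1):
                    W0[(y0, y1, u0)] = 0.5 * sum(
                        W[(y0, u0 ^ u1)] * W[(y1, u1)] for u1 in (0, 1))
                    for u1 in (0, 1):
                        W1[(y0, y1, u0, u1)] = 0.5 * W[(y0, u0 ^ u1)] * W[(y1, u1)]
        return W0, W1

    # Example: BEC(p) with output alphabet {0, 1, '?'}
    p = 0.3
    Y = [0, 1, '?']
    W = {}
    for x in (0, 1):
        for y in Y:
            W[(y, x)] = 0.0
        W[(x, x)] = 1 - p       # correct reception
        W[('?', x)] = p         # erasure
    W0, W1 = split_channel(W, Y)
    print(W0[(0, 0, 0)])        # probability both uses survive, consistent with u0 = 0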

17.3.4 An Example with N = 2 Using the Binary Erasure Channel

The BEC is simple enough that the concepts introduced above can be explicitly computed. The results for the BEC will form a framework for more general results for other channels. Consider the case when W is the BEC with erasure probability p, that is, W is BEC(p), so that W : 𝒳 → 𝒴 can be described by the transition probabilities W(x|x) = 1 − p and W(?|x) = p for x ∈ {0, 1}, where ? denotes an erasure. This channel has capacity I(W) = 1 − p. The synthetic channels W2(0) and W2(1) are formed from this erasure channel. The W2(0) channel with input u0 has output

(y0, y1) = (u0 ⊕ u1, u1)   w.p. (1 − p)²   (neither channel erases)   ✓ (case 1)
           (?, u1)         w.p. p(1 − p)   (the y0 channel erases)    ✘ (case 2)
           (u0 ⊕ u1, ?)    w.p. (1 − p)p   (the y1 channel erases)    ✘ (case 3)
           (?, ?)          w.p. p²         (both channels erase)      ✘ (case 4).

A check ✓ denotes a successful transmission from which both u0 and u1 can be recovered (a successful block transmission) and a ✘ denotes an erasure.

• In case 1, from y1 the input u1 is available, from which u0 can be extracted from u0 = y0 ⊕ y1.
• In case 2, u0 cannot be recovered — the information is erased.
• In case 3, neither u0 nor u1 can be recovered — the information is erased.
• And, obviously, in case 4 the information is erased.

Collectively across the outputs (y0, y1) there is no erasure with probability (1 − p)² and there is an erasure of some sort with probability 1 − (1 − p)² = 2p − p². We denote the probability of erasure for this channel W2(0) (with the block output) as p2(0) = 2p − p². That is, W2(0) is a BEC(p2(0)) channel. Let g(p) denote this function:

p2(0) = g(p) = 2p − p².     (17.16)

For p ∈ (0, 1) (specifically, p ≠ 0 or 1), 2p − p² > p. This confirms that I(W2(0)) < I(W). Considering now the synthetic channel W2(1) with input u1, the possible outputs at the genie-aided receiver are

(y0, y1, u0) = (u0 ⊕ u1, u1, u0)   w.p. (1 − p)²   (neither channel erases)   ✓ (case 1)
               (?, u1, u0)         w.p. p(1 − p)   (y0 channel erases)        ✓ (case 2)
               (u0 ⊕ u1, ?, u0)    w.p. (1 − p)p   (y1 channel erases)        ✓ (case 3)
               (?, ?, u0)          w.p. p²         (both channels erase)      ✘ (case 4).

Because the genie-aided receiver provides u0, it is possible to recover (u0, u1) in all cases except case 4. The probability of an erasure on the block is thus p². Thus, W2(1) is a BEC channel with probability of erasure p2(1) = p². Let h(p) denote the function

p2(1) = h(p) = p².     (17.17)

Since p2(1) = p² ≤ p (see Figure 17.4) and I(W2(1)) = 1 − p2(1), this confirms the inequality in (17.10) that I(W2(1)) > I(W). The inequality is strict unless p = 0 or 1, so that in all interesting channels, the synthetic channel W2(1) improves performance compared to the channel W.
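The polarization effect can be seen numerically by applying g(p) and h(p) recursively: each synthetic BEC is itself split into a worse channel (via g) and a better channel (via h). The following minimal Python sketch (an illustration, not code from the text) computes the 2ⁿ synthetic erasure probabilities obtained after n levels of this recursion and counts how many are nearly 0 or nearly 1.

    def polarize_bec(p, n):
        # Start with one BEC(p); each level replaces every channel with its
        # "worse" split g(q) = 2q - q^2 and its "better" split h(q) = q^2.
        probs = [p]
        for _ in range(n):
            probs = [f(q) for q in probs
                     for f in (lambda q: 2*q - q*q, lambda q: q*q)]
        return probs

    probs = polarize_bec(0.5, 10)           # 1024 synthetic channels from BEC(0.5)
    good = sum(q < 1e-3 for q in probs)     # nearly perfect channels
    bad = sum(q > 1 - 1e-3 for q in probs)  # nearly useless channels
    print(good, bad, good / len(probs))     # the fraction 'good' approaches 1 - p = I(W)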

17.3.5 Successive Cancellation Decoding

The first decoding algorithm proposed for polar codes is called successive cancellation (SC). The basic idea is that bits are decoded in a particular order and previously estimated bits are used in the likelihood computation of new bits.


Figure 17.4: g(p) and h(p) for p ∈ [0, 1], showing that g(p) > h(p).

The basic computations of bit likelihoods are similar to those used for decoding LDPC codes using log-likelihood ratios (LLRs), as discussed in Section 15.4.1. However, to make a stand-alone presentation and to put the notation in the context of this chapter, the concepts are reviewed here. The decoding is based on likelihood ratios for the different channels. Consider first the channel W2(0), with transition probabilities W2(0)(y0, y1 | u0). Let

𝜆2(0)(y0, y1) = W2(0)(y0, y1 | u0 = 0) / W2(0)(y0, y1 | u0 = 1)     (17.18)

denote a likelihood ratio for the bit u0 as a function of the received values (y0, y1). It will be convenient to deal with a log likelihood ratio

L2(0)(y0, y1) = log 𝜆2(0)(y0, y1) = log [ W2(0)(y0, y1 | u0 = 0) / W2(0)(y0, y1 | u0 = 1) ].     (17.19)

Then the decision about u0 is formulated according to

û0 = { 0   L2(0)(y0, y1) ≥ 0
     { 1   L2(0)(y0, y1) < 0     ≜ 𝜙0(y0, y1).

Note that, since W2(0) is considered to have output (y0, y1), this decision is based on both channel observations, not just the channel output y0. Computing the log-likelihood ratio is discussed below. For the W2(1) channel, the log-likelihood ratio is

L2(1)(y0, y1, u0) = log [ W2(1)(y0, y1, u0 | u1 = 0) / W2(1)(y0, y1, u0 | u1 = 1) ].     (17.20)

Note that this likelihood ratio depends on u0. In successive cancellation decoding, the u0 that is used is the û0 obtained from the likelihood ratio in the first step. As N becomes large, a reliable decision for u0 is highly probable, so it is reasonable to use it in following computations. The decision about u1 is formulated as

û1 = { 0   L2(1)(y0, y1, û0) ≥ 0
     { 1   L2(1)(y0, y1, û0) < 0     ≜ 𝜙1(y0, y1, û0).

17.3.5.1 Log-Likelihood Ratio Computations

We now consider how to compute the LLRs in (17.19) and (17.20). The conditioning can be turned around using Bayes rule and the assumed equiprobable nature of the bits. From (17.18), write

𝜆2(0)(y0, y1) = W2(0)(y0, y1 | u0 = 0) / W2(0)(y0, y1 | u0 = 1)
             = [P(u0 = 0 | y0, y1) P(y0, y1) / P(u0 = 0)] / [P(u0 = 1 | y0, y1) P(y0, y1) / P(u0 = 1)]
             = P(u0 = 0 | y0, y1) / P(u0 = 1 | y0, y1).

Consider the basic building block shown in Figure 17.3(a). From this encoding structure, reading left to right,

x0 = u0 ⊕ u1
x1 = u1.

These equations can be turned around as

u0 = x0 ⊕ x1
u1 = u0 ⊕ x0.

We first obtain the likelihood of u0 and corresponding estimate û0 based on channel-corrupted measurements of x0 and x1. Assuming that the input bits are equiprobable, the log-likelihood ratio for xi conditioned on its channel observation yi is

log [ P(xi = 0 | yi) / P(xi = 1 | yi) ] = log [ W(yi | xi = 0) / W(yi | xi = 1) ] ≜ Lyi,     (17.21)

and turning that around,

P(xi = 0 | yi) = e^(Lyi) / (1 + e^(Lyi))     P(xi = 1 | yi) = 1 / (1 + e^(Lyi)).     (17.22)

Since u0 = x0 ⊕ x1 and the channel is memoryless, the probability P(u0 = 0 | y0, y1) is

P(u0 = 0 | y0, y1) = P(x0 ⊕ x1 = 0 | y0, y1) = P(x0 = 0 | y0, y1) P(x1 = 0 | y0, y1) + P(x0 = 1 | y0, y1) P(x1 = 1 | y0, y1).

Using the probabilities in terms of the LLRs in (17.22),

P(u0 = 0 | y0, y1) = [e^(Ly0) e^(Ly1) + 1] / [(1 + e^(Ly0))(1 + e^(Ly1))] = [1 + e^(Ly0 + Ly1)] / [(1 + e^(Ly0))(1 + e^(Ly1))].

Similarly,

P(u0 = 1 | y0, y1) = [e^(Ly0) + e^(Ly1)] / [(1 + e^(Ly0))(1 + e^(Ly1))].

The log-likelihood ratio for u0 is thus

L2(0)(y0, y1) = log [ P(u0 = 0 | y0, y1) / P(u0 = 1 | y0, y1) ] = log [ (1 + e^(Ly0 + Ly1)) / (e^(Ly0) + e^(Ly1)) ].     (17.23)

Readers familiar with LDPC codes will recognize that this is the computation that takes place at a check node in the LDPC decoding algorithm. This computation is important enough that it goes by several notations in the literature. It is sometimes called the chk function, and it may be computed using the tanh function. Since this computes the likelihood on the “upper” branch of the basic encoding operation, this will also be referred to as the “upper” decoder. Symbolically, since it does the decoding on the branch that involves a sum x0 = u0 ⊕ u1, we will also denote the computation as s(Ly0, Ly1). Another notation used is ⊞. Collecting all these synonyms, we have

L2(0)(y0, y1) = log [ P(u0 = 0 | y0, y1) / P(u0 = 1 | y0, y1) ] = chk(Ly0, Ly1) = log [ (1 + e^(Ly0 + Ly1)) / (e^(Ly0) + e^(Ly1)) ]
             = 2 tanh⁻¹(tanh(Ly0/2) tanh(Ly1/2)) = s(Ly0, Ly1) = Ly0 ⊞ Ly1.     (17.24)

The computation is suggested by Figure 17.5(a). The brown lines indicate the path over which the information into the sum s flows, moving from right to left. Once L2(0)(y0, y1) is computed, an estimate of the bit u0 is obtained as

û0 = { 0   L2(0)(y0, y1) ≥ 0
     { 1   L2(0)(y0, y1) < 0.

With û0 on hand (and assumed to be correct) there are two equations related to u1,

u1 = x1
u1 = û0 ⊕ x0,

observed in channel outputs y1 and y0, respectively. Then

L2(1)(y0, y1 | û0) ≜ log [ P(u1 = 0 | y0, y1, û0) / P(u1 = 1 | y0, y1, û0) ]
                  = log [ P(y0, y1 | u1 = 0, û0) / P(y0, y1 | u1 = 1, û0) ]                                       (assuming equal priors)
                  = log [ P(y0 | u1 = 0, û0) P(y1 | u1 = 0, û0) / (P(y0 | u1 = 1, û0) P(y1 | u1 = 1, û0)) ]       (memoryless channels)
                  = log [ P(û0 ⊕ x0 = 0 | y0) / P(û0 ⊕ x0 = 1 | y0) ] + log [ P(x1 = 0 | y1) / P(x1 = 1 | y1) ]   (assuming equal priors).


Figure 17.5: Successive cancellation on N = 2 channels. (a) Upper log-likelihood ratio computation; (b) lower log-likelihood ratio computation. Color versions of this plot are available at wileywebpage or https://github.com/tkmoon/eccbook/colorpages.


The second term is recognized as Ly1 from (17.21). The first term is

log [ P(û0 ⊕ x0 = 0 | y0) / P(û0 ⊕ x0 = 1 | y0) ] = {  Ly0   if û0 = 0
                                                    { −Ly0   if û0 = 1,

so that

L2(1)(y0, y1 | û0) = { Ly1 + Ly0   if û0 = 0
                     { Ly1 − Ly0   if û0 = 1.

This will be denoted variously as

L2(1)(y0, y1 | û0) = Ly1 ± Ly0 = Ly1 + Ly0(1 − 2û0),     (17.25)

where the ± sign is determined by û0.

As a sum, this is analogous to the computation that takes place at the variable node of a log-likelihood-based LDPC decoder. Let t(Ly0, Ly1 | û0) = L2(1) be the log-likelihood ratio in (17.25). (The t may be suggested by the fact that u1 goes straight through to x1.) Since L2(1) is on the lower branch, this is also referred to as the lower decoding. Thus, the lower computation computes t(Ly0, Ly1 | û0) = L2(1). The computation is portrayed in Figure 17.5(b). The brown lines indicate the path over which the channel likelihoods Ly0 and Ly1 are conveyed. From the upper computation, there is an estimate of the bit û0, portrayed with the green line. Then the likelihood produces the lower (t) output, conveyed on the cyan line. While visually this does introduce some confusion, because the lower line carries both likelihood information Ly1 as an input (brown) and the lower computation t as an output (cyan), the computation that is actually done is very simple.
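Putting (17.24) and (17.25) together, successive cancellation decoding for N = 2 reduces to an upper (chk/s) computation, a hard decision on u0, and a lower (t) computation. A minimal Python sketch of these two steps (an illustration, not code from the text) is given below; the channel LLRs Ly0 and Ly1 are assumed to be supplied by the demodulator.

    from math import tanh, atanh

    def chk(L0, L1):
        # Upper computation (17.24): the tanh rule for L0 [+] L1
        return 2.0 * atanh(tanh(L0 / 2.0) * tanh(L1 / 2.0))

    def t(L0, L1, u0_hat):
        # Lower computation (17.25): Ly1 + Ly0*(1 - 2*u0_hat)
        return L1 + L0 * (1 - 2 * u0_hat)

    def sc_decode_n2(Ly0, Ly1):
        L0 = chk(Ly0, Ly1)
        u0 = 0 if L0 >= 0 else 1       # decision on u0
        L1 = t(Ly0, Ly1, u0)
        u1 = 0 if L1 >= 0 else 1       # decision on u1, using the estimate of u0
        return u0, u1

    print(sc_decode_n2(1.2, -0.4))     # example channel LLRs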

17.3.5.2 Computing the Log-Likelihood Function with Lower Complexity

The chk (or ⊞ or tanh-rule) computation of (17.24) involves either tanh or logarithmic/exponential computations. In implementation, it is desirable to reduce the computational complexity. One way to do this is using an approximation:

x ⊞ y ≈ sign(x) sign(y) min(|x|, |y|).     (17.26)

The approximation is illustrated in Figure 17.6. The approximation is seen to be a piecewise linear/constant approximation to the decoding function. Using this approximation has a negligible effect on the code performance (see [392]).


Figure 17.6: Comparison of tanh rule (solid lines) with (17.26) (dashed lines).
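The following minimal Python sketch (an illustration, not code from the text) compares the exact tanh-rule value of x ⊞ y with the min-sum approximation (17.26) for a few representative LLR pairs.

    from math import tanh, atanh, copysign

    def chk_exact(x, y):
        # Exact tanh rule of (17.24)
        return 2.0 * atanh(tanh(x / 2.0) * tanh(y / 2.0))

    def chk_minsum(x, y):
        # Approximation (17.26): sign(x)*sign(y)*min(|x|, |y|)
        return copysign(1.0, x) * copysign(1.0, y) * min(abs(x), abs(y))

    for x, y in [(0.5, 1.0), (2.0, 3.0), (-1.5, 4.0), (-6.0, -7.0)]:
        print(x, y, round(chk_exact(x, y), 3), round(chk_minsum(x, y), 3))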

17.3.6 Tree Representation

The encoder for N = 2, shown in Figure 17.3(a) and repeated in Figure 17.7(a), can be represented as a tree. The inputs u0 and u1 are associated with nodes, indicated in black, and the adder and its associated parallel edge are combined into a single block, as shown in Figure 17.7(b). This block and its two connections (y0 and y1) are represented with a single node, indicated in black in Figure 17.7(c). The encoder is thus represented as a binary tree, whose leaf nodes correspond to the bits u0 and u1 and whose root node incorporates all the inputs or, more precisely, all the log-likelihood functions Lyi. This tree is shown rotated with the root node at the top in Figure 17.7(d). In the decoding algorithm, each node 𝑣 recursively performs a computation appropriate to its position in the tree and the sequence in the algorithm and sends a message to its children, first the left node then the right. Then, upon receiving messages from its children, it sends a message to its parent node. For the N = 2 decoder, the sequence of steps is as follows:

• The root node, 𝑣 in Figure 17.7(d), computes Ly0 ⊞ Ly1 and sends this as the message to its left child: 𝛼𝑣l = Ly0 ⊞ Ly1.

• The left node 𝑣l is in this case a leaf node. The computation at a leaf node is to determine the bit value based on the incoming message 𝛼𝑣l. For a nonfrozen bit, this is done by quantizing the incoming message, that is, converting it to a bit value,

  û0 = { 0   𝛼𝑣l ≥ 0
       { 1   𝛼𝑣l < 0.

  This estimated bit value is the message sent to the parent of 𝑣l: 𝛽𝑣l = û0.

• The root node performs another computation, which is the likelihood in (17.25), and sends this to its right child: 𝛼𝑣r = Ly1 + Ly0(1 − 2𝛽𝑣l).

• The right node 𝑣r is again a leaf node, which quantizes its incoming message 𝛼𝑣r.

For N = 2, this completes the decoding. From this perspective, the SC decoder is seen to be a depth-first traversal of the tree representing the encoding operation, with likelihood messages sent leafward and bit messages sent rootward in the traversal. The encoding diagram shows that there is an asymmetry, so that 1 bit is decoded before another in SC. By contrast, the abstraction of the tree representation hides the asymmetry, since all leaf nodes appear equivalent in the tree, but this asymmetry must be taken into account in the order in which the bits are decoded.


Figure 17.7: Tree representation of the encoding. (a) Encoder N = 2; (b) converting to a tree; (c) converted to a tree; (d) rotated with the root node at the top.

17.3.7 The Polar Coding Idea

Continuing the case when N = 2, we examine now the idea of polar coding. With this small value of N the details will not yet be clear, but the main idea can now be conveyed, with details to be fleshed out later. Based on the inequalities in (17.11), of the two synthetic channels W2(0) and W2(1), the W2(1) channel, which carries information about u1, is superior. A simple-minded rate 1/2 coding scheme can be devised as follows: Set u0 to some arbitrary value known to both transmitter and receiver. This bit is said to be frozen. Then encode the single bit u according to

[x0  x1] = [u0  u] G2

and transmit this two-component vector over the channel. Since this sends u through the more reliable W2(1) channel, this transmits u more reliably than it would have been without coding, albeit at the cost of a rate 1/2 code. At the decoder, the value of the frozen bits is known a priori, so it is not necessary to threshold the log-likelihood functions to decode them. At this point, this does not seem like a reliable way to encode information. But we will see that as N increases, a fraction of the synthetic channels become increasingly reliable. Using only these reliable channels allows codewords, each of whose elements are transmitted using reliable channels, to be reliably transmitted. Here, with N = 2, the only possible code rates are 1 (use both bits), 1/2 (freeze 1 bit), and 0 (freeze both bits). But as we will see, as N increases, it becomes possible to finely tune the rate of the code. The ability to tune the rate is another advantage of polar codes compared to many other block codes. In order to exploit the polar coding idea, it will be necessary to identify which of the N synthetic channels have high capacity. This is the code design problem, which is discussed in Section 17.6.

17.4 Channel Polarization, N > 2 Channels

We now show how to go from 2 to 4 channels, then describe the general construction to go from N/2 to N channels.

17.4.1 Channel Combining: Extension from N/2 to N Channels

The next level of recursion is obtained by making two copies of the W2 channel to create the channel W4 : 𝒳⁴ → 𝒴⁴.

• Start with two copies of W2, with inputs now called 𝑣0 through 𝑣3 and outputs y0 through y3, as shown in Figure 17.8(a).

• Combine the two W2(0) channels (the inputs to the ⊕ operators of the W2s) using inputs 𝑣0 and 𝑣2 to create the “0” channel and the “1” channel of W2(0), as shown in Figure 17.8(b). This gives channels which can be denoted as

  (W2^0)^0 = W4^00 = W4(0)   and   (W2^0)^1 = W4^01 = W4(1).

  The W4(0) channel acts according to

  W4(0) : u0 → y0, y1, y2, y3     (17.27)

  and the W4(1) channel acts according to

  W4(1) : u1 → y0, y1, y2, y3, u0.     (17.28)

• Combine the two W2(1) channels (the inputs of the W2s that don’t have ⊕ operators) with inputs 𝑣1 and 𝑣3 to create the “0” and “1” channels of W2(1), as shown in Figure 17.8(c). This gives channels which can be denoted as

  (W2^1)^0 = W4^10 = W4(2)   and   (W2^1)^1 = W4^11 = W4(3).

  The W4(2) and W4(3) channels act according to

  W4(2) : u2 → y0, y1, 𝑣0, y2, y3, 𝑣2     (17.29)
  W4(3) : u3 → y0, y1, 𝑣0, y2, y3, 𝑣2, u2.     (17.30)

The pair (𝑣0, 𝑣2) is informationally equivalent to knowing (u0, u1). Making this substitution in (17.30) and reordering the sequence of outputs, we have

W4(2) : u2 → y0, y1, y2, y3, u0, u1
W4(3) : u3 → y0, y1, y2, y3, u0, u1, u2.

This completes the construction of the W4 channel, as indicated in Figure 17.8(c).


(d)

Figure 17.8: Repeating the basic construction by duplicating W2 to form W4 . (a) Duplication of W2 ; (b) creation of W4(0) and W4(1) ; (c) creation of W4(2) and W4(3) ; (d) reordering the inputs of W4 .

17.4 Channel Polarization, N > 2 Channels

795

There is a natural representation of the exponents in the binary and integer representation, for example, (W20 )0 = W400 = W4(0) . The least significant bit (on the right) indicates the most recent synthetic channel created. As drawn in Figure 17.8(c), the inputs are in the order u0 , u2 , u1 , u3 . These can be redrawn in order as shown in Figure 17.8(d). In this case, there is a permutation block R4 which reorders the inputs. Denoting the inputs to the permutation as s0 , . . . , s3 and the outputs as 𝑣0 , . . . , 𝑣3 , the permutation is s0 ↔ 𝑣0

s1 ↔ 𝑣2

s2 ↔ 𝑣1

s3 ↔ 𝑣3 .

(17.31)

Another representation of the encoder in Figure 17.9 has all the inputs in order, but interchanges the initial ⊕s and R4 to place all the ⊕s together. This involves another permutation block B4 which turns out in the general case to be a bit-reverse permutation. The encoding operation can be written in terms of a generator matrix. This will be developed in two steps. First, consider the vector of values after the bit reverse permutation. Then ⎡1 ⎢1 (x0 , x1 , x2 , x3 ) = (u0 , u2 , u1 , u3 ) ⎢ ⎢1 ⎢ ⎣1

0 0 0⎤ 1 0 0⎥ ⎥. 0 1 0⎥ ⎥ 1 1 1⎦

This matrix can be written as the Kronecker product of the F matrix of (17.3) with itself, denoted F ⊗2 . To put the ui values in the order there is a permutation matrix, B4 . The overall encoding operation is ⎡1 ⎢0 (x0 , x1 , x2 , x3 ) = (u0 , u1 , u2 , u3 ) ⎢ ⎢0 ⎢ ⎣0 ⎡1 ⎢1 = (u0 , u1 , u2 , u3 ) ⎢ ⎢1 ⎢ ⎣1

u0

u2 u3

Bit reverse permutation B4

u1

0 0 0⎤ ⎡1 0 1 0⎥ ⎢1 ⎥⎢ 1 0 0⎥ ⎢1 ⎥⎢ 0 0 1⎦ ⎣1

0 0 0⎤ 1 0 0⎥ ⎥ = (u0 , u1 , u2 , u3 )(B4 F ⊗2 ) 0 1 0⎥ ⎥ 1 1 1⎦

0 0 0⎤ 0 1 0⎥ △ ⎥ =(u , u , u , u )G . 0 1 2 3 4 1 0 0⎥ ⎥ 1 1 1⎦

u0

υ0

x0

u2

υ1

x1

W W

y0 y1

W2 u1

υ2

x2

u3

υ3

x3

B4

W W

y2 y3

W2 W4

Figure 17.9: Another representation of W4 .

(17.32)


Generalizing now to arbitrary N = 2^n, the recursion to go from W_{N/2} to W_N is shown in Figure 17.10(a). Two copies of the channel W_{N/2} are used. Incoming bits are permuted by a permuter R_N. The input bits are added pairwise to form s_{2i} = u_{2i} ⊕ u_{2i+1}; the odd-indexed values are s_{2i+1} = u_{2i+1}. The permutation operator R_N acts on s_0^{N-1} to produce v_0^{N-1} = (s_0, s_2, ..., s_{N-2}, s_1, s_3, ..., s_{N-1}). The encoding can also be written as shown in Figure 17.10(b). Based on the operations described in this figure, the encoder matrix is

G_N = R_N (F ⊗ I_{N/2})(I_2 ⊗ G_{N/2}).

By the properties of the Kronecker product (see Section 8.2.1),

G_N = R_N (F ⊗ G_{N/2}).

Proceeding recursively using G_{N/2} = R_{N/2}(F ⊗ G_{N/4}),

G_N = R_N (F ⊗ (R_{N/2}(F ⊗ G_{N/4}))) = R_N (I_2 ⊗ R_{N/2})(F^{⊗2} ⊗ G_{N/4}),

where the latter follows from the Kronecker product theorem (AC) ⊗ (BD) = (A ⊗ B)(C ⊗ D) with A = I_2, C = F, B = R_{N/2}, and D = F ⊗ G_{N/4}. Continuing, we find

G_N = B_N F^{⊗n},    (17.33)

where F^{⊗n} is the n-fold Kronecker product of F with itself, defined recursively as F^{⊗1} = F and

F^{⊗n} = [F^{⊗n-1} 0; F^{⊗n-1} F^{⊗n-1}],    (17.34)

and

B_N = R_N (I_2 ⊗ B_{N/2}).    (17.35)

For example, the B8 permutation matrix performs the following permutations:

u0 ↔ u0    u1 ↔ u4    u2 ↔ u2    u3 ↔ u6
u4 ↔ u1    u5 ↔ u5    u6 ↔ u3    u7 ↔ u7.    (17.36)

This permutation can be understood as follows. For an index j in uj, express the index as an n-bit binary number. Then the permuted index is obtained by writing the binary number in bit-reversed order. Writing (17.36) in binary form,

u000 ↔ u000    u001 ↔ u100    u010 ↔ u010    u011 ↔ u110
u100 ↔ u001    u101 ↔ u101    u110 ↔ u011    u111 ↔ u111.
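A minimal sketch of this index map (the helper name bitrev is ours, not from the book's companion code; it simply reverses the n-bit binary representation of a 0-based index):

    function ir = bitrev(i, n)
    % Reverse the n-bit binary representation of the index i (0-based).
      b  = bitget(i, 1:n);            % bits of i, least significant first
      ir = sum(b .* 2.^(n-1:-1:0));   % reassemble with the bit order reversed
    end

For example, bitrev(1,3) returns 4 and bitrev(3,3) returns 6, matching (17.36).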

In Appendix 17.A it is shown that, in general, B_N is the bit-reversal permutation matrix and, furthermore, that B_N F^{⊗n} = F^{⊗n} B_N. Summarizing, the encoding operation can be expressed as

x_0^{N-1} = u_0^{N-1} B_N F^{⊗n},

where B_N is an N-bit bit-reverse permutation and F^{⊗n} is the n-fold Kronecker product defined by F^{⊗n} = F ⊗ F^{⊗n-1} with F^{⊗1} = F.

As is apparent from the structure shown in Figure 17.11, the encoder simply pushes data from left to right across the graph: there are log2 N stages, each of which has N∕2 adders in it, so the overall encoding complexity is O(N log2 N). The presence of the bit-reverse permutation matrix BN obviously does not affect the distance properties of the code. Some implementations do not include it. When the bit-reverse permutation is not included, the encoder is said to be in natural order.
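As a quick, non-optimized check of the matrix description (a sketch only: it builds G_N explicitly rather than using the O(N log2 N) recursion of Algorithm 17.1, and it reuses the hypothetical bitrev helper sketched above):

    n = 3; N = 2^n;
    F  = [1 0; 1 1];
    Fn = 1;
    for k = 1:n
      Fn = kron(F, Fn);                            % n-fold Kronecker product F^{(x)n}
    end
    perm = arrayfun(@(j) bitrev(j, n), 0:N-1);     % bit-reversed index of each j
    BN = eye(N); BN = BN(perm+1, :);               % bit-reversal permutation matrix
    GN = mod(BN * Fn, 2);                          % generator matrix G_N = B_N F^{(x)n}
    u = [1 0 1 1 0 0 1 0];                         % an arbitrary length-N input vector
    x = mod(u * GN, 2)                             % codeword

Dropping BN (i.e., using GN = Fn) gives the natural-order encoder mentioned above; the codeword set is unchanged.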


Figure 17.10: General recursion to go from two WN∕2 channels to WN . (a) Basic form of general recursion to go from two WN∕2 channels to WN ; (b) alternative form for the recursion.

17.4.2 Pseudocode for Encoder for Arbitrary N


Algorithm 17.1 shows pseudocode for the O(N log2 N)-complexity encoder described above. It takes into account which bits are frozen (as discussed below) and performs the bit-reverse permutation.


Algorithm 17.1 Polar Code Encoding

Function PolarEncode(u)
% Encoding a binary message vector u of length K
Input: u, binary message vector of length K
Merge the information bits u_0^{K-1} into a message word ũ_0^{N-1}, setting the frozen bits to 0, in bit-reversed order
% Compute ũ_0^{N-1} F^{⊗n} in place:
skipr = 1                     % offset of RHS of adder equation, and number of adder groups
stepl = 2                     % left index step
ninaddergroup = N/2           % number of adders in a group at the current level
lindexstart = 0               % starting index of adder group
for n1 = 0, 1, ..., n-1 do    % for each level
    for i2 = 0, 1, ..., skipr-1 do
        lindex = lindexstart
        for i1 = 0, 1, ..., ninaddergroup-1 do
            ũ[lindex] = ũ[lindex] ⊕ ũ[lindex + skipr]
            lindex = lindex + stepl
        end
        lindexstart = lindexstart + 1
    end
    skipr = 2 · skipr
    stepl = 2 · stepl
    ninaddergroup = ninaddergroup/2
    lindexstart = 0
end

testpolar.cc polarcode.cc: encode( )


Figure 17.11: The channel W8.

Figure 17.11 shows the channel combining for N = 8.

17.4.3 Transition Probabilities and Channel Splitting

The transition probabilities for the joint channels are expressed in terms of the iterated channel usage as

W_N(y_0^{N-1} | u_0^{N-1}) = W^N(y_0^{N-1} | u_0^{N-1} G_N).

In the case when N = 4, referring to Figure 17.9, since the two W2 channels are independent, the joint transition probability for the W4 channel is

W_4(y_0^3 | u_0^3) = W_2(y0, y1 | u0 ⊕ u1, u2 ⊕ u3) W_2(y2, y3 | u1, u3).    (17.37)

The conditioning argument in the first W2 can be stacked up as (u0 ⊕ u1, u2 ⊕ u3) = (u0, u2) ⊕ (u1, u3). On the right-hand side of this equation, since the indices in the first vector are even and the indices in the second vector are odd, this can be written as

(u0 ⊕ u1, u2 ⊕ u3) = u_{0,even}^3 ⊕ u_{0,odd}^3.

Equation (17.37) can thus be written as

W_4(y_0^3 | u_0^3) = W_2(y_0^1 | u_{0,even}^3 ⊕ u_{0,odd}^3) W_2(y_2^3 | u_{0,odd}^3).

In general, following this model, referring to Figure 17.10, the transition probabilities for W_{2N} are

W_{2N}(y_0^{2N-1} | u_0^{2N-1}) = W_N(y_0^{N-1} | u_{0,even}^{2N-1} ⊕ u_{0,odd}^{2N-1}) W_N(y_N^{2N-1} | u_{0,odd}^{2N-1}).    (17.38)

17.4 Channel Polarization, N > 2 Channels

799

We develop recursive expressions for the channel conditional probabilities for the synthetic channels W_{2N}^{(i)} in terms of the probabilities for the channels W_N^{(i)}. These probabilities are used in decoding, and help explain the nature of polarization. Generalizing (17.13) to N channels, straightforward application of the laws of probability yields, for 0 ≤ i < N,

W_N^{(i)}(y_0^{N-1}, u_0^{i-1} | u_i) = Σ_{u_{i+1}^{N-1} ∈ X^{N-1-i}} W_N(y_0^{N-1} | u_0^{N-1}) P(u_{¬i}) = (1/2^{N-1}) Σ_{u_{i+1}^{N-1} ∈ X^{N-1-i}} W_N(y_0^{N-1} | u_0^{N-1}).    (17.39)

In words, and somewhat more abstractly, this expression says that a normalized sum over bit sequences u_{i+1}^{N-1} ∈ X^{N-1-i} of W_N(y_0^{N-1} | u_0^{N-1}) gives a closed-form probability expression W_N^{(i)}(y_0^{N-1}, u_0^{i-1} | u_i), where the probability is conditioned on the first bit u_i not summed out, and the summed probability is joint with the bits u_0^{i-1} prior to u_i. The argument y_0^{N-1} (whatever it may be) is not changed in the summation.

Example 17.2 When N = 4, (17.39) gives:

i = 0:  W_4^{(0)}(y_0^3 | u_0)        = (1/2^3) Σ_{u_1^3 ∈ X^3} W_4(y_0^3 | u_0^3)
i = 1:  W_4^{(1)}(y_0^3, u_0 | u_1)   = (1/2^3) Σ_{u_2^3 ∈ X^2} W_4(y_0^3 | u_0^3)
i = 2:  W_4^{(2)}(y_0^3, u_0^1 | u_2) = (1/2^3) Σ_{u_3 ∈ X} W_4(y_0^3 | u_0^3)
i = 3:  W_4^{(3)}(y_0^3, u_0^2 | u_3) = (1/2^3) W_4(y_0^3 | u_0^3).

As i increases, the probability is a function jointly of the “previous” symbols u_0^{i-1}. This will be an important aspect of successive cancellation decoding. ◽
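To make (17.39) concrete, the split-channel probabilities can be computed by brute-force marginalization for small N. The sketch below does this for i = 2 and N = 4; the choice of a BSC(p) with p = 0.1, the received word y, and the conditioning bits are all illustrative assumptions, not values from the text:

    N = 4; p = 0.1;
    G4 = [1 0 0 0; 1 0 1 0; 1 1 0 0; 1 1 1 1];       % G4 from (17.32)
    W  = @(y,x) (1-p).^(y==x) .* p.^(y~=x);           % BSC(p) transition probability
    y  = [0 1 1 0];                                   % an arbitrary received word
    upast = [0 1];                                    % condition on u_0^1 = (0,1)
    Wi = [0 0];                                       % W_4^{(2)}(y, u_0^1 | u2), u2 = 0,1
    for u2 = 0:1
      for u3 = 0:1                                    % sum out u_{i+1}^{N-1}
        u = [upast u2 u3];
        x = mod(u * G4, 2);
        Wi(u2+1) = Wi(u2+1) + prod(W(y, x)) / 2^(N-1);
      end
    end
    Wi      % the two values of W_4^{(2)}(y_0^3, u_0^1 | u_2)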

The recursive construction of W_{2N} provides for recursive computation of the channel transition probabilities, generalizing the relation in (17.14).

Theorem 17.3 The probabilities W_{2N}^{(2i)} and W_{2N}^{(2i+1)} can be expressed in terms of the probabilities W_N^{(i)} as

W_{2N}^{(2i)}(y_0^{2N-1}, u_0^{2i-1} | u_{2i}) = (1/2) Σ_{u_{2i+1} ∈ X} W_N^{(i)}(y_0^{N-1}, u_{0,even}^{2i-1} ⊕ u_{0,odd}^{2i-1} | u_{2i} ⊕ u_{2i+1}) W_N^{(i)}(y_N^{2N-1}, u_{0,odd}^{2i-1} | u_{2i+1})    (17.40)

W_{2N}^{(2i+1)}(y_0^{2N-1}, u_0^{2i} | u_{2i+1}) = (1/2) W_N^{(i)}(y_0^{N-1}, u_{0,even}^{2i-1} ⊕ u_{0,odd}^{2i-1} | u_{2i} ⊕ u_{2i+1}) W_N^{(i)}(y_N^{2N-1}, u_{0,odd}^{2i-1} | u_{2i+1})    (17.41)

for 0 ≤ i < N.


Proof For (17.40):

W_{2N}^{(2i)}(y_0^{2N-1}, u_0^{2i-1} | u_{2i})
  = (1/2^{2N-1}) Σ_{u_{2i+1}^{2N-1} ∈ X^{2(N-i)-1}} W_{2N}(y_0^{2N-1} | u_0^{2N-1})        (basic rules of probability)
  = (1/2^{2N-1}) Σ_{u_{2i+1,odd}^{2N-1}} Σ_{u_{2i+1,even}^{2N-1}} W_N(y_0^{N-1} | u_{0,even}^{2N-1} ⊕ u_{0,odd}^{2N-1}) W_N(y_N^{2N-1} | u_{0,odd}^{2N-1})        (using (17.38))
  = (1/2) Σ_{u_{2i+1} ∈ X} [ (1/2^{N-1}) Σ_{u_{2i+2,odd}^{2N-1} ∈ X^{N-1-i}} W_N(y_N^{2N-1} | u_{0,odd}^{2N-1}) ] [ (1/2^{N-1}) Σ_{u_{2i+2,even}^{2N-1} ∈ X^{N-1-i}} W_N(y_0^{N-1} | u_{0,even}^{2N-1} ⊕ u_{0,odd}^{2N-1}) ]        (split the indices of summation)
  = (∗).

Consider the final sum in (∗): for any fixed value of u_{0,odd}^{2N-1}, as u_{2i+2,even}^{2N-1} ranges over X^{N-1-i}, the conditioning quantity u_{0,even}^{2N-1} ⊕ u_{0,odd}^{2N-1} also ranges over X^{N-1-i}. Comparing this result with (17.39), we see that the normalized summation equals

(1/2^{N-1}) Σ_{u_{2i+2,even}^{2N-1} ∈ X^{N-1-i}} W_N(y_0^{N-1} | u_{0,even}^{2N-1} ⊕ u_{0,odd}^{2N-1}) = W_N^{(i)}(y_0^{N-1}, u_{0,even}^{2i-1} ⊕ u_{0,odd}^{2i-1} | u_{2i} ⊕ u_{2i+1}).

Similarly, in the middle sum of (∗), as u_{2i+2,odd}^{2N-1} ranges over X^{N-1-i}, (17.39) shows that the corresponding normalized sum is equal to W_N^{(i)}(y_N^{2N-1}, u_{0,odd}^{2i-1} | u_{2i+1}). Substituting these into (∗) gives (17.40).

For (17.41):

W_{2N}^{(2i+1)}(y_0^{2N-1}, u_0^{2i} | u_{2i+1})
  = (1/2^{2N-1}) Σ_{u_{2i+2}^{2N-1} ∈ X^{2(N-i-1)}} W_{2N}(y_0^{2N-1} | u_0^{2N-1})
  = (1/2) Σ_{u_{2i+2,odd}^{2N-1} ∈ X^{N-i-1}} (1/2^{N-1}) W_N(y_N^{2N-1} | u_{0,odd}^{2N-1}) Σ_{u_{2i+2,even}^{2N-1} ∈ X^{N-i-1}} (1/2^{N-1}) W_N(y_0^{N-1} | u_{0,even}^{2N-1} ⊕ u_{0,odd}^{2N-1}).

Applying (17.39) to the inner and outer summations yields (17.41).        ◽

Example 17.4 Setting N = 1 in (17.40) and (17.41) yields

W_2^{(0)}(y_0^1 | u_0) = (1/2) Σ_{u_1 ∈ X} W_1^{(0)}(y_0 | u_0 ⊕ u_1) W_1^{(0)}(y_1 | u_1)
W_2^{(1)}(y_0^1, u_0 | u_1) = (1/2) W_1^{(0)}(y_0 | u_0 ⊕ u_1) W_1^{(0)}(y_1 | u_1),

which is the same as (17.14).


When N = 2:

i = 0:  W_4^{(0)}(y_0^3 | u_0)        = (1/2) Σ_{u_1 ∈ X} W_2^{(0)}(y_0^1 | u_0 ⊕ u_1) W_2^{(0)}(y_2^3 | u_1)
i = 0:  W_4^{(1)}(y_0^3, u_0 | u_1)   = (1/2) W_2^{(0)}(y_0^1 | u_0 ⊕ u_1) W_2^{(0)}(y_2^3 | u_1)
i = 1:  W_4^{(2)}(y_0^3, u_0^1 | u_2) = (1/2) Σ_{u_3 ∈ X} W_2^{(1)}(y_0^1, u_0 ⊕ u_1 | u_2 ⊕ u_3) W_2^{(1)}(y_2^3, u_1 | u_3)
i = 1:  W_4^{(3)}(y_0^3, u_0^2 | u_3) = (1/2) W_2^{(1)}(y_0^1, u_0 ⊕ u_1 | u_2 ⊕ u_3) W_2^{(1)}(y_2^3, u_1 | u_3).    ◽

The formulas (17.40) and (17.41) are abstractly represented by the expression

(W_N^{(i)}, W_N^{(i)}) → (W_{2N}^{(2i)}, W_{2N}^{(2i+1)}),

which generalizes (17.15). Figure 17.12 illustrates the channel construction process. Starting from a channel W = W_1 at level 0, each node W_N^{(i)} in the tree gives birth to two nodes W_{2N}^{(2i)} and W_{2N}^{(2i+1)}. In Figure 17.12, the node W_N^{(i)} is also portrayed as W^{b_1 b_2 ... b_n}, where b_1 b_2 ... b_n is the binary representation of i, and the length n of the sequence implicitly represents N = 2^n. Given a channel at level n at index b_1 b_2 ... b_n, the tree connects to a channel at level n + 1 at index b_1 b_2 ... b_n 0 on the upper branch, and to a channel at index b_1 b_2 ... b_n 1 on the lower branch. The root at level n = 0 is indexed by an empty sequence. The decomposition into split channels W_N^{(i)} can also be expressed in terms of information theory, generalizing the result in (17.7).


Figure 17.12: The tree of recursively constructed channels.


Figure 17.13: Tree decomposition of the information in the W4 channels.

For example, for N = 4, successively applying the chain rule of information to I(U_0^3; Y_0^3) yields

I(U_0^3; Y_0^3) = I(U_0; Y_0^3) + I(U_1^3; Y_0^3 | U_0)
              = I(U_0; Y_0^3) + I(U_1; Y_0^3 | U_0) + I(U_2^3; Y_0^3 | U_0, U_1)
              = I(U_0; Y_0^3) + I(U_1; Y_0^3 | U_0) + I(U_2; Y_0^3 | U_0, U_1) + I(U_3; Y_0^3 | U_0, U_1, U_2).    (17.42)

Again using the conditional independence among the U_i, this can be written as

I(U_0^3; Y_0^3) = I(U_0; Y_0^3) + I(U_1; Y_0^3, U_0) + I(U_2; Y_0^3, U_0, U_1) + I(U_3; Y_0^3, U_0, U_1, U_2).

These four terms correspond to the channels W4(0), W4(1), W4(2), and W4(3). The decomposition (17.42) suggests a tree representation, as shown in Figure 17.13, where the leaf nodes of the tree correspond to the four channels.

17.4.4 Channel Polarization Demonstrated: An Example Using the Binary Erasure Channel for N > 2

In this section, we provide an example analogous to that of Section 17.3.4 for the BEC with N = 4. Recall that in that section the synthetic channels W2(0) and W2(1) could be interpreted as BEC channels with transition probabilities p2(0) and p2(1), respectively. In this section, we generalize beyond this result to further demonstrate polarization. Let W be a BEC(p) channel. For the channel W4(0) : u0 → y_0^3, the table below shows different output values, where • indicates that yi has been received as transmitted and ? indicates that yi is an erasure. The symbol ✓ indicates that the received data are sufficient to decode u_0^3 and ✘ indicates that the received data are insufficient to decode u_0^3.

(y0, y1, y2, y3):

u0⊕u1⊕u2⊕u3   u2⊕u3   u1⊕u3   u3      w.p.
•             •       •       •       (1 - p)^4            ✓
?             •       •       •       p(1 - p)^3           ✘
•             ?       •       •       (1 - p)p(1 - p)^2    ✘
•             •       ?       •       (1 - p)^2 p(1 - p)   ✘
•             •       •       ?       (1 - p)^3 p          ✘
?             ?       •       •       p^2 (1 - p)^2        ✘
?             •       ?       •       (1 - p)p(1 - p)p     ✘
?             •       •       ?       p(1 - p)^2 p         ✘
•             ?       ?       •       (1 - p)p^2 (1 - p)   ✘
•             ?       •       ?       (1 - p)p(1 - p)p     ✘
•             •       ?       ?       (1 - p)^2 p^2        ✘
etc.


As this table shows, there is an erasure precluding the possibility of accurate decoding in every case except the first, so the overall erasure probability for the W4(0) channel is

p4(0) = 1 - (1 - p)^4 = 4p - 6p^2 + 4p^3 - p^4.

This probability can be understood another way. The second polarization W4(0) operation operates on the probabilities produced by the first polarization operation using a W2(0) channel, so, recalling that p2(0) = g(p), where g(p) is defined in (17.16),

p4(0) = g(g(p)) = g(2p - p^2) = 2(2p - p^2) - (2p - p^2)^2 = 4p - 6p^2 + 4p^3 - p^4.

Also consider the various W4(1) channel outputs.

(y0, y1, y2, y3, u0):

u0⊕u1⊕u2⊕u3   u2⊕u3   u1⊕u3   u3   u0      w.p.            decodable?
•             •       •       •    •       (1 - p)^4       ✓
?             •       •       •    •       p(1 - p)^3      ✓
•             ?       •       •    •       p(1 - p)^3      ✓
•             •       ?       •    •       p(1 - p)^3      ✓
•             •       •       ?    •       p(1 - p)^3      ✓
?             ?       •       •    •       p^2 (1 - p)^2   ✓
?             •       ?       •    •       p^2 (1 - p)^2   ✘
?             •       •       ?    •       p^2 (1 - p)^2   ✘
•             ?       ?       •    •       p^2 (1 - p)^2   ✘
•             ?       •       ?    •       p^2 (1 - p)^2   ✘
•             •       ?       ?    •       p^2 (1 - p)^2   ✓
?             ?       ?       •    •       p^3 (1 - p)     ✘
?             ?       •       ?    •       p^3 (1 - p)     ✘
?             •       ?       ?    •       p^3 (1 - p)     ✘
•             ?       ?       ?    •       p^3 (1 - p)     ✘
?             ?       ?       ?    •       p^4             ✘    (17.43)

The probability of a block erasure is

p4(1) = 4p^2 (1 - p)^2 + 4p^3 (1 - p) + p^4 = p^4 - 4p^3 + 4p^2.

This can be computed more simply, since this is the probability of a W2(1) BEC channel acting on a W2(0) BEC channel. Recalling that p2(1) = h(p), where h(p) is defined in (17.17),

p4(1) = h(g(p)) = (2p - p^2)^2 = p^4 - 4p^3 + 4p^2.

Forming a table similar to (17.43) for the W4(2) channel, it is determined that there is a block erasure for the individual erasure patterns

(?, ?, •, •), (•, •, ?, ?), (?, ?, ?, •), (?, ?, •, ?), (?, •, ?, ?), (•, ?, ?, ?), (?, ?, ?, ?),

which results in a block erasure probability

p4(2) = 2p^2 (1 - p)^2 + 4p^3 (1 - p) + p^4 = 2p^2 - p^4.

This can also be computed as p4(2) = g(h(p)) = 2p^2 - p^4.


For the W4(3) channel it is straightforward to show that p4(3) = h(h(p)) = p^4. These BEC channel probabilities are conveniently represented on a tree, as in Figure 17.14. This duplication pattern can be repeated, duplicating p4(0) to obtain p8(0) and p8(1), duplicating p4(1) to obtain p8(2) and p8(3), and so forth. The BEC erasure probabilities for N = 2^3 are

p8(0) = g(g(g(p)))    p8(1) = h(g(g(p)))    p8(2) = g(h(g(p)))    p8(3) = h(h(g(p)))
p8(4) = g(g(h(p)))    p8(5) = h(g(h(p)))    p8(6) = g(h(h(p)))    p8(7) = h(h(h(p))).

Figure 17.15 shows the effect of continuing to repeat the polarization construction multiple times. The vertical axis is mutual information, recalling that for a BEC I(W(p)) = 1 - p. As the code length increases, many of the channels approach a mutual information of 1 and become “good” channels. Many of the channels approach a mutual information of 0 and become “bad” channels. There are a few channels whose performance remains mediocre, neither “good” nor “bad.” This effect can be further explored as follows. The following MATLAB function computes g(p) = 2p - p^2 and h(p) = p^2 for each p in a vector passed in:


Figure 17.14: Computing the synthetic BEC erasure probabilities on a tree.


Figure 17.15: Polarization effect for the BEC channel.

    function pout = polarize(pin)
    % polarize.m: For each probability in pin, compute the W^(1) and W^(0)
    % erasure probabilities for a BEC and return them in pout
    pout = [];
    for p = pin                        % for each p passed in
      pout = [pout p^2 2*p - p^2];     % compute h(p) and g(p)
    end

For example, when this is called as polarize(.3), the output is 0.09 0.51. The effect of repeated polarization is obtained by repeated function calls, for example, polarize(polarize(0.3))

yields [0.0081 0.1719 0.2601 0.7599]. Repeating this eight times (eight polarization iterations)

polarize(polarize(polarize(polarize(polarize(polarize(polarize(polarize(.3))))))))

produces 256 probability outputs. These are plotted in the order p_256^(0) to p_256^(255) in Figure 17.16(a), and more revealingly in sorted order in Figure 17.16(b). The polarization effect becomes evident: there are channels for which the synthetic probability p_N^(i) is near 1 (and hence the capacity is near 0 bits), and channels for which the synthetic probability is near 0 (and hence the capacity is near 1 bit). There are also some mediocre channels. The effect of polarization is further demonstrated in Figure 17.16(c), which plots the sorted probabilities p_{2^16}^(i) for 16 iterations. The channel polarization effect is more complete in this case: the fraction of mediocre channels is smaller (a more abrupt transition from probability 0 to probability 1), and the railed probabilities are closer to 0 and 1. As the number of polarization iterations increases, this effect becomes increasingly strong. How many of these “good” channels are there? Let ε > 0 be a threshold, and consider the fraction of channels whose probability is less than ε. For example, take ε = 0.1. The fraction of probabilities less than ε,

#(channel probabilities < ε) / #(number of channels),

is the fraction of channels that are “good” at the ε level. For the eight-iteration probabilities of Figure 17.16(b), this fraction is 0.605 for ε = 0.1. For the 16-iteration probabilities of Figure 17.16(c), this fraction is 0.6807. As the number of polarization iterations increases, this fraction of “good” channels approaches a limit of 0.7. The channel polarization effect demonstrated in these figures indicates how reliable communication can be obtained. Out of all the communication channels available in this iterated scheme, only the best channels are used for communication. The fraction of “good” channels (having probability below some threshold ε) approaches the channel capacity of the channel. For the BEC of this example, I(W) = 1 - p. For p = 0.3, with 16 iterations the fraction of good channels at ε-level 0.1 is already 0.6807, or 0.6807/0.7 ≈ 97% of capacity. The nature of the BEC channel makes it easy to compute the parameters of the synthesized channels. For other channels (such as BSC or AWGN), computing I(WN) is more difficult, and analysis will resort to bounds. This is done in Section 17.5.
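As a quick numerical check of the fractions quoted above (a small sketch that simply reuses the polarize function given earlier; the threshold ε = 0.1 and p = 0.3 follow the text):

    p = 0.3; epsilon = 0.1;
    probs = p;
    for k = 1:16
      probs = polarize(probs);      % 2^k synthetic channel erasure probabilities
    end
    frac_good = sum(probs < epsilon) / numel(probs)   % approximately 0.68, approaching I(W) = 0.7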

17.4.5 Polar Coding

As Figure 17.16 suggests, there is a set of channels which has low crossover probability (high capacity) and other channels with high crossover probability (low capacity). In the encoding operation, each input bit ui corresponds to a different channel WN(i) . Thus, in the encoding operation expressed in terms


Figure 17.16: BEC probability for polarized channels, p = 0.3. (a) Unsorted probabilities, 8 iterations; (b) sorted probabilities, 8 iterations; (c) sorted probabilities, 16 iterations.

of the generator matrix,

x_0^{N-1} = u_0^{N-1} G_N,    (17.44)

the ith row of G_N corresponds to encoding into the channel W_N^{(i)}. Let A denote an arbitrary subset of {0, 1, ..., N - 1}, where the elements of A select the high-capacity channels to be used, and let A^c denote its set complement. Let G_N(A) denote the submatrix of G_N formed by the rows with indices in A. Also, let u_A denote the elements of u with indices in A. The encoding operation (17.44) can be expressed as

x_0^{N-1} = u_A G_N(A) ⊕ u_{A^c} G_N(A^c).    (17.45)

Equation (17.45) represents the polar coding operation: the set A indexes the set of channels selected for transmission (i.e., having high capacity). Thus, A is referred to as the information set. The bits in u_A are the bits that are encoded for transmission. The bits in u_{A^c} are frozen: not used to transmit information, and known to both the transmitter and the receiver. Since u_A G_N(A) produces a subgroup of F_2^N, adding the offset u_{A^c} G_N(A^c) produces a coset of this subgroup. The polar code is therefore said to be a coset code, that is, a linear code offset by a vector.

17.4 Channel Polarization, N > 2 Channels Let K = ||. A polar code (more generally, a coset code) is specified by the parameter list (N, K, , uc ). The rate of the code is R = K∕N. The vector uc ∈  N−K is called the set of frozen bits. Example 17.5 For encoding with the G4 matrix in (17.32), suppose that  = {1, 3} and u = (0, 1). The encoding operation of this (4, 2, {1, 3}, (0, 1)) code is represented with [ ] [ ] 1 0 1 0 1 0 0 0 3 + (0, 1) . x0 = (u1 , u3 ) 1 1 1 1 1 1 0 0 ◽

In the decoding operation, bits which are frozen do not need to be decoded, so û_i = u_i if i ∈ A^c. Nevertheless, the likelihood function computations leading to such bits u_i are usually performed, since these likelihood functions are usually needed for decoding other bits.

17.4.6 Tree Representation

Given the recursive structure of the polar code, it is natural to use a tree to represent it. We demonstrate the general principles for an encoder with N = 8. Figure 17.17(a) shows the encoding structure with the input bits in bit-reverse permuted order, as in Figure 17.11. Bits that are frozen are indicated in gray font; bits that are unfrozen (u5 , u6 , and u7 ) are in black font. The outputs yi are indicated to show the order (even though the channels are not shown). In Figure 17.17(b), the structure is the same, but the inputs have been reordered so that the bits are numbered sequentially, rather than in bit-reverse order. Note that the output sequence is reordered. In Figure 17.17(c), the input bits are represented with colored nodes. The black nodes represent unfrozen bits and the white nodes represent frozen bits. Also, the adders in each stage have been grouped together with boxes. These boxes will represent nodes in the tree. In Figure 17.17(d), the first stage of converting the groups to nodes in the tree is accomplished. The nodes are colored according to the frozen nature of their children. A node whose children are all frozen (white) is a white node. A node whose children are all unfrozen (black) is a black node. A node whose children are partially frozen and partially unfrozen is gray. This replacement of groups with nodes on the tree continues in Figure 17.17(e). As before, parent of a node which is gray is gray. The completed tree is portrayed in Figure 17.17(f). The decoding algorithm passes messages along the edges of the tree. For some edges, the messages passed are vectors (not scalars). In the representations of the tree in Figure 17.17, thickness of the line suggests that the closer to the root, the bigger the vector that is conveyed along the edge. A message along an edge with multiple components is denoted using indexing using, e.g., [i], such as 𝛼𝑣 [i]. Figure 17.18(a) shows the tree rotated by 90∘ to portray the root at the top of the tree, as is conventional. Figure 17.18(b) shows a labeling of a typical node 𝑣. The parent of the node is denoted p𝑣 , its children are 𝑣l and 𝑣r . The message to node 𝑣 from its parent is denoted 𝛼𝑣 ; the message of node 𝑣 to its parent is 𝛽𝑣 . Similarly, messages to and from its left and right children are, respectively, 𝛼𝑣l , 𝛽𝑣l , 𝛼𝑣r , and 𝛽𝑣r , as shown. 17.4.7

Successive Cancellation Decoding

In Section 17.3.5, we met SC decoding for N = 2, based on likelihood ratio and log-likelihood ratio computations. In this section, we extend these concepts to arbitrary N. The log-likelihood function is used in the decoding algorithms. Based on the recursive nature of the channel transition function, this can also be computed recursively. Let

λ_N^{(i)}(y_0^{N-1}, û_0^{i-1}) = W_N^{(i)}(y_0^{N-1}, û_0^{i-1} | u_i = 0) / W_N^{(i)}(y_0^{N-1}, û_0^{i-1} | u_i = 1)    (17.46)



Figure 17.17: A tree representation of the decoder structure. (a) Basic encoding structure; (b) basic encoding structure, sequential bit order; (c) identifying frozen and unfrozen nodes, and the groupings of adders; (d) first stage of converting groupings into nodes, with nodes colored to match colorings of children; (e) next stage of converting groupings into nodes; (f) final stage of converting groupings into nodes.

and let

L_N^{(i)}(y_0^{N-1}, û_0^{i-1}) = log λ_N^{(i)}(y_0^{N-1}, û_0^{i-1}).

hi (yN−1 , û i−1 0 ) 0

⎧ ⎪0 =⎨ ⎪1 ⎩

, û i−1 )≥0 if LN(i) (yN−1 0 0

(17.47)

otherwise.

For decoding, if i ∈ c (the set of frozen bits), then the decoded value is the frozen bit value. Otherwise, it is determined by the h function: ⎧ ⎪ui û i = ⎨ ⎪hi (yN−1 , û i−1 ) 0 0 ⎩

i ∈ c i ∈ .

(17.48)

17.4 Channel Polarization, N > 2 Channels

809

pυ 𝛼υ 𝛽υ υ

𝛼υl υl (a)

𝛽υl

𝛼υr

𝛽υr υr

(b)

Figure 17.18: Rotated tree and messages in the tree. (a) Rotating the tree to put the root node at the top; (b) a portion of a tree, showing messages passed in a local decoder. The two aspects of decoding for frozen and nonfrozen bits described in (17.47) and (17.48) are ̃ defined as summarized in the decoding function h, { i ∈ c ui N−1 i−1 ̃ , u ̂ ) = û i = h(y 0 0 h(yN−1 , û i−1 ) i∈ 0 0 Note that û i depends on the previously estimated values û i−1 . This is the essence of SC: bits are 0 estimated in order û 0 , û 1 , û 2 , . . . , û N−1 , with the estimate û i being based upon previously determined bits. Under thepolarization idea, since the polarized channels used are assumed to be good, each of the previously determined bits û i−1 are assumed to be good. 0 The formulas (17.40) and (17.41) allow the likelihood ratios (17.46) to be recursively expressed in terms of likelihood ratios of subsequences of the received data. Theorem 17.6 The likelihood ratio (17.46) can be written recursively in terms of likelihood ratio for smaller values of N: N∕2−1

(yN−1 , û 2i−1 ) 𝜆(2i) 0 0 N

=

(y 𝜆(i) N∕2 0

, û 2i−1 ⊕ û 2i−1 )𝜆(i) (yN−1 , û 2i−1 ) + 1 0,even 0,odd N∕2 N∕2 0,odd

N∕2−1

𝜆(i) (y N∕2 0

N∕2−1

𝜆(2i+1) (yN−1 , û 2i−1 ) = [𝜆(i) (y 0 0 N N∕2 0

, û 2i−1 ⊕ û 2i−1 ) + 𝜆(i) (yN−1 , û 2i−1 ) 0,even N∕2 N∕2 0,odd 0,odd

, û 2i−1 ̂ 2i−1 )]1−2̂u2i−1 𝜆(i) (yN−1 , û 2i−1 ). 0,even ⊕ u 0,odd N∕2 N∕2 0,odd

(17.49) (17.50)

Proof For (17.49), substituting i → 2i in (17.46) then employing (17.40) gives 𝜆N(2i) (yN−1 |̂u2i−1 ) 0 0

=

=

û 2i−1 |u2i = 0) WN(2i) (yN−1 0 0 WN(2i) (yN−1 , û 2i−1 |u2i = 1) 0 0 (i) (i) (i) (i) (a|0 ⊕ 0)WN∕2 (b|0) + WN∕2 (a|0 ⊕ 1)WN∕2 (b|1) WN∕2 (i) (i) (i) (i) WN∕2 (a|1 ⊕ 0)WN∕2 (b|0) + WN∕2 (a|1 ⊕ 1)WN∕2 (b|1) N∕2−1

(from (17.40) with a = (y0 (i)

(i)

WN∕2 (a|0) WN∕2 (b|0)

=

(i) WN∕2 (a|1) (i)

(i)

(i) WN∕2 (b|1)

N∕2−1

=

(i)

WN∕2 (b|1)

WN∕2 (a|1) WN∕2 (b|0) (i) WN∕2 (a|1)

𝜆(i) (y N∕2 0

+1

(i)

+

(i) (i) WN∕2 (a|1)WN∕2 (b|1) (i)

WN∕2 (a|0) WN∕2 (b|1) (i) WN∕2 (a|1)

(i) WN∕2 (b|1)

(i) (i) WN∕2 (a|1)WN∕2 (b|1)

, û 2i−1 ⊕ û 2i−1 )𝜆(i) (yN−1 , û 2i−1 ) + 1 0,even 0,odd N∕2 N∕2 0,odd

N∕2−1

𝜆(i) (y N∕2 0

, û 2i−1 ̂ 2i−1 ) and b = (yN−1 , u2i−1 )) 0,even ⊕ u N∕2 0,odd 0,odd

, û 2i−1 ⊕ û 2i−1 ) + 𝜆(i) (yN−1 , û 2i−1 ) 0,even N∕2 N∕2 0,odd 0,odd

= (17.49)

810

Polar Codes For (17.50), substituting i → 2i + 1 in (17.46) and employing (17.41) gives 𝜆(2i+1) (yN−1 |̂u2i 0) 0 N

=

=

, û 2i |u = 0) WN(2i+1) (yN−1 0 0 2i+1 WN(2i+1) (yN−1 , û 2i |u = 1) 0 2i+1 0 N∕2−1

(i) , û 2i−1 ⊕ û 2i−1 |̂u ⊕ 0) WN∕2 (yN−1 , û 2i−1 |0) 0,even N∕2 0,odd 0,odd 2i

N∕2−1

(i) , û 2i−1 |1) , û 2i−1 ⊕ û 2i−1 |̂u ⊕ 1) WN∕2 (yN−1 N∕2 0,odd 0,even 0,odd 2i

(i) (y0 WN∕2 (i) WN∕2 (y0

.

If û 2i = 0, then the first ratio is a likelihood ratio; if û 2i = 1, the first ratio is the reciprocal of a likelihood ratio, so N∕2−1

(i) 𝜆(2i+1) (yN−1 |̂u2i 0 ) = [𝜆N∕2 (y0 0 N

|̂u2i−1 ̂ 2i−1 )]1−2̂u2i 𝜆(i) (yN−1 |̂u2i−1 ) = (17.50). 0,even ⊕ u 0,odd N∕2 N∕2 0,odd



It will be convenient to express these formulas using log likelihoods. Equation (17.49) can be written as N∕2−1

LN(2i) (yN−1 |̂u2i−1 )= 0 0

(i) (y0 exp[LN∕2

N∕2−1

(i) exp[LN∕2 (y0

(i) |̂u2i−1 ⊕ û 2i−1 )] + exp[LN∕2 (yN−1 |̂u2i−1 )] 0,even N∕2 0,odd 0,odd

N∕2−1

(i) = chk(LN∕2 (y0

N∕2−1

(i) (y0 = LN∕2

(i) |̂u2i−1 ⊕ û 2i−1 ) + LN∕2 (yN−1 |̂u2i−1 )] + 1 0,even N∕2 0,odd 0,odd

(i) |̂u2i−1 ̂ 2i−1 ), LN∕2 (yN−1 |̂u2i−1 )) 0,even ⊕ u N∕2 0,odd 0,odd

(i) |̂u2i−1 ̂ 2i−1 ) ⊞ LN∕2 (yN−1 |̂u2i−1 ). 0,even ⊕ u N∕2 0,odd 0,odd

(17.51) (17.52)

To use other notations, this is the s or “upper” computation, also computable using the tanh rule. Also, û 2i

N∕2−1

(i) (i) N−1 2i−1 LN(2i+1) (yN−1 |̂u2i u0,odd ) ± LN∕2 (y0 0 ) = LN∕2 (yN∕2 |̂ 0

|̂u2i−1 ̂ 2i−1 ). 0,even ⊕ u 0,odd

(17.53)

This is the t or “lower” computation. These recursions end with L2(0) and L2(1) , which are computed using the formulas for N = 2 in (17.24) and (17.25). In all of these computations, whether (17.52) or (17.53) is used depends on whether the likelihood is for an even-numbered bit or an odd-numbered bit. 17.4.8
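The two combining operations can be written compactly in code. The sketch below (the helper names chkLLR and lowerLLR are ours, not the book's; they would live in separate files or at the end of a script) mirrors (17.52) and (17.53): the “upper” s computation is the ⊞/tanh rule, and the “lower” t computation adds or subtracts according to the previously decided bit feeding the XOR:

    function L = chkLLR(L1, L2)
    % "Upper" (s) computation of (17.52): L = L1 [+] L2, the chk / tanh rule
      L = 2*atanh(tanh(L1/2) .* tanh(L2/2));
    end

    function L = lowerLLR(bit, L1, L2)
    % "Lower" (t) computation of (17.53): add or subtract L1 according to the
    % already-decided bit; bit = 0 gives L2 + L1, bit = 1 gives L2 - L1
      L = L2 + (1 - 2*bit) .* L1;
    end

These are the same operations that Algorithm 17.2 implements in the log domain as LLRupper and LLRlower.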

SC Decoding from a Message Passing Point of View on the Tree

Conventional SC decoding can be expressed as message passing on the tree introduced in Section 17.4.6. For a node 𝑣 in the tree, let d𝑣 denote the depth of the tree, where d𝑣 = 0 for the root node. The operations that take place at node 𝑣 are said to constitute a decoder. When a non-leaf decoder 𝑣 is activated, it computes the message 𝛼𝑣l (i) = 𝛼𝑣 [2i] ⊞𝛼𝑣 [2i + 1]

for i = 0, 1, . . . , 2n−d𝑣 −1 − 1

(17.54)

and sends 𝛼𝑣l to its left child 𝑣l . This implements (17.52). The decoder at 𝑣 then waits until it receives the codeword 𝛽𝑣l from 𝑣l . This corresponds to the bits that are sent toward the output. Node 𝑣 then calculates the messages 𝛼𝑣r [i] = 𝛼𝑣 [2i](1 − 2𝛽𝑣l [i]) + 𝛼𝑣 [2i + 1] 𝛽𝑣 [i] l

= 𝛼𝑣 [2i + 1] ± 𝛼𝑣 [2i]

for i = 0, 1, . . . , 2n−d𝑣 −1 − 1

(17.55)


and sends it to its right child 𝑣r . This implements (17.53). The local decoder at 𝑣 then waits until it receives the codeword 𝛽𝑣r from 𝑣r . Node 𝑣 then calculates the message to its parent as 𝛽𝑣 [2i] = 𝛽𝑣l [i] ⊕ 𝛽𝑣r [i]

for i = 0, 1, . . . , 2n−d𝑣 −1 − 1

𝛽𝑣 [2i + 1] = 𝛽𝑣r [i]

and sends this to p𝑣 . This computes the bit addition/copying operations. At this point, the computations in the set of nodes V𝑣 terminate as these local decoders will never be activated again through the rest of the decoding. When 𝑣 is a leaf node, the local decoder at 𝑣 takes the incoming message (log likelihood) and thresholds it to obtain a bit if it is not frozen, or uses the frozen value if it is frozen. This bit forms the message to the parent of 𝑣. 17.4.9

A Decoding Example with N = 𝟒

In this section, we demonstrate how the decoding works with N = 4, illustrating the flow of information both on the encoding diagram and also in the tree. The two different perspectives inform the overall process. While the computations are identical, the different perspectives might lead to different implementations. The log-likelihood formulas developed above are applied to decoding when N = 4. The processing steps are summarized in Figure 17.20. Before these computations, the LLRs Lyi , i = 0, 1, . . . , N − 1 are computed. Step 4-0 i = 0 using (17.52), Figure 17.20(a): L4(0) (y30 ) = L2(0) (y10 ) ⊞ L2(0) (y32 ). To compute this, L2(0) (y10 ) and L20 (y32 )) are recursively computed as L2(0) (y10 ) = Ly0 ⊞ Ly1

L2(0) (y32 ) = Ly2 ⊞ Ly3 .

In Figure 17.20(a), the computations are shown using the brown lines leading from the likelihoods of the received values (on the right) to the input symbols (on the left). These computations ̃ (0) (y3 )) (the green dot). produce the estimate û 0 = h(L 0 4 On the tree in Figure 17.19(b), the messages are as follows:

a

𝛼al 𝛽al

b

𝑢ˆ 0

y0

𝑢ˆ 1

y1

𝑢ˆ 2

y2

𝑢ˆ 3

y3

(a)

𝛼bl

𝛽ar

𝛽br

c

𝛼cl

𝛼br 𝛽bl

𝑢ˆ 0

𝛼ar

𝑢ˆ 1

𝑢ˆ 2

𝛽cl

𝛼cr 𝛽cr

𝑢ˆ 3

(b)

Figure 17.19: Encoder for N = 4 and its associated tree. (a) Encoder illustrating node groupings; (b) decoding tree for N = 4 with messages labeled.


Polar Codes Node a has the incoming messages (not shown in the figure) 𝛼𝑣 = (Ly0 , Ly1 , Ly2 , Ly3 ). The message that node a sends to its left child b is the vector (see (17.54)) 𝛼al = (𝛼a [0] ⊞𝛼a [1], 𝛼a [2] ⊞𝛼a [3]) = (Ly0 ⊞ Ly1 , Ly2 ⊞ Ly3 ) = (L2(0) (y10 ), L2(0) (y32 )). The message that node b sends to its left child is 𝛼bl = 𝛼al [0] ⊞𝛼al [1] = L4(0) (y30 ) = Ly0 ⊞ Ly1 ⊞ Ly2 ⊞ Ly3 . At node û 0 , a leaf node, the bit value is determined and transmitted back as the message that û 0 sends to its parent b, ̃ b ) = û 0 . 𝛽bl = h(𝛼 l Step 4-1 i = 1 using (17.53), Figure 17.20(b): û 0

L4(1) (y30 |̂u0 ) = L2(0) (y32 ) ± L2(0) (y10 ). The sum/difference of the likelihoods L2(0) (y32 ) and L2(0) (y10 ) (both already computed in Step 4-0, brown lines) are controlled by û 0 (computed in Step 4-0, green line) to produce the cyan line and ̃ (1) (y3 |̂u0 )) (green dot). the estimate û 1 = h(L 0 4 Observe that the order of the bits decoded is numeric order: first u0 , then u1 , etc., even that is not the order they appear on the graph. In terms of the tree, the message that b sends to its right child is 𝛽b

l

𝛼br = 𝛼al [1] ± 𝛼al [0] = L4(1) (y30 |̂u0 ). At node û 1 , a leaf node, the bit value is determined and transmitted back as the message that û 1 sends to its parent b: ̃ b ) = û 1 . 𝛽br = h(𝛼 r At this point, the subtree rooted at node b is complete, so it can send a message back to its parent. This message is based on the pair of bits from this subtree, 𝛽al = (̂u0 ⊕ û 1 , û 1 ). Step 4-2 i = 2 using (17.52), Figure 17.20(c): L4(2) (y30 |̂u10 ) = L2(1) (y10 , û 0 ⊕ û 1 ) ⊞ L2(1) (y32 , û 1 ). The log-likelihood functions L2(1) (y10 |̂u0 ⊕ û 1 ) and L2(1) (y32 |̂u1 ) are computed using (17.25): û 0 ⊕̂u1

û 1

L2(1) (y10 |̂u0 ⊕ û 1 ) = Ly1 ± [Ly0 ] L2(1) (y32 |̂u1 ) = Ly3 ± Ly2 .


The sum û 0 ⊕ û 1 (green lines) is used to control the sum/difference in L2(1) (y10 , û 0 ⊕ û 1 ). Also, û 1 (green line) is used to control L2(1) (y32 , û 1 ). These are then used in a ⊞ computation to produce û 2 : ̃ (2) (y3 , û 1 )). û 2 = h(L 0 0 4 In terms of the tree, the message that a sends to its right child c is the vector 𝛼ar = (L2(1) (y10 |̂u0 ⊕ û 1 ), L2(1) (y32 |̂u1 )). The message that c sends to its left child is 𝛼cl = 𝛼ar [0] ⊞𝛼ar [1] = L4(2) (y30 |̂u10 ). At node û 2 , a leaf node, the bit value is determined and transmitted back as the message that û 2 sends to its parent c: ̃ c ) = û 1 . 𝛽cl = h(𝛼 l Step 4-3 i = 3 using (17.53), Figure 17.20(d): û 2

L4(3) (y30 |̂u20 ) = L1(1) (y32 |̂u1 ) ± L2(1) (y10 |̂u0 ⊕ û 1 ). Using previously computed L1(1) (y32 |̂u1 ) and L2(1) (y10 |̂u0 ⊕ û 1 ), compute the sum/difference controlled by û 2 . In terms of the tree, the message that c sends to its right child is û 2

𝛼cr = 𝛼ar [0] ± 𝛼ar [1] = L4(3) (y30 , û 20 ). At this point there is no need to send further information back up the tree.

17.4.10

A Decoding Example with N = 𝟖

The example in this section provides a larger illustration of the successive cancellation algorithm with N = 8 and details some data structures which can be used for an efficient decoding algorithm, that require only O(N) space. The decoding tree labeled with the message is shown in Figure 17.21. Figure 17.22 illustrates the encoding operation. On the left, the input bits are numbered using an index j. After passing through a bit reverse permutation, the bits are numbered using an index i, which is listed in numerical order. The decoding proceeds in numerical order of j, j = 0, 1, . . . , N − 1. This means that the order of i is 0, 4, 2, 6, 1, 5, 3, 7. Decoding operations are performed once per bit in this order. LLRs are propagated from right to left. As bits are determined, they are propagated left to right as necessary. This general rule is tempered by the fact that it is not necessary at every step to propagate the LLRs all the way from the far right-hand side, nor is it necessary to propagate all the bits all the way to the far right-hand side. As described above, the LLR computations involve two types. For the “upper” branches of the basic blocks, the LLR computation involves the s function. For the “lower” branches of the basic blocks, the LLR computation involve the t function. In the examples, results of computing s are shown using brown lines and results of computing t are shown using cyan lines. Known bits and values propagated from bits are shown using green lines. There are two overall steps to the algorithm for each j: LLR computation, which proceeds from right to left in the encoding diagram, and from top to bottom in the tree representation, and bit propagation


(a)

(b)

(a)

(d)

Figure 17.20: Decoding steps for N = 4. (a) L4(0) ; (b) L4(1) ; (c) L4(2) ; (d) L4(3) . (Color versions of the page can be found at wileywebpage or https://github.com/tkmoon/eccbook/colorpages.)

Figure 17.21: Decoding tree for N = 8 with messages labeled.

in which bit values on the left are propagated as far to the right in the encoding diagram and from bottom to top in the tree, as needed for future computations. There are two simple data structures used in the decoding algorithm. The first is an array of length 2N − 1 which contains the LLR values. The last N values, LLR(N-1), . . . , LLR(2N − 2) (using 0-based indexing) hold the channel input LLRs Lyi , i = 0, 1, . . . , N − 1. The other LLR values contain results of LLR computations (either s or t computations), varying in position depending on which bit is being decoded. This makes for a memory-efficient implementation. There is also an array BITS of size 2 × (N − 1) which hold bit information propagating back through the network. A BITS whose first index is 0, such as BITS(0,0), corresponds to an upper branch going into a ⊕. A BITS whose first index is 1, such as BITS(1,0), corresponds to a lower branch going into a ⊕. The second index in BITS indicates which LLR value the bit is associated with. As bits become available, they are propagated to the right to populate the BITS data. Like the LLR data


j=4

i=1

1

s

i=2

2

t

i=3

3

i=4

4

i=5

5

i=6

6

i=7

7

j=2 j=6 j=1 j=5 j=3

Bit reverse permutation

←LLRs

j=0

Bits→ i=0

j=7

Level 0

Level 1 adders

Level 2 Level 3 adders adders

Figure 17.22: The structure used for the decoding example.

structure, the bit values are reused as possible for efficiency in storage. (Data structures described here are based on code derived from [459].) We now describe the algorithm in detail, referring to Figures 17.23, 17.24 and 17.25. The description unfolds as the various aspects of the algorithm are uncovered. 0-based indexing is used is used throughout. j = 0, i = 0 Figure 17.23(a, b). Update LLR Using (17.52), L8(0) (y70 ) = L4(0) (y30 ) ⊞L4(0) (y74 ), ⏟⏟⏟ LLR(0)

where L4(0) (y30 ) = L2(0) (y10 ) ⊞ L2(0) (y32 ) ⏟⏟⏟ ⏟⏟⏟ ⏟⏟⏟ LLR(1) L4(0) (y74 )

⏟⏟⏟ LLR(2)

=

LLR(3) LLR(4) (0) 5 L2 (y4 ) ⊞ L2(0) (y76 )

⏟⏟⏟

⏟⏟⏟

LLR(5)

LLR(6)

and the L2 quantities are each computed using the chk computation in (17.24). The tree indicated with brown lines propagates log-likelihood information from the received values to the node 0. In the figure, the shaded ⊕s indicate that computations take place at those nodes, s (i.e., chk) computations in this case since these are all upper branches. After propagating to the left, a bit value û 0 is computed (indicated by the green dot). The LLR for this bit, L8(0) at level 0, is contained in LLR(0). At the next level, LLR values are in LLR(1) and LLR(2). At the next level, LLR values are in LLR(3) through LLR(6), and the input level on the right LLR values are in LLR(7) through LLR(14). In general, at a level number level (from 0 on the left to through n on the right) the LLR values associated with that level have indexes ranging from 2level − 1 through 2level+1 − 2.


(a)

(b)

(c)

(d)

(e)

(f)

Figure 17.23: Successive cancellation steps, j = 0, 1, 2, i = 0, 4, 2. (a) LLR path for bit j = 0. i = 0, level1 = 3; (b) bit propagation, j = 0, i = 0; (c) LLR path for bit j = 1. i = 4, level1 = 1; (d) bit propagation j = 1. i = 4, level0 = 2; (e) LLR path for bit j = 2, i = 2. level1 = 2; (f) bit propagation j = 2, i = 2. (Color versions of the page can be found at wileywebpage or https://github.com/tkmoon/eccbook/colorpages.).

In terms of messages on the tree in Figure 17.21, the message that node a computes and sends to its left child b is the vector 𝛼al = (Ly0 ⊞ Ly1 , Ly2 ⊞ Ly3 , Ly4 ⊞ Ly5 , Ly6 ⊞ Ly7 ) = (L2(0) (y10 ), L2(0) (y32 ), L2(0) (y54 ), L2(0) (y76 )). The message that node b sends to its left child d is the vector 𝛼bl = (𝛼al [0] ⊞𝛼al [1]), 𝛼al [2] ⊞ 𝛼al [3]) = (L4(0) (y30 ), L4(0) (y74 )).


The message that node d sends to its left child û 0 is 𝛼dl = 𝛼bl [0] ⊞ 𝛼bl [1] = L8(0) (y70 ). At node û 0 , a child node, the bit is estimated and its value is passed up to the parent node, ̃ d ) = û 0 . 𝛽dl = h(𝛼 l Update Bits At this point, there are no bits to be added to the bit just found. The presence of this bit is indicated with a green line on the graph. The bit value just obtained in stored in BITS(0,0). The first index indicates that this is an “upper” bit into an adder. The second index indicates that this value is associated with the first (“zeroth”) element on the graph, corresponding to LLR(0). Observations: For future reference, when i < N∕2, the MSB of i is 0 (the LSB of j is 0; j is even), the line is in the the top half of the figure, there is a ⊕ immediately connected to it (a Level 1 adder in Figure 17.22) whose other input is not yet known. So there is no further propagation of the bits. Save the bit value in BITS(0,0). Furthermore, at the next time step, j is odd, the LSB of j is 1, the MSB of i is 1, and we get the line connected to the other input of the Level 1 adder. The number of levels into the tree for which the likelihood is recursively computed is 3 in this case — it was necessary to go all the way to the root of the tree (the input likelihoods). j = 1, i = 4 Figure 17.23(c, d). Update LLR Using (17.53), û 0

L8(1) (y70 |̂u00 ) = L4(0) (y74 ) ± L4(0) (y30 ), where L4(0) (y74 ) and L4(0) (y30 ) are already available in LLR(2) and LLR(1), respectively, from the previous step, as indicated with the brown lines coming in from the right. These combine with the estimated bit û 0 into the ⊕ (shaded) to compute the LLR shown on the cyan line, a lower branch. There is potential confusion in the figure, since the lower line on the ⊕ is both an input (from LLR(2)) and an output LLR(0). This is a visual portrayal of the fact that Ly0 is an input

connected to the lower terminal in the ⊕, and L8(1) is an output connected to the lower terminal in the ⊕. In order to show both lines, the brown line is portrayed with greater width.

The bit determined by L8(1) is stored in BITS(1,0), where the first index 1 indicates that the bit came from a lower t computation. Observations: For this bit it is not necessary to go any further into the tree than a level 1 adder to get the LLR information. Note that at the previous step, when j = 0 and i = 0, the MSB of i was 0. At the present step, when j = 1 and i = 4, the MSB of i is 1, and both inputs to the adder at level 1 are available to compute the needed LLR. If the first bit (MSB) of i is 1, then information from the upper branch computed at the previous value of j (with MSB i = 0 there) is available to compute at an ⊕ at level 1. As stated, it was only necessary to go to Level 1 of the tree. The depth of the tree necessary is determined by a variable level1, which may be determined as follows. Write down the binary representation of the index i. Then level1 is the index of the first 1 in the representation starting with the MSB of i, or level1 = n if i = 0. It is only necessary to compute LLRs from level level1 down to 1. Higher levels are inactive. This results in a savings of computation. In the code provided, the level1 is computed in the same function that computes the bit-reversal of j.


Polar Codes For this computation at i = 4 (level1 = 1), the LLR was computed using a lower (t). In general, the t computation is performed at the highest level for that step, and LLR computations at lower levels use the upper (s) formula. In terms of messages on the tree, the message that node d send to its right child û 1 is û 0

𝛼dr = 𝛼bl [1] ± 𝛼bl [1]. At û 1 the bit is detected and the message is sent back ̃ d ) = û 1 . 𝛽dr = h(𝛼 r All the children of d have sent their messages, so d can send the message to its parent, which consists of a pair of bits 𝛽bl = (̂u0 ⊕ û 1 , û 1 ). Update Bits The shaded ⊕ in Figure 17.23(d) has both inputs available since BITS(0,0) contains the upper value computed when i = 0 and the bit value just computed is available, stored in BITS(1,0). The sum is computed and stored in BITS(0,1). The index 0 here indicates that this is a bit on an upper branch, and the 1 indicates that this corresponds to the same branch as LLR(1) (compare with Figure 17.23(c)). The function that computes these bit sums also propagates bit values forward for use in further bit computations. Thus, the value BITS(0,2) is assigned, equal to BITS(1,0). Note that BITS(0,2) is on the same line as LLR(2). When bits are propagated to the right (that is, i ≥ N∕2) the number of levels to propagate the bits is determined by the level0 variable. level0 is the index of the first 0 in the binary representation of i, starting with the MSB. The highest level to propagate to through level0 1. In this case, bits are propagated through the Level 1 adders. j = 2, i = 2 Figure 17.23(e, f). Update LLR With i = 010, level1=2. Using (17.52), LLR(0) = L8(2) (y70 |̂u10 ) = L4(1) (y30 |̂u10,even ⊕ û 10,odd ) ⊞L4(1) (y74 |̂u10,odd ) = L4(1) (y30 |̂u0 ⊕ û 1 ) ⊞ L4(1) (y74 |̂u1 ). The L4(1) are not available, so they must be computed at the level1 = 2 adders, using (17.53): û 0 ⊕̂u1

LLR(1) = L4(1) (y30 |̂u0 ⊕ û 1 ) = L2(0) (y32 ) ± L2(0) (y10 ) û 1

LLR(2) = L4(1) (y74 |̂u1 ) = L2(0) (y76 ) ± L2(0) (y54 ). These are both lower (t) computations. Note that the output values LLR(1) and LLR(2) are on different lines than previously. Specifically, in Figure 17.23(a), LLR(1) and LLR(2) were on lines i = 0 and i = 4. Since those values of i have been used (above) and are not needed now, it is permissible to reuse these memory locations, so LLR(1) and LLR(2) are reassociated with lines i = 2 and i = 6. With two levels of LLR computation, we see that the first level (level 2) is lower (t), and the second (level = 1) is upper (s).


In terms of the decoding tree, the message that node b sends to its right child e is the vector û 0 ⊕̂u1

û 1

𝛼br = (𝛼al [1] ± 𝛼al [0], 𝛼al [3] ± 𝛼al [2]) = (L4(1) (y30 , û 0 ⊕ û 1 ), L4(1) (y74 , û 1 )). The message that node e sends to its left child is 𝛼el = 𝛼br [0] ⊞𝛼br [1]. At the leaf node û 2 , the bit is detected and sent back ̃ e ) = û 2 . 𝛽el = h(𝛼 l Update Bits In general, lines for i < N∕2 occur in the in the top half of the structure (where the MSB = 0), and are connected immediately to a ⊕ at level 1. Since to this point the other input of the adder (from a corresponding line where the MSB) is not done yet, no bit propagation is accomplished. All the update bits stage is to save the current bit estimate û 1 into BITS(0,0) in preparation for the next step. j = 3, i = 6 Figure 17.24(a, b). Update LLR Using (17.53), û 2

L8(3) (y70 |̂u20 ) = L4(1) (y74 |̂u1 ) ± L4(1) (y30 |̂u0 ⊕ û 1 ). Both of the L4(1) values are available from the previous step, so only a single level of computation is necessary. (The number of levels is level1 = 1.) In terms of messages on the tree, the message that node e sends to its right child û 3 is 𝛼

j

𝛼er = 𝛼br [1]±[0] = L8(3) (y0 7, û 20 ). br

The leaf node û 3 detects the bit and sends it back to its parent ̃ e ) = û 3 . 𝛽er = h(𝛼 r The children of node e have now sent their messages, so e can send the message to its parent, which is a pair of bits 𝛽br = (̂u2 ⊕ û 3 , û 3 ). And now the children of node b have sent their messages, so b can send the message to its parent, which consists of 4 bits 𝛽al = (̂u0 ⊕ û 1 , û 1 , û 2 ⊕ û 3 , û 3 ). Update Bits level0=3, so bits are propagated through the Level 2 adders. The current bit û 3 is stored in BITS(1,0), and is added to BITS(0,0) which is presently associated with û 2 from that last step i = 2, to produce BITS(1,1). The bit in BITS(1,0) is also propagated forward into BITS(1,2) (on the same line as LLR(2)). At the next level, the bits output of the two shaded ⊕s are stored in BITS(0,3) and BITS(0,5). Also, the bits in BITS(1,1) and BITS(1,2) are propagated forward as BITS(0,4) and BITS(0,6). (These bits are never used.)

820

Polar Codes

(a)

(b)

(c)

(d)

(e)

(f)

Figure 17.24: Successive cancellation steps, j = 3, 4, 5, i = 6, 1, 5. (a) LLR path for bit j = 3, i = 6. level1 = 1; (b) bit propagation for bit j = 3, i = 6. level0 = 3; (c) LLR path for bit j = 4, i = 1. level1 = 3; (d) bit propagation j = 4, i = 1; (e) LLR path for bit j = 5, i = 5. level1 = 1; (f) bit propagation j = 5, i = 5. level0 = 2. (Color versions of the page can be found at wileywebpage or https://github.com/tkmoon/eccbook/colorpages.). Observations: Examination of this result reveals that there are two kinds of computation here: those that produce “lower” bits BITS(1,*) and those that produce “upper” bits BITS(0,*). The number of levels of lower bit computation is level0-2. This is followed by one level of upper bit computation at level level0-1. j = 4, i = 1 Figure 17.24(c, d). Update LLR level1 = 3. From (17.52), L8(4) (y70 |̂u30 ) = L4(2) (y30 |(̂u0 , û 2 ) ⊕ (̂u1 , û 3 )) ⊞L4(2) (y74 |̂u1 , û 3 ).


Neither of the L4(2) are available. From (17.52) again, L4(2) (y30 |(̂u0 , û 2 ) ⊕ (̂u1 , û 3 )) = L2(1) (y10 |(̂u0 ⊕ û 1 ) ⊕ (u2 ⊕ u3 )) ⊞ L2(1) (y32 |̂u2 ⊕ û 3 ), ⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟ ⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟ ⏟⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏟ LLR(1)

LLR(3)

L4(2) (y74 , (̂u1 , û 3 ))

⏟⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏟

=

L2(1) (y54 |̂u1

LLR(4)

û 3 ) ⊞ L2(1) (y76 |̂u3 ).

⊕ ⏟⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏟

⏟⏞⏞⏟⏞⏞⏟

LLR(5)

LLR(6)

LLR(2)

At level 3, a lower (t) is computation performed, reassociating the values LLR(3), LLR(4), LLR(5), and LLR(6). At level 2, upper (s) computations are performed, and then another at level 1. In terms of the tree, the message that node a sends to its right child c is the vector 𝛼ar = (Ly1

û 30,even ⊕̂u3

±

0,odd

û 2 ⊕̂u3

û 1 ⊕̂u3

û 3

Ly0 , Ly3 ± Ly2 , Ly5 ± Ly4 , Ly7 ± Ly6 ))

= (L2(1) (y10 |̂u30,even ⊕ û 30,odd ), L2(1) (y32 |̂u2 ⊕ û 3 ), L2(1) (y54 |̂u1 ⊕ û 3 ), L2(1) (y76 |̂u3 )). The message that node c sends its left child f is the vector 𝛼cl = (𝛼ar [0] ⊞𝛼ar [1], 𝛼ar [2] + 𝛼ar [3]) = (L4(2) (y30 |(̂u0 , û 2 ) ⊕ (̂u1 , û 3 )), L4(2) (y74 |(̂u1 , û 3 )). The message that node f sends its left child is the vector 𝛼fl = 𝛼cl [0] ⊞ 𝛼cl [1] = L8(4) (y70 , û 30 ). At the leaf node, the bit is detected and sent to its parent ̃ f ) = û 4 . 𝛽fl = h(𝛼 l Update Bits Since i < N∕2, no bit updates are computed. j = 5, i = 5 Figure 17.24(e, f). Update LLR level1=1. Using (17.53), û 4

L8(5) (y70 |̂u40 ) = L4(2) (y74 |̂u30 ) ± L4(2) (y30 |̂u30,even ⊕ û 30,odd ). Both of the L4(2) are available, so this lower (t) does not require any additional levels. In terms of the tree, the message that node f computes and sends its right child û 5 is 𝛽f

l

𝛼fr = 𝛼cl [3] ± 𝛼cl [2] = L4(2) (y30 , (̂u0 , û 2 ) ⊕ (̂u1 , û 3 )) ⊞L4(2) (y74 , (̂u1 , û 3 ). At the leaf node û 5 the bit is detected and sent up ̃ f ) = û 5 . 𝛽fr = h(𝛼 r At this point, node f has received messages from both of its children and its sends a message to its parents consisting of 2 bits 𝛽cl = (̂u4 ⊕ û 5 , û 5 ).


Polar Codes Update Bits There are two levels. According to the rules described for i = 6, there are no lower bits computed, and 1 level of upper bits computed at level 1, including propagating a bit forward. j = 6, i = 3 Figure 17.25(a, b). Update LLR level1=1. Using (17.52), L8(6) (y70 |̂u50 ) = L4(3) (y30 |(̂u0 ⊕ û 1 , û 2 ⊕ û 3 , û 4 ⊕ û 5 )) ⊞L4(3) (y74 |(̂u1 , û 3 , û 5 )). The L4(3) must be computed. Using (17.53), L4(3) (y30 |(̂u0 ⊕ û 1 , û 2 ⊕ û 3 , û 4 ⊕ û 5 )) ⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟ LLR(1) û 4 ⊕̂u5

= L2(1) (y32 |̂u2 ⊕ û 3 ) ± L2(1) (y10 |̂u0 ⊕ û 1 ⊕ û 2 ⊕ û 3 ), û 5

L4(3) (y74 |(̂u1 , û 3 , û 5 )) = L2(1) (y76 |̂u3 ) ± L2(1) (y54 |̂u1 ⊕ û 3 ). ⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟ LLR(2)

The L2(1) values are available from the j = 4 step. In terms of the tree, the message that node c sends to its right child g is the vector û 4 ⊕̂u5

û 5

𝛼cr = (𝛼ar [1] ± 𝛼ar [0], 𝛼ar [3] ± 𝛼ar [2]) =

(L4(3) (y30 , (̂u0

⊕ û 1 , û 2 ⊕ û 3 , û 4 ⊕

û 5 )), L4(3) (y74 , (̂u1 , û 3 , û 5 ))).

The message that g sends to its left node û 6 is 𝛼gl = 𝛼cr [0] ⊞ 𝛼cr [1] = L8(6) (y70 , û 50 ). At the leaf node û 6 the bit is detected and sent up ̃ g ) = û 6 . 𝛽𝛽l = h(𝛼 l Update Bits Since i < N∕2, no bit propagation. j = 7, i = 7 Figure 17.25(c). Update LLR level1=1. Using (17.53), û 6

L8(7) (y70 |̂u60 ) = L4(3) (y74 |(̂u1 , û 3 , û 5 )) ± L4(3) (y30 |(̂u0 ⊕ û 1 , û 2 ⊕ û 3 , û 4 ⊕ û 5 )). The L4(3) are available from step j = 6.

(17.56)


(a)

(b)

(c)

Figure 17.25: Successive cancellation steps, j = 6, 7, i = 3, 7. (a) LLR path for bit j = 6, i = 3. level1 = 2; (b) bit propagation j = 6, i = 3; (c) LLR path for bit j = 7, i = 7. level1 = 1. (Color versions of the page can be found at wileywebpage or https://github.com/tkmoon/eccbook/colorpages.).

In terms of the tree, the message that node g sends to its right child û 7 is û 6

y

𝛼gr = 𝛼cr [1] ± 𝛼cr [0] = L8(7) (y0 |̂u60 ). The bit is detected, but there is no need to send a message back: ̃ g ). û 7 = h(𝛼 r Update Bits Since all bits have been estimated, there is no need to propagate bits, so no computation is performed.

17.4.11

Pseudo-Code Description of Successive Cancellation Algorithm

Combining the observations above, the successive cancellation algorithm is set forth in Algorithm 17.2.


Algorithm 17.2 Successive Cancellation Decoder

1

2 3

4 5 6 7 8

9 10 11 12 13 14 15 16 17

18 19 20

21 22 23 24 25

26 27

Function SCPolarDecode(𝐲) % Decode polar coded data from a Gaussian channel % Derived from code by Harish Vangala then of (Monash University) % Input: 𝐲 received vector of length N computeLLR(𝐲, LLR + N − 1) % compute log-likelihood ratios and put into last N positions of LLR array for j = 0, 1, … , N − 1 do % For each bit position (i, level1, level0) = bitreverse(j, n) % i is bit-reversed of j % level1 = index of first 1 in binary representation of i. level0 = index of first 0 in binary representation of i updateLLR(i, level1) % Compute LLR at bit j, right to left LLRtobit(j, i) % Convert LLR to bit in bitsout[i], using code design LLRfinal[i] = LLR[0] % save the likelihood (optional) updateBITS(bitsout[j], i, level0) % Push the bits right end Function updateLLR(i, level1) % Update the LLR at (reversed) bit i using level1, pushing LLRs from right to left Input: i bit-reversed bit index, level1 index of 1st 1 in i if i = 0 then nextlevel = n % No ‘‘lower" computations for first iteration else % At the deepest level of the tree for this i, do the LLR‘‘lower’’ t computation lastlevel = level1 start = 2level1−1 − 1 end = 2level1 − 1 for index = start, start + 1, … , end − 1 do LLR[index] = LLRlower(BITS[0][index], LLR[end + 2(index − start)], LLR[end + 2(index − start) + 1]) end nextlevel = level1 - 1 end % Do the other levels of the tree with ‘‘upper’’ s computations for level = nextlevel, nextlevel − 1, … , 1 do start = 2level−1 − 1 end = 2level − 1 for index = start, … , end − 1 do LLR[index] = LLRupper(LLR[end + 2(index − start)], LLR[end + 2(index − start) + 1] end end


Function updateBITS(latestbit, i, level0)
  % Push estimated bits from left to right
  Input: latestbit, most recent bit estimate; i, (bit-reversed) bit number; level0, index of first 0 in i
  if i = N − 1 then
    return                              % nothing more to do on last step
  else if i < N/2 then                  % bit lines in the top half
    BITS[0][0] = latestbit              % just save the bit for next step
  else
    lastlevel = level0
    BITS[1][0] = latestbit
    for level = 1, …, lastlevel − 2 do
      start = 2^(level−1) − 1;  end = 2^level − 1
      for index = start, …, end − 1 do
        BITS[1][end + 2(index − start)] = BITS[0][index] ⊕ BITS[1][index]
        BITS[1][end + 2(index − start) + 1] = BITS[1][index]    % propagate
      end
    end
    if level0 = n then return end       % don't need to compute upper bits here
    level = lastlevel − 1
    start = 2^(level−1) − 1;  end = 2^level − 1
    for index = start, …, end − 1 do
      BITS[0][end + 2(index − start)] = BITS[0][index] ⊕ BITS[1][index]
      BITS[0][end + 2(index − start) + 1] = BITS[1][index]
    end
  end

Function LLRupper(LLR1, LLR2)
  % Compute the likelihood for the "upper" part, using log-based computation
  Input: LLR1, LLR input from upper branch; LLR2, LLR input from lower branch
  Output: Returns log likelihood
  return logdomainsum(LLR1 + LLR2, 0) − logdomainsum(LLR1, LLR2)

Function LLRlower(bit, LLR1, LLR2)
  % Compute the likelihood for the "lower" part, using log-based computation
  Input: bit, input bit; LLR1, LLR input from upper branch; LLR2, LLR input from lower branch
  Output: Returns log likelihood
  if bit = 0 then
    return LLR1 + LLR2
  else
    return LLR2 − LLR1
  end

Function logdomainsum(x, y)
  Output: Returns log(e^x + e^y)
  if y > x then
    return y + log(1 + exp(x − y))
  else
    return x + log(1 + exp(y − x))
  end
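For concreteness, the three helper functions can be written in a few lines of C++. The sketch below follows the pseudocode above (it is an illustration consistent with the algorithm, not necessarily the companion code in polarcode.cc): LLRupper computes the check-node combination L1 ⊞ L2 = log((1 + e^{L1+L2})/(e^{L1} + e^{L2})), and LLRlower computes (−1)^{bit} L1 + L2.

#include <cmath>

// Numerically stable log(exp(x) + exp(y)), as in the logdomainsum pseudocode.
double logdomainsum(double x, double y) {
    return (y > x) ? y + std::log1p(std::exp(x - y))
                   : x + std::log1p(std::exp(y - x));
}

// "Upper" (check-node) combination: L1 boxplus L2.
double LLRupper(double LLR1, double LLR2) {
    return logdomainsum(LLR1 + LLR2, 0.0) - logdomainsum(LLR1, LLR2);
}

// "Lower" (variable-node) combination: (-1)^bit * LLR1 + LLR2.
double LLRlower(int bit, double LLR1, double LLR2) {
    return (bit == 0) ? LLR1 + LLR2 : LLR2 - LLR1;
}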


17.5 Some Theorems of Polar Coding Theory

This section formalizes and generalizes the ideas that have been developed informally in this chapter.

17.5.1 I(W) and Z(W) for General B-DMCs

The informal development above presented examples using the BEC. We generalize this to other channels. For a B-DMC channel W there are two parameters of interest related to polar coding. The first is the symmetric capacity already introduced,

I(W) = ∑_{y ∈ 𝒴} ∑_{x ∈ 𝒳} (1/2) W(y|x) log_2 [ W(y|x) / ((1/2)W(y|0) + (1/2)W(y|1)) ].    (17.57)

By the channel coding theorem, I(W) represents the Shannon channel capacity when W is a symmetric channel: the highest rate at which information can be reliably transmitted through the channel using equiprobable selection of the input data x ∈ 𝒳.

Example 17.7 Recall that

• For the BEC(p) channel, I(W) = 1 − p;
• For the BSC(p) channel, I(W) = 1 − H_2(p).



The second parameter of interest is the Battacharyya parameter. For a binary channel W : 𝒳 → 𝒴, the Battacharyya parameter is defined as

Z(W) = ∑_{y ∈ 𝒴} √(W(y|0) W(y|1)).    (17.58)

For channels with a continuum of outputs, such as a Gaussian channel,

Z(W) = ∫_{y ∈ 𝒴} √(W(y|0) W(y|1)) dy.

The Battacharyya parameter may be thought of as a measure of the distance between the pmf (or pdf) W(y|0) and the pmf (or pdf) W(y|1).

Example 17.8

• For the BEC(p) channel it is straightforward to show that Z(W) = p.
• For the BSC(p) channel, Z(W) = √(4p(1 − p)). ◽
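As a quick numerical check of Examples 17.7 and 17.8, the short C++ program below (a sketch for illustration, not part of the book's companion code) tabulates I(W) and Z(W) for the BEC(p) and BSC(p) over a range of p.

#include <cmath>
#include <cstdio>

// Binary entropy function H2(p).
double H2(double p) {
    if (p <= 0.0 || p >= 1.0) return 0.0;
    return -p * std::log2(p) - (1.0 - p) * std::log2(1.0 - p);
}

int main() {
    for (int k = 1; k <= 10; ++k) {
        double p = 0.05 * k;
        double I_bec = 1.0 - p;                         // I(W) for BEC(p)
        double Z_bec = p;                               // Z(W) for BEC(p)
        double I_bsc = 1.0 - H2(p);                     // I(W) for BSC(p)
        double Z_bsc = std::sqrt(4.0 * p * (1.0 - p));  // Z(W) for BSC(p)
        std::printf("p=%.2f  BEC: I=%.3f Z=%.3f   BSC: I=%.3f Z=%.3f\n",
                    p, I_bec, Z_bec, I_bsc, Z_bsc);
    }
    return 0;
}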

As suggested by these examples, when Z(W) is small, the channel is reliable (p is small). As discussed in Section 17.5.5, the Battacharyya parameter is an upper bound on the probability of error in maximum likelihood (ML) decoding. A small Battacharyya parameter corresponds to a low probability of error for ML decoding. Figure 17.26 shows Z(W) for 0 ≤ p ≤ 0.5 for the BEC and the BSC channels. Qualitatively, Z(W) for these two channels is similar, in that they both start at Z(W) = 0 and increase monotonically with p (for p ∈ [0, 0.5]). We will find that we can use Z(W) for a general channel where previously we used p.

Figure 17.26: Z(W), I(W), and the bounds from (17.59) and (17.60). (a) BEC; (b) BSC.

There is a tradeoff between the information I(W) and Z(W): if Z(W) is near 0 or 1, then I(W) is near 1 or 0, respectively. These intuitions are quantified by the following theorem.

Theorem 17.9 For any B-DMC channel W,

I(W) ≥ log_2 (2 / (1 + Z(W)))    (17.59)

I(W) ≤ √(1 − Z(W)^2).    (17.60)

Proof of this theorem is given in Section 17.5.5. Figure 17.26 shows Z(W), I(W), and the bounds (17.59) and (17.60) for the BEC and BSC.
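For the BEC, where Z(W) = p and I(W) = 1 − p (Examples 17.7 and 17.8), the two bounds can be checked numerically with a few lines of C++ (an illustrative sketch, not from the companion code):

#include <cmath>
#include <cstdio>

int main() {
    for (int k = 0; k <= 10; ++k) {
        double p = k / 10.0;
        double I = 1.0 - p;                          // I(W) for BEC(p)
        double Z = p;                                // Z(W) for BEC(p)
        double lower = std::log2(2.0 / (1.0 + Z));   // bound (17.59)
        double upper = std::sqrt(1.0 - Z * Z);       // bound (17.60)
        std::printf("p=%.1f   %.4f <= I(W)=%.4f <= %.4f\n", p, lower, I, upper);
    }
    return 0;
}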

17.5.2 Channel Polarization

In this section, we formalize and generalize results that were developed more informally in the previous sections. We first abstract the notation used in the polarization recursion equations (17.40) and (17.41).

• Let W be shorthand for W_N^{(i)}.
• Let W0 be shorthand for W_{2N}^{(2i)}.
• Let W1 be shorthand for W_{2N}^{(2i+1)}.
• Let F(y0, y1) = (y_0^{2N−1}, u_0^{2i−1}), that is, a pooling of the data in y0 and y1.
• Let y0 be shorthand for the set of data (y_0^{N−1}, u_{0,even}^{2i−1} ⊕ u_{0,odd}^{2i−1}).
• Let y1 be shorthand for (y_N^{2N−1}, u_{0,odd}^{2i−1}).
• Let u0 be shorthand for u_{2i}, and let u1 be shorthand for u_{2i+1}.

Using this notation, (17.40) can be written as

W0(F(y0, y1) | u0) = (1/2) ∑_{u1} W(y0 | u0 ⊕ u1) W(y1 | u1)    (17.61)

and (17.41) can be written as

W1(F(y0, y1), u0 | u1) = (1/2) W(y0 | u0 ⊕ u1) W(y1 | u1).    (17.62)

Under this shorthand, the mapping (W_N^{(i)}, W_N^{(i)}) → (W_{2N}^{(2i)}, W_{2N}^{(2i+1)}) in the sense of (17.40) and (17.41) is expressed as

(W, W) → (W0, W1).    (17.63)

That is, two independent copies of a channel W, whose channel transition probabilities are functions of independent data y0 and y1, produce two new channels W0 and W1, which are functions of the pool of the data F(y0, y1) in the first case, and that pool and the input u0 in the second case.

Let the input and output spaces of W be denoted by 𝒳 and 𝒴̃, so W : 𝒳 → 𝒴̃, and let W0 : 𝒳 → 𝒴̃ × 𝒴̃. Denote 𝒴̃ × 𝒴̃ by 𝒴^0. Then W0 : 𝒳 → 𝒴^0 and W1 : 𝒳 → 𝒴^0 × 𝒳.

Example 17.10 Recall from (17.9) that W_2^{(0)} : U_0 → (Y_0, Y_1). Taking W = W_2^{(0)}, we have 𝒳 = {0, 1} and 𝒴̃ = 𝒴 × 𝒴, the space of the two outputs Y_0, Y_1. From (17.27) and (17.28),

W_4^{(0)} : U_0 → (Y_0, Y_1, Y_2, Y_3)
W_4^{(1)} : U_1 → (Y_0, Y_1, Y_2, Y_3, X).

Taking W0 = W_4^{(0)} and W1 = W_4^{(1)}, we see that 𝒴^0 is the space 𝒴̃ × 𝒴̃ and 𝒴^1 = 𝒴^0 ⊗ 𝒳.



The following theorem formalizes and generalizes the observation about polarization made in (17.10).

Theorem 17.11 Suppose (W, W) → (W0, W1). Then

I(W0) + I(W1) = 2I(W)    (17.64)

I(W0) ≤ I(W1).    (17.65)

Proof Let U_0, U_1 ∈ 𝒳 be independent and let (X_0, X_1) = (U_0 ⊕ U_1, U_1). Let (Y_0, Y_1) ∈ 𝒴̃. Then for U_0 ∈ 𝒳,

I(W0) = I(U_0; Y_0, Y_1) and I(W1) = I(U_1; Y_0, Y_1, U_0).


Since U0 and U1 are independent, I(U1 ; Y0 , Y1 , U0 ) is equal to I(U1 ; Y0 , Y1 |U0 ). By the chain rule, I(W0 ) + I(W1 ) = I(U0 ; Y0 , Y1 ) + I(U1 ; Y0 , Y1 |U0 ) = I(U0 , U1 ; Y0 , Y1 ). Since (X0 , X1 ) and (U0 , U1 ) are in a one-to-one, onto relationship, I(U0 , U1 ; Y0 , Y1 ) = I(X0 , X1 ; Y0 , Y1 ). Now note that I(X0 , X1 ; Y0 , Y1 ) = I(X0 ; Y0 ) + I(X1 ; Y1 ) = 2I(W). This proves (17.64). To prove (17.65), I(W1 ) = I(U1 ; Y0 , Y1 , U0 ) = I(U1 ; Y1 ) + I(U1 ; Y0 , U0 |Y1 ) (chain rule) = I(W) + I(U1 ; Y0 , U0 |Y1 ). This shows that I(W1 ) ≥ I(W). In light of (17.64) it must be the case that I(W0 ) ≤ I(W1 ).



It can be shown that I(W0) = I(W1) if and only if I(W) equals 0 or 1.

In Sections 17.3.4 and 17.4.4 it was demonstrated that for a BEC(p) channel W, the result of combining and splitting to produce channels W0 and W1 is that W0 and W1 are BEC(2p − p^2) and BEC(p^2), respectively. The following theorem formalizes this and extends beyond BEC channels.

Theorem 17.12 Suppose (W, W) → (W0, W1) in the sense of (17.61) and (17.62). Then

Z(W0) ≤ 2Z(W) − Z(W)^2    (17.66)

Z(W1) = Z(W)^2    (17.67)

Z(W0) ≥ Z(W) ≥ Z(W1),    (17.68)

where Z(W) is the Battacharyya parameter (17.58) for the channel W. The proof of this theorem is provided in Appendix 17.C.

The inequalities in (17.68) show that the W1 channel, that is, the W_{2N}^{(2i+1)} or "odd" channel, is better than the W_{2N}^{(2i)} or "even" channel. As a result, as the combining and splitting operations are recursively applied, polarization happens. As we have seen in the examples, for the BEC channel the result (17.66) holds with equality. For other channels, Z(W0) is merely bounded by 2Z(W) − Z(W)^2. Less abstractly, (17.64) and (17.68) can be expressed as

I(W_{2N}^{(2i)}) + I(W_{2N}^{(2i+1)}) = 2I(W_N^{(i)})    (17.69)

Z(W_{2N}^{(2i)}) + Z(W_{2N}^{(2i+1)}) ≤ 2Z(W_N^{(i)}).    (17.70)

Recall the discussion in the context of Figure 17.16, in which it was shown numerically that the channels W_N^{(i)} polarize: most have capacity going to either 0 or 1, with a vanishingly small fraction in between. Let 𝛿 > 0. As N → ∞, channels with I(W_N^{(i)}) < 𝛿 are "bad." Channels with I(W_N^{(i)}) > 1 − 𝛿 are "good." Channels with I(W_N^{(i)}) ∈ [𝛿, 1 − 𝛿) are "ugly."² The key polarization theorem is that the fraction of channels with I(W_N^{(i)}) > 1 − 𝛿 goes to I(W), that is, the channel capacity. This theorem is formally stated and proved in the next section.

² The language here is due to Emre Telatar. See [437].

17.5.3 The Polarization Theorem

The polarization theorem establishes the main tenet of polar coding theory: the fraction of good channels approaches I(W), the fraction of bad channels approaches 1 − I(W), so the fraction of ugly channels approaches 0, as N → ∞.

Theorem 17.13 [16, Theorem 1] (The Polarization Theorem) For any B-DMC W, the channels {W_N^{(i)}} polarize in the sense that, for any fixed 𝛿 ∈ (0, 1), as N → ∞ through powers of 2, the fraction of channels W_N^{(i)}, i ∈ {0, 1, . . . , N − 1}, for which I(W_N^{(i)}) ∈ (1 − 𝛿, 1] goes to I(W), and the fraction of channels for which I(W_N^{(i)}) ∈ [0, 𝛿) goes to 1 − I(W).

By this theorem, the channels polarize into "good" and "bad" channels, and the "ugly" channels are squeezed out as N → ∞.
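For the BEC, the polarization predicted by the theorem is easy to see numerically, since the erasure probabilities of the synthetic channels follow the exact recursion p → (2p − p², p²) of Section 17.3.4. The following C++ sketch (illustrative only, not the book's companion code) counts the fraction of good, bad, and ugly channels for W = BEC(0.5) at N = 2^14:

#include <cstdio>
#include <vector>

int main() {
    const int n = 14;                     // N = 2^14 synthetic channels
    const double p0 = 0.5, delta = 0.01;  // design channel BEC(0.5), threshold
    std::vector<double> p(1, p0);
    for (int level = 0; level < n; ++level) {
        std::vector<double> next(2 * p.size());
        for (size_t i = 0; i < p.size(); ++i) {
            next[2 * i]     = 2.0 * p[i] - p[i] * p[i];  // "even" child: worse
            next[2 * i + 1] = p[i] * p[i];               // "odd" child: better
        }
        p.swap(next);
    }
    int good = 0, bad = 0;
    for (double pi : p) {
        if (1.0 - pi > 1.0 - delta) ++good;   // I = 1 - p close to 1
        else if (1.0 - pi < delta)  ++bad;    // I close to 0
    }
    double N = static_cast<double>(p.size());
    std::printf("good %.3f  bad %.3f  ugly %.3f  (I(W) = %.3f)\n",
                good / N, bad / N, 1.0 - (good + bad) / N, 1.0 - p0);
    return 0;
}

As n is increased the ugly fraction shrinks and the good fraction approaches I(W) = 0.5, illustrating the theorem; the convergence is slow, which is consistent with the rate results discussed in Section 17.5.7.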

17.5.4 A Few Facts About Martingales

The proof of the polarization theorem makes use of some martingale theory. In preparation for the proof, we briefly present a few facts about this important class of random processes.

Definition 17.14 [411, Section 6.8] A random sequence {X_0, X_1, X_2, . . . } with E[|X_k|] < ∞ is a martingale if

E[X_k | X_{k−1}, X_{k−2}, . . . , X_0] = X_{k−1} for all k ≥ 1.

Such a random sequence is said to be a supermartingale if E[X_k | X_{k−1}, X_{k−2}, . . . , X_0] ≤ X_{k−1}. A random sequence Y_k with E[|Y_k|] < ∞ is a martingale with respect to the sequence X_k if

E[Y_k | X_{k−1}, X_{k−2}, . . . , X_0] = Y_{k−1} for all k ≥ 1.



Martingales and supermartingales enjoy a convergence property: the martingale or supermartingale sequence {X_n} is a uniformly integrable sequence and it converges almost everywhere to an integrable random variable X_∞ (see [71, Theorem 9.4.5, Theorem 9.4.6]). An implication of this (see [71, Theorem 4.5.4]) is that

lim_{n→∞} E[X_n] = E[lim_{n→∞} X_n],    (17.71)

that is, taking limits and expectations can be interchanged.

Let {X_0, X_1, X_2, . . . } be a martingale sequence. Then by definition E[X_1 | X_0] = X_0. Taking expectations again and using the rule of iterated expectation,³ E[E[X_1 | X_0]] = E[X_1]. In combination with (17.71), E[X_1] = E[X_0]. Repeating this at the next sample instant, E[X_2 | X_1, X_0] = X_1. Using iterated expectation again, E[E[X_2 | X_1, X_0]] = E[X_2], and by the martingale property this is also E[X_1] = E[X_0], so that E[X_2] = E[X_0]. Proceeding inductively,

E[X_n] = E[X_0], n = 1, 2, . . . .

³ The rule of iterated expectations says E[E[X|Y]] = E[X].

17.5.5 Proof of the Polarization Theorem

Proof The tree in Figure 17.12 can be associated with a random process. Let B_n be 0 or 1 with equal probability, that is, Bernoulli(1/2), and let b_1 b_2 . . . b_n be a sequence of outcomes of these random values. Define a random process of channels {K_n, n ≥ 0} as follows. Begin at the root K_0 = W. For n ≥ 0, when K_n = W^{b_1 b_2 . . . b_n}, the next node K_{n+1} is randomly set to W^{b_1 b_2 . . . b_n 0} or W^{b_1 b_2 . . . b_n 1}, with equal probability. The node W^{b_1 b_2 . . . b_n} represents W_N^{(i)}, where N = 2^n and b_1 b_2 . . . b_n is the binary representation of the number i. Given a sequence of bits b_1 . . . b_n defining a node on the tree in Figure 17.12, its successor nodes are b_1 . . . b_n 0 and b_1 . . . b_n 1, representing channels W_{2N}^{(2i)} and W_{2N}^{(2i+1)}, respectively.

Associated with this random process of channels, define the real random processes I_n = I(K_n) and Z_n = Z(K_n), the information and Battacharyya parameter, respectively, associated with the channel K_n. At the root of the tree, I_0 = I(W) and Z_0 = Z(W), which are not random variables. We will show the following:

• I_n is a martingale sequence which has a limit I_∞ such that E[I_∞] = I_0.
• Z_n is a supermartingale which has a limit Z_∞.
• Based on Theorem 17.12, we will show that Z_∞ takes on values either 0 or 1.
• From this it follows that I_∞ takes on values of either 0 or 1, from which we can determine the fraction of channels on which it takes on the value 1.

Given a sequence of bits b_1 b_2 . . . b_n defining a node on the tree in Figure 17.12 and its associated random process {I_n, n ≥ 0},

E[I_{n+1} | b_1, . . . , b_n] = I(W^{b_1 . . . b_n 0}) P(B_{n+1} = 0) + I(W^{b_1 . . . b_n 1}) P(B_{n+1} = 1)
  = I(W_{2N}^{(2i)}) P(B_{n+1} = 0) + I(W_{2N}^{(2i+1)}) P(B_{n+1} = 1)
  = (1/2) (I(W_{2N}^{(2i)}) + I(W_{2N}^{(2i+1)}))
  = I(W^{b_1 . . . b_n}) = I(W_N^{(i)}),

where the last equality follows from (17.69). Thus, {I_n, n ≥ 0} is a martingale with respect to the sequence {b_n, n ≥ 0}. By the results in Section 17.5.4, the martingale converges to a limit I_∞ such that E[I_∞] = E[I_0]. Since I_0 = I(W) (not random), E[I_∞] = I_0.

Now consider the random process {Z_n, n ≥ 0}. Given a sequence of bits b_1 . . . b_n defining a node on the tree,

E[Z_{n+1} | b_1, . . . , b_n] = Z(W^{b_1 . . . b_n 0}) P(b_{n+1} = 0) + Z(W^{b_1 . . . b_n 1}) P(b_{n+1} = 1)
  = (1/2) Z(W^{b_1 . . . b_n 0}) + (1/2) Z(W^{b_1 . . . b_n 1}) ≤ Z(W^{b_1 . . . b_n}) = Z_n,

where the inequality follows from (17.70). Thus, {Z_n} is a supermartingale with respect to b_1, . . . , b_n. Z_n converges to a limit Z_∞ so that E[|Z_n − Z_∞|] → 0 as n → ∞. Since

E[|Z_n − Z_∞|] = E[|Z_n − Z_{n+1} + Z_{n+1} − Z_∞|] ≤ E[|Z_n − Z_{n+1}|] + E[|Z_{n+1} − Z_∞|] → 0,

it must be that E[|Z_n − Z_{n+1}|] → 0. By Theorem 17.12 and the construction of the tree, either Z_{n+1} ≤ 2Z_n − Z_n^2 (from (17.66)) or Z_{n+1} = Z_n^2 (from (17.67)), each possibility occurring with probability 1/2. Hence

E[|Z_{n+1} − Z_n|] = E[|Z_{n+1} − Z_n| | Z_{n+1} = Z_n^2] P(Z_{n+1} = Z_n^2) + E[|Z_{n+1} − Z_n| | Z_{n+1} ≠ Z_n^2] P(Z_{n+1} ≠ Z_n^2)
  ≥ (1/2) E[|Z_n^2 − Z_n|] ≥ 0.

Since the LHS converges to 0, E[Z_n(Z_n − 1)] → 0, so that E[Z_∞(1 − Z_∞)] = 0. This implies that Z_∞ = 0 or Z_∞ = 1. By the convergence properties of martingales, this is with probability 1. Using the fact that Z_∞ is equal to 0 or 1 with probability 1 and using the inequalities in Theorem 17.9 implies that I_∞ = 1 − Z_∞ w.p. 1. Since E[I_∞] = I_0 and E[I_∞] = 1 · P(I_∞ = 1) + 0 · P(I_∞ = 0) = I_0, it follows that P(I_∞ = 1) = I_0 = I(W).



17.5.6 Another Polarization Theorem

In [437], another approach to proving polarization is presented, based on insights from BEC channels. Let W_N^{(i)}, 0 ≤ i < N, denote a BEC(p_N^{(i)}) channel. Let I(W_N^{(i)}) denote the information associated with the channel W_N^{(i)}, where for the BEC I(W_N^{(i)}) = 1 − p_N^{(i)}. Following the pattern in Section 17.3.4, the two children channels W_{2N}^{(2i)} and W_{2N}^{(2i+1)} are BEC(p_{2N}^{(2i)}) and BEC(p_{2N}^{(2i+1)}) channels. For notational simplicity, we will refer to W_N^{(i)}, W_{2N}^{(2i)}, and W_{2N}^{(2i+1)} as W, W′, and W′′, respectively, with BEC channel parameters p, p′, and p′′, respectively. Then

p′ = g(p) = 2p − p^2     p′′ = h(p) = p^2.

We define the ugliness of a BEC(p) channel W by

ugly(W) = √(4p(1 − p)).

This is a measure of how close the channel parameter p is to 0 or 1, or equivalently, how close the mutual information I(W) is to 1 or 0 (respectively). A channel which is mediocre is one which has larger ugliness.

For a BEC channel W with parameter p and with ugliness ugly(W) = √(4p(1 − p)), the ugliness of its two children W′ and W′′ is

ugly(W′′) = √(4p^2(1 − p^2)) = ugly(W) √(p(1 + p))

and

ugly(W′) = √(4(2p − p^2)(1 − 2p + p^2)) = √(4p(2 − p)(1 − p)^2) = ugly(W) √((2 − p)(1 − p)).

The average ugliness of the two children is thus

(1/2) ugly(W′) + (1/2) ugly(W′′) = (1/2) ugly(W) [√(p(1 + p)) + √((2 − p)(1 − p))].


By symmetry arguments (replacing p → 1 − p) it is argued that the function involving the square roots is maximized when p = 1/2, leading to

(1/2) ugly(W′) + (1/2) ugly(W′′) ≤ ugly(W) √(3/4).

So, whatever the ugliness of W, on average its children are less ugly. For a tree with root at W with ugliness ugly(W), the average ugliness one level away is, as shown, less than or equal to √(3/4) ugly(W). Inductively, the average ugliness n levels from the root is ≤ (3/4)^{n/2} ugly(W).

A channel W_N is said to be 𝜖-mediocre if I(W_N) ∈ (𝜖, 1 − 𝜖), that is, its information is neither close to 0 nor close to 1. The fraction of 𝜖-mediocre channels at level n of polarization is defined as

𝜇_n(𝜖) = (1/2^n) ∑_{i=0,1,...,2^n−1} [I(W_{2^n}^{(i)}) ∈ (𝜖, 1 − 𝜖)],

where [x] denotes the indicator function: [x] = 1 if x is true and [x] = 0 otherwise. For the BEC channel,

𝜇_n(𝜖) = (1/2^n) ∑_{i=0,1,...,2^n−1} [p_N^{(i)} ∈ (𝜖, 1 − 𝜖)].

Another measure of interest is the fraction of 𝜖-good channels at level n, the fraction of channels whose information exceeds 1 − 𝜖:

𝛾_n(𝜖) = (1/2^n) ∑_{i=0,1,...,N−1} [I(W_N^{(i)}) ≥ 1 − 𝜖] = (1/2^n) ∑_{i=0,1,...,N−1} [p_N^{(i)} ≤ 𝜖].

The following theorem shows that the fraction of mediocre channels approaches 0, and the fraction of good channels approaches the BEC channel capacity 1 − p.

Theorem 17.15 For any 𝜖 > 0 and any channel W = BEC(p), the fraction of 𝜖-mediocre channels vanishes:

lim_{n→∞} 𝜇_n(𝜖) = 0.    (17.72)

Furthermore, lim_{n→∞} 𝛾_n(𝜖) = 1 − p.

so that

∑ i=0,1, . . . ,N−1

ugly(W2(i)n ) ≤

( )n∕2 3 ugly(W). 4

√ ugly(WN(i) )∕ 4𝜖(1 − 𝜖) ≥ 1, √ ∈ (𝜖, 1 − 𝜖)] ≤ ugly(WN(i) ∕ 4𝜖(1 − 𝜖)). [p(i) N

(17.73)

834

Polar Codes Thus, 𝜇n (𝜖) =

1 2n



[p(i) ∈ (𝜖, 1 − 𝜖)] ≤ 2n

i=0,1, . . . ,N−1

( )n∕2 √ 3 ∕ 4𝜖(1 − 𝜖), 4

so limn→∞ 𝜇n (𝜖) = 0. Since for large n almost all channels at level n are either good or bad, and since the average erasure probability is preserved at each generation, it must be that the fraction of good channels approaches 1 − p. ◽ This theorem would seem to establish the polarization as desired. But consider the probability of block errors under polar coding. Let R < I(W) and let K = NR be the number of channels used < 𝜖. The (unfrozen) and let K be an unfrozen set with K elements. Then for each i ∈ K , p(i) N probability of block error is upper bounded by ∑ (i) pN ≤ N𝜖. i∈K

That is, as N gets large, the bound on the probability of block error seems to go up, even though any individual channel has low probability. A stronger polarization theorem is needed, one in which 𝜖 ≪ 1∕N. This is provided by the following. 1 −𝛿

Theorem 17.16 Let W = BEC(p). For any 𝛿 > 0 let 𝜖n = 2−n 2 . Then the fraction of 𝜖n -good channels approaches 1 − p: lim 𝛾n (𝜖n ) = 1 − p. n→∞

b ...b

Proof A channel WN(i) = WN1 n at level n (where N = 2n ) is said to be 𝜖-tainted if it has an 𝜖-mediocre √ b ...b ancestor WN1 i at any generation i ∈ [ n, n]. The fraction of tainted channels at generation n is upper bounded by n n ( )j∕2 ( )√n∕2 ∑ ∑ 1 3 3 𝜇j (𝜖) ≤ √ ≤ K(𝜖) 4 √ 4𝜖(1 − 𝜖) √ 4 j= n

j= n

for some constant K that may depend on 𝜖. (2i) For small 𝜖, the transformation p(i) → p2N and p(i) → p(2i+1) do not jump between [0, 𝜖] and N N 2N √ b1 . . . bn is 𝜖-untainted, then all its ancestors {W (i) , i ∈ [ n, n]} are [1 − 𝜖, 1]. So if 𝜖 is small and if WN √ the same type, that is, all of the ancestors in generations j ∈ [ n, n] are all either good or bad. √ b ...b Another measure is luck. A channel WN(i) = WN1 n is said to be 𝜖- unlucky if {bi , i ∈ [ n, n] has an (unlucky) share of 1s: n ( ) ∑ 1 [bi = 1] < − 𝜖 n. 2 √ i= n

As n gets large, the fraction of 𝜖-unlucky channels vanishes. Let p(i) = pb1 . . . bn denote an 𝜖-good, 𝜖-untainted, 𝜖-lucky channel. The fraction of such√channels approaches 1 − p as n gets large. For such a channel, trace its ancestors in generations i ∈ [ n, n]. At each generation i next generational step, either the next generation satisfies pb1 . . . bi bi+1 = (pb1 . . . bi )2 or

pb1 . . . bi bi+1 = pb1 . . . bi (2 − pb1 . . . bi ) ≤ 2pb1 . . . bi = 𝜖 −𝜖 pb1 . . . bi , ′

17.5 Some Theorems of Polar Coding Theory

835

where 𝜖 ′ = 1∕log2 (1∕𝜖). Since the channel is untainted, the inequalities continue as pb1 . . . bi b ≤ (pb1 . . . bi )1−𝜖 . ′

Taking logs of both of these, log2

1 pb1 . . . bi bi+1

≥ log2

1 pb1 . . . bi

{ 2

×

bi+1 = 1

1 − 𝜖′

bi+1 = 0.

Taking another log, log2 log2

1 pb1 , . . . ,bi bi+1

≥ log2 log2

Chaining these inequalities from i = log2 log2

1 pb1 . . . bn

≥ log2 log2

1 bb1 . . . bi

+

{ 1

bi+1 = 1

−𝜖 ′′

bi+1 = 0

where 𝜖 ′′ = −log2 (1 − 𝜖 ′ ).

√ n to n − 1 gives ( ( ) ) 1 1 1 + − 𝜖 n − n𝜖 ′′ ≥ − 𝛿 n for small enough 𝜖. 𝜖 2 2

Exponentiating, −log2 p b1 . . . bn ≥ 2(1∕2−𝛿)t = N 1∕2−𝛿 . This proves the stronger version of the polarization result. 17.5.7



Rate of Polarization

It is also of interest how rapidly polarization can happen, that is, how quickly polarization takes hold as N increases. Let Pe (N, R) denote the block error probability for a rate R code with R < I(W). One result is given here, without proof. (See the reference for proof and discussion.) Theorem 17.17 [18] Let W be a B-DMC. For any fixed rate R < I(W) and constant 𝛽 < 12 , for N = 2n , n ≥ 0, the block error probability for polar cancellation decoding at block length N satisfies 𝛽

Pe (N, R) = O(2−N ) 𝛽

that is, the probability of block error goes down as 2−N . 17.5.8

Probability of Error Performance

In this section, we derive a bound on the probability of block error. The bound is expressed in terms of the Z function, specifically the function ∑ √ (i) △ ∑ Z(WN(i) ) = WN (yN−1 , ui−1 |0)WN(i) (yN−1 , ui−1 |1). (17.74) 0 0 0 0 ∈ i yN−1 ∈ N ui−1 0 0

For an (N, K, , uc ) code, let Pe (N, K, , uc ) denote the probability of block error, where each data vector u ∈  K is sent with probability 2−K , that is, Pe (N, K, , uc ) =

∑ u∈ K

1 2K

∑ yN−1 ∈ N ∶̂uN−1 ≠uN−1 0 0 0

WN (yN−1 |uN−1 ). 0 0

836

Polar Codes The average of this probability of error over all frozen bit sequences uc is denoted Pe (N, K, ): Pe (N, K, ) =



1 2

N−K

Pe (N, K, , uc ).

uc ∈ N−k

The performance is bounded as follows: Theorem 17.18 For any B-DMC W and any choice of the parameters (N, K, ), ∑ Pe (N, K, ) ≤ Z(WN(i) ).

(17.75)

i∈

N

N

be an input sequence and yN−1 be its corresponding received value. Let û 0 1 (u0 1 , yN−1 ) Proof Let uN−1 0 0 0 denote the SC-decoded block. Since the SC decoder always correctly decodes the frozen bits, we are interested in the error event ∈  N , yN−1 ∈  N ∶ û c (uN−1 , yN−1 ) ≠ u }.  = {(uN−1 0 0 0 0 The probability of block error can be expressed in terms of this error event: Pe (N, K, ) = P(). The error event  can be decomposed into events related to individual decoding errors. Let i be the event of a first decoding error at the i step of SC decoding: N−1 N−1 ∈  N , yN−1 ∈  N ∶ ui−1 ̂ i−1 ̂ i (uN−1 , yN−1 )}.  = {uN−1 0 =u 0 (u0 , y0 , ui ≠ u 0 0 0 0

Then  = ∪i ∈ i . The set i can be written as N−1 N−1 N−1 i−1 i = {uN−1 ∈  N , yN−1 ∈  N ∶ ui−1 ̂ i−1 0 =u 0 (u0 , y0 ), ui ≠ hi (y0 , u0 )} 0 0

⊂ {uN−1 ∈  N , yN−1 ∈  N ∶ ui ≠ hi (yN−1 )} 0 0 0 ∈  N , yN−1 ∈  N ∶ WN(i−1) (yN−1 , ui−1 ⊂ {uN−1 0 |ui ) 0 0 0 △

Then  =



, ui−1 ≤ WN(i−1) (yN−1 0 |u0 ⊕ 1) = i . 0

i∈ i ,

so by the union bound P() ≤



P(i ).

i∈

Using the notation

{ 1 [C] = 0

C is true C is not true

the constituent probabilities in this sum can be bounded as follows: ∑ 1 P(i ) = W (yN−1 |x0N−1 )[(uN−1 , yN−1 ) ∈ i ] 0 0 N N 0 2 N−1 N−1 N N u ∈ ,y ∈ 0



0

∑ uN−1 ∈ N ,yN−1 ∈ N 0 0

√ √ (i) √ W (yN−1 , ui−1 |u ⊕ 1) i 1 0 N−1 N−1 √ √ N 0 W (y |x ) 0 N N 0 (i) N−1 i−1 2 WN (y0 , u0 |ui )

(17.76)

17.6 Designing Polar Codes

837

√ √ (i) √ W (yN−1 , ui−1 |u ⊕ 1) ∑ 1 ∑∑ 1 ∑ i 0 N−1 N−1 √ √ N 0 = W (y |u ) N 0 0 N−1 (i) N−1 i−1 2 W (y , u |u ) N−1 i−1 2 N−1 u y0

i

u0

ui+1

N

0

= Z(WN(i) )),

0

(17.77)

i

(17.78)

where the inequality (17.76) follows since for every sequence in i the ratio in the square root ≥ 1 by definition. The result (17.75) follows. ◽

17.6 Designing Polar Codes The design of a polar code consists of selecting the set c , that is, the bits that are frozen in the encoding/decoding process. We will refer to the design as selecting the frozen set  = c . In principle, all that needs to be done to design good polar codes is to select the K best synthetic channels out of the N channels, to achieve a rate R = K∕N codes, where “best” here means the lowest values of Z(WN(i) ), since that will result in the lowest bound (17.75) on the probability of block error. For BEC channels, this is easily accomplished since Z(WN(i) ) can be recursively computed using (17.67) by (17.66) with equality (i) (i) 2 (i) 2 ) − Z(WN∕2 ) Z(WN(2i+1) ) = Z(WN∕2 ) . Z(WN(2i) ) = 2Z(WN∕2 For a channel of specified length N, these N values can be easily computed, and the largest (worst) values are placed in the frozen bit set  . But for non-BEC channels, there is rarely a closed-form formula to compute ZN(i) (W). In this section, we select methods from [462], in which four polar code design methodologies were compared. They showed that, under proper selection, these four methodologies produce codes with essentially the same performance. As a result, rather than presenting more complicated methods, we select a pair of methods. An important consideration in polar code design is to recall that polar codes are not universal: a polar code is ideally designed for a particular channel, for example, at a particular SNR. If the code designed for one SNR is used in a channel with a different SNR, then the performance may degrade. The SNR at which a polar code is designed is called its design SNR. Practically, however, once a code is designed for some particular SNR, it may be used over a range of SNRs (see, e.g., [431]). It is not necessary, therefore, to keep a separate code design for every single operating point in the channel. A practical approach described by [462] is as follows: 1. Consider a set of SNRs {S1 , S2 , . . . , Sm } that cover the range of SNRs of interest in the channel. 2. For each SNR Si , design a polar code (i.e., select  for the code). 3. Examine the bit error rate or block error rate as a function of SNR for each of these designed polar codes. 4. From this examination, select the single code (if possible) that best suits the application.

17.6.1

Code Design by Battacharyya Bound

This approach to code design operates by assuming that whatever the channel, the bound on the Battacharyya parameters in (17.67) and (17.66) is sufficiently tight to be used for code design. The algorithm is initialized with the Battacharyya parameter appropriate for the channel. For a √ BEC(p) channel, simply set z = p. For a BSC(p), set z = 4p(1 − p). For an AWGN channel with SNR REb ∕N0 dB, set S = 10REb ∕N0 ∕10 and initialize with z = e−S . This results in the following algorithm.

838

Polar Codes

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

Algorithm 17.3 Design of Polar Codes Using Bhattacharrya Parameter

00000

α10 α11 α12 α13 α14 α15 α16

testpolardesign.cc polarcode.cc: designBhatt2( )

1 2 3

Function DesignBhatt2(N, K, designSNRdB) % Design a polar code using the Bhattacharrya parameter at a particular design SNR Input: N: Code length; K: Code dimension; designSNRdB Output:  ⊂ {0, 1, … , N − 1}, with | | = N − K Initialize Allocate z, a vector of length N designSNR = 10designSNRdB∕10 z[0] = −designSNR n = log2 N % number of levels

4 5 6 7 8 9 10 11 12 13 14

for j = 1, 2, … , n do For each level u = 2j for t = 0 to u∕2 − 1 do T = z[t] z[t] = 2T − T 2 % lower channel (bound) z[u∕2 + t] = T 2 % upper channel (exact) end end  = indices of the N − K greatest elements of z return 

The indices of the N − K greatest elements can be obtained, of course, by sorting the elements and retaining the sort order, then selecting the best N − K indices of the sorted data.
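For reference, here is a compact C++ rendering of this design procedure. It follows the structure of Algorithm 17.3 but keeps the Bhattacharyya values directly in the probability domain, initialized as z = e^{−S} for the AWGN design SNR; the function name and interface are illustrative assumptions rather than the companion code's implementation (which, for long codes, may prefer a log-domain variant to avoid underflow).

#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

// Sketch of Bhattacharyya-parameter polar code design (cf. Algorithm 17.3).
// Returns the frozen set F: the indices of the N-K largest z values.
std::vector<int> designBhattacharyya(int N, int K, double designSNRdB) {
    double S = std::pow(10.0, designSNRdB / 10.0);
    std::vector<double> z(N, 0.0);
    z[0] = std::exp(-S);                       // AWGN initialization
    for (int u = 2; u <= N; u *= 2) {          // for each level, u = 2^j
        for (int t = 0; t < u / 2; ++t) {
            double T = z[t];
            z[t] = 2.0 * T - T * T;            // "even" child (upper bound)
            z[u / 2 + t] = T * T;              // "odd" child (exact)
        }
    }
    std::vector<int> idx(N);
    std::iota(idx.begin(), idx.end(), 0);
    std::sort(idx.begin(), idx.end(),
              [&](int a, int b) { return z[a] > z[b]; });  // worst channels first
    idx.resize(N - K);                         // freeze the N-K worst channels
    return idx;
}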

17.6.2

Monte Carlo Estimation of Z(WN(i) )

The expression in (17.77) can be written as Z(WN(i) ) =

∑ yN−1 0

1 2

N−1

√ √ (i) √ W (yN−1 , ui−1 |u ⊕ 1) ∑ i 0 N−1 √ √ N 0 WN (yN−1 |u ) . 0 0 (i) N−1 i−1 WN (y0 , u0 |ui ) uN−1 i+1

|uN−1 ) of the quantity Recognize this as an expectation with respect to the distribution WN (yN−1 0 0 √ √ (i) √ W (yN−1 , ui−1 |u ⊕ 1) √ N 0 i 0 √ . (17.79) (i) N−1 i−1 WN (y0 , u0 |ui )

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

testpolardesign.cc polarcode.cc designMC()

and This expectation can be approximated via Monte Carlo techniques, generating sequences uN−1 0 N−1 y0 according to the governing distribution, and computing (17.79). Fortunately, the argument of the square root in (17.79) is computed by the successive cancellation decoder, which operates with O(N log N) complexity. Practically, since Z(WN(i) ) is related to the probability of error (see (17.103)), a Monte Carlo simulation can be based on the estimated probability of bit error. This leads to the very heuristically plausible algorithm: simulate the probability of error on each of the bits separately, then select the K bits with the smallest probability of error. Since the polar code is linear, it suffices to simulate the all-zero code transmission.

17.8 Systematic Encoding of Polar Codes

Algorithm 17.4 Design of Polar Codes Using Monte Carlo Simulation

1 2 3 4 5 6 7 8 9

Function DesignMC(N, K, M, designSNRdB) % Design a polar code at a particular design SNR using Monte Carlo Simulation of Bits Input: N: Code length; K: Code dimension; M: number of bits to simulation; designSNR: design SNR Output:  ⊂ {0, 1, … , N − 1}, with | | = N − K Initialize c, a vector of length N and set all elements to 0 for i = 0, 1, … , N − 1 do Unfreeze bit i for m = 0, 1, … , M − 1 do Generate a modulated vector from an all-zero codeword Decode the modulated codeword, and count the number of bits in location i end end  = indices of the N − K largest error counts

17.7 Perspective: The Channel Coding Theorem In the proof of the channel coding theorem, Shannon used the ideas of: 1. Random code selection; 2. Joint asymptotic equipartition property (AEP) between the transmitted codeword and the received sequence; 3. And ML decoding. Random coding was important to Shannon’s proof, because it provided a tool for proof which allowed him to step over the details of identifying a particular good code while proving that a good code exists. Recent coding successes, such as LDPC codes and turbo codes have made use of the random coding idea, such as via interleavers or pseudorandom connections between variable nodes and check nodes. While this has been effective, it is not necessarily required for good coding performance. Joint AEP is more central to the coding theorem, because it guarantees that the received sequence is jointly typical with the transmitted codeword, with high probability. Under joint AEP, the set of all pairs of transmitted codewords/received sequences can be divided into two sets: the jointly typical set, where the mutual information between the input and output sequences is close to capacity, and the non-jointly typical set, where the mutual information between the input and output sequences is lower. Joint AEP is related to channel polarization: selecting the good channels which result from polarization, the codewords from the jointly typical set can be reliably transmitted over the nearly noiseless channels created by channel polarization. Polar coding is thus a constructive instance of joint AEP.

17.8 Systematic Encoding of Polar Codes The encoding described in Section 17.4 is not systematic. However, systematic encoding for polar codes has been found, and (surprisingly) it has been shown to have superior bit error performance to nonsystematic encoding, although the word error rate is the same.

839

840

Polar Codes 17.8.1

Polar Systematic Encoding via the Encoder Graph

As a preface to the discussion of systematic polar encoding, it may be helpful to review a couple of points about linear coding. Recall that for a linear code, a codeword is a point in the row space of the generator matrix G, so that in 𝐱 = 𝐳G, 𝐱 is a codeword, regardless of the particular values in 𝐳. One way to do encoding might be to take a message vector 𝐮, place it into K elements of the codeword 𝐱, then find a vector 𝐳 which fill in the remaining N − K elements of 𝐱 in such a way that 𝐱 is in the row space of G. If 𝐳 can be found via linear operations from 𝐮 in such a way that the message symbols appear explicitly in 𝐱, then systematic encoding has been achieved. This is an alternative to the method of doing systematic encoding discussed in Section 3.2, in which the form of the generator G is modified to achieve systematic encoding. This viewpoint is essentially the way systematic encoding may be achieved for polar codes. This will be described first using an example for a code of length N = 8 and dimension K = 4. For the moment, ignore the bit reverse permutation on the bits. Let the frozen bits be represented by the vector [ ] 𝐟 = 0 0 0 −1 0 −1 −1 −1 , where the −1 locations indicate unfrozen bits, and the frozen bits take on the other values indicated (0s in this case). That is, the set of unfrozen bits is  = {3, 5, 6, 7}. Let the message bits be 𝐮 = [u0 , u1 , u2 , u3 ]. The systematic codeword for this message places these message bits in the unfrozen locations and fills in the frozen bits with xi values, to be determined, such that the 𝐱 so formed is in the row space of the generator G. Let the vector operating on G be denoted by 𝐳. 𝐳 is formed by placing 0 in the frozen locations, with other elements in the unfrozen locations determined to satisfy the relationship 𝐱 = 𝐳G. Specifically, consider as an example the message vector 𝐮 = [1, 0, 0, 1]. Then the encoding relationship is ] [ ] [ x0 x1 x2 1 x4 0 0 1 = 0 0 0 z3 0 z5 z6 z7 , G (17.80) where (still neglecting the bit reverse permutation)

G = F ⊗3

⎡1 ⎢ ⎢1 ⎢ ⎢1 ⎢1 =⎢ ⎢1 ⎢ ⎢1 ⎢ ⎢1 ⎢1 ⎣

0 0 0 0 0 0 0⎤ ⎥ 1 0 0 0 0 0 0⎥ ⎥ 0 1 0 0 0 0 0⎥ 1 1 1 0 0 0 0⎥ ⎥. 0 0 0 1 0 0 0⎥ ⎥ 1 0 0 1 1 0 0⎥ ⎥ 0 1 0 1 0 1 0⎥ 1 1 1 1 1 1 1⎥⎦

The number of unknowns in 𝐱 is |c | = N − K and the number of unknowns in 𝐳 is always || = K, so the total number of unknowns in 𝐱 and 𝐲 is N. Rearranging the equations in (17.80) results in N equations in N unknowns. There is a method for solving for the unknowns in (17.80) with O(N log N) complexity and memory O(N log N) [460]. As we have seen, because of the Kronecker structure of G, (17.80) can be represented using the now-familiar encoder graph shown in Figure 17.27. In the figure, the inputs zi are on the left and the outputs xi are on the right. The xi s from the unfrozen codeword bits are indicated, and the zi s that are zeros from the frozen portion are shown. Observe that for each horizontal line, either zi or xi is known. In Figure 17.27, each horizontal line is labeled with 𝓁i . Along each line there are n = 3 potential positions for adders. For example, line 𝓁0 has three adders along it. Line 𝓁7 has no adders along it. Consider the binary representation of i in comparison with 𝓁i . For 𝓁7 , i = 7 = 1112 (with the MSB

17.8 Systematic Encoding of Polar Codes

0 1 2 3

Direction of computation

4 5 6 7

841

Z0 = 0

x0

Z1 = 0

x1

Z2 = 0

a

x2

Z3

x3 = 1

Z4 = 0

x4

Z5

x5 = 0

Z6

x6 = 0

Z7

x7 = 1

Figure 17.27: Calculations for systematic polar encoding. on the left). This line has no adders. For 𝓁6 , i = 6 = 1102 . This line has adders in position 3. For 𝓁0 , i = 0002 , and this has adders in each region. A binary 1 in a position corresponds to “no adder” in the corresponding position along the line and a binary 0 in a position corresponds to placing an adder in its corresponding position along the line. The recursive structure of the graph and of binary numbers ensures that this observation is true for any N = 2n . Along each line 𝓁i there are n + 1 = 4 computational nodes, each denoted by a •, representing the input zi , the output xi , and n − 1 interior computational nodes between the adder regions. The algorithm makes use of N(n + 1) = N(log2 N + 1) storage representing the • locations. For each horizontal line 𝓁i , the computation of the n + 1 nodes on it involves only nodes from the same horizontal line and nodes below it. Significantly, along each line 𝓁i , either the left node (zi ) or the right node (xi ) is known. By performing bottom-up computation (the “direction of computation” shown), the unknown xi and zi values can be determined. Continuing our example, examining Figure 17.27, z7 = x7 = 1 z6 = x6 ⊕ x7 z5 = x5 ⊕ x7 x4 = z4 ⊕ z6 ⊕ x5 z3 = x3 ⊕ x7 x2 = z2 ⊕ z6 ⊕ x3 x1 = z1 ⊕ z5 ⊕ x3 x0 = z0 ⊕ z4 ⊕ a ⊕ x1 . All N unknowns are easily solved for. In general terms, the computation proceeds as follows. For bit number i = N − 1 down to 0, if i is unfrozen, then computation proceeds from right to left. If i is frozen, then computation proceeds from left to right. Computations at each node involves either copying from the appropriate adjacent node, or adding the adjacent node to the node from a line below it, to compute the value at each • along each line 𝓁i . More specifically, the encoding proceeds as shown in Algorithm 17.5.

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

polarcode.cc encodesyst ()

842

Polar Codes

Algorithm 17.5 Polar Code Systematic Encoder

1 2 3

Function encodesyst(u) % Systematically encode the K-bit vector in u Input: u: the bit string to be encoded Output: Codeword and Vector 𝐱 with solution of (17.80) Initialize Form matrix B ∈ 𝔽2N×(n+1) of zeros For i ∈ c , set B[i][0] = 0 For i ∈ , set B[i][n] = uj % fill in message bits sequentially for i = N − 1, N − 2, … , 0 do % For each line from the bottom to the top if i ∈  then % If not frozen, start on right side and move left

4 5 6 7 8

9 10 11 12

s=n−1 𝛿 = −1. else % If frozen, start on left side and move right s=1𝛿=1 end Denote the binary representation of i = b0 b1 … bn−1 , with b0 the MSB t=s for j = 0, 1, … , n − 1 do % For each level needing computation

13

𝓁 = min(t, t − 𝛿) if b𝓁 == 0 then % If there is an adder at this branch

14 15 16 17 18 19 20 21

22 23

𝜅 = 2n−1−𝓁 B[i][t] = B[i][t − 𝛿] ⊕ B[i + 𝜅][t − 𝛿] else % No adder on this branch – just copy over B[i][t] = B[i][t − 𝛿] end t =t+𝛿 % that is, t = s + j ∗ 𝛿 end end Copy out 𝐱 = B[∶][n] in bit reverse order

The encoding complexity is O(N log2 N), counting one operation for every exclusive-OR. The memory size is O(N log2 N) (a memory location for each • in the graph). While the complexity is only O(N log2 N), the encoding algorithm is strictly sequential, proceeding in the direction of computation. This sequential order limits throughput compared to parallel implementations. 17.8.2

Polar Systematic Encoding via Arıkan’s Method

Systematic encoding of polar codes can be expressed more formally, leading to another algorithm for encoding. For a code of length N, let G be a N × N generator matrix over a field 𝔽 , let 𝐮 ∈ 𝔽 N , 𝐱 ∈ 𝔽 N , and denote the encoding operation as 𝐱 = 𝐳G for some vector 𝐳 to be determined. Let  ⊂ {0, 1, . . . , N − 1} be a subset of indices with || = ||. Ultimately we will take  = , but we start with a more general formulation. The vector 𝐱 consists of two parts, the systematic information, 𝐱 = 𝐮, and the other information in 𝐱c . The vector 𝐳 is decomposed into a portion corresponding to the unfrozen bits and the frozen bits by 𝐳 and 𝐳c . Then the encoding 𝐱 = 𝐳 G + 𝐳c Gc ,

(17.81)

17.8 Systematic Encoding of Polar Codes

843

where G and Gc a submatrices having rows indexed by  and c , respectively, further divides as 𝐱 = 𝐮 = 𝐳 G + 𝐳c Gc 

(17.82)

𝐱c = 𝐳 Gc + 𝐳c Gc c .

(17.83)

Since 𝐱 is known in (17.82), solve as 𝐳 = (𝐮 − 𝐳c Gc  )G−1  .

(17.84)

This can now be substituted in (17.83) to determine 𝐱c . (The method of Section 17.8.1 takes 𝐳c = 0 and simultaneously and efficiently solves for 𝐳 and 𝐱c .) For polar codes G = F ⊗n and  is taken to be equal to . Then the recursive structure of G it possible to efficiently compute G−1 in (17.84). With care, this can be done with slightly less memory  than the method of Section 17.8.1, but at higher computational cost. 17.8.3

Systematic Encoding: The Bit Reverse Permutation

The discussion of systematic encoding of polar codes has neglected the bit reverse permutation. Including now the bit reverse permutation, the encoding is written as 𝐱 = 𝐳BN F ⊗n , where BN is the bit reverse permutation matrix. The matrices commute so that BN F ⊗n = F ⊗n BN . The methods described above provide a solution to the equation 𝐱 = 𝐳F ⊗n . Multiplying both sides on the right by BN thus gives the encoding including the bit reverse permutation. 17.8.4

Decoding Systematically Encoded Codes

The usual encoding operation is 𝐱 = 𝐮G, and the usual decoder takes the codeword information (via the channel) and returns the message vector 𝐮. In the systematic case, the encoding operation is 𝐱 = 𝐳G, where 𝐱 has systematic information about 𝐮 within it. Calling the usual decoder would return the vector 𝐳. To extract the codeword and its contained message 𝐮, it is necessary to re-encode the vector 𝐳 using G. From this, 𝐮 can be extracted.

17.8.5

Flexible Systematic Encoding

While the systematic encoder of Section 17.8.1 has low complexity, it is distinctly sequential and hence not amenable to parallel processing, since the processing proceeds from the bottom to the top of the encoding graph. In this section, we present a systematic encoder which is amenable to parallel implementation. Let the information set be  = {a0 , a1 , . . . , ak−1 }, ordered as a0 < a1 < · · · < ak−1 . The systematic encoder not only contains the input bits 𝐮 explicitly, but in the order of . An encoder diagram for an (8, 5) code is shown in Figure 17.28 with  = {3, 4, 5, 6, 7}, where the bit-reverse permutation is not used. Based on this figure, the encoder process is as follows: 1. The input vector 𝐮 = (u0 , u1 , . . . , uk−1 ) is formed into the 𝐯I = (𝑣I,0 , 𝑣I,1 , . . . , 𝑣I,n−1 ) of length n by setting 𝑣i,ai = ui for i = 0, 1, . . . , k − 1. The remaining n − k elements of 𝐯I are set to 0. 2. Compute 𝐯II = 𝐯I F ⊗n . 3. Form the vector 𝐯III be setting elements not in  to 0. 4. Compute 𝐱 = 𝐯III F ⊗n .

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

polarcode.cc decodesyst()

844

Polar Codes vII vIII

vI 0

0

x0

0

0

x1

0

0

x2

u0

x3

u1

x4 x5

u2 u3

x6

u4

x7

Figure 17.28: A systematic encoder for an (8, 5) polar code [390]. Since multiplication by F ⊗n can be done in parallel, this encoder is parallelizable. The computational complexity is O(2 N log2 N). The remainder of this section explains why this operation works. Let (𝐮) denote the encoding operation just described. Let G be an k × n, rank k, encoding matrix. The encoding operation is linear, and so it can be written as (𝐮) = 𝐮ΠG for an invertible k × k matrix Π. The encoding operation  is systematic if there are a set of indices  = {a0 , a1 , . . . , ak−1 }, which we can identify as the information bits, such that restricting (𝐮) to the indices in  is equal to 𝐮. We assume, as before, that the indices in  are ordered, a0 < a1 < · · · < ak−1 so that in (𝐮) the ui appears before uj if i < j. One way to do systematic encoding is to take k linearly independent columns of G and to take Π as the inverse of the k × k submatrix of G. Let E be the k × n matrix with elements { 1 if j = ai (17.85) Ei,j = 0 otherwise. Example 17.19

For the (8, 5) polar code whose encoder is shown in Figure 17.28, ⎡0 ⎢0 ⎢ E = ⎢0 ⎢0 ⎢ ⎣0

0 0 0 0 0

0 0 0 0 0

1 0 0 0 0

0 1 0 0 0

0 0 1 0 0

0 0 0 1 0

0⎤ 0⎥ ⎥ 0⎥ . 0⎥ ⎥ 1⎦



The polar encoding operation, not necessarily systematic (excluding the bit-reverse permutation) can be expressed in terms of a generator matrix G, where G = EF ⊗n . That is, E selects the rows of G associated with the information (unfrozen) bits. Applying E to the right of a length-k vector 𝐮 produces a vector 𝐯I of length n, 𝐯I = 𝐮E,

17.8 Systematic Encoding of Polar Codes

845

where 𝑣I,aj = uj . This is Step 1 of the encoding algorithm above. The next step of decoding is 𝐯II = 𝐯I F ⊗n = 𝐮EF ⊗n . Step 3 of the algorithm produces 𝐯III = 𝐯II ET E, producing a vector in which elements indexed by  are the same and the other elements are set to 0. Step 4 of the algorithm produces 𝐱 = 𝐯III F ⊗n = 𝐮EF ⊗n ET EF ⊗n . ⏟⏞⏟⏞⏟⏟⏟⏟ Π

G

The encoding is systematic if the elements of 𝐱 restricted to the indices of  are equal to the elements of 𝐮: 𝐱ET = 𝐮. Thus, we need to show that

EF ⊗n ET EF ⊗n ET = I.

A matrix is said to be an involution if multiplication of the matrix by itself yields the identity matrix. Proof that the encoding is systematic amounts to showing that EF ⊗n ET is an involution. The following example shows that not every polar-type code produces an involution. Example 17.20

For a code with length N = 4, let 1 = {0, 1, 3}. The associated E matrix is ⎡1 ⎢ ⎢0 T E1 = ⎢ ⎢0 ⎢0 ⎣

⎡1 0 0 0⎤ ⎢ ⎥ E1 = ⎢0 1 0 0⎥ ⎢ ⎥ ⎣0 0 0 1⎦ and

E1 F

⊗n

E1T

0 0⎤ ⎥ 1 0⎥ ⎥ 0 0⎥ 0 1⎥⎦

(E1 F

0 0 0⎤ 1 0 0⎥ ⎥ 0 1 0⎥ ⎥ 1 1 1⎥ ⎥ ⎦

⎡1 0 0⎤ ⎢ ⎥ = ⎢1 1 0⎥ ⎢ ⎥ ⎣1 1 1⎦

so ⊗n

F ⊗n

⎡1 ⎢1 ⎢ = ⎢1 ⎢ ⎢1 ⎢ ⎣

E1T )(E1 F ⊗n E1T )

⎡1 0 0⎤ ⎢ ⎥ = ⎢0 1 0⎥ . ⎢ ⎥ ⎣1 0 1⎦

For information set 1 , the encoding process does not produce a systematically encoded code! By contrast, let 2 = {1, 2, 3} and ⎡0 1 0 0⎤ ⎢ ⎥ E2 = ⎢0 0 1 0⎥ ⎢ ⎥ ⎣0 0 0 1⎦ and

⎡0 ⎢ ⎢1 E2T = ⎢ ⎢0 ⎢0 ⎣

⎡1 0 0⎤ ⎢ ⎥ E2 F ⊗n E2T = ⎢1 1 0⎥ ⎢ ⎥ ⎣1 1 1⎦

0 0⎤ ⎥ 0 0⎥ ⎥ 1 0⎥ 0 1⎥⎦

846

Polar Codes and (E2 F

⊗n

E2T )(E2 F ⊗n E2T )

⎡1 0 0⎤ ⎢ ⎥ = ⎢0 1 0⎥ . ⎢ ⎥ ⎣0 0 1⎦

For information set 2 , the encoding process does produce a systematically encoded code! 2 would be an appropriate information set for systematic encoding. ◽

We now explore conditions under which an involution occurs. This will be examined first from the point of view of the set . This will then be related to polar codes via the polar code design process. 17.8.6

Involutions and Domination Contiguity

Definition 17.21 Let i and j be integers such that 0 ≤ i, j < N, and let the binary representations of i and j be i = (i0 , i1 , . . . , in−1 ) j = (j0 , j1 , . . . , jn−1 ). We save that i binary dominates , denoted by i ⪰ j if and only if i𝓁 ≥ j𝓁 for 0 ≤ 𝓁 < n. That is, i ⪰ j iff the support of the binary representation of i (the indices 𝓁 for which i𝓁 = 1) contains the support of the binary representation of j. ◽ Definition 17.22 A set of indices  ⊂ {0, 1, . . . , N − 1} is domination contiguous if for all h, j ∈  and for all 0 ≤ i < n such that h ⪰ i and i ⪰ j it holds that i ∈ . That is, If h, j ∈  and h ⪰ i ⪰ j then i ∈ .

Example 17.23



Let N = 4 and let  = {0, 1, 3}, as in Example 17.20. In binary representation,  = {(00), (01), (11)}.

Let h = 3 = (11) and j = 1 = (01). Then h ⪰ j. If i = 2 = (10), then h ⪰ i ⪰ j, but i ∉ . So this set  is not domination contiguous. Now let  = {1, 2, 3} = {(01), (10), (11)}. With h = 3 = (11) and j = 1 = (01) and i = 2 = (10), it is clear that h ⪰ i ⪰ j. This set is domination contiguous.◽
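Definition 17.22 is easy to test mechanically. The following C++ sketch (function names are illustrative, not from the companion code) checks whether an index set is domination contiguous by scanning all triples h ⪰ i ⪰ j; applied to the two sets of Example 17.23 it reports that {0, 1, 3} fails and {1, 2, 3} passes.

#include <cstdio>
#include <set>

// i "binary dominates" j: the support of i contains the support of j.
bool dominates(unsigned i, unsigned j) { return (i & j) == j; }

// Definition 17.22: A is domination contiguous if h, j in A and h >= i >= j
// (in the domination order) imply i in A.
bool dominationContiguous(const std::set<unsigned>& A, unsigned N) {
    for (unsigned h : A)
        for (unsigned j : A)
            if (dominates(h, j))
                for (unsigned i = 0; i < N; ++i)
                    if (dominates(h, i) && dominates(i, j) && !A.count(i))
                        return false;
    return true;
}

int main() {
    std::printf("{0,1,3}: %d\n", (int)dominationContiguous({0, 1, 3}, 4));  // 0: not contiguous
    std::printf("{1,2,3}: %d\n", (int)dominationContiguous({1, 2, 3}, 4));  // 1: contiguous
    return 0;
}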

Theorem 17.24 [390, Theorem 1] Let the information set  ⊆ {0, 1, . . . , N − 1} be domination contiguous. Let E be defined as in (17.85). Then EF ⊗n ET is an involution: (EF ⊗n ET )(EF ⊗n ET ) = I. The proof relies on the following lemma. Lemma 17.25 The elements of the matrix F ⊗n satisfy { 1 i⪰j (F ⊗n )i,j = 0 otherwise. Proof It is clear from (17.34) that the upper right (N∕2) × (N∕2) block is 0. Elements in these locations have in−1 (the most-significant bit) equal to 0 and jn−1 equal to 0. For the other three blocks of (17.34), i ⪰ j iff (i mod2n−1 ) ⪰ (j mod2n−1 ). Since each of the remaining blocks all have the same structure it suffices to consider the claim for the lower left block.

17.8 Systematic Encoding of Polar Codes

847

Continuing this reduction recursively, it suffices to consider the 2 × 2 block F for which verification is straightforward. ◽ Proof of theorem 17.24. For 0 ≤ p, q, r < k, let h = ap , i = aq and j = ar . It is matter of checking indices to verify that (EF ⊗n ET )p,q = (F ⊗n )h,i (EF ⊗n ET )q,r = (F ⊗n )i,j so that ((EF ⊗n ET )(EF ⊗n ET ))p,r =

k−1 ∑

(EF ⊗n ET )p,q (EF ⊗n ET )q,r

q=0

=



(F ⊗n )h,i (F ⊗n )i,j .

i∈

To show that an identity is obtained, it must be shown that this sum (modulo 2) is equal to 1 if h = j, and 0 otherwise. That is, when h = j there must be an odd number of i ∈  for which h ⪰ i and i ⪰ j,

(17.86)

and if h ≠ j there must be an even number of i satisfying (17.86). This is examined on a case-by-case basis. 1. If h = j, then the only possible i satisfying (17.86) is i = h = j, so there are an odd number (one) of such i. 2. If h ≠ j and h ⋡ j there are zero values of i satisfying (17.86), which is an even number of values. 3. If h ≠ j and h ⪰ j, then the support of j is properly contained in the support of h. To retain binary domination, the number of bit positions that are eligible for changing between h and j are those 𝓁 for which j𝓁 = 0 and h𝓁 = 1. The total number of possible i satisfying (17.86) is 2𝑤(h)−𝑤(j) , where 𝑤(h) and 𝑤(j) are the weight of the binary representations of h and j, respectively. Since  is dominant contiguous, each of these possible i are in , which means there are an even number of i satisfying (17.86). ◽ 17.8.7

Polar Codes and Domination Contiguity

The previous section provided conditions on the information set, namely, domination contiguity, that ensured that systematic encoding occurs. In this section, conditions related to the polarized channels are established that relate to domination contiguity. Definition 17.26 A channel W ∶  →  is upgraded with respect to a channel Q ∶  →  if there exists a channel Φ ∶  →  such that concatenating Φ to W results in Q. This means that Q(z|x) =



Φ

W Q

Φ(z|y)W(y|x).

y∈

The statement “W is upgraded with respect to Q” is denoted as W ⪰ Q.



848

Polar Codes Recall from the data-processing inequality of information theory (see Section 1.12.2) that concatenating channels cannot improve the mutual information from input to output. In light of this, in this definition, in order for W followed by Φ to be equal to Q, W must be in some way superior to, that is, “upgraded relative to,” Q. “Upgraded relative to” is transitive: if Q1 is upgraded relative to Q2 and Q2 is upgraded relative to Q3 , then Q1 is upgraded relative to Q3 . To see this, let Φ1 be such that the concatenation of Q1 and Φ1 results in Q2 , and let Φ2 be such that the concatenation of Q2 and Φ2 results in Q3 . Then the concatenation of Φ1 and Φ2 (in that order) to Q1 results in Q3 . Lemma 17.27 [390, Lemma 3] Let W ∶  →  be a B-DMC channel. Let W 0 ∶  →  2 and W 1 ∶  2 ⊗  as in Section 17.5.2. Then W 1 ⪰ W 0 . Proof We have (see (17.61) and (17.62) ) W 0 (y0 , y1 |u0 ) =

1 ∑ W(y0 |u0 ⊕ u1 )W(y1 |u1 ) 2 u ∈0,1 1

and

1 W(y0 |u0 ⊕ u1 )W(y1 |u1 ). 2 Let Φ ∶  2 ×  →  2 be the channel that maps (y0 , y1 , u0 ) to (y0 , y1 ) with probability 1, so the transition probabilities are { 1 ỹ 0 = y0 , ỹ 1 = y1 Φ(̃y0 , ỹ 1 |u0 , y1 , u0 ) = 0 otherwise. W 1 (y0 , y1 , u0 |u1 ) =

Then the transition probabilities for the concatenation of W 1 and Φ are ∑ 1 Φ(̃y0 , ỹ 1 |y0 , y1 , u0 ) W(y0 |u0 ⊕ u1 )W(y1 |u1 ) 2 y ,y ,u 0 1 0

=

1 [W(y0 |u1 )W(y1 |u1 ) + W(y0 |1 ⊕ u1 )W(y1 |u1 )] = W 0 (y0 , y1 |u0 ). 2



The polar transformation preserves the relationship of being upgraded: Lemma 17.28 Let W ∶  →  and Q ∶  →  be two symmetric B-DMC channels such that W ⪰ Q and let (W 0 , W 1 ) and (Q0 , Q1 ) be polarizations of these channels. Then W 0 ⪰ Q0

and

W 1 ⪰ Q1 .

(17.87)

∑ Proof By the upgrade relationship, write Q(y|x) = z∈ Φ(y|z)W(z|x). The polarized channel Q0 can be written as 1 ∑ Q0 (y0 , y1 |u0 ) = Q(y0 |u0 ⊕ u1 )Q(y1 |u1 ) 2 u ∈ 1

⎛ ⎞⎛ ∑ ⎞ 1 ∑ ⎜∑ Φ(y0 |y′0 )W(y′0 |u0 ⊕ u1 )⎟ ⎜ Φ(y1 |y′1 )W(y′1 |u1 )⎟ ⎟⎜ ′ ⎟ 2 u ∈ ⎜ ′ 1 ⎝y0 ∈ ⎠ ⎝y1 ∈ ⎠ ∑ ∑ 1 Φ(y0 |y′0 )Φ(y1 |y′1 ) W(y′0 |u0 ⊕ u1 )W(y′1 |u1 ) = 2 ′ ′ 2 u ∈

=

(y0 ,y1 )∈

=



1

Φ(y0 |y′0 )Φ(y1 |y′1 )W 0 (y′0 , y′1 |u0 ),

(y′0 ,y′1 )∈ 2

so W 0 ⪰ Q0 . It can be similarly shown that W 1 ⪰ Q1 . The key lemma is the following.



17.8 Systematic Encoding of Polar Codes

849

Lemma 17.29 Let W ∶  →  be a symmetric B-DMC. For indices i, j with 0 ≤ i, j < N, if i ⪰ j then (j) WN(i) ⪰ WN . Proof For N = 2, the result follows from Lemma 17.27 if i > j, or from the fact that W ⪰ W if i = j. Proceeding inductively for n = log2 N = 2, 3, . . . , suppose that (⌊i∕2⌋)

WN∕2

(⌊ j∕2⌋)

⪰ WN∕2

.

(j)

Considering WN(i) and WN , if the least significant bits (LSBs) of i and j are the same then (17.87) (⌊i∕2⌋) (⌊j∕2⌋) applies, with W and Q in the lemma being WN∕2 and WN∕2 , respectively, and WN(i) being W 0 or W 1 , depending on the parity of i. If i and j have different LSBs, then the LSB of i must be 1 and the LSB of j must be 0. Then (⌊i∕2⌋)

(⌊i∕2⌋)

(⌊j∕2⌋)

(⌊j∕2⌋)

WN(i) = (WN∕2 )1 ⪰ WN∕2 and

(j)

WN = (WN∕2 )0 ⪰ WN∕2 . By the transitivity of ⪰,

(j)

WN(i) ⪰ WN .



The next piece relates the upgrade relationship and the Battacharyya parameter. Lemma 17.30 Let W and Q be channels with W ⪰ Q. Then Z(W) ≤ Z(Q). Proof Z(Q) =

∑√∑ ∑ ∑√ Q(y|0)Q(y|1) = Φ(y|y1 )W(y1 |0) Φ(y|y2 )W(y2 |0), y∈

y∈

y1 ∈

y2 ∈

where the second equality follows from the upgrade relationship between W and Q. Inside the square root, we can write [ ]1∕2 ∑ √ 2 ( Φ(y|y1 )W(y1 |0)) y1 ∈

√ and recognize this as the norm of vector with components Φ(y|y1 )W(y1 |0), and similarly for the other terms inside the square root. Applying the Cauchy–Schwartz inequality gives ∑ ∑√ Φ(y|y1 ) = Z(W). W(y2 |0)W(y2 |1) Z(Q) ≥ y y1 ∈ ◽ (j)

Combining Lemmas 17.29 and 17.30, if i ⪰ j then Z(WN(i) ) ≤ Z(WN ). Recall that in Section 17.6.1, polar codes were designed by selecting the K channels having the (j) smallest Battacharyya parameter. Under this design, if the inequality Z(WN(i) ) ≤ Z(WN ) when i ⪰ j and i ≠ j, then if j ∈ , then it must also be the case that i ∈ . This ensures that  is domination contiguous according to Definition 17.22 so that, by Theorem 17.24, the encoder is systematic. (j) If it turns out that under the numerical precision employed in the polar code design Z(WN(i) ) = Z(WN ) for some i and j, it may turn out that j ∈  but i ∉  in a way that makes  not domination contiguous. In this case, the design can be altered (with no practical effect on performance) by removing j from  and putting in i. If other design criteria are used (such as selecting the channels having the smallest bit error rate), similar tweaks will ensure systematic encoding with negligible performance penalties.

850

Polar Codes 17.8.8

Modifications for Polar Codes with Bit-Reverse Permutation

The discussion about systematic decoding was expressed for encoders that do not employ bit-reverse permutation, so the generator matrix is G = EF ⊗n , where E selects the rows corresponding to the information bits in . If the bit-reverse permutation is used, the generator matrix is G = EBN F ⊗n . Given an information set  for unpermuted encoding, define the set of bit-reversed active rows by ← − applying the bit-reverse operation on each ai ∈  to define a set . These elements can be ordered as ← −  = {𝛽j }k−1 j=0 ,

0 ≤ 𝛽0 < 𝛽1 < · · · < 𝛽k−1 < n.

← − Define E similar to E: ← − ( E )i,j =

{ 1 0

if j = 𝛽i otherwise.

← − The generator can now be written as G = E F ⊗n , and the encoding operation for the reversed code can be expressed as ← − ← −T ← − (𝐮) = 𝐮 E F ⊗n E E F ⊗n . Encoding is systematic if

− ← −T ← ← −T ← − ( E F ⊗n E )( E F ⊗n E ) = I.

The point now is that, by the definition of domination contiguous, if  is domination contiguous, then ← − so is .

17.9 List Decoding of Polar Codes Despite the remarkable properties of polar codes, such as low encode and decode complexity, explicit construction algorithms, and provable asymptotically good performance, finite-length polar codes are not as strong as some other modern codes of similar length, such as LDPC or turbo codes. Part of this is due to the fact that the successive cancellation decoding leaves some performance on the table, and part due to the fact that the capacity-achieving performance is asymptotic, so finite-length polar codes are not necessarily capacity achieving. Practice improvements can be obtained by improving the decoding using list decoding. The idea behind list decoding of polar codes is that instead of producing a single decoded codeword as SC cancellation does, a list decoder considers up to L decoding paths concurrently. Let û i0 denote a sequence of decisions about the bits ui0 . When the next (unfrozen) bit ui+1 is to be decoded, the decoding path is split into two paths, each of which has û i0 as a prefix. One new path has û i+1 = 0 and the other new path has û i+1 = 1. (No splitting of paths occurs for frozen bits.) This splitting is accomplished by “cloning” a path: duplicating the path û i0 , then adding to the two paths the values of û i+1 = 0 or 1. Obviously this doubling of paths cannot continue indefinitely, so after doubling the paths the decoder selects among the L most likely paths. At the end of decoding, the most probable path is selected from the L remaining paths. The list decoding idea is portrayed in Figure 17.29, where N = 4, L = 4, and all bits are unfrozen. Figure 17.29(a): the first unfrozen bit u0 can be either 0 or 1, producing two paths. Figure 17.29(b): the second unfrozen bit u1 can be either 0 or 1, producing four paths; no need to prune yet. Figure 17.29(c): With the third bit u2 , eight paths would be produced. Figure 17.29(d): the eight paths are pruned to L = 4 paths by selecting the L most likely paths. Pruned paths are indicated in gray. Figure 17.29(e): For the fourth bit u3 , unpruned bits are extended, producing eight paths. Figure 17.29(f): Pruned back

17.9 List Decoding of Polar Codes

851

1

0

1

0 1

0

0

1

0 0 1

(a)

0 0 1

1

0 1

1

0 1

1

1

0

0 1

0 1

0 1 0 1 0 1

(d)

0 1

1 0 1

1

0 0

0 1

0 1

0

(c)

0 0

0 1

1

(b) 1

0

1

0

1

0

0 1

0 1

0 1

1

0

0 1

0 1

0 1 0 1 0 1

(e)

1 0 1 0 1

(f)

Figure 17.29: Evolution of paths in list decoding. Following [431].

to four best paths. At this point in the decoding algorithm, the path having the best likelihood is selected, resulting in the decoded sequence û 30 . The likelihoods are computed for list decoding using the same formulas (17.40) and (17.41) as for successive cancellation, but here they are re-expressed in terms of a layer parameter 𝜆, 0 ≤ 𝜆 ≤ n, where the number of bits associated with a layer is Λ = 2𝜆 . Rewriting (17.40) and (17.41) using this notation, 2i−1 W𝜆(2i) (yΛ−1 |u2i ) 0 , u0 ⏟⏞⏞⏞⏟⏞⏞⏞⏟ branch 𝛽

=

1 ∑ Λ∕2−1 2i−1 (i) W (i) (y , u0,even ⊕ u2i−1 |u ⊕ u2i+1 )W𝜆−1 (yΛ−1 , u2i−1 |u ) Λ∕2 0,odd 2i+1 0,odd 2i 2 u ∈ 𝜆−1 0 ⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟ ⏟⏞⏞⏞⏟⏞⏞⏞⏟ 2i+1 branch 2𝛽

(17.88)

branch 2𝛽+1

2i W𝜆(2i+1) (yΛ−1 0 , u0 |u2i+1 )

⏟⏞⏟⏞⏟ branch 𝛽

=

1 (i) Λ∕2−1 2i−1 (i) , u0,even ⊕ u2i−1 |u ⊕ u2i+1 )W𝜆−1 (yΛ−1 , u2i−1 |u ) W (y Λ∕2 0,odd 2i+1 0,odd 2i 2 𝜆−1 0 ⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟ ⏟⏞⏞⏞⏟⏞⏞⏞⏟ branch 2𝛽

(17.89)

branch 2𝛽+1

for 0 ≤ 2i < Λ. The meaning of the 𝛽 annotations is explained below. 17.9.1

The Likelihood Data Structure P

The list decoder employs some data structures to keep track of the likelihoods, bits, and the various paths. The first data structure is the P data structure, which keeps track of the likelihood functions for the various paths. Example 17.31 To understand the data structures associated with list decoding, we will return to decoding with N = 4, with the encoding portrayed and labeled as in Figure 17.30. and write out the likelihoods W2(0) , W2(1) , W2(2) , W2(3) , recalling that now the subscript 2 refers to the level 𝜆, not the number of elements at that level, which is 2𝜆 .

852

Polar Codes u0 u2 u1 u3 Layer 𝜆 = 0

Adders at layer 𝜆 = 1

Layer 𝜆 = 1

Adders at layer 𝜆 = 2

Llayer 𝜆 = 2

Figure 17.30: Encoder for N = 4. Using (17.88) and (17.89) i = 0 W2(0) (y30 |u0 ) =

1 ∑ (0) 1 W1 (y0 |u0 ⊕ u1 )W1(0) (y32 |u1 ) 2 u 1

1 (0) 1 W (y |u ⊕ u1 )W1(0) (y32 |u1 ) 2 1 0 0 1 ∑ (1) 1 i = 2 W2(2) (y30 , u10 |u2 ) = W1 (y0 , u0 ⊕ u1 |u2 ⊕ u3 )W1(1) (y32 , u1 , |u3 ) 2 u

i = 1 W2(1) (y30 , u0 |u1 ) =

3

1 i = 3 W2(3) (y30 , u20 |u3 ) = W1(1) (y10 , u0 ⊕ u1 |u2 ⊕ u3 )W1(1) (y32 , u1 |u3 ). 2

(17.90)

Continuing recursively, the components that are needed for i = 0 are W1(0) (y10 |u0 ⊕ u1 ) and W1(0) (y32 |u1 ). Representing the conditioning argument as b (from which these functions can be evaluated), these can be computed as W1(0) (y10 |b) =

1 ∑ (0) W0 (y0 )|b ⊕ u1 )W0(0) (y1 |u1 ) 2 u 1

W1(0) (y32 |b)

1 ∑ (0) = W0 (y2 |b ⊕ u1 )W0(0) (y3 |u1 ). 2 u 1

W0(0) (y0 |u),

In turn, these quantities depend on W0(0) (y2 |u), and W0(0) (y3 |u), for u = 0 or 1. The quantities needed to compute for i = 0 can be stacked into a data structure that has one function in its first block, two functions in its second block, and four functions in its third block, as shown: 𝜆=2 𝛽=0

W0(0) (y1 |u),

𝜆=1 𝛽=1

𝛽=0

𝛽=0

] [ ] [ [ i = 0 ∶ W2(0) (y30 |b) W1(0) (y10 |b) [W1(0) (y32 |b)] W0(0) (y0 )|b)

𝛽=1 W0(0) (y1 |b)]

[

𝜆=0 𝛽=2 W0(0) (y2 |b)

𝛽=3 W0(0) (y3 |b)

]

(17.91) Similarly, when i = 1, the values needed to compute W2(1) (y30 , u0 |u1 ) in (17.90) can be stacked up as 𝜆=2 𝛽=0 ] i = 1∶ [ (1) 3 W2 (y0 , u0 |b)

𝜆=1 [

𝜆=0

𝛽=0

𝛽=1

W1(0) (y10 |b)

W1(0) (y32 |b)

]

𝛽=0 𝛽=1 𝛽=2 𝛽 = 3 . (17.92) [ ] (0) (0) (0) (0) W0 (y0 )|b) W0 (y1 |b) W0 (y2 |b) W0 (y3 |b) ◽

The index 𝛽 portrayed in (17.91) and (17.92) is called the branch number. 𝛽 indexes the values of the arguments to the W functions. As seen in (17.88) and (17.89), the arguments are of the Λ∕2−1 2i−1 form (y0 , u0,even ⊕ u2i−1 ) are associated with index 2𝛽. Arguments of the form (yΛ−1 , u2i−1 ) Λ∕2 0,odd 0,odd

17.9 List Decoding of Polar Codes

853

Figure 17.31: Data structure P used in list decoding with N = 8, n = 3, L = 4. Numbers in the braces indicate the dimension of the array. are associated with with index 2𝛽 + 1. As the example illustrates, the range of branch numbers at level 𝜆 is 0 ≤ 𝛽 < 2n−𝜆 . For the list decoding algorithm a data structure P is created for each path 𝓁 such that P𝓁𝜆 [𝛽][b] = W𝜆(i) (y(range) u(range) |b)

on path 𝓁,

(17.93)

where the range of indices of y and u are determined by the particular value of 𝛽. The data structure P thus can be considered an “array of arrays,” a four-dimensional object with real values containing the W values. The outer array dimension is of size (n + 1) × L, corresponding to the n + 1 different values of 𝜆 and the L different paths in the list. Then the dimension of the array P𝓁𝜆 is of size 2n−𝜆 × 2, where the first index is for 𝛽 and the second index is for the values of the conditioning argument b ∈ {0, 1}. A picture of the data structure for N = 8, n = 3, and L = 4 is shown in Figure 17.31. 17.9.2

Normalization

Since the likelihoods are small numbers, as they are propagated through the layers using (17.88) and (17.89), the numbers may be small enough to cause loss of numeric precision. Note that N−1 i−1 , û i−1 ̂ 0 , ui = b) Wn(i) (yN−1 0 |b) = 2P(y0 , u 0



̂i 2P(̂ui−1 0 ,u

with P(Ui = b) = 1∕2

= b) = 2 . −i

For codes having length N of any appreciable length, the numbers lose numerical precision. This problem of underflow can be mitigated by normalizing the likelihoods. The normalization is done in a way that preserves the relative ranking of the likelihoods, even though the absolute values lose their meaning. In the algorithms here, we set P𝓁𝜆 [𝛽][b] ←

P𝓁𝜆 [𝛽][b] max P𝓁𝜆 [𝛽][b] 𝛽,b

.

854

Polar Codes 17.9.3

α

α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

polarcode.cc recursivelylycalcp()

Code to Compute P

Algorithm 17.6 shows code to compute the likelihoods in the data structure P. This function is called as recursivelyCalcP(n, i), where n is the highest layer number and i is the bit index. Recursive calls are determined by the binary structure of i (just as in SC) in line 3. Line 7 returns a pointer P𝜆 to the 2n−𝜆 × 2 that holds the likelihood data W𝜆 for path 𝓁, and similarly for P𝜆−1 . Also, C𝜆 gets the pointer to the array of bits needed for the update of P𝜆 . The actual update of the likelihoods according to (17.88) and (17.89) for the upper and lower likelihood computations is in lines 10–23. Algorithm 17.6 Polar List Decoder: Recursively Calculate P

1 2

3

4 5 6 7 8 9 10 11 12 13 14 15 16

17 18 19 20 21 22 23 24

25 26 27 28 29 30 31 32 33

Function recursivelyCalcP(𝜆, i) % For each active path, calculate the likelihood to bit i,P𝓁n (i), pushing likelihood from the right to the left.See reference [431], alg 10. Input: layer 𝜆 and bit index i if 𝜆 = 0 then return; % Recursion stopping condition set 𝜓 ← ⌊i∕2⌋ % Recurse into lower levels if needed, as determined by parity of i if i mod 2 = 0 then recursivelyCalcP(𝜆 − 1, 𝜓) % Do the calculation from (17.88) and (17.89) set 𝜎 ← 0 % Prepare to get the scaling factor for 𝓁 = 0, 1, … , L − 1 do % For each active path if activePath[𝓁] = false then continue P𝜆 = getArrayPointer_P(𝜆, 𝓁) % Getpointers to P data for this layer and path P𝜆−1 = getArrayPointer_P(𝜆, 𝓁) C𝜆 = getArrayPointer_C(𝜆, 𝓁) % Getpointers to C (bit) data for this layer and path for 𝛽 = 0, 1, … , 2n−𝜆 − 1 do % For each ‘‘branch’’ at this level if i mod 2 = 0 then % Apply (17.88) for u′ ∈ {0, 1} do ∑ P𝜆 [𝛽][u′ ] ← 12 u′′ ∈{0,1} P𝜆−1 [2𝛽][u′ ⊕ u′′ ]P𝜆−1 [2𝛽 + 1][u′′ ] set 𝜎 ← max(𝜎, P𝜆 [𝛽][u′ ]) end else % Else apply (17.89) set u″ ← C𝜆 [𝛽][0] % Get the bit that controls this ‘‘lower’’ computation for u″″ ∈ {0, 1} do P𝜆 [𝛽][u″″] ← 12 P𝜆−1 [2𝛽][u″ ⊕ u″″]P𝜆−1 [2𝛽 + 1][u″″] set 𝜎 ← max(𝜎, P𝜆 [𝛽][u″]) end end end end % Normalize the probabilities by 𝜎 for 𝓁 = 0, 1, … , L − 1 do if activePath[𝓁] = false then continue P𝜆 ← getArrayPointer_P(𝜆, 𝓁) for 𝛽 = 0, 1, … , 2n−𝜆 − 1 do for u ∈ {0, 1} do P𝜆 [𝛽][u] ← P𝜆 [𝛽][u]∕𝜎 end end end

17.9 List Decoding of Polar Codes 17.9.4

855

The Bits Data Structure C

The data structure C contains the bits necessary for likelihood computation. Because of SC, at bit index i, it is not necessary to retain all of the previous bits at indices < i, since their effect is propagated into the encoding system. At each layer 𝜆, the data that is saved is preserved in an array C𝜆 . As more data arrive, the C𝜆 fill up. Example 17.32 For a code with N = 4, consider a “path,” that is, a sequence of message bits 𝜋 = (u0 , u1 , u2 , u3 ) = (0, 1, 1, 0). We consider each of these in sequence, placing dots • on the encoding diagram to indicate the data involved. We show the data in the bit arrays C2 , C1 , and C0 as it arrives.



u0 = 0. With only a single input, no additions are computed. u0 0 u2 u1 u3

C2 = 0

C1 =

C0 =

C2 contains the most recent bit u0 . There is insufficient data for C1 or C2 to be populated.



u1 = 1. C2 contains the two most recent bits of information, u0 and u1 . There is now sufficient information to compute values at the output of the adders at layer 𝜆 = 2. These output location are shown with •s. u0 0 u2

u1 1

1 1

u3

C2 = 0

1

C1 =

1 1

C0 =

The left column of C1 contains the outputs of adders at layer 2.



u2 = 1. The information from u0 and u1 are now represented at the 𝜆 = 1 level in C1 , and u0 and u1 are no longer needed in C2 , leaving C2 free to store u2 . There is not sufficient data to propagate u2 through adders yet, so C1 is unchanged. u0

u2 1 u1

1 1

u3

C2 = 1



C1 =

1 1

C1 =

u3 = 0. C2 now contains u2 and u3 . There is sufficient information to push u2 , u3 through the adders at level 𝜆 = 2. Furthermore, there is information for the adders at level 𝜆 = 1, so the final output can be computed into C0 .

856

Polar Codes u0 u2 u1 u3

C2 = 1

0

0

1 1

1

0

1 1

1 0

0

1 C1 = 1

1 0

0 1 C0 = . 1 0 ◽

As shown in this example, a data structure C𝜆 is associated with each path in the list, where C𝜆 has the bit data at layer 𝜆. Except for the Cn data, the C𝜆 arrays contain the results of the additions of the encoder, so the bit sequence in the path is not explicitly saved. Instead, as the bit data gets pushed to the right, a codeword is formed and stored in C0 . Extraction of the message bits after the decoding is complete requires unencoding this codeword. This introduces additional complexity, but means that the decoder produces both the message bits and the codeword. The list decoder keeps a set of data analogous to the C𝜆 data in this example for each of the L decoding paths. The algorithm allocates data C𝜆𝓁 [𝛽][b] of the same size as the P data, but containing bit values instead of real values. Consider now the cloning of the path û i0 to form û i+1 for û i+1 = 0 and û i+1 = 1. When cloning a 0 path, the information associated with û i0 — all the previous bits — is preserved. The data on the path that is common to both is pooled together in the lower 𝜆 values of C𝜆 . α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

polarcode.cc recursively UpdateCalcC()

17.9.5

Code to Compute C

Algorithm 17.7 shows code to update the bits for each path. Like the code to calculate P, pointers to the bit data at levels 𝜆 and 𝜆 − 1 are extracted for path 𝓁. The actual bit propagation is in lines 6 through 9, with line 7 pushing bits to the right through an adder, and line 8 simply copying bits over on the lower line, paralleling the adder output. Algorithm 17.7 Polar List Decoder: Recursively Calculate C

1 2 3 4 5 6 7 8 9 10 11 12 13

Function recursivelyCalcC(𝜆, i) % For each active path, push the bit i to the next lower level to\the right. See reference [431], alg 11.Input: layer 𝜆 and bit index i % i will be odd, or there is no computation to be done in theencoding graphs set 𝜓 ← ⌊i∕2⌋ for 𝓁 = 0, 1, … , L − 1 do % For each active path if activePath[𝓁] = false then continue C𝜆 = getArrayPointer_C(𝜆, 𝓁) % Getpointers to C data for this layer and path C𝜆−1 = getArrayPointer_C(𝜆 − 1, 𝓁) for 𝛽 = 0, 1, … , 2n−𝜆 − 1 do % For each ‘‘branch’’ at this level C𝜆−1 [2b][𝜓 mod 2] ← C𝜆 [𝛽][0] ⊕ C𝜆 [𝛽][1] % propagate bits through an adder to next lower level C𝜆 − 1][2𝛽 + 1][2𝛽 + 1][𝜓 mod 2] ← C𝜆 [𝛽][1] % copy bits on other branch to next lower level end end if 𝜓 mod 2 = 1 then Parity is odd, so propagate to next lower layer recursivelyUpdateC(𝜆 − 1, 𝜓 end

17.9 List Decoding of Polar Codes 17.9.6

857

Supporting Data Structures

In addition to the P and C data structures there are some supporting data structures that support efficient operations, such as pooling of data in paths, keeping track of paths, and so forth. These supporting data structures are briefly summarized here.



inactivePathIndices: A stack of path indices in {0, 1, . . . , L − 1} which are inactive. Initialized with this whole list.



activePath: A vector of booleans of length L indicating which paths are active. (activePath and inactivePathIndices are different representations of the same information.)



pathIndextoArrayIndex: (n + 1) × L array which indicates which of the P𝜆 and C𝜆 matrices is associated with path 𝓁 at level 𝜆. This is an important data structure which enables the pooling of data when paths are split. For a given layer 𝜆 and path index 𝓁, the probability array corresponding to this layer and path are in the P data structure pointed to by arrayPointer_P[𝜆][pathIndexToArrayIndex[𝜆][𝓁], and similarly for the C data.



arrayReferenceCount: (n + 1) × L array indicating how many time the P𝜆 and C𝜆 are used. When pooling of data in a path occurs, this count is incremented. When a path is killed, this count is decremented. This supports the pooling of data. That is, the value arrayReferenceCount[𝜆][s] denotes the number of paths current using the array pointed to by arrayPointer_P[𝜆][s].



inactiveArrayIndices: An array of n + 1 stacks, indicating the array indices that are inactive at each level. α α2 α3 α4 α5 α6 α7 α8 α9 +

17.9.7

Code for Polar List Decoding

The remainder of the code listings are provided in this section. Some explanations are provided in the example below. Algorithm 17.8 Polar List Decoder: Main Loop Function SCLdecode(𝐲) % Main decoder loop of successive cancellation list decoder.See reference [431], alg 12 Input: Received vector 𝐲 Output: Decoded codeword 𝐜̂ 1 2 3 4 5 6 7

% Initialize data structures and likelihoods at level 𝜆 = 0 buildDataStructures() % Done only once during construction of decoder object initializeDataStructures() % Do for each call of this function ;𝓁 ← assignInitialPath() % Get path number P0 ← getArrayPointer_P(0,𝓁) for 𝛽 = 0, 1, … , N − 1 do P0 [𝛽][0] ← W(y𝛽 |0), P0 [𝛽][1] ← W(y𝛽 |1) end

01010

00000

α10 α11 α12 α13 α14 α15 α16

polarcodes.cc listdecode()

858

Polar Codes

8 9 10 11 12 13 14 15 16 17 18 19

20 21 22 23 24 25 26 27 28 29

30

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

% Main Loop: Loop over each bit number i in sequence for i = 0, 1, … , N − 1 do recursivelyCalcP(n,i) % Calculate all likelihoods for this i, right to left if ui is frozen then % if this bit is frozen, assign bit to frozen value if activePath[𝓁] = false then continue Cn ←getArrayPointer_C(n,𝓁) set Cn ][i mod 2] to the frozen value of ui else continuePaths_UfrozenBit(i) end if i mod 2 = 1 then recursivelyUpdateC(n, i) % Push the bits left to right end % Return the codeword in the list with the highest likelihood 𝓁 ′ ← 0, p′ ← 0 for 𝓁 = 0, 1, … , L − 1 do if activePath[𝓁] = false then continue Cn ← getArrayPointer_C(n, 𝓁) Pn ← getArrayPointer_p(n, 𝓁) if Pn [0][Cn [0][1]] > p′ then 𝓁 ′ ← 𝓁, p′ ← Pn [0][Cn [0][1]] end end set C0 ← getArrayPointer_C(0, 𝓁 ′ )

end

Algorithm 17.9 Polar List Decoder: Build Polar Code List Decoder Data Structures

00000

α10 α11 α12 α13 α14 α15 α16

polarcode.cc build ListDecode Data Structures()

1 2 3 4 5 6 7

8 9 10 11 12 13 14 15

Function buildDataStuctures() % Build the data structures for the list decoder. Should appear in the constructor. See reference [431], alg 5 inactivePathIndices ← new stack of capacity L activePath ← boolean array of size L arraypointer_P ← new (n + 1) × L arrays of pointers to (2-d arrays of pointers to doubles) arrayPointer_C ← new (n + 1) × L arrays of pointers to (2-d arrays of pointers to bits) pahtIndexToArrayIndex ← new(n + 1) × L array of integers inactiveArrayIndices ← new array of size n + 1 whose elements are stacks with capacity L arrayReferenceCount ← new (n + 1) × L array of integers % Allocate space for the other two dimensions ofarraypointer_P and arraypointer_P for 𝜆 = 0, 1, … , n do for s = 0, 1, … , L − 1 do arrayPointer_P[𝜆][s] ← new 2n−𝜆 × 2 array of doubles arrayPointer_C[𝜆][s] ← new 2n−𝜆 × 2 array of bits end probForks ← double array of size L × 2 % Used to sort most likely paths contForks ← boolean array of size L × 2 % Flag which paths are kept end

17.9 List Decoding of Polar Codes

Algorithm 17.10 Polar List Decoder: Initialize Data Structures

859

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

1 2

Function initializeDataStuctures() % Initialize the data structures for the list decoder.Done for each decoded codeword. See reference [431], alg 5

α10 α11 α12 α13 α14 α15 α16

polarcode.cc initialize ListDecode Data Structures()

3 4 5 6 7 8 9 10 11 12 13 14 15

clearstack(inactivePathIndices) for 𝜆 = 0, 1, … , n do clearstack(inactiveArrayIndices[𝜆]) for s = 0, 1, … , L − 1 do arrayReferenceCounter[𝜆][s] ← 0 push(inactiveArrayIndices[𝜆], s) end end for 𝓁 = 0, 1, … , L − 1 do activePath[𝓁] ← False push(inactivePathIndices, 𝓁) end

00000

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

polarcode.cc assign initial path()

Algorithm 17.11 Polar List Decoder: Assign Initial Path Function assignInitialPath() % Assign the initial path: grab the first available path, and associate an array with that path index. See reference [431], alg 6 1

2 3

4 5 6 7 8 9

Output: Index 𝓁 of the initial path 𝓁 ← 𝐩𝐨𝐩(inactivePathIndices) activePath[𝓁] ← true % Associate an array at each level to this path index for 𝜆 = 0, 1, … , n do s ← 𝐩𝐨𝐩(inactiveArrayIndices[𝜆]) pathIndextoArrayIndex[𝜆][𝓁] ← s arrayReferenceCount[𝜆][s] ← 1 end Return 𝓁

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

00000

α10 α11 α12 α13 α14 α15 α16

polarcode.cc clonePath()

Algorithm 17.12 Polar Code Decoder: Clone a Path, Sharing Data in C and P

1 2

3 4 5 6 7 8

Function clonePath() % Grabs a new path, then associates with that path to path 𝓁.See reference [431], alg 7 Input: Index 𝓁 of path to clone Output: Index 𝓁 ′ of copy 𝓁 ′ ← 𝐩𝐨𝐩(inactivePathIndices) activePath[𝓁 ′ ] ← true % Make 𝓁 ′ reference the same arrays as 𝓁 for 𝜆 = 0, 1, … , n do s ← 𝐩𝐨𝐩(inactiveArrayIndices[𝜆]) pathIndextoArrayIndex[𝜆][𝓁 ′ ] ← s arrayReferenceCount[𝜆][s]++ end Return 𝓁 ′

860

Polar Codes

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

Algorithm 17.13 Polar List Decoder: Kill a Path

00000

α10 α11 α12 α13 α14 α15 α16

polarcode.cc killPath()

1 2

3 4 5 α α2 α3 α4 α5 α6 α7 α8 α9

6

01010

7

+

00000

8 α10 α11 α12 α13 α14 α15 α16

9

Function killPath(𝓁)% Kill a path by decrementing reference counts.See reference [431], alg 8. Input: Index 𝓁 of path to kill activePath[𝓁] ← false 𝐩𝐮𝐬𝐡(inactivePathIndices, 𝓁) % Disassociate array with path index for 𝜆 = 0, 1, … , n do s ← pathIndexToArrayIndex[𝜆][𝓁] arrayReferenceCount[𝜆][s]– if arrayReferencescount[𝜆][s] = 0 then 𝐩𝐮𝐬𝐡(inactiveArrayIndices[𝜆], s) end end

polarcode.cc get ArrayPointer_P()

Algorithm 17.14 Polar List Decoder: Get Pointer P𝓁𝜆

1 2 3 4 5 6

α α2 α3 α4 α5 α6 α7 α8 α9

7

+

01010

00000

8 9

Function getArrayPointer_P(𝜆, 𝓁) % Return a pointer to the 2n−𝜆 × 2 matrix pointed to by P𝓁𝜆 .See reference [431], alg 9. Input: layer 𝜆 and path index 𝓁 Output: Pointer to P𝓁𝜆 , the 2n−𝜆 × 2 probability pair array s ← pathIndexToArrayIndex[𝜆][𝓁] if arrayReferenceCount[𝜆][s] = 1 then s′ ← s else s′ ← 𝐩𝐨𝐩(inactiveArrayIndices[𝜆] Copy the contents of the 2n−𝜆 × 2 array pointedto by arrayPointer_P[𝜆][s] into that pointed to by arrayPointer_P[𝜆][s′ ]. Also copy the contents of the 2n−𝜆 × 2 array pointed to by arrayPointer_C[𝜆][s] into that pointed to by arrayPointer_C[𝜆][s′ ]. end return arrayPointer_P[𝜆][s′ ]

α10 α11 α12 α13 α14 α15 α16

polarcode.cc getArrayPointer_C()

Algorithm 17.15 Polar List Decoder: Get Pointer C𝜆𝓁

1 2 3 4 5 6

7

8 9

Function getArrayPointer_C(𝜆, 𝓁) % Return a pointer to the 2n−𝜆 × 2 matrix pointed to by C𝜆𝓁 .See reference [431], alg 9. Input: layer 𝜆 and path index 𝓁 Output: Pointer to C𝜆𝓁 , the 2n−𝜆 × 2 probability pair array s ← pathIndexToArrayIndex[𝜆][𝓁] if arrayReferenceCount[𝜆][s] = 1 then s′ ← s else s′ ← 𝐩𝐨𝐩(inactiveArrayIndices[𝜆] Copy the contents of the 2n−𝜆 × 2 array pointed to by arrayPointer_P[𝜆][s] into that pointed toby arrayPointer_P[𝜆][s′ ]. Also copy the contents of the 2n−𝜆 × 2 array pointed to by arrayPointer_C[𝜆][s] into that pointed toby arrayPointer_C[𝜆][s′ ]. end return arrayPointer_C[𝜆][s′ ]

17.9 List Decoding of Polar Codes

Algorithm 17.16 Polar List Decoder: Continue Paths for Unfrozen Bit

1

2 3 4 5 6 7 8 9 10 11 12

Function continuePathsUnfrozenBit(i) % Find the L most likely paths, kill those that don’t pass, and clone remaining paths as necessary to end up with L paths.See reference [431], alg 13. Input: bit index i nactivepath ← 0 % Store likelihood for each bit value ui for each active path for 𝓁 = 0, 1, … , L − 1 do if activePath[𝓁] = true then Pm ← getArrayPointer_P[n, 𝓁] ProbForks[𝓁][0] ← Pm [0][0] % save likelihood for ui = 0 ProbForks[𝓁][0] ← Pm [0][1] % save likelihood for ui = 1 nactivepath ← nactivepath + 1 else probForks[𝓁][0] ← −1 probForks[𝓁][1] ← −1 end end 𝜌 ← min(2nactivepath, L) Populate the boolean matrix contForks such that contForks[𝓁][b] is true iff probForks[𝓁][b] is one of the 𝜌 largest entriesin probForks. % (The present implementation uses a quicksort algorithm to do this, but it is possible in principle to do this in O(L) time.)

13 14 15 16 17 18

19 20

21 22

23 24 25 26 27

28 29 30 31 32 33 34

% Kill off noncontinuing paths for 𝓁 = 0, 1, … , L − 1 do if activePath[𝓁] = false then continue if contForks[𝓁][0] = false and contForks[𝓁][1] = false then killPath(𝓁); end end % Then, continue relevant paths and duplicate if necessary for 𝓁 = 0, 1, … , L − 1 do if contForks[𝓁][0] = false and contForks[𝓁][1] = false then continue; % if both forks are bad or invalid Cn ← getArrayPointer_C(n, 𝓁) if contForks[𝓁][0] = true andcontForks[𝓁][1] = true then % Both forks are good: clone paths set Cm [0][i mod 2] ← 0 𝓁 ′ ← clonePath(𝓁) Cm ← getArrayPointer_C(n, 𝓁 ′ ) set Cm [i mod 2] ← 1 else % Exactly one fork is good if contForks[𝓁][0] = true then set Cm [0][i mod 2] ← 0 else set Cm [0][i mod 2] ← 1 end end end

861

862

Polar Codes 17.9.8

An Example of List Decoding

In this section, the different data structures presented in the previous sections are combined to show how the pieces work together. At various stages in the decoding algorithm, the example shows the contents of the C data structure, the paths represented by the data structures, and the encoding diagram with •s to show the locations that the data in C represent. Also shown is a description of the supporting data structures. The P data structure, showing the computed likelihoods, is not portrayed. The portrayal here is conceptual, and figures may not correspond exactly to values in an actual run of the algorithm.



After initialization.

There is one active path 𝜋0 which is assigned to the first C2 , C1 , and C0 arrays by the assignInitialPath function. However, the path is empty (and not shown) since no data have been processed yet.



At the end of the main loop with i = 0.

There are two active paths 𝜋0 = 0 and 𝜋1 = 1, where 𝜋1 is cloned from 𝜋0 using clonePath. These two paths are created in continuePaths_UnfrozenBit and the call to getArrayPointer_C. Note that element pathIndexToArrayIndex[2][1] = 1. This means that path 𝜋1 (the second path) refers to the second column of C2 . This is only referenced once. The other two C𝜆 components are shared with the path 𝜋0 , which are referred to twice.

17.9 List Decoding of Polar Codes



At the end of the main loop with i = 1.

There are four active paths, 𝜋0 = 00, 𝜋1 = 10, 𝜋2 = 01, and 𝜋3 = 11. The path 𝜋2 is the result of applying clonePath to the former 𝜋0 . After this, a unique C2 is obtained by the call to getArrayPointer_C.



i = 2, just after calling killPath in the continuePaths_UnFrozenBit function.

At this point 𝜋0 has been killed leaving the active paths as 𝜋1 = 10, 𝜋2 = 01, and 𝜋3 = 11. The first block of both C2 and C1 are free as a result of killing 𝜋0 .



i = 2, at end of iteration.

863

864

Polar Codes There are again four active paths, 𝜋0 = 011, 𝜋1 = 100, 𝜋2 = 010, and 𝜋3 = 111. The current 𝜋0 is a result of applying clonePath to the former 𝜋2 . This results in 𝜋0 and 𝜋2 sharing the C1 array, but 𝜋0 get its own C2 array in getArrayPointer_C. The path 𝜋1 is a continuation of the former 𝜋1 .



i = 3, end of main loop

At the end of the main loop with i = 3, there are four active paths 𝜋0 = 0110, 𝜋1 = 0111, 𝜋2 = 0100, and 𝜋3 = 1111. The previous 𝜋1 has been killed, a path was cloned and split into two paths (the former 𝜋0 split into 𝜋0 and 𝜋1 ). The call to recursivelyUPdateC has resulted in private copies of C2 , C1 , and C0 for each path. At the end of this iteration, the codewords corresponding to the each path of input bits is stored in the C0 arrays.

17.9.9

Computational Complexity

The size of the data structure associated with each path is O(N). Each time a decoding path is split into two, the P data for path 𝓁 is duplicated. The number of splits is O(NL). A naive implementation, the splitting and duplication would thus have a computational complexity of O(N) ⋅ O(NL) = O(LN 2 ). This would make the computational complexity impractical unless N is too small to be useful. However, a couple of observations can reduce the computational complexity. First, the cost of duplicating the data in P or C (when a path splits) at layer 𝜆 is proportional to 2n−𝜆 . So the larger 𝜆 is, the less the cost to duplicate. The other cost-saving aspect is that the larger P𝜆 or C𝜆 data is, less frequently it is accessed. This is illustrated in the following example. Example 17.33 Consider the decoding algorithm detailed in Figures 17.23 through 17.25. The LLRs, the lines on which they are computed, and the smallest layer number at which the update occurs, are summarized as follows:

1. Figure 17.23a: (j = 0) LLR(7) – LLR(14), lines 0 – 7 layer 0. 2. Figure 17.23c: (j = 1) LLR(0) line 4, layer 3. 3. Figure 17.23e: (j = 2) LLR(1) ,LLR(0) lines 2, 6 layer 2. 4. Figure 17.24a: (j = 3), LLR(0), line 6, layer 3. 5. Figure 17.24c: (j = 4) LLR(3), LLR(4), LLR(5), LLR(6), lines 1, 3, 5, 7, layer 1.

17.9 List Decoding of Polar Codes

865

6. Figure 17.24e: (j = 5) LLR(), line 5, layer 3. 7. Figure 17.25a: (j = 6) LLR(1), LLR(0), lines 3, 7, layer 2. 8. Figure 17.25c: (j = 7) LLR() line 7, layer 3. ◽

Generalizing from this example, layer 𝜆 is accessed every 2n−𝜆 increments of j. The bigger the data in P𝜆 and C𝜆 (from smaller 𝜆), the less often it is accessed. These observations are used in a decoding algorithm as follows. At a given stage of decoding, data shared by more than one decoding path is “flagged” as belonging to more than one decoding path. This is done with the arrayReferenceCount array. When a decoding path needs access to an array it is sharing with another path, a copy is made. 17.9.10

Modified Polar Codes

It has been found in simulation that when a decoding error occurs (that is, the path with the ML is not the path for the correct codeword), the path corresponding to the transmitted codeword is, in fact, a member of the final list. It was simply not selected because another path had higher likelihood. If it were possible to know, somehow, what the correct codeword is, it could have been correctly decoded. This is done (with high probability) using a simple concatenated coding scheme as follows. Choose a length r of a CRC code, such as r = 8 or r = 16. Of the K unfrozen bits that are available to transmit data on, send information on K − r of them, and use the last r unfrozen bits to hold the CRC parity information. This does result in a slight reduction in rate, since the rate is now (K − r)∕N, instead of K∕N. In decoding, the selection is refined as follows. If at least one path has a correct CRC, then paths having incorrect CRC are removed from further consideration, and the path having the highest likelihood among those with the correct CRC is selected. Otherwise, in the hope to minimize the number of bits in error, the codeword with the highest likelihood is selected. List decoding is obviously more complex than SC decoding. Modified polar codes can also be used provide a mechanism for obtaining low probability and lower average decoding complexity. The concept is called a two-step list decoder and is shown in Figure 17.32. The received data are first decoded using a fast (non-list-based) decoder, such as SC or simplified successive cancellation (SSC). Most of the time this will decode correctly, as can be verified by examining the CRC. If there is not a valid CRC, which happens infrequently, then a more complex list-based decoder can be called which will do a better job with the decoding. The two-step decoder provides approximately the same throughput as the fast decoder, with essentially the same performance as the list decoder.

Fast decoder (not list-based)

Valid CRC?

no

ListCRC

yes Done

Figure 17.32: A two-step list decoder.

866

Polar Codes

17.10 LLR-Based Successive Cancellation List Decoding As noted in Section 17.9.2, likelihood computation is subject to underflow. This problem was mitigated with the likelihood-based list decoder using normalization. Another way of addressing underflow is to use log likelihoods, but it has been determined that in hardware implementations, preserving numerical accuracy requires a different number of bits per word at different levels. Further numerical improvement can be obtained using log- likelihood ratio decoding. This has been shown to be area efficient in hardware implementations and numerically stable. But there is a problem in list decoding, where the top L most likely children (paths) out of 2L paths of different parents must be chosen. What is needed is a path metric, computing the likelihood along the entire path of bits. This path metric is established in the following theorem. Theorem 17.34 [22, Theorem 1] For a path 𝓁 having bits û 0 (𝓁), û 1 (𝓁), . . . , û i (𝓁) and bit number i ∈ 0, 1, . . . , N − 1 , define the path metric as PM(i) = 𝓁

i ∑

(j)

ln(1 + exp[−(1 − 2̂uj (𝓁))LN [𝓁]]),

(17.94)

j=0

where

( LN(i) [𝓁] = ln

, û i−1 [𝓁]|0) WN(i) (yN−1 0 0

)

WN(i) (yN−1 , û i−1 [𝓁]|1) 0 0

is the log-likelihood ratio of the bit ui given the channel output yN−1 and the past trajectory of the path 0 û i−1 [𝓁]. 0 Furthermore, if the bits are equally probable {0, 1} for any pair of paths 𝓁1 , 𝓁2 , WN(i) (yN−1 , û i−1 ui [𝓁1 ]) < WN(i) (yN−1 , û i−1 ui [𝓁2 ]) 0 [𝓁1 ]|̂ 0 [𝓁2 ]|̂ 0 0 if and only if PM(i) > PM(i) . 𝓁 𝓁 1

2

That is, comparing bits along a path using the likelihood computed along that path is equivalent to the path metric. However, the path metric is computed using LLRs, in a numerically stable way. Proof Define the likelihood ratio Λ(i) [𝓁] = 𝜆

=

, û i−1 [𝓁]|ui = 0) W𝜆(i) (yN−1 0 0 W𝜆(i) (yN−1 , û i−1 [𝓁]|ui = 1) 0 0 , û i−1 [𝓁], ui = 0)∕P(ui = 0) P(yN−1 0 0 P(yN−1 û i−1 [𝓁], ui = 1)∕P(ui = 1) 0 0

=

P(yN−1 , û i−1 [𝓁], ui = 0) 0 0 P(yN−1 , û i−1 [𝓁], ui = 1) 0 0

assuming, as usual, equal prior probabilities on the bits. Note that L𝜆(i) [𝓁] = ln L𝜆(i) [𝓁].

,

17.11 Simplified Successive Cancellation Decoding

867

The joint probability of the observation and the bits û i−1 [𝓁] can be expressed as 0 ∑ , û i−1 P(yN−1 , û i−1 ̂ i [𝓁]) P(yN−1 0 [𝓁]) = 0 [𝓁], u 0 0 û i [𝓁]∈{0,1}

= P(yN−1 , û i−1 ̂ i [𝓁] = 0) + P(yN−1 , û i−1 ̂ i [ell] = 1) 0 [𝓁], u 0 [𝓁], u 0 0 { N−1 i−1 P(y0 , û 0 [𝓁], û i = 1)(1 + Λ(i) [𝓁]) û i = 1 𝜆 = , û i−1 [𝓁], û i = 0)(1 + (Λ(i) [𝓁])−1 ) û i = 0 P(yN−1 𝜆 0 0 , û i−1 ̂ i [𝓁])(1 + (Λ(i) [𝓁])−(1−2̂ui ) ). = P(yN−1 0 [𝓁], u 0 𝜆 Moving the factor to the other side, we have P(yN−1 , û i0 [𝓁]) = (1 + (Λ(i) [𝓁])−(1−2̂ui [𝓁]) )−1 P(yN−1 , û i−1 0 [𝓁]). 0 0 𝜆 Repeated application of this for i − 1, i − 2, . . . , 0 yields , û i0 [𝓁]) P(yN−1 0

=

i ∏ j=0

(j)

(1 + (Λ𝜆 )−(1−2̂uj [𝓁]) )−1 P(yN−1 ). 0

Dividing both sides of this by P(yN−1 ) gives the posterior probability 0 )= P(̂ui0 [𝓁]|yN−1 0

i ∏ j=0

(j)

(1 + (Λ𝜆 )−(1−2̂uj ) )−1 .

Taking the negative log of this and expressing in terms of

α α2 α3 α4 α5 α6 α7 α8 α9

L𝜆(i) [𝓁],

+

we obtain

01010

) = PM(i) , − ln P(̂ui0 [𝓁]|yN−1 0 𝓁 which establishes the conclusion of the theorem. 17.10.1

00000

α10 α11 α12 α13 α14 α15 α16



Implementation Considerations

The implementation provided follows the same outline as the likelihood-based list decoder, with a new data structure pathmetric associated with each path. This is cloned and copied just like an element of the P data structure.

17.11 Simplified Successive Cancellation Decoding Even the efficient decoding algorithm described in Section 17.4.7 is limited when it comes to very high-speed hardware and software implementations. One limitation is that the decoding is sequential, so bit i + 1 cannot be decoded until bit i has been decoded. This limits the throughput rate of polar codes. The throughput has been shown to be approximately 0.5fR, where f is the decoder clock frequency and R is the code rate [265]. In this section, we describe approaches to simplified successive decoding which can be used for very high-speed decoding by decoding at the level of constituent codes instead of at the bit level. Throughput achieved by this SSC decoding is between two and twenty times faster than conventional SC decoding, depending on the code length [7]. Central to the simplification is the tree structure identified in Section 17.4.6. In addition to the notation introduced in that section, we introduce some additional notation related to trees here. Let the leaves of the tree be indexed by the integers 0, 1, . . . , 2n − 1. If 𝑣 is a leaf node of the tree, let 𝓁(𝑣) denote the index of the leaf node, so 𝓁(𝑣) ∈ 0, 1, . . . , 2n − 1. For a node 𝑣, let V𝑣 denote the set of

listdecodeLLR() clonepathllr() getArrayPointer _llr() getArrayPointer _Cllr() recursivelyCalc llr() recursivelyUpdate Cllr() continuePaths _UnFrozenBitllr() findMostProbable Pathllr()

868

Polar Codes nodes of the subtree rooted at 𝑣. Let 𝑣 be the set of the indices of all the leaf nodes that are descendants of node 𝑣, 𝑣 = {𝓁(u) ∶ u ∈ V𝑣 and u is a leaf node}. Example 17.35

To illustrate this notation, consider the tree labeled in Figure 17.33. For this tree:

b = {0, 1, 2, 3} Vb = {b, d, e, 0, 1, 2, 3} d = {0, 1} Vd = {d, 0, 1}

c = {4, 5, 6, 7} Vc = {c, f , g, 4, 5, 6, 7} e = {2, 3}

Ve = {e, 2, 3}

f = {4, 5} Vf = {f , 4, 5} g = {6, 7}

Vg = {g, 6, 7}.



In Figure 17.34, nodes 𝑣1 and 𝑣2 and their subtrees have been highlighted. All the leaf node descendents of 𝑣1 are frozen nodes, and all the leaf node descendents of 𝑣2 are unfrozen nodes. Let  ⊂ {0, 1, . . . , N − 1} be the information set (the unfrozen bits) and let  = c be the frozen bits. We say that a node 𝑣 in the tree is a rate one node with respect to  if the leaf node descendants of 𝑣 are all information bits, that is, I𝑣 ⊆ . Such nodes are designated as  1 nodes. Similarly, we say that a node 𝑣 in the tree is a rate zero node with respect to  if the leaf node descendants of 𝑣 are all frozen bits, that is, I𝑣 ⊆  . These nodes are the  0 nodes. In Figure 17.34, node 𝑣1 , and in fact all the white nodes, are rate zero nodes; node 𝑣2 and all the black nodes are rate one nodes. The nodes that are not in  0 or  1 (colored in gray on the tree) having rate 0 < R < 1 are called the  R nodes. Intuitively, since the rate zero nodes are frozen, there no need to do any decoding on them. The entire subtree rooted at 𝑣1 requires no computation, since all its nodes are zero rate nodes. This modification to the decoding algorithm results in some reduction of computational complexity. a dυ = 1 dυ = 2 dυ = 3

b e

d

0

c

1

2

f

3

4

g

5

6

7

Figure 17.33: A tree labeled to illustrate the notation for simplified successive cancellation decoding.

υ1 υ2

(a)

(b)

Figure 17.34: Identifying rate one nodes (black) and a rate zero nodes (white) in the tree, and the reduced tree. (a) Rate one nodes and rate zero nodes; (b) the reduced tree.

17.11 Simplified Successive Cancellation Decoding

869

Further simplifications are possible. With the tree structure in mind, we associate to each node 𝑣 a “constituent code” with a generator as follows. Let Gn denote the generator matrix for a polar code of block length 2n where, for example (neglecting the bit-reverse permutation matrix), [ ] 1 0 G1 = F = 1 1

G2 = F ⊗2

⎡1 ⎢1 =⎢ ⎢1 ⎢ ⎣1

0 0 0⎤ 1 0 0⎥ ⎥. 0 1 0⎥ ⎥ 1 1 1⎦

Assign to each node 𝑣 a constituent generator G𝑣 obtained from taking the rows i and columns i of Gn−d𝑣 for i ∈ 𝑣 ∩ . Example 17.36

For the labeling in Figure 17.33, with  = {5, 6, 7} Gb = [ Gc = G2

] Gd = [ G f = [1]

]

Ge = [ ]

(empty matrix for zero-rate node)

Gg = G1 .



At a rate one node, the constituent generator is simply the square matrix Gn−d𝑣 , which is invertible with G−1 = Gn−d𝑣 . n−d𝑣 A modified decoding algorithm is obtained as follows. Let 𝛼𝑣 be the message vector to a rate one node 𝑣 such as node 𝑣2 in Figure 17.34(a). When 𝑣 is activated, it immediately calculates 𝛽𝑣 by simply quantizing the result as 𝛽𝑣 = h(𝛼𝑣 ), where h(⋅) is applied to each element of 𝛼𝑣 . Then the bit indices in 𝑣 are decoded as û [min𝑣 ], . . . , û [max𝑣 ]) = 𝛽𝑣 G𝑣 . The decoder for a rate-one constituent code simply hard-quantizes its soft input values and computes the inverse of the encoding operation (which is the same as the encoding operation). Example 17.37 In Figure 17.33, the message 𝛼g = 𝛼cr to node g has two components (see (17.56)). These are thresholded to bit values using 𝛽g = h(𝛼cr ), then un-encoded as (̂u[6], û [7]) = 𝛽𝑣

[ ] 1 0 . 1 1



None of the lower decoders at nodes u ∈ V𝑣 ∖{𝑣} need to be activated — the information at the highest rate one node in a subtree is immediately transformed to the decoded bit values. The modified algorithm replaces laboriously computed soft information propagated through the intervening layers with easily computed hard information at the leaf nodes of V𝑣 . The result of this observation is that the decoding on the full tree can be reduced to decoding on a reduced tree, as shown in Figure 17.34(b). This modified decoder produces the same decoding result as if the fully decoding algorithm were employed. Why does this work? It can be verified on a case-by-case basis that if both x and y are nonzero real numbers, h(x ⊞ y) = h(x) ⊕ h(y), (17.95) where h is an instance of the quantization function defined in (17.47). That is, computing the log-likelihood computation x ⊞ y then quantizing it is equivalent to quantizing both x and y then

870

Polar Codes adding modulo 2. The proof that the modified decoder produces the same decoding result proceeds by induction, starting at level d𝑣 = n − 1, the level above the leaf nodes. For such a node 𝑣, 𝛽𝑣l = h(𝛼𝑣l )

𝛽𝑣r = h(𝛼𝑣r ).

and

(17.96)

Using (17.54) and (17.95), h(𝛼𝑣l [i]) = h(𝛼𝑣 [2i]) ⊕ h(𝛼𝑣 [2i + 1]). Using (17.55) and (17.96), h(𝛼𝑣r [i]) = h(𝛼𝑣 [2i](1 − 2h(𝛼𝑣l [2i])) + 𝛼𝑣 [2i + 1]) = h(𝛼𝑣 [2i](1 − 2(h(𝛼𝑣 [2i]) ⊕ h(𝛼𝑣 [2i + 1]))) + 𝛼𝑣 [2i + 1]) = h(𝛼𝑣 [2i + 1]), where the last equality is verified by considering each of the four cases (𝛼𝑣 [2i] > 0, 𝛼𝑣 [2i + 1] > 0),

(𝛼𝑣 [2i] > 0, 𝛼𝑣 [2i + 1] < 0),

(𝛼𝑣 [2i] < 0, 𝛼𝑣 [2i + 1] > 0),

(𝛼𝑣 [2i] < 0, 𝛼𝑣 [2i + 1] < 0).

The 𝛽 messages are computed as follows: 𝛽𝑣 [2i] = 𝛽𝑣l [i] ⊕ 𝛽𝑣r [i] = h(𝛼𝑣l [i]) ⊕ h(𝛼𝑣r [i]) = h(𝛼𝑣 [2i + 1]) ⊕ h(𝛼𝑣 [2i + 1]) ⊕ h(𝛼𝑣 [2i]) = h(𝛼𝑣 [2i]) 𝛽𝑣 [2i + 1] = 𝛽𝑣r [i] = 𝛽𝑣r [i] = h(𝛼𝑣r [i]) = h(𝛼𝑣 [0]). Thus, at a rate one node at depth d𝑣 = n − 1, 𝛽𝑣 [i] = h(𝛼𝑣 [i]) for all i. For rate one nodes at d𝑣 < n − 1, we can proceed inductively. Assume that (17.96) holds at depth d (the inductive hypothesis). For a rate one node at depth d − 1, its children are also rate one nodes at a depth d which, based upon the inductive hypothesis, satisfy (17.96), so the steps in the proof above apply. 17.11.1

Modified SC Decoding

The discussion above results in decoding simplifications for rate zero and rate one subtrees. Are there simplifications that can be applied to the  R nodes with 0 < R < 1, such as node f in Figure 17.33? One approach is to do ML decoding. In general, a decoder for a binary block code C of dimension k ̂ where estimates the transmitted codeword based on the received information 𝐲 as 𝐱, 𝐱̂ = arg max P(𝐲|𝐱), 𝐱∈C

where the maximization requires (in principle) a search over 2k different codewords. When BSPK modulation is used (transmitted values are ±a for some amplitude a) and the received information is represented using LLR values, this becomes 𝐱̂ = arg max 𝐱∈C



n𝑣 −1

(1 − 2𝐱[i]))𝛼𝑣 [i],

i=0

where n𝑣 is the number of bits associated with the code at node 𝑣, n𝑣 = 2d(𝑣) . The complexity of this grows exponentially with the code dimension and linearly with the code length n𝑣 . For some nodes this turns out to be a valid tradeoff, especially since the ML search can be carried out in parallel.

17.14 Generalizations, Extensions, and Variations

17.12 Relations with Reed–Muller Codes Arıkan [16] observed connections between polar codes and Reed–Muller (RM) codes. One construction of RM codes is follows. Form the matrix F ⊗n , then delete the rows of that matrix which have least weight, until an K × N matrix of obtained. This is similar to the polar code construction, which similarly selects rows of F ⊗n . The difference is that the RM generator selects the information set (rows of F ⊗n retained) in a channel-independent manner, while the polar codes retain rows in a channel-dependent manner. Polar coding is thus more finely tuned to the channel than RM coding is. In fact, Arıkan proves that the RM rule for designing the generator leads to asymptotically unreliable codes under SC decoding.

17.13 Hardware and Throughput Performance Results Because of the great potential of polar codes there has been considerable research devoted to producing high-speed polar code decoders. The performance of these decoders will only go up, so this list cannot be exhaustive. But a few results are mentioned that are interesting at the time of publication.



In 2013, [332] reported a throughput of approximately 100 Mbps for a rate 0.9 code of length 16,384 with a 230 MHz clock on an FPGA.



In 2014, [391] reported a throughput of approximately 1 Gbps for a rate 0.9 code of length 32,768 with a clock of 108 MHz on an FPGA, using fast SSC (that is, SSC with specialized decoders for  R nodes).



By unrolling the SSC into a pipeline (which results in large hardware requirements), a decoder with approximately 100 Gbps throughput was reported in 2015 for a (1024, 512) code [157].



A survey reporting throughput of 415.7 Gbps is reported in [155].

Other articles of interest reporting on hardware implementations are [264, 389, 390, 393]. Besides hardware implementations, software polar decoders can also operate at high speed, making them very competitive with software implementation of decoders for other codes such as turbo codes or LDPC codes. Decoders operating in excess of 250 Mbps throughput for (32768, 29492) codes have been reported [146,156].

17.14 Generalizations, Extensions, and Variations Beyond the basic concepts related to polar coding presented here, there are many generalizations and variations. We touch on a few here.



Arıkan [16] indicated how to extend the polarization idea from binary input channels to channels operating with a q-input set  = {0, 1, . . . , q − 1}. The basic idea is very similar to the polarization we have already seen, with an invertible function (generalizing the two-input/two-output ⊕ stage used for binary codes) is used to process blocks of the input data, which then passes through a permuter, then through the individual channels. The encoding and SC decoding complexity for this generalization is still O(N log N).



Arıkan also describes how to obtain channels whose length N is not a power of 2, using ∏ a decomposition N = ni=1 mi . Performance comparisons for non-power-of-2 channels is not provided in Arıkan.



Arıkan also describes using a factor graph for iterative decoding and doing belief propagation decoding. This idea was explored in [207], which showed that belief propagation decoding

871

872

Polar Codes can significantly outperform SC decoding, if an appropriate message passing schedule can be established. See also [65].



An alternative proof of channel polarization is provided in [9], which extends polarization to nonstationary memoryless channels.



Sa¸ ¸ soˇglu et al. [395] generalized Arıkan’s results to channels where the input alphabet size is prime (not just 2).



Generalization of polarization to multiple access channels is discussed in [323] and references therein.

Appendix 17.A BN is a Bit-Reverse Permutation We begin with an observation. Let A be m × n and B be p × q so C = A ⊗ B is mp × nq. Using 0-based indexing, the (i, j)th element of C is Cij = A⌊i∕p⌋,⌊j∕q⌋ Bi

(mod p),j (mod q) .

(17.97)

For an integer i, 0 ≤ i < N, let its binary expansion be written as i = b1 b2 . . . bn , where b1 is the most significant bit or, more precisely, n ∑ i= bj 2n−j . j=1

The i th element of a vector uN−1 can be denoted as ui = ub1 . . . bn . Similarly, the element Aij of an N × N 0 matrix can be written as Ab1 . . . bn ,b′ . . . b′n . 1 Let A be a 2m × 2m matrix and let B be a 2n × 2n matrix and let C = A ⊗ B. Then, referring to (17.97) and expressing the indices in binary, Cb1 . . . bn+m ,b′ . . . b′ 1

n+m

= Ab1 . . . bn ,b′ . . . b′n Bbn+1 . . . bn+m ,b′ 1

n+1

. . . b′n+m .

(17.98)

[ ] 1 0 The elements of the matrix F = can be expressed as Fb,b′ = 1 ⊕ b′ ⊕ bb′ for b, b′ ∈ {0, 1}, 1 1 as can be verified exhaustively. Recursively applying (17.98), Fb⊗n. . . b 1

′ ′ n ,b1 . . . bn

=

n ∏ i=1

Fbi ,b′ = i

n ∏

(1 ⊕ b′i ⊕ bi b′i ).

(17.99)

i=1

= sN−1 RN . As observed in Section 17.4.1, Let 𝑣N−1 0 0 (𝑣0 , 𝑣1 , . . . , 𝑣N−1 ) = (s0 , s2 , . . . , sN−2 , s1 , s3 , . . . , sN−1 ). For example, when N = 8 and expressing the indices in binary, (𝑣000 , 𝑣001 , 𝑣010 , 𝑣011 , 𝑣100 , 𝑣101 , 𝑣110 , 𝑣111 ) = (s000 , s010 , s100 , s110 , s001 , s011 , s101 , s111 ). This shows that the permutation operator RN acts on sN−1 to replace the element at index b1 . . . bn with 0 the element in position b2 . . . bn b1 , that is, a cyclic shift to the right, so 𝑣b1 . . . bn = sb2 . . . bn b1 .

Appendix 17.B The Relationship of the Battacharyya Parameter to Channel Capacity

873

Now consider the operator BN in (17.35). BN can be shown inductively to be a bit-reverse permutation. The induction is best demonstrated by example. It is known (see, for example, Figure 17.9) that B4 is a bit-reverse permutation. We will use this to show that B8 is also a bit-reverse permutation. (u000 , u001 , u010 , u011 , u100 , u101 , u110 , u111 )R8 (I2 ⊗ B4 ) = (u000 , u010 , u100 , u110 , u001 , u011 , u101 , u111 )(I2 ⊗ B4 ) = ((u000 , u010 , u100 , u110 )B4 , (u001 , u011 , u101 , u111 )B4 ). Using the fact that B4 is a bit-reverse permutation, we now obtain (u000 , u100 , u010 , u110 , u001 , u101 , u011 , u111 ), which is a bit-reverse permutation. A matrix A is invariant under bit-reversal if Ab1 . . . bn ,b′ . . . b′n = Abn . . . b1 ,b′n ,b1 for every index pair 1 b1 . . . bn , b′1 . . . b′n . If A is invariant under bit reversal, A = BTN ABN or, using the fact that BTN = BN , BN A = ABT . Thus, bit-reversal-invariant matrices commute with the bit-reverse permutation. From (17.99), it follows that F ⊗n is invariant under bit-reversal (just read the product in the reverse order), so that BN F ⊗n = F ⊗n BN .

Appendix 17.B The Relationship of the Battacharyya Parameter to Channel Capacity Appendix 17.B.1

Error Probability for Two Codewords

In this section, we develop a bound for the probability of error when ML decoding is used to distinguish between two vectors of length N. This will involve the Battacharyya parameter. Most of this work will be done with nonbinary alphabets. This will be simplified to binary alphabets at some points. Let  be the alphabet of input symbols, with || = q. Let x0 and x1 be symbols from  transmitted through a channel with transmission probability W(y|xi ). Suppose that x0 is sent. Under ML decoding, x0 is decoded if W(y|x0 ) > W(y|x1 ). Let 0 be set of received values y such that the decoder selects x0 : Y0 = {y ∈ ℝ ∶ W(y|x0 ) > W(y|x1 )}. Then the probability of error when message x0 is sent is Pe,0 =



W(y|x0 ).

(17.100)

y∈Y0c

Since for y ∈ Y c , W(y|x1 ) ≥ W(y|x0 ), it follows that for s in the range 0 < s < 1, W(y|x1 )s ≥ W(y|x0 )s , so W(y|x0 ) ≤ W(y|x0 )1−s W(y|x1 )s 0 < s < 1. Substituting this into (17.100), we obtain ∑ ∑ W(y|x0 )1−s W(y|x1 )s ≤ W(y|x0 )1−s W(y|x1 )s . Pe,0 ≤ y∈Y1c

Similarly, Pe,1 ≤

(17.101)

y

∑ y

W(y|x1 )1−r W(y|x0 )r

0 < r < 1.

(17.102)

874

Polar Codes Substituting r → 1 − s in (17.102) gives (17.101), resulting in the same bounds for Pe,0 and Pe,1 , since s is arbitrary in [0, 1]. Thus, ∑ W(y|x0 )1−s W(y|x1 )s m = 0, 1, 0 < s < 1. Pe,m ≤ y

Let the sum be denoted as Z(W, s) =



W(y|x0 )1−s W(y|x1 )1 ,

y

so that Pe,m ≤ Z(W, s) m = 0, 1

0 < s < 1.

The tightest bound is obtained when s is selected to minimize the Z(W, s). It can be shown that Z(W, s) is convex in s and, for symmetric channels, the minimum values occurs when s = 12 . For such symmetric channels, then, let ∑√ Z(W) = Z(W, s)|s= 1 = W(y|x0 )W(y|x1 ). 2

y

This is the Battacharyya parameter. Then Pe,m ≤ Z(W).

(17.103)

The smaller the Battacharyya parameter is, the lower the probability of error. Appendix 17.B.2

Proof of Inequality (17.59)

We prove here not only the inequality I(W) ≥ log

2 , 1 + Z(W)

which applies to binary channels, but the more general form I(W) ≥ log

q , 1 + (q − 1)Z(W)

which is pertinent to channels with q-ary inputs. This section builds on the results of the preceding section. Consider an information transmission system having M messages to be transmitted, each of which is represented by a codeword 𝐱 of length N of symbols xi ∈ .4 For a message m, the codeword 𝐱m representing it is drawn at random according to the distribution QN (𝐱). A codeword 𝐱m is transmitted and a vector 𝐲 is received, which consists of N symbols each from the alphabet  (which may not be a binary alphabet). Let P[error|m, 𝐱m , 𝐲] denote the probability of error conditioned on a message number m entering the system, and the selection of 𝐱m as the mth codeword and on the reception of 𝐲, where 𝐲 is selected according to the transition probability WN (𝐲|𝐱m ). Let pe,m denote the average probability of error for the mth message, averaged over the random selections of 𝐱m and the received 𝐲. Then ∑ ∑ pe,m = QN (𝐱m )WN (𝐲|𝐱m )P[error|m, 𝐱m , 𝐲]. (17.104) 𝐱m ∈ N 𝐲∈ N

4 For our purposes here, it is possible to take N = 1. The argument with arbitrary N > 1 is the random coding argument and leads, if continued beyond our argument here, to the channel coding theorem, and hence makes a more plausible system. See [147, Section 5.6].

Appendix 17.B The Relationship of the Battacharyya Parameter to Channel Capacity

875

A ML decoder selects the 𝐱 such that WN (𝐲|𝐱) is the largest. A decoding error occurs if message m is input and 𝐱m is transmitted but WN (𝐲|𝐱m′ ) > WN (𝐲|𝐱m ) for some m′ ≠ m. Thus, let Am′ denote the event that WN (𝐲|𝐱m′ ) ≥ WN (𝐲|𝐱m ). From this definition, ∑

P(Am′ ) =

QN (𝐱m′ ) ≤

′ ∶W (𝐲|𝐱 )≥W (𝐲|𝐱 ) 𝐱m N N m m′



QN (𝐱m′ )

′ ∈ N 𝐱m

WN (𝐲|𝐱m′ )s WN (𝐲|𝐱m )s

for any s > 0. (17.105)

The probability of the error event can be expressed using the events Am′ as ) ( ⋃ Am′ . P[error|m, 𝐱m , 𝐲] ≤ P

(17.106)

m′ ≠m

Here the inequality follows since a ML decodes does not necessarily make an error if WN (𝐲|𝐱m′ ) = WN (𝐲|𝐱m ) for some m′ . We will now invoke a generalization of the union bound: ) [ ]𝜌 ( ∑ ⋃ Am ≤ P(Am ) , any 𝜌, 0 < 𝜌 ≤ 1. (17.107) P m

m

This bound is straightforward to show. By the union bound and the properties of probabilities, ( ) {∑ M ⋃ m=1 P(Am ) P Am ≤ 1. m If ∑

(∑ )𝜌 is less than for 0 < 𝜌 ≤ 1. On the other hand, if m P(Am ) [∑ 1, then]𝜌 it is smaller than P(A ) ≥ 1, then P(A ) ≥ 1 for these 𝜌. In either case, the inequality (17.107) results. m m m m Applying the inequalities (17.107) and (17.105)–(17.106) we obtain ]𝜌 [ ∑ WN (𝐲|𝐱)s P[error|m, 𝐱m , 𝐲] ≤ (M − 1) QN (𝐱) . WN (𝐲|𝐱m )s N ∑

m P(Am )

𝐱∈

Substituting this into (17.104) gives pe,m

⎤ ∑ ⎡ ∑ ⎢ ≤ (M − 1) QN (𝐱m )WN (𝐲|𝐱m )1−s𝜌 ⎥ ⎢ ⎥ 𝐲∈ N ⎣𝐱m ∈ N ⎦ 𝜌

[



]𝜌 QN (𝐱)WN (𝐲|𝐱)

s

.

𝐱∈ N

Setting s = 1∕(1 + 𝜌) (which can be shown to minimize the bound) and using 𝐱 as the dummy variable of summation we end up with the bound [ ]1+𝜌 ∑ ∑ 𝜌 1∕(1+𝜌) pe,m ≤ (M − 1) QN (𝐱)WN (𝐲|𝐱) . 𝐲∈ N

𝐱∈ N

For our purposes it will suffice to assume that each symbol is selected independently and that the channel is memoryless so that QN (𝐱) =

N ∏ n=1

This leads to pe,m

Q(xn )

WN (𝐲|𝐱) =

N ∏

W(yn |xn ).

n=1

[ ]1+𝜌 N ⎡∑ ∑ ⎤ ⎥ . ≤ (M − 1)𝜌 ⎢ Q(x)W(y|x)1∕(1+𝜌) ⎢y∈ x∈ ⎥ ⎣ ⎦

876

Polar Codes Define now the quantity

E0 (𝜌, Q) = − log

[ ∑ ∑ y∈

]1+𝜌 Q(x)W(y|x)1∕1+𝜌

(17.108)

x∈

so that pe,m ≤ (M − 1)𝜌 exp[−NE0 (𝜌, Q)]. To prove the bounds related to the Battacharyya parameter and the capacity, our attention now turns to the function E0 (𝜌, Q). Let ∑∑

I(Q, W) =

Q(x)W(y|x) log ∑

y∈ x∈

W(y|x) . ′ )W(y|x′ ) Q(x x′

I(Q, W) is the mutual information for the channel W with input data selected according to the distribution Q. When || = 2 (binary source) and Q(x) = 12 for x = 0, 1, we get the quantity I(W) defined in (17.1) and (17.57). The following theorem describes some relationships between E0 (𝜌, Q) and I(Q, W). Theorem 17.38 [147, Theorem 5.6.3] E0 (𝜌, Q) ≥ 0 I(Q, W) =

𝜕E0 (𝜌, Q) || | 𝜕r |𝜌=0

for 𝜌 ≥ 0,

I(Q, W) ≥

and

(17.109)

𝜕E0 (𝜌, Q) >0 𝜕𝜌

𝜕E0 (𝜌, Q) ≤ 0 for 𝜌 ≥ 0. 𝜕𝜌

for 𝜌 ≥ 0,

(17.110) (17.111)

The proof of this theorem is given below. Here, we explore some implications of it. The typical shape of the function is shown in Figure 17.35. From (17.110), the initial slope of E0 (𝜌, Q) is I(Q, W). Due to the concavity of the function, E0 (𝜌, Q) curves downward away from this initial slope. The result is that then E0 (𝜌, Q) is evaluated at 𝜌 = 1, for any Q E0 (1, Q) ≤ I(Q, W).

(17.112)

E0 (𝜌, Q)

Slo pe

=I

(Q

,W

)

I(Q, W )

1

Figure 17.35: Plot of E0 (𝜌, Q).

𝜌

Appendix 17.B The Relationship of the Battacharyya Parameter to Channel Capacity With q = 2, expanding the definition of E0 (𝜌, Q) in (17.108) at 𝜌 = 1 and with Q(x) = x = 0, 1, which we denote as E0 (1, 12 ),

877 1 2

for

) ( ]2 ∑ [1√ 1 1√ = − log W(y|0) + W(y|1) E0 1, 2 2 2 y∈ = − log

∑1 1 1√ W(y|0)W(y|1) + W(y|1) W(y|0) + 4 4 4 y∈

= − log

2 1 1 ∑√ W(y|0)W(y|1) = log + ∑√ 2 2 y∈ 1 + y W(y|0)W(y|1)

= log

2 . 1 + Z(W)

Hence, I(W) ≥ log

2 , 1 + Z(W)

which establishes (17.59). Similarly, for other values of q, set each probability Q(x) = 1q . Denoting this as E0 (1, 1q ), and taking  = {0, 1, . . . , q − 1}: [ ]2 ( ) ∑ 1 ∑√ 1 E0 1, W(y|x) = − log q q x∈ y∈ 2

⎡ ⎢ ⎢ ∑⎢ 1 = − log ⎢ 2 q y∈ ⎢ ⎢ ⎢ ⎣

⎛ ⎞⎤ ⎜ ⎟⎥ ⎜ ⎟⎥ ∑√ ∑ ⎜∑ ⎟⎥ W(y|x) + W(y|x)W(y|x′ )⎟⎥ ⎜ ⎜y∈ ⎟⎥ x,x′ ∈∶x≠x′ y∈ ⎜ ⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⎟⎥ ⎜ ⎟⎥ Z(W{x,x′ } ) ⎝ ⎠⎦ ∑ 1 q−1 1 = − log + Z(W{x,x′ } ) q q q(q − 1) x,x′ ∈∶x≠x′ ⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟ Z(W)

q = log . 1 + (q − 1)Z(W)

α α2 α3 α4 α5 α6 α7 α8 α9 +

01010

Proof of Theorem 17.38. (See [147, Appendix 5B].) It is straightforward to verify from (17.108) that E0 (0, Q) = 0. By differentiation, it is straightforward to show that

00000

α10 α11 α12 α13 α14 α15 α16

diffE0.m

𝜕E0 (𝜌, Q) || = I(Q, W). | 𝜕r |𝜌=0 For the rest of the proof, we will rely on the following fact ([147, p. 193]): Let [

Q(0), . . . , Q(K − 1)

]

878

Polar Codes be a probability vector and let a0 , . . . , aK−1 be non-negative numbers. Then the function f (s) = log

[K−1 ∑

]s 1∕s Q(k)ak

k=0

is nonincreasing and convex in s for s > 0. The convexity of f (s) means that for r > 0, s > 0 and t = 𝜆s + (1 − 𝜆)r for 0 < 𝜆 < 1, f (t) ≤ 𝜆f (s) + (1 − 𝜆)f (r). Exponentiating both sides of this, we obtain [ ∑

]t 1∕t Q(k)ak

[ ≤



k

1∕s Q(k)ak

]s𝜆 [ ∑

k

]r(1−𝜆) 1∕r Q(k)ak

.

(17.113)

k

Applying this in (17.108), we see that
$$
\left[ \sum_k Q(k)\, W(j|k)^{1/(1+\rho)} \right]^{1+\rho}
$$
is nonincreasing with $\rho$ and hence (since $\log$ is monotonic increasing), $E_0(\rho, Q)$ is nondecreasing, establishing (17.109). To establish (17.111), it is sufficient to show that $E_0(\rho, Q)$ is concave in $\rho$, which is done as follows. Let $\rho_1 > 0$ and $\rho_2 > 0$, and let $\lambda$ satisfy $0 < \lambda < 1$. Let $\rho = \rho_1 \lambda + \rho_2 (1-\lambda)$. From (17.113),
$$
\sum_j \left[ \sum_k Q(k)\, W(j|k)^{1/(1+\rho)} \right]^{1+\rho} \le \sum_j \left[ \sum_k Q(k)\, W(j|k)^{1/(1+\rho_1)} \right]^{(1+\rho_1)\lambda} \left[ \sum_k Q(k)\, W(j|k)^{1/(1+\rho_2)} \right]^{(1+\rho_2)(1-\lambda)}. \tag{17.114}
$$
Hölder's inequality says that
$$
\sum_j a_j b_j \le \left[ \sum_j a_j^{1/\lambda} \right]^{\lambda} \left[ \sum_j b_j^{1/(1-\lambda)} \right]^{1-\lambda}.
$$
Applying this to the right-hand side of (17.114) gives
$$
\sum_j \left[ \sum_k Q(k)\, W(j|k)^{1/(1+\rho)} \right]^{1+\rho} \le \left[ \sum_j \left( \sum_k Q(k)\, W(j|k)^{1/(1+\rho_1)} \right)^{1+\rho_1} \right]^{\lambda} \left[ \sum_j \left( \sum_k Q(k)\, W(j|k)^{1/(1+\rho_2)} \right)^{1+\rho_2} \right]^{1-\lambda}.
$$
Taking the negative logarithm of both sides yields $E_0(\rho, Q) \ge \lambda E_0(\rho_1, Q) + (1-\lambda) E_0(\rho_2, Q)$, so $E_0(\rho, Q)$ is concave in $\rho$. ◽



Appendix 17.B.3 Proof of Inequality (17.60) [16]

For any B-DMC $W: \mathcal{X} \to \mathcal{Y}$, define the variational distance between the distributions as
$$
d(W) = \frac12 \sum_{y\in\mathcal{Y}} |W(y|0) - W(y|1)|. \tag{17.115}
$$
Let the output alphabet be $\mathcal{Y} = \{1, 2, \ldots, m\}$, and express the capacity (17.57) as
$$
I(W) = \frac12 \sum_{i=1}^{m} \left[ P_i \log_2 \frac{P_i}{\frac12 P_i + \frac12 Q_i} + Q_i \log_2 \frac{Q_i}{\frac12 P_i + \frac12 Q_i} \right],
$$
where $P_i = W(i|0)$ and $Q_i = W(i|1)$. Each bracketed term can be written as
$$
f(x) = x \log_2 \frac{x}{x+\delta} + (x + 2\delta) \log_2 \frac{x+2\delta}{x+\delta},
$$
where $x = \min(P_i, Q_i)$ and $\delta = \frac12 |P_i - Q_i|$. To maximize $f(x)$ with respect to $x$, take the derivative and find that
$$
\frac{df(x)}{dx} = \log_2 \frac{\sqrt{x(x+2\delta)}}{x+\delta}.
$$
The numerator is the geometric mean of the numbers $x$ and $(x+2\delta)$ and the denominator is the arithmetic mean of the numbers $x$ and $(x+2\delta)$. The arithmetic/geometric mean inequality [318, p. 874] says that the geometric mean is less than or equal to the arithmetic mean, so that $df(x)/dx \le 0$. The maximum occurs when $x = 0$, in which case $f(0) = 2\delta$, so $f(x) \le 2\delta$. This means that
$$
I(W) \le \sum_{i=1}^{m} \frac12 |P_i - Q_i| = \sum_{i=1}^{m} \delta_i = d(W). \tag{17.116}
$$

Let $\delta = \sum_{i=1}^{m} \delta_i$. The Battacharyya parameter $Z(W)$ can be expressed in terms of the $P_i$, $Q_i$, and $\delta_i$ as follows. Let $R_i = (P_i + Q_i)/2$. Then
$$
Z(W) = \sum_{i=1}^{m} \sqrt{(R_i - \delta_i)(R_i + \delta_i)} = \sum_{i=1}^{m} \sqrt{R_i^2 - \delta_i^2}.
$$
$Z(W)$ is upper-bounded by maximizing $\sum_{i=1}^{m} \sqrt{R_i^2 - \delta_i^2}$, subject to the constraints that $0 \le \delta_i \le R_i$ and $\sum_{i=1}^{m} \delta_i = \delta$. To do the maximization, compute the derivatives
$$
\frac{\partial Z}{\partial \delta_i} = -\frac{\delta_i}{\sqrt{R_i^2 - \delta_i^2}}, \qquad \frac{\partial^2 Z}{\partial \delta_i^2} = -\frac{R_i^2}{\left(R_i^2 - \delta_i^2\right)^{3/2}}.
$$
Thus, $Z(W)$ is a decreasing concave function of $\delta_i$ for each $\delta_i$ in the range $0 \le \delta_i \le R_i$. The constrained optimization problem can be expressed as
$$
\text{minimize } f_0(\delta_1, \ldots, \delta_m) = -\sum_{i=1}^{m} \sqrt{R_i^2 - \delta_i^2}
$$
$$
\text{subject to } f_i(\delta_i) = \delta_i - R_i \le 0, \quad i = 1, 2, \ldots, m,
$$
$$
h = \sum_{i=1}^{m} \delta_i - \delta = 0.
$$
The Lagrangian for this problem is
$$
L(\delta_1, \ldots, \delta_m, \lambda_1, \ldots, \lambda_m, \nu) = -\sum_{i=1}^{m} \sqrt{R_i^2 - \delta_i^2} + \sum_{i=1}^{m} \lambda_i f_i(\delta_i) + \nu \left( \sum_{i=1}^{m} \delta_i - \delta \right).
$$
Taking the gradient with respect to the vector of $\delta_i$ results in
$$
\frac{\partial L}{\partial \boldsymbol{\delta}} = \begin{bmatrix} -\partial Z/\partial \delta_1 \\ \vdots \\ -\partial Z/\partial \delta_m \end{bmatrix} + \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_m \end{bmatrix} + \nu \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} = 0. \tag{17.117}
$$
The slackness conditions of the Karush-Kuhn-Tucker (KKT) conditions [49, Section 5.5.3] also require that $\lambda_i f_i(\delta_i) = 0$, with $\lambda_i = 0$ if $f_i(\delta_i) < 0$ (the $i$th constraint is inactive). Taking this as the case, the elements of (17.117) result in
$$
\frac{\partial Z}{\partial \delta_i} = \nu \quad \text{for } i = 1, 2, \ldots, m.
$$
That is, each partial has the same value. Using the derivatives computed above,
$$
-\frac{\delta_i}{\sqrt{R_i^2 - \delta_i^2}} = \nu,
$$
which results in $\delta_i^2 = R_i^2 \nu^2/(1 + \nu^2)$. Enforcing the constraint $\sum_{i=1}^{m} \delta_i = \delta$ using the fact that $\sum_{i=1}^{m} R_i = 1$, it follows that $\nu^2/(1 + \nu^2) = \delta^2$. This means that the maximum of $Z(W)$ occurs when $\delta_i = \delta R_i$, and this maximum value is
$$
\sum_{i=1}^{m} \sqrt{R_i^2 - \delta^2 R_i^2} = \sqrt{1 - \delta^2} = \sqrt{1 - d(W)^2}.
$$
In summary,
$$
Z(W) \le \sqrt{1 - d(W)^2}.
$$
Turning this around,
$$
d(W) \le \sqrt{1 - Z(W)^2}. \tag{17.118}
$$
Combining this with (17.116), we have $I(W) \le \sqrt{1 - Z(W)^2}$.
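The two bounds relating $I(W)$ and $Z(W)$ are easy to verify numerically. The following MATLAB sketch does so for a range of BSC crossover probabilities; it is illustrative only, and the base-2 logarithm is used so that all quantities are in bits.

% Sketch: check log2(2/(1+Z(W))) <= I(W) <= sqrt(1 - Z(W)^2) for a BSC(p).
p = 0.05:0.05:0.45;
Z = 2*sqrt(p.*(1-p));                           % Battacharyya parameter for BSC(p)
I = 1 + p.*log2(p) + (1-p).*log2(1-p);          % I(W) = 1 - H2(p), in bits
lower = log2(2./(1+Z));
upper = sqrt(1 - Z.^2);
disp([p' lower' I' upper'])                     % each row: p, lower bound, I(W), upper bound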

Appendix 17.C Proof of Theorem 17.12

Let $W'(F(y_0, y_1)|u_0)$ denote the transition probability for the $W'$ channel. For the proof of (17.66), Arıkan makes use of a very clever inequality. It can be verified algebraically that
$$
\left[ \sqrt{(\alpha\beta + \delta\gamma)(\alpha\gamma + \delta\beta)} \right]^2 + \underbrace{2\sqrt{\alpha\beta\delta\gamma}\,(\sqrt{\alpha} - \sqrt{\delta})^2 (\sqrt{\beta} - \sqrt{\gamma})^2}_{(*)} = \left[ (\sqrt{\alpha\beta} + \sqrt{\delta\gamma})(\sqrt{\alpha\gamma} + \sqrt{\delta\beta}) - 2\sqrt{\alpha\beta\delta\gamma} \right]^2.
$$
Since the term $(*)$ is $\ge 0$, this leads to the inequality
$$
\left[ \sqrt{(\alpha\beta + \delta\gamma)(\alpha\gamma + \delta\beta)} \right]^2 \le \left[ (\sqrt{\alpha\beta} + \sqrt{\delta\gamma})(\sqrt{\alpha\gamma} + \sqrt{\delta\beta}) - 2\sqrt{\alpha\beta\delta\gamma} \right]^2. \tag{17.119}
$$

Write $Z(W')$ as
$$
Z(W') = \sum_{y_0, y_1} \sqrt{W'(F(y_0, y_1)|0)\, W'(F(y_0, y_1)|1)}
$$
$$
= \frac12 \sum_{y_0, y_1} \sqrt{W(y_0|0)W(y_1|0) + W(y_0|1)W(y_1|1)}\; \sqrt{W(y_0|0)W(y_1|1) + W(y_0|1)W(y_1|0)}
$$
(using (17.61)). Now let $\alpha(y_0) = W(y_0|0)$, $\beta(y_1) = W(y_1|0)$, $\delta(y_0) = W(y_0|1)$, and $\gamma(y_1) = W(y_1|1)$, and write
$$
Z(W') = \frac12 \sum_{y_0, y_1} \sqrt{\alpha(y_0)\beta(y_1) + \delta(y_0)\gamma(y_1)}\; \sqrt{\alpha(y_0)\gamma(y_1) + \delta(y_0)\beta(y_1)}.
$$

Using the inequality in (17.119),
$$
Z(W') \le \underbrace{\frac12 \sum_{y_0, y_1} \left( \sqrt{\alpha(y_0)\beta(y_1)} + \sqrt{\delta(y_0)\gamma(y_1)} \right) \left( \sqrt{\alpha(y_0)\gamma(y_1)} + \sqrt{\delta(y_0)\beta(y_1)} \right)}_{(**)} - \underbrace{\sum_{y_0, y_1} \sqrt{\alpha(y_0)\beta(y_1)\delta(y_0)\gamma(y_1)}}_{(***)}. \tag{17.120}
$$
It is straightforward to show that
$$
\sum_{y_0, y_1} \alpha(y_0) \sqrt{\beta(y_1)\gamma(y_1)} = Z(W), \tag{17.121}
$$
and, similarly, each of the four terms in the expansion of $(**)$ is equal to $Z(W)$, so that $(**)$ is equal to $2Z(W)$. Similarly, it can be shown that $(***)$ is equal to $Z(W)^2$. Hence $Z(W') \le 2Z(W) - Z(W)^2$, which establishes (17.66). Let $W''(F(y_0, y_1), u_0|u_1)$ denote the transition probability for the $W''$ channel. To prove (17.67):

$$
Z(W'') = \sum_{y_0, y_1, u_0} \sqrt{W''(F(y_0, y_1), u_0|u_1 = 0)\, W''(F(y_0, y_1), u_0|u_1 = 1)}
$$
$$
= \frac12 \sum_{y_0, y_1, u_0} \sqrt{W(y_0|u_0)W(y_1|0)W(y_0|u_0 \oplus 1)W(y_1|1)}
$$
(using (17.62))
$$
= \frac12 \sum_{y_1} \sqrt{W(y_1|0)W(y_1|1)} \sum_{u_0} \sum_{y_0} \sqrt{W(y_0|u_0)W(y_0|u_0 \oplus 1)}.
$$
For any value of $u_0$, the inner sum over $y_0$ is equal to $Z(W)$. So the sum over $u_0$ gives $Z(W)$, absorbing the factor $\frac12$ since there are two terms in that sum. The outer sum over $y_1$ is also equal to $Z(W)$, giving $Z(W'') = Z(W)^2$. To prove (17.68), we first establish a kind of convexity result for $Z(W)$, then apply a version of Minkowski's inequality.

First, let $\mathcal{J}$ be a set of indices, and let $\{W_j: \mathcal{X} \to \mathcal{Y},\ j \in \mathcal{J}\}$ denote a collection of B-DMCs with channel transition probabilities $W_j(y|x)$. Let $Q$ be a probability distribution on $\mathcal{J}$, $\sum_j Q_j = 1$. Define (here) $W: \mathcal{X} \to \mathcal{Y}$ as the channel with transition probabilities
$$
W(y|x) = \sum_{j\in\mathcal{J}} Q_j W_j(y|x). \tag{17.122}
$$
The channel $W$ thus works like this: Pick a channel $W_j$ according to $Q$, then transmit on that channel. Claim:
$$
\sum_{j\in\mathcal{J}} Q_j Z(W_j) \le Z(W).
$$
Proof of claim:
$$
Z(W) = \sum_y \sqrt{W(y|0)W(y|1)} = -1 + \frac12 \sum_y \left[ \sum_x \sqrt{W(y|x)} \right]^2 \tag{17.123}
$$
$$
= -1 + \frac12 \sum_y \left[ \sum_x \left( \sum_{j\in\mathcal{J}} Q_j W_j(y|x) \right)^{1/2} \right]^2, \tag{17.124}
$$
which can be readily verified by expanding the sum on the right of (17.123). A version of Minkowski's inequality [147, p. 524] says that for positive numbers $a_{jk}$ and a distribution $Q$,
$$
\left[ \sum_k \left( \sum_j Q_j a_{jk} \right)^{r} \right]^{1/r} \ge \sum_j Q_j \left[ \sum_k a_{jk}^{r} \right]^{1/r}
$$
for $r < 1$. Now applying this to (17.124) with $r = 1/2$, we get
$$
Z(W) \ge -1 + \frac12 \sum_y \sum_{j\in\mathcal{J}} Q_j \left[ \sum_x \sqrt{W_j(y|x)} \right]^2 = \sum_{j\in\mathcal{J}} Q_j Z(W_j). \tag{17.125}
$$
Using (17.61),
$$
W'(F(y_0, y_1)|u_0) = \frac12 \Big[ \underbrace{W(y_0|u_0)W(y_1|0)}_{W_0(y_0, y_1|u_0)} + \underbrace{W(y_0|u_0 \oplus 1)W(y_1|1)}_{W_1(y_0, y_1|u_0)} \Big].
$$
The channel $W'$ is thus seen to be a combination of channels of the sort in (17.122), with B-DMCs $W_0$ and $W_1$ as identified. Since $Z(W_0) = Z(W_1) = Z(W)$, applying (17.125) gives
$$
Z(W') \ge \frac12 \left( Z(W_0) + Z(W_1) \right) = Z(W).
$$
It can furthermore be argued that $Z(W') = Z(W'')$ if and only if $I(W) = 0$.

17.15 Exercises

1. For a BEC(p) channel show that I(W) = 1 − p.
2. For a BSC(p) channel show that I(W) = 1 − H2(p).
3. For a BEC(p) channel show that Z(W) = p.
4. For a BSC(p) channel show that Z(W) = √(4p(1 − p)).
5. For α(y0), β(y1), and γ(y1) as defined in Appendix 17.C, show that (17.121) is true.
6. In Equation (17.120), show that each of the terms in the expansion of the quantity labeled (∗∗) is equal to Z(W). Also show that the term labeled (∗∗∗) is equal to Z(W)².

17.16 References

The theory of polar codes as described here was presented in [16] and this chapter draws much from that. The presentation here has benefitted from the insights of Emre Telatar [437]. Discussion of successive cancellation decoding benefitted from [461]. Systematic encoding was described first in [17]. Three different algorithms for systematic encoding were described in [460]. The presentation in Section 17.8 is their EncoderA. Another variation on this, which reports only O(N) complexity, is [68]. The fast systematic decoding discussion is from [390]. List decoding was described in [431] and the coverage here closely follows their description. The C++ implementation is a close transcription of their pseudocode, filling in some omissions. A description of a log-likelihood ratio-based formulation is provided in [499]. Our formulation more closely follows [22], which also provides a description of hardware considerations and comparisons. Insights related to AEP come from [324]. Simplified SC decoding is described in [7]. Our discussion benefitted from [169] and [392].


Part VI

Applications

Chapter 18

Some Applications of Error Correction in Modern Communication Systems

18.1 Introduction

Error-correction coding is ubiquitously used in modern communication systems. This chapter highlights how a few technological systems use error-correction coding to give a sense of the practical scale of "real" codes and how different methods of error-correction coding are combined together to make real systems. The variety of systems that could be explored here is enormous. For example, quick response (QR) codes employ Reed–Solomon codes. Depending on the manufacturer, flash memories employ different codes, such as LDPC, Reed–Solomon, or BCH. But the systems that are described here in some detail are related to broad-based communications technologies.

18.2 Digital Video Broadcast T2 (DVB-T2)

The DVB-T2 standard describes channel coding and modulation for digital terrestrial TV. As a full-fledged communication system, it makes use of topics introduced in this book in several ways, including scrambling, CRC codes, and error correction [105]. Of the many system components involved in this standard, we discuss here only the forward error-correction components. This consists of concatenated coding, with a BCH code as the outer code and an LDPC code as the inner code. (This is one of the changes in the DVB-T2 standard compared to the earlier DVB standard, which uses Reed–Solomon codes as the outer code and a convolutional code as the inner code, with convolutional interleaving [104].) Actually, the LDPC code is an irregular repeat-accumulate (IRA) code. The general FEC coding structure is shown in Figure 18.1. KBCH bits are BCH-encoded to produce NBCH bits. These are further LDPC-encoded to produce NLDPC bits. There are two codeword lengths for a forward error correction (FEC) frame: a "normal" FECFRAME of NLDPC = 64,800 bits, and a "short" FECFRAME of NLDPC = 16,200 bits. The system is parameterized to deal with various block sizes and rates. The coding parameters for a normal FECFRAME are shown in Table 18.1 and the coding parameters for a short FECFRAME are shown in Table 18.2.

18.2.1 BCH Outer Encoding

The BCH generator polynomial for the outer codes, capable of correcting t errors, is obtained by multiplying together the first t minimal polynomials to obtain the specified amount of error correction. For the normal codes, these minimal polynomials are shown in Table 18.3. For the short codes, these minimal polynomials are shown in Table 18.4.
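To illustrate the construction, the following MATLAB sketch forms a generator polynomial as the product (mod 2) of the first few minimal polynomials. Only the first three entries of Table 18.3 are multiplied here for brevity; a full t = 12 encoder would multiply all twelve. This is an illustrative sketch, not the standard's encoder.

% Sketch: build a BCH generator polynomial as the product (mod 2) of
% minimal polynomials. Coefficient vectors are in ascending powers of x.
g1 = zeros(1,17); g1([0 2 3 5 16]+1) = 1;       % 1 + x^2 + x^3 + x^5 + x^16
g2 = zeros(1,17); g2([0 1 4 5 6 8 16]+1) = 1;   % 1 + x + x^4 + x^5 + x^6 + x^8 + x^16
g3 = zeros(1,17); g3([0 2 3 4 5 7 8 9 10 11 16]+1) = 1;
g = 1;                                          % start with the polynomial 1
for gi = {g1, g2, g3}                           % multiply minimal polynomials together
  g = mod(conv(g, gi{1}), 2);                   % polynomial product over GF(2)
end
deg = length(g) - 1;                            % degree = number of parity bits (3*16 = 48 here)
fprintf('Generator degree: %d\n', deg);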


Figure 18.1: FEC encoding structure: KBCH data bits, NBCH − KBCH BCH parity bits, and NLDPC − KLDPC LDPC parity bits, with NBCH = KLDPC (see [105, Figure 12]).

Table 18.1: Coding Parameters for a Normal Frame, NLDPC = 64,800 (see [105, Table 6(a)])

LDPC Code Rate | KBCH   | NBCH = KLDPC | BCH t-Error Correction | NBCH − KBCH | NLDPC
1/2            | 32,208 | 32,400       | 12                     | 192         | 64,800
3/5            | 38,688 | 38,880       | 12                     | 192         | 64,800
2/3            | 43,040 | 43,200       | 10                     | 160         | 64,800
3/4            | 48,408 | 48,600       | 12                     | 192         | 64,800
4/5            | 51,648 | 51,840       | 12                     | 192         | 64,800
5/6            | 53,840 | 54,000       | 10                     | 160         | 64,800

Table 18.2: Coding Parameters for a Short Frame, NLDPC = 16,200 (see [105, Table 6(b)])

LDPC Code Rate | KBCH   | NBCH = KLDPC | BCH t-Error Correction | NBCH − KBCH | NLDPC
1/4            | 3072   | 3240         | 12                     | 168         | 16,200
1/2            | 7032   | 7200         | 12                     | 168         | 16,200
3/5            | 9552   | 9720         | 12                     | 168         | 16,200
2/3            | 10,632 | 10,800       | 12                     | 168         | 16,200
3/4            | 11,712 | 11,880       | 12                     | 168         | 16,200
4/5            | 12,432 | 12,600       | 12                     | 168         | 16,200
5/6            | 13,152 | 13,320       | 12                     | 168         | 16,200

18.2.2 LDPC Inner Encoding

Let 𝐢 = (i0 , i1 , . . . , iKLDPC−1 ) be the result of the outer BCH encoding. This is systematically encoded into a codeword 𝐜 = (i0 , i1 , . . . , iKLDPC−1 , p0 , p1 , . . . , pNLDPC −KLDPC −1 ). The “LDPC” code in the standard is actually an IRA code. The standard provides tables describing the interleaving of the different parities. One such table is shown in Table 18.5 for the rate 2/3

Table 18.3: BCH Minimal Polynomials for Normal Codes (NLDPC = 64,800) (see [105, Table 7(a)])

g1(x)  = 1 + x^2 + x^3 + x^5 + x^16
g2(x)  = 1 + x + x^4 + x^5 + x^6 + x^8 + x^16
g3(x)  = 1 + x^2 + x^3 + x^4 + x^5 + x^7 + x^8 + x^9 + x^10 + x^11 + x^16
g4(x)  = 1 + x^2 + x^4 + x^6 + x^9 + x^11 + x^12 + x^14 + x^16
g5(x)  = 1 + x + x^2 + x^3 + x^5 + x^8 + x^9 + x^10 + x^11 + x^12 + x^16
g6(x)  = 1 + x^2 + x^4 + x^5 + x^7 + x^8 + x^9 + x^10 + x^12 + x^13 + x^14 + x^15 + x^16
g7(x)  = 1 + x^2 + x^5 + x^6 + x^8 + x^9 + x^10 + x^11 + x^13 + x^15 + x^16
g8(x)  = 1 + x + x^2 + x^5 + x^6 + x^8 + x^9 + x^12 + x^13 + x^14 + x^16
g9(x)  = 1 + x^5 + x^7 + x^9 + x^10 + x^11 + x^16
g10(x) = 1 + x + x^2 + x^5 + x^7 + x^8 + x^10 + x^12 + x^13 + x^14 + x^16
g11(x) = 1 + x^2 + x^5 + x^9 + x^11 + x^12 + x^13 + x^16
g12(x) = 1 + x + x^5 + x^6 + x^7 + x^9 + x^11 + x^12 + x^16

Table 18.4: BCH Minimal Polynomials for Short Codes (NLDPC = 16,200) (see [105, Table 7(b)])

g1(x)  = 1 + x + x^3 + x^5 + x^14
g2(x)  = 1 + x^6 + x^8 + x^11 + x^14
g3(x)  = 1 + x + x^2 + x^6 + x^9 + x^10 + x^14
g4(x)  = 1 + x^4 + x^7 + x^8 + x^10 + x^12 + x^14
g5(x)  = 1 + x^2 + x^4 + x^6 + x^8 + x^9 + x^11 + x^13 + x^14
g6(x)  = 1 + x^3 + x^7 + x^8 + x^9 + x^13 + x^14
g7(x)  = 1 + x^2 + x^5 + x^6 + x^7 + x^10 + x^11 + x^14
g8(x)  = 1 + x^5 + x^8 + x^9 + x^10 + x^14
g9(x)  = 1 + x + x^2 + x^3 + x^9 + x^10 + x^14
g10(x) = 1 + x^3 + x^6 + x^9 + x^11 + x^12 + x^14
g11(x) = 1 + x^4 + x^11 + x^12 + x^14
g12(x) = 1 + x + x^2 + x^3 + x^5 + x^6 + x^7 + x^8 + x^10 + x^13 + x^14

code with NLDPC = 64,800. Quoting [105, Section 6.1.2.1], the data in this table are to be used as follows.



• Set the parities to 0: p0 = p1 = · · · = pNLDPC−KLDPC−1 = 0.

Some Applications of Error Correction in Modern Communication Systems Table 18.5: IRA Indices for DVB-T2 Rate 2/3 Code, NLDPC = 64,800 (See [105, Table A.3]) [317 2255 2324 2723 3538 3576 6194 6700 9101 10057 12739 17407 21039] [1958 2007 3294 4394 12762 14505 14593 14692 16522 17737 19245 21272 21379] [127 860 5001 5633 8644 9282 12690 14644 17553 19511 19681 20954 21002] [2514 2822 5781 6297 8063 9469 9551 11407 11837 12985 15710 20236 20393] [1565 3106 4659 4926 6495 6872 7343 8720 15785 16434 16727 19884 21325] [706 3220 8568 10896 12486 13663 16398 16599 19475 19781 20625 20961 21335] [4257 10449 12406 14561 16049 16522 17214 18029 18033 18802 19062 19526 20748] [412 433 558 2614 2978 4157 6584 9320 11683 11819 13024 14486 16860] [777 5906 7403 8550 8717 8770 11436 12846 13629 14755 15688 16392 16419] [4093 5045 6037 7248 8633 9771 10260 10809 11326 12072 17516 19344 19938] [2120 2648 3155 3852 6888 12258 14821 15359 16378 16437 17791 20614 21025] [1085 2434 5816 7151 8050 9422 10884 12728 15353 17733 18140 18729 20920] [856 1690 12787] [6532 7357 9151] [4210 16615 18152] [11494 14036 17470] [2474 10291 10323] [1778 6973 10739] [4347 9570 18748] [2189 11942 20666] [3868 7526 17706] [8780 14796 18268] [160 16232 17399] [1285 2003 18922] [4658 17331 20361] [2765 4862 5875] [4565 5521 8759] [3484 7305 15829] [5024 17730 17879] [7031 12346 15024] [179 6365 11352] [2490 3143 5098] [2643 3101 21259] [4315 4724 13130] [594 17365 18322] [5983 8597 9627] [10837 15102 20876] [10448 20418 21478] [3848 12029 15228] [708 5652 13146] [5998 7534 16117] [2098 13201 18317] [9186 14548 17776] [5246 10398 18597] [3083 4944 21021] [13726 18495 19921] [6736 10811 17545] [10084 12411 14432] [1064 13555 17033] [679 9878 13547] [3422 9910 20194] [3640 3701 10046] [5862 10134 11498] [5923 9580 15060] [1073 3012 16427] [5527 20113 20883] [7058 12924 15151] [9764 12230 17375] [772 7711 12723] [555 13816 15376]

[10574 11268 17932] [15442 17266 20482] [390 3371 8781] [10512 12216 17180] [4309 14068 15783] [3971 11673 20009] [9259 14270 17199] [2947 5852 20101] [3965 9722 15363] [1429 5689 16771] [6101 6849 12781] [3676 9347 18761] [350 11659 18342] [5961 14803 16123] [2113 9163 13443] [2155 9808 12885] [2861 7988 11031] [7309 9220 20745] [6834 8742 11977] [2133 12908 14704] [10170 13809 18153] [13464 14787 14975] [799 1107 3789] [3571 8176 10165] [5433 13446 15481] [3351 6767 12840] [8950 8974 11650] [1430 4250 21332] [6283 10628 15050] [8632 14404 16916] [6509 10702 16278] [15900 16395 17995] [8031 18420 19733] [3747 4634 17087] [4453 6297 16262] [2792 3513 17031] [14846 20893 21563] [17220 20436 21337] [275 4107 10497] [3536 7520 10027] [14089 14943 19455] [1965 3931 21104] [2439 11565 17932] [154 15279 21414] [10017 11269 16546] [7169 10161 16928] [10284 16791 20655] [36 3175 8475] [2605 16269 19290] [8947 9178 15420] [5687 9156 12408] [8096 9738 14711] [4935 8093 19266] [2667 10062 15972] [6389 11318 14417] [8800 18137 18434] [5824 5927 15314] [6056 13168 15179] [3284 13138 18919] [13115 17259 17332]

• Accumulate the first information bit i0 at parity bit addresses specified in Table 18.5. Using the first row of the table as indices,

  p317 = p317 ⊕ i0      p2255 = p2255 ⊕ i0      p2324 = p2324 ⊕ i0
  p2723 = p2723 ⊕ i0    p3538 = p3538 ⊕ i0      p3576 = p3576 ⊕ i0
  p6194 = p6194 ⊕ i0    p6700 = p6700 ⊕ i0      p9101 = p9101 ⊕ i0
  p10057 = p10057 ⊕ i0  p12739 = p12739 ⊕ i0    p17407 = p17407 ⊕ i0
  p21039 = p21039 ⊕ i0

• Similarly, for each of the next 359 information bits im, m = 1, 2, ..., 359, accumulate im at parity bit addresses (x + (m mod 360) QLDPC) mod (NLDPC − KLDPC), where x denotes the address of the parity bit accumulator corresponding to the first bit i0, and QLDPC is a code-rate-dependent constant specified in Table 18.6. For the rate 2/3 normal frame, QLDPC = 60. For example, for the information bit i1, the following parity operations are performed:

  p377 = p377 ⊕ i1      p2315 = p2315 ⊕ i1      p2384 = p2384 ⊕ i1
  p2783 = p2783 ⊕ i1    p3598 = p3598 ⊕ i1      p3636 = p3636 ⊕ i1
  p6254 = p6254 ⊕ i1    p6760 = p6760 ⊕ i1      p9161 = p9161 ⊕ i1
  p10117 = p10117 ⊕ i1  p12799 = p12799 ⊕ i1    p17467 = p17467 ⊕ i1
  p21099 = p21099 ⊕ i1

• For the 361st information bit i360, the addresses of the parity bit accumulators are given in the second row of the table (1958, 2007, ...). In a similar manner, the addresses of the parity bit accumulators for the following 359 information bits are obtained using (x + (m mod 360) QLDPC) mod (NLDPC − KLDPC), where x denotes the address of the parity bit accumulator corresponding to information bit i360, that is, the entries in the second row of the table.



• In a similar manner, for every group of 360 new information bits, a new row from the table is used to find the addresses of the parity bit accumulators.

Table 18.6: QLDPC Values for Frames of Normal and Short Length

NLDPC = 16,200            NLDPC = 64,800
Code Rate | QLDPC         Code Rate | QLDPC
1/4       | 36            1/2       | 90
1/2       | 25            3/5       | 72
3/5       | 18            2/3       | 60
2/3       | 15            3/4       | 45
3/4       | 12            4/5       | 36
4/5       | 10            5/6       | 30
5/6       | 8

After all of the information bits are exhausted, the final parity bits are obtained as follows:

• Sequentially perform the following operations, starting with i = 1:

  pi = pi ⊕ pi−1,   i = 1, 2, ..., NLDPC − KLDPC − 1.

The final content of pi is the parity bit. (End quote.) The function ldpcencode.m performs this encoding function. For decoding using message-passing decoding, it is helpful to represent the linear operations as a parity check matrix. Summarizing some of the computations above, a parity check matrix (using 0-based indexing) is produced by the following computations (a small subset is shown). For the columns of H associated with the information bits:

H(317, 0) = 1    H(2255, 0) = 1    H(2324, 0) = 1, ...
H(377, 1) = 1    H(2315, 1) = 1    H(2384, 1) = 1, ...
H(437, 2) = 1    H(2375, 2) = 1    H(2444, 2) = 1, ...

For the columns of H associated with the parity bits:

H(0, 43200) = 1    H(1, 43201) = 1    H(2, 43202) = 1    H(3, 43203) = 1, ...

Using this H matrix, any of the LDPC decoding algorithms can be used on this LDPC code. This concatenated code is similar to that used in DVB-S2 (digital video broadcast satellite), which also uses BCH and LDPC (really IRA) codes. (The IRA interleaving details are different between the two standards.)
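The accumulation procedure quoted above is compact enough to sketch directly. The following MATLAB fragment shows the idea for a single group of 360 information bits; the function name accumulate_group and its arguments are hypothetical conveniences for this sketch (the companion function ldpcencode.m is the book's full implementation), and the caller is assumed to supply the table row, the constant QLDPC, and the parity accumulator vector.

% Sketch: IRA parity accumulation for one group of 360 information bits.
% Assumes: info is a binary vector (length >= 360) for this group,
% table_row holds the 0-based parity addresses for the group (one row of
% Table 18.5), q_ldpc is the constant from Table 18.6, and p is the
% (NLDPC - KLDPC)-long parity accumulator vector.
function p = accumulate_group(p, info, table_row, q_ldpc)
  np = length(p);                                    % NLDPC - KLDPC
  for m = 0:359
    addr = mod(table_row + mod(m,360)*q_ldpc, np);   % addresses for bit i_m
    p(addr+1) = mod(p(addr+1) + info(m+1), 2);       % accumulate (XOR); MATLAB is 1-based
  end
end
% After all groups are processed, the final step p_i = p_i xor p_{i-1}
% (i = 1, ..., np-1) is a running XOR, e.g., p = mod(cumsum(p), 2);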

18.3 Digital Cable Television

ldpccodesetupl.m

Digital cable television is a way of delivering digital video content distinct from wireless transmission. Since cable carries many television channels, it is important to transmit in a way that is bandwidth efficient. And, of course, it is important for the transmission to be virtually error-free. In digital TV, the video data is first compressed (using, for example, MPEG-2 compression), then the resulting digital data are protected using error-correction coding. One way of achieving this may be similar to the framework shown in Figure 18.2, drawn from the U.S. Patent 5,511,096 (figure 1). It employs concatenated coding, with Reed–Solomon codes as the outer code and convolutional coding as the inner code. In the patent, the following details are described:



• The Reed–Solomon code is either (127,121) or (128,122), using data in 7-bit bytes.

Figure 18.2: Concatenated Reed–Solomon and Trellis coding for television transmission (see [204]). The transmit chain consists of sync/control insertion, Reed–Solomon encoding, interleaving, trellis encoding, QAM modulation, and upconversion; after the cable channel, the receive chain consists of downconversion, QAM demodulation (producing Î, Q̂), trellis decoding, deinterleaving/stripping, and Reed–Solomon decoding of the information bits.

Figure 18.3: Trellis encoder ([204, Figure 2]). The 28 input bits are split: 20 bits select symbols from the subsets (4 bits/subset), while 8 bits pass through a rate 4/5 convolutional encoder producing 10 bits (2 bits/subset) that select the symbol subsets, yielding five output symbols (I, Q).

• The interleaver is used to minimize the effects of burst errors.

• A 7-bit synchronization/control symbol is inserted into the stream. When the (127,121) code is used, the symbol is inserted every 20 codewords; when the (128,122) code is used, the symbol is inserted every 10 codewords. With the inclusion of this symbol, the effective rate is (in both cases)
$$
R = \frac{\text{symbols out}}{\text{symbols in}} = \frac{20 \cdot 121}{20 \cdot 127 + 1} = \frac{20}{21} \qquad \text{or} \qquad R = \frac{10 \cdot 122}{10 \cdot 128 + 1} = \frac{20}{21}.
$$

• These blocks are encoded using a "trellis encoder" having rate of either 4/5 or 3/4. The rate 4/5 encoder is used with a 64-QAM constellation and the rate 3/4 encoder is used with a 16-QAM constellation.



• The bits produced by the trellis encoder along the I and Q branches are used to select points in the QAM constellation. This complex baseband symbol produces a baseband waveform which is upconverted for transmission.

The structure of the "trellis encoder" is shown in Figure 18.3, which is the structure used for the 64-QAM system. The input to this system is four 7-bit symbols.¹ These 28 bits are split into 20 uncoded bits and 8 other bits which pass through a rate 4/5 convolutional encoder to produce 10 coded bits, giving a total of 30 bits. Each 64-QAM symbol is selected with a 6-bit index, so these 30 bits determine five symbols. The 10 coded bits, 2 bits per symbol, are used to select one of four signal subsets for each of five symbols, as illustrated in Figure 18.4. The output of the convolutional encoder may be viewed as a sequence of subsets. For each subset (containing 16 points), 4 bits out of the 20 uncoded bits are used to select a symbol from its selected subset. Each symbol is represented by its I (x) component and Q (y) component.

Returning to Figure 18.2, after passing through a channel (such as the cable, for cable TV) which introduces, among other things, noise (which is assumed to be AWGN), the received signal is down-converted and the complex baseband signal is passed through a QAM demodulator (comprising a matched filter), which produces coordinates in the signal space Î and Q̂. These are passed to a "trellis decoder." The "trellis decoder" is shown in Figure 18.5. The first step in the trellis decoder is what [204] calls a "pruner." The pruner produces two kinds of outputs, an output called "metrics" and an output called "uncoded bits." The metrics are distances from the received point (Î, Q̂) to each subset, i.e., to the nearest point in each subset, as suggested by Figure 18.6. These metrics are provided as inputs to a Viterbi decoder, which provides a maximum likelihood estimate of the sequence of subsets. The output of this Viterbi algorithm is a sequence of 2-bit pairs which indicate which subset is associated with each received symbol.

¹ The patent [204] is scaled differently by a factor of two.
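As a concrete illustration of the pruner's metric computation, the following MATLAB sketch finds the squared distance from a received point to the nearest point of each of four subsets. A small 16-QAM constellation and a made-up subset labeling are used here purely for illustration; this is not the constellation or the partition specified in [204].

% Sketch: "pruner" metrics -- squared distance from a received point to the
% nearest point in each of four subsets of a QAM constellation.
[X, Y] = meshgrid([-3 -1 1 3]);              % 16-QAM points (illustrative)
pts = X(:) + 1j*Y(:);
subset = mod(0:15, 4);                       % a made-up 4-way subset labeling
r = 0.7 - 1.9j;                              % received (Ihat, Qhat) point
metrics = zeros(1,4);
for s = 0:3
  d2 = abs(r - pts(subset == s)).^2;         % distances to points in subset s
  metrics(s+1) = min(d2);                    % metric = distance to nearest point
end
disp(metrics)                                % these metrics feed the Viterbi decoder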

Figure 18.4: Set partition of 64-QAM. At the last stage, different shapes are used to indicate the different subsets. (This decomposition is an example only, and is not indicated in [204].)



Figure 18.5: Trellis decoder ([204, Figure 3]). The received (Î, Q̂) point is processed by a "pruner" producing metrics (to a Viterbi decoder) and uncoded bits (to a delay buffer); the Viterbi output is re-encoded by the rate 4/5 convolutional encoder to select among the delayed uncoded bits, and a serializer forms the output bits.

Figure 18.6: Distances from (Î, Q̂) to subsets in the 64-QAM constellation.

Figure 18.7: Rate 4/5 convolutional encoder, formed by puncturing a rate 1/2 encoder with the puncture map [1 0 0 0; 1 1 1 1].

The other output of the pruner is a sequence of bits corresponding to the uncoded bits that select a point within a subset, 4 bits per symbol, with one set of 4 bits for each subset. As Figure 18.6 suggests, any of these sets of 4 bits could occur with roughly equal likelihood. To select which of these is likely to be the best, the output of the Viterbi decoder is re-encoded using the same encoder used in the trellis encoder. This re-encoded data is used to select a set of the uncoded bits from the pruner, which have been passed through a delay buffer that accounts for the delay in the Viterbi decoder and the re-encoding operation. These bits are placed in order with the output of the Viterbi decoder in the serializer, then output. The bits of the trellis decoder block are passed through a deinterleaver/stripper. The stripper part removes the synchronization byte (where it occurs) and the deinterleaver puts the bytes back into correct order for the Reed–Solomon decoder. The Reed–Solomon decoder removes any residual errors left after the trellis decoder. Since errors remaining after Viterbi decoding tend to be bursty, the Reed–Solomon decoder is well suited as an outer code, since it deals well with bursty errors.

Figure 18.7 shows an example of a rate 4/5 convolutional encoder. Four bits are provided into the convolutional structure, which is a rate 1/2 code with memory length 4, with generators 10101 (25 octal) and 11111 (37 octal). The output of the encoder is then punctured by the puncture map
$$
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix}
$$
and the five resulting bits are buffered for output.
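The following MATLAB sketch shows how the rate 4/5 code arises from puncturing: four input bits are encoded by the rate 1/2 code with generators 25 and 37 (octal), and the puncture map keeps five of the eight coded bits. The zero initial state and the input bits are assumptions of the sketch.

% Sketch: rate 4/5 by puncturing a rate 1/2 convolutional code.
% Generators 10101 (25 octal) and 11111 (37 octal); zero initial state assumed.
g = [1 0 1 0 1; 1 1 1 1 1];          % generator taps, most recent bit first
punct = [1 0 0 0; 1 1 1 1];          % puncture map: column k applies to input bit k
u = [1 0 1 1];                       % four example input bits
state = zeros(1,4);                  % encoder memory (4 delay elements)
kept = [];
for k = 1:4
  window = [u(k) state];             % current bit plus memory
  v = mod(g * window', 2);           % two coded bits for this input bit
  keep = logical(punct(:,k));        % which of the two bits survive puncturing
  kept = [kept v(keep)'];            %#ok<AGROW>
  state = window(1:4);               % shift the register
end
disp(kept)                           % five punctured output bits (rate 4/5)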

18.4 E-UTRA and Long-Term Evolution

The Evolved Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access (E-UTRA) is the air interface of the Third Generation Partnership Project (3GPP) Long-Term Evolution (LTE) upgrade path for mobile networks. In short, it is a fourth-generation wireless communication standard. Naturally, it employs error-correction coding. Modern wireless communication systems depend on a variety of technologies up and down a networking protocol stack, including MIMO and space-time coding. The error-correction coding aspects function primarily at layer 1, the physical layer. However, without a reliable physical layer upon which to build the rest of the system, wireless technology as presently widely enjoyed could not exist. The modulation employed in LTE is orthogonal frequency division multiplexing (OFDM), which employs multiple carriers to carry information from the signal constellation. This can be efficiently implemented using fast Fourier transforms. Often, real channels are frequency selective, so that some frequency subbands experience different amounts of attenuation (fade). In an OFDM setting, this can present a problem, because entire sets of


Figure 18.8: OFDM transmission over a frequency-selective channel (without and with coding) (see [87, Section 4.7]). (a) OFDM transmission over a frequency-selective channel; (b) channel coding and interleaving to provide frequency diversity.

subcarriers can be wiped out by a fade. Channel coding can provide an important frequency diversity in such frequency-selective channels. In Figure 18.8(a), there is a representation of several subcarriers (the vertical bars) and the quality of the channel as a function of frequency. A few of the subcarriers experience severe attenuation (low-quality channels). Symbols transmitted on those subcarriers would experience severe degradation. The situation is mitigated by the use of channel coding, as suggested by Figure 18.8(b). Coding spreads information across several symbols. These symbols are then distributed across different frequency bands by an interleaver. A fade experienced by a coded symbol in some subchannel is compensated by information in other coded symbols which do not experience the same fade. This is sometimes referred to as frequency interleaving. LTE uses several channels to perform various communication and control functions. These are summarized in Table 18.7, along with the coding methods they employ.

Table 18.7: Channel Coding Schemes for Transmission and Control Channels [2, Section 5.1.3]

Transmission Channels
Acronym | Description                  | Coding Scheme           | Rate
UL-SCH  | Uplink shared channel        | Turbo coding            | 1/3
DL-SCH  | Downlink shared channel      | Turbo coding            | 1/3
PCH     | Paging channel               | Turbo coding            | 1/3
MCH     | Multicast channel            | Turbo coding            | 1/3
SL-SCH  | Sidelink shared channel      | Turbo coding            | 1/3
SL-DCH  | Sidelink discovery channel   | Turbo coding            | 1/3
BCH     | Broadcast channel            | TB convolutional coding | 1/3
SL-BCH  | Sidelink broadcast channel   | TB convolutional coding | 1/3

Control Channels
Acronym | Description                  | Coding Scheme                          | Rate
DCI     | Downlink control information | TB convolutional coding                | 1/3
CFI     | Control format indicator     | TB convolutional coding                | 1/3
HI      | HARQ indicator               | Repetition                             | 1/3
UCI     | Uplink control indicator     | (variable) or TB convolutional coding  | 1/3
SCI     | Sidelink control information | TB convolutional coding                | 1/3

Figure 18.9: Rate 1/3 convolutional encoder for LTE, with generators G0 = 557, G1 = 663, G2 = 711 (octal) [1, Figure 2].

18.4.1 LTE Rate 1/3 Convolutional Encoder

The rate 1/3 convolutional code uses generators 557, 663, and 711 (octal), as shown in Figure 18.9. To drive the final state of a codeword to 0, 8 zero bits are added to the end of the code block before encoding, which results in a slight reduction in rate. The state of the encoder is set to 0 before encoding.
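A minimal MATLAB sketch of this encoder follows (zero initial state, 8 zero tail bits appended); the data length of 40 bits is an arbitrary example value.

% Sketch: rate 1/3 convolutional encoder with octal generators 557, 663, 711.
gen_oct = [557 663 711];
G = zeros(3,9);
for i = 1:3
  G(i,:) = bitget(base2dec(num2str(gen_oct(i)), 8), 9:-1:1);  % 9 taps, MSB first
end
u = [randi([0 1], 1, 40) zeros(1,8)];   % example data plus 8 zero tail bits
state = zeros(1,8);                     % 8 memory elements, initialized to 0
c = zeros(1, 3*length(u));              % coded output
for k = 1:length(u)
  window = [u(k) state];
  c(3*k-2:3*k) = mod(G * window', 2)';  % three output bits per input bit
  state = window(1:8);                  % shift the register
end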

18.4.2 LTE Turbo Code

The LTE rate 1/3 turbo code encoder is shown in Figure 18.10. The transfer function for the constituent parallel concatenated convolutional coders is
$$
G(x) = \left[ 1 \quad \frac{1 + x + x^3}{1 + x^2 + x^3} \right].
$$

At the beginning of encoding, the shift registers of both encoders are set to 0. The input sequence is x1 , x2 , x3 , . . . . The output from the encoder is taken in the order x1 , z1 , z′1 , x2 , z2 , z′2 , . . . , xK , zK , z′K , where zi and z′i are the outputs from the first and second constituent encoders, and K is the number of input bits.
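The following MATLAB sketch implements one constituent recursive systematic encoder with feedback polynomial 1 + x² + x³ and feedforward polynomial 1 + x + x³, producing the parity bits z_k for a block of input bits. It is a sketch only; the interleaver and trellis termination are omitted, and the input length is an arbitrary example value.

% Sketch: one RSC constituent encoder of the LTE turbo code.
x = randi([0 1], 1, 20);      % example input bits
s = zeros(1,3);               % shift register contents [s1 s2 s3], initialized to 0
z = zeros(size(x));           % parity output
for k = 1:length(x)
  fb = mod(x(k) + s(2) + s(3), 2);        % feedback bit (1 + x^2 + x^3)
  z(k) = mod(fb + s(1) + s(3), 2);        % feedforward taps (1 + x + x^3)
  s = [fb s(1) s(2)];                     % shift
end
% The turbo encoder output is ordered x1, z1, z1', x2, z2, z2', ..., where z'
% comes from a second, identical encoder fed by the interleaved input.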

Figure 18.10: LTE rate 1/3 turbo code encoder [1, Figure 3].

Trellis termination is accomplished by moving the upper switch in the figure to the lower position (disconnecting the encoder from the input and connecting it to the feedback path). The encoder is then clocked three more times, with the lower constituent encoder disabled. Then the last three tail bits (passed through the interleaver) are used to terminate the second constituent encoder, with the lower switch in the lower position. The transmitted bits for trellis termination are [1, Section 4.2.3.2.2]
$$
x_{K+1}, z_{K+1}, x_{K+2}, z_{K+2}, x_{K+3}, z_{K+3}, x'_{K+1}, z'_{K+1}, x'_{K+2}, z'_{K+2}, x'_{K+3}, z'_{K+3}.
$$

The interleaver (permuter) is specifically described in the standard, and works (essentially) by writing into the rows of a matrix then reading out from the columns of a matrix.

18.5 References

A convenient summary of LTE is provided in (for example) [87]. More complete details, of course, can be found in the standards available online.

Part VII

Space-Time Coding

Chapter 19

Fading Channels and Space-Time Codes

19.1 Introduction

For most of this book the codes have been designed for transmission through either an AWGN channel or a BSC. One exception is the convolutive channel, for which turbo equalization was introduced in Section 14.7. In this chapter, we introduce a coding technique appropriate for Rayleigh flat fading channels. Fading is a multiplicative change in the amplitude of the received signal. As will be shown in Section 19.2, fading is mitigated by means of diversity, that is, multiple, independent transmissions of the signal. Space-time coding provides a way of achieving diversity for multiple transmit antenna systems with low-complexity detection algorithms. A very important "meta-lesson" from this chapter is that the coding employed in communicating over a channel should match the particular requirements of the channel. In this case, space-time codes are a response to the question: Since diversity is important to communicating over a fading channel, how can coding be used to obtain diversity for portable receivers? A discussion of the fading channel and its statistical model are presented in Section 19.2. In Section 19.3, the importance of diversity in combating fading is presented. Section 19.4 presents space-time codes, which are able to provide diversity with only a single receive antenna and moderate decode complexity. Trellis codes used as space-time codes are presented in Section 19.5.

19.2 Fading Channels

In the channels most frequently used in this book, communication has been from the transmitter directly to the receiver, with only additive noise and attenuation distorting the received signal. In many communication problems, however, the transmitted signal may be subject to multiple reflections. Furthermore, the reflectors and transmitter may be moving with respect to each other, as suggested by Figure 19.1. For example, in an urban environment, a signal transmitted from a cell phone base station may reflect off of buildings, cars, or even trees, so that the signal received at a cell phone may consist only of the superposition of reflected signals. In fact, such impediments are very typical of most wireless channels. The received signal may thus be represented in the (complex baseband) form
$$
r(t) = \sum_n \alpha_n(t)\, e^{-j2\pi f_c \tau_n(t)}\, s(t - \tau_n(t)) + n(t) = \sum_n \alpha_n(t)\, e^{-j\theta_n(t)}\, s(t - \tau_n(t)) + n(t), \tag{19.1}
$$
where $s(t)$ is the transmitted signal, $\alpha_n(t)$ is the attenuation on the $n$th path (which may be time-varying), $\tau_n(t)$ is the delay on the $n$th path, and $e^{-j2\pi f_c \tau_n(t)}$ represents the phase change in the carrier with frequency $f_c$ due to the delay, with phase $\theta_n(t) = 2\pi \tau_n(t) f_c$. $\alpha_n(t)$, $\theta_n(t)$, and $\tau_n(t)$ can be considered as random processes. The noise $n(t)$ is a complex, stationary, zero-mean Gaussian random process


Figure 19.1: Multiple reflections from transmitter to receiver.

with independent real and imaginary parts and $E[n(t)n^*(s)] = N_0 \delta(t - s)$. In the limit, if the number of reflectors can be regarded as existing over a continuum (e.g., for signals reflecting off the ionosphere), the received signal can be modeled as
$$
r(t) = \int \alpha_s(t)\, e^{-j\theta_s(t)}\, s(t - \tau_s(t))\, ds + n(t).
$$
Frequently, the delays are similar enough relative to the symbol period that for all practical purposes the delayed signals $s(t - \tau_n(t))$ are the same, so $s(t - \tau_n(t)) = s(t - \tau(t))$ for all $n$. However, even in this case the changes in phase due to delay can be significant. Since $f_c$ is usually rather large (in the megahertz or gigahertz range), small changes in delay can result in large changes in phase.

Example 19.1 A signal transmitted at $f_c = 900$ MHz is reflected from two surfaces in such a way that at some particular instant of time, one signal to the receiver travels 0.16 m farther than the other signal. The time difference is therefore
$$
\tau = \frac{0.16}{c} = 5.33 \times 10^{-10}\ \text{s}
$$
and the phase difference is
$$
\theta = 2\pi \tau f_c = 3.0159\ \text{radians} = 172.8^{\circ}.
$$
The change in phase in the carrier results in a factor of $e^{j3.0159} \approx -1$ between the two signals, so the two received signals will almost cancel out! ◽
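A short MATLAB check of this arithmetic (using c ≈ 3 × 10^8 m/s):

% Check of Example 19.1: path difference 0.16 m at fc = 900 MHz.
fc = 900e6; c = 3e8;
tau = 0.16/c;                      % approximately 5.33e-10 s
theta = 2*pi*tau*fc;               % approximately 3.016 rad
fprintf('tau = %.3g s, theta = %.4f rad (%.1f degrees)\n', tau, theta, theta*180/pi);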

Fading in this case is thus due primarily to changes in the phase $\theta_n(t)$. The randomly varying phase $\theta_n(t)$ associated with the factor $\alpha_n e^{-j\theta_n}$ results in signals that at times add constructively and at times add destructively. When the signals add destructively, then fading occurs. If all of the delays are approximately equal, say, to a delay $\tau$, then
$$
r(t) = \sum_n \alpha_n(t)\, e^{-j\theta_n(t)}\, s(t - \tau) + n(t) = g(t)\, s(t - \tau) + n(t), \tag{19.2}
$$

Figure 19.2: Simulation of a fading channel (|g(t)| in dB versus time t/T, for two channel realizations).

where
$$
g(t) = \sum_n \alpha_n(t)\, e^{-j\theta_n(t)} \triangleq g_I(t) + j g_Q(t) \triangleq \alpha(t)\, e^{-j\phi(t)}
$$

is a time-varying complex amplitude factor. The channel transfer function is then $T(t, f) = \alpha(t) e^{-j\phi(t)}$, with magnitude response $|T(t, f)| = |g(t)| = \alpha(t)$. Since all frequency components are subjected to the same gain $\alpha(t)$, the channel is said to induce flat fading. The channel model for which the space-time codes of this chapter are applicable is flat fading. The effect of fading on the received signal can be severe. For example, suppose that 90% of the time the signals add constructively, resulting in an SNR at the receiver so that the probability of error is essentially 0, but that 10% of the time the channel introduces a deep fade, so that the probability of error is essentially 0.5. The probability of error averaged over time is then 0.05, much too high for most practical purposes, even though the receiver works perfectly most of the time! As we shall see, the way to combat fading is through diversity, sending multiple copies of the signal in the expectation that not all of the signals will fade simultaneously. Consider, for example, the plot in Figure 19.2 (see the companion files fadeplot.m and jakes.m), which shows a simulation of |g(t)| (in dB) for a particular channel for two different realizations of the channel. From the plot, it is clear that both of the signals are not necessarily highly attenuated at the same time. If these represented two different paths from transmitter to receiver, there is hope that at least one of the paths would present a reliable channel. Looked at from another point of view, if at one instant of time one channel is bad, at another instant of time, that channel might be good. These observations lead to various forms of diversity. In time diversity, the transmitter sends the same signal at different times, with sufficient delay between symbols that the transmissions experience independent fading. Time diversity may be accomplished using error-control coding in conjunction with interleaving. A second means of diversity is frequency diversity, in which the signal is transmitted using carriers sufficiently separated that the channels over which the signals travel experience independent fading. This can be accomplished using spread spectrum techniques or multiple carriers. A third means of diversity is spatial diversity, in which the signal is transmitted from or received by multiple antennas, whose spatial separation is such that the paths from transmitter antennas to receiver


antennas experience independent fading. Spatial diversity has the advantage of good throughput (not requiring multiple transmissions for time diversity) and good bandwidth (not requiring broad bandwidth for frequency diversity), at the expense of some additional hardware. Space-time codes are essentially a means of achieving spatial diversity.

19.2.1 Rayleigh Fading

Since the amplitude factor $g(t)$ is the summed effect of many reflectors, it may be regarded (by the central limit theorem) as a complex Gaussian random variable. That is, $g(t) = g_I(t) + j g_Q(t)$ has $g_I(t)$ and $g_Q(t)$ as independent, identically distributed random variables. If there is no strong direct path signal from transmitter to receiver, then these random variables are modeled as zero-mean random variables, so $g_I(t) \sim \mathcal{N}(0, \sigma_f^2)$ and $g_Q(t) \sim \mathcal{N}(0, \sigma_f^2)$, where $\sigma_f^2$ is the fading variance. It can be shown that the magnitude $\alpha = |g(t)|$ is Rayleigh distributed (see Exercise 19.2), so
$$
f_\alpha(\alpha) = \frac{\alpha}{\sigma_f^2}\, e^{-\alpha^2/2\sigma_f^2}, \quad \alpha \ge 0. \tag{19.3}
$$
A flat fading channel with magnitude distributed as (19.3) is said to be a Rayleigh fading channel. In the channel model (19.2), it is frequently assumed that $\tau$ is known (or can be estimated), so that it can be removed from consideration. On this basis, we write (19.2) as
$$
r(t) = g(t) s(t) + n(t).
$$
Let this $r(t)$ represent a BPSK-modulated signal transmitted with energy per bit $E_b$. Suppose furthermore that $g(t) = \alpha(t) e^{-j\phi(t)}$ is such that the magnitude $\alpha(t)$ is essentially constant over at least a few symbols, and that the random phase $\phi(t)$ varies slowly enough that it can be estimated with negligible error. This is the quasistatic model. Then conventional BPSK detection can be used. This results in a probability of error (see (1.23)) for a particular value of $\alpha$ as
$$
P_2(\alpha) = P(\text{bit error}|\alpha) = Q\left( \sqrt{\frac{2\alpha^2 E_b}{N_0}} \right). \tag{19.4}
$$
The probability of bit error is then obtained by averaging out the dependence on $\alpha$:
$$
P_2 = \int_0^{\infty} P_2(\alpha)\, f_\alpha(\alpha)\, d\alpha.
$$
Substituting (19.3) and (19.4) into this integral and integrating by parts (twice) yields
$$
P_2 = \frac12 \left( 1 - \sqrt{\frac{\overline{\gamma}_b}{1 + \overline{\gamma}_b}} \right), \tag{19.5}
$$
where
$$
\overline{\gamma}_b = \frac{E_b}{N_0} E[\alpha^2].
$$
(See the companion file fadepbplot.m.) In Figure 19.3, the line corresponding to $m = 1$ illustrates the performance of the narrowband fading channel, plotted as a function of $\overline{\gamma}_b$ (in dB). Clearly, there is significant degradation compared to BPSK over a channel impaired only by AWGN.

905

Average probability of bit error

10–1

m=1 m=2 m=3 m=4 m=6 m = 10

10–2 10–3 10–4 10–5 10–6 10–7 10–8

0

5

10

15 20 Average γb (dB)

25

30

Figure 19.3: Diversity performance of quasi-static, flat-fading channel with BPSK modulation. Solid is exact; dashed is approximation.

19.3 Diversity Transmission and Reception: The MIMO Channel

To provide some background for diversity receivers, let us now consider the somewhat more general problem of transmission through a general linear multiple-input/multiple-output (MIMO) channel. For insight, we first present the continuous-time channel model, then a discrete-time equivalent appropriate for detection (e.g., after filtering and sampling). Suppose that the signal
$$
s_i(t) = \sum_{k=-\infty}^{\infty} a_{ik}\, \varphi(t - kT)
$$
is transmitted from the $i$th antenna in a system of $n$ antennas, where $a_{ik}$ is the (complex) signal amplitude drawn from a signal constellation for the $k$th symbol period, and $\varphi(t)$ is the transmitted pulse shape, normalized to unit energy. (See Figure 19.4.) This signal passes through a channel with impulse response $\tilde{h}_{ji}(t)$ and is received by the $j$th receiver antenna out of $m$ antennas, producing
$$
r_j(t) = s_i(t) * \tilde{h}_{ji}(t) + n_j(t) = \sum_{k=-\infty}^{\infty} a_{ik} [\varphi(t - kT) * \tilde{h}_{ji}(t)] + n_j(t).
$$
Each $\tilde{h}_{ji}(t)$ may be the response due to scattering and multipath reflections, just as for a single fading channel. The total signal received at the $j$th receiver due to all transmitted signals is
$$
r_j(t) = \sum_{k=-\infty}^{\infty} \sum_{i=1}^{n} a_{ik} [\varphi(t - kT) * \tilde{h}_{ji}(t)] + n_j(t)
$$
(assuming that the noise $n_j(t)$ is acquired at the receiver, not through the separate channels). Stacking up the vectors as
$$
\mathbf{a}(k) = \begin{bmatrix} a_{1,k} \\ a_{2,k} \\ \vdots \\ a_{n,k} \end{bmatrix}, \quad \mathbf{r}(t) = \begin{bmatrix} r_1(t) \\ r_2(t) \\ \vdots \\ r_m(t) \end{bmatrix}, \quad \mathbf{n}(t) = \begin{bmatrix} n_1(t) \\ n_2(t) \\ \vdots \\ n_m(t) \end{bmatrix},
$$

Figure 19.4: Multiple transmit and receive antennas across a fading channel.

we can write

$$
\mathbf{r}(t) = \sum_{k=-\infty}^{\infty} \mathbf{H}(t - kT)\, \mathbf{a}_k + \mathbf{n}(t), \tag{19.6}
$$
where $\mathbf{H}(t)$ is the $m \times n$ matrix of impulse responses with
$$
h_{ji}(t) = \varphi(t) * \tilde{h}_{ji}(t) = \int_{-\infty}^{\infty} \varphi(\tau)\, \tilde{h}_{ji}(t - \tau)\, d\tau.
$$
If the vector noise process $\mathbf{n}(t)$ is white and Gaussian, with independent components, then the log-likelihood function can be maximized by finding the sequence of vector signals $\mathbf{a} \in \mathcal{A}^n$ minimizing
$$
J = \int_{-\infty}^{\infty} \Big\| \mathbf{r}(t) - \sum_{k=-\infty}^{\infty} \mathbf{H}(t - kT)\, \mathbf{a}_k \Big\|^2 dt
$$
$$
= \int_{-\infty}^{\infty} \|\mathbf{r}(t)\|^2\, dt - 2 \sum_{k=-\infty}^{\infty} \mathrm{Re}\left[ \mathbf{a}_k^H \int_{-\infty}^{\infty} \mathbf{H}^H(t - kT)\, \mathbf{r}(t)\, dt \right] + \int_{-\infty}^{\infty} \Big\| \sum_k \mathbf{H}(t - kT)\, \mathbf{a}_k \Big\|^2 dt,
$$

where $\|\mathbf{x}\|^2 = \mathbf{x}^H \mathbf{x}$ and where $H$ denotes the transpose-conjugate and $*$ denotes complex conjugation. In this general case, the minimization can be accomplished by a maximum-likelihood vector sequence estimator, that is, a Viterbi algorithm. Let us denote
$$
\mathbf{r}_k = \int_{-\infty}^{\infty} \mathbf{H}^H(t - kT)\, \mathbf{r}(t)\, dt \tag{19.7}
$$
as the outputs of a matrix matched filter, matched to the transmitted signal and channel. Substituting (19.6) into (19.7), we can write
$$
\mathbf{r}_k = \sum_{l=-\infty}^{\infty} \mathbf{S}_{k-l}\, \mathbf{a}_l + \mathbf{n}_k,
$$

where
$$
\mathbf{S}_k = \int_{-\infty}^{\infty} \mathbf{H}^H(t - kT)\, \mathbf{H}(t)\, dt \quad \text{and} \quad \mathbf{n}_k = \int_{-\infty}^{\infty} \mathbf{H}^H(t - kT)\, \mathbf{n}(t)\, dt.
$$

19.3.1 The Narrowband MIMO Channel

The formulation of the previous section is rather more general than is necessary for our future development. Consider now the narrowband MIMO channel, in which the frequency response is essentially constant for the signals that are transmitted over the channel. In this case, the transmitted waveform $\varphi(t)$ (transmitted by all antennas) is received as $\mathbf{H}(t) = \varphi(t)\mathbf{H}$. The matched filter $\mathbf{H}^H(-t)$ can be decomposed into multiplication by $\mathbf{H}^H$ followed by conventional matched filtering with $\varphi(-t)$. Note that the narrowband case can occur in the case of a flat fading channel, where the channel coefficients $h_{ji}$ are randomly time-varying, due, for example, to multiple interfering signals obtained by scattering.

As a specific and pertinent example, suppose that $s(t) = \sum_k a_k \varphi(t - kT)$ is transmitted (i.e., there is only a single transmit antenna) over a channel, and two receive antennas are used, with
$$
r_1(t) = h_1 s(t) + n_1(t), \qquad r_2(t) = h_2 s(t) + n_2(t),
$$
where $h_1$ and $h_2$ are complex and constant over at least one symbol interval. Thus, $\mathbf{H} = \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}$. The receiver first computes
$$
\mathbf{H}^H \mathbf{r}(t) = h_1^* h_1 s(t) + h_2^* h_2 s(t) + (h_1^* n_1(t) + h_2^* n_2(t)) = (|h_1|^2 + |h_2|^2) s(t) + (h_1^* n_1(t) + h_2^* n_2(t)),
$$
then passes this signal through the matched filter $\varphi(-t)$, as shown in Figure 19.5, to produce the signal
$$
r_k = (|h_1|^2 + |h_2|^2) a_k + n_k,
$$
where $n_k$ is a complex Gaussian random variable with $E[n_k] = 0$ and $E[n_k n_k^*] = (|h_1|^2 + |h_2|^2) N_0$. Thus, the maximum likelihood receiver employs the decision rule
$$
\hat{a}_k = \arg\min_{a \in \mathcal{A}} |r_k - (|h_1|^2 + |h_2|^2) a|^2.
$$

This detector is called a maximal ratio combiner, since it can be shown that it maximizes the signal-to-noise ratio.
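A small MATLAB sketch of this two-antenna maximal ratio combiner for BPSK symbols follows; the channel gains, noise level, and block length are arbitrary example values.

% Sketch: maximal ratio combining with one transmit and two receive antennas, BPSK.
N = 1e4;
a = 2*randi([0 1], 1, N) - 1;                       % BPSK symbols (+1/-1)
h = (randn(2,1) + 1j*randn(2,1))/sqrt(2);           % channel gains h1, h2
sigma = 0.5;                                        % noise level (example value)
n = sigma*(randn(2,N) + 1j*randn(2,N))/sqrt(2);
r = h*a + n;                                        % received signals (2 x N)
rk = h'*r;                                          % combiner: h1^* r1 + h2^* r2
ahat = sign(real(rk));                              % ML decision for BPSK
fprintf('Symbol error rate: %g\n', mean(ahat ~= a));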

Figure 19.5: Two receive antennas and a maximal ratio combiner receiver.

19.3.2 Diversity Performance with Maximal-Ratio Combining

Let us now consider the performance of the single-transmitter, $m$-receiver system using maximal ratio combining. Suppose that the signal $a \in \mathcal{A}$ is sent. The matched filter output is $r_k = \|\mathbf{h}\|^2 a_k + n_k$, where $n_k = \mathbf{h}^H \mathbf{n}$ is a complex Gaussian random variable with $E[|n_k|^2] = N_0 \|\mathbf{h}\|^2 = N_0 \sum_{j=1}^{m} |h_j|^2$. Let us define an effective signal-to-noise ratio as
$$
\gamma_{\rm eff} = \|\mathbf{h}\|^2 \frac{E_b}{N_0} = \sum_{j=1}^{m} |h_j|^2 \frac{E_b}{N_0} = \sum_{j=1}^{m} \gamma_j,
$$
where $\gamma_j = |h_j|^2 E_b/N_0$ is the effective SNR for the $j$th channel. We assume a calibration so that the average SNR is $E[\gamma_j] = E_b/N_0$ for $j = 1, 2, \ldots, m$. Transmitting from a single antenna then recombining the multiple received signals through a maximal ratio combiner results in a single-input, single-output channel. For BPSK transmission (assuming that the channel varies sufficiently slowly that $\mathbf{h}$ can be adequately estimated) the probability of error as a function of the effective signal-to-noise ratio is
$$
P_2(\gamma_{\rm eff}) = Q(\sqrt{2\gamma_{\rm eff}}).
$$
As for the case of a single fading channel, the overall probability of error is obtained by averaging over channel coefficients. Assume, as for the single fading case, that each coefficient $h_i$ is a complex Gaussian random variable, so $\gamma_i$ is a $\chi^2$ distribution with 2 degrees of freedom. If each $h_i$ varies independently (which can be assumed if the receive antennas are at least a half wavelength apart), then $\gamma_{\rm eff}$ is a $\chi^2$ distribution with $2m$ degrees of freedom. The pdf of such a distribution can be shown to be (see, e.g., [333])
$$
f_{\gamma_{\rm eff}}(t) = \frac{1}{(m-1)!\,(E_b/N_0)^m}\, t^{m-1} e^{-t N_0/E_b}, \quad t \ge 0.
$$
The overall probability of error is



$$
P_2 = \int_0^{\infty} f_{\gamma_{\rm eff}}(t)\, P_2(t)\, dt.
$$
The result of this integral (integrating by parts twice) is
$$
P_2 = p^m \sum_{k=0}^{m-1} \binom{m-1+k}{k} (1-p)^k,
$$
where
$$
p = \frac12 \left( 1 - \sqrt{\frac{\overline{\gamma}_b}{1 + \overline{\gamma}_b}} \right)
$$
is the probability of error we found in (19.5) for single-channel diversity. Figure 19.3 shows the result for various values of $m$. It is clear that diversity provides significant performance improvement. At high SNR, the quantity $p$ can be approximated by
$$
p \approx \frac{1}{4\overline{\gamma}_b}.
$$
For $m > 1$, the probability of error can be approximated by observing that $(1-p)^k \approx 1$, so
$$
P_2 \approx p^m \sum_{k=0}^{m-1} \binom{m-1+k}{k} = p^m \binom{2m-1}{m} \triangleq K_m p^m. \tag{19.8}
$$

Using the $m = 1$ approximation, we find
$$
P_2 \approx \left( \frac{1}{4\overline{\gamma}_b} \right)^m \binom{2m-1}{m}. \tag{19.9}
$$
While the probability of error in an AWGN channel decreases exponentially with the signal-to-noise ratio, in a fading channel the probability of error only decreases reciprocally with the signal-to-noise ratio, with an exponent equal to the diversity $m$. We say that this scheme has diversity order $m$.
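The exact expression and the high-SNR approximation (19.9) are easy to compare numerically; the following MATLAB sketch does so for m = 2 (values chosen only for illustration).

% Sketch: exact MRC error probability vs the high-SNR approximation (19.9), m = 2.
m = 2;
gb_dB = 5:5:30;  gb = 10.^(gb_dB/10);
p = 0.5*(1 - sqrt(gb./(1+gb)));
P2_exact = zeros(size(gb));
for k = 0:m-1
  P2_exact = P2_exact + nchoosek(m-1+k, k)*(1-p).^k;
end
P2_exact = p.^m .* P2_exact;
P2_approx = (1./(4*gb)).^m * nchoosek(2*m-1, m);
disp([gb_dB' P2_exact' P2_approx'])       % columns: SNR (dB), exact, approximation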

19.4 Space-Time Block Codes

We have seen that performance in a fading channel can be improved by receiver diversity. However, in many systems the receiver is required to be small and portable, such as a cell phone or personal digital assistant, and it may not be practical to deploy multiple receive antennas. The transmitter at a base station, however, may easily accommodate multiple antennas. It is of interest, therefore, to develop means of diversity which employ multiple transmit antennas instead of multiple receive antennas. This is what space-time codes provide.

19.4.1 The Alamouti Code

To introduce space-time codes, we present the Alamouti code, an early space-time code and still one of the most commonly used. Consider the transmit diversity scheme of Figure 19.6. Each antenna sends sequences of data from a signal constellation $\mathcal{A}$. In the Alamouti code, a frame of data lasts for two symbol periods. In the first symbol time, antenna 1 sends the symbol $a_0 \in \mathcal{A}$ while antenna 2 sends the symbol $a_1 \in \mathcal{A}$. In the second symbol time, antenna 1 sends the symbol $-a_1^*$ while antenna 2 sends the symbol $a_0^*$. It is assumed that the fading introduced by the channel varies sufficiently slowly that

Figure 19.6: A two-transmit antenna diversity scheme: the Alamouti code.

it is constant over two symbol times (e.g., the quasistatic assumption applies for the entire duration of the codeword). The channel from antenna 1 to the receiver is modeled as $h_0 = \alpha_0 e^{-j\phi_0}$ and the channel from antenna 2 to the receiver is modeled as $h_1 = \alpha_1 e^{-j\phi_1}$. The received signal for the first signal (i.e., the sampled matched filter output) is
$$
r_0 = h_0 a_0 + h_1 a_1 + n_0 \tag{19.10}
$$
and for the second signal is
$$
r_1 = -h_0 a_1^* + h_1 a_0^* + n_1. \tag{19.11}
$$
The receiver now employs a combining scheme, computing
$$
\tilde{r}_0 = h_0^* r_0 + h_1 r_1^*, \qquad \tilde{r}_1 = h_1^* r_0 - h_0 r_1^*. \tag{19.12}
$$
Substituting (19.10) and (19.11) into (19.12), we have
$$
\tilde{r}_0 = (|h_0|^2 + |h_1|^2) a_0 + h_0^* n_0 + h_1 n_1^*, \qquad \tilde{r}_1 = (|h_0|^2 + |h_1|^2) a_1 + h_1^* n_0 - h_0 n_1^*.
$$
The key observation is that $\tilde{r}_0$ depends only on $a_0$, so that detection can take place with respect to this single quantity. Similarly, $\tilde{r}_1$ depends only on $a_1$, again implying a single detection problem. The receiver now employs the maximum likelihood decision rule on each signal separately:
$$
\hat{a}_0 = \arg\min_{a \in \mathcal{A}} |\tilde{r}_0 - (|h_0|^2 + |h_1|^2) a|^2, \qquad \hat{a}_1 = \arg\min_{a \in \mathcal{A}} |\tilde{r}_1 - (|h_0|^2 + |h_1|^2) a|^2.
$$
Overall the scheme is capable of sending two symbols over two symbol periods, so this represents a rate 1 code. However, it also provides a diversity of 2. Assuming that the total transmitter power with two antennas in this coded scheme is equal to the total transmitted power of a conventional receiver diversity method, the transmit power must be split into two for each antenna. This power split results in a 3 dB performance reduction compared to $m = 2$ using two receive antennas, but otherwise equivalent performance.

Suppose that, unlike this Alamouti scheme, the combining scheme produced values which are a mixture of the transmitted signals. For example, suppose
$$
\tilde{r}_0 = a a_0 + b a_1 + \tilde{n}_0, \qquad \tilde{r}_1 = c a_0 + d a_1 + \tilde{n}_1
$$
for some coefficients $a, b, c, d$. We could write this as
$$
\tilde{\mathbf{r}} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} + \begin{bmatrix} \tilde{n}_0 \\ \tilde{n}_1 \end{bmatrix} \triangleq A \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} + \begin{bmatrix} \tilde{n}_0 \\ \tilde{n}_1 \end{bmatrix}.
$$
Then the maximum likelihood decision rule must be computed jointly:
$$
[\hat{a}_0, \hat{a}_1] = \arg\min_{\mathbf{a} \in \mathcal{A}^2} \|\tilde{\mathbf{r}} - A \mathbf{a}\|^2.
$$

The search is over the vector of length 2, so that if the constellation has M points in it, then the search complexity is O(M 2 ). In the case of m-fold diversity, the complexity rises as O(M m ). This increase of complexity is avoided in the Alamouti code case because of the orthogonality of the encoding matrix. It is important for computational simplicity that a symbol appears in only one combined received waveform. The remainder of the development of space-time block codes in this chapter is restricted to considerations of how to achieve this kind of separation, using orthogonal designs. The Alamouti code has been adopted by the IEEE 802.11a and IEEE 802.16a wireless standards.
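The following MATLAB sketch runs one Alamouti frame with QPSK symbols, showing the encoding, the combining in (19.12), and the separate symbol-by-symbol decisions; the channel gains and noise level are arbitrary example values.

% Sketch: one frame of the Alamouti code with QPSK symbols.
A = (1/sqrt(2))*[1+1j 1-1j -1+1j -1-1j];        % QPSK constellation
a0 = A(2); a1 = A(3);                           % two symbols to send
h0 = 0.8*exp(-1j*0.3); h1 = 0.5*exp(-1j*1.7);    % quasi-static channel gains
sigma = 0.1;
n0 = sigma*(randn+1j*randn)/sqrt(2); n1 = sigma*(randn+1j*randn)/sqrt(2);
r0 = h0*a0 + h1*a1 + n0;                        % first symbol time  (19.10)
r1 = -h0*conj(a1) + h1*conj(a0) + n1;           % second symbol time (19.11)
rt0 = conj(h0)*r0 + h1*conj(r1);                % combiner (19.12)
rt1 = conj(h1)*r0 - h0*conj(r1);
g = abs(h0)^2 + abs(h1)^2;
[~, i0] = min(abs(rt0 - g*A).^2);               % separate ML decisions
[~, i1] = min(abs(rt1 - g*A).^2);
fprintf('decoded correctly: %d %d\n', A(i0) == a0, A(i1) == a1);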

19.4.2 A More General Formulation

Let us now establish a more general framework for space-time codes, of which the Alamouti scheme is a special case. In the interest of generality, we allow for both transmit and receive diversity, with $n$ transmit antennas and $m$ receive antennas. The code frames exist for $l$ symbol periods. (Thus, for the Alamouti scheme, $n = 2$, $m = 1$, and $l = 2$.) At time $t$, the symbols $c_i(t)$, $i = 1, 2, \ldots, n$, are transmitted simultaneously from the $n$ transmit antennas. The signal $r_j(t)$ at receive antenna $j$ is
$$r_j(t) = \sum_{i=1}^{n} h_{j,i}\, c_i(t) + n_j(t), \qquad j = 1, 2, \ldots, m,$$
where $c_i(t)$ is the coded symbol transmitted by antenna $i$ at time $t$. The codeword for this frame is thus the sequence
$$\mathbf{c} = c_1(1), c_2(1), \ldots, c_n(1),\; c_1(2), c_2(2), \ldots, c_n(2),\; \ldots,\; c_1(l), c_2(l), \ldots, c_n(l) \qquad (19.13)$$

of length $nl$.

19.4.3 Performance Calculation

Before considering how to design the codewords, let us first establish the diversity order for the coding scheme. Consider the probability that a maximum-likelihood decoder decides in favor of a signal
$$\mathbf{e} = e_1(1), e_2(1), \ldots, e_n(1),\; e_1(2), e_2(2), \ldots, e_n(2),\; \ldots,\; e_1(l), e_2(l), \ldots, e_n(l)$$
over the signal $\mathbf{c}$ of (19.13) which was transmitted, for a given channel state. We denote this probability as $P(\mathbf{c} \to \mathbf{e} \mid h_{j,i},\, j = 1, 2, \ldots, m,\, i = 1, 2, \ldots, n)$. Over the AWGN channel, this probability can be bounded (using a bound on the $Q$ function) by
$$P(\mathbf{c} \to \mathbf{e} \mid h_{j,i},\, j = 1, \ldots, m,\, i = 1, \ldots, n) \le \exp(-d^2(\mathbf{c}, \mathbf{e})\, E_s / 4N_0),$$
where
$$d^2(\mathbf{c}, \mathbf{e}) = \sum_{j=1}^{m} \sum_{t=1}^{l} \left| \sum_{i=1}^{n} h_{j,i}\,(c_i(t) - e_i(t)) \right|^2.$$

Let $\mathbf{h}_j = [h_{j,1}, h_{j,2}, \ldots, h_{j,n}]^T$ and $\mathbf{c}_t = [c_1(t), c_2(t), \ldots, c_n(t)]^T$, and similarly $\mathbf{e}_t$. Then
$$d^2(\mathbf{c}, \mathbf{e}) = \sum_{j=1}^{m} \mathbf{h}_j^T \left[ \sum_{t=1}^{l} (\mathbf{c}_t - \mathbf{e}_t)(\mathbf{c}_t - \mathbf{e}_t)^H \right] \mathbf{h}_j^*.$$
Let
$$A(\mathbf{c}, \mathbf{e}) = \sum_{t=1}^{l} (\mathbf{c}_t - \mathbf{e}_t)(\mathbf{c}_t - \mathbf{e}_t)^H.$$

Then
$$d^2(\mathbf{c}, \mathbf{e}) = \sum_{j=1}^{m} \mathbf{h}_j^T A(\mathbf{c}, \mathbf{e})\, \mathbf{h}_j^*,$$
so that
$$P(\mathbf{c} \to \mathbf{e} \mid h_{j,i},\, j = 1, \ldots, m,\, i = 1, \ldots, n) \le \prod_{j=1}^{m} \exp(-\mathbf{h}_j^T A(\mathbf{c}, \mathbf{e})\, \mathbf{h}_j^*\, E_s / 4N_0).$$

The matrix $A(\mathbf{c}, \mathbf{e})$ can be written as $A(\mathbf{c}, \mathbf{e}) = B(\mathbf{c}, \mathbf{e})\, B^H(\mathbf{c}, \mathbf{e})$, where
$$B(\mathbf{c}, \mathbf{e}) = \begin{bmatrix}
e_1(1) - c_1(1) & e_1(2) - c_1(2) & \cdots & e_1(l) - c_1(l) \\
e_2(1) - c_2(1) & e_2(2) - c_2(2) & \cdots & e_2(l) - c_2(l) \\
\vdots & \vdots & & \vdots \\
e_n(1) - c_n(1) & e_n(2) - c_n(2) & \cdots & e_n(l) - c_n(l)
\end{bmatrix}. \qquad (19.14)$$

In other words, $A(\mathbf{c}, \mathbf{e})$ has $B(\mathbf{c}, \mathbf{e})$ as a square root. It is known (see [202]) that any matrix having a square root is non-negative definite; among other implications of this, all of its eigenvalues are non-negative. The Hermitian matrix $A(\mathbf{c}, \mathbf{e})$ can be diagonalized as (see, e.g., [319]) $V A(\mathbf{c}, \mathbf{e}) V^H = D$, where $V$ is a unitary matrix formed from the eigenvectors of $A(\mathbf{c}, \mathbf{e})$ and $D$ is diagonal with real non-negative diagonal elements $\lambda_i$. Let $\boldsymbol{\beta}_j = V \mathbf{h}_j^*$. Then
$$P(\mathbf{c} \to \mathbf{e} \mid h_{j,i},\, j = 1, \ldots, m,\, i = 1, \ldots, n) \le \prod_{j=1}^{m} \exp(-\boldsymbol{\beta}_j^H D \boldsymbol{\beta}_j\, E_s / 4N_0) = \prod_{j=1}^{m} \exp\!\left( -\sum_{i=1}^{n} \lambda_i |\beta_{i,j}|^2\, E_s / 4N_0 \right).$$

Assuming the elements of $\mathbf{h}_j$ are zero-mean Gaussian and normalized to have variance 0.5 per dimension, the $|\beta_{i,j}|$ are Rayleigh distributed with density $p(|\beta_{i,j}|) = 2|\beta_{i,j}| \exp(-|\beta_{i,j}|^2)$. The average performance is obtained by integrating:
$$P(\mathbf{c} \to \mathbf{e}) \le \int \prod_{j=1}^{m} \exp\!\left( -\sum_{i=1}^{n} \lambda_i |\beta_{i,j}|^2\, E_s / 4N_0 \right) \prod_{i,j} 2|\beta_{i,j}| \exp(-|\beta_{i,j}|^2)\; d|\beta_{1,1}| \cdots d|\beta_{n,m}|.$$

After some effort, this can be shown to be
$$P(\mathbf{c} \to \mathbf{e}) \le \left( \frac{1}{\prod_{i=1}^{n} (1 + \lambda_i E_s / 4N_0)} \right)^{m}. \qquad (19.15)$$

Let $r$ be the rank of $A(\mathbf{c}, \mathbf{e})$, so there are $n - r$ eigenvalues of $A(\mathbf{c}, \mathbf{e})$ equal to 0. Then (19.15) can be further approximated as
$$P(\mathbf{c} \to \mathbf{e}) \le \left( \prod_{i=1}^{r} \lambda_i \right)^{-m} (E_s / 4N_0)^{-rm}.$$
Comparing with (19.9), we see that the probability of error decreases reciprocally with the signal-to-noise ratio raised to the $rm$th power. We have thus proved the following theorem:

Theorem 19.2 The order of the diversity for this coding is $rm$.

From this theorem, we obtain the rank criterion: to obtain the maximum diversity $mn$, the matrix $B(\mathbf{c}, \mathbf{e})$ must be of full rank for any pair of codewords $\mathbf{c}$ and $\mathbf{e}$. The factor $\prod_{i=1}^{r} \lambda_i^{-m}$ in (19.15) is interpreted as the coding advantage. In combination with the diversity from the other factor, we obtain the following two design criteria for Rayleigh space-time codes [435].


•  In order to achieve maximum diversity, $B(\mathbf{c}, \mathbf{e})$ of (19.14) must be full rank for any pair of codewords $\mathbf{c}$ and $\mathbf{e}$. The smallest $r$ over any pair of codewords leads to a diversity of $rm$.

•  The coding benefit is maximized by maximizing the sum of the determinants of all $r \times r$ principal cofactors of $A(\mathbf{c}, \mathbf{e}) = B(\mathbf{c}, \mathbf{e}) B(\mathbf{c}, \mathbf{e})^H$, since this sum is equal to the product of the nonzero eigenvalues of $A(\mathbf{c}, \mathbf{e})$.
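As an illustrative check of these criteria (a Python sketch; the codewords are Alamouti codewords built from arbitrarily chosen BPSK symbols), one can form $B(\mathbf{c}, \mathbf{e})$ for a single pair of codewords and compute the rank and the product of the nonzero eigenvalues of $A(\mathbf{c}, \mathbf{e})$:

import numpy as np

def alamouti(a0, a1):
    # Rows are indexed by antenna and columns by time, matching (19.14).
    return np.array([[a0, -np.conj(a1)],
                     [a1,  np.conj(a0)]])

C = alamouti(1.0, 1.0)            # transmitted codeword
E = alamouti(1.0, -1.0)           # competing codeword
B = E - C                         # B(c, e)
A = B @ B.conj().T                # A(c, e) = B B^H

r = np.linalg.matrix_rank(B)
eigs = np.sort(np.linalg.eigvalsh(A))[::-1]
print("rank:", r)                                            # 2: full rank, diversity 2m
print("product of nonzero eigenvalues:", np.prod(eigs[:r]))  # coding-advantage term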

19.4.3.1 Real Orthogonal Designs

Let us now turn attention to the problem of designing the transmitted codewords. Recall that to minimize decoder complexity, it is desirable to be able to decompose the decision problem so that optimal decisions can be made on the basis of a single symbol at a time, as was possible for the Alamouti code. This can be achieved using orthogonal designs. For the moment, we consider only real orthogonal designs, which are associated with real signal constellations.

Definition 19.3 A real orthogonal design of size $n$ is an $n \times n$ matrix $\mathcal{O} = \mathcal{O}(x_1, x_2, \ldots, x_n)$ with elements drawn from $\pm x_1, \pm x_2, \ldots, \pm x_n$ such that
$$\mathcal{O}^T \mathcal{O} = I \sum_{i=1}^{n} x_i^2 \triangleq IK.$$
That is, $\mathcal{O}$ is proportional to an orthogonal matrix. ◽

By means of column permutations and sign changes, it is possible to arrange $\mathcal{O}$ so that the first row has all positive signs. Examples of orthogonal designs are
$$\mathcal{O}_2 = \begin{bmatrix} x_1 & x_2 \\ -x_2 & x_1 \end{bmatrix}
\qquad
\mathcal{O}_4 = \begin{bmatrix} x_1 & x_2 & x_3 & x_4 \\ -x_2 & x_1 & -x_4 & x_3 \\ -x_3 & x_4 & x_1 & -x_2 \\ -x_4 & -x_3 & x_2 & x_1 \end{bmatrix}$$
$$\mathcal{O}_8 = \begin{bmatrix}
x_1 & x_2 & x_3 & x_4 & x_5 & x_6 & x_7 & x_8 \\
-x_2 & x_1 & x_4 & -x_3 & x_6 & -x_5 & -x_8 & x_7 \\
-x_3 & -x_4 & x_1 & x_2 & x_7 & x_8 & -x_5 & -x_6 \\
-x_4 & x_3 & -x_2 & x_1 & x_8 & -x_7 & x_6 & -x_5 \\
-x_5 & -x_6 & -x_7 & -x_8 & x_1 & x_2 & x_3 & x_4 \\
-x_6 & x_5 & -x_8 & x_7 & -x_2 & x_1 & -x_4 & x_3 \\
-x_7 & x_8 & x_5 & -x_6 & -x_3 & x_4 & x_1 & -x_2 \\
-x_8 & -x_7 & x_6 & x_5 & -x_4 & -x_3 & x_2 & x_1
\end{bmatrix}.$$

In fact, it is known that these are the only real orthogonal designs [154]. Each row of an orthogonal design $\mathcal{O}$ is a permutation with sign changes of $\{x_1, x_2, \ldots, x_n\}$. Let us denote the $(i, j)$th element of $\mathcal{O}$ as $o_{ij} = x_{\epsilon_i(j)}\, \delta_i(j)$, where $\epsilon_i(j)$ is a permutation function for the $i$th row and $\delta_i(j)$ is the sign of the $(i, j)$ entry. Observe that the permutations in the orthogonal designs above are symmetric, so that $\epsilon_i(j) = \epsilon_i^{-1}(j)$. By the orthogonality of $\mathcal{O}$,
$$[\mathcal{O}^T \mathcal{O}]_{i,k} = \sum_{l=1}^{n} o_{li}\, o_{lk} = \begin{cases} \sum_{i=1}^{n} x_i^2 & i = k \\ 0 & \text{otherwise} \end{cases} \;\triangleq\; K\delta_{i-k},$$
or
$$\sum_{l=1}^{n} x_{\epsilon_l(i)}\, x_{\epsilon_l(k)}\, \delta_l(i)\, \delta_l(k) = K\delta_{i-k}. \qquad (19.16)$$
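As a quick numerical confirmation of this orthogonality (a Python sketch; the symbol values are arbitrary), the design $\mathcal{O}_4$ above satisfies $\mathcal{O}^T\mathcal{O} = KI$:

import numpy as np

def O4(x1, x2, x3, x4):
    # The 4 x 4 real orthogonal design given above.
    return np.array([[ x1,  x2,  x3,  x4],
                     [-x2,  x1, -x4,  x3],
                     [-x3,  x4,  x1, -x2],
                     [-x4, -x3,  x2,  x1]])

x = np.array([0.7, -1.3, 2.0, 0.4])            # arbitrary real values
O = O4(*x)
K = np.sum(x ** 2)
print(np.allclose(O.T @ O, K * np.eye(4)))     # True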

19.4.3.2 Encoding and Decoding Based on Orthogonal Designs

At the encoder, the symbols $a_1, a_2, \ldots, a_n$ are selected from the (for the moment, real) signal constellation $\mathcal{A}$ and used to fill out the $n \times n$ orthogonal design $\mathcal{O}(a_1, a_2, \ldots, a_n)$. At time slot $t = 1, 2, \ldots, n$, the elements of the $t$th row of the orthogonal design are simultaneously transmitted using the $n$ transmit antennas. The frame length of the code is $l = n$. At time $t$, the $j$th receive antenna receives
$$r_j(t) = \sum_{i=1}^{n} h_{j,i}\, c_i(t) + n_j(t) = \sum_{i=1}^{n} h_{j,i}\, \delta_t(i)\, a_{\epsilon_t(i)} + n_j(t).$$

Now let $l = \epsilon_t(i)$, or $i = \epsilon_t^{-1}(l)$. The received signal can be written as
$$r_j(t) = \sum_{l=1}^{n} h_{j,\epsilon_t^{-1}(l)}\, \delta_t(\epsilon_t^{-1}(l))\, a_l + n_j(t).$$
Let $\mathbf{h}_{j,t}^T = [h_{j,\epsilon_t^{-1}(1)} \delta_t(\epsilon_t^{-1}(1)), \ldots, h_{j,\epsilon_t^{-1}(n)} \delta_t(\epsilon_t^{-1}(n))]$ and $\mathbf{a}^T = [a_1, \ldots, a_n]$. Then
$$r_j(t) = \mathbf{h}_{j,t}^T \mathbf{a} + n_j(t).$$
Stacking the received signals in time, the $j$th receive antenna observes
$$\mathbf{r}_j = \begin{bmatrix} r_j(1) \\ \vdots \\ r_j(n) \end{bmatrix} = \begin{bmatrix} \mathbf{h}_{j,1}^T \\ \vdots \\ \mathbf{h}_{j,n}^T \end{bmatrix} \mathbf{a} + \begin{bmatrix} n_j(1) \\ \vdots \\ n_j(n) \end{bmatrix} \triangleq H_{j,\mathrm{eff}}\, \mathbf{a} + \mathbf{n}_j. \qquad (19.17)$$

The maximum likelihood receiver computes
$$J(\mathbf{a}) = \sum_{j=1}^{m} \|\mathbf{r}_j - H_{j,\mathrm{eff}}\, \mathbf{a}\|^2$$
over all vectors $\mathbf{a}$ and selects the codeword with the minimizing value. However, rather than having to search jointly over all $|\mathcal{A}|^n$ vectors $\mathbf{a}$, each component can be selected independently, as we now show. We can write
$$J(\mathbf{a}) = \sum_{j=1}^{m} \|\mathbf{r}_j\|^2 - 2\,\mathrm{Re}\!\left[ \mathbf{a}^H \sum_{j=1}^{m} H_{j,\mathrm{eff}}^H \mathbf{r}_j \right] + \mathbf{a}^H \left( \sum_{j=1}^{m} H_{j,\mathrm{eff}}^H H_{j,\mathrm{eff}} \right) \mathbf{a}.$$

Using (19.16), we see that each $H_{j,\mathrm{eff}}$ is an orthogonal design in the variables $(h_{j,1}, h_{j,2}, \ldots, h_{j,n})$, so that
$$\sum_{j=1}^{m} H_{j,\mathrm{eff}}^H H_{j,\mathrm{eff}} = I \sum_{j=1}^{m} K_j \triangleq IK$$
for some scalar $K$. Let
$$\mathbf{v} = \sum_{j=1}^{m} H_{j,\mathrm{eff}}^H \mathbf{r}_j.$$
Then minimizing $J(\mathbf{a})$ is equivalent to minimizing
$$J'(\mathbf{a}) = -2\,\mathrm{Re}[\mathbf{a}^H \mathbf{v}] + \|\mathbf{a}\|^2 K.$$


Now let $S_i = -2\,\mathrm{Re}\, a_i^* v_i + K|a_i|^2$. Minimizing $J'(\mathbf{a})$ is equivalent to minimizing $\sum_{i=1}^{n} S_i$, which amounts to minimizing each $S_i$ separately. Since each $S_i$ depends only on $a_i$, we have
$$\hat a_i = \arg\min_{a}\; K|a|^2 - 2\,\mathrm{Re}\, a^* v_i.$$
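To see this separation concretely, the following Python sketch decodes one block of the real design $\mathcal{O}_4$ under conditions chosen purely for illustration: one receive antenna, real channel gains assumed known at the receiver, and BPSK symbols. Because $\mathcal{O}_4(\mathbf{a})$ is linear in $\mathbf{a}$, the received block $\mathcal{O}_4(\mathbf{a})\mathbf{h} + \mathbf{n}$ can be rewritten as $H_{\mathrm{eff}}\mathbf{a} + \mathbf{n}$, and each symbol is then recovered with the scalar rule above.

import numpy as np

def O4(x1, x2, x3, x4):
    return np.array([[ x1,  x2,  x3,  x4],
                     [-x2,  x1, -x4,  x3],
                     [-x3,  x4,  x1, -x2],
                     [-x4, -x3,  x2,  x1]])

rng = np.random.default_rng(1)
a = rng.choice([-1.0, 1.0], size=4)            # BPSK message symbols
h = rng.normal(size=4)                         # real channel gains, assumed known

# O4(a) is linear in a, so r = O4(a) h = H_eff a; build H_eff column by column.
H_eff = np.column_stack([O4(*np.eye(4)[k]) @ h for k in range(4)])
print(np.allclose(H_eff.T @ H_eff, np.sum(h ** 2) * np.eye(4)))   # orthogonality: True

r = O4(*a) @ h + 0.1 * rng.normal(size=4)      # received block over four symbol times
v = H_eff.T @ r                                # the statistic v (here m = 1)
K = np.sum(h ** 2)
a_hat = np.array([min((-1.0, 1.0), key=lambda s: K * s**2 - 2 * s * vi) for vi in v])
print(np.all(a_hat == a))                      # True (except at very low SNR)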

Thus, using orthogonal designs allows the use of scalar detectors instead of vector detectors.

Let us now examine the diversity order of these space-time block codes.

Theorem 19.4 [434] The diversity order for coding with real orthogonal designs is $nm$.

Proof. The matrix $B(\mathbf{c}, \mathbf{e})$ of Theorem 19.2 is formed by $B(\mathbf{c}, \mathbf{e}) = \mathcal{O}(\mathbf{c}) - \mathcal{O}(\mathbf{e}) = \mathcal{O}(\mathbf{c} - \mathbf{e})$. Since $\det(\mathcal{O}^T(\mathbf{c} - \mathbf{e})\, \mathcal{O}(\mathbf{c} - \mathbf{e})) = \left[\sum_i |c_i - e_i|^2\right]^n$ is not equal to 0 for any $\mathbf{c}$ and $\mathbf{e} \ne \mathbf{c}$, $\mathcal{O}(\mathbf{c} - \mathbf{e})$ must be full rank. By the rank criterion, then, the maximum diversity of $nm$ obtains. ◽

This encoding and decoding scheme provides a way of sending $n$ message symbols over $n$ symbol times, for a rate $R = 1$ coding scheme. However, as mentioned above, it applies only to $n = 2$, 4, or 8.

19.4.3.3 Generalized Real Orthogonal Designs

In the interest of obtaining more possible designs, we now turn to a generalized real orthogonal design $\mathcal{G}$.

Definition 19.5 [434] A generalized orthogonal design of size $n$ is a $p \times n$ matrix $\mathcal{G}$ with entries $0, \pm x_1, \pm x_2, \ldots, \pm x_k$ such that $\mathcal{G}^T \mathcal{G} = KI$, where $K$ is some constant. The rate of $\mathcal{G}$ is $R = k/p$. ◽

Example 19.6 The following are examples of generalized orthogonal designs.
$$\mathcal{G}_3 = \begin{bmatrix} x_1 & x_2 & x_3 \\ -x_2 & x_1 & -x_4 \\ -x_3 & x_4 & x_1 \\ -x_4 & -x_3 & x_2 \end{bmatrix}
\qquad
\mathcal{G}_5 = \begin{bmatrix}
x_1 & x_2 & x_3 & x_4 & x_5 \\
-x_2 & x_1 & x_4 & -x_3 & x_6 \\
-x_3 & -x_4 & x_1 & x_2 & x_7 \\
-x_4 & x_3 & -x_2 & x_1 & x_8 \\
-x_5 & -x_6 & -x_7 & -x_8 & x_1 \\
-x_6 & x_5 & -x_8 & x_7 & -x_2 \\
-x_7 & x_8 & x_5 & -x_6 & -x_3 \\
-x_8 & -x_7 & x_6 & x_5 & -x_4
\end{bmatrix}$$

$$\mathcal{G}_6 = \begin{bmatrix}
x_1 & x_2 & x_3 & x_4 & x_5 & x_6 \\
-x_2 & x_1 & x_4 & -x_3 & x_6 & -x_5 \\
-x_3 & -x_4 & x_1 & x_2 & x_7 & x_8 \\
-x_4 & x_3 & -x_2 & x_1 & x_8 & -x_7 \\
-x_5 & -x_6 & -x_7 & -x_8 & x_1 & x_2 \\
-x_6 & x_5 & -x_8 & x_7 & -x_2 & x_1 \\
-x_7 & x_8 & x_5 & -x_6 & -x_3 & x_4 \\
-x_8 & -x_7 & x_6 & x_5 & -x_4 & -x_3
\end{bmatrix}
\qquad
\mathcal{G}_7 = \begin{bmatrix}
x_1 & x_2 & x_3 & x_4 & x_5 & x_6 & x_7 \\
-x_2 & x_1 & x_4 & -x_3 & x_6 & -x_5 & -x_8 \\
-x_3 & -x_4 & x_1 & x_2 & x_7 & x_8 & -x_5 \\
-x_4 & x_3 & -x_2 & x_1 & x_8 & -x_7 & x_6 \\
-x_5 & -x_6 & -x_7 & -x_8 & x_1 & x_2 & x_3 \\
-x_6 & x_5 & -x_8 & x_7 & -x_2 & x_1 & -x_4 \\
-x_7 & x_8 & x_5 & -x_6 & -x_3 & x_4 & x_1 \\
-x_8 & -x_7 & x_6 & x_5 & -x_4 & -x_3 & x_2
\end{bmatrix}.$$
◽

The encoding for a generalized orthogonal design is as follows. A set of $k$ symbols $a_1, a_2, \ldots, a_k$ arrives at the encoder. The encoder builds the design matrix $\mathcal{G}$ by setting $x_i = a_i$. Then for $t = 1, 2, \ldots, p$, the $n$ antennas simultaneously transmit the $n$ symbols from the $t$th row of $\mathcal{G}$. In $p$ symbol times, then, $k$ message symbols are sent, resulting in a rate $R = k/p$ code. In the interest of maximizing rate, minimizing encoder and decoder latency, and minimizing complexity, it is of interest to determine designs having the smallest $p$ possible. A design having rate at least $R$ with the smallest possible value of $p$ is said to be delay-optimal. (The designs of Example 19.6 yield delay-optimal, rate-1 codes.) A technique for constructing generalized orthogonal designs is provided in [434].

19.4.4 Complex Orthogonal Designs

We now extend the concepts of real orthogonal designs to complex orthogonal designs.

Definition 19.7 A complex orthogonal design of size $n$ is a matrix $\mathcal{O}$ formed from the elements $\pm x_1, \pm x_2, \ldots, \pm x_n$, their conjugates $\pm x_1^*, \pm x_2^*, \ldots, \pm x_n^*$, or multiples of these by $j = \sqrt{-1}$, such that $\mathcal{O}^H \mathcal{O} = (|x_1|^2 + \cdots + |x_n|^2)I$. That is, $\mathcal{O}$ is proportional to a unitary matrix. ◽

Without loss of generality, the first row can be formed of the elements $x_1, x_2, \ldots, x_n$. The same method of encoding is used for complex orthogonal designs as for real orthogonal designs: the $n$ antennas send the rows for each of $n$ time intervals.

Example 19.8

The matrix
$$\mathcal{O}_2 = \begin{bmatrix} x_1 & x_2 \\ -x_2^* & x_1^* \end{bmatrix} \qquad (\text{space} \rightarrow \text{across columns, time} \downarrow \text{down rows})$$

is a complex orthogonal design. As suggested by the arrows, if the columns are distributed in space (across two antennas) and the rows are distributed in time (over different symbol times), this can be used as a space-time code. In fact, this is the Alamouti code. ◽

Unfortunately, as discussed in Exercise 19.6, complex orthogonal designs of size $n$ exist only for $n = 2$ or $n = 4$. The Alamouti code is thus, in some sense, almost unique. We turn therefore to generalized complex orthogonal designs.

Definition 19.9 [434] A generalized complex orthogonal design is a $p \times n$ matrix $\mathcal{G}$ whose entries are formed from $0, \pm x_1, \pm x_1^*, \ldots, \pm x_k, \pm x_k^*$ or their product with $j = \sqrt{-1}$, such that $\mathcal{G}^H \mathcal{G} = (|x_1|^2 + \cdots + |x_k|^2)I$. ◽

A generalized complex orthogonal design can be used to create a rate $R = k/p$ space-time code using $n$ antennas, just as for orthogonal designs above.

Example 19.10

The following are examples of generalized complex orthogonal designs.
$$\mathcal{G}_3 = \begin{bmatrix}
x_1 & x_2 & x_3 \\
-x_2 & x_1 & -x_4 \\
-x_3 & x_4 & x_1 \\
-x_4 & -x_3 & x_2 \\
x_1^* & x_2^* & x_3^* \\
-x_2^* & x_1^* & -x_4^* \\
-x_3^* & x_4^* & x_1^* \\
-x_4^* & -x_3^* & x_2^*
\end{bmatrix}
\qquad
\mathcal{G}_4 = \begin{bmatrix}
x_1 & x_2 & x_3 & x_4 \\
-x_2 & x_1 & -x_4 & x_3 \\
-x_3 & x_4 & x_1 & -x_2 \\
-x_4 & -x_3 & x_2 & x_1 \\
x_1^* & x_2^* & x_3^* & x_4^* \\
-x_2^* & x_1^* & -x_4^* & x_3^* \\
-x_3^* & x_4^* & x_1^* & -x_2^* \\
-x_4^* & -x_3^* & x_2^* & x_1^*
\end{bmatrix}$$
These provide $R = 1/2$ coding using 3 and 4 antennas, respectively.


A rate $R = 3/4$ code using an orthogonal design, provided by [445], is
$$\begin{bmatrix}
x_1 & -x_2^* & -x_3^* & 0 \\
x_2 & x_1^* & 0 & x_3^* \\
x_3 & 0 & x_1^* & -x_2^* \\
0 & -x_3 & x_2 & x_1
\end{bmatrix}.$$
◽
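A quick numerical check (a Python sketch with arbitrarily chosen complex symbols) confirms that this design is proportional to a unitary matrix, i.e., $\mathcal{O}^H\mathcal{O} = (|x_1|^2 + |x_2|^2 + |x_3|^2)I$:

import numpy as np

x1, x2, x3 = 1 + 2j, -0.5 + 1j, 0.3 - 0.7j     # arbitrary complex symbols
c = np.conj
O = np.array([[x1,    -c(x2), -c(x3),  0    ],
              [x2,     c(x1),  0,      c(x3)],
              [x3,     0,      c(x1), -c(x2)],
              [0,     -x3,     x2,     x1   ]])
K = abs(x1) ** 2 + abs(x2) ** 2 + abs(x3) ** 2
print(np.allclose(O.conj().T @ O, K * np.eye(4)))   # True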

Higher rate codes are also known using linear processing orthogonal designs, which form matrices not from individual elements, but from linear combinations of the elements. These produce what are known as generalized complex linear processing orthogonal designs.

Example 19.11 Examples of linear processing designs producing $R = 3/4$ codes are
$$\begin{bmatrix}
x_1 & x_2 & x_3/\sqrt{2} \\
-x_2^* & x_1^* & x_3/\sqrt{2} \\
x_3^*/\sqrt{2} & x_3^*/\sqrt{2} & (-x_1 - x_1^* + x_2 - x_2^*)/2 \\
x_3^*/\sqrt{2} & -x_3^*/\sqrt{2} & (x_2 + x_2^* + x_1 - x_1^*)/2
\end{bmatrix}$$
and
$$\begin{bmatrix}
x_1 & x_2 & x_3/\sqrt{2} & x_3/\sqrt{2} \\
-x_2^* & x_1^* & x_3/\sqrt{2} & -x_3/\sqrt{2} \\
x_3^*/\sqrt{2} & x_3^*/\sqrt{2} & (-x_1 - x_1^* + x_2 - x_2^*)/2 & (-x_2 - x_2^* + x_1 - x_1^*)/2 \\
x_3^*/\sqrt{2} & -x_3^*/\sqrt{2} & (x_2 + x_2^* + x_1 - x_1^*)/2 & -(x_1 + x_1^* + x_2 - x_2^*)/2
\end{bmatrix}$$
for $n = 3$ and $n = 4$ antennas, respectively. ◽

19.4.4.1 Future Work

Several specific examples of designs leading to low-complexity decoders have been presented in the examples above. However, the design of high-rate codes remains a topic of ongoing research.

19.5 Space-Time Trellis Codes

Trellis codes can also be used to provide diversity. We present here several examples from the literature.

Example 19.12 Consider the coding scheme presented in Figure 19.7. This is an example of delay diversity, which is a hybrid of spatial diversity and time diversity. The signal transmitted from the two antennas at time $t$ consists of the current symbol $a_t$ and the previous symbol $a_{t-1}$. The received signal is $r_t = h_0 a_t + h_1 a_{t-1} + n_t$. If the channel parameters $\{h_0, h_1\}$ are known, then an equalizer can be used to detect the transmitted sequence. Because the encoder has a memory element in it, this can also be regarded as a simple trellis code, so that decoding can be accomplished using the Viterbi algorithm. If this encoder is used in conjunction with an 8-PSK

[Figure: delay diversity block diagram — the current symbol $a_t$ is sent on transmit antenna 1 and its delayed copy $a_{t-1}$ on transmit antenna 2; the channel outputs are summed with noise $n(t)$ to give $r(t)$.]

Figure 19.7: A delay diversity scheme.


[Figure: 8-PSK constellation with points labeled 0–7, and the eight-state trellis; the outputs listed beside state $s$ are $s0, s1, \ldots, s7$.]

Figure 19.8: 8-PSK constellation and the trellis for a delay-diversity encoder.

signal constellation, then there are 8 states. The trellis and the output sequences are shown in Figure 19.8. The outputs listed beside the states are the pairs $(a_{k-1}, a_k)$ for each input. (A similar four-state trellis is obtained for 4-PSK using delay diversity.) For a sequence of symbols $a_1, a_2, \ldots, a_n$, a delay diversity coder may also be written as a space-time block code, with codeword
$$A(a_1, a_2, \ldots, a_n) = \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_{n-1} & a_n & 0 \\ 0 & a_1 & a_2 & a_3 & \cdots & a_{n-1} & a_n \end{bmatrix}.$$
The two zeros in the first and last columns ensure that the trellis begins and ends in the 0 state. By viewing this as a space-time block code, the rank criterion may be employed. The matrix $B(\mathbf{a}, \mathbf{e}) = A(\mathbf{a}) - A(\mathbf{e})$ is, by the linearity of the coding mechanism, equal to $A(\mathbf{a}')$, where $\mathbf{a}' = \mathbf{a} - \mathbf{e}$. The columns containing the first and last nonzero elements of $\mathbf{a}'$ are linearly independent, so $B$ has full rank. This code provides a diversity of 2 at 2 bits/second/Hz. ◽
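This rank argument is easy to check numerically; the following Python sketch (with arbitrarily chosen BPSK sequences) forms $A(\mathbf{a} - \mathbf{e})$ for one pair of sequences and confirms that it has rank 2.

import numpy as np

def delay_diversity(a):
    # Codeword matrix A(a): row 1 is antenna 1, row 2 is the delayed copy.
    a = np.asarray(a, dtype=float)
    return np.vstack([np.concatenate([a, [0.0]]),
                      np.concatenate([[0.0], a])])

a = np.array([ 1, -1,  1,  1], dtype=float)    # transmitted BPSK sequence
e = np.array([ 1,  1, -1,  1], dtype=float)    # competing sequence
B = delay_diversity(a - e)                     # B(a, e) = A(a) - A(e) = A(a - e)
print(np.linalg.matrix_rank(B))                # 2: full rank, diversity 2 per receive antenna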

Example 19.13 In the trellis of Figure 19.8, replace the output mappings with the sequences

State 0: 00, 01, 02, 03, 04, 05, 06, 07
State 1: 50, 51, 52, 53, 54, 55, 56, 57
State 2: 20, 21, 22, 23, 24, 25, 26, 27
State 3: 70, 71, 72, 73, 74, 75, 76, 77
State 4: 40, 41, 42, 43, 44, 45, 46, 47
State 5: 10, 11, 12, 13, 14, 15, 16, 17
State 6: 60, 61, 62, 63, 64, 65, 66, 67
State 7: 30, 31, 32, 33, 34, 35, 36, 37

This corresponds to a delay diversity code, with the additional modification that the delayed symbol is multiplied by $-1$ if it is odd (i.e., in $\{1, 3, 5, 7\}$). It has been observed [322] that this simple modification provides 2.5 dB of gain compared to simple delay diversity. ◽

Example 19.14 Figure 19.9 shows space-time codes for 4-PSK (transmitting 2 bits/second/Hz) using 8, 16, and 32 states. Each of these codes provides a diversity of 2. Figure 19.11(a) shows the probability of error performance (obtained via simulation) for these codes when used with two transmit antennas and two receive antennas. Figure 19.11(b) shows the performance with two transmit antennas and one receive antenna. ◽

Example 19.15 Figure 19.10 shows space-time codes for 8-PSK (transmitting 3 bits/second/Hz) using 16 and 32 states (with the code in Example 19.13 being an 8-state code). Each of these codes provides a diversity of 2. Figure 19.12(a) shows the probability of error performance (obtained via simulation) for these codes when used with two transmit antennas and two receive antennas. Figure 19.12(b) shows the performance with two transmit antennas and one receive antenna. ◽


Figure 19.9: Space-time codes with diversity 2 for 4-PSK having 8, 16, and 32 states [435].

19.5.1 Concatenation

Concatenation is also frequently employed in space-time coded systems. In this case, the outer code is often a TCM system whose symbols are transmitted via an inner space-time coded system.

19.6 How Many Antennas?

We cannot continue to add antennas without reaching a point of diminishing returns. One argument for the number of antennas is based on channel capacity. It has been proved [139, 436] that the capacity of a multiantenna system with a single receive antenna is a random variable of the form $\log_2(1 + (\chi^2_{2n}/2n)\,\mathrm{SNR})$, where $\chi^2_{2n}$ is a $\chi^2$ random variable with $2n$ degrees of freedom (e.g., formed by summing the squares of $2n$ independent zero-mean, unit-variance Gaussian random variables).

Figure 19.10: Space-time codes with diversity 2 for 8-PSK having 16 and 32 states [435].

By the law of large numbers, as $n \to \infty$,
$$\frac{1}{2n}\chi^2_{2n} \to \frac{1}{2}\left(E[X^2] + E[X^2]\right) = 1, \qquad \text{where } X \sim \mathcal{N}(0, 1).$$
In practice, the limit begins to be apparent for $n \ge 4$, suggesting that more than four transmit antennas will provide little additional improvement over four antennas. It can also be argued that with two receive antennas, $n = 6$ transmit antennas provides almost all the benefit possible. However, there are systems for which it is of interest to employ both interference suppression and diversity. In these cases, having additional antennas is of benefit. A "conservation theorem" [26, p. 546] says that
diversity order + number of interferers = number of receive antennas.
As the number of interferers to be suppressed increases, having additional antennas is of value.
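A small Monte Carlo sketch (in Python, with an arbitrary SNR and sample count) makes the saturation visible: the average of $\log_2(1 + (\chi^2_{2n}/2n)\,\mathrm{SNR})$ changes very little beyond $n \approx 4$.

import numpy as np

rng = np.random.default_rng(2)
snr = 10.0                                     # linear SNR (10 dB), arbitrary
trials = 100_000
for n in (1, 2, 4, 6, 8):
    chi2 = rng.chisquare(2 * n, size=trials)
    C = np.log2(1 + (chi2 / (2 * n)) * snr)
    print(f"n = {n}: average capacity = {C.mean():.3f} bits/channel use")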

Figure 19.11: Performance of codes with 4-PSK that achieve diversity 2 [435]. (a) Two receive and two transmit antennas; (b) one receive and two transmit antennas.


Figure 19.12: Performance of codes with 8-PSK that achieve diversity 2 [435]. (a) Two receive and two transmit antennas; (b) one receive and two transmit antennas.

19.7 Estimating Channel Information

All of the decoders described in this chapter assume that the channel parameters, in the form of the $h_{j,i}$, are known at the receiver. Estimating these is viewed for our purposes as signal processing beyond the scope of this book, so we say only a few words regarding the problem. It is possible to send a "pilot" signal which is known by the receiver, and from this pilot to estimate the channel parameters if the channel is sufficiently static. However, a pilot signal consumes bandwidth and transmitter power and reduces the overall throughput. Another approach is to use differential space-time codes [200, 205], where information is coded in the change of symbols, but at the expense of a 3 dB penalty. Yet another approach is to blindly estimate the channel parameters, without using the transmitted symbols. This is developed in [426].


19.8 Exercises

19.1 Consider a transmission scheme in which $n$ transmit antennas are used. The vector $\mathbf{a} = [a_1, a_2, \ldots, a_n]^T$ is transmitted to a single receiver through a channel with coefficients $h_i^*$, so that $r = \mathbf{h}^H \mathbf{a} + n$, where $\mathbf{h} = [h_1, h_2, \ldots, h_n]^T$ and where $n$ is zero-mean AWGN with variance $\sigma^2$. Diversity can be obtained by sending the same symbol $a$ from each antenna, so that $\mathbf{a} = a\mathbf{w}$, for some "steering vector" $\mathbf{w}$. The received signal is thus $r = \mathbf{h}^H \mathbf{w}\, a + n$. Show that the weight vector $\mathbf{w}$ of length $\|\mathbf{w}\| = 1$ which maximizes the SNR at the receiver is $\mathbf{w} = \mathbf{h}/\|\mathbf{h}\|$, and determine the maximum SNR.

19.2 Show that if $G = X + jY$, where $X \sim \mathcal{N}(0, \sigma_f^2)$ and $Y \sim \mathcal{N}(0, \sigma_f^2)$, then $Z = |G| = \sqrt{X^2 + Y^2}$ has density
$$f_Z(z) = \begin{cases} \dfrac{z}{\sigma_f^2}\, e^{-z^2/2\sigma_f^2} & z \ge 0 \\ 0 & \text{otherwise.} \end{cases}$$

19.3 Show that if $G = X + jY$, where $X \sim \mathcal{N}(0, \sigma_f^2)$ and $Y \sim \mathcal{N}(0, \sigma_f^2)$, then $A = |G|^2 = X^2 + Y^2$ has density
$$f_A(a) = \begin{cases} \dfrac{1}{2\sigma_f^2}\, e^{-a/2\sigma_f^2} & a \ge 0 \\ 0 & \text{otherwise.} \end{cases}$$
$A$ is said to be a chi-squared ($\chi^2$) random variable with two degrees of freedom.

19.4 Show that (19.5) is correct.

19.5 Using the $4 \times 4$ orthogonal design $\mathcal{O}_4$, show that the matrix $H_{j,\mathrm{eff}}$ of (19.17) is proportional to an orthogonal matrix.

19.6 Let $\mathcal{O}$ be a complex orthogonal design of size $n$. Show that by replacing each complex variable $x_i = x_i^1 + jx_i^2$ in the matrix with the $2 \times 2$ matrix
$$\begin{bmatrix} x_i^1 & x_i^2 \\ -x_i^2 & x_i^1 \end{bmatrix},$$
a $2n \times 2n$ real matrix is formed that is a real orthogonal design of size $2n$. Conclude that complex orthogonal designs of size $n$ exist only if $n = 2$ or $n = 4$.

19.7 Show that a modified delay diversity scheme which uses codewords formed by
$$A = \begin{bmatrix} a_1 & a_2 & \cdots & a_{n-1} & a_n \\ a_n & a_1 & a_2 & \cdots & a_{n-1} \end{bmatrix},$$

which is a tail-biting code, does not satisfy the rank criterion and hence does not achieve full diversity.

19.8 [26] For the set of space-time codes
$$A_1 = \begin{bmatrix} x_1 & x_2^* \\ x_2 & x_1^* \end{bmatrix} \quad
A_2 = \begin{bmatrix} x_1 & -x_2 \\ x_2 & x_1 \end{bmatrix} \quad
A_3 = \begin{bmatrix} x_1^* & -x_2 \\ x_2^* & x_1 \end{bmatrix} \quad
A_4 = \begin{bmatrix} x_1 & x_2 \\ x_2 & x_1 \end{bmatrix}$$
(a) Find the diversity order of each code, assuming that the transmitted symbols are selected independently and uniformly from a 4-QAM alphabet and that the receiver has a single antenna.
(b) Determine which of these codes allows for scalar detection at the receiver.
(c) For those codes having full diversity, determine the coding gain.


19.9 References

Propagation modeling, leading to the Rayleigh channel model, is described in [216]; see also [422], [26], and [352]. The Jakes model which produced Figure 19.2 is described in [216]. Our discussion of MIMO channels, and the discussion of diversity following from it, was drawn from [26]. Additional coding-related discussions relating to multiple-receiver diversity appear in [99, 402, 425, 479, 487]. The Alamouti scheme is described in [8]. The generalization to orthogonal designs is described in [434]. The rank criterion is presented following [435]. This paper also presents a thorough discussion of space-time trellis codes and hybrid codes capable of dealing with either slow or fast fading channels. The paper [435] presents many designs of space-time trellis codes, as does [322]. Combined interference suppression and space-time coding is also discussed in the latter.


References [1] “3GPP TS 25.222 v15.0.0 3rd Generation Partnership Project; Technical Specification Group Radio Access Network; Multiplexing and channel coding (TDD),” Jun. 2018. [2] “3GPP TS 36.212 v 15.6.0 3rd Generation Partnership Project; Technical Specification Group Radio Access Network; Evolved Univeral Terrestrial Radio Access (e-utra); Multiplexing and Channel Coding,” https://portal.3gpp.org/desktopmodules/Specifications/ SpecificationDetails.aspx?specificationId=2426, Jun. 2019. [3] A. Abbasfar, D. Divsalar, and K. Yao, “Accumulate Repeat Accumulate Codes,” in Proceedings of the 2004 IEEE GlobeCom Conference, Dallas, TX, Nov. 2004. [4] ——, “Accumulate Repeat Accumulate Codes,” IEEE Trans. Commun., vol. 55, no. 4, pp. 692–702, Apr. 2007. [5] D. Agarwal and A. Vardy, “The Turbo Decoding Algorithm and its Phase Trajectories,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 699–722, Feb. 2001. [6] S. M. Aji and R. J. McEliece, “The Generalized Distributive Law,” IEEE Trans. Inf. Theory, vol. 46, no. 2, pp. 325–343, Mar. 2000. [7] A. Alamdar-Yazdi and F. R. Kschischang, “A Simplified Successive-Cancellation Decoder for Polar Codes,” IEEE Commun. Lett., vol. 15, no. 12, pp. 1378–1380, Dec. 2011. [8] S. Alamouti, “A Simple Transmit Diversity Technique for Wireless Communications,” IEEE J. Selected Areas Commun., vol. 16, no. 8, pp. 1451–1458, Oct. 1998. [9] M. Alsan and E. Telatar, “A Simple Proof of Polarization and Polariztion for Non-Stationary Memoryless Channels,” IEEE Trans. Inf. Theory, vol. 62, no. 9, pp. 4873–4878, Sep. 2016. [10] B. Ammar, B. Honary, Y. Kou, J. Xu, and S. Lin, “Construction of Low-Density Parity-Check Codes Based on Incomplete Designs,” IEEE Trans. Inf. Theory, vol. 50, no. 6, pp. 1257–1268, Jun. 2004. [11] J. B. Anderson and A. Svensson, Coded Modulation Systems. New York: Kluwer Academic/Plenum, 2003. [12] K. Andrews, C. Heegard, and D. Kozen, “A Theory of Interleavers,” Cornell University Department of Computer Science, TR97-1634, Jun. 1997. [13] ——, “Interleaver Design Methods for Turbo Codes,” in Proceedings of the 1998 International Symposium on Information Theory, Cambridge, MA, 1998, p. 420. [14] S. Ar, R. Lipton, R. Rubinfeld, and M. Sudan, “Reconstructing Algebraic Functions from Mixed Data,” in Proceedings of the 33rd Annual IEEE Symposium on Foundations of Computer Science, Pittsburgh, PA, 1992, pp. 503–512. [15] M. Ardakani, “Efficient Analysis, Design, and Decoding of Low-Density Parity-Check Codes,” Ph.D. dissertation, University of Toronto, Electrical and Computer Engineering Department, 2004. [16] E. Ar𝚤kan, “Channel Polarization: A Method for Constructing Capacity-Achieving Codes for Symmetric Binary-Input Memoryless Channels,” IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, Jul. 2009. [17] ——, “Systematic Polar Encoding,” IEEE Commun. Lett., vol. 15, no. 8, pp. 860–862, Aug. 2011. [18] E. Ar𝚤kan and E. Telatar, “On the Rate of Channel Polarization,” in 2009 IEEE International Symposium on Information Theory, Seoul, Jun 2009, pp. 1493–1495. [19] D. Augot and L. Pecquet, “A Hensel Lifting to Replace Factorization in List-Decoding of Algebraic-Geometric and Reed–Solomon Codes,” IEEE Trans. Inf. Theory, vol. 46, pp. 2605–2614, Nov. 2000. [20] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate,” IEEE Trans. Inf. Theory, vol. 20, pp. 284–287, Mar. 1974. [21] L. R. Bahl, C. D. Cullum, W. D. Frazer, and F. Jelinek, “An Efficient Algorithm for Computing Free Distance,” IEEE Trans. 
Inf. Theory, vol. 12, no. 3, pp. 437–439, May 1972. [22] A. Balatsoukas-Stimming, M. B. Parizi, and A. Burg, “LLR-Based Successive Cancellation List Decoding of Polar Codes,” IEEE Trans. Signal Process., vol. 63, no. 19, pp. 5165–5179, Oct. 2015. [23] A. Barbulescu and S. Pietrobon, “Interleaver Design for Turbo Codes,” Electron. Lett., vol. 30, no. 25, p. 2107, 1994. [24] ——, “Terminating the Trellis of Turbo-Codes in the Same State,” Electron. Lett., vol. 31, no. 1, pp. 22–23, 1995. [25] J. R. Barry, “The BCJR Algorithm for Optimal Equalization,” http://users.ece.gatech.edu/~barry/6603/handouts/Notes[SS1] BCJR Algorithm.

on the

[26] J. R. Barry, E. A. Lee, and D. G. Messerschmitt, Digital Communication, 3rd ed. Boston: Kluwer Academic, 2004. [27] G. Battail and J. Fang, “D’ecodage pondéré optimal des codes linéaires en blocs,” Annales des Télécommunications, vol. 41, pp. 580–604, Nov. 1986. [28] R. E. Bellman and S. E. Dreyfus, Applied Dynamic Programming. Princeton, NJ: Princeton University Press, 1962. [29] S. Benedetto, R. Garello, M. Mondin, and M. Trott, “Rotational Invariance of Trellis Codes Part II: Group Codes and Decoders,” IEEE Trans. Inf. Theory, vol. 42, pp. 766–778, May 1996. [30] S. Benedetto and G. Montorsi, “Average Performance of Parallel Concatenated Block Codes,” Electron. Lett., vol. 31, no. 3, pp. 156–158, 1995. [31] ——, “Design of Parallel Concatenated Convolutional Codes,” IEEE Trans. Commun., vol. 44, no. 5, pp. 591–600, 1996. [32] ——, “Unveiling Turbo Codes: Some Results on Parallel Concatenated Coding Schemes,” IEEE Trans. Inf. Theory, vol. 42, no. 2, pp. 409–428, Mar. 1996. [33] E. Berlekamp, Algebraic Coding Theory. New York: McGraw-Hill, 1968.


References [34] E. Berlekamp, “Bounded Distance +1 Soft-Decision Reed-Solomon Decoding,” IEEE Trans. Inf. Theory, vol. 42, no. 3, pp. 704–720, May 1996. [35] E. R. Berlekamp, R. J. McEliece, and H. C. van Tilborg, “On the Inherent Intractability of Certain Coding Problems,” IEEE Trans. Inf. Theory, vol. 24, no. 3, pp. 384–386, May 1978. [36] E. Berlekamp, H. Rumsey, and G. Solomon, “On the Solution of Algebraic Equations over Finite Fields,” Inf. Control, vol. 10, pp. 553–564, 1967. [37] C. Berrou, Codes, Graphs, and Systems. Boston: Kluwer Academic, 2002, ch. The Mutations of Convolutional Coding (Around the Trellis), p. 4. [38] C. Berrou and A. Gaviuex, “Near Optimum Error Correcting Coding and Decoding: Turbo Codes,” IEEE Trans. Commun., vol. 44, no. 10, pp. 1261–1271, Oct. 1996. [39] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon Limit Error-Correcting Coding and Decoding: Turbo Codes,” in Proceedings of the 1993 IEEE International Conference on Communications, Geneva, Switzerland, 1993, pp. 1064–1070. [40] E. Biglieri, D. Divsalar, P. J. McLane, and M. K. Simon, Introduction to Trellis-Coded Modulation, with Applications. New York: Macmillan, 1991. [41] P. Billingsley, Probability and Measure. New York: Wiley, 1986. [42] G. Birkhoff and S. MacLane, A Survey of Modern Algebra, 3rd ed. New York: Macmillan, 1965. [43] U. Black, The V Series Recommendations, Protocols for Data Communications. New York: McGraw-Hill, 1991. [44] R. E. Blahut, Fast Algorithms for Digital Signal Processing. Reading, MA: Addison-Wesley, 1985. [45] ——, Theory and Practice of Error Control Codes. Reading, MA: Addison-Wesley, 1983. [46] ——, Algebraic Codes on Lines, Planes, and Curves. London: Cambridge University Press, 2008. [47] I. Blake and K. Kith, “On the Complete Weight Enumerator of Reed-Solomon Codes,” SIAM J. Disc. Math, vol. 4, no. 2, pp. 164–171, May 1991. [48] R. Bose and D. Ray-Chaudhuri, “On a Class of Error-Correcting Binary Codes,” Inf. Control, vol. 3, pp. 68–79, 1960. [49] S. Boyd and L. VandenBerghe, Convex Optimization. London: Cambridge University Press, 2004. [50] A. E. Brouwer, “Bounds on the Minimum Distance of q-ary Linear Codes,” http://www.win.tue.nl/~aeb/voorlincod.html, 2004. [51] J. Bucklew and R. Radecke, “On the Monte Carlo Simulation of Digital Communication Systems in Gaussian Noise,” IEEE Trans. Commun., vol. 51, no. 2, pp. 267–274, Feb. 2003. [52] H. O. Burton, “Inversionless Decoding of Binary BCH Codes,” IEEE Trans. Inf. Theory, vol. 17, no. 4, pp. 464–466, Jul. 1971. [53] H. Burton and E. Weldon, “Cylic Product Codes,” IEEE Trans. Inf. Theory, vol. 11, pp. 433–440, Jul. 1965. [54] J. B. Cain, G. C. Clark, and J. M. Geist, “Punctured Convolutional Codes of Rate (n − 1)∕n and Simplified Maximum Likelihood Decoding,” IEEE Trans. Inf. Theory, vol. 25, no. 1, pp. 97–100, Jan. 1979. [55] A. R. Calderbank and N. J. A. Sloane, “An Eight-Dimensional Trellis Code,” IEEE Trans. Info. Theory, vol. 74, no. 5, pp. 757–759, May 1986. [56] ——, “New Trellis Codes Based on Lattices and Cosets,” IEEE Trans. Inf. Theory, vol. 33, no. 2, pp. 177–195, Mar. 1987. [57] M. Cedervall and R. Johannesson, “A Fast Algorithm for Computing Distance Spectrum of Convolutional Codes,” IEEE Trans. Inf. Theory, vol. 35, pp. 1146–1159, 1989. [58] W. Chambers, “Solution of Welch–Berlekamp Key Equation by Euclidean Algorithm,” Electron. Lett., vol. 29, no. 11, p. 1031, 1993. [59] W. Chambers, R. Peile, K. Tsie, and N. 
Zein, “Algorithm for Solving the Welch-Berlekamp Key-Equation with a Simplified Proof,” Electron. Lett., vol. 29, no. 18, pp. 1620–1621, 1993. [60] D. Chase, “A Class of Algorithms for Decoding Block Codes With Channel Measurement Information,” IEEE Trans. Inf. Theory, vol. 18, no. 1, pp. 168–182, Jan. 1972. [61] O. Chauhan, T. Moon, and J. Gunther, “Accelerating the Convergence of Message Passing on Loopy Graphs Using Eigenmessages,” in Proceedings of the 37th Annual Asilomar Conference on Signals, Systems, and Computers, Monterey, CA, 2003, pp. 79–83. [62] J. Chen, A. Dholakia, E. Eleftherious, M. Fossorier, and X.-Y. Hu, “Reduced Complexity Decoding of LDPC codes,” IEEE Trans. Commun., vol. 53, no. 8, pp. 1288–1299, Aug. 2005. [63] L. Chen, L. Lan, I. Djurdjevic, S. Lin, and K. Abdel-Ghaffar, “An algebraic method for constructing quasi-cyclic ldpc codes,” in Proceedings of the International Symposium on Information Theory and its Applications (ISITA), Parma, Oct. 2004, pp. 535–539. [64] ——, “Near-Shannon Limit Quasi-Cyclic Low Density Parity Check Codes,” IEEE Trans. Commun., vol. 52, no. 7, pp. 1038–1042, Jul. 2004. [65] K. Chen, K. Niu, and J. Lin, “Improved Successive Cancellation Decoding of Polar Codes,” IEEE Trans. Commun., vol. 61, no. 8, pp. 3100–3107, Aug. 2013. [66] L. Chen, J. Xu, I. Djurdjevic, and S. Lin, “Near-Shannon Limit Quasi-Cyclic Low-Density Parity-Check Codes,” IEEE Trans. Commun., vol. 52, no. 7, pp. 1038–1042, Jul. 2004. [67] N. Chen and Z. Yan, “Reduced-Complexity Reed–Solomon Decoders Based on Cyclotomic FFTs,” IEEE Signal Process. Lett., vol. 16, no. 4, p. 279, 2009. [68] G. T. Chen, Z. Zhang, C. Zhong, and L. Zhang, “A Low Complexity Encoding Algorithm for Systematic Polar Codes,” IEEE Commun. Lett., vol. 20, no. 7, pp. 1277–1280, Jul. 2016. [69] P. Chevillat and J. D.J. Costello, “A Multiple Stack Algorithm for Erasurefree Decoding of Convolutional Codes,” IEEE Trans. Commun., vol. 25, pp. 1460–1470, Dec. 1977. [70] R. Chien, “Cyclic Decoding Procedures for BoseChaudhuri–Hocquenghem Codes,” IEEE Trans. Inf. Theory, vol. 10, pp. 357–363, 1964.

References [71] K. L. Chung, A Course in Probability Theory. New York, NY: Academic Press, 1974. [72] S.-Y. Chung, G. D. Forney, Jr., T. J. Richardson, and R. Urbanke, “On the Design of Low-Density Parity-Check Codes within 0.0045 dB of the Shannon Limit,” IEEE Commun. Lett., vol. 5, no. 2, pp. 58–60, Feb. 2001. [73] S.-Y. Chung, T. J. Richardson, and R. L.Urbanke, “Analysis of Sum-Product Decoding of Low-Density Parity-Check Codes Using Gaussian Approximation,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 657–670, Feb. 2001. [74] T. K. Citron, “Algorithms and Architectures for Error Correcting Codes,” Ph.D. dissertation, Stanford, Aug. 1986. [75] C. A. Cole, S. G. Wilson, E. K. Hall, and T. R. Giallorenzi, “A General Method for Finding Low Error Rates of LDPC Codes,” arXiv:cs/0605051v1 [cs.IT] 11 May 2006, Mar. 2018. [76] J. H. Conway and N. J. A. Sloane, “Fast Quantizing and Decoding Algorithms for Lattice Quantizers and Codes,” IEEE Trans. Inf. Theory, vol. 28, no. 2, pp. 227–232, Mar. 1982. [77] ——, “Voronoi Regions of Lattices, Second Moments of Polytopes, and Quantization,” IEEE Trans. Inf. Theory, vol. 28, no. 2, pp. 211–226, Mar. 1982. [78] ——, “On the Voronoi Regions of Certain Lattices,” SIAM J. Alg. Discrete Methods, vol. 5, no. 3, pp. 294–305, Sep. 1984. [79] ——, Sphere Packings, Lattices, and Groups, 2nd ed. New York: Springer-Verlag, 1993. [80] ——, “A Fast Encoding Method for Lattice Codes and Quantizers,” IEEE Trans. Inf. Theory, vol. 29, no. 6, pp. 820–824, Nov. 1983. [81] ——, “Soft Decoding Techniques for Codes and Lattices, Including the Golay Code and the Leech Lattice,” IEEE Trans. Inf. Theory, vol. 32, no. 1, pp. 41–50, Jan. 1986. [82] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. New York: Wiley, 2006. [83] D. Cox, J. Little, and D. O’Shea, Ideals, Varieties, and Algorithms. New York: Springer-Verlag, 1992. [84] ——, Using Algebraic Geometry. New York: Springer-Verlag, 1998. [85] J. Crockett, T. Moon, J. Gunther, and O. Chauhan, “Accelerating LDPC Decoding Using Multiple-Cycle Eigenmessages,” in Asilomar Conference on Signals, Systems, and Computers, Monterey, CA, Nov. 2004, pp. 1141–1145. [86] D. Dabiri and I. F. Blake, “Fast Parallel Algorithms for Decoding Reed-Solomon Codes Based on Remainder Polynomials,” IEEE Trans. Inf. Theory, vol. 41, no. 4, pp. 873–885, Jul. 1995. [87] E. Dahlman, S. Parkvall, J. Sköld, and P. Beming, 3G Evolution: HSPA and LTE for Mobile Broadband. New York, NY: Academic Press, 2008. [88] F. Daneshgaran, M. Laddomada, and M. Mondin, “Interleaver Design for Serially Concatenated Convolutional Codes: Theory and Applications,” IEEE Trans. Inf. Theory, vol. 50, no. 6, pp. 1177–1188, Jun. 2004. [89] D. G. Daut, J. W. Modestino, and L. D. Wismer, “New Short Constraint Length Convolutional Code Constructions for Selected Rational Rates,” IEEE Trans. Inf. Theory, vol. 28, no. 5, pp. 794–800, Sep. 1982. [90] M. C. Davey and D. J. MacKay, “Low Density Parity Check Codes Over GF(q),” IEEE Comput. Lett., vol. 2, no. 6, pp. 165–167, 1998. [91] J. R. Deller, J. G. Proakis, and J. H. L. Hansen, Discrete–Time Processing of Speech Signals. New York: Macmillan, 1993. [92] P. Delsarte, “An Algebraic Approach to Coding Theory,” Phillips, Research Reports Supplements, no. 10, 1973. [93] C. Di, D. Proietti, E. Telatar, R. Richardson, and Urbanke, “Finite Length Analysis of Low-Density Parity-Check Codes on the Binary Erasure Channel,” IEEE Trans. Inf. Theory, vol. 48, pp. 1570–1579, Jun. 2002. [94] D. Divsalar, H. Jin, and R. 
McEliece, “Coding Theorems for Turbo-like Codes,” in Proceedingsof the 36th Annual Allerton Conference, Monticello, IL, 1998, pp. 201–210. [95] D. Divsalar, and S. D. C. Jones, “Construction of Protograph LDPC Codes with Linear Minimum Distance,” in International Symposium on Information Theory, Seattle, WA, 2006. [96] D. Divsalar, C. Jones, S. Dolinar, and J. Thorpe, “Protograph Based LDPC Codes with Minimum Distance Growing Linearly with Block Size,” in 2005 IEEE Global Telecommunications Conference, St. Louis, MO, 2005. [97] D. Divsalar and R. McEliece, “Effective Free Distance of Turbo Codes,” Electron. Lett., vol. 32, no. 5, pp. 445–446, Feb. 1996. [98] D. Divsalar and F. Pollara, “Turbo Codes for Deep-Space Communications,” Jet Propulsion Laboratory, JPL TDA Progress Report 42-122, pp. 42–120, Feb. 1995. [99] D. Divsalar and M. Simon, “The Design of Coded MPSK for Fading Channel: Performance Criteria,” IEEE Trans. Commun., vol. 36, pp. 1013–1021, Sep. 1988. [100] I. Djurdjevic, J. Xu, K. Abdel-Ghaffar, and S. Lin, “A Class of Low-Density Parity-Check Codes Constructed Based on Reed-Solomon Codes with Two Information Symbols,” IEEE Commun. Lett., vol. 7, no. 7, pp. 317–319, Jul. 2003. [101] L. Dolecek, P. Lee, Z. Zhang, V. Anantharam, B. Nikolic, and M. Wainwright, “Predicting Error Floors of Stuctured LDPC Codes: Deterministic Bounds and Estimates,” IEEE J. Sel Areas Commun., vol. 27, no. 6, pp. 908–917, Aug. 2009. [102] S. Dolinar and D. Divsalar, “Weight Distributions for Turbo Codes Using Random and Nonrandom Permutations,” Jet Propulsion Laboratory, JPL TDA Progress Report 42-121, pp. 42–120, Aug. 1995. [103] B. Dorsch, “A Decoding Algorithm for Binary Block Codes and j-ary Output Channels,” IEEE Trans. Inf. Theory, vol. 20, pp. 391–394, May 1974. [104] “ETSI EN 300 744 v1.6.1 Digital Video Broadcasting (DVB); Framing Structure, Channel Coding and Modulation for Digital Terretrial Television,” https://www.etsi.org/deliver/etsi_en/300700_300799/300744/01.06.01_60/en_300744v010601p.pdf, 2009. [105] “DVB: ETSI EN 302 755 v1.2.1 Digital Video Broadcasting (DVB); Frame Structure Channel Coding and Modulation for a Second Generation Digital Terrestrial Television Broadcasting System (DVB-T2),” https://www.etsi.org/deliver/etsi_en/302700_302799/302755/ 01.02.01_40/en_302755v010201o.pdf, 2010. [106] H. El Gamal and J. A. Roger Hammons, “Analyzing the Turbo Decoder Using the Gaussian Approximation,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 671–686, Feb. 2001.

927

928

References [107] E. Eldon, “Euclidean Geometry Cyclic Codes,” in Proceedings of the Symposium on Combinatorial Mathematics. Chapel Hill, NC: University of North Carolina, Apr. 1967. [108] M. Elia, “Algebraic Decoding of the (23,12,7) Golay Code,” IEEE Trans. Inf. Theory, vol. 33, no. 1, pp. 150–151, Jan. 1987. [109] P. Elias, “Coding for Noisy Channels,” IRE Conv. Rept. Pt. 4, New York, NY, pp. 37–47, 1955. [110] ——, “List Decoding For Noisy Channels,” Research Laboratory of Electronics, MIT,, Cambridge, MA, Technical Report 335, 1957. [111] M. Eyuboglu and G. D. Forney, Jr., “Trellis Precoding: Combined Coding, Precoding and Shaping for Intersymbol Interference Channels,” IEEE Trans. Inf. Theory, vol. 36, pp. 301–314, Mar. 1992. [112] D. Falconer, “A Hybrid Sequential and Algebraic Decoding Scheme,” Ph.D. dissertation. MIT, Cambridge, MA, 1967. [113] R. M. Fano, “A Heuristic Discussion of Probabilistic Decoding,” IEEE Trans. Inf. Theory, vol. 9, pp. 64–73, Apr 1963. [114] S. Federenko, “A Simple Algorithm for Decoding Reed-Solomon Codes and its Relation to the Welch Berlekamp Algorithm,” IEEE Trans. Inf. Theory, vol. 51, no. 3, pp. 1196–1198, Mar. 2005. [115] ——, “Correction to A Simple Algorithm for Decoding Reed-Solomon Codes and its Relation to the Welch Berlekamp Algorithm,” IEEE Trans. Inf. Theory, vol. 52, no. 3, p. 1278, Mar. 2006. [116] S. Fedorenko, “A Simple Algorithm for Decoding Both Errors and Erasures in Reed–Solomon Codes,” in Proceedings of the Workshop: Coding Theory Days in St. Petersburg. , Sep. 2008, pp. 18–21. [117] J. Feldman, M. J. Wainwright, and D. R. Karger, “Using Linear Programming to Decode Binary Linear Codes,” IEEE Trans. Inf. Theory, vol. 51, no. 3, pp. 954–972, Mar. 2005. [118] A. J. Felstrom and K. S. Zigangirov, “Time-Varying Periodic Convolutional Codes with Low-Density Parity-Check Matrix,” IEEE Trans. Inf. Theory, vol. 45, no. 6, pp. 2181–2191, Sep. 1999. [119] G.-L. Feng, “A Fast Special Factorization Algorithm in the Sudan Decoding Procedure for Reed-Solomon Codes,” in Proceedings of the 31st Allerton Conference on Communications, Control, and Computing, Monticello, IL, 2000, pp. 593–602. [120] G.-L. Feng and K. K. Tzeng, “A Generalized Euclidean Algorithm for Multisequence Shift-Register Synthesis,” IEEE Trans. Inf. Theory, vol. 35, no. 3, pp. 584–594, May 1989. [121] ——, “A Generalization of the Berlekamp-Massey Algorithm for Multisequence Shift-Register Synthesis with Applications to Decoding Cyclic Codes,” IEEE Trans. Inf. Theory, vol. 37, no. 5, pp. 1274–1287, Sep. 1991. [122] W. Feng and Z. Yu, “Error Evaluator for Inversionless Berlekamp-Massey Algorithm in Reed-Solomon Decoders,” United States Patent 7,249,310, Jul. 2007. [123] M. Ferrari and S. Bellini, “Importance Sampling Simulation of Concatenated Block Codes,” Proc. IEE, vol. 147, pp. 245–251, Oct. 2000. [124] P. Fire, A Class of Multiple-Error-Correcting Binary Codes For Non-Independent Errors, Sylvania Report No. RSL-E-2, Sylvania Electronic Defense Laboratory, Reconnaissance Systems Division, Mountain View, CA, Mar. 1959. [125] G. D. Forney, Jr., “On Decoding BCH Codes,” IEEE Trans. Inf. Theory, vol. 11, no. 4, pp. 549–557, Oct. 1965. [126] ——, Concatenated Codes. Cambridge, MA: MIT Press, 1966. [127] ——, “Generalized Minimum Distance Decoding,” IEEE Trans. Inf. Theory, vol. 12, no. 2, pp. 125–131, Apr. 1966. [128] ——, “The Viterbi Algorithm,” Proc. IEEE, vol. 61, no. 3, pp. 268–278, Mar. 1973. [129] ——, “Convolutional Codes III: Sequential Decoding,” Inf. Control, vol. 25, pp. 
267–297, Jul. 1974. [130] ——, “Coset Codes — Part I: Introduction and Geometrical Classification,” IEEE Trans. Inf. Theory, vol. 34, no. 5, pp. 1123–1151, Sep. 1988. [131] ——, “Coset Codes — Part II: Binary Lattices and Related Codes,” IEEE Trans. Inf. Theory, vol. 34, no. 5, pp. 1152–1187, Sep. 1988. [132] ——, “Geometrically Uniform Codes,” IEEE Trans. Inf. Theory, vol. 37, pp. 1241–1260, 1991. [133] ——, “Trellis Shaping,” IEEE Trans. Inf. Theory, vol. 38, no. 2, pp. 281–300, Mar. 1992. [134] ——, “Maximum-Likelihood Sequence Estimation of Digital Sequences in the Presence of Intersymbol Interference,” IEEE Trans. Inf. Theory, vol. IT-18, no. 3, pp. 363–378, May 1972. [135] ——, “Convolutional Codes I: Algebraic Structure,” IEEE Trans. Inf. Theory, vol. 16, no. 6, pp. 720–738, Nov. 1970. [136] G. D. Forney, Jr., L. Brown, M. V. Eyuboglu, and I. J. LMoran, “The V.34 High-Speed Modem Standard,” IEEE Commun Mag, vol. 34, pp. 28–33, Dec. 1996. [137] G. D. Forney, Jr. and M. Eyuboglu, “Combined Equalization and Coding Using Precoding,” IEEE Commun. Mag., vol. 29, no. 12, pp. 25–34, Dec. 1991. [138] G. D. Forney, Jr., R. G. Gallager, G. R. Lang, F. M. Longstaff, and S. U. Qureshi, “Efficient Modulation for Band–Limited Channels,” IEEE J. Sel Areas Commun., vol. 2, no. 5, pp. 632–647, Sep. 1984. [139] G. Foschini and M. Gans, “On Limits of Wireless Communications in a Fading Environment,” Wirel. Pers. Commun., vol. 6, no. 3, pp. 311–355, Mar. 1998. [140] M. P. Fossorier, F. Burkert, S. Lin, and J. Hagenauer, “On the Equivalence Between SOVA and Max-Log-MAP Decodings,” IEEE Commun. Lett., vol. 2, no. 5, pp. 137–139, May 1998. [141] M. P. Fossorier and S. Lin, “Chase-Type and GMD-Type Coset Decoding,” IEEE Trans. Commun., vol. 48, pp. 345–350, Mar. 2000. [142] M. Fossorier and S. Lin, “Soft-Decision Decoding of Linear Block Codes Based on Ordered Statistics,” IEEE Trans. Inf. Theory, vol. 41, pp. 1379–1396, Sep. 1995. [143] ——, “Differential Trellis Decoding of Convolutional Codes,” IEEE Trans. Inf. Theory, vol. 46, pp. 1046–1053, May 2000. [144] J. B. Fraleigh, A First Course in Abstract Algebra. Reading, MA: Addison-Wesley, 1982. [145] B. Friedland, Control System Design: An Introduction to State-Space Design. New York: McGraw-Hill, 1986.

References [146] B. L. Gal, C. Leroux, and C. Jego, “Multi-gb/s Software Decoding of Polar Codes,” IEEE Trans. Signal Process., vol. 63, no. 2, pp. 349–359, Jan. 2015. [147] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968. [148] ——, “Low-Density Parity-Check Codes,” IRE Trans. Inf. Theory, vol. IT-8, pp. 21–28, Jan. 1962. [149] ——, Low-Density Parity-Check Codes. Cambridge, MA: M.I.T. Press, 1963. [150] S. Gao, “A new algorithm for decoding Reed-Solomon codes,” in Communications, Information, and Network Security, V. Bhargava, H. Poor, V. Tarokh, and S. Yoon, Eds. Kluwer, 2003, vol. 712, pp. 55–68. [151] S. Gao and M. A. Shokrollahi, “Computing Roots of Polynomials over Function Fields of Curves,” in Coding Theory and Cryptography, D. Joyner, Ed. Springer-Verlag, 1999, pp. 214–228. [152] J. Geist, “An Empirical Comparison of Two Sequential Decoding Algorithms,” IEEE Trans. Commun. Tech., vol. 19, pp. 415–419, Aug. 1971. [153] ——, “Search Properties for Some Sequential Decoding Algorithms,” IEEE Trans. Inf. Theory, vol. 19, pp. 519–526, Jul. 1973. [154] A. Geramita and J. Seberry, Orthogonal Designs, Quadratic Forms and Hadamard Matrices, ser. Lecture Notes in Pure and Applied Mathematics, vol. 43. New York and Basel: Marcel Dekker, 1979. [155] P. Giard, G. Sarkis, A. Balatsoukas-Stimming, Y. Fan, C. Tsui, A. Burg, C. Thibeault, and W. J. Gross, “Hardware Decoders for Polar Codes: An Overview,” in 2016 IEEE International Symposium on Circuits and Systems (ISCAS), Montreal, QC, May 2016, pp. 149–152. [156] P. Giard, G. Sarkis, C. Leroux, C. Thibeault, and W. J. Gross, “Low-Latency Software Polar Decoders,” CoRR, vol. abs/1504.00353, 2015. [Online]. http://arxiv.org/abs/1504.00353 [157] P. Giard, G. Sarkis, C. Thibeault, and W. J. Gross, “237 Gbit/s Unrolled Hardware Polar Decoder,” Electron. Lett., vol. 51, pp. 762–763. 2015. [158] R. Gold, “Maximal Recursive Sequences with 3-Valued Recursive Cross-Correlation Functions,” IEEE Trans. Info. Theory, vol. 14, no. 1, pp. 154–156, 1968. [159] ——, “Optimal Binary Sequences for Spread Spectrum Multiplexing,” IEEE Trans. Inf. Theory, vol. 13, no. 4, pp. 619–621, Oct. 1967. [160] S. W. Golomb, Shift Register Sequences. San Francisco: Holden–Day, 1967. [161] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed. Baltimore, MD: Johns Hopkins University Press, 1996. [162] M. Gonzàlez-Lopez, L. Castedo, and J. García-Frías, “BICM for MIMO Systems Using Low-Density Generator Matrix (LDGM) Codes,” in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing. Montreal, QC: IEEE Press, May 2004. [163] V. Goppa, “Codes on Algebraic Curves,” Soviet Math. Dokl., vol. 24, pp. 170–172, 1981. [164] ——, Geometry and Codes. Dordrecht: Kluwer Academic, 1988. [165] D. Gorenstein and N. Zierler, “A Class of Error Correcting Codes in pm Symbols,” J. Soc. Ind. Appl. Math., vol. 9, pp. 207–214, Jun. 1961. [166] R. Graham, D. Knuth, and O. Patashnik, Concrete Mathematics. Reading, MA: Addison-Wesley, 1989. [167] S. Gravel, “Using Symmetries to Solve Asymmetric Problems,” Ph.D. dissertation, Cornell Univerity, 2009. [168] S. Gravel and V. Elser, “Divide and Concur: A General Approach to Constraint Satisfaction,” Phys. Rev., vol. 78, p. 36706, 2008. [169] W. Gross, “Are Polar Codes Practical?” https://youtu.be/S3bZOSlNFGo, Feb. 10, 2015. [170] J. Gunther, M. Ankapura, and T. Moon, “Blind Turbo Equalization Using a Generalized LDPC Decoder,” in Digital Signal Processing Workshop. 
Taos Ski Valley, NM: IEEE Press , 2004, pp. 206–210. [171] F. Guo and H. Henzo, “Reliability Ratio Based Weighted Bit-Flipping Decoding for Low-Density Parity-Check Codes,” IEEE Electron Lett., vol. 40, no. 21, pp. 1356–1358, 2004. [172] V. Guruswami and M. Sudan, “Improved Decoding of Reed-Solomon Codes and Algebraic Geometry Codes,” IEEE Trans. Inf. Theory, vol. 45, no. 6, pp. 1757–1767, Sep. 1999. [173] D. Haccoun and M. Ferguson, “Generalized Stack Algorithms for Decoding Convolutional Codes,” IEEE Trans. Inf. Theory, vol. 21, pp. 638–651, Apr. 1975. [174] J. Hagenauer, “Rate Compatible Punctured Convolutional Codes and their Applications,” IEEE Trans. Commun., vol. 36, pp. 389–400, Apr. 1988. [175] ——, “Source-Controlled Channel Decoding,” IEEE Trans. Commun., vol. 43, no. 9, pp. 2449–2457, Sep. 1995. [176] J. Hagenauer and P. Hoeher, “A Viterbi Algorithm with Soft-Decision Outputs and its Applications,” in Proceedings of. Globecom, Dallas, TX, Nov. 1989, pp. 1680–1686. [177] J. Hagenauer, E. Offer, and L. Papke, Reed Solomon Codes and their Applications. New York: IEEE Press, 1994, ch. Matching Viterbi Decoders and Reed-Solomon Decoders in a Concatenated System. [178] ——, “Iterative Decoding of Binary Block and Convolutional Codes,” IEEE Trans. Inf. Theory, vol. 42, no. 2, pp. 429–445, Mar. 1996. [179] D. Haley, A. Grant, and J. Buetefuer, “Iterative encoding of Low-Density Parity-Check Codes,” in Global Communication Conference (GLOBECOM). Taipei: IEEE Press, Nov. 2002, pp. 1289–1293. [180] R. W. Hamming, Coding and Information Theory, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, 1986. [181] R. Hamming, “Error Detecting and Error Correcting Codes,” Bell Syst. Tech. Jl, vol. 29, pp. 41–56, 1950. [182] A. R. Hammonds, Jr., P. V. Kummar, A. Calderbank, N. Sloane, and P. Solè, “The ℤ4 -Linearity of Kerdock, Preparata, Goethals, and Related Codes,” IEEE Trans. Inf. Theory, vol. 40, no. 2, pp. 301–319, Mar. 1994. [183] Y. Han, “LDPC Coding for Magnetic Storage: Low-floor Decoding Algorithms, System Design, and Performance Analysis,” Ph.D. dissertation, ECE Department, University of Arizona, Aug. 2008. [184] Y. Han, C. Hartmann, and C. Chen, “Efficient Priority-First Search Maximum-Likelihood Soft-Decision Decoding of Linear Block Codes,” IEEE Trans. Inf. Theory, vol. 39, pp. 1514–1523, Sep. 1993.

929

930

References [185] Y. Han, C. Hartmann, and K. Mehrota, “Decoding Linear Block Codes Using a Priority-First Search: Performance Analysis and Suboptimal Version,” IEEE Trans. Inf. Theory, vol. 44, pp. 1233–1246, May 1998. [186] Y. Han and W. Ryan, “Low-Floor Decoders for LDPC Codes,” IEEE Trans. Commun., vol. 57, no. 5, May 2009. [187] L. Hanzo, T. Liew, and B. Yeap, Turbo Coding, Turbo Equalization and Space-Time Coding for Transmission Over Fading Channels. West Sussex, England: Wiley, 2002. [188] H. Harashima and H. Miyakawa, “Matched-Transmission Technique for Channels with Intersymbol Interference,” IEEE Trans. Commun., vol. 20, pp. 774–780, Aug. 1972. [189] C. R. P. Hartmann and L. D. Rudolph, “An Optimum Symbol-by-Symbol Decoding Rule for Linear Codes,” IEEE Trans. Inf. Theory, vol. 22, no. 5, pp. 514–517, Sep. 1976. [190] C. Hartmann and K. Tzeng, “Generalizations of the BCH Bound,” Inf. Contr., vol. 20, no. 5, pp. 489–498, Jun. 1972. [191] C. Hartmann, K. Tzeng, and R. Chen, “Some Results on the Minimum Distance of Cyclic Codes,” IEEE Trans. Inf. Theory, vol. 18, no. 3, pp. 402–409, May 1972. [192] H. Hasse, “Theorie der höen Differentiale in einem algebraishen Funcktionenköper mit volkommenem Kostantenköerp bei Charakteristic,” J. Reine. Ang. Math., pp. 50–54, 175. [193] C. Heegard, “Randomizer for Byte-Wise Scrambling of Data,” United States Patent 5,745,522, Apr. 1998. [194] C. Heegard and S. B. Wicker, Turbo Coding. Boston: Kluwer Academic, 1999. [195] J. Heller, “Short Constraint Length Convolutional Codes,” Jet Propulsion Labs, Space Programs Summary 37–54, vol. III, pp. 171–177, 1968. [196] J. Heller and I. M. Jacobs, “Viterbi Decoding for Satellite and Space Communications,” IEEE Trans. Commun. Tech., vol. 19, no. 5, pp. 835–848, Oct. 1971. [197] F. Hemmati and D. Costello, “Truncation Error Probability in Viterbi Decoding,” IEEE Trans. Commun., vol. 25, no. 5, pp. 530–532, May 1977. [198] R. W. Hinton, “Finite-Length Scaling of Multi-Edge Type LDPC Ensembles,” Ph.D. dissertation, University of Virginia, 2011. [199] M. Hirotomo, Y. Konishi, and M. Morii, “Approximate Examination of Trapping Sets of LDPC Codes using the Probabilistic Algorithm,” in Proceedingsof the International Symposium on Information Theory and its applications (ISITA), Auckland, Dec. 2008, pp. 754–759. [200] B. Hochwald and W. Sweldons, “Differential Unitary Space-Time Modulation,” Bell Labs, Lucent Technology Technical Report, 1999. [201] A. Hocquenghem, “Codes Correcteurs D’erreurs,” Chiffres, vol. 2, pp. 147–156, 1959. [202] R. A. Horn and C. A. Johnson, Matrix Analysis. Cambridge: Cambridge University Press, 1985. [203] X.-Y. Hu, E. Eleftheriou, and D. Arnold, “Regular and Irregular Progressive Edge-Growth Tanner Graphs,” IEEE Trans. Inf. Theory, vol. 51, no. 1, pp. 386–398, Jan. 2005. [204] Z. Huang and C. Heegard, “Quadrature Amplitude Modulated Data for Standard Bandwidth Television Channel,” United States Patent 5,511,096, Apr. 1996. [205] B. Hughes, “Differential Space-Time Modulation,” in Proceedings of the IEEE Wireless Communications and Networking Conference, New Orleans, LA, Sep. 1999. [206] T. W. Hungerford, Algebra. New York: Springer-Verlag, 1974. [207] N. Hussami, S. Korada, and R. Urbanke, “Performance of Polar Codes for Channel and Source Coding,” in Proceediongsof the 2009 IEEE International Symposium on Information Theory. Seoul: IEEE Press, 2009. [208] S. Ikeda, T. Tanaka, and S. Amari, “Information Geometry of Turbo and Low-Density Parity-Check Codes,” IEEE Trans. Inf. Theory, vol. 
50, no. 6, pp. 1097–1114, Jun. 2004. [209] K. A. S. Immink, “Runlength-Limited Sequences,” Proc. IEEE, vol. 78, pp. 1745–1759, 1990. [210] ——, Coding Techniques for Digital Recorders. Englewood Cliffs, NJ: Prentice-Hall, 1991. [211] K. Immink, Reed Solomon Codes and their Applications. New York: IEEE Press, 1994, ch. RS Code and the Compact Disc. [212] International Telegraph and Telephone Consultative Committee (CCITT), “Recommendation V.29: 9600 Bits per Second Modem Standardized for Use on Point-to-Point 4-Wire Leased Telephone-Type Circuits,” in Data Communication over the Telephone Network “Blue Book”. Geneva: International Telecommunications Union, 1988, vol. VIII, pp. 215–227. [213] “IEEE J. Sel. Areas Commun. (Special Issue on Importance Sampling),” vol. 11, no. 3, pp. 289–476, Apr. 1993. [214] I. Jacobs and E. Berlekamp, “A Lower Bound to the Distribution of Computation for Sequential Decoding,” IEEE Trans. Inf. Theory, vol. 13, pp. 167–174, Apr. 1967. [215] N. Jacobson, Basic Algebra I. New York: Freeman, 1985. [216] W. Jakes, Microwave Mobile Communication. New York, NY: IEEE Press, 1993. [217] S. Jayasooriya, M. Shirvanimoghaddam, L. Ong, G. Lechner, and S. Johnson, “A New Density Evolution Approximation for LDPC and Multi-Edge Type LDPC Codes,” IEEE Trans. Commun., vol. 64, no. 10, pp. 4044–4056, Oct. 2016. [218] F. Jelinek, “An Upper Bound on Moments of Sequential Decoding Effort,” IEEE Trans. Inf. Theory, vol. 15, pp. 140–149, Jan. 1969. [219] ——, “Fast Sequential Decoding Algorithm Using a Stack,” IBM J. Res. Dev., pp. 675–685, Nov. 1969. [220] F. Jelinek and J. Cocke, “Bootstrap Hybrid Decoding for Symmetric Binary Input Channels,” Inf. Control., vol. 18, pp. 261–281, Apr. 1971. [221] M. Jiang, Z. Shi, and Y. Chen, “An Improvement on the Modified Weighted Bit Flipping Decoding Algorithm for LDPC Codes,” IEEE Commun. Lett., vol. 9, no. 9, pp. 814–816, 2005. [222] M. Jiang, C. Zhao, Z. Shi, and Y. Chen, “An Improvement on the Modified Weighted Bit Flipping Decoding Algorithm for LDPC Codes,” IEEE Commun. Lett., vol. 9, no. 9, pp. 814–816, 2005.

References [223] H. Jin, A. Khandekar, and R. McEliece, “Irregular Repeat-Accumulate Codes,” in Proceedings 2nd International Symposium on Turbo Codes and Related Topics, Brest, France, Sep. 2000, pp. 1–8. [224] O. Joerssen and H. Meyr, “Terminating the Trellis of Turbo-Codes,” Electron. Lett., vol. 30, no. 16, pp. 1285–1286, 1994. [225] R. Johannesson, “Robustly Optimal Rate One-Half Binary Convolutional Codes,” IEEE Trans. Inf. Theory, vol. 21, pp. 464–468, Jul. 1975. [226] ——, “Some Long Rate One-Half Binary Convolutional Codes with an Optimum Distance Profile,” IEEE Trans. Inf. Theory, vol. 22, pp. 629–631, Sep. 1976. [227] ——, “Some Rate 1/3 and 1/4 Binary Convolutional Codes with an Optimum Distance Profile,” IEEE Trans. Inf. Theory, vol. 23, pp. 281–283, Mar. 1977. [228] R. Johannesson and E. Paaske, “Further Results on Binary Convolutional Codes with an Optimum Distance Profile,” IEEE Trans. Inf. Theory, vol. 24, pp. 264–268, Mar. 1978. [229] R. Johannesson and P. Stahl, “New Rate 1/2, 1/3, and 1/4 Binary Convolutional Encoders with Optimum Distance Profile,” IEEE Trans. Inf. Theory, vol. 45, pp. 1653–1658, Jul. 1999. [230] R. Johannesson and Z.-X. Wan, “A Linear Algebra Approach to Minimal Convolutional Encoders,” IEEE Trans. Inf. Theory, vol. 39, no. 4, pp. 1219–1233, Jul. 1993. [231] R. Johannesson and K. Zigangirov, Fundamentals of Convolutional Coding. Piscataway, NJ: IEEE Press, 1999. [232] S. Johnson and S. Weller, “Regular Low-Density Parity-Check Codes from Combinatorial Designs,” in Proceedings of the 2001 Information Theory Workshop, Cairns, Australia, Sep. 2001, pp. 90–92. [233] ——, “Construction of Low-Density Parity-Check Codes from Kirkman Triple Systems,” in Global Communications Conference (GLOBECOM), Nov. 25–29, 2001, pp. 970–974. [234] ——, “Regular Low-Density Parity-Check Codes from Combinatorial Designs,” in Information Theory Workshop, Cairns, 2001, pp. 90–92. [235] ——, “High-Rate LDPC Codes from Unital Designs,” in Global Telecommunications Conference (GLOBECOM). San Francisco, CA: IEEE, 2003, pp. 2036–2040. [236] ——, “Resolvable 2-Designs for Regular Low-Density Parity Check Codes,” IEEE Trans. Commun., vol. 51, no. 9, pp. 1413–1419, Sep. 2003. [237] C. Jones, S. Dolinar, K. Andrews, D. Divsalar, Y. Zhang, and W. Wyan, “Functions and Architectures for LDPC Decoding,” in IEEE Information Theory Workshop, Tahoe City, CA, Sep. 2–6 2007, pp. 577–583. [238] P. Jung and M. Nasshan, “Dependence of the Error Performance of Turbo-Codes on the Interleaver Structure in Short Frame Transmission Systems,” Electron. Lett., vol. 30, no. 4, pp. 287–288, Feb. 1994. [239] T. Kailath, Linear Systems. Englewood Cliffs, NJ: Prentice-Hall, 1980. [240] N. Kamiya, “On Algebraic Soft-Decision Decoding Algorithms for BCH codes,” IEEE Trans. Inf. Theory, vol. 47, no. 1, pp. 45–58, Jan. 2001. [241] T. Kaneko, T. Nishijima, and S. Hirasawa, “An Improvement of Soft-Decision Maximum-Likelihood Decoding Algorithm Using Hard-Decision Bounded-Distance Decoding,” IEEE Trans. Inf. Theory, vol. 43, pp. 1314–1319, Jul. 1997. [242] T. Kaneko, T. Nishijima, H. Inazumi, and S. Hirasawa, “An Efficient Maximum Likelihood Decoding of Linear Block Codes with Algebraic Decoder,” IEEE Trans. Inf. Theory, vol. 40, pp. 320–327, Mar. 1994. [243] T. Kasami, “A Decoding Method for Multiple-Error-Correcting Cyclic Codes by Using Threshold Logics,” in Conference Record of the Information Processing Society of Japan (in Japanese), Tokyo, Nov. 1961. 
[244] ——, “A Decoding Procedure for Multiple-Error-Correction Cyclic Codes,” IEEE Trans. Inf. Theory, vol. 10, pp. 134–139, Apr. 1964. [245] T. Kasami, S. Lin, and W. Peterson, “Some Results on the Weight Distributions of BCH Codes,” IEEE Trans. Inf. Theory, vol. 12, no. 2, p. 274, Apr. 1966. [246] T. Kasami, S. Lin, and W. W. Peterson, “Polyonomial Codes,” IEEE Trans. Inf. Theory, vol. 14, no. 6, pp. 807–814, Nov. 1968. [247] D. E. Knuth, The Art of Computer Programming. Reading, MA: Addison-Wesley, 1997, vol. 1. [248] R. Koetter, “On Algebraic Decoding of Algebraic-Geometric and Cyclic Codes,” Ph.D. dissertation, University of Linkoping, 1996. [249] R. Koetter, A. C. Singer, and M. Tüchler, “Turbo Equalization,” IEEE Signal Process. Mag., no. 1, pp. 67–80, Jan. 2004. [250] R. Koetter and A. Vardy, “The Structure of Tail-Biting Trellises: Minimality and Basic Principles,” IEEE Trans. Inf. Theory, vol. 49, no. 9, pp. 2081–2105, Sep. 2003. [251] R. Koetter and A. Vardy, “Algebraic Soft-Decision Decoding of Reed–Solomon Codes,” IEEE Trans. Inf. Theory, vol. 49, no. 11, pp. 2809–2825, Nov. 2003. [252] V. Kolesnik, “Probability Decoding of Majority Codes,” Prob. Pered. Inf., vol. 7, pp. 3–12, Jul. 1971. [253] R. Kötter, “Fast Generalized Minimum Distance Decoding of Algebraic-Geometry and Reed–Solomon Codes,” IEEE Trans. Inf. Theory, vol. 42, no. 3, pp. 721–737, May 1996. [254] Y. Kou, S. Lin, and M. Fossorier, “Low-Density Parity-Check Codes Based on Finite Geometries: A Rediscovery and New Results,” IEEE Trans. Inf. Theory, vol. 47, no. 13, pp. 2711–2736, Nov. 2001. [255] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, “Factor Graphs and the Sum-Product Algorithm,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 498–519, Feb. 2001. [256] L. Lan, Y. Tai, S. Lin, and K. Abdel-Ghaffar, “A Trellis-Based Method for Removal of Cycles from Bipartite Graphs and Construction of Low Density Parity Check Codes,” IEEE Commun. Lett., vol. 8, no. 7, pp. 443–445, Jul. 2004. [257] L. Lan, Y. Tai, S. Lin, B. Memari, and B. Honary, “New Construction of Quasi-Cyclic LDPC Codes Based on Special Classes of BIBDs for the AWGN and Binary Erasure Channels,” IEEE Trans. Commun., vol. 56, no. 1, pp. 39–48, Jan. 2008. [258] L. Lan, L.-Q. Zen, Y. Tai, L. Chen, S. Lin, and K. Abdel-Ghaffar, “Construction of Quasi-Cyclic LDPC Codes for AWGN and Binary Erasure Channels: A Finite Field Approach,” IEEE Trans. Inf. Theory, vol. 53, no. 7, pp. 2429–2458, Jul. 2007.

References [259] L. Lan, L.-Q. Zeng, Y. Tai, S. Lin, and K. Abdel-Ghaffar, “Construction of Quasi-Cyclic LDPC Codes for AWGN and Binary Erasure Channels Based on Finite Fields and Affine Permutations,” in Proceedings of the IEEE Symposium on Information Theory, Adelaide, Sep. 2005, pp. 2285–2289. [260] R. Laroia, N. Farvardin, and S. A. Tretter, “On Optimal Shaping of Multidimensional Constellations,” IEEE Trans. Inf. Theory, vol. 40, no. 4, pp. 1044–1056, Jul. 1994. [261] K. Larsen, “Short Convolutional Codes with Maximal Free Distance for Rates 1/2, 1/3 and 1/4,” IEEE Trans. Inf. Theory, vol. 19, pp. 371–372, May 1973. [262] L. Lee, Convolutional Coding: Fundamentals and Applications. Boston, MA: Artech House, 1997. [263] C. Lee and W. Wolf, “Implementation-Efficient Reliability Ratio Based Weighted Bit-Flipping Decoding for LDPC Codes,” IEEE Electron. Lett., vol. 41, no. 13, pp. 755–757, 2005. [264] C. Leroux, A. J. Raymond, G. Sarkis, and W. J. Gross, “A Semi-Parallel Successive-Cancellation Decoder for Polar Codes,” IEEE Trans. Signal Process, vol. 61, no. 2, pp. 289–299, Jan. 2013. [265] C. Leroux, I. Tal, A. Vardy, and W. Gross, “Hardware Architectures for Successive Cancellation Decoding of Polar Codes,” in Proceedings of the IEEE International Conference oN Acoustics, Speech, and Signal Processing (ICASSP), Prague, 2011, pp. 1665–1668. [266] N. Levanon, Radar Principles. New York: Wiley Interscience, 1988. [267] Z. Li, L. Chen, L. Zeng, S. Lin, and W. Fong, “Efficient Encoding of Quasi-Cyclic Low-Density Parity-Check Codes,” IEEE Trans. Commun., vol. 54, no. 1, pp. 71–81, Jan. 2006. [268] R. Lidl and H. Niederreiter, Finite Fields. Reading, MA: Addison-Wesley, 1983. [269] ——, Introduction to Finite Fields and their Applications. Cambridge: Cambridge University Press, 1986. [270] T.-C. Lin, P.-D. Chen, and T. Trieu-Kien, “Simplified Procedure for Decoding Nonsystematic Reed-Solomon Codes over GF(2m ) Using Euclid’s Algorithm and the Fast Fourier Transform,” IEEE Trans. Commun., vol. 57, no. 6, pp. 1588–1592, 2009. [271] S. Lin and D. J. Costello, Error Control Coding: Fundamentals and Applications. Englewood Cliffs, NJ: Prentice-Hall, 1983. [272] S. Lin and D. J. Costello, Jr., Error Control Coding: Fundamentals and Applications, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, 2004. [273] S. Lin, T. Kasami, T. Fujiwara, and M. Fossorier, Trellises and Trellis-based Decoding Algorithms for Linear Block Codes. Boston: Kluwer Academic Publishers, 1998. [274] S. Lin and E. Weldon, “Further Results on Cyclic Product Codes,” IEEE Trans. Inf. Theory, vol. 6, no. 4, pp. 452–459, Jul. 1970. [275] D. Lind and B. Marcus, Symbolic Dynamics and Coding. Cambridge: Cambridge University Press, 1995. [276] J. Lodge, P. Hoeher, and J. Hagenauer, “The Decoding of Multidimensional Codes Using Separable MAP ‘Filters’,” in Proceedings of the 16th Biennial Symposium on Communication. Kingston, ON: Queen’s University, May 1992, pp. 343–346. [277] J. Lodge, R. Young, P. Hoeher, and J. Hagenauer, “Separable MAP ‘Filters’ for the Decoding of Product and Concatenated Codes,” in Proceedings of the IEEE International Conference on Communication, Geneva, Switzerland, May 1993, pp. 1740–1745. [278] T. D. Lookabaugh and R. M. Gray, “High-Resolution Quantization Theory and the Vector Quantizer Advantage,” IEEE Trans. Inf. Theory, vol. 35, no. 5, pp. 1020–1033, Sep. 1989. [279] D. Lu and K. Yao, “Improved Importance Sampling Technique for Efficient Simulation of Digital Communication Systems,” IEEE J. Sel. 
Areas Commun., vol. 6, no. 1, pp. 67–75, Jan. 1988. [280] ——, “Improved Importance Sampling Technique for Efficient Simulation of Digital Communication Systems,” IEEE J. Spec. Areas Commun., vol. 6, no. 1, pp. 67–75, Jan. 1988. [281] M. Luby, “LT codes,” in The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings, Nov. 2002, pp. 271–280. [282] M. G. Luby, M. Miteznmacher, M. A. Shokrollahi, and D. A. Spielman, “Improved Low-Density Parity-Check Codes Using Irregular Graphs,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 585–598, Feb. 2001. [283] M. Luby, M. Mitzenmacher, M. Shokrollahi, D. Spielman, and V. Stemann, “Practical Loss-Resilient Codes,” in Proceedings of the ACM Symposium on Theory of Computing, 1997, pp. 150–159. [284] X. Ma and X.-M. Wang, “On the Minimal Interpolation Problem and Decoding RS Codes,” IEEE Trans. Inf. Theory, vol. 46, no. 4, pp. 1573–1580, Jul. 2000. [285] H. Ma and J. Wolf, “On Tailbiting Convolutional Codes,” IEEE Trans. Commun., vol. 34, pp. 104–111, Feb. 1986. [286] D. J. MacKay, http://www.inference.phy.cam.ac.uk/mackay/CodesFiles.html. [287] ——, “Near Shannon Limit Performance of Low Density Parity Check Codes,” Electron. Lett., vol. 33, no. 6, pp. 457–458, Mar. 1997. [288] ——, “Good Error-Correcting Codes Based on Very Sparse Matrices,” IEEE Trans. Inf. Theory, vol. 45, no. 2, pp. 399–431, Mar. 1999. [289] D. MacKay and M. Postol, “Weaknesses of Margulis and Ramanujan-Margulis Low-Density Parity-Check Codes,” in Proceedings of the conference on Mathematical Foundations of Computer Science and Information Technology (MFCSIT), Galway, 2002. [290] D. J. MacKay and R. M. Neal, “Good Codes Based on Very Sparse Matrices,” in Cryptography and Coding 5th IMA Conference, ser. Lecture Notes in Computer Science, C. Boyd, Ed. Springer, 1995, vol. 1025, pp. 100–111. [291] F. MacWilliams, “A Theorem on the Distribution of Weights in a Systematic Code,” Bell Syst. Tech. J., vol. 42, pp. 79–94, 1963. [292] F. MacWilliams and N. Sloane, The Theory of Error-Correcting Codes. Amsterdam: North-Holland, 1977. [293] G. Margulis, “Explicit Construction of Graphs Without Short Cycles and Low Density Codes,” Combinatorica, vol. 2, no. 1, pp. 71–78, 1982. [294] S. J. Mason, “Feedback Theory — Some Properties of Signal Flow Graphs,” Proceedings IRE, pp. 1144–1156, Sep. 1953. [295] J. L. Massey, “Shift-Register Synthesis and BCH Decoding,” IEEE Trans. Inf. Theory, vol. IT-15, no. 1, pp. 122–127, 1969. [296] ——, “Variable-Length Codes and the Fano Metric,” IEEE Trans. Inf. Theory, vol. IT-18, no. 1, pp. 196–198, Jan. 1972.

References [297] J. Massey, Threshold Decoding. Cambridge, MA: MIT Press, 1963. [298] ——, “Coding and Modulation in Digital Communications,” in Proceedings of the International Zurich Seminar on Digital Communications, Zurich, Switzerland, 1974, pp. E2(1)–E2(4). [299] J. Massey and M. Sain, “Inverses of Linear Sequential Circuits,” IEEE Trans. Comput., vol. 17, pp. 330–337, Apr. 1968. [300] R. McEliece, The Theory of Information and Coding, ser. Encyclopedia of Mathematics and its Applications. Reading, MA: Addison-Wesley, 1977. [301] ——, “A Public-Key Cryptosystem Based on Algebraic Coding Theory,” JPL, DSN Progress Report 42–44, January and February 1978. [302] ——, “The Guruswami-Sudan Algorithm for Decoding Reed–Solomon Codes,” JPL, IPN Progress Report 42-153, May 15, 2003, http:// iprn.jpl.nasa.gov/progress_report/42-153/. [303] R. J. McEliece, E. R. Rodemich, H. Rumsey, Jr., and L. R. Welch, “New Upper Bounds on the Rate of a Code via the Delsarte-MacWilliams Inequalities,” IEEE Trans. Inf. Theory, vol. 23, no. 2, pp. 157–166, Mar. 1977. [304] R. J. McEliece and J. B. Shearer, “A Property of Euclid’s Algorithm and an Application to Padè Approximation,” SIAM J. Appl. Math, vol. 34, no. 4, pp. 611–615, Jun 1978. [305] R. McEliece and L. Swanson, Reed–Solomon Codes and their Applications. New York: IEEE Press, 1994, ch. Reed-Solomon Codes and the Exploration of the Solar System, pp. 25–40. [306] J. Meggitt, “Error Correcting Codes and their Implementations,” IRE Trans. Inf. Theory, vol. 7, pp. 232–244, Oct. 1961. [307] A. Michelson and A. Levesque, Error Control Techniques for Digital Communication. New York: Wiley, 1985. [308] O. Milenkovic, E. Soljanin, and P. Whiting, “Asymptotic Spectra of Trapping Sets in Regular and Irregular LDPC Code Ensembles,” IEEE Trans. Inf. Theory, vol. 53, no. 1, pp. 39–55, Jan. 2007. [309] W. Mills, “Continued Fractions and Linear Recurrences,” Math.Comput., vol. 29, no. 129, pp. 173–180, Jan. 1975. [310] M. Mitchell, “Coding and Decoding Operation Research,” G.E. Advanced Electronics Final Report on Contract AF 19 (604)-6183, Air Force Cambridge Research Labs, Cambridge, MA, Technical Report, 1961. [311] ——, “Error-Trap Decoding of Cyclic Codes,” G.E. Report No. 62MCD3, General Electric Military Communications Department, Oklahoma City, OK, Technical Report, Dec. 1962. [312] M. Mitzenmacher, “Digital Fountains: A Survey and Look Forward,” in Information Theory Workshop, San Antonio, TX, 2004, pp. 271–276. [313] T. Moon, “Wavelets and Lattice Spaces,” in International Symposium on Information Theory, Whistler, BC, Sep. 17–22 1995, p. 250. [314] ——, “On General Linear Block Code Decoding Using the Sum-Product Iterative Decoder,” IEEE Commun. Lett., vol. 8, no. 6, pp. 383–385, 2004. [315] T. K. Moon and S. Budge, “Bit-Level Erasure Decoding of Reed-Solomon Codes Over GF(2n ),” in Proceedings of the Asilomar Conference on Signals and Systems, Pacific Grove, CA, 2004, pp. 1783–1787. [316] T. K. Moon, J. S. Crockett, J. H. Gunther, and O. S. Chauhan, “Iterative Decoding using Eigenmessages,” IEEE Trans. Commun., vol. 57, pp. 3618–3628, 2009. [317] T. K. Moon and J. Gunther, “On the Equivalence of Two Welch-Berlekamp Key Equations and their Error Evaluators,” IEEE Trans. Inf. Theory, vol. 51, no. 1, pp. 399–401, 2004. [318] T. Moon and W. Stirling, Mathematical Methods and Algorithms for Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 2000. [319] T. K. Moon and W. C. Stirling, Mathematical Methods and Algorithms for Signal Processing. 
Upper Saddle River, NJ: Prentice-Hall, 2000. [320] M. Morii and M. Kasahara, “Generalized Key-Equation of Remainder Decoding Algorithm for Reed-Solomon Codes,” IEEE Trans. Inf. Theory, vol. 38, no. 6, pp. 1801–1807, Nov. 1992. [321] D. Muller, “Application of Boolean Switching Algebra to Switching Circuit Design,” IEEE Trans. Comput., vol. 3, pp. 6–12, Sep. 1954. [322] A. Naguib, N. Seshadri, and A. Calderbank, “Increasing Data Rate over Wireless Channels,” IEEE Signal Process. Mag., pp. 76–92, May 2000. [323] R. Nasser and E. Telatar, “Fourier Analysis of MAC Polarization,” IEEE Trans. Inf. Theory, vol. 63, no. 6, pp. 3600–3620, Jun. 2017. [324] K. Niu, K. Chen, J. Lin, and Q. Zhang, “Polar Codes: Primary Concepts and Practical Decoding Algorithms,” IEEE Commun. Mag., vol. 52, no. 7, pp. 192–203, Jul. 2014. [325] I. Niven, H. S. Zuckerman, and H. L. Montgomery, An Introduction to the Theory of Numbers, 5th ed. New York: Wiley, 1991. [326] J. Odenwalder, “Optimal Decoding of Convolutional Codes,” Ph.D. dissertation, Department of Systems Sciences, School of Engineering and Applied Sciences, University of California, Los Angeles, 1970. [327] V. Olshevesky and A. Shokrollahi, “A Displacement Structure Approach to Efficient Decoding of Reed–Solomon and Algebraic-Geometric Codes,” in Proceedings of the 31st ACM Symposium on Theory of Computing, Atlanta, GA, May 1999. [328] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1989. [329] A. Orlitsky, R. Urbanke, K. Viswanathan, and J. Zhang, “Stopping Sets and the Girth of Tanner Graphs,” in Proceedings of the IEEE International Symposium on Information Theory, Lausanne, Jun. 2002. [330] A. Orlitsky and K. Viswanathan, “Stopping Set Distribution of LDPC Code Ensembles,” IEEE Trans. Inf. Theory,vol. 51, no. 3, pp. 929–953, 2005. [331] E. Paaske, “Short Binary Convolutional Codes with Maximal Free Distance For Rate 2/3 and 3/4,” IEEE Trans. Inf. Theory, vol. 20, pp. 683–689, Sep. 1974. [332] A. Pamuk and E. Ar𝚤kan, “A Two Phase Successive Cancellation Decoder Architecture for Polar Codes,” in IEEE International Symposium on Information Theory (ISIT), Istanbul, 2013, pp. 957–961. [333] A. Papoulis, Probability, Random Variables, and Stochastic Processes, 2nd ed.. New York: McGraw Hill, 1984. [334] N. Patterson, “The Algebraic Decoding of Goppa Codes,” IEEE Trans. Inf. Theory, vol. 21, no. 2, pp. 203–207, Mar. 1975.

References [335] L. Perez, J. Seghers, and D. Costello, “A Distance Spectrum Interpretation of Turbo Codes,” IEEE Trans. Inf. Theory, vol. 42, pp. 1698–1709, Nov. 1996. [336] W. Peterson, “Encoding and Error-Correction Procedures for the Bose-Chaudhuri Codes,” IEEE Trans. Inf. Theory, vol. 6, pp. 459–470, 1960. [337] W. W. Peterson, Error-Correcting Codes. Cambridge, MA and New York: MIT Press and Wiley, 1961. [338] W. Peterson and E. Weldon, Error Correction Codes. Cambridge, MA: MIT Press, 1972. [339] S. S. Pietrobon and D. J. Costello, “Trellis Coding with Multidimensional QAM Signal Sets,” IEEE Trans. Inf. Theory, vol. 29, no. 2, pp. 325–336, Mar. 1993. [340] S. S. Pietrobon, R. H. Deng, A. Lafanechère, G. Ungerboeck, and D. J. Costello, “Trellis-Coded Multidimensional Phase Modulation,” IEEE Trans. Inf. Theory, vol. 36, no. 1, pp. 63–89, Jan. 1990. [341] S. Pietrobon, G. Ungerböck, L. Perez, and D. Costello, “Rotationally Invariant Nonlinear Trellis Codes for Two-Dimensional Modulation,” IEEE Trans. Inf. Theory, vol. 40, no. 6, pp. 1773–1791, 1994. [342] M. Plotkin, “Binary Codes with Specified Minimum Distances,” IEEE Trans. Inf. Theory, vol. 6, pp. 445–450, 1960. [343] Y. Polyanskey, “Channel Coding: Non-Asymptotic Fundamental Limits,” Ph.D. dissertation, Princeton University, Nov. 2010. [344] H. V. Poor, An Introduction to Signal Detection and Estimation. New York: Springer-Verlag, 1988. [345] J. Porath and T. Aulin, “Algorithm Construction of Trellis Codes,” IEEE Trans. Commun., vol. 41, no. 5, pp. 649–654, 1993. [346] A. B. Poritz, “Hidden Markov Models: A Guided Tour,” in Proceedings of ICASSP, New York, NY 1988. [347] G. Pottie and D. Taylor, “A Comparison of Reduced Complexity Decoding Algorithms for Trellis Codes,” IEEE J. Sel Areas Commun., vol. 7, no. 9, pp. 1369–1380, 1989. [348] E. Prange, “Cyclic Error-Correcting Codes in Two Symbols,” Air Force Cambridge Research Center, Cambridge, MA, Technical Report TN-57-103, Sep. 1957. [349] ——, “Some Cyclic Error-Correcting Codes with Simple Decoding Algorithms,” Air Force Cambridge Research Center, Cambridge, MA, Technical Report TN-58-156, Apr. 1958. [350] ——, “The Use of Coset Equivalence in the Analysis and Decoding of Group Codes,” Air Force Cambridge Research Center, Cambridge, MA, Technical Report TN-59-164, 1959. [351] O. Pretzel, Codes and Algebraic Curves. Oxford: Clarendon Press, 1998. [352] J. G. Proakis, Digital Communications. New York:McGraw Hill, 1995. [353] J. G. Proakis and M. Salehi, Communication Systems Engineering. Upper Saddle River, NJ: Prentice-Hall, 1994. [354] ——, Communication Systems Engineering. Englewood Cliffs, NJ: Prentice-Hall, 1994. [355] A. Pusane, A. Jimenez-Feltstrom, A. Sridharam, A. Lentmaier, and M. Ziganarov, “Implementation Aspects of LDPC Convolutional Codes,” IEEE Trans. Commun., vol. 56, no. 7, pp. 1060–1069, Jul. 2008. [356] M. Püschel and J. Moura, “The Algebraic Approach to the Discrete Cosine and Sine Transforms and their Fast Algorithms,” SIAM J. Comput., vol. 32, no. 5, pp. 1280–1316, 2003. [357] R. M. Pyndiah, “Near-Optimum Decoding of Product Codes: Block Turbo Codes,” IEEE Trans. Commun., vol. 46, no. 8, pp. 1003–1010, Aug. 1998. [358] R. Pyndiah, A. Glavieux, A. Picart, and S. Jacq, “Near Optimum Decoding of Product Codes,” in IEEE Global Telecommunications Conference. San Francisco: IEEE, Nov. 1994, pp. 339–343. [359] L. R. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proc. IEEE, vol. 77, no. 2, pp. 257–286, Feb. 1989. 
[360] T. V. Ramabadran and S. S. Gaitonde, “A Tutorial On CRC Computations,” IEEE Micro, vol. 8, no. 4, pp. 62–75, Aug. 1988. [361] J. Ramsey, “Realization of Optimum Interleavers,” IEEE Trans. Inf. Theory, vol. 16, pp. 338–345, May 1970. [362] I. Reed, “A Class of Multiple-Error-Correcting Codes and a Decoding Scheme,” IEEE Trans. Inf. Theory, vol. 4, pp. 38–49, Sep. 1954. [363] I. Reed, R. Scholtz, T. Truong, and L. Welch, “The Fast Decoding of Reed-Solomon Codes Using Fermat Theoretic Transforms and Continued Fractions,” IEEE Trans. Inf. Theory, vol. 24, no. 1, pp. 100–106, Jan. 1978. [364] I. Reed and G. Solomon, “Polynomial Codes over Certain Finite Fields,” J. Soc. Ind. Appl. Math, vol. 8, pp. 300–304, 1960. [365] I. Reed, T.-K. Truong, X. Chen, and X. Yin, “Algebraic Decoding of the (41,21,9) Quadratic Residue Code,” IEEE Trans. Inf. Theory, vol. 38, no. 3, pp. 974–986, May 1992. [366] I. Reed, X. Yin, and T.-K. Truong, “Algebraic Decoding of (32,16,8) Quadratic Residue Code,” IEEE Trans. Inf. Theory, vol. 36, no. 4, pp. 876–880, Jul. 1990. [367] M. Rice, Digital Communications: A Discrete-Time Approach, 2nd ed. (published by author). ISBN 9781790588545, 2017. [368] T. Richardson, “Multi-Edge type LDPC Codes,” Workshop Honoring Professor R. McEliece on his 60th Birthday. Pasadena, CA: California Institute of Technology, May 24–25, 2002. [369] ——, “Error floors of LDPC codes,” in Proceedings of the 41st Allerton Conference on Communications, Control, and Computing, , Monticello, IL: Allterton House, Oct. 2003. [370] T. Richardson and V. Novichkov, “Methods and Apparatus for Decoding LDPC Codes,” U.S. Patent 6,633,856, Oct. 14 2003. [371] T. Richardson and V. Novichkov, “Node Processors For Use in Parity Check Decoders,” U.S. Patent 6938196, Aug. 30, 2005, Flarion Technologies. [372] ——, “Method and Apparatus for Decoding LDPC Codes,” U.S. Patent 7,133,853, Nov. 7, 2006. [373] T. J. Richardson, M. A. Shokrollahi, and R. L. Urbanke, “Design of Capacity-Approaching Irregular Low-Density Parity-Check Codes,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 619–637, Feb. 2001.

References [374] T. J. Richardson and R. L. Urbanke, “Efficient Encoding of Low-Density Parity-Check Codes,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 638–656, Feb. 2001. [375] T. J. Richardson and R. Urbanke, “The Capacity of Low-Density Parity-Check Codes Under Message-Passing Decoding,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 599–618, Feb. 2001. [376] T. J. Richardson and R. Urbanke, Iterative Coding Systems. (available online), Mar. 30, 2003. [377] T. Richardson and R. Urbanke, Modern Coding Theory. Cambridge: Cambridge University Press, 2008. [378] R. Rivest, A. Shamir, and L. Adleman, “A Method for Obtaining Digital Signatures and Public-Key Cryptosystems,” Commun. ACM, vol. 21, no. 2, pp. 120–126, 1978. [379] P. Robertson, “Illuminating the Structure of Parallel Concatenated Recursive (TURBO) Codes,” in Proceedings of the GLOBECOM, vol. 3, San Francisco, CA, Nov. 1994, pp. 1298–1303. [380] P. Robertson, E. Villebrun, and P. Hoher, “A Comparison of Optimal and Sub-Optimal MAP Decoding Algorithms Operating in the Log Domain,” in Proceedings of the International Conference on Communication (ICC), Seattle, WA, Jun. 1995, pp. 1009–1013. [381] C. Roos, “A New Lower Bound for the Minimum Distance of a Cyclic Code,” IEEE Trans. Inf. Theory, vol. 29, no. 3, pp. 330–332, May 1983. [382] R. M. Roth and G. Ruckenstein, “Efficient Decoding of Reed-Solomon Codes Beyond Half the Minimum Distance,” IEEE Trans. Inf. Theory, vol. 46, no. 1, pp. 246–256, Jan. 2000. [383] L. Rudolph, “Easily Implemented Error-Correction Encoding-Decoding,” G.E. Report 62MCD2, General Electric Corporation, Oklahoma City, OK, Technical Report, Dec. 1962. [384] ——, “Geometric Configuration and Majority Logic Decodable Codes,” Master’s thesis, Univeristy of Oklahoma, Norman, OK, 1964. [385] ——, “A Class of Majority Logic Decodable Codes,” IEEE Trans. Inf. Theory, vol. 13, pp. 305–307, Apr. 1967. [386] W. E. Ryan and S. Lin, Channel Codes Classical and Modern. Cambridge:Cambridge University Press, 2009. [387] J. Sadowsky and J. Bucklew, “On Large Deviation Theory and Asymptotically Efficient Monte Carlo Estimation,” IEEE Trans. Inf. Theory, vol. 36, no. 3, pp. 579–588, May 1990. [388] S. Sankaranarayanan, A. Cvetkovi`c, and B. Vasi`c, “Unequal Error Protection for Joint Source-Channel Coding Schemes,” in Proceedings of the International Telemetering Conference (ITC), Las Vegas, NV, 2003, p. 1272. [389] G. Sarkis, P. Giard, A. Vardy, C. Thibeault, and W. J. Gross, “Fast List Decoders for Polar Codes,” IEEE J. Sel. Areas Commun., vol. 34, no. 2, pp. 318–328, Feb. 2016. [390] G. Sarkis, P. Giard, A. Vardy, C. Thibeault, and W. J. Gross, “Flexible and Low-Complexity Encoding and Decoding of Systematic Polar Codes,” IEEE Trans. Commun., vol. 64, no. 7, pp. 2732–2745, Jul. 2016. [391] ——, “Fast Polar Decoders: Algorithm and Implementation,” IEEE J. Sel. Areas Commun., vol. 32, no. 5, pp. 946–957, May 2014. [392] G. Sarkis and W. J. Gross, “Inreasing the Throughput of Polar Decoders,” IEEE Commun. Lett., vol. 32, no. 5, pp. 725–728, Apr. 2013. [393] G. Sarkis, I. Tal, P. Giard, A. Vardy, C. Thibeault, and W. J. Gross, “Flexible and Low-Complexity Encoding and Decoding of Systematic Polar Codes,” IEEE Trans. Commun., vol. 64, no. 7, pp. 2732–2745, Jul. 2016. [394] D. V. Sarwate and M. B. Pursley, “Crosscorrelation Properties of Pseudorandom and Related Sequences,” Proc. IEEE, vol. 68, no. 5, pp. 593–619, May 1980. [395] E. Sa¸ ¸ soˇglu, E. Telatar, and E. 
Ar𝚤kan, “Polarization for Arbitrary Discrete Memoryless Channels,” in Proceedings of the IEEE Information Theory Workshop, 2009, pp. 144–148. [396] J. Savage, “Sequential Decoding — The Computational Problem,” Bell Syst. Tech. J., vol. 45, pp. 149–175, Jan. 1966. [397] C. Schlegel, Trellis Coding. New York: IEEE Press, 1997. [398] A. Schrijver, Theory of Linear and Integer Programming. New York: Wiley, 1987. [399] M. Schroeder, Number Theory in Science and Communication: with Applications in Cryptography, Physics, Digital Information. Berlin:Springer-Verlag, 1986. [400] R. Sedgewick, Algorithms. Reading, MA: Addison-Wesley, 1983. [401] J. Seghers, “On the Free Distance of TURBO Codes and Related Product Codes,” Swiss Federal Institute of Technology, Zurich, Switzerland, Final Report, Diploma Project SS 1995 6613, Aug. 1995. [402] N. Seshadri and C.-E. W. Sundberg, “Multi-level Trellis Coded Modulation for the Rayleigh Fading Channel,” IEEE Trans. Commun., vol. 41, pp. 1300–1310, Sep. 1993. [403] K. Shanmugan and P. Balaban, “A Modified Monte Carlo Simulation Technique for Evaluation of Error Rate in Digital Communication Systems,” IEEE Trans. Commun., vol. 28, no. 11, pp. 1916–1924, Nov. 1980. [404] C. E. Shannon, “A Mathematical Theory of Communication,” Bell Syst. Tech. J., vol. 27, pp. 379–423, 1948. [405] R. Shao, M. Fossorier, and S. Lin, “Two Simple Stopping Criteria for Iterative Decoding,” in Proceedings of the IEEE Symposium on Information Theory, Cambridge, MA, Aug. 1998, p. 279. [406] R. Y. Shao, S. Lin, and M. P. Fossorier, “Two Simple Stopping Criteria for Turbo Decoding,” IEEE Trans. Commun., vol. 47, no. 8, pp. 1117–1120, Aug. 1999. [407] M. Shao and C. L. Nikias, “An ML/MMSE Estimation Approach to Blind Equalization,” in Proceedings of ICASSP, vol. 4. Adelaide: IEEE, 1994, pp. 569–572. [408] A. Shibutani, H. Suda, and F. Adachi, “Reducing Average Number of Turbo Decoding Iterations,” Electron. Lett., vol. 35, no. 9, pp. 70–71, Apr. 1999. [409] A. Shiozaki, “Decoding of Redundant Residue Polynomial Codes Using Euclid’s Algorithm,” IEEE Trans. Inf. Theory, vol. 34, no. 3, pp. 1351–1354, Sep. 1988. [410] A. Shokrollahi, “Raptor Codes,” IEEE Trans. Inf. Theory, vol. 52, no. 6, pp. 2551–2567, Jun. 2006.

References [411] J. J. Shynk, Probability, Random Variables, and Random Processes. Hoboken, NJ: Wiley, 2003. [412] R. Singleton, “Maximum Distance q-ary Codes,” IEEE Trans. Inf. Theory, vol. 10, pp. 116–118, 1964. [413] B. Sklar, “A Primer on Turbo Code Concepts,” IEEE Commun. Mag., vol. 35, no. 12, pp. 94–101, Dec. 1997. [414] P. J. Smith, M. Shafi, and H. Gao, “Quick Simulation: A Review of Importance Sampling Techniques in Communications Systems,” IEEE J. Sel. Areas Commun., vol. 15, no. 4, pp. 597–613, May 1997. [415] J. Snyders, “Reduced Lists of Error Patterns for Maximum Likelihood Soft Decoding,” IEEE Trans. Inf. Theory, vol. 37, pp. 1194–1200, Jul. 1991. [416] J. Snyders and Y. Be’ery, “Maximum Likelihood Soft Decoding of Binary Block Codes and Decoders for the Golay Codes,” IEEE Trans. Inf. Theory, vol. 35, pp. 963–975, Sep. 1989. [417] H. Song and B. V. Kumar, “Low-Density Parity-Check Codes For Partial Response Channels,” IEEE Signal Process. Mag., vol. 21, no. 1, pp. 56–66, Jan. 2004. [418] S. Song, B. Zhou, S. Lin, and K. Abdel-Ghaffar, “A Unified Approach to the Construction of Binary and Nonbinary Quasi-Cylic LDPC Codes Based on Finite Fields,” IEEE Trans. Commun., vol. 54, no. 10, pp. 1765–1774, Jan. 2009. [419] R. Srinivasan, Importance Sampling: Applications in Communications and Detection. Berlin: Springer-Verlag, 2002. [420] P. Staahl, J. B. Anderson, and R. Johannesson, “Optimal and Near-Optimal Encoders for Short and Moderate-Length Tail-Biting Trellises,” IEEE Trans. Inf. Theory, vol. 45, no. 7, pp. 2562–2571, Nov. 1999. [421] H. Stichtenoth, Algebraic Function Fields and Codes. Berlin: Springer-Verlag, 1993. [422] G. Stüber, Principles of Mobile Communication, 2nd ed. Boston: Kluwer Academic Press, 2001. [423] M. Sudan, “Decoding of Reed-Solomon Codes Beyond the Error-Correction Bound,” J. Complex., vol. 13, pp. 180–193, 1997. [424] Y. Sugiyama, M. Kasahara, S. Hirasawa, and T. Namekawa, “A Method for Solving Key Equation for Goppa Codes,” Inf. Control, vol. 27, pp. 87–99, 1975. [425] C.-E. W. Sundberg and N. Seshadri, “Coded Modulation for Fading Channels: An Overview,” Eur. Trans. Telecommun. Relat.Technol., vol. 4, no. 3, pp. 309–324, May 1993, special issue on Application of Coded Modulation Techniques. [426] A. Swindlehurst and G. Leus, “Blind and Semi-Blind Equalization for Generalized Space-Time Block Codes,” IEEE Trans. Signal Process., vol. 50, no. 10, pp. 2489–2498, Oct. 2002. [427] G. Szegö, Orthogonal Polynomials, 3rd ed. Providence, RI: American Mathematical Society, 1967. [428] D. Taipale and M. Pursley, “An Improvement to Generalized-Minimum-Distance Decoding,” IEEE Trans. Inf. Theory, vol. 37, pp. 167–172, Jan. 1991. [429] O. Takeshita and J. D.J. Costello, “New Classes of Algebraic Interleavers for Turbo-Codes,” in International Conference on Information Theory (Abstracts), Cambridge, MA, Aug. 1998. [430] ——, “New Deterministic Interleaver Designs for Turbo Codes,” IEEE Trans. Inf. Theory, vol. 46, pp. 1988–2006, Sep. 2000. [431] I. Tal and A. Vardy, “List Decoding of Polar Codes,” IEEE Trans. Inf. Theory, vol. 61, no. 5, pp. 2213–2226, May 2015. [432] R. Tanner, “A Recursive Approach To Low Complexity Codes,” IEEE Trans. Inf. Theory, vol. 27, no. 5, pp. 533–547, Sep. 1981. [433] R. M. Tanner, “On Quasi-Cyclic Repeat-Accumulate Codes,” in Proceedingsof the 37th Allerton Conference on Communication, Control, and Computing, Monticello, IL, 1999. [434] V. Tarokh, H. Jararkhami, and A. 
Calderbank, “Space-Time Block Codes from Orthogonal Designs,” IEEE Trans. Inf. Theory, vol. 45, no. 5, pp. 1456–1467, Jul. 1999. [435] V. Tarokh, N. Seshadri, and A. Calderbank, “Space-Time Codes for High Data Rate Wireless Communication: Performance Criterion and Code Construction,” IEEE Trans. Inf. Theory, vol. 44, no. 2, pp. 744–765, Mar. 1998. [436] E. Telatar, “Capacity of Multi-Antenna Gaussian Channels,” AT&T Bell Labs Internal Technical Memorandum, Jun. 1995. [437] ——, “The Flesh of Polar Codes,” ISIT Plenary Talk, https://youtu.be/VhyoZSB9g0w, Jun. 29, 2017. [438] S. ten Brink, “Iterative Decoding Trajectories of Parallel Concatenated Codes,” in Third IEEE ITG Conference on Source and Channel Coding, Munich, Germany, Jan. 2000. [439] ——, “Rate One-Half Code for Approaching the Shannon Limit by 0.1 dB,” Electron. Lett., vol. 36, pp. 1293–1294, Jul. 2000. [440] ——, “Convergence Behavior of Iteratively Decoded Parallel Concatenated Codes,” IEEE Trans. Commun., vol. 49, no. 10, pp. 1727–1737, Oct. 2001. [441] J. Thorpe, “Low Density Parity Check (LDPC) Codes Constructed from Protographs,” JPL INP Progress Report 42-154, Aug. 15, 2003. [442] T. Tian, C. R. Jones, J. D. Villasenor, and R. D. Wesel, “Selective Avoidance of Cycles in Irregular LDPC Code Construction,” IEEE Trans. Commun., vol. 52, no. 8, pp. 1242–1247, Aug. 2004. [443] T. Tian, J. Villasenor, and R. Wesel, “Construction of Irregular LDPC codes with Low Error Floors,” in Proceedingsof the IEEE International Conference on Communication, Anchorage, AK, May 2003, pp. 3125–3129. [444] A. Tietäväinen, “On the Nonexistence of Perfect Codes Over Finite Fields,” SIAM J. Appl. Math., vol. 24, pp. 88–96, 1973. [445] O. Tirkkonen and A. Hottinen, “Square-Matrix Embeddable Space-Time Block Codes for Communication: Performance Criterion and Code Construction,” IEEE Trans. Inf. Theory, vol. 44, no. 2, pp. 744–765, Mar. 2002. [446] M. Tomlinson, “New Automatic Equalizer Employing Modulo Arithmetic,” Electron. Lett., vol. 7, pp. 138–139, Mar. 1971. [447] M. Trott, S. Benedetto, R. Garello, and M. Mondin, “Rotational Invariance of Trellis Codes Part I: Encoders and Precoders,” IEEE Trans. Inf. Theory, vol. 42, pp. 751–765, May 1996. [448] T.-K. Truong, J. H. Jeng, and K.-C. Hung, “Inversionless Decoding of Both Errors and Erasures of Reed-Solomon Code,” IEEE Trans. Commun., vol. 46, no. 8, pp. 973–976, Aug. 1998. [449] M. Tsfasman and S. Vlˇadu¸t, Algebraic-Geometric Codes. Dordrecht, The Netherlands: Kluwer Academic Publishers, 1991.

References [450] M. Tsfasman, S. Vlˇadu¸t, and T. Zink, “On Goppa Codes Which Are Better than the Varshamov-Gilbert Bound,” Math. Nachr., vol. 109, pp. 21–28, 1982. [451] M. Tüchler, R. Koetter, and A. C. Singer, “Turbo Equalization: Principles and New Results,” IEEE Trans. Commun., vol. 50, no. 5, pp. 754–766, May 2002. [452] G. Ungerboeck, “Channel Coding with Multilevel/Phase Signals,” IEEE Trans. Inf. Theory, vol. 28, no. 1, pp. 55–67, Jan. 1982. [453] ——, “Trellis-Coded Modulation with Redundant Signal Sets. Part I: Introduction,” IEEE Commun. Mag., vol. 25, no. 2, pp. 5–11, Feb. 1987. [454] ——, “Trellis-Coded Modulation with Redundant Signal Sets. Part II: State of the Art,” IEEE Commun. Mag., vol. 25, no. 2, pp. 12–21, Feb. 1987. [455] G. Ungerböck and I. Csajka, “On Improving Data-Link Performance by Increasing Channel Alphabet and Introducing Sequence Coding,” in IEEE International Symposium on Information Theory, Ronneby, Sweden, Jun. 1976, p. 53. [456] A. Valembois and M. Fossorier, “An Improved Method to Compute Lists of Binary Vectors that Optimize a Given Weight Function with Application to Soft Decision Decoding,” IEEE Commun. Lett., vol. 5, pp. 456–458, Nov. 2001. [457] G. van der Geer and H. van Lint, Introduction to Coding Theory and Algebraic Geometry. Basel: Birkhäuser, 1988. [458] J. van Lint, Introduction to Coding Theory, 2nd ed. Berlin: Springer-Verlag, 1992. [459] H. Vangala, “Polar codes in Matlab,” http://www.harishvangala.com or http://www.ecse.monash.edu.au/staff/eviterbo/polarcodes.html. [460] H. Vangala, Y. Hong, and E. Viterbo, “Efficient Algorithms for Systematic Polar Encoding,” IEEE Commun. Lett., vol. 20, no. 1, pp. 17–19, 2016. [461] ——, “Practical Introduction to Polar Codes: Part 4 The Decoding,” https://youtu.be/s4lzKbDxJpo, Feb. 23, 2016. [462] H. Vangala, E. Viterbo, and Y. Hong, “A Comparative Study of Polar Code Constructions for the AWGN Channel,” CoRR, vol. abs/1501.02473, 2015. [Online]. http://arxiv.org/abs/1501.02473 [463] B. Vasic, “Structured Iteratively Decodable Codes Based on Steiner Systems and their Application in Magnetic Recording,” in Global Telecommunications Conference (GLOBECOM). San Antonio, TX: IEEE, Nov. 2001, pp. 2954–2960. [464] B. Vasic, E. Kurtas, and A. Kuznetsov, “LDPC Codes Based on Mutually Orthogonal Latin Rectangles and their Application in Perpendicular Magnetic Recording,” IEEE Trans. Mag., vol. 38, no. 5, pp. 2346–2348, Sep. 2002. [465] B. Vasic and O. Milenkovic, “Combinatorial Constructions of Low-Density Parity-Check Codes for Iterative Decoding,” IEEE Trans. Inf. Theory, vol. 50, no. 6, pp. 1156–1176, Jun. 2004. [466] S. Verdu, “Optimum Multi-User Signal Detection,” Ph.D. dissertation, University of Illinois, 1984. [467] A. J. Viterbi, “Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm,” IEEE Trans. Info. Theory, vol. 13, pp. 260–269, Apr. 1967. [468] ——, “Convolutional Codes and Their Performance in Communication Systems,” IEEE Trans. Commun. Technol., vol. 19, no. 5, pp. 75–772, Oct. 1971. [469] A. J. Viterbi, “Approaching the Shannon Limit: Theorist’s Dream and Practitioner’s Challenge,” in Proceedings of the International Conference on Millimeter Wave and Far Infrared Science and Technology, Beijing, 1996, pp. 1–11. [470] ——, “An Intuitive Justification and Simplified Implementation of the MAP Decoder for Convolutional Codes,” IEEE J. Spec Areas Commun., vol. 2, no. 16, pp. 260–264, Feb. 1997. [471] A. J. Viterbi and J. 
Omura, Principles of Digital Communication and Coding. New York: McGraw-Hill, 1979. [472] J. von zur Gathen and J. Gerhard, Modern Computer Algebra. Cambridge: Cambridge University Press, 1999. [473] T. Wadayama, K. Nakamura, M. Yagita, Y. Funahashi, S. Usami, and I. Takumi, “Gradient Descent Bit Flipping Algorithms for Decoding LDPC Codes,” IEEE Trans. Commun., vol. 58, no. 6, pp. 1610–1614, Jun. 2010. [474] N. Wang and R. Srinivasan, “On Importance Sampling for Iteratively Decoded Linear Block Codes,” in Proceedings of the IEEE International Conference on Communications, Circuits, and Systems, Hong Kong, 2005, pp. 18–23. [475] L. Wei, “Rotationally Invariant Convolutional Channel Coding with Expanded Signal Space I: 180 Degrees,” IEEE J. Sel. Areas Commun., vol. 2, pp. 659–672, 1984. [476] ——, “Rotationally Invariant Convolutional Channel Coding with Expanded Signal Space II: Nonlinear Codes,” IEEE J. Sel. Areas Commun., vol. 2, pp. 672–686, 1984. [477] ——, “Trellis–Coded Modulation with Multidimensional Constellations,” IEEE Trans. Inf. Theory, vol. 33, no. 4, pp. 483–501, Jul. 1987. [478] ——, “Rotationally Invariant Trellis-Coded Modulation with Multidimensional M-PSK,” IEEE J. Sel. Areas Commun., vol. 7, no. 9, pp. 1281–1295, 1989. [479] L.-F. Wei, “Coded M-DPSK with Built-in Time Diversity for Fading Channels,” IEEE Trans. Inf. Theory, vol. 39, pp. 1820–1839, Nov. 1993. [480] L. R. Welch and E. R. Berlekamp, “Error Correction for Algebraic Block Codes,” U.S. Patent Number 4,633,470, Dec. 30, 1986. [481] N. Wiberg, “Codes and Decoding on General Graphs,” Ph.D. dissertation, Linköping University, 1996. [482] N. Wiberg, H.-A. Loeliger, and R. Kötter, “Codes and Iterative Decoding on General Graphs,” Eur. Trans. Telecommun., vol. 6, pp. 513–525, 1995. [483] S. B. Wicker, Error Control Systems for Digital Communications and Storage. Englewood Cliffs, NJ: Prentice-Hall, 1995. [484] S. Wicker and V. Bhargava, Reed Solomon Codes and their Applications. New York: IEEE Press, 1994. [485] S. B. Wicker and V. K. Bhargava, Reed-Solomon Codes and their Applications. IEEE Press, 1994. [486] R. J. Wilson, Introduction to Graph Theory. Essex, England: Longman Scientific, 1985.

References [487] S. Wilson and Y. Leung, “Trellis Coded Phase Modulation on Rayleigh Fading Channels,” in Proceedings of the IEEE ICC, Montreal, QC, Jun. 1997. [488] J. K. Wolf, “Efficient Maximum Likelihood Decoding of Linear Block Codes Using a Trellis,” IEEE Trans. Inf. Theory, vol. 24, no. 1, pp. 76–80, Jan. 1978. [489] J. Wozencraft and B. Reiffen, Sequential Decoding. Cambridge, MA: MIT Press, 1961. [490] X.-W. Wu and P. Siegel, “Efficient Root-Finding Algorithm with Application to List Decoding of Algebraic-Geometric Codes,” IEEE Trans. Inf. Theory, vol. 47, pp. 2579–2587, Sep. 2001. [491] Y. Wu, B. D. Woerner, and W. J. Ebel, “A Simple Stopping Criterion for Turbo Decoding,” IEEE Commun. Lett., vol. 4, no. 8, pp. 258–260, Aug. 2000. [492] B. Xia and W. Ryan, “On Importance Sampling for Linear Block Codes,” in Proceedings of the IEEE International Conference on Communications, vol. 4, Anchorage, AK, May 2003, pp. 2904–2908. [493] B. Xia and W. E. Ryan, “Importance Sampling for Tanner Trees,” IEEE Trans. Inf. Theory, vol. 51, no. 6, pp. 2183–2189, Jun. 2005. [494] H. Xiao and A. Banihashemi, “Improved Progressive-Edge-Growth (PEG) Construction of Irregular LDPC Codes,” IEEE Commun. Lett., vol. 8, no. 12, pp. 715–717, Dec. 2004. [495] J. Xu, L. Chen, L.-Q. Zeng, L. Han, and S. Lin, “Construction of Low-Density Parity-Check Codes by Superposition,” IEEE Trans. Commun., vol. 53, no. 2, pp. 243–251, Feb. 2005. [496] J. Xu and S. Lin, “A Combinatoric Superposition Method for Constructive Low-Density Parity-Check Codes,” in Proceedingsof the International Symposium on Information Theory, vol. 30, Yokohama, June–July 2003. [497] J. S. Yedidia, Y. Wang, and S. C. Draper, “Divide and Concur and Difference-Map BP Decoders for LDPC Codes,” IEEE Trans. Inf. Theory, vol. 57, no. 2, pp. 786–802, Feb. 2011. [498] R. W. Yeung, A First Course in Information Theory. New York: Kluwer Academic, 2002. [499] B. Yuan and K. K. Parhi, “Successive Cancellation List Polar Decoding Using Log-Likelihood Ratios,” arXiv 14117282, 2014. [500] E. Zehavi and J. Wolf, “On the Performance Evaluation of Trellis Codes,” IEEE Trans. Inf. Theory, vol. 32, pp. 196–202, Mar. 1987. [501] F. Zhai and I. J. Fair, “New Error Detection Techniques and Stopping Criteria for Turbo Decoding,” in IEEE Canadian Conference on Electronics and Computer Engineering (CCECE), Halifax, NS, 2000, pp. 58–60. [502] W. Zhang, “Finite-State Machines in Communications,” Ph.D. dissertation, University of South Australia, 1995. [503] Y. Zhang, “Design of Low-Floor Quasi-Cyclic IRA Codes and the FPGA Decoders,” Ph.D. dissertation, University of Arizona, May 2007. [504] Z. Zhang, L. Dolecek, B. Nikolic, V. Anantharam, and M. Wainwright, “Investigation of Error Floors of Structured Low-Density Parity Check Codes by Hardware Emulation,” in Proceedings of the IEEE Globecom, San Francisco, CA, 2006. [505] J. Zhang and M. Fossorier, “A Modified Weighted Bit-Flipping Decoding of Low-Density Parity-Check Codes,” IEEE Commun. Lett., vol. 8, pp. 165–167, Mar. 2004. [506] Y. Zhang and W. Ryan, “Structured IRA Codes: Performance, Analysis, and Construction,” IEEE Trans. Commun., vol. 55, no. 5, pp. 837–844, May 2007. [507] ——, “Toward Low LDPC Floors: A Case Study,” IEEE Trans. Commun., vol. 57, no. 5, pp. 1566–1573, May 2009. [508] Y. Zhang, W. Ryan, and Y. Li, “Structured eIRA codes,” in Proceedings of the 38th Asilomar Conference on Signals, Systems, and Computing, Pacific Grove, CA, Nov. 2004, pp. 7–10. [509] R. E. Ziemer and R. L. 
Peterson, Digital Communications and Spread Spectrum Systems. New York: Macmillan, 1985. [510] N. Zierler, “Linear Recurring Sequences,” J. Soc. Ind. Appl. Math., vol. 7, pp. 31–48, 1959. [511] K. S. Zigangirov, “Some Sequential Decoding Procedures,” Prob. Peredachi Inf., vol. 2, pp. 13–25, 1966. [512] V. Zyablov and M. Pinsker, “Estimation of the Error-Correction Complexity of Gallager Low-Density Codes,” Prob. Pered. Inf., vol. 11, pp. 23–26, Jan. 1975.

Index

A

A1-2.txt, 711 A1-3.txt, 711 Abelian, 72 absorbing set, 775 accumulate-repeat-accumulate codes, 774 add-compare-select (ACS), 483 additive white Gaussian noise channel (AWGNC), 9 adjacent, 460 Agall.m, 640 Agall.tx, 640 Alamouti code, 909 𝛼 (BCJR algorithm), 598 𝛼t (forward probability, MAP algorithm), 599 𝛼-multiplied row constraint, 731 𝛼 q (𝛿), 408 alternant code, 283 aminstartdecode, 662 annihilator, 165 approximate cycle extrinsic message degree (ACE), 748 Aq (n,d), 408 arithmetic coding, 7 arithmetic–geometric inequality, 337 Asmall.tx, 710 associative, 71 asymptotic coding gain, 112 convolutional code, 505 TCM code, 551 asymptotic equipartition property (AEP), 49 asymptotic rate, 408 augmented block code, 116 automorphism, 90

B ⊞, 644 B-DMC, 782 backward pass BCJR algorithm, 601 Barker code, 177 base field, 204 basic transfer function matrix, 465 basis of vector space, 86 Battacharyya parameter, 826 Bayes’ rule, 17 BCH bound, 244, 245 BCH code, 241 decoder programming, 289 design, 241 design distance, 244 narrow sense, 241 primitive, 241 weight distribution, 246 BCHdec.cc, 289 BCHdec.h, 289 bchdesigner, 248 bchweight.m, 247 BCJR algorithm, 471, 596 matrix formulation, 605 BCJR.cc, 631 BCJR.h, 631

Bellman’s principle, 476 Berlekamp–Massey algorithm, 261 programming, 287 Bernoulli random process, 23 best known code, 116 𝛽t (backward probability, MAP algorithm), 599 𝛽 (BCJR algorithm), 599 Bhattacharya bound, 502 bijective, 79 binary additive white Gaussian noise channel (BAWGNC), 48 binary erasure channel, 541 binary detection, 18 binary operation, 71 binary phase-shift keying (BPSK), 10 binary symmetric channel (BSC), 23 BinConv.h, 531 BinConvdec01.h, 533 BinConvdecBPSK.cc, 538 BinConvdecBPSK.h, 538 BinConvFIR.cc, 536 BinConvFIR.h, 536 BinConvIIR.cc, 536 BinConvIIR.h, 536 BinLFSR.cc, 169 BinLFSR.h, 169 binomial theorem, 208 BinPolyDiv.cc, 170 BinPolyDiv.h, 170 bipartite, 641 bipartite graph, 459, 641 bit error rate, 108, 492 bitflip1decode(), 670 block code definition, 93 trellis representation, 37, 533 block turbo coding, 626 Bluestein chirp algorithm, 296 Boolean functions, 377 bound comparison of, 408 Elias, 420 Gilbert–Varshamov, 120, 410 Griesmer, 412 Hamming, 99, 407 linear programming, 414 McEliece–Rodemich–Rumsey–Welch, 419 Plotkin, 120, 411 Singleton, 98, 407 Varshamov–Gilbert, 248 bounded distance decoding, 29, 103, 110, 326, 452 bpskprob.m, 21 bpskprobplot.m, 21 branch metric, 475 bsc, 000 bucket algorithm, 514 burst error, 425 detection, cyclic codes, 157

C canonical homomorphism, 82 capacity, 9 BSC, 67 Cartesian product, 73 catastrophic code, 464, 540 Cauchy–Schwartz inequality, 412 causal codeword, 357 cawgnc.m, 54 cawgnc2.m, 49 Cayley–Hamilton theorem, 177 cbawgnc.m, 54 cbawgnc2.m, 49 central limit theorem, 904 channel reliability, 19, 611 channel capacity, 46 BSC, 46 of MIMO channel, 919 channel coding theorem, 9, 48 LDPC codes and, 637 turbo codes and, 591 channel polarization, 782, 828 channel reliability, 648 character (of a group), 415 characteristic polynomial, 177 Chase decoding algorithms, 444 Chauhan, Ojas, 543 checksum, 64 Chernoff bound, 502, 542 chernoff1.m, 502 𝜒 p (x), 374 𝜒 (character), 415 𝜒 2 distribution, 908, 922 chi-squared (𝜒 2 ) random variable, 922 Chien search, 255 Chiensearch.cc, 289 Chiensearch.h, 289 Chinese remainder theorem, 196 chirp algorithm, 296 chk function, 644 Christoffel–Darboux formula, 419 circuits, see realizations circulant matrix, 725 circulant permutation matrix (CPM), 730 code Alamouti, 909 alternant, 284 BCH, 241 best known, 116 convolutional, 455 CRC, 155 dual, 96 Fire, 432 generalized RS, 283 Golay, 399 Goppa, 285 Hadamard, 376 Hamming, 33, 61 IRA, 721 Justeson, 297 LDPC, 637 maximal-length, 107

Index code (contd.) MDS, 252 paritycheck, 116 polar, 779 quadratic residue, 397 Reed–Muller, 378 Reed–Solomon, 249 repeat accumulate (RA), 717 repetition, 27 self-dual, 118, 400 simplex, 107, 376 space-time block, 909 space-time trellis, 917 TCM, 545 turbo, 593 coding gain, 35 asymptotic, 112 column distance function, 524 column space, 86 commutative, 72 compact disc (CD), 40, 428 companion matrix, 177 compare-select-add (CSA), 483 complete decoder, 103 comptut.pdf, xxii computekm.m, 336 computeLbar.m, 358 computeLm.cc, 336 computeLm.m, 336, 358 computetm.m, 358 concatenated codes, 431 concentration principle, 639 concentration theorem, 742 concodequant.m, 487 conditional entropy, 41 congruence, 192 conjugacy class, 217 conjugate elements, 216 conjugate of field element, 217 connection polynomial, 140 consistent random variables, 623, 696 constellation expansion factor, 550 constituent codes, 867 constituent encoder, 593 constraint length, 467 continued fraction, 233 Convdec.cc, 533 Convdec.h, 533 convolutional code, 455 equivalent, 463 feedforward/feedback encoder, 457 Markov property, 598 tables of codes, 506 correction distance, of GS decoder, 336 correlation, 170 correlation discrepancy, 451 coset, 76 of standard array, 102 coset code, 806 coset leader weight distribution, 110 cross entropy, 41 stopping criteria, 612 crtgamma.m, 197 crtgammapoly.m, 197 cryptography, 8 McEliece public key, 286 RSA public key, 193 cyclic code, 36

burst detection, 157 definition, 123 encoding, 143 cyclic group, 75 cyclic redundancy check (CRC) code, 155 byte oriented algorithm, 157 cyclic shift, 123 cyclic shift register adder accumulator (CSRAA), 726 cyclomin, 223 cyclotomic coset, 223

D D-transform, 137, 455 data processing inequality, 43 dB scale, 21 DBV-S2, 637 DC1decode(), 674 decibel, 21 decode(), 649 decoder failure, 29, 103, 256 decoding depth, 483 densev1.m, 698 densevtest.m, 698 density evolution, 696, 742 irregular codes, 704 derivative formal, 269, 296 Hasse, 332, 333 derivative, formal, 224 design distance, 244 design rate, 638 detected bit error rate, 108 detection binary, 18 dfree , 496 difference sets, 742, 775 differential encoder, 566 differential Viterbi algorithm, 483 digraph, 460 Dijkstra’s algorithm, 474 dimension of linear code, 93 of vector space, 86 direct product, 73 direct sum, 393 matrix, 394 directed graph, 460 discrete Fourier transform (DFT), 200, 277, 278 discrete memoryless channel, 781 displacement, 370 distance distribution, 414 distance spectrum, 556, 618 distributive law, 83, 84, 124 diversity, 901, 903 delay, 917 frequency, 903 spatial, 903 time, 903 diversity order, 909 space-time code, 912 divide and concur, 674 divides, 78, 182 divisible, 182 division algorithm, 183 polynomial, 124 DMBPdecode(), 682

DMC, 781

doexitchart.m, 700 dotrajectory.m, 700 double-adjacent-error pattern, 175 double error pattern, 176 D(P||Q), 41 dual code, 96 cyclic generator, 175 dual space, 88

E Eb , 11 Ec , 25 edge, 459 eigenmessage, 715 eight-to-fourteen code, 428 Eisenstein integer, 573 elementary symmetric functions, 258 Elias bound, 420 encryption, RSA, 195 entropy, 5, 39 differential, 47 function, binary, 5, 421 function, q-ary, 408 equivalence relation, 77 equivalent block codes, 95 convolutional code, 463 erase.mag, 276 erasure decoding, 113 binary, 114 Reed–Solomon codes, 274 Welch–Berlekamp decoding, 326 error-evaluator polynomial, 269 error floor, 592, 695, 710, 750 error locator polynomial, 253, 255 error locators, 254 error rate, 108 error trapping decoder, 434 Euclidean algorithm, 184, 370 continued fractions and, 233 extended, 188 to find error locator, 272 LFSR and, 189 matrix formulation, 232, 233 Padé approximations and, 234 properties, 233 Reed–Solomon decoding, 272 Euclidean distance, 13 Euclidean domain, 187 Euclidean function, 187 Euclidean geometry over GF(q), 736 Euler 𝜙 function, 193, 234 Euler integration formula, 369 Euler’s theorem, 194 evaluation homomorphism, 198 exact sequence, 317 EXIT chart, 622 LDPC code, 700 exit1.m, 700 exit2.m, 700 exit3.m, 700 expurgated block code, 116 extended code, 114 Reed–Solomon, 283 extension field, 203 extrinsic information, 610 extrinsic probability, 591, 607

F

factor group, 80 factoring xn – 1 , 134 factorization step, GS decoder, 328, 334 factorization theorem, 335 fadepbplot.m, 904 fadeplot.m, 903 fading channel, 901 flat, 903 quasistatic, 904 Rayleigh, 904 family (set), 459 Fano algorithm, 510, 516 Fano metric, 510, 513 fanoalg.m, 516 fanomet.m, 514 fast Hadamard transform, 383 feedback encoder, 457 feedforward encoder, 457 Feng–Tzeng algorithm, 341 fengtzeng.m, 344 Fermat’s little theorem, 194 fht.cc, 384 fht.m, 384 field, 82, 200 finite, 200 finddfree, 506 finite field, 200 finite geometry, 736 finite impulse response, 138 Fire code, 432 flow graph simplification, 495 formal derivative, 238, 269, 296 formal series, 495 Forney’s algorithm, 268 forward-backward algorithm, 601 forward/backward table, 657 forward pass, BCJR algorithm, 601 free distance, 496 free Euclidean distance, 550 free module, 308 frequency domain decoding, 282 frequency domain vector, 279 freshman exponentiation, 208 fromcrt.m, 197 fromcrtpolym., 197 frozen bit, 793, 806 fundamental parallelotope, 572 fundamental theorem of algebra, 203 fundamental volume, 572

G

galAbitflipdecode(), 671 Galois biography, 204 Galois field, example, 204 Galois field Fourier transform (GFFT), 277 𝛾 (BCJR algorithm), 601 𝛾t (transition probability), 599 gap, xxii Gaussian integers, 237 gaussj2, 95 gaussj2.m, 638 gcd.c, 177 gcdpoly.cc, 229 generalized minimum distance (GMD) decoding, 370, 441 generalized Reed–Solomon code, 283 generating function, 224

generator of cyclic group, 75 matrix, 94 lattice, 570 polynomial, 131 of principal ideal, 129 genie-aided receiver, 785 genrm.cc, 378 genstdarray.c, 101 getinf.m, 700 getinfs.m, 700 GF(q), 204 GF(16) table, 205 GF2.h, 230 gf257ex.m, 732 GFNUM2m.cc, 230 GFNUM2m.h, 230 Gilbert–Varshamov bound, 120, 410 girth, 641 of graph, 460, 714 Golay code, 399 algebraic decoder, 401 arithmetic decoder, 402 golayrith.m, 403 golaysimp.m, 401 good codes, 639 Goppa code, 285 Gröbner basis, 370

grad1bitflipdecode(), 673 grad2bitflipdecode(), 673 Gram matrix, 572 graph bipartite, 459, 641 definitions, 459 simple, 459 Tanner, 641 tree, 460 trellis, 459 Gray code, 23 order, 13 greatest common divisor (GCD), 183 Green machine, 385 Griesmer bound, 412 ground field, 208 group cyclic, 75 definition, 71 Guruswami–Sudan algorithm, 326

H

H(X) (entropy), 5, 39 H2(x), 5 h2.m, 49 Hölder’s inequality, 878 Hadamard matrix, 371 transform, 105, 371 transform, decoding, 381 transform, fast, 383 Hadamard code, 376 hadex.m, 384 hamcode74pe.m, 35 Hamming bound, 99, 407 asymptotic, 423 Hamming code, 33, 61, 106 decoder, 150 dual, 106, 107

encoder circuit, 144 extended, 378 message passing decoder, 695 Tanner graph, 38 Hamming distance, 23 Hamming sphere, 28, 98, 407 Hamming weight, 93 Hammsphere, 98 hard-decision decoding convolution code, 486 Hartman–Tzeng bound, 246, 370 Hasse derivative, 332, 333 theorem, 332 hill climbing, 707 historical milestones, 39 homomorphism, 81 Horner’s rule, 290 Huffman code, 7

I I(X, Y), 42 i.i.d., 23, 49 ideal of ring definition, 128 principal, 129 identity, 71 IEEE wireless standard, 910 importance sampling, 68 increased distance factor, 550 induced operation, 78 inequality arithmetic–geometric, 337 Cauchy–Schwartz, 412 information, 67 information inequality, 67 information theory, 39 inequality, 42 injective, 79 inner product, 12, 87 input redundancy weight enumerating function (IRWEF), 618 interference suppression, 923 interleaver, 425 block, 425 convolutional, 427 cross, 427 turbo code, 593 interpolating polynomial, 198 interpolation, 198 step, GS decoder, 327, 334 interpolation theorem, 334 intersection of kernels, 345 invariant factor decomposition, 466 inverse, 71 invmodp.m, 344 involution, 845 IRA codes, 774 irreducible, 167 irreducible polynomial, 203, 204, 214 number of, 224 irregular repeat-accumulate-accumulate (IRAA), 722 irregular repeat accumulate code, 887 ISBN, 4 isomorphism, 79 ring, 128


J

Jacobian logarithm, 616 Jacobsthal matrix, 375 jakes.m, 903 Jakes method, 903 Justesen codes, 297

K kernel, 310, 315, 317 of homomorphism, 82 key equation, 269, 272 Welch–Berlekamp, 303 Kirkman triple, 742, 775 Klein 4-group, 73, 89 Kötter algorithm, 345 kotter.cc, 349 kotter1.cc, 353 krawtchouk.m, 415 Krawtchouk polynomial, 415, 417 properties, 423 Kronecker construction of Reed–Muller, 392 product properties, 372 product theorem, 372 Kullback–Leibler distance, 41, 612, 623

L Lagrange interpolation, 199, 235–236, 305, 309, 368 Lagrange’s theorem, 77–78, 194, 209 latency, 52, 113, 154, 426, 483, 695, 715, 916 latin rectangle, 742, 775 latta2.m, 576 lattice, 81, 90, 405, 569–578, 580, 582, 584, 586–587 code, 574–575 lattstuff.m, 570 lattz2m, 576 Laurent series, 455, 460 Lbarex.m, 358 LDPC codes, 637 combinatoric constructions, 742, 775 concentration principle, 639 concentration theorem, 742 decode threshold, 698 definition, 638–639 density evolution, 696–699 difference set, 742, 775 eigenmessage, 715 error floors, 750–751 EXIT chart, 700 fast encoding, 708 irregular, 702–708 iterative hard decoder, 715 Kirkman triple, 742, 775 latin rectangle, 742, 775 MET, 749–750 protograph, 749–750 Steiner designs, 742, 775 sum product decoding, 715 trapping sets, 750–751 LDPC code ensembles, 742–746 LDPC convolutional code, 723–725 LDPC decoding Approximate min*, 663

attenuated and offset decoding, 661–662 Binary erasure channel, 691–692 difference map belief propagation, 681–682 divide and concur, 674–681 Gallager’s algorithm A and B, 671–672 hard decision (bit flipping) decoding, 669–674 linear programming decoding, 682–691 Log-likelihood belief propagation, 643–644 Min-Sum with correction, 662–669 peeling algorithm, 691 peeling decoder, 691 probability belief propagation, 652–659 reduced complexity box-plus (RCBP), 665–667 Richardson–Novichkov decoder, 667–669 sum product decoding, 656 ldpc.m, 656 LDPCC Code, 717, 723, 775 ldpccodesetup1.m, 732 ldpcdec.cc, 695, 711 ldpcdec.h, 711 ldpcdecode.m, 656, 711 ldpcdecoder.cc, 642, 649, 655, 660, 661, 662, 663, 665, 673, 674, 682 ldpcencode.m, 732, 892 ldpclogdecode.m, 656 ldpcsim.mat, 700 ldpctest.cc, 711 ldpctest1, 695, 711, 748 ldpctest1.cc, 695, 711 ldpctest2.cc, 642 leading coefficient, 305, 331, 348 leading term, 129, 329, 348, 353 leading monomial, 331, 346, 353 least common multiple (LCM), 167, 232, 234, 241–243, 297, 432–434 least-squares, 100, 630, 715 leave-one-out computation, 645–646, 649, 657, 664, 682, 720 left inverse, 172 Legendre symbol, 373–375, 403 Lempel–Ziv coding, 7 lengthened block code, 116 lexicographic order, 329 LFSR, 9, 161–171, 176–178, 189–190, 212, 216, 229–230, 234, 239, 261–262, 264–266, 282, 287–288, 296–297, 300, 342, 533 for extension field, 207 likelihood function, 16, 18, 23–25, 31, 359, 365, 472, 474, 630, 673, 786, 807, 812 ratio, 18–19, 525, 596, 609–611, 622, 627, 632, 644, 647, 653, 712, 788, 807–810, 866 lim sup, 408, 410 linear code

definition, 93 dimension, 93 generator matrix, 94–96 rate, 93 linear combination, 85 linear feedback shift register see LFSR linear function, 237 linear programming, 419, 424, 682–691, 715 bound, 413–418 linearly dependent, 86, 96–98, 219, 245, 252, 308, 341–345, 638, 728–729 list decoder, 30, 299, 326–327, 357, 850–851, 854, 856–861, 865–867 lldecode(), 649 location vector, 730, 733–736 log likelihood, 472, 475, 488, 541, 596, 611, 615, 630, 633, 645–647, 649–650, 652, 696, 698, 810–811, 815, 825, 866, 869 ratio, 19, 31, 440, 451, 528, 609–610, 612, 622, 629, 641–644, 647, 656, 676–677, 699–701, 767, 788–791, 807, 824, 866, 883 log-likelihood function, 472–474, 511, 793, 807, 812, 906 lower complexity, 791 loghist.m, 700 low density generator matrix codes, 710–712 low density parity check see LDPC code lpboundex.m, 418 LPdecode(), 682 luck, 834

M M algorithm, 471, 524, 543 MacWilliams identity(ies), 105–106, 118, 121, 246, 247 magma, xxii majority logic decoding, 385, 388, 405, 452 makeB.m, 557 makegenfromA.m, 638 MakeLFSR, 169 MAP algorithm, 596 detection, 17 Markov property, convolutional codes, 598 Markov source, 596 martingale, 830–832 Mason’s rule, 497–498 masseymodM.m, 265 matched filter, 9–10, 15, 23, 60–61, 439–440, 447, 547, 893, 907–908, 910 matrix, 906 matrix dispersion, 730–736, 774–775 Mattson–Solomon polynomial, 295 max-log-MAP algorithm, 614–616, 635 maximal-length sequence, 162, 167 shift register, 239 maximal-length code, 107, 175 maximal ratio combiner, 907–908 maximum a posteriori detection see MAP

maximum distance separable (MDS) see MDS code maximum likelihood decoder, 29, 326–327 detection, 17 maximum-likelihood sequence estimator, 471 vector, 906 McEliece public key, 8, 286–287, 298 MDS code, 98, 249, 252–253, 283–284, 293, 298, 772 extended RS, 283 generalized Reed–Solomon, 283 weight distribution, 253 Meggitt decoder, 149–153, 173–174, 178, 405 memoryless channel, 23–25, 44, 51, 359, 362, 450, 472, 499, 526, 596, 647, 781, 790, 872 Mersenne prime, 239 MET ensembles, 745–746 LDPC codes, 749–750 metric quantization, 486–487 milestones, historical, 39, 40 MIMO channel, 895, 905–909, 923 narrowband, 907 mindist.m, 33 minimal basic convolutional encoder, 468 minimal basic encoder, 468–470, 539 minimal polynomial, xx, 216–223, 236, 238, 241–243, 245, 250, 279–280, 289, 296–297, 887, 889 minimum distance, 22, 27–28, 30, 32–33, 93, 97–99, 106–107, 116–118, 120–121, 123, 153, 174–175, 179, 232, 244–245, 249, 252, 281, 283, 285, 287, 293, 295, 297, 370, 376, 378–379, 441–444, 545, 638–639, 750, 762, 772 Minkowski’s inequality, 881 minsumcorrectdecode(), 659 minsumdecode(), 661 ML detection see maximum likelihood detection MLFSR code, 107 ModArnew.h, 228 module definition, 308 free, 308 modwtedbitflipdecode(), 673 Moebius function, 227–228 moment generating function, 225, 542 monic, 129–131, 183, 186, 218–220, 225–227 monomial ordering, 329, 331 multi-edge type see MET multiplicative order, 218–219 multiplicity matrix, 359–362 multiplicity of zero, 331 mutual information, 41–48, 57, 67, 622–625, 699–700, 779, 782–786, 804, 832, 839, 848, 876

N narrow sense BCH, 241, 243, 247–248, 258, 290 narrowband MIMO channel, 907 natural homomorphism, 82 nchoosektest.m, 35 Newton identities, 259, 291–293, 298 (n,k,dmin) code, 93, 112 nilpotent, 173 node error, 492, 494, 499–502, 541–542, 555–556, 560 normalization alpha and beta, 602–603 probability, 600 nullspace, 96, 305, 727–728, 730–732, 741 Nyquist sampling theorem, 53

O one-to-one, 12, 73, 79, 81, 93, 305, 783, 829 O notation, 409 onto, 9, 14, 73, 79, 82, 90, 365, 547, 674–675, 679, 829 ord, 209–210, 213, 215, 217–218, 220, 238, 331–333 order multiplicative, 218 of a field element, 209 of a finite group, 72 of a group element, 76 of zero, 331 ordered statistic decoding, 447–449 orthogonal, 12, 87, 90, 97, 371, 373, 390, 400, 546, 571, 683, 764–765, 910, 923 on a bit, 386–389 orthogonal complement, 88, 117 orthogonal design complex, 916 generalized complex, 916–917 generalized real, 915–916 real, 913 orthogonal polynomial, 419–420, 423–424 orthogonal matrix, 371, 402, 571, 913–914, 922 orthonormal, 12, 53, 546 output transition matrix, 557–558, 585

P P(E), 108 Padé approximation, 234, 239 Paley construction, 373–376 parallel concatenated code, 591, 593–595, 627 parity, 95, 131–133, 686–715, 717–775 overall, 115 parity check code, 116 equations, 97 matrix, 33, 96–99 polynomial, 133 probability, 714 symbols, 95 partial syndrome, 533 partition, 17, 52, 77, 78, 100, 217, 346, 385, 394–395, 442, 547, 553–554, 557, 561, 565, 575, 580, 582, 585, 598, 724, 734, 894

partition chain, 574, 575 path algorithm, shortest, 474 enumerator, 494 in graph, 460 merging, 475 metric, 474, 475 survivor, 476 Pb , 108 Pb (E), 108 Pd (E), 108 Pdb , 108 peak-to-average power ratio, 570 peeling algorithm, 691–693, 772 peeling decoder, 691, 693 perfect code, 99, 103, 121, 399 permutation, 46, 73–76, 89, 95, 201, 286, 373, 399, 447–448, 457, 470, 572, 592, 639, 708, 710, 722, 730–736, 742–743, 749–750, 775, 782, 795–797, 813, 815, 840, 843–844, 850, 869, 872–873, 913 permuter, 591–595, 607, 612, 617–623, 626, 629, 631, 634, 717–718, 774, 796, 871, 898 turbo code, 593, 618 perp, 88 Peterson’s algorithm, 253, 258–261, 266 𝜙 function, 193, 194, 234–235, 660, 667 phidecode(), 660 phifun.m, 54 philog.m, 54 pi2m1, 363 pivottableau.m, 414 plotbds.m, 408 plotcapcmp.m, 49 plotcbawn2.m, 54 plotconprob.m, 504 Plotkin bound, 120, 408, 411–412, 424 asymptotic, 421 polar code, 779 coset code, 806 design, 837–839 frozen bits, 793, 806 high speed decoding, 871 list decoding, 850–865 log-likelihood ratios, 789–791 rate of polarization, 835 relation to Reed–Muller code, 871 simplified decoding, 867–870 successive cancellation decoder, 787–788 synthetic channel, 783–786 systematic encoding, 839–850 tree representation, 792 polarization theorem, 829, 830–835 polyadd.m, 126 polyaddm.m, 126 polydiv.m, 126 polymult.m, 126 polymultm.m, 126 polynomial irreducible, 203, 204 Krawtchouk, 415 minimal, 218 orthogonal, 423 primitive, 215


polynomial (contd.) polynomial division, 124, 141–143, 147, 157–158, 161–162, 165–170, 187 circuits, 138–141 polynomial encoder, 465–467, 471, 536, 597 polynomial multiplication, 36, 123, 126, 128, 132, 136–138, 190, 200, 202–203, 224, 401 circuits, 137 polynomial ring, xix, 125, 129 polynomialT.cc, 229 polynomialT.h, 229 polysub.m, 126 polysubm.m, 126 power sum symmetric functions, 258, 292 primfind, 216 primfind.c, 163, 216 primitive.txt, 163 primitive BCH code, 241, 246, 248 primitive element, 209–213, 215, 219, 222–223, 236, 238, 241–242, 249–250, 290, 293, 297, 397, 736 primitive polynomial, xx, 163, 175, 177–178, 214–216, 219–220, 231, 236, 238, 242, 250, 271, 275–276, 433, 634 table of, 216 principal character, 415–416 principal ideal, 129, 466 probability of bit error, 13, 21, 35, 54, 61–62, 108, 112, 492, 500, 504–505, 561, 612, 616, 621, 634, 732, 751, 768, 838, 904 probability of decoder error, 30, 32, 108, 112, 494 probability of decoder failure, 108, 111 probability of undetected codeword error, 108 probdecode(), 655 product code, 429–431, 436–437, 626 progdet.m, 105 progdetH15.m, 105 protograph LDPC codes, 749 pseudonoise, 162, 170 pseudorandom sequence, 9 Psi.m, 698 psifunc.m, 697 Psiinv.m, 698 Pu (E), 108 Pub , 108 puncture block code, 115 convolutional code, 506–509 matrix, 508 Reed–Solomon, 283–285

Q Q function, 19 bounds, 65, 503–505 QAM, 366, 488, 545–547, 554, 561–564, 571, 577, 585, 586, 892–894, 909, 922 QAM constellation energy requirements, 546–547 QC codes, 717, 725, 774

qphidecode(), 660 quadratic residue, 373, 374, 376, 397–399, 401, 403, 405 code, 397 quadrature-amplitude modulation (QAM) see QAM quantization of metric, 486–488, 665 quasi-cyclic codes, 124, 725–730 quotient ring, 126–128, 367

R R-linear combination, 12, 14, 33, 37, 85, 86, 88, 90, 93–95, 97, 118, 129–131, 186, 208, 237, 239, 308, 316, 341, 342, 345, 347, 377–379, 382, 412, 469, 570, 734, 917 random code, 49, 51, 52, 591, 639, 694, 839 random error correcting capability, 29, 103 codes, 425 rank criterion, 912, 915, 918, 922 rank of polynomial, 331 rate, 27 asymptotic, 408 of convolutional code, 455 of linear code, 93 rate compatible punctured codes, 509, 542 rate-distortion theory, 7, 54 rational encoder, 465, 466–467, 535 rational interpolation, 307, 309, 314, 315, 368 Rayleigh density, 904–905, 922 Rayleigh fading, 904 channel, 904 RC constraint, 642, 731, 732, 741, 775 RCBP, 665, 668 rcbpdecode(), 665 real orthogonal design, 913, 915, 916, 922 realizations controller form, 141–143 division, 140 first-element first, 141–143 multiplication first-element first, 138 last-element first, 138–141 multiplication and division, first-element first, 141 observability form, 141, 457–458 polynomial multiplication, 137–138 reciprocal polynomial, 165, 174, 177, 236, 741 recursive systematic convolutional (RSC), 591, 593 reduced complexity box-plus decoder, 665–667, 715 reducefree.m, 414 reducible solution, 309, 311, 312 redundancy, 7, 8, 10, 86, 99, 119, 131, 155, 299, 407, 410, 420, 545, 552, 569, 618, 638, 708 Reed–Muller code, 371, 376–393, 396, 405, 533, 573, 871 Reed–Solomon code, 249 burst correction, 431 decoder workload, 299 generalized, 283–285

Guruswami–Sudan Decoder, 326–366 pipelined decoder algorithm, 315 programming, 290–293 soft output decoder, 358–360 weight distribution, 253 reedsolwt.m, 253 reflexive property, 77 register exchange, 484, 485 relative distance, 248, 407, 408 relative entropy, 41–43, 612 relatively prime, 73, 76, 90, 183, 184, 188, 193–199, 201, 209, 225, 226, 235, 239, 280, 309, 354, 368, 432–434 reliability, 52, 361–363, 439–444, 447–448, 471, 528–531, 596, 616, 643, 648, 650, 660, 663–664, 672 matrix, 359, 362, 365–366 reliability class, 441, 442 remainder decoding, 299 repcodeprob.m, 31 repcodes.m, 32 repeat-accumulate (RA) code, 595, 637, 717–723, 774, 887 repetition code, 27–33, 35, 65, 66, 99, 117, 120, 379, 404, 646, 712, 718 residue class, 80 restorefree.m, 414 restriction to GF(q), 284 reversible code, 174, 295 right inverse in a ring, 172 of a matrix, 464 ring characteristic, 125 definition, 124–125 polynomial, 125–126 quotient, 126–128 rmdecex.m, 383 rmdecex2.m, 388 Roos bound, 246, 370 root of unity, 200, 222, 223, 241, 243, 245, 277, 295, 397, 398, 399, 401, 416, 417, 449 rotational invariance, 562–568, 570, 579, 584, 587 TCM codes, 562 Roth–Ruckenstein algorithm, 353, 354, 356, 357, 364, 365, 370 rothruck.cc, 356 rothruck.h, 356 row-column constraint, 642 row space, 33, 86, 88, 89, 840 RSA encryption, 193, 195, 235 RSdec.cc, 291 RSdec.h, 291 rsdecode.cc, 291 RSenc.cc, 290 RSenc.h, 290 rsencode.cc, 291 runlength-limited codes, 9

S scaling, 191, 332, 399, 513, 574, 597, 600, 661, 662, 711, 775, 854 self-dual code, 96, 118, 400, 402, 404, 405 separation theorem, 8, 54

sequential decoding, 510, 523, 524, 543 serially concatenated code, 595, 629 set partitioning, 549, 553, 554 Shannon sampling theorem, 53 shape gain, 576–578 shortened block code, 116, 153 showchktable.m, 667 signal constellation, 10–11, 13–16, 18, 21–22, 60–62, 365–366, 439, 471, 503, 545–550, 552–553–554, 560–561, 563–565, 567–570, 577, 580, 582–586, 597, 650, 895, 905, 909, 913–914, 918 signal energy, 13, 14, 22, 503, 545, 547, 569, 577, 586 signal shape, 570 signal space, 11–15, 18, 22, 23, 60, 68, 439, 471, 552, 762, 893, 901 signal-to-noise ratio (SNR), 10, 21, 60, 61, 504, 505, 525, 538, 539, 611, 662, 701, 702, 907–909, 912 simplex code, 107, 118, 376, 423 simplex1.m, 414 Singleton bound, 98, 249, 252, 293, 407, 408, 424 Sn , 74 SNR (signal to noise ratio), 10, 21, 60, 61, 504, 505, 525, 538, 611, 662, 701, 907–909, 912 soft-decision decoding, 10, 31–32, 35, 37, 66, 112, 326, 327, 358–366, 439–452, 455, 486, 504, 505, 533, 534, 596 BCJR algorithm, 596 convolution code, 486 performance, 112 soft input, 31–32 soft output Viterbi algorithm, 616 turbo code, 591, 596 soft-input, hard-output (SIHO), 439 soft-input, soft-output (SISO), 439 decoding, turbo code, 596 sort, 93, 167, 224, 251, 283, 448, 514, 609, 677, 787, 838, 882 for soft detection, 440 source/channel coding, 8 source code, 7, 291 source coding theorem, 6, 7 SOVA, 471, 525, 528–532, 543, 596, 616, 635 space-time code block, 909–917 trellis, 917–919 spanning set, 85–87 sparseHno4.m, 708 sparse matrix, 638, 639, 709, 710, 740 spectral efficiency, 54, 545, 547, 569, 570 TCM advantage, 570 spectral thinning, 618–621, 634 spectrum, 9, 162, 279–282, 474, 556, 558, 559, 618, 621, 634, 903 sphere packing, 569, 587 splitting field, 211 spread-spectrum, 9, 474 squaring construction, 393, 394, 395 SSC, 865, 867, 871 stackalg.m, 514

stack algorithm, 471, 510, 514–516, 523, 542, 543 stack bucket, 514–516 standard 802.11, 910 ISBN, 4 V.32, V.33, 564–565 V.34, 569, 578–584 standard array, 100–104 Steiner designs, 742, 775 Stirling formula, 409, 421, 422, 578 stopping sets, 692–694, 715 subcode, 115, 116, 153, 284, 295, 393–396, 404 subfield, 204, 208, 211, 213, 214, 217, 220, 243, 279, 284, 285, 295 subcode, 284–285, 295 subgroup, 71, 74–82, 89, 126–127, 209, 393, 395, 574, 733–736, 806 proper, 74 sublattice, 570, 573–574, 580, 582, 583 subspace of vector space, 87, 88, 93, 96, 102 successive cancellation decoder, 787–793, 824, 838 Sugiyama algorithm, 189–192, 229, 239, 272, 287, 290, 294 sum-product algorithm LDPC codes, 655–656 supermartingale, 830, 831, 832 support set, 249, 333, 336, 350, 352, 361 surjective, 79 survivor path, 476, 483, 484, 529, 532, 615 Sylvester construction, 372, 373, 376, 383 symmetric channel, 9, 23–24, 44–45, 46, 171, 425, 502, 542, 696, 782, 826, 874 symmetric group, 74 symmetric property, 9, 23–24, 44–46, 74, 77, 177, 193, 258, 291, 292, 425, 442, 502, 542, 660, 666, 696, 759, 768–770, 771, 782, 826, 848, 849, 874, 912, 913 synchronization, 9–10, 60, 162, 178, 562, 893, 895 of convolutional decoders, 488 syndrome, 99 BCH, RS, 253–256 decoding, 103 syndrome polynomial, 133, 150, 156, 269, 270, 273, 274 systematic convolutional code, 456, 471–488 definition, 95 encoding, cyclic codes, 134–136

T T algorithm, 524 tail biting, 543, 922 tanh rule, 642, 644, 646, 696, 697, 712, 766, 769, 791, 810 Tanner graph, xxi, 38, 67–68, 637–638, 641–642, 644, 646, 652, 686, 691–692, 695–696, 698–700, 703, 712, 714–715, 718, 721–722, 741–743, 746–750, 771–772, 775 TCM multidimensional, 568–578

Ungerboeck framework, 553

tcmrot2.cc, 565 tcmt1.cc, 557 TCP/IP protocol, 113, 426

testBCH.cc, 289 testbcjr.cc, 631 testBinLFSR.cc, 169 testBinPolyDiv.cc, 170 testBM.cc, 288 testChien.cc, 289 testconvdec.cc, 538 testconvenc, 536 testcrp.m, 197 testcrt.m, 197 testfht.cc, 384 testft.m, 344 testgcdpoly.cc, 229 testgfnum.cc, 230 testGolay.cc, 401, 402 testGS1.cc, 349 testGS2.cc, 356 testGS3.cc, 350 testGS5.cc, 353 testmodarnew1.cc, 228 testpoly1.cc, 229 testpxy.cc, 328 testQR.cc, 398 testrepcode.cc, 31 testRS.cc, 291 testRSerasure.cc, 275 teststack.m, 514 testturbodec2.cc, 631 threshtab.m, 698 time domain vector, 279, 281

tocrt.m, 197 tocrtpoly.m, 197 Toeplitz matrix, 132, 133, 189, 257 total order, 329 totient function, 193, 210, 736 trace (of a field element), 168, 173, 175, 237, 372, 484–486, 518, 586, 700, 702, 775, 834 traceback, 484–486 transfer function bound, 560 matrix, 456 of graph, 494 transitive property, 77, 848 trapping sets, 717, 750–751, 775 trellis, 38, 459, 483–484, 494–498, 509, 545–587, 630, 893, 898, 901, 917–919 for block code, 37, 533–538 trellis coded modulation see TCM triangle inequality, Hamming distance and, 65 truncation error, 483 turbo code, 430, 591, 592, 616, 617, 618, 626, 631, 634, 657, 708, 897–898 block coding, 626 decoding, 608 error floor, 616–617 EXIT chart, 622–626 extrinsic information, 610 hard decision aided termination, 614 interleaver, 593 likelihood ratio decoding, 609–612


turbo code (contd.) parallel concatenated code, 591 primitive polynomials and, 634 sign change ratio, 614 stopping criteria, 612–614 terminal state, 609 turbo equalization, 9, 592, 628–631, 901 turboprocessing, 449 typical set, 49–51, 839

U UDP protocol, 113, 426 ugliness, 832, 833 undetected bit error rate, 108 unequal error protection, 490 Ungerboeck coding framework, 553 unimodular matrix, 466, 467, 539, 571 union bound, 21–23, 112, 494, 499, 555, 560, 767, 836, 875 block code performance and, 112 convolutional code performance, 499–501 TCM performance, 555–556 unit of a ring, 125 universal codes, 4, 510, 837, 895 up to isomorphism, 89, 206 UPC, 4, 64 upgraded channel, 847 utiltkm.cc, 440 utiltkm.h, 440

V V.32 standard, 564, 584 V.33 standard, 564, 565, 584 V.34 standard, 569, 578–584 valuation, 187, 188, 196, 198–200, 235, 237, 256, 290, 309, 351, 385, 398, 700, 752 Vandermonde matrix, 244, 246, 269, 285, 733 variable-rate error control, 508–509 vector space definition, 27, 71, 82, 84–88, 90, 91, 93, 94, 96, 202, 208, 219, 284, 308, 377, 386, 734, 737 vertex, 389, 414, 459, 460, 497, 498, 683 Viterbi algorithm, 471, 474 hard decisions, 596 soft metric, 486–487 soft output, 616 voln.m, 570 V2 (n,t), 65, 98

W

waterfall region, 592 weight, 104–106, 246–248, 252–253, 616–622, 638, 731–732 weight distribution, 104–106, 107, 109, 110, 111, 120, 174, 246–248, 252–253, 380, 400, 402, 414, 616–622, 634 BCH code, 246–248 RS code, 252–253 weighted bit flip decode(), 673 weighted code, 4, 64 weighted degree, 328–331, 335, 336, 338, 348, 349, 351, 359–361 order, 329 weight enumerator, 104–109, 118, 119, 174, 225, 247, 253, 500, 538–541, 543 weight profile, 446, 452 Welch–Berlekamp algorithm, 299–306, 309, 310, 312, 314, 315, 323, 326 well defined, 79, 80, 90 802 wireless standard, 637, 910 Wilson’s theorem, 231 Wolf trellis, 37, 67, 533 writesparse.m, 640

Y

y-root, 335, 353–355

Z Zech logarithm, 206, 211–212, 236 zero divisor, 173, 201, 203, 207, 208, 236 zero, multiplicity of, 332, 334, 359 zero state forcing sequence, 471–472, 593, 596–597 ZJ algorithm, 471, 510, 514

WILEY END USER LICENSE AGREEMENT Go to www.wiley.com/go/eula to access Wiley’s ebook EULA.