Analog and Digital Communications (Schaum’s Outlines) [2 ed.]

1,380 136 4MB

English Pages 338

Report DMCA / Copyright

DOWNLOAD FILE

Polecaj historie

Analog and Digital Communications (Schaum’s Outlines) [2 ed.]

Table of contents :
SIGNALS AND SPECTRA......Page 8
Solved Problems......Page 13
Supplementary Problems......Page 28
SIGNAL TRANSMISSION AND FILTERING......Page 31
Solved Problems......Page 36
Supplementary Problems......Page 46
AMPLITUDE MODULATION......Page 50
DOUBLE-SIDEBAND MODULATION......Page 51
Solved Problems......Page 60
Supplementary Problems......Page 73
ANGLE MODULATION......Page 75

Citation preview

ANALOG AND DIGITAL COMMUNICATIONS Second Edition

333 Fully-solved problems New chapters on Signal and Spectra 4IIA[

Updated information on information and channel capacity 130 supplementary problems

Use with these courses:

@Communications Theory and Practice ~ Computer Engineering @ Principles of Communication Engineering ~ Communications Engineering @Electrical Engineering

Theory and Problems of

ANALOG AND DIGITAL MMUNICATIONS Second Edition HWEI P. HSU, Ph.D.

Schaum 's Outline Series McGRAW-HILL New York Chicago San Francisco Lisbon London Madrid Mexico City Milan New Delhi San Juan Seoul Singapore Sydney Toronto

The McGraw·Hill Companies

-' •

Hwei P. Hsu received his B.S. from National Taiwan University and M.S. and Ph.D. from Case Institute of Technology. He has published several books which include Schaum's Outline of Signals and Systems and Schaum's Outline of Probability, Random Variables, & Random Processes.

Schaum's Outline of Theory and Problems of ANALOG AND DIGITAL COMMUNICATIONS Copyright © 2003, 1993 by The McGraw-Hill Companies, Inc. All rights reserved. Printed in the United States of America. Except as permitted under the Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a data base or retrieval system, without the prior written permission of the publisher. 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 VFM VFM O 9 8 7 6 5 4

ISBN 0-07-140228-4

The objective of this book, like its previous edition, is to provide an introduction to the basic principles in analog and digital communications for advanced undergraduates in electrical or computer engineering. The assumed background is circuit analysis, and prior exposure to signals and systems course is helpful. The book can be used as a self-contained textbook or for self-study. Each topic is introduced in a chapter with numerous solved problems. The solved problems constitute an integral part of the text. In this edition, there are two chapters dealing with basic tools. These tools are then applied to next three chapters in analog communication systems, including sampling and digital transmission of analog signal. Probability, random variables, and random processes are introduced in Chapters 6 and 7 and applied to the chapters that follow. Effect of noise in analog communication systems is covered in Chapter 8. Chapter 9 treats the effect of noise in digital communication and the subject of optimum reception. Information theory and source coding is treated in Chapter 10. A distinctive feature of this edition is the new chapter, Chapter 11, on error control coding, including linear block codes, cyclic codes, and convolutional codes.

HwE1 P. Hsu Monteville, NJ

lll

CHAPTER 1

CHAPTER 2

Signals and Spectra

1

1.1 1.2 1.3 1.4 1.5

1 1 2 3 5

Signal Transmission and Filtering 2.1 2.2 2.3 2.4 2.5 2.6

CHAPTER 3

Introduction Impulse Response and Frequency Response Filter Characteristics of LTI Systems Transmission of Signals Through LTI Systems Filters Quadrature Filters and Hilbert Transforms

Amplitude Modulation 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8

CHAPTER 4

Introduction Fourier Series and Discrete Spectra Fourier Transforms and Continuous Spectra Properties of Fourier Transform Fourier Transforms of Power Signals

Introduction Amplitude Modulation Double-Sideband Modulation Ordinary Amplitude Modulation Single-Sideband Modulation Vestigial-Sideband Modulation Frequency Translation and Mixing Frequency-Division Multiplexing

24 24 26 26 27 29

43 43 43 44 45 47 49 52 52

68

Angle Modulation 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9

24

Introduction Angle Modulation and Instantaneous Frequency Phase and Frequency Modulation Fourier Spectra of Angle-Modulated Signals Narrowband Angle Modulation Sinusoidal (or Tone) Modulation Bandwidth of Angle-Modulated Signals Generation of Angle-Modulated Signals Demodulation of Angle-Modulated Signals iv

68 68 69 70 70 71 71 72 74

CONTENTS

CHAPTER 5

CHAPTER 6

Digital Transmission of Analog Signals

90

5.1 Introduction 5.2 Pulse Code Modulation 5.3 Sampling Theorem 5.4 Sampling 5.5 Pulse Amplitude Modulation 5.6 Quantizing 5.7 Encoding 5.8 Bandwidth Requirements of PCM 5.9 Delta Modulation 5.10 Signaling Format 5.11 Time-Division Multiplexing 5.12 Bandwidth Requirements for TDM 5.13 Pulse Shaping and Intersymbol Interference 5.14 Digital Carrier Modulation Systems

90 90 91 91 93 93 96 96 97 99 100 101 101 104

Probability and Random Variables 6.1 6.2 6.3 6.4 6.5 6.6 6. 7

CHAPTER 7

Introduction Definitions and Notations of Random Processes Statistics of Random Processes Correlations and Power Spectral Densities Transmission of Random Processes through Linear Systems Special Classes of Random Processes

Noise in Analog Communication Systems 8.1 8.2 8.3 8.4 8.5

CHAPTER 9

Introduction Probability Random Variables Two-Dimensional Random Variables Functions of Random Variables Statistical Averages Special Distributions

Random Processes 7.1 7.2 7.3 7.4 7.5 7.6

CHAPTER 8

V

Introduction Additive Noise and Signal-to-Noise Ratio Noise in Baseband Communication Systems Noise in Amplitude Modulation Systems Noise in Angle Modulation Systems

Optimum Detection 9.1 Introduction 9.2 Binary Signal Detection and Hypothesis Testing

128 128 128 131 133 135 137 138

165 165 165 165 169 171 172

202 202 202 203 204 208

226 226 226

vi

CONTENTS

9.3 Probability of Error and Maximum Likelihood Detector 9 .4 Optimum Detection 9.5 Error Probability Performance of Binary Transmission Systems

CHAPTER 10

Information Theory and Source Coding 10.1 10.2 10.3 10.4 10.5 10.6 10.7 10.8

CHAPTER 11

Introduction Measure of Information Discrete Memoryless Channels Mutual Information Channel Capacity Additive White Gaussian Noise Channel Source Coding Entropy Coding

Error Control Coding 11.1 Introduction 11.2 Channel Coding 11. 3 Block Codes 11.4 Linear Block Codes 11. 5 Cyclic Codes 11.6 Convolutional Codes 11.7 Decoding of Convolutional Codes

227 229 231

245 245 245 247 250 251 252 253 255

282 282 282 283 283 286 290 295

APPENDIX A

Fourier Transform

321

APPENDIX B

Bessel Functions Jn (13)

323

APPENDIX C

The Complementary Error Function Q(z)

325

INDEX

327

GNALS AND SPECTRA is chapter we review the basics of signals in the frequency domain. The frequency domain n is called the spectrum. The spectral analysis of signals using Fourier series and Fourier s is one of the fundamental methods of communication engineering. ER SERIES AND DISCRETE SPECTRA Exponential Fourier Series: ) be a periodic signal with fundamental period T0 . Then we define the complex exponential ies of x(t) as x(t) =

L

n=-oo

Cn ejnwot

wo

2n/T0

(1.1) (1.2)

c,,

To /2, we have Cn

= J_fTo/2 To

-T0 /2

x(t)e-Jnwot dt

(1.3)

s c,, are called the Fourier coefficients of x(t). These are. in general, complex numbers ressed as c,, = lc,, le10" (1.4) amplitude and 0n is the phase angle of c,, .

versus the angular frequency w = 2nf is called the amplitude spectrum of the periodic lot of 0n versus w is called the phase spectrum of x(t). These are referred to as frequency . Since the index n assumes only integers, the frequency spectra of a periodic signal exist 11

2

SIGNALS AND SPECTRA

[CHAP. 1

only at the discrete frequencies nw 0 . These are therefore referred to as discrete frequency spectra or line spectra. If the periodic signal x(t) is a real function of time, then (1.5)

This means that, for a real periodic signal, the positive and negative coefficients are conjugate, that is, (1.6)

Hence, the amplitude spectrum is an even function of wand the phase spectrum is an odd function of w. C.

Power Content of a Periodic Signal and Parseval's Theorem:

The power content of a periodic signal x(t) is defined as the mean square value over a period: P

=-

1

f

To

To/2

lx(t)l 2 dt

U.7)

-T0 /2

Parseval's theorem for the Fourier series states that if x(t) is a periodic signal with period, T 0 , then 1 To

1.3

J

To/2 -To/2 lx(t)l2dt

co

= nioo lc,,12

(1.8)

FOURIER TRANSFORMS AND CONTINUOUS SPECTRA

To generalize the Fourier series representation (I.I) to a representation valid for nonperiodic signals in frequency domain, we introduce the Fourier transform. A.

Definition:

Let x(t) be a nonperiodic signal. Then the Fourier transform of x(t), symbolized by ff, is defined by X(w)

J: =- f

= ff[x(t)] =

x(t)e-Jwtdt

(1.9)

00

The inverse Fourier transform of X(w), symbolized by g-- 1 , is defined by 00

x(t)

= ff- 1 [X(w)]

1

27r

X(w)e 1.w1dw

(1.10)

-co

Equations (1.9) and (1 .JO) are often called the Fourier transform pair denoted by x(t)

B.

+-+

X(w)

Frequency Spectra:

In general, the Fourier transform X(w) is a complex function of angular frequency w, so that we may express it in the form X(w)

=

IX(w)le 10 I) results in its spectral expansion and that time expansion of the signal (a < I) results in its spectral compression.

5.

Time-Reversal: x(-t)

+-+

X(-w)

(1.21)

SIGNALS AND SPECTRA

4

6.

[CHAP. 1

Duality: X(t)

+-+

2nx(-w)

(1.22)

Differentiation:

7.

Time differentiation: I

x (t)

=

d

dtx(t)

.

+-+ 1wX(w)

(1.23)

Equation (1.23) shows that the effect of differentiation in the time domain is the multiplication of X(w) by jw in the frequency domain. Frequency differentiation: •

(-1t)x(t)

8.

I

+-+

X (w)

=

d

dw X(w)

(1.24)

Integration:

If X(0) = 0, then t

f

x(1:) dr:

+-+

-co

1 -;-X(w) JW

(1.25)

Equation (1.25) shows that the effect of integration in the time domain is the division of X( w) by jw in the frequency domain, assuming that X(0) = 0. Note that by definition (1.9) X(0)

=

f:

x(t) dt

(1.26)

00

The more general case pertaining to X(0) =/- 0 is considered in Sec. 1.5 [Eq. (1.42)]. 9.

Convolution:

The convolution of two signals x 1(t) and xi(t), denoted by x 1(t) * xi(t), is a new signal x(t) defined by (1.27)

Then

(1.28)

Equation (1.28) is referred to as the time convolution theorem, and it states that the convolution in the time domain becomes multiplication in the frequency domain (Prob. 1.18). Note that the operation of convolution is commutative, that is,

IO.

Multiplication: (1.29)

Equation (1.29) is often referred to as the frequency convolution theorem. Thus, the multiplication in the time domain becomes convolution in the frequency domain.

CHAP. 1)

1.5

SIGNALS AND SPECTRA

5

FOURIER TRANSFORMS OF POWER SIGNALS

To find the Fourier transform of a periodic signal or a power signal, we need to introduce the unit impulse function. A.

Impulse Function:

The unit impulse function, also known as the Dirac delta function, b(t), is not an ordinary function and is defined in terms of the following process: (1.30)

where cp(t) is any regular function continuous at t = 0. Equation (1.30) and all subsequent expressions will also apply to the frequency-domain impulse b(w) by replacing t by w. Note that Eq. (1.30) is a symbolic expression and should not be considered an ordinary Riemann integral. In this sense, b(t) is often called a generalized function and ¢(t) is known as a testing function. A different class of testing function will define a different generalized function. Similarly, the delayed delta function b(t - t 0 ) is defined by

f:

=

c/J(t)b(t - t 0 ) dt

c/J(to)

(1.31)

(1.32)

00

where cp(t) is any regular function continuous at t Some additional properties of b(t) are

=

t0•

x(t)b(t - to)

=

x(to)b(t - to)

if x(t) is continuous at t

=

t0•

= x(0)b(t)

x(t)b(t)

if x(t) is continuous at t

(1.33)

= 0. b(at)

1

= Taf b(t) b(-t)

(1.34)

a -=I- 0

= b(t)

(1.35)

x(t) * b(t - to)

= x(t -

x(t) * b(t)

= x(t)

(1.36)

to)

(1.37)

Note that Eq. (1.35) can be obtained by setting a= -1 in Eq. (1.34). Equations (1.33) and (1.37) are the special cases of Eqs. (1.32) and (1 .36), respectively, for t 0 = 0. An alternative definition of b(t) is provided by the following two conditions: 12

f

b(t- t 0 ) dt

=

1

t,

< to < t2

(1.3S)

/1

b(t - t 0 )

=0

(1.39)

t -=I- t 0

Conditions (1.38) and (1.39) correspond to the intuitive notion of a unit impulse as the limit of a suitably chosen conventional function having unity area in an infinitely small width. For convenience, b(t) is shown schematically in Fig. 1-l(a). B.

Fourier Transforms of b(t) and a Constant Signal:

Using Eq. (1.9) and Eq. (1.30), the Fourier transform of b(t) is given by ff [b(t)]

=

f:

00

b(t)e-Jwt dt

=

1

6

SIGNALS AND SPECTRA

[CHAP. 1

We thus have the Fourier transform pair for J(t) : J(t) ..... 1

(1.40)

This relation states that the spectrum of J(t) extends uniformly over the entire frequency interval, as shown in Fig. 1-l(b). x(t)

X(w)

,5(/)

0

0

(a)

(b)

w

Unit impulse function and its spectrum

Fig. 1-1

By applying the duality property [Eq. (1.22)] to Eq. (1.40) and noting that the delta function is an even function [Eq. (1.35)], we obtain 1 +-+ 2nb(w)

(1.41)

Equation (1.41) states that the spectrum of a constant signal (or de signal) [Fig. l-2(a)] is a delta function 2nO

e-lalt

(] .54)

Signal x(t) can be rewritten as ( )_ x

X(w)

Then

=

t -

Io

eate-jwt

IO

e

0

t > t < 0

-

{ eat

dt

+ Ioo e-ate-jwt dt 0

-00

=

-at

-altl _ e

e

X(w)

We

sin at x(t) * - - = x(t) nt

Hence,

1.22.

+-+

If show that (1.70)

From the frequency convolution theorem (1.29), we have

That is,

Setting w = 0, we get

By changing the dummy variable of integration, we get

1.23.

Prove Parseval's theorem Eq. (1.16) for a real x(t). If x(t) is real, then from Eq. (1.65) of Prob. 1.17 we have X(-w)

=

X'(w)

Then, setting x 1(t) = xi(t) = x(t) in Eq. (1.70) of Prob. 1.22, we have

J

oo _

2

lx(t)l dt =

Joo_

[x(t)J2 dt =

00

00

= Jn Joo_ 2

Joo

Jn _ X(w)X(-w)dw

2

X(w)X'(w)dw

00

00

= Jn Joo_ 2

FOURIER TRANSFORMS OF POWER SIGNALS 1.24.

Verify property (1.32): x(t)ii(t- t 0 )

=

x(t0 )ii(t- t 0 )

The proof will be based on the following equivalence property:

00

2

1X(w)l dw

18

SIGNALS AND SPECTRA

[CHAP. 1

Let g 1(t) and gi(t) be generalized functions. Then the equivalence property states that g 1(t) = gi(t) if and only if (1.71)

for all test functions ¢(t). If x(t) is continuous at t = t 0 , then [,

[x(t)b(t - t 0 )]cp(t)dt

=

f

=

x(to)

0

00

=

b(t - t 0 )[x(t)cp(t)] dt

f

00

b(t - to)c/J(t) dt

x(t 0 )¢(t 0 )

f

=

[x(to)b(t - t0 )]¢(t) dt 00

for all ¢(t). Hence, by the equivalence property (1.71), we conclude that x(t)b(t - t0 )

1.25.

= x(to)b(t -

to)

Verify Eq. (1.36), that is, x(t) * b(t - t 0 )

= x(t -

t0)

According to the commutative property of convolution and the definition (1 .31) of b(t), we have x(t)

* b(t -

t 0 ) = b(t- t 0 )

* x(t) =

f~

b(r - to)x(t - r)dr

00

=

1.26.

x(t - r)IFto

Find the complex Fourier series of the unit impulse train by

bT(t)

=

x(t - lo)

shown in Fig. 1-12(a) and defined

(1.72) n=-oo

00

bT(t)

Let

=

L

2n

Cnejnwot

wo=T

n=-oo

From Eq. (1 .3), the coefficients c,, are Cn

= -1

f

T/2

T -T/2

.

bT(t)e-inwotdt

= -1

f

T/2

T -T/2

.

b(t)e-fl''°01 dt

= -1 T

Thus,

1.27.

(1.73)

Find the Fourier transform of the following signals: (a)

(a)

x(t) = eJwot

(b)

x(t) = e-:iwot

(c)

x(t) =

cos

Wot

Applying the frequency-shifting property (1.19) to Eq. (1.41), we get (1.74)

CHAP. I]

SIGNALS AND SPECTRA

... t

j"'"1

t -T

0 (a)

Fig. 1-12

(b)

I;

I I I I·.

t ...

T

19

2T

0

-wo

Wo

2w 0

w

(b)

Unit impulse train and its magnitude spectrum

From Eq. (1.74) it follows that (1.75)

(c)

From Euler's formula we have

Thus, using Eqs. (1.74) and (1.75) and the linear property (1.17), we get cos w 0 t

1.28.

+--+

nb(w - wo)

+ nb(w + w 0 )

(]. 76)

Find the Fourier transform of a periodic signal x(t) with period T. We express x(t) as 00

x(t)

=

L

cnejnwot

2n

Wo=T

n=-oo

Taking the Fourier transform of both sides, and using Eq. (J.74), we obtain 00

X(w)

L

= 2n

(1.77)

cnb(w - nwo)

n=-oo

Note that the Fourier transform of a periodic signal consists of a sequence of equidistant impulses located at the harmonic frequencies of the signal.

1.29.

Find the Fourier transform of the periodic train of unit impulses r(t - nT)

n=-oo

From Eq. (J.73) of Prob. 1.26, the complex Fourier series of by{t) is given by 2n

wo=T Using Eq. (J.77) of Prob. 1.28, 2

:F [bT(t)]

=

;

L n=-oo

or

00

00

b(w - nw 0 ) =

w0

L

b(w - nw0 )

=

WoDw 0 (w)

n=-oo

(1. 78)

20

[CHAP. I

SIGNALS AND SPECTRA

t -Tt

j"'' ,

t 2T

T

0

•I

-----

ll -wo

(a)

rT· l 0

"'o



2w 0

(&I

(b)

Fig. 1-13

Unit impulse train and its Fourier transform

Thus, the Fourier transform of a unit impulse train is also a similar impulse train [see Fig. l-13(b)].

1.30.

Find the Fourier transform of the unit step function u(t). As shown in Fig. 1-14, u(t) can be expressed as u(t)

I

I

= + sgn (t)

2 2

Note that½ is the even component of u(t) and ½sgn(t) is the odd component of u(t). Thus, ffe [u(t)]

= ½ffe [I]+ ½ffe [sgn (t)]

which, using Eqs. (1.41) and (1 .61), becomes ffe [u(t)]

=

nb(w)

I

+--:--

(1.79)

JW

I

2

sgn(t)

, J ""' 0

.

+

0

2 _ _ _ ___.0-i

I

2

Fig. 1-14

1.31.

Unit step function and its even and odd components

Show that x(t)

* u(t)

= Loo x(r)dr

and find its Fourier transform. By definition (1 .27) of the convolution, x(t)

* u(t)

=

J:

x(r)u(t - r)dr =

00

u(t - r)

since

= {1

r

r

0




Loo x(r)dr

t t

Next, by the time convolution theorem (1.28) and Eq. (I .79), we obtain ffe

[L

00

x(r)dr] = X(w>[ nb(w) +

=

j~] I

nX(w)b(w) +--:--X(w) JW

=

I

nX(O)b(w) +--:--X(w) JW

CHAP. I]

since X(w)i5(w)

1.32.

21

SIGNALS AND SPECTRA

= X(O)i5(w) by Eq. (1.33) [see Eq. (1.42)].

Use the frequency convolution theorem (1.29) to derive the modulation theorem (I .64). (See Prob. 1.15.) From Eq. (J.76), we have cos w 0 t

+-+

nb(w - w 0 )

+ nb(w + w 0 )

By the frequency convolution theorem (J .29), x(t)cos w 0 t

I X(w) * [nb(w - w0 ) 2n

+-+ -

+ nb(w + w0 )] = -I X(w 2

w0 )

+ -I X(w + w0 ) 2

The last equality follows from Eq. (J .36).

1.33.

Let x(t) be a periodic signal with fundamental period T 0 and let x!. t) be the segment of x(t) between -T0 /2 < t < To /2; that is,

-T0 /2 (1 +a)W

where W

= nT,. The corresponding impulse response is h(t)

=

~s

(5.25)

ci:t)[ 1 -c~sa:;n)2]

Note that the second term on the right-hand side of Eq. (5.25) is of the form (sin x)/x, which we encountered in the ideal low-pass filter; it retains the original zero crossings of that waveform. Plots of H(w) and h(t) are shown in Fig. 5-15 for three values ofcx. The parameter ex is called the roll-off factor. Note that the case for a = 0 coincides with the ideal low-pass filter. The case for a= 1 is known as the full-cosine roll-off characteristic; its frequency response function is

H(w)

H(w)

-2w -w

0

Fig. 5-15

=

{

~

2w

lwl,s:;;2W

(5.26)

2W

elsewhere h(t)

a=O

w

nw)

-1 ( 1 +cos-

w

Raised-cosine pulse shaping and resulting time response

From Fig. 5-15 we see that the bandwidth occupied by a raised-cosine type of transmission characteristic varies from aminimum offs= 1/(2Ts) Hz (a= 0) to a maximum offs= 1/T, Hz(a = 1).

104

DIGITAL TRANSMISSION OF ANALOG SIGNALS

[CHAP. 5

The bandwidth/pulse-rate trade-off from using the raised-cosine type of transmission characteristics depends, of course, on r:t. If the desired rate of pulse transmission is 1/T pulses per second, the bandwidthfn required is l+a fn=~

(5.27)

Hz

Alternatively, withfn specified, the allowable pulse rate is given by T

5.14

2fn 1 +a

(5.28)

DIGITAL CARRIER MODULATION SYSTEMS

Because baseband digital signals have sizable power at low frequencies, they are suitable for transmission over a pair of wires or coaxial cables. Baseband digital signals cannot be transmitted over a radio link because this would require impractically large antennas to efficiently radiate the low-frequency spectrum of the signal. Hence, for such purposes, we use analog modulation techniques in which the digital messages are used to modulate a high-frequency continuous-wave (CW) carrier. In binary modulation schemes, the modulation process corresponds to switching (or keying) the amplitude, frequency, or phase of the CW carrier between either of two values corresponding to binary symbols O and 1. The three types of digital modulation are amplitude-shift keying, frequencyshift keying, and phase-shift keying. A.

Amplitude-Shift Keying (ASK):

In ASK, the modulated signal can be expressed as Xe () t

= {Acos wet 0

symbol 1 symbol 0

(5.29)

Note that the modulated signal is still an on-off signal. Thus, ASK is also known as on-off keying (OOK).

B.

Frequency-Shift Keying (FSK):

In FSK, the modulated signal can be expressed as x.(t) '

C.

= {Acos wit Acos w 2 t

symbol 1 symbol 0

(5.30)

Phase-Shift Keying (PSK):

In PSK, the modulated signal can be expressed as Acos wet

Xe(t)

= { Acos (wet+ n)

symbol 1 symbol 0

(5.31)

Figure 5-16 illustrates these digital modulation schemes for the case in which the data bits are represented by the bipolar NRZ waveform. The performance of these modulation methods in noisy channels is discussed in Chap. 9.

CHAP. 5]

DIGITAL TRANSMISSION OF ANALOG SIGNALS

Binary code

0

I I

I I

I

I

I

I

I

I

I I

I I

I I I I I

I

I

I I

ASK

0

0

I I

NRZ

I I

~

I I I I I I

J I

:,

105

I I I I I

I I

J

I

FSK 'I I I I

I I I I I I I

PSK

I

Fig. 5-16

I I I I

I I I I I

~

~

'

I

~

~

v

Digital carrier modulation

Solved Problems SAMPLING AND SAMPLING THEOREM 5.1.

Verify Eq. (5.3), that is, 00

ms(t)

_L

= m(t)JT,(t) =

m(nTs)J(t-nTs)

n=-oo

From Eqs. (J.72) and (1.32), we have 00

L

m,(t) = m(t)bT, (!) = m(t)

00

b(t - nTs) =

n=-oo

L m(t)b(t - nTs) n=-oo 00

=

L m(nTs)b(t - nTs) n=-oo

5.2.

Verify the sampling theorem (5.2). Let m(t) be a band-limited signal defined by Eq. (5.J) [Fig.5-17(a) and (b)]. From Eq. (J.78) of Prob. 1.29, we have [Fig. 5-l 7(c) and (d)] 00

ffe[bT,(t)]

= Ws

L

b(w-nws)

n=-oo

According to the frequency convolution theorem (1.29), we have

2n

ws=-

Ts

106

DIGIT AL TRANSMISSION OF ANALOG SIGNALS

M,(w) =

1 2n: [ M(w) * Ws

s-[ m(t)bT,(t)] =

1

[CHAP. 5

ntoo b(w - nws)]

oo

L

=y

* b(w-nw,)

M(w)

s n=-oo

Using Eq. (1.36), we obtain 1

Ms(w)

=T

s

oo

L

(5.32)

M(w - nws)

n=-oo

Note that Ms(w) will repeat periodically without overlap as long as w,;,,, 2wM, or 2n/T,;,,, 2wM, that is, m(ll

M(wl

0

-WM

r!i' 1 1 1 1 -

-T,

0

2T,

T,

! '10,!l 1

1 0

-w,

--

m,(1)

-,

'

/

/ /

,,

/

,, ,,

''

''

/

-T,

0

,, ,,

'

/

-

-w,

2T,

T,

~

):~' ~ . -wM

!"."' !

l

0

-T,

T,

I -

1 2T,

I

1 -2w,

-w,

1 0

/

/

,, ,,

--

'

''

/

/

'' '

/

0

2w,

w,

2w,

•w

M,(w)

,,

'

/

-T,

w,

(h)

m,(1)

/

w

w,

!J]O,

1

fy(y)

=I 0

(6.52)

where f y(y) is the marginal pdf of Y.

6.5. A.

FUNCTIONS OF RANDOM VARIABLES Random Variable g(X):

Given a r.v. X and a function g(x), the expression Y= g(X)

(6.53)

defines a new r.v. Y. With y a given number, we denote Dy the subset of Rx (range of X) such that g(x) :;;; y. Then (Y:;;;y)

= [g(X) :;;;y] = (XE Dy)

where (XE Dy) is the event consisting of all outcomes A such that the point X(Jc) E Dy. Hence, Fy(y)

=

P(Y:;;; y)

=

P[g(X):;;; y]

=

P(X E Dy)

(6.54)

If Xis a continuous r.v. with pdf fx(x), then

Fy(y) =

f

fx(x) dx

(6.55)

Dy

Determination offy(y) from fx(x) : Let X be a continuous r.v. with pdf/x(x). If the transformation y = g(x) is one-to-one and has the inverse transformation (6.56)

then the pdf of Y is given by (Prob. 6.30) f y(y)

= fx(x) dxl dy = fx[h(y)] ldh(y)I ~

(6.57)

l

Note that if g(x) is a continuous monotonic increasing or decreasing function, then the transformation y = g(x) is one-to-one. If the transformation y = g(x) is not one-to-one,Jy(y) is obtained as follows: Denoting the real roots of y = g(x) by xk, that is y

then where g(x) is the derivative of g(x).

= g(xi) = ... = ,

f y(y)

g(xk) = ...

fx(xk)

= 'T li(xk)I

(6.58)

136

B.

PROBABILITY AND RANDOM VARIABLES

[CHAP. 6

One Function of Two Random Variables: Given two random variables X and Yanda function g(x, y), the expression

Z= g(X, Y)

(6.59)

is a new random variable. With z a given number, we denote by Dz the region of the xy plane such that g(x, y),;;; z. Then [Z,;:; z]

= {g(X,

Y) ,;:; z}

= {(X,

Y) E Dz}

where {(X, Y) E Dz} is the event consisting of all outcomes A such that the point {X(Jc), Y(Jc)} is in Dz. Hence, Fz(z)

= P(Z,;;; z) =

P{(X, Y) E Dz}

(6.60)

If X and Y are continuous r.v.s with joint pdf fXY(x, y), then Fz(z)

=ff

(6.61)

fXY(x, y) dxdy

Dz

C.

Two Functions of Two Random Variables: Given two r.v.s. X and Y and two functions g(x,y) and h(x,y), the expression

Z= g(X, Y)

W= h(X, Y)

(6.62)

defines two new r.v.s Zand W. With z and w two given numbers we denote Dzw the subset of Rxy [range of (X, Y)] such that g(x, y),;;; z and h(x, y),;;; w. Then (Z,;:; z, W,;;; w)

=

[g(x, y),;;; z, h(x, y),;;; w]

= {(X,

Y) E Dzw}

where {(X, Y) E Dzw} is the event consisting of all outcomes A such that the point {X(Jc), Y(Jc)} E Dzw· Hence, Fzw(z, w)

= P(Z,;;; z,

W,;;; w)

= P{(X,

Y) E Dzw}

(6.63)

In the continuous case we have Fzw(z, w)

ff

=

fXY(x,y) dxdy

(6.64)

Dzw

Determination of fzw(z, w) from fXY(x, y) : Let X and Y be two continuous r.v.s with joint pdf fXY(x, y). If the transformation z

= g(x, y)

= h(x, y)

(6.65)

y = r(z, w)

(6.66)

w

is one-to-one and has the inverse transformation x = q(z, w)

then the joint pdf of Z and Wis given by fzw(z, w)

where x

= q(z,

w), y

= r(z,

= fXY(x,y)IJ(x, YW 1

(6.67)

w) and

J(x, y)

=

og

og

ox oh

oy oh

ox

oy

which is the Jacobian of the transformation (6.65).

oz

oz

ox

oy

ox

oy

ow ow

(6.68)

CHAP. 6]

PROBABILITY AND RANDOM VARIABLES

137

6.6 STATISTICAL AVERAGES A.

Expectation:

The expectation (or mean) of a r.v. X, denoted by E(X) or µx, is defined by X: discrete

L.X;Px(x;) µx

The expectation of Y

= g(X) E(Y)

=

=

E(X)

;

{

(6.69) X : continuous

f ~oo xfx(x) dx

is given by

=

E[g(X)]

=

I. g(x;)p x(x;)

(discrete case)

;

{ f ~oo g(x)fx(x) dx

(6.70)

(continuous case)

The expectation of Z = g(X, Y) is given by E(Z)

=

E[g(X, Y)]

=

{

I_I_g(x;, Y)PXY(x;, YJ) ; i

(discrete case)

f~oo f~oog(x, y)fXY(x, y)dxdy

(continuous case)

(6.71)

Note that the expectation operation is linear, that is, E[X + Y]

= E[X] + E[Y]

E[cX]

(6.72)

= cE[X]

(6.73)

where c is a constant (Prob. 6.45). B.

Moment:

The nth moment of a r.v. Xis defined by I.x7Px(x;) E(X")

C.

=

{

X: discrete

; f~oo xnfx(x) dx

(6.74) X : continuous

Variance:

The variance of a r.v. X, denoted by a} or Var(X), is defined by Var (X) =a}= E[(X - µx) 2 ]

(6.75)

Thus, X: discrete (6.76) X : continuous

The positive square root of the variance, or ax, is called the standard deviation of X. The variance or standard variation is a measure of the "spread" of the values of X from its mean µx. By using Eqs. (6.72) and (6.73), the expression in Eq. (6.75) can be simplified to 2 2 2 2 E[X] ) 2 ax=E[X]-µx=E[X]-(

D.

(6.77)

Covariance and Correlation Coefficient:

The (k, n)th moment of a two-dimensional r.v. (X, Y) is defined by I.I.x/y/pXY(x;, y) mkn

=

E(xk yn)

=

{

X: discrete (6.78)

Yi x;

f~00 f~oo xk ynfXY(x, y) dxdy

X: continuous

138

PROBABILITY AND RANDOM VARIABLES

[CHAP. 6

The (1,l)th joint moment of (X, Y),

= E(X Y) E(X Y) = 0, then

(6.79)

Cov(X, Y)=aXY=E[(X-µx)(Y-µy)]

(6.80)

m 11

is called the correlation of X and Y . If we say that X and Y are orthogonal. The covariance of X and Y, denoted by Cov(X, Y) or aXY, is defined by

Expanding Eq. (6.80), we obtain Cov(X, Y)

=

E(XY) - E(X)E( Y)

(6.81)

If Cov(X, Y) = 0, then we say that X and Yare uncorrelated. From Eq. (6.81) we see that X and Y are uncorrelated if E(X Y)

=

E(X)E( Y)

(6.82)

Note that if X and Y are independent, then it can be shown that they are uncorrelated (Prob. 6.66). However, the converse is not true in general; that is, the fact that X and Y are uncorrelated does not, in general, imply that they are independent (Prob. 6.67). The correlation coefficient, denoted by p(X, Y) or p xY, is defined by p(X, Y)

=

PxY

axy

= --

(6.83)

-1:;;; PxY,;;; 1

(6.84)

ax ay

It can be shown that (Prob. 6.53) IPXYI,;;; 1

6.7

or

SPECIAL DISTRIBUTIONS

There are several distributions that arise very often in communication problems. These include binomial distribution, Poisson distribution, and normal, or gaussian distribution.

A.

Binomial Distribution: A r.v. Xis called a binomial r.v with parameters (n,p) if its pmf is given by k=0, 1, ... ,n

(6.85)

where O :;;; p :;;; 1 and

( Z)- k!(nn~ k)! which is known as the binomial coefficient. The corresponding cdf of Xis

Fx(x)

= %-a

(Z )p\l - pf-k

(6.86)

The mean and variance of the binomial r.v. X are 2

ax= np(l - p)

(6.87)

The binomial random variable Xis an integer-valued discrete random variable associated with repeated trials of an experiment. Consider performing some experiment and observing only whether event A occurs. If A occurs, we call the experiment a success; ifit does not occur (A occurs), we call it a failure. Suppose that the probability that A occurs is P(A) = p; hence, P(A) = q = 1 - p. We repeat this experiment n times (trials) under the following assumptions:

CHAP. 6]

PROBABILITY AND RANDOM VARIABLES

139

1. P(A) is constant on each trial. 2. The n trials are independent.

A point in the sample space is a sequence of nA's and A's. A point with kA's and n - kA's will be assigned a probability of pk qn-k. Thus, if Xis the random variable associated with the number of times that A occurs in n trials, then the values of X are the integers k = 0, I, ... , n. In the study of communications, the binomial distribution applies to digital transmission when X stands for the number of errors in a message of n digits. (See Probs. 6.17 and 6.41.) B.

Poisson Distribution:

A r.v. Xis called a Poisson r.v. with parameter ex(> 0) if its pmf is given by Px(k) = P(X = k) = e-a

~

k

k!

= 0,

(6.88)

I, ...

The corresponding cdf of X is n

Fx(x)

exk

= e-a L -

(6.89)

k=Ok!

The mean and variance of the Poisson r.v. X are (Prob. 6.42)

a}= ex

(6.90)

The Poisson distribution arises in some problems involving counting, for example, monitoring the number of telephone calls arriving at a switching center during various intervals of time. In digital communication, the Poisson distribution is pertinent to the problem of the transmission of many data bits when the error rates are low. The binomial distribution becomes awkward to handle in such cases. However, if the mean value of the error rate remains finite and equal to ex, we can approximate the binomial distribution by the Poisson distribution. (See Probs. 6.19 and 6.20.)

C.

Normal (or Gaussian) Distribution:

A r.v. Xis called normal (or gaussian) r.v. if its pdf is of the form fx(x)

= _l_e- P(B). If P(AIB)

=

P(AnB) P(B)

P(BIA) = P(A n B) P(A)

6.11.

> P(A), then > P(A)P(B)

> P(A)P(B). Thus

P(A n B)

= P(B)

P(A)

Let A and B be events in a sample space S. Show that if A and B are independent, then so are II and (b) A and B.

(a) A and

(a) From Eq. (6.100) (Prob. 6.6), we have P(A)

= P(A n

B)

+ P(A nB)

Since A and B are independent, using Eqs. (6.21) and (6.4), we obtain P(A nB) = P(A)- P(A n B) = P(A)- P(A)P(B)

= P(A)[I

- P(B)]

= P(A)P(B)

(6.102)

Thus, by Eq. (6.21), A and Bare independent. (b) Interchanging A and Bin Eq. (6.102), we obtain P(B n A)

= P(B)P(A)

which indicates that A and B are independent.

6.12.

Let A and B be events defined in a sample space S. Show that if both P(A) and P(B) are nonzero, then events A and B cannot be both mutually exclusive and independent. Let A and B be mutually exclusive events and P(A) P(A)P(B) # 0. Therefore P(A n B)

That is, A and B are not independent.

# 0,

P(B)

# P(A)P(B)

# 0.

Then P(A n B) = P(0) = 0 and

PROBABILITY AND RANDOM VARIABLES

144

6.13.

[CHAP. 6

Verify Eq. (6.24). Since B n S = B [and using Eq. (6.23)], we have B=BnS=Bn(A 1 uA 2 u · · · uAN)

=(BnA 1)u(BnA 2 )u · · · u(BnAN) Now the events B n Ak (k = I, 2, ... , N) are mutually exclusive, as seen from the Venn diagram of Fig. 6-8. Then by axiom 3 of the probability definition and Eq. (6.18), we obtain P(B) = P(B n S) =

N

N

k=I

k=l

L P(B n Ak) = L P(BIAk)P(Ak)

s

BnA 3

Fig. 6-8

6.14.

In a binary communication system (Fig. 6-9), a 0 or 1 is transmitted. Because of channel noise, a 0 can be received as a I and vice versa. Let m 0 and m 1 denote the events of transmitting 0 and 1, respectively. Let r 0 and r 1 denote the events of receiving 0 and I, respectively. Let P(m 0 ) = 0.5, P(r1 Imo)= p = 0.1, and P(r0 lm1) = q = 0.2. P(r0 lm 0 )

mo

~-----------::::f;J

ro

m1

er------------ao

r1

P(m,)

Fig. 6-9

(a)

P(r 1 lm 1)

Binary communication system

Find P(r0 ) and P(r 1).

(b) If a 0 was received, what (c) If a 1 was received, what (d) Calculate the probability (e) Calculate the probability

is the probability that a 0 was sent? is the probability that a I was sent? of error Pe. that the transmitted signal is correctly read at the receiver.

(a) From Fig. 6-9, we have P(m 1) = I - P(m 0 ) = I - 0.5 = 0.5 P(r0 lm 0 ) = I -P(r 1lm0 ) = I -p = I -0.1 = 0.9 P(rtlm 1) = I -P(rolm 1) = 1-q = 1-0.2 = 0.8

CHAP. 6]

PROBABILITY AND RANDOM VARIABLES

145

Using Eq. (6.24), we obtain P(r0 ) P(r1)

= =

+ P(r0 lm I )P(m I ) = 0.9(0.5) + 0.2(0.5) = 0.55 P(rilmo)P(mo) + P(rilm I )P(m I ) = 0.1(0.5) + 0.8(0.5) = 0.45

P(r0 lm 0 )P(m0 )

(b) Using Bayes' rule (6.19), we have P(molro)

=

P(m 0 )P(r0 lm0 ) P(r0 )

= (0.5)(0.9) = 0 _818 0.55

(c) Similarly,

(d) (e) The probability that the transmitted signal is correctly read at the receiver is

Pc= P(rolmo)P(mo)

+ P(r1 lm1)P(m 1) = 0.9(0.5) + 0.8(0.5) = 0.85

Note that the probability of error P, is P, =I-Pc= 1-0.85

6.15.

= 0.15

Consider a binary communication system of Fig. 6-9 with P(r0 lm 0 ) = 0.9, P(rdm 1) = 0.6. To decide which of the message was sent from an observed response r0 or r 1, we utilize the following criterion: If r 0 is received: Decide m 0

if P(m 0 1ro) > P(m1 Ira)

Decide m1

if P(mdro) > P(molro)

Decide m 0

if P(m 0 lr1) > P(m1 lr1)

Decide m1

if P(m1'r1) > P(molr1)

If r 1 is received:

This criterion is known as the maximum a posteriori probability (MAP) decision criterion (see Prob. 9.1). (a) Find the range of P(m 0 ) for which the MAP criterion prescribes that we decide m 0 if r 0 is received. (b) Find the range of P(m 0 ) for which the MAP criterion prescribes that we decide m 1 if r1 is received. (c) Find the range of P(m 0 ) for which the MAP criterion prescribes that we decide m 0 no matter what is received. (d) Find the range of P(m 0 ) for which the MAP criterion prescribes that we decide m 1 no matter what is received. (a)

Since P(rilm 0 ) = I - P(r0 lm0 ) and P(r0 lm1) = I - P(rilm1), we have P(r0 lm 0 )

= 0.9

P(r I lm 0 )

= 0.1

= 0.6

P(rolm I )

+ 0.4[1 -

P(m0 )]

P(r I lm 1)

= 0.4

By Eq. (6.24), we obtain P(r0 ) = P(r0 lm0 )P(m0 )

Using Bayes rule (6.19), we have

+ P(r 0 lm I )P(m1) =

0.9 P(m 0 )

= 0.5 P(m 0 ) + 0.4

146

PROBABILITY AND RANDOM VARIABLES

[CHAP. 6

P(r0lmo)P(mo) 0.9 P(mo) P (mo Iro) = - ~ - - - = - - - - P(r0) 0.5 P(m 0) + 0.4 P (m1 Iro)

P(r0 /m 1)P(m 1) 0.4(1- P(m 0)] O.4-O.4P(m 0) =- - - - = ____ ..::.;__ = ----

0.5 P(m 0) + 0.4

P(r0)

0.5 P(m 0) + 0.4

Now by the MAP decision rule, we decide m 0 if r 0 is received if P(m 01ro) > P(m 1lro), that is, O.9P(m )

0.4 - O.4P(m ) + 0.4

0 0 ---->-----'O.5P(mo)

or

O.9P(m 0)

> 0.4 -

O.4P(m 0)

+ 0.4

or

O.5P(m 0)

I .3P(m0 ) > 0.4

0.4

or P(m 0) > - = 0.31 I. 3

Thus, the range of P(m 0) for which the MAP criterion prescribes that we decide m 0 if r 0 is received is 0.31 < P(m 0 ):,;:; I (b)

Similarly, we have P(r1)

= P(r1 lm 0 )P(m0 ) + P(r1 lm 1)P(m 1) = 0.1 P(m 0) + 0.6[ I -

P(m 0)]

= -O.5P(m0) + 0.6

P(r 1lmo)P(mo) 0. IP(m ) P(m 0 Ir 1) = - - - - - = - - - -0- P(r1) -O.5P(m 0) + 0.6

0.6 - O.6P(m 0) + 0.6

-O.5P(m 0)

Now by the MAP decision rule, we decide m 1 if r 1 is received if P(m 1h) > P(m 0 h), that is, O.6-O.6P(m0) O. IP(m0)

-------'-- > - - - - - -O.5P(mo)

or

O.6-O.6P(m 0)

+ 0.6

-O.5P(m 0)

+ 0.6

> O.IP(m0) or 0.6 > O.7P(m0) or

P(m 0)

0.6 07

< _ = 0.86

Thus, the range of P(m 0) for which the MAP criterion prescribes that we decide m 1 if r 1 is received is 0 :,;:; P(m 0 ) < 0.86 (c)

From the result of (b) we see that the range of P(m0) for which we decide m 0 if r 1 is received is P(m 0 )

> 0.86

Combining with the result of (a), the range of P(m 0) for which we decide m 0 no matter what is received is given by 0.86 < P(m 0 ) (d)

:,;:;

I

Similarly, from the result of (a) we see that the range of P(m 0) for which we decide m 1 if r 0 is received is P(m 0 )

< 0.31

Combining with the result of (b), the range of P(m 0) for which we decide m 1 no matter what is received is given by 0:,;:; P(m 0 ) < 0.31

6.16.

Consider an experiment consisting of the observation of six successive pulse pos,tions on a communication link. Suppose that at each of the six possible pulse positions there can be a positive pulse, a negative pulse, or no pulse. Suppose also that the individual experiments that determine the kind of pulse at each possible position are independent. Let us denote the event that the ith pulse is positive by {x; = + 1}, that it is negative by {x; = -1}, and that it is zero by {x; = O}. Assume that

P(x; (a)

= + 1) = p = 0.4

P(x;=-l)=q=0.3

Find the probability that all pulses are positive.

for

i= 1, 2, ... ,6

CHAP. 6]

PROBABILITY AND RANDOM VARIABLES

147

(b)

Find the probability that the first three pulses are positive, the next two are zero, and the last is negative.

(a)

Since the individual experiments are independent, by Eq. (6.22) the probability that all pulses are positive is given by P[(x 1 = +l)n (x2 = +l)n · · · n (x6 =+I)]= P(x 1 = +l)P(x2 =+I)· · · P(x6 = +I)

p6

=

(b)

= (0.4) 6 = 0.0041

From the given assumptions, we have P(x; = 0) = I - p - q = 0.3

Thus, the probability that the first three pulses are positive, the next two are zero, and the last is negative is given by P[(x 1 = +I) n (x2 = +I) n (x 3 = +I) n (x4 = 0) n (x 5 = 0) n (x6 = -1)] = P(xi = +l)P(x2 = +l)P(x3 = +l)P(x4 = O)P(x 5 = O)P(x6 = -I) =

p3( I - p - q)2 q = (0.4)3(0.3)2(0.3) = 0.0017

RANDOM VARIABLES 6.17.

A binary source generates digits 1 and O randomly with probabilities 0.6 and 0.4, respectively. (a) What is the probability that two ls and three Os will occur in a five-digit sequence? (b) What is the probability that at least three ls will occur in a five-digit sequence? (a)

Let X be the random variable denoting the number of ls generated in a five-digit sequence. Since there are only two possible outcomes ( I or 0) and the probability of generating I is constant and there are five digits, it is clear that X has a binomial distribution described by Eq. (6.85) with n = 5 and k = 2. Hence, the probability that two ls and three Os will occur in a five-digit sequence is 3

P(X = 2) = (; )(0.6)2(0.4) = 0.23

(b)

The probability that at least three ls will occur in a five-digit sequence is P(X ;3 3)

= I-

6 (i

P(X:,;:; 2)

2

where Hence,

6.18.

P(X"" 2)

=

)(0.6)\0.4) 5-k

= 0.3 I 7

P(X-;3 3) = I -0.317 = 0.683

Let X be a binomial r.v. with parameters (n,p). Show that Px(k) given by Eq. (6.85) satisfies Eq. (6.33c). Recall that the binomial expansion formula is given by

(a+bt= t(;)akbn-k Thus, by Eq. (6.85),

6.19.

Show that when n is very large (n » k) and p very small (p «: 1), the binomial distribution [Eq. (6.85)] can be approximated by the following Poisson distribution [Eq. (6.88)]:

148

[CHAP. 6

PROBABILITY AND RANDOM VARIABLES

P(X

= k) = e-np (np/

(6.103)

k!

From Eq. (6.85) n) k n-k n! k n-k P(X= k) = ( k p q = k!(n-k)!p q (6.104)

n(n-1)· · ·(n-k+I) k n-k k! P q

When n » k and p

«

I, then n(n - I) · · · (n - k q

= I_P=

e-p

+ I)= n · n ·

· · · ·n

= nk

qn-k = e-(n-k)p = e-np

Substituting these relations into Eq. (6.104), we obtain P(X = k)

6.20.

= e-np (npl k!

A noisy transmission channel has a per-digit error probability Pe = 0.01. (a) Calculate the probability of more than one error in 10 received digits. (b) Repeat (a), using the Poisson approximation, Eq. (6.103). (a)

Let X be a binomial random variable denoting the number of errors in 10 received digits. Then using Eq. (6.85), we obtain P(X > I)

=

I - P(X = 0) - P(X = I)

= I -(

°

1

0

0

)(0.01) (0.99)

10

0 -( \

1

)(0.01) (0.99)9

= 0.0042 (b)

Using Eq. (6.103) with np, = 10(0.01) =0.1, we have 1

P(X >I)= I -e-o 1 (O.l)o -e-0· 1 (0.!) = 0.0047

O!

6.21.

I!

Let X be a Poisson r.v. with parameter a. Show that Px(k) given by Eq. (6.88) satisfies Eq. (6.33c). By Eq. (6.88),

°'LPX " (k) = e L°'" 1 oo

oo

-o

k=0

6.22.

k

0(

= e- , e" = I

k=Ok.

Verify Eq. (6.35). From Eqs. (6.6) and (6.28), we have P(X= x):,;; P(x-e

< X,;; x)

= Fx(x)-Fx(x-e)

for any e ;;,, 0. As Fx(x) is continuous, the right-hand side of the preceding expression approaches O as e-+ 0. Thus, P(X = x) = 0.

6.23.

The pdf of a random variable Xis given by fx(X)

={

i

G""-X""-b

otherwise

where k is a constant. Determine the value of k. (b) Let a= -1 and b = 2. Calculate P(IXI ,;;. c) for c = ½-

(a)

CHAP. 6]

From property [Eq. (6.37b)],

(a)

of fx(x) [Eq.( 6.37a)], k must be a positive constant. From property 2 of fx(x)

[Jx(x)dx =

from which we obtain k

I:

kdx = k(b - a) = 1

= 1/(b- a).Thus, 1 -f( xx) - {b-a

0 (b)

149

PROBABILITY AND RANDOM VARIABLES

a~x~b otherwise

(6.105)

A random variable X having the preceding pdf is called a uniform random variable. With a= -1 and b = 2 we have l

={b

fx(x)

-1 ~x~2

otherwise

From Eq. (6.37c) P(IXI ~ ½) = P(-½ ~

6.24.

x~ ½) =

lp flp f-1/2 fx(x)dx = -1/2 1dx = 1

The pdf of X is given by fx(X)

= ke-axu(x)

where a is a positive constant. Determine the value of the constant k. From property I of fx(x) [Eq. (6.37a)], we must have k;;,, 0. From property 2 of fx(x) [Eq.( 6.37b)], 00

fx(x)dx=kf e-axdx=~= 1 f -oo o a oo

from which we obtain k

= a. Thus, (6.106)

A random variable X with the pdf given by Eq. (6.106) is called an exponential random variable with parameter a.

6.25.

All manufactured devices and machines fail to work sooner or later. If the failure rate is constant, the time to failure Tis modeled as an exponential random variable. Suppose that a particular class of computer memory chips has been found to have the exponential failure law of Eq. (6.106) in hours. (a) Measurements show that the probability that the time to failure exceeds 104 hours (h) for chips in the given class is e- 1( =0.368). Calculate the value of parameter a for this case. (b) Using the value of parameter a determined in part (a), calculate the time t0 such that the probability is 0.05 that the time to failure is less than t0 • (a)

Using Eqs. (6.38) and (6.106), we see that the distribution function of Tis given by FT(t)

Now

P(T

=

f

fr(r)dr

= (I -

e-at)u(t)

00

> 104) = 1 - P(T~ 104 ) 4

4

= l-FT(104 )= l-(1-e-a(l0 ))=e-a(l0 )=e-I

from which we obtain a= 10-4 . (b) We want FT(to) = P(T~ lo)= 0.05

Hence,

1-e-ato

=

1-e-(lo-•)to

= 0.05

150

PROBABILITY AND RANDOM VARIABLES

[CHAP. 6

or from which we obtain t0

6.26.

= -104 ln0.95 = 513 h

The joint pdf of X and Y is given by fXY(x,y)

= ke-(O)

where

Mk\O)

=

1,2, ...

= dk Mx(2) I _ d2k

X

(a) (b)

k

i-0

Find the moment generating function of X uniformly distributed over (a,b). Using the result of(a), find E[X], E[X2], and E[X3 ]. eih

-/a

Ans. (a) ).(b _ a)

(b) E [X] 6.71.

= ½(b + a),

E [X2 ]

= ½(b2 +ab+ a2),

E [X3 ] = ¼(b 3 + b2a + ba2 + a3 )

Show that if X and Y are zero-mean jointly normal random variables, then E[X2Y 2] = E[X2]E[Y2] +2(E[XY])2 Hint: Use the moment generating function of X and Y given by

Mxr0-1J2)

= E[i x+I,Y] 1

%a~ ta G)E [xk yn-kp} A2 k

= aX + b

such that

RANDOM PROCESSES

ls for random message signals and noise encountered in communication systems are developed in er. Random signals cannot be explicitly described prior to their occurrence, and noises cannot be by deterministic functions of time. However, when observed over a long period, a random signal ay exhibit certain regularities that can be described in terms of probabilities and statistical uch a model, in the form of a probabilistic description of a collection of functions of times, is ndom process.

TIONS AND NOTATIONS OF RANDOM PROCESSES

r a random experiment with outcomes A and a sample space S. If to every outcome 2 ES we -valued time function X(t, 2), we create a random (or stochastic) process. A random process efore a function of two parameters, the time t and the outcome A. For a specific A, say, 2;, we time function X(t, 2;) = x;(t). This time function is called a sample function. The totality of all ions is called an ensemble. For a specific time tj, X(tj, 2) = Xj denotes a random variable. t1) and fixed 2(= 2;), X(t1, 2;) = x;(t1) is a number. ndom process is sometimes defined as a family of random variables indexed by the parameter is called the index set. 1 illustrates the concepts of the sample space of the random experiment, outcomes of the sociated sample functions, and random variables resulting from taking two measurements of ctions. wing we use the notation X(t) to represent X(t, 2).

S OF RANDOM PROCESSES Expressions:

random process X(t). For a particular time ti, X(ti) tion Fx(xi;ti) is defined as Fx(xi; ti)

= P{X(ti),;;: xi}

real number. 165

= Xi

is a random variable, and its (7.1)

166

RANDOM PROCESSES

[CHAP. 7

Sample space

\ • A2 •

• • •

Outcome

Fig. 7-1

Random process

And Fx(xi; ti) is called the first-order distribution of X(t). The corresponding first-order density function is obtained by (7.2)

Similarly, given ti and t2, X(ti) = Xi and X(t2) = X2 represent two random variables. Their joint distribution is called the second-order distribution and is given by

Fx(xi, xz;ti, t2)=P{X(ti),;;;xi, X(t2),;;:x2} where Xi and x 2 are any real numbers. The corresponding second-order density function is obtained by 2 . )_o Fx(xi,x2;ti,t2) f;( X X1,X2,ti,t2 '°' '°' uXiuXz

In a similar manner, for n random variables X(t;)

= X;(i =

(7.3)

(7.4)

1, ... , n), the nth-order distribution is

Fx(Xi, ... ,xn; ti, ... , tn) = P{X(ti) ~Xi, ... ,X(tn),;;; Xn}

(7.5)

The corresponding nth-order density function is . )-onFx(X1,••·,xn;ti,···,tn) f;( xXi,••·,Xn,ti,•••,tn '°' '°' uXi .. . uxn

B.

(7.6)

Statistical Averages: As in the case of random variables, random processes are often described by using statistical averages (or ensemble averages). The mean of X(t) is defined by

µx(t) = E[X(t)] =

J:

xfx(x; t) dx

00

where X(t) is treated as a random variable for a fixed value oft.

(7. 7)

CHAP. 7]

RANDOM PROCESSES

167

The autocorrelation of X(t) is defined by RxxCti, t2)

= E[X(ti)X(t2)]

=

J:ooJ:oo x,xJ!x(xi,x2;ti,t2)dxidx2

(7.8)

The autocovariance of X(t) is defined by CxxCti, t2)

= E{[X(ti)- µxCti)][X(t2)- µx(tz)]} =

RxxCti, t2) - µxCti)µxCt2)

(7.9)

The nth joint moment of X(t) is defined by E[X(t1)- · .X(tn)J

=

J: J: • • •

00

C.

Stationarity:

1.

Strict-Sense Stationary:

Xi··· Xnfx(Xi, ... ,xn;ti, ... ,tn)dxi · · · dxn

(7.10)

00

A random process X(t) is called strict-sense stationary (SSS) if its statistics are invariant to a shift of origin. In other words, the process X(t) is SSS if (7.11)

for any c. From Eq. (7.11) it follows thatfx(xi;ti) =fx(xi;t1 +c) for any c. Hence the first-order density of a stationary X(t) is independent of t: (7.12)

Similarly.fx(xi ,x2; ti, t2) = fx(xi ,x2; ti

+ c, t2 + c) for any c.

fx(xi ,xz; ti, t2)

Setting c =-ti, we obtain

= fx(xi, Xz; t2 -

ti)

(7.13)

which indicates that if X(t) is SSS, the joint density of the random variables X(t) and X(t + r) is independent oft and depends only on the time difference,.

2.

Wide-Sense Stationary:

A random process X(t) is called wide-sense stationary (WSS) if its mean is constant E[X(t)]

= µx

(7.14)

and its autocorrelation depends only on the time difference r E[X(t)X(t + r)]

= Rxx(r)

(7.15)

From Eqs. (7.9) and (7.15) it follows that the autocovariance of a WSS process also depends only on the time difference ,: (7.16)

Setting

t

= 0 in Eq.

(7. 15), we obtain (7.17)

Thus, the average power of a WSS process is independent of t and equals Rxx(O). Note that an SSS process is WSS but a WSS process is not necessarily SSS. Two processes X(t) and Y(t) are called jointly wide-sense stationary Gointly WSS) if each is WSS and their cross-correlation depends only on the time difference ,: Rxy(t, t

+ r) = E[X(t)Y(t + r)] = Rxy( r)

(7.18)

168

[CHAP. 7

RANDOM PROCESSES

From Eq. (7.18) it follows that the cross-covariance of jointly WSS X(t) and Y(t) also depends only on the time difference ,: (7.19)

D.

Time Averages and Ergodicity: The time-averaged mean of a sample function x(t) of a random process X(t) is defined as x

= (x(t)) =

1 fT/2 lim x(t) dt T -T/2

(7.20)

T- 00

where the symbol O denotes time-averaging. Similarly, the time-averaged autocorrelation of the sample function x(t) is defined as Rxx( r)

= (x(t)x(t + r)) =

1 fT/2 lim x(t)x(t + r) dt T- T -T/2

(7.21)

00

Note that x and Rxx(r) are random variables; their values depend on which sample function of X(t) is used in the time-averaging evaluations. If X(t) is stationary, then by taking the expected value on both sides of Eqs. (7.20) and (7.21), we obtain E[x]

=

1 fT/2 lim E[x(t)]dt T -T/2

T-oo

= µx

(7.22)

which indicates that the expected value of the time-averaged mean is equal to the ensemble mean, and E[Rxx(r)]

=

1 fT/2 lim E[x(t)x(t + r)]dt T -T/2

T- 00

= Rxx(r)

(7.23)

which also indicates that the expected value of the time-averaged autocorrelation is equal to the ensemble autocorrelation. A random process X(t) is said to be ergodic if time averages are the same for all sample functions and equal to the corresponding ensemble averages. Thus, in an ergodic process, all its statistics can be obtained by observing a single sample function x(t) = X(t, 2) (2 fixed) of the process. A stationary process X(t) is called ergodic in the mean if

x = (x(t)) = E[X(t)] = µx

(7.24)

Similarly, a stationary process X(t) is called ergodic in the autocorrelation if Rxx(r)

= (x(t)x(t + r)) = E[X(t)X(t + r)] = Rxx(r)

(7.25)

Testing for the ergodicity of a random process is usually very difficult. A reasonable assumption in the analysis of most communication signals is that the random waveforms are ergodic in the mean and in the autocorrelation. Fundamental electrical engineering parameters, such as de value, root-mean-square (rms) value, and average power can be related to the moments of an ergodic random process. They are summarized in the following: l. x = (x(t)) is equal to the de level of the signal. 2. [.x] 2 = (x(t))2 is equal to the normalized power in the de component. 3. Rxx(O) = (x2 (t)) is equal to the total average normalized power. 4. al = (x 2 (t))-(x(t))2 is equal to the average normalized power in the time-varying or ac component of the signal. 5. irx is equal to therms value of the ac component of the signal.

CHAP. 7]

RANDOM PROCESSES

169

7.4 CORRELATIONS AND POWER SPECTRAL DENSITIES In the following we assume that all random processes are WSS.

A.

Autocorrelation RxxC-r:): The autocorrelation of X(t) is [Eq. (7.15)] Rxx(r)

= E[X(t)X(t + r)]

Properties of Rxx(r): l.

Rxx(-r)

= Rxx(r)

(7.26)

2.

IRxx(r)I ,s;; Rxx(O)

(7.27)

3.

B.

(7.28)

Cross-Correlation RXY('r:): The cross-correlation of X(t) and Y(t) is [Eq. (7.18)] Rxr(r)

Properties of Rxy(r): 1.

= E[X(t)Y(t + r)]

Rxy(-r)

= Rrx(r)

(7.29)

2.

(7.30)

3.

C.

IRxr(r)I ,s;; ½(Rxx(O)

+ Ryy(O)]

(7.31)

Autocovariance Cxx(1:): The autocovariance of X(t) is [Eq. (7.9)] Cxx(r)

= E[{X(t) -

E[X(t)]}{X(t

+ r) -

E[X(t

+ r)]}]

+ r) -

E[Y(t

+ r)]}]

= Rxx(r)- µ1, D.

(7.32)

Cross-Covariance Cxy(1:): The cross-covariance of X(t) and Y(t) is Cxy(r)

= E[{X(t) = Rxy(r) -

E[X(t)]}{Y(t

(7.33)

µxµy

Two processes X(t) and Y(t) are called (mutually) orthogonal if Rxy(r)

=

0

(7.34)

170

RANDOM PROCESSES

[CHAP. 7

They are called uncorrelated if

=0

Cxy(r)

E.

(7.35)

Power Spectral Density or Power Spectrum: Let Rxx(r) be the autocorrelation of X(t). Then the power spectral density (or power spectrum) of X(t) is defined by the Fourier transform of Rxx( r) (Sec. 1.3) as Sxx(w)

=

J:

Rxx(r)e-jwrd,

(7.36)

00

Thus,

Rxx(r)

Joo

1

_;wr

= ln _ Sxx(wk dw

(7.37)

00

Equations (7.36) and (7.37) are known as the Wiener-Khinchin relations.

Properties of Sxx(w): 1.

Sxx(w)is real and Sxx(w);;,, 0 Sxx(-w)

2.

Joo

3.

1 _ Sxx(w) dw ln

= Sxx(w)

= Rxx(O) = E[X2 (t)]

(7.38) (7.39)

(7.40)

00

F.

Cross Spectral Densities: The cross spectral densities (or cross power spectra) Sxr(w) and Srx(w) of X(t) and Y(t) are defined by

Sxy(w)

=

J:

Rxy(r)e-jwrd,

(7.41)

00

(7.42)

and The cross-correlations and cross spectral densities thus form Fourier transform pairs. Thus, Rxr(r)

1

= ln

Joo_

_;wr dw

Sxy(wk

(7.43)

00

(7.44)

Substituting the relation [Eq. (7.29)] Rxr(r)

= Rrx(-r)

in Eqs. (7.41) and (7.42), we obtain (7.45)

CHAP. 7]

171

RANDOM PROCESSES

7.5

TRANSMISSION OF RANDOM PROCESSES THROUGH LINEAR SYSTEMS

A.

System Response: A linear time-invariant (LTI) system is represented by (Sec. 2.2)

= L[X(t)]

Y(t)

(7.46)

where Y(t) is the output random process of an LTI system, represented by a linear operator L with input random process X(t). Let h(t) be the impulse response of an LTI system (Fig. 7-2). Then [Sec. 2.2, Eq. (2.8)]

= h(t) * X(t) =

Y(t)

J:

(7.47)

h(r:t.)X(t - r:t.) dr:t.

00

S(t)

LTI

h(t)

X(t)

system

Y(t)

Fig. 7-2

B.

LTI system

Mean and Autocorrelation of the Output: µy(t)

= E[Y(t)] = E[f:

=

J: J:

h(r:t.)X(t- r:t.) dr:t.]

00

h(r:t.)E[X(t - r:t.)]dr:t.

00

=

h(r:t.)µx(t - a) dr:t.

= h(t) * µx(t)

(7.48)

00

Ryy(ti, t2)

= E[Y(ti)Y(t2)]

J: J: J: J: J:

= E[f: =

=

h(r:t.)X(ti - r:t.)h(/3)X(t2 - /3) dr:t.d/3]

00

00

h(r:t.)h(/3)E[X(ti - r:t.)X(t2 - /3)]dr:t.d/3

00

00

00

00

h(r:t.)h(/3)RxxCti - r:t., t2 - /3)]dr:t.d/3

If the input X(t) is WSS, then from Eq. (7.48) we have E[Y(t)]

=

J:

h(r:t.)µxdr:t.

= µx

J:

h(r:t.)dr:t.

= µxH(O)

(7.49)

(7.50)

00

00

where H(0) is the frequency response of the linear system at w constant.

= 0.

Thus, the mean of the output is a

+ r:t. -

/3) dr:t.d/3

The autocorrelation of the output given in Eq. (7.49) becomes Ryy(ti, t2)

=

J: J: 00

h(r:t.)h(/3)RxxCt2 - ti

00

which indicates that Ryy(ti, t 2 ) is a function of the time difference r

= t2 -

ti. Hence,

(7.51)

172

RANDOM PROCESSES

Ryy(r)

=

J: J: 00

h(r:t.)h(/3)Rxx(r

+ r:t. -

[CHAP. 7

/3) dr:t.d/3

(7.52)

00

Thus, we conclude that if the input X(t) is WSS, the output Y(t) is also WSS.

C.

Power Spectral Density of the Output: Taking the Fourier transform of both sides of Eq. (7.52), we obtain the power spectral density of the output Syy(w)

=

J: J: J: J:

Ryy(r)e-jw,d,

00

=

00

=

00

h(r:t.)h(/3)Rxx(r

+a -

/3)e-jw,d,dr:1.d/3

00

2

(7.53)

IH(w)l Sxx(w)

Thus, we obtained the important result that the power spectral density of the output is the product of the power spectral density of the input and the magnitude squared of the frequency response of the system. When the autocorrelation of the output Ryy( r) is desired, it is easier to determine the power spectral density Syy(w) and then to evaluate the inverse Fourier transform (Prob. 7.21). Thus, Ryy(r)

1 Joo = ln _

_iwT

Syy(w)e-

dw

00

= ln1 Joo_ IH(w) 12Sxx(w)e-_iwT dw

(7.54)

00

By Eq. (7.17), the average power in the output Y(t) is E[Y 2 (t)] = Ryy(O) = - I

2n

Joo IH(w)I 2Sxx(w)dw -oo

(7.55)

7.6 SPECIAL CLASSES OF RANDOM PROCESSES In this section, we consider some important random processes that are commonly encountered in the study of communication systems.

A.

Gaussian Random Process: Consider a random process X(t), and define n random variables X(t 1), ... ,X(tn) corresponding ton time instants t 1 , • •• , tn- Let X be a random vector (n x I matrix) defined by

(7.56)

Let x be an n-dimensional vector (n x I matrix) defined by

CHAP. 7]

RANDOM PROCESSES

173

(7.57)

so that the event {X(t 1):;;; x 1 , ••• , X(tn),;;; Xn} is written {X:;;; x}. Then X(t) is called a gaussian (or normal) process if X has a jointly multivariate gaussian density function for every finite set of {t;} and every n. The multivariate gaussian density function is given by (7.58)

where T denotes the "transpose," µ is the vector means, and C is the covariance matrix, given by (7.59)

(7.60)

(7.61)

where which is the covariance of X(t;) and X(tj), and det C is the determinant of the matrix C. 1.

Some of the important properties of a gaussian process are as follows: A gaussian process X(t) is completely specified by the set of means

µ;

= E[X(t;)]

i

=

1, ... ,n

and the set of autocorrelations RxxU;, tj)

2.

3. 4.

B.

= E[X(t;)X(tj)]

If the set of random variables X(t;), i

= 1, ... , n, Cu= 0

i, j

= I, ... , n

is uncorrelated, that is, i -f-j

then X(t;) are independent. If a gaussian process X(t) is WSS, then X(t) is SSS. If the input process X(t) of a linear system is gaussian, then the output process Y(t) is also gaussian.

White Noise: A random process X(t) is called white noise if [Fig. 7-3(a)] Sxx(w)

= -'1

2

(7.62)

Taking the inverse Fourier transform of Eq. (7.62), we have Rxx(r)

= '1 J(r)

2

which is illustrated in Fig. 7-3(b). It is usually assumed that the mean of white noise is zero.

(7.63)

174

RANDOM PROCESSES

[CHAP. 7

11

2

w

0

0

(bl

(a)

Fig. 7-3

C.

White noise

Band-Limited White Noise: A random process X(t) is called band-limited white noise if (7.64)

lwl > Ws Then Rxx(r)

= -I

f

Wa

2n

-w 8

'1 _;w,d

-e-

2

w

IJWB sin Wsr =---

2n

w8 r

(7.65)

And Sxx(w) and Rxx(r) of band-limited white noise are shown in Fig. 7-4. Note that the term white or band-limited white refers to the spectral shape of the process X(t) only, and these terms do not imply that the distribution associated with X(t) is gaussian.

Sxx 11

2

0

w

(bl

(a)

Fig. 7-4

D.

Band-limited white noise

Narrowband Random Process: Suppose that X(t) is a WSS process with zero mean and its power spectral density Sxx(w) is nonzero only in some narrow frequency band of width 2W that is very small compared to a center frequency we, as shown in Fig. 7-5. Then the process X(t) is called a narrowband random process. In many communication systems, a narrowband process (or noise) is produced when white noise (or broadband noise) is passed through a narrowband linear filter. When a sample function of the narrowband process is viewed on an oscilloscope, the observed waveform appears as a sinusoid of random amplitude and phase. For this reason, the narrowband noise X(t) is conveniently represented by the expression X(t)

=

V(t) cos [wet+ q>(t)]

(7.66)

CHAP. 7]

175

RANDOM PROCESSES

w

0

H

H

2W

2W

Fig. 7-5

Narrowband random process

where V(t) and

Z I(x1) if P(x;) < P(x1)

I(x;)

I(x;x;) = I(x;)

+ I(x1)

(10.4)

if X; and x1 are independent

(10.5)

The unit of I(x;) is the bit (binary unit) if b = 2, Hartley or decit if b = 10, and nat (natural unit) if b = e. It is standard to use b = 2. Here the unit bit (abbreviated "b") is a measure of information content and is not to be confused with the term bit meaning "binary digit." The conversion of these units to other units can be achieved by the following relationships. Ina loga log2a = ln2 = log2

2.

(10.6)

Average Information or Entropy:

In a practical communication system, we usually transmit long sequences of symbols from an information source. Thus, we are more interested in the average information that a source produces than the information content of a single symbol. The mean value of I(x;) over the alphabet of source X with m different symbols is given by m

H(X)

= E[I(x;)] = L P(x;)I(x;) i=I

(10.7)

m

= -

L P(x;) log

2

P(x;)

b/symbol

i=I

The quantity H(X) is called the entropy of source X. It is a measure of the average information content per source symbol. The source entropy H(X) can be considered as the average amount of uncertainty within source X that is resolved by use of the alphabet. Note that for a binary source X that generates independent symbols 0 and 1 with equal probability, the source entropy H(X) is H(X)=-½log 2 ½-½Iog 2 ½= I b/symbol

(10.8)

The source entropy H(X) satisfies the following relation: 0.:;; H(X).:;; log 2 m

(10.9)

where m is the size (number of symbols) of the alphabet of source X (Prob. I 0.4). The lower bound corresponds to no uncertainty, which occurs when one symbol has probability P(x;) = 1 while

CHAP. 10]

INFORMATION THEORY AND SOURCE CODING

247

= 0 for j =/- i, so X emits the same symbol x; all the time. The upper bound corresponds to the maximum uncertainty which occurs when P(x;) = 1/m for all i, that is, when all symbols are equally likely to be emitted by X. P(x)

3.

Information Rate:

If the time rate at which source X emits symbols is r (symbols/s), the information rate R of the source is given by R

10.3 A.

= rH(X) b/s

(10.10)

DISCRETE MEMORYLESS CHANNELS Channel Representation:

A communication channel is the path or medium through which the symbols flow to the receiver. A discrete memoryless channel (DMC) is a statistical model with an input X and an output Y (Fig. 10-1). During each unit of the time (signaling interval), the channel accepts an input symbol from X, and in response it generates an output symbol from Y. The channel is "discrete" when the alphabets of X and Y are both finite. It is "memoryless" when the current output depends on only the current input and not on any of the previous inputs.

I.,,

x, •

• Y2

X2 •



·I

X

X; •



't''

P(Y;IX;}

• Yn

Xm•

Fig. 10-1

Discrete memoryless channel

A diagram of a DMC with m inputs and n outputs is illustrated in Fig. 10-1. The input X consists of input symbols x 1, x 2, ... , Xm- The a priori probabilities of these source symbols P(x;) are assumed to be known. The output Y consists of output symbols y 1 , y 2 , ... , Yn. Each possible input-to-output path is indicated along with a conditional probability P(yJix;), where P(y11x;) is the conditional probability of obtaining output y1 given that the input is x;, and is called a channel transition probability. B.

Channel Matrix:

A channel is completely specified by the complete set of transition probabilities. Accordingly, the channel of Fig. 10-1 is often specified by the matrix of transition probabilities [P(YIX)], given by [P(YIX)]

=

P(ydx1)

P(Y2IX1)

P(Ynlx1)]

P(~'._lx2)

P(Y2lx2)

P(Ynlx2)

P(y1 lxm)

P(Y2IXm)

P(Ynlxm)

(]O.ll)

[

The matrix [P( YIX)] is called the channel matrix. Since each input to the channel results in some output, each row of the channel matrix must sum to unity, that is, n

L P(y)x;) = 1 }=1

for all i

(10.12)

248

INFORMATION THEORY AND SOURCE CODING

[CHAP. 10

Now, if the input probabilities P(X) are represented by the row matrix [P(X)]=[P(x1)

P(x2)

...

P(xm)]

(10.13)

and the output probabilities P( Y) are represented by the row matrix [P(Y)]

then

=

[P(Y)]

[P(y 1)

=

P(y2)

...

P(yn)]

(10.14) (10.15)

[P(X)][P(YIX)]

If P(X) is represented as a diagonal matrix (10.16) 0

then

[P(X, Y)]

=

[P(X)]dP(YjX)]

(10.17)

where the (i,j) element of matrix [P(X, Y)] has the form P(x;,Y)- The matrix [P(X, Y)] is known as the joint probability matrix, and the element P(x;, y1) is the joint probability of transmitting X; and receiving YJC.

Special Channels:

I.

Lossless Channel:

A channel described by a channel matrix with only one nonzero element in each column is called a lossless channel. An example of a lossless channel is shown in Fig. 10-2, and the corresponding channel matrix is shown in Eq. (10.18).

(10.18)

x1



--============

Y, Y2

!

4

l

3

Fig. 10-2

Lossless channel

It can be shown that in the lossless channel no source information is lost in transmission. [See Eq. (10.35) and Prob. 10.10.) 2.

Deterministic Channel:

A channel described by a channel matrix with only one nonzero element in each row is called a deterministic channel. An example of a deterministic channel is shown in Fig. 10-3, and the corresponding channel matrix is shown in Eq. (10.19).

CHAP. 10]

INFORMATION THEORY AND SOURCE CODING

249

x, x 2 • Y, I X3

~

• Y2

I

X4

Fig. 10-3

Deterministic channel

[P(YJX)]=

II 0 [0 0

O O OJ 0 1 0 1 0 0 1

(10.19)

Note that since each row has only one nonzero element, this element must be unity by Eq. (10.12). Thus, when a given source symbol is sent in the deterministic channel, it is clear which output symbol will be received.

3.

Noiseless Channel:

A channel is called noiseless if it is both lossless and deterministic. A noiseless channel is shown in Fig. 10-4. The channel matrix has only one element in each row and in each column, and this element is unity. Note that the input and output alphabets are of the same size; that is, m = n for the noiseless channel.

xm..__ _ _ _ _ _ _ _ _ _ _ _ _ _ Ym

Fig. 10-4

4.

Noiseless channel

Binary Symmetric Channel:

The binary symmetric channel (BSC) is defined by the channel diagram shown in Fig. 10-5, and its channel matrix is given by [ P( YJX)]

= [ I p- p

p

1-p

]

(10.20)

The channel has two inputs (x 1 = 0, x 2 = 1) and two outputs (y 1 = O,y 2 = 1). The channel is symmetric because the probability of receiving a 1 if a O is sent is the same as the probability of receiving a O if a I is sent. This common transition probability is denoted by p. (See Prob. 10.35.)

250

INFORMATION THEORY AND SOURCE CODING

[CHAP. 10

1-p

Binary symmetrical channel

Fig. 10-5

10.4 A.

MUTUAL INFORMATION Conditional and Joint Entropies:

Using the input probabilities P(x;), output probabilities P(y), transition probabilities P(y11x;), and joint probabilities P(x;,y1), we can define the following various entropy functions for a channel with m inputs and n outputs: m

H(X)

=-

L P(x;) log

P(x;)

(10.21)

L P(y) log2 P(y1)

(10.22)

2

i=I

n

H( Y)

=-

j=I

n

H(XI Y)

=-

m

L L P(x;, y) log2 P(x;ly1)

(10.23)

J=I i=I

n

H( YIX)

=-

m

L L P(x;, y1) log2 P(y1lx;)

(10.24)

/=I i=I

n

H(X, Y)

=-

m

L L P(x;, y) log2 P(x;, y1)

(10.25)

j=I i=I

These entropies can be interpreted as follows: H(X) is the average uncertainty of the channel input, and H( Y) is the average uncertainty of the channel output. The conditional entropy H(XI Y) is a measure of the average uncertainty remaining about the channel input after the channel output has been observed. And H(XI Y) is sometimes called the equivocation of X with respect to Y. The conditional entropy H(YIX) is the average uncertainty of the channel output given that X was transmitted. The joint entropy H(X, Y) is the average uncertainty of the communication channel as a whole. Two useful relationships among the above various entropies are

B.

H(X, Y)

= H(XI Y) + H( Y)

(10.26)

H(X, Y)

= H( YIX) + H(X)

(10.27)

Mutual Information:

The mutual information J(X; Y) of a channel is defined by J(X; Y)

= H(X) -

H(XI Y)

b/symbol

(10.28)

CHAP. 10]

INFORMATION THEORY AND SOURCE CODING

251

Since H(X) represents the uncertainty about the channel input before the channel output is observed and H(XI Y) represents the uncertainty about the channel input after the channel output is observed, the mutual information J(X; Y) represents the uncertainty about the channel input that is resolved by observing the channel output. Properties of J(X; Y) : l(X; Y)

1.

(10.29) (10.30)

2.

l(X;Y)?O

3.

J(X; Y)

= H(Y)-H(YIX)

4.

I (X; Y)

= H(X) + H( Y) -

10.5 A.

= J(Y;X)

(10.31) H(X, Y)

(10.32)

CHANNEL CAPACITY Channel Capacity per Symbol Cs: The channel capacity per symbol of a DMC is defined as Cs= max I(X; Y)

b/symbol

{P(x;)}

(10.33)

where the maximization is over all possible input probability distributions {P(x;)} on X. Note that the channel capacity Cs is a function of only the channel transition probabilities that define the channel.

B.

Channel Capacity per Second C:

If r symbols are being transmitted per second, then the maximum rate of transmission of information per second is rCs. This is the channel capacity per second and is denoted by C (b/s): C

C.

Capacities of Special Channels:

I.

Lossless Channel:

For a lossless channel, H(XIY)

= 0 (Prob.

= rC

5

b/s

(10.34)

10.10) and

J(X; Y)

= H(X)

(10.35)

Thus, the mutual information (information transfer) is equal to the input (source) entropy, and no source information is lost in transmission. Consequently, the channel capacity per symbol is C5

= max H(X) = log 2 m

(10.36)

{P(x,)}

where m is the number of symbols in X.

2.

Deterministic Channel:

For a deterministic channel, H( YIX)

= 0 for

all input distributions P(x;), and

J(X; Y)

= H(Y)

(10.37)

Thus, the information transfer is equal to the output entropy. The channel capacity per symbol is

252

INFORMATION THEORY AND SOURCE CODING

= max H(Y) = log 2 n

C5

[CHAP. 10

(10.38)

{P(x,)}

where n is the number of symbols in Y.

3.

Noiseless Channel:

Since a noiseless channel is both lossless and deterministic, we have J(X; Y)

=

H(X)

=

(10.39)

H( Y)

and the channel capacity per symbol is (10.40)

4.

Binary Symmetric Channel:

For the BSC of Fig. I0-5, the mutual information is (Prob. I 0.16) J(X; Y)

= H(Y) + plog2 p + (l -p)log2 (1 -p)

(10.41)

and the channel capacity per symbol is (10.42)

10.6

ADDITIVE WHITE GAUSSIAN NOISE CHANNEL In a continuous channel an information source produces a continuous signal x(t). The set of possible signals is considered as an ensemble of waveforms generated by some ergodic random process. It is further assumed that x(t) has a finite bandwidth so that x(t) is completely characterized by its periodic sample values. Thus, at any sampling instant, the collection of possible sample values constitutes a continuous random variable X described by its probability density function fx(x).

A.

Differential Entropy:

The average amount of information per sample value of x(t) is measured by H(X)

=-

f:

Jx(x) loglfx(x) dx

b/sample

(10.43)

0

The entropy H(X) defined by Eq. (10.43) is known as the differential entropy of X. The average mutual information in a continuous channel is defined (by analogy with the discrete case) as

or where

H(Y) H(XI Y) H(YIX)

===-

f:

l(X; Y)

=

H(X)- H(XI Y)

J(X; Y)

=

H( Y) - H( YIX)

fy(y)log2fy(y) dy

(10.44)

00

J: J: f: f: 00

00

00

00

/XY(x, y) loglfx(xly) dxdy

(10.45a)

fXY(x,y)log 2 fy(ylx) dxdy

(10.45b)

CHAP. 10]

B.

INFORMATION THEORY AND SOURCE CODING

253

Additive White Gaussian Noise Channel

In an additive white gaussian noise (A WGN ) channel, the channel output Y is given by

Y=X+n

(10.46)

where Xis the channel input and n is an additive band-limited white gaussian noise with zero mean and variance a 2 • The capacity Cs of an A WGN channel is given by Cs= max J(X; Y) lfx(x)}

1 (1 + S) 2

= - log2

N

b/sample

(10.47)

where S/ N is the signal-to-noise ratio at the channel output. If the channel bandwidth B Hz is fixed, then the output y(t) is also a band-limited signal completely characterized by its periodic sample values taken at the Nyquist rate 2B samples/s. Then the capacity C (b/s) of the A WGN channel is given by

= B log2 ( 1 +

C = 2BCs

!)

b/s

(10.48)

Equation (10.48) is known as the Shannon-Hartley law. The Shannon-Hartley law underscores the fundamental role of bandwidth and signal-to-noise ratio in communication. It also shows that we can exchange increased bandwidth for decreased signal power (Prob. 10.24) for a system with given capacity C.

10.7 SOURCE CODING A conversion of the output of a DMS into a sequence of binary symbols (binary code word) is called source coding. The device that performs this conversion is called the source encoder (Fig. 10-6).

Discrete memoryless source

1----i

Source encoder

x,

Fig. 10-6

Binary sequence

Source coding

An objective of source coding is to minimize the average bit rate required for representation of the source by reducing the redundancy of tht- information source.

A.

Code Length and Code Efficiency:

Let X be a DMS with finite entropy H(X) and an alphabet {x 1 , ... , Xm} with corresponding probabilities of occurrence P(x;)(i = 1, ... , m). Let the binary code word assigned to symbol X; by the encoder have length n;, measured in bits. The length of a code word is the number of binary digits in the code word. The average code word length L, per source symbol is given by m

L = LP(x;)n;

(10.49)

i=I

The parameter L represents the average number of bits per source symbol used in the source coding process. The code efficiency 17 is defined as

254

INFORMATION THEORY AND SOURCE CODING

(CHAP. 10

Lmin

(10.50)

11=-

L

where Lmin is the minimum possible value of L. When 17 approaches unity, the code is said to be efficient. The code redundancy y is defined as (10.51)

y=l-17

B.

Source Coding Theorem: The source coding theorem states that for a DMS X with entropy H(X), the average code word length L per symbol is bounded as (10.52)

L? H(X)

and further, L can be made as close to H(X) as desired for some suitably chosen code. Thus, with Lmin = H(X), the code efficiency can be rewritten as H(X)

(10.53)

11=--

L

C.

Classification of Codes: Classification of codes is best illustrated by an example. Consider Table I0-1 where a source of size 4 has been encoded in binary codes with symbol 0 and I. Table 10-1

I.

Binary Codes

X;

Code 1

Code 2

Code 3

Xi

00

00

0

0

0

1

X2

01

01

1

10

01

01

X3

00

10

00

110

011

001

X4

11

11

11

111

0111

0001

Code 4

Code 5

Code 6

Fixed-Length Codes:

A fixed-length code is one whose code word length is fixed. Code I and code 2 of Table I 0-1 are fixed-length codes with length 2. 2.

Variable-Length Codes:

A variable-length code is one whose code word length is not fixed. All codes of Table I 0-1 except codes I and 2 are variable-length codes. 3.

Distinct Codes:

A code is distinct if each code word is distinguishable from other code words. All codes of Table I 0-1 except code I are distinct codes-notice the codes for x 1 and x3. 4.

Prefix-Free Codes:

A code in which no code word can be formed by adding code symbols to another code word is called a prefix-free code. Thus, in a prefix-free code no code word is a prefix of another. Codes 2, 4, and 6 of Table I 0-1 are prefix-free codes.

CHAP. 10]

5.

INFORMATION THEORY AND SOURCE CODING

255

Uniquely Decodable Codes:

A distinct code is uniquely decodable if the original source sequence can be reconstructed perfectly from the encoded binary sequence. Note that code 3 of Table I0-1 is not a uniquely decodable code. For example, the binary sequence 1001 may correspond to the source sequences x 2 x 3x 2 or x 2 x 1x 1x 2 • A sufficient condition to ensure that a code is uniquely decodable is that no code word is a prefix of another. Thus, the prefix-free codes 2, 4, and 6 are uniquely decodable codes. Note that the prefix-free condition is not a necessary condition for unique decodability. For example, code 5 of Table I 0-1 does not satisfy the prefix-free condition, and yet it is uniquely decodable since the bit 0 indicates the beginning of each code word of the code. 6.

Instantaneous Codes:

A uniquely decodable code is called an instantaneous code if the end of any code word is recognizable without examining subsequent code symbols. The instantaneous codes have the property previously mentioned that no code word is a prefix of another code word. For this reason, prefix-free codes are sometimes called instantaneous codes.

7.

Optimal Codes:

A code is said to be optimal if it is instantaneous and has minimum average length L for a given source with a given probability assignment for the source symbols.

D.

Kraft Inequality:

Let Xbe a OMS with alphabet {x;} (i = 1, 2, ... ,m). Assume that the length of the assigned binary code word corresponding to x; is n;. A necessary and sufficient condition for the existence of an instantaneous binary code is m

K=

Irn;,;;;

I

(10.54)

i=I

which is known as the Kraft inequality. Note that the Kraft inequality assures us of the existence of an instantaneously decodable code with code word lengths that satisfy the inequality. But it does not show us how to obtain these code words, nor does it say that any code that satisfies the inequality is automatically uniquely decodable (Prob. I 0.27).

10.8

ENTROPY CODING

The design of a variable-length code such that its average code word length approaches the entropy of the OMS is often referred to as entropy coding. In this section we present two examples of entropy coding. A.

Shannon-Fano Coding:

An efficient code can be obtained by the following simple procedure, known as Shannon-Fano algorithm: 1. List the source symbols in order of decreasing probability. 2. Partition the set into two sets that are as close to equiprobable as possible, and assign 0 to the upper set and 1 to the lower set.

256

INFORMATION THEORY AND SOURCE CODING

[CHAP. 10

3. Continue this process, each time partitioning the sets with as nearly equal probabilities as possible until further partitioning is not possible. An example of Shannon-Pano encoding is shown in Table 10-2. Note that in Shannon-Pano encoding the ambiguity may arise in the choice of approximately equiprobable sets. (See Prob. 10.33.)

Table 10-2

B.

Shannon-Fano Encoding

X;

P(x;)

Step 1

Step 2

Xi

0.30

0

0

00

X2

0.25

0

1

01

X3

0.20

1

0

10

X4

0.12

1

1

0

X5

0.08

1

1

1

0

1110

x6

0.05

1

1

1

1

1111

Step 3

H(X)

= 2.36 b/symbol

L

= 2.38 b/symbol

1/

= H(X)/ L = 0.99

Step 4

Code 5

110

Huffman Encoding:

In general, Huffman encoding results in an optimum code. Thus, it is the code that has the highest efficiency (Prob. 10.34). The Huffman encoding procedure is as follows: 1. List the source symbols in order of decreasing probability. 2. Combine the probabilities of the two symbols having the lowest probabilities, and reorder the resultant probabilities; this step is called reduction 1. The same procedure is repeated until there are two ordered probabilities remaining. 3. Start encoding with the last reduction, which consists of exactly two ordered probabilities. Assign 0 as the first digit in the code words for all the source symbols associated with the first probability; assign 1 to the second probability. 4. Now go back and assign 0 and 1 to the second digit for the two probabilities that were combined in the previous reduction step, retaining all assignments made in Step 3. 5. Keep regressing this way until the first column is reached. An example of Huffman encoding is shown in Table 10-3.

= 2.36 b/symbol = 2.38 b/symbol 1/ = 0.99

H(X) L

CHAP. 10]

257

INFORMATION THEORY AND SOURCE CODING

X;

P(x;)

0.30 0.25 0.20

Code 00

00

0.30

01

01

0.25

11

0.20

11

0.08

0.30

0.45

1

0.25

0.25

01

0.20

0.13 1000

0

045=jF"~

0.25

101 0.12 ~

1

00

0.30

11

012 101

0.05 1001

Table 10-3

Huffman Encoding

Solved Problems MEASURE OF INFORMATION 10.1.

Verify Eq. (10.5), that is, l(X;Xj)

= J(x;) + J(xj)

if x; and

xj

are independent

If x; and xj are independent, then by Eq. (6.21)

P(x;x)

= P(x;)P(x)

By Eq. (JO.]) 1 P(x;x)

J(x;x) =log--= l o g - - -

P(x;)P(x)

-1

-

_l_

1

_l_

og P(x;) + og P(x)

= /(x;) + J(x)

10.2.

A OMS X has four symbols = 0.1. (a) Calculate H(X).

x 1, x 2 , x 3, x 4

with probabilities P(x 1) = 0.4, P(x2) = 0.3,

P(x 3) =

0.2,

X4X3X3X2,

and

P(x4)

(b)

Find the amount of information contained in the messages compare with the H(X) obtained in part (a).

x 1x2x 1x 3

4

(a)

H(X) = - LP(x;)log 2 [P(x;)] i=l

= -0.4 log 2 0.4- 0.3 log 2 0.3/ - 0.2 log 2 0.2- 0. l log2 0.1 = 1.85 b/symbol

(b)

P(X1X2X1X3) = (0.4)(0.3)(0.4)(0.2) = 0.0096 l(x 1x 2 x 1x 3 ) = -log 2 0.0096 = 6.70 b/symbol

and

258

INFORMATION THEORY AND SOURCE CODING

Thus,

l(x 1x2x 1x 3)

[CHAP. 10

< 7.4 [= 4H(X)] b/symbol

P(X4X3X3X2) = (0.1)(0.2)2(0.3) = 0.0012 J(x 4x 3x 3x 2) = -log2 0.0012 = 9.70b/symbol

Thus,

10.3.

l(x 4x 3x 3x 2)

> 7.4[= 4H(X)] b/symbol

Consider a binary memoryless source X with two symbols x 1 and x 2 . Show that H(X) is maximum when both x 1 and x 2 are equiprobable. Let P(x 1) = a. P(x2 ) = I - a. (10.55)

Using the relation

we obtain dH(X) I -a - - = -log 2 a+ logi(I - a)= log 2- ~ (t.

The maximum value of H(X) requires that

that is, 1 -(t_

1

--=1--+a=a 2 Note that H(X) = 0 when a= 0 or 1. When P(x 1) = P(x2) = H(X) =

10.4.

1

½, H(X)

1

is maximum and is given by

Iog 2 2 + Iog 22 = 1 b/symbol 2

2

(10.56)

Verify Eq. (10.9), that is, 0 ,s; H(X)

,s;

log 2 m

where m is the size of the alphabet of X. Proof of the lower bound: Since 0,;; P(x;),;; 1,

I P(x;)

--?

I

and

Then it follows that

I

p(x) log 2--;;,, 0 ' P(x;) m

Thus,

H(X) =

1 P(x;) log 2 P(x;) ;;,, 0

6

Next, we note that 1 P(x;) log2 P(x;) = 0

(10.57)

CHAP. 10]

if and only if P(x1)

INFORMATION THEORY AND SOURCE CODING

= 0 or

259

I. Since m

LP(x;)

=I

i=l

when P(x;) = I, then P(xi) = 0 for j=f i. Thus, only in this case, H(X) = 0. Proof of the upper bound: Consider two probability distributions {P(x1) alphabet {x;}, i = I, 2, ... , m, such that m

LP = I 1

= P;} and {Q(x;) = Q;} on the

m

L Q; = I

and

r-1

(10.58)

r-1

Using Eq. (10.6 ), we have I Q; LP; log2-Q;P; =-IP; In ln2 P; m

m

i=I

i=I

Next, using the inequality (10.59)

and noting that the equality holds only if a = I, we get

m

m

i=l

r-1

(10.60)

=IQ;-IP,= 0 by using Eq. (10.58). Thus,

L P, Iog2----'Q·P;- ,;;; O m

(10.61)

i=l

where the equality holds only if Q1 = P; for all i. Setting Q1 = _!_ m

I

m

LP; log2 p

we obtain

=-

;m

i=l

i= 1,2, ... ,m

(10.62)

m

m

i=l

i=l

LP; log2 P; - LP log m 1

2

m

= H(X) - Iog2 m LP;

(10.63)

i=l

= H(X) - log 2m ..;; 0 Hence, and the equality holds only if the symbols in X are equiprobable, as in Eq. (10.62).

10.5. A high-resolution black-and-white TV picture consists of about 2 x 10 6 picture elements and 16 different brightness levels. Pictures are repeated at the rate of 32 per second. All picture elements are assumed to be independent, and all levels have equal likelihood of occurrence. Calculate the average rate of information conveyed by this TV picture source. H(X)

=-

16

I

I

i=l

I

16

L -6 Iog2-

r = 2(10 6 )(32)

= 4 b/element

= 64(10 6 ) elements/s

Hence, by Eq. (JO.JO) R

= rH(X) = 64(10 6 )(4) = 256(10 6 ) b/s = 256 Mb/s

260

10.6.

INFORMATION THEORY AND SOURCE CODING

Consider a telegraph source having two symbols, dot and dash. The dot duration is 0.2 s. The dash duration is 3 times the dot duration. The probability of the dot's occurring is twice that of the dash, and the time between symbols is 0.2 s. Calculate the information rate of the telegraph source. P(dot)

= 2P(dash)

P(dot) + P(dash) = 3P(dash) = 1 P(dash)

Thus,

=

1

3

and

P(dot)

=

2

3

By Eq. (10.7)

= -P(dot) log 2 P(dot)- P(dash) log 2 P(dash) = 0.667(0.585) + 0.333(1.585) = 0.92 b/symbol fctot = 0.2 S lctash = 0.6 S fspace = 0.2 S

H(X)

Thus, the average time per symbol is

Ts

= P(dot)lctot + P(dash)lctash + lspace = 0.5333 s/symbol

and the average symbol rate is

1 r=T

= 1.875 symbols/s

s

Thus, the average information rate of the telegraph source is

R = rH(X)

= 1.875(0.92) =

1.725 b/s

DISCRETE MEMORYLESS CHANNELS 10.7.

[CHAP. 10

Consider a binary channel shown in Fig. 10-7 (See Prob. 6.14) 0.9

Xi

Yi

0.8

Fig. 10-7

Find the channel matrix of the channel. (b) Find P(y 1) and P(y 2 ) when P(x 1) = P(x2 ) = 0.5. (c) Find the joint probabilities P(x 1,y2 ) and P(x2 ,y 1) when P(x1) (a)

(a)

Using Eq. (10.11), we see the channel matrix is given by

(b)

Using Eqs. (10.13), (10.14), and (10.15), we obtain

= P(x 2 ) = 0.5.

CHAP. 10]

INFORMATION THEORY AND SOURCE CODING

[P(Y)]

(c)

261

= [P(X)J[P(YjX)] 0.1] 0.8

= [0.5

0.5]

0.9 [ 0.2

= [0.55

0.45]

= [P(y1)P(y2)]

Hence, P(y 1) = 0.55 and P(y 2) = 0.45. Using Eqs. (10.16) and (10.17), we obtain [P(X, Y)]

= [P(X)ld[P(YIX)] = [ 0.5

0 ] [ 0.9 0.5 0.2

0

0.1 ] 0.8

0.05] =[P(x 1,Y1) 0.4 P(x2,Y1)

=[0.45 0.1

P(x1,Y2)] P(x2,Y2)

Hence, P(x 1,y2) = 0.05 and P(x2,y,) = 0.1.

10.8.

Two binary channels of Prob. 10.7 are connected in cascade, as shown in Fig. 10-8.

x , ~o.9o . ,Y o.,

~ o.9 ,

z,

0.2___,_ _ _ _ _.::,,.,.:::::....._ _ 0.2 X2.,,::::__ _ _ _ _ _ _ _ _ ____;""'z2 0.8

Y2

0.8

Fig. 10-8

(b)

Find the overall channel matrix of the resultant channel, and draw the resultant equivalent channel diagram. Find P(z 1) and P(z2 ) when P(x 1) = P(x2 ) = 0.5.

(a)

By Eq. (10.15)

(a)

[P(Y)] [P(Z)]

= = =

[P(X)l[P(YJX)] [P(Y)l[P(ZI Y)] [P(X)l[P(YJX)][P(ZI Y)]

= [P(X)l[P(ZIX)] Thus, from Fig. I 0-8 [P(ZIX)]

=

[P(YJX)][P(ZIY)]

=[0.9 0.1][0.9 0.1]=[0.83 0.2 0.8 0.2 0.8 0.34

0.17] 0.66

The resultant equivalent channel diagram is shown in Fig. 10-9. (b)

[P(Z)]

= [P(X)l[P(ZIX)] = [0.5

0.5]

Hence, P(z 1) = 0.585 and P(z2) = 0.415.

0.83 [ 0.34

0 17 · ] 0.66

= [0.585 0.415]

262

INFORMATION THEORY AND SOURCE CODING

[CHAP. 10

0.83

::~:: 0.66

Fig. 10-9

10.9.

A channel has the following channel matrix:

[P(YIX)]=[l~p

~ l~p]

(10.64)

(a)

Draw the channel diagram.

(b)

If the source has equally likely outputs, compute the probabilities associated with the

channel outputs for p (a)

= 0.2.

The channel diagram is shown in Fig. 10-10. Note that the channel represented by Eq. (10.64) (see Fig. 10-10) is known as the binary erasure channel. The binary erasure channel has two inputs x 1 = 0 and x 2 = 1 and three outputs y 1 = 0, y 2 = e, and y 3 = 1, where e indicates an erasure; that is, the output is in doubt, and it should be erased.

1-p

Fig. 10-10

(b)

Binary erasure channel

By Eq. (JO.I 5) [P(Y)]

= [0.5

0.5]

0.8 [ 0

0.2 0.2

0 ] 0.8

= [ 0.4 0.2 0.4]

MUTUAL INFORMATION 10.10. For a lossless channel show that H(XIY)

=0

(10.65)

When we observe the output YJ in a lossless channel (Fig. 10-2), it is clear which x; was transmitted, that is, P(x;ly1) = 0 or 1

Now by Eq. (10.23)

(10.66)

INFORMATION THEORY AND SOURCE CODING

CHAP. IO]

n

H(XI Y)

263

m

= - L L P(x;, yj) log2 P(x;lyj) j=I i=l

(10.67)

n

m

j=I

i=l

= - L P(yj) L P(x;IYi) log2 P(x;IYj) Note that all the terms in the inner summation are zero because they are in the form of 1 x log 2 I or 0 x log 2 0. Hence, we conclude that for a lossless channel H(XIY)

=0

10.11. Consider a noiseless channel with m input symbols and m output symbols (Fig. 10-4). Show that H(X)

and

= H(Y)

(10.68)

=0

(10.69)

H(YIX)

For a noiseless channel the transition probabilities are P(yJix;)

Hence,

1 i=j i =f j

={0

P(x;) P(x;,Y) = P(yJix;)P(x;) = { O

(10.70) i=j i =f j

(10.71)

m

and

L P(x;,Y) = P(x1)

P(yj) =

i=l

Thus, by Eqs. (10.7) and (10.72) m

H(Y)

=-

LP(y1Jlog2 P(y) j=I m

= - L P(x;) log2 P(x;) = H(X) i=l

Next, by Eqs. (10.24), (10.70), and (10.71) m

H(YIX)

=-

m

LL P(x;,y)log2 P(Y1lx;) j=l i=l

=-

m

m

i=l

J=I

L P(x;) L log

2

P(y11x;)

m

= - LP(x;)log 2 l = 0 i=l

10.12. Verify Eq. (10.26), that is, H(X, Y)

= H(XI Y) + H( Y)

From Eqs. (6.16) and (6.24)

m

and

LP(x;,Y) i=l

= P(y)

(10.72)

264

INFORMATION THEORY AND SOURCE CODING

[CHAP. IO

So by Eq. (10.25) and using Eqs. (10.22) and (10.23), we have n

H(X, Y)

=-

m

L LP(x;,y;)logP(x;,Yj) J=I i=l

n

=

m

-I LP(x;,y;)log[P(x;ly;)P(y)] J=l i=l

n

=

m

-I LP(x;,y;)logP(x;IY;) j=l i=l

n

=

H(XI Y)-

L P(y1) log P(yi) J=I

= H(XI Y) + H(Y) 10.13. Show that the mutual information l(X; Y) of the channel described by Eq. (10.11 ), with the input probabilities P(x;), i = 1, 2, ... , m, and the output probabilities P(yj), j = 1, 2, ... , n, can be expressed as (10.73)

By Eq. (10.28)

= H(X) -

I(X; Y)

H(XI Y)

Using Eqs. (10.21) and (10.23), we obtain

10.14. Verify Eq. (10.29), that is J(X; Y)

= I( Y; X)

Using Eq. (10.73), we can express /(Y; X) as l(Y;X)

=

m

n P(y)x;) _6P(y1,x;)log2 P(y;)

6

Now, by Eq. (6.19)

and

P(y;lx;) _ P(x;IYi) P(y;) - P(x;)

(10.74)

CHAP. IO]

INFORMATION THEORY AND SOURCE CODING

265

Thus, comparing Eqs. (10.74) with (10.73), we conclude that l(X; Y)

= /( Y; X)

10.15. Verify Eq. (10.30), that is, J(X; Y);;;. 0

From Eq. (10.73) and using the relation log(a/b) m

-l(X; Y)

= -log(b/a),

we have P(x·)

n

= L LP(x;,y;)log2 P(x•I' ~l~I

(10.75)

·)

I~

Using Bayes' rule [Eq. (6.19)], we have P(x;) P(x;ly;)

=

P(x;)P(y;) P(x;,Y;)

Then by using Eq. (10.6), Eq. (J0.75) can be rewritten as

I

-l(X; Y)

m

= f2L n

n P(x;)P(y) LP(x;,y;)ln P( . { x,,y;

(10.76)

i=l j=l

Using the inequality (10.59), that is, ln1X.;;;

C(-

I

we have

J 2

or

- l(X; Y).;;; ln

m

[m ~ P(x;)P(y;)-

6

n

n

m

n

i=l

j=l

m

n

~ P(x;,)'i)

6

]

(10.77)

LL P(x;)P(y;) = LP(x;) L P(yj) = (1)(1) = I

Since

i=l j=l

mn m[n ] mP(x;) = ~ P(x;,Y;) = ~ P(x;,)'i) =

6

6

6

I

Equation (10.77) reduces to -J(X; Y),;;; 0 J(X; Y);;,, 0

or

10.16. Consider a BSC (Fig. 10-5) with P(x 1) (a)

= a.

Show that the mutual information J(X; Y) is given by (10.78)

(b)

(c)

Calculate l(X; Y) for a = 0.5 and p = 0.1. Repeat (b) for a= 0.5 and p = 0.5, and comment on the result.

266

INFORMATION THEORY AND SOURCE CODING

(CHAP. 10

,,>:O

I -jw+a

te-0 'u(t)

a>O

I (jw +a)2

e-alil

2a w2 +a2

a>O

e-t2 /(2a 2 )

Pa(/)= { b

2