
Table of Contents:

Contributors to this Volume (page ii)
Front Matter (page iii)
Copyright Page (page iv)
List of Contributors (page v)
Preface, A. V. Balakrishnan (page vii)
Contents of Previous Volumes (page xi)
Sequential Signal Design for Channels with Feedback, Michael Horstein (pages 1-27)
Adaptive Data Compression for Video Signals, R. L. Kutz, J. A. Sciulli, R. A. Stampfl (pages 29-66)
Some Aspects of Communications Satellite Systems, S. Metzger (pages 67-90)
Advances in Threshold Decoding, James L. Massey (pages 91-115)
Coding and Synchronization–The Signal Design Problem, J. J. Stiffler (pages 117-148)
Progress in Sequential Decoding, J. E. Savage (pages 149-204)
Author Index (pages 205-207)
Subject Index (pages 208-209)


CONTRIBUTORS TO THIS VOLUME MICHAEL HORSTEIN R. L. KUTZ JAMES L. MASSEY S. METZGER J. E. SAVAGE J. A. SCIULLI R. A. STAMPFL J. J. STIFFLER

Advances in

COMMUNICATION SYSTEMS: THEORY AND APPLICATIONS

EDITED BY

A. V. Balakrishnan DEPARTMENT OF ENGINEERING UNIVERSITY OF CALIFORNIA LOS ANGELES, CALIFORNIA

VOLUME 3

1968 ACADEMIC PRESS

New York and London

COPYRIGHT © 1968, BY ACADEMIC PRESS INC. ALL RIGHTS RESERVED. NO PART OF THIS BOOK MAY BE REPRODUCED IN ANY FORM, BY PHOTOSTAT, MICROFILM, OR ANY OTHER MEANS, WITHOUT WRITTEN PERMISSION FROM THE PUBLISHERS.

ACADEMIC PRESS INC.

111 Fifth Avenue, New York, New York 10003

United Kingdom Edition published by ACADEMIC PRESS INC. (LONDON) LTD. Berkeley Square House, London, W.1

LIBRARY OF CONGRESS CATALOG CARD NUMBER: 64-8026

PRINTED IN THE UNITED STATES OF AMERICA

List of Contributors

Numbers in parentheses indicate the pages on which the authors' contributions begin.

MICHAEL HORSTEIN, Space Systems Division, Hughes Aircraft Company, El Segundo, California (1)
R. L. KUTZ, NASA/Goddard Space Flight Center, Greenbelt, Maryland (29)
JAMES L. MASSEY, Department of Electrical Engineering, University of Notre Dame, Notre Dame, Indiana (91)
S. METZGER, Communications Satellite Corporation, Washington, D.C. (67)
J. E. SAVAGE,1 Bell Telephone Laboratories, Incorporated, Holmdel, New Jersey (149)
J. A. SCIULLI, NASA/Goddard Space Flight Center, Greenbelt, Maryland (29)
R. A. STAMPFL, NASA/Goddard Space Flight Center, Greenbelt, Maryland (29)
J. J. STIFFLER,2 Jet Propulsion Laboratories, Pasadena, California (117)

1 Present address: Division of Engineering, Brown University, Providence, Rhode Island.
2 Present address: Raytheon Company, Sudbury, Massachusetts.

Preface This volume differs from the two previous ones in being almost wholly theoretically oriented, although still tied closely to practical systems. The contributions again document significant advances in areas of continuing interest such as feedback systems, data compression, satellite communications, decoding techniques, and synchronization. The authors are acknowledged experts in the respective areas, and enough review material is included so that each contribution is nearly self-contained. A. V. BALAKRISHNAN

March, 1968


Contents of Previous Volumes

Volume 1
Signal Selection Theory for Space Communication Channels (A. V. Balakrishnan)
Theories of Pattern Recognition (David Braverman)
The Digilock Orthogonal Modulation System (R. W. Sanders)
Telemetry and Command Techniques for Planetary Spacecraft (J. C. Springett)
Communication from Weather Satellites (Rudolf A. Stampfl)
Information Theory of Quantum-Mechanical Channels (H. Takahasi)
AUTHOR INDEX - SUBJECT INDEX

Volume 2
A Study of Multiple Scattering of Optical Radiation with Applications to Laser Communication (R. A. Dell-Imagine)
Stochastic Approximation: A Recursive Method for Solving Regression Problems (David J. Sakrison)
Optical Techniques in Communication Systems (L. J. Cutrona)
Synchronous Satellite Communication Systems (D. D. Williams)
Theory of Adaptive Data Compression (Lee D. Davisson)
Manned Spaceflight Communications Systems (Howard C. Kyle)
Orbiting Geophysical Observatory Communication System (Paul F. Glaser)
AUTHOR INDEX - SUBJECT INDEX

Sequential Signal Design for Channels with Feedback

MICHAEL HORSTEIN
Space Systems Division, Hughes Aircraft Company, El Segundo, California

I. Introduction (1)
II. Feedback Systems with an Average Power Constraint (3)
  A. M Signals in One-Space (6)
  B. M Orthogonal Signals (9)
III. A Time-Continuous Binary System with Peak and Average Power Constraints (11)
  A. Introduction (11)
  B. Sequential Detection (13)
  C. Nonsequential Detection (20)
Appendix (25)
References (26)

I. Introduction

Theoretical results regarding the use of feedback in communication systems are rather scanty, compared with the great effort, particularly in the area of coding, which has gone into the development of highly sophisticated one-way communication systems. Yet most communication systems between a pair of terminals provide for transmission of information in either direction. In fact, efficient use of any time-varying channel implies the existence of a feedback link so that the transmission parameters (e.g., transmission rate) can be adjusted in accordance with current values of the forward-channel statistics. Moreover, theoretical results thus far obtained indicate that a significant reduction in system complexity is sometimes possible if the feedback channel is properly utilized. Thus, there is ample incentive for investigating communication systems in which feedback plays a fundamental role. It is important to be aware at the outset of those aspects of system performance which cannot be improved by the introduction of feedback. The most important result along these lines is provided by Shannon (1), who showed that the capacity of a zero-memory channel is not increased by the availability of a feedback channel, even if the latter is noise-free. However, for a fixed probability of error, the complexity of a


one-way system tends to increase rapidly as the transmission rate approaches channel capacity. In a practical sense, therefore, the reduction in complexity made possible by the use of feedback may permit transmission at a higher rate than is possible in a one-way system. Feedback can play various roles in communication systems. It may serve only to inform the transmitter whether or not additional information is needed for a reliable decision regarding the current message segment. The forward-channel transmission in this case may be either coded or uncoded. In error detection systems (2-7), for example, blocks of information digits are encoded into blocks of channel symbols. After a preliminary decision has been made about each of the received symbols, the entire block (or perhaps only a portion of it in more sophisticated systems) is rejected and a repeat requested if it is not identical to one of the possible transmitted blocks. Systems of this type require only a small amount of channel capacity for reliable encoding of the accept/reject information and require relatively simple decoding equipment at the receiver terminal. However, because of the uneven rate at which new message segments are accepted at the channel input, this type of system is generally unacceptable for communicating the output of a constant-data-rate source. Although insertion of a storage device between source and channel can provide long periods of transmission during which no information is lost, for any finite storage capacity, an overflow condition will eventually obtain. Feedback plays a more fundamental role, resulting in a much more flexible use of redundancy, when the signal set from which each new transmission is drawn is itself determined by the feedback information. The transmission schemes described here have the property that the distinguishability of the signals representing the various messages is invariant from one transmission to the next. As a consequence, they are equivalent to simple repetitive schemes, insofar as the information provided about the transmitted message is concerned. However, by using the feedback information to control the energy consumed on each transmission, some rather remarkable information-theoretic results are obtained. In Section II, for example, a class of systems is presented which achieves channel capacity through the selection of a signal set which, on the basis of the feedback information, minimizes the average energy on each transmission. The scheme was devised by Schalkwijk (8) and is a generalization of an earlier scheme by Schalkwijk and Kailath (9, 10). It is applicable under the conditions of an additive-white-Gaussian-


noise forward channel and a noiseless feedback channel, when the forward-channel signals are subject to an average-power limitation. Both wideband and bandlimited versions are presented in the references, but the description here is confined to the former. The scheme is illustrated by the cases of M signals in one-dimensional space and M orthogonal signals. Section III is devoted to the special case of a single signal pair subject to both peak and average power constraints, in the limiting situation of time-continuous transmission. Both sequential and nonsequential detection are considered. This problem was originally posed and partially solved by Turin (11, 12). The solution was completed by Horstein (13). Simplification of the solution in the sequential case was subsequently provided by Ferguson (14).

II. Feedback Systems with an Average Power Constraint The simplest conceivable means of communicating one of M messages with a specified reliability is to repeat the corresponding signal a sufficient number of times to guarantee that the average error probability meets the specifications. Unfortunately, as the required error probability is made arbitrarily small, the transmission rate simultaneously approaches zero. In the absence of feedback, reliable communication at a nonzero rate requires some form of coding. However, the attendant system complexity at the decoding end is often considerable, if not excessive, when the transmission rate is a substantial fraction of channel capacity and the required error probability is small. In contrast to this situation, proper utilization of a noiseless feedback channel permits channel capacity to be achieved with an arbitrarily small error probability by a system which is essentially repetitive in nature. The type of feedback system we wish to discuss can be used to communicate signals chosen from any finite-dimensional set over an infinitebandwidth forward channel contaminated by additive white Gaussian noise. Schalkwijk (8) terms it " center-of-gravity " feedback for reasons which will become obvious. It is repetitive in the sense that the distinguishability of the signals representing the M messages is the same on all transmissions. However, the signal sets from which the transmissions are chosen are determined from the feedback information so that the average transmitted energy is minimized.


With this system, the average cumulative transmitted energy is bounded, irrespective of the number of transmissions. It is therefore possible, by fixing the total transmission time in accordance with the allowed average power, to communicate at a nonzero rate with a probability of error as small as desired. From the specific cases thus far examined, it appears that, regardless of the value of M or the dimension of the signal space, zero error probability can be achieved at a rate equal to channel capacity. Of course, as the error probability approaches zero, the required number of transmissions, and hence the bandwidth, becomes infinite. A concomitant requirement exists for delayless channels in both directions. Finally, the assumption of Gaussian (and hence unbounded) noise leads to a requirement for infinite peak power.

Consider a set of signals of the form

x(t) = Σ_{k,d} x_{kd} φ_d[t − (k − 1)Δ],   d = 1, 2, …, D,   k = 1, 2, …, K   (1)

in which the basic waveforms φ_d[t − (k − 1)Δ] form an orthonormal set; i.e.,

∫_{−∞}^{∞} φ_{d1}[t − (k1 − 1)Δ] φ_{d2}[t − (k2 − 1)Δ] dt = δ_{d1 d2} δ_{k1 k2}   (2)

On the kth transmission, the coefficients x_{kd}, d = 1, 2, …, D, are determined from the message being communicated and the results of previous transmissions. The received waveform will be denoted by z(t) = x(t) + n(t), where n(t) is the channel noise. It can be shown (15) that, for the purpose of computing the a posteriori probability of message m_i, i = 1, 2, …, M, it is sufficient to retain the "received vector" z^K = (z_1, z_2, …, z_K), where z_k has components

z_{kd} = ∫_{−∞}^{∞} z(t) φ_d[t − (k − 1)Δ] dt,   d = 1, 2, …, D,   k = 1, 2, …, K   (3)

Clearly, z^K = x^K + n^K, where

x^K = (x_1, x_2, …, x_K),   n^K = (n_1, n_2, …, n_K)

and the components of x_k and n_k, k = 1, 2, …, K, are defined in a manner similar to (3). It can also be shown that the noise components satisfy the relation

⟨n_{k1 d1} n_{k2 d2}⟩ = δ_{k1 k2} δ_{d1 d2} (N_0/2)   (4)

where N_0 is the one-sided spectral density of the noise. Finally, because of the orthonormal property of the φ_d[t − (k − 1)Δ], the total transmitted energy is

E_K = Σ_{k=1}^{K} |x_k|²   (5)

Let us associate with each message m_i, i = 1, 2, …, M, a signal vector s_i in a D-dimensional signal space. The distinguishability of the signals, and hence the error behavior of the system, is determined by the set of distances {s_i − s_j}. In general, the only way in which the signal set {s_i} can be modified from one transmission to the next without changing {s_i − s_j} is by subtracting a common vector from {s_i}. This may be viewed alternatively as a shift in the origin of coordinates. Let the origin O_K on the (K + 1)th transmission (K ≥ 0) be located at the tip of the vector u_K in the original coordinate system. In other words, if m_j is the intended message, on the (K + 1)th transmission the transmitted vector is x_{K+1} = s_j − u_K (Fig. 1). The average energy expended on the (K + 1)th transmission is

E_{K+1} = Σ_{i=1}^{M} Pr[m_i | z^K] |s_i − u_K|²   (6)

FIG. 1. Choice of signal set: Three signals in two-space.

If we regard Pr[m_i | z^K] as the mass of a particle located at the tip of s_i, E_{K+1} becomes the moment of inertia of the system of particles about O_K. A well-known law of physics states that the moment of inertia is a minimum when taken about the center of mass. With this choice for O_K, (6) reduces to

E_{K+1} = Σ_{i=1}^{M} Pr[m_i | z^K] |s_i|² − |u_K|²   (7)

Following the Kth transmission, the receiver computes the coordinates of O_K according to

u_{Kd} = Σ_{i=1}^{M} Pr[m_i | z^K] s_{id},   d = 1, 2, …, D   (8)

and transmits them over the feedback channel, which is assumed noiseless. The round-trip transmission time is assumed sufficiently small that the transmitter is informed of u_K prior to the (K + 1)th transmission. Two examples will be used to illustrate the process, M signals in one-space and M orthonormal signals.

A. M Signals in One-Space

In this system, the M messages are represented by signal points in the interval (0, 1), with message m_i corresponding to the point −1/(2M) + i/M, i = 1, 2, …, M. The messages are considered equally likely a priori. Since we shall be interested in large values of M, it is convenient to regard the signal points as being continuously and uniformly distributed on the interval (0, 1). The continuous parameter m, 0 ≤ m < 1, will be used to denote the message represented by the signal point s = m. Since Pr[m | z^K] is proportional to Pr[z^K | m] for any message m when the a priori distribution is uniform, we can write

Pr[m | z^K] ~ Pr[z^K | m] = (πN_0)^{−K/2} Π_{k=1}^{K} exp[−(z_k − s + u_{k−1})²/N_0]   (9)

with u_0 = 1/2. Manipulation of (9) yields

Pr[m | z^K] ~ exp{ −(K/N_0) [ s − (1/K) Σ_{k=1}^{K} (z_k + u_{k−1}) ]² }   (10)

The fact that Pr[m | z^K] assigns nonzero probabilities to values of m outside the message interval (0, 1) is of little consequence for large K and will be ignored here. It is clear from (10) that the center of gravity of the message distribution after K transmissions is located at

u_K = (1/K) Σ_{k=1}^{K} (z_k + u_{k−1}) = u_{K−1} + z_K/K   (11)

Alternatively, u_K can be written

u_K = (1/K) Σ_{k=1}^{K} (s + n_k) = s + (1/K) Σ_{k=1}^{K} n_k   (12)

Equation (11) indicates the simple manner in which the center of gravity is updated with each new transmission. It follows from (12) that u_K is a Gaussian random variable with mean s and variance N_0/2K. The probability of error for the system is equal to the probability that u_K falls outside the interval of width 1/M centered on the message point. This is given by

P_e = 2 erfc[ (1/2M) (2K/N_0)^{1/2} ]   (13)

If the number of messages is allowed to grow with the number of transmissions according to

M = K^{(1−ε)/2}   (15)

then P_e → 0 as K → ∞ for any ε > 0. On the other hand, M, and consequently K, must grow exponentially with the total transmission time T in order to maintain a nonzero transmission rate. Suppose that

K = e^{2AT}   (16)

for some A > 0. (The significance of A will become evident shortly.) The transmission rate is then given by

R = (ln M)/T = (1 − ε)A   (17)


The parameter A, and consequently the rate R, is limited in magnitude by the allowed average power, P_av. To see this, observe from the remarks following (12) that the average energy associated with the Kth transmission is

E_K = ⟨(s − u_{K−1})²⟩_av = N_0/2K   (18)

The total expected energy in K transmissions is

E^K = Σ_{k=1}^{K} E_k ≅ (N_0/2) ln K   (19)

for K ≫ 1. In time T the expected energy is therefore

E^T ≅ (N_0/2) ln e^{2AT} = ATN_0   (20)

Since the maximum allowed energy is P_av T, the maximum rate at which zero error probability can be achieved is

R_crit ≜ A = P_av/N_0   nats per second   (21)

This is just the capacity C of an infinite-bandwidth white-Gaussian-noise channel. That infinite bandwidth is actually required follows from the observation that K independent samples transmitted in time T require a bandwidth of K/2T = e^{2AT}/2T, which grows without limit as T → ∞. According to (21), A = C for large K. Therefore, from (17), ε = (C − R)/C and K^ε = e^{2(C−R)T}. Finally, by combining (13) and (14), the following expression can be obtained for P_e:

P_e ≈ [ (N_0)^{1/2} / (√π exp[(C − R)T]) ] exp{ −exp[2(C − R)T] / 4N_0 }   (22)

The double exponential behavior with T exhibited by P_e contrasts impressively with the single exponential behavior associated with the best block codes. In this example, the total energy increases in proportion to ln K, rather than being bounded, as was implied earlier. However, the number of messages also grows with K as specified by (15). Thus, combining (15) and (19), we find that

E^K = N_0(1 − ε) ln M   (23)

so that the total energy per nat (or per bit) of information is, in fact, bounded.
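The one-space scheme lends itself to a quick numerical check. The following is a minimal Monte Carlo sketch (not from the original text) of Eqs. (11)-(13): it assumes unit-spaced transmissions, noise of variance N_0/2 per received sample, and the simple recursion u_K = u_{K-1} + z_K/K; the parameter values and function name are illustrative only.

```python
import numpy as np

def simulate_one_space(M=16, K=4096, N0=1.0, trials=2000, seed=None):
    """Monte Carlo sketch of the center-of-gravity feedback scheme for
    M equally spaced messages on (0, 1), following Eqs. (11)-(13).

    On each transmission the sender emits x_k = s - u_{k-1}; the receiver
    adds white Gaussian noise of variance N0/2 and updates its estimate
    via u_K = u_{K-1} + z_K / K.  A decision error occurs when the final
    estimate u_K falls outside the interval of width 1/M centered on s.
    """
    rng = np.random.default_rng(seed)
    # message points s_i = -1/(2M) + i/M, i = 1..M, drawn uniformly
    msg = rng.integers(1, M + 1, size=trials)
    s = -1.0 / (2 * M) + msg / M

    u = np.full(trials, 0.5)          # u_0 = 1/2 (prior center of gravity)
    energy = np.zeros(trials)         # cumulative transmitted energy
    for k in range(1, K + 1):
        x = s - u                     # signal sent on the k-th transmission
        energy += x ** 2
        z = x + rng.normal(0.0, np.sqrt(N0 / 2), size=trials)
        u = u + z / k                 # Eq. (11): center-of-gravity update

    errors = np.abs(u - s) > 1.0 / (2 * M)
    return errors.mean(), energy.mean()

pe, avg_energy = simulate_one_space()
print(f"empirical Pe      = {pe:.2e}")
print(f"mean total energy = {avg_energy:.3f}")
print(f"(N0/2) ln K       = {0.5 * np.log(4096):.3f}   (cf. Eq. (19))")
```

With these assumed parameters the measured error rate should lie close to the value predicted by (13), and the average cumulative energy stays near the (N_0/2) ln K level of (19), illustrating the bounded energy per bit noted above.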


A modified scheme (8, 9) is available for use with bandlimited channels. In this case, transmission is at the Nyquist rate l/2Wfor bandwidth W. In contrast to the wideband case, the average power per transmission is held constant. Since the number of transmissions varies linearly with the total transmission time, the number of messages must be increased exponentially with the number of transmissions to maintain a constant transmission rate. The latter can be maintained at channel capacity, with an error probability exhibiting a double exponential decrease with time, just as in the wideband case. The above analysis has ignored two factors inevitably present in any communication system—namely, delay in acquiring the feedback information and noise in the feedback channel. The former merely reduces the effective number of transmissions by the size of the delay and is therefore relatively insignificant when the number of transmissions is large. The latter factor is of greater significance, as the following results show: 1. When the error probability is fixed, the transmission rate is always less than a certain maximum value which decreases as the noise level in the feedback channel increases. 2. When the transmission rate is fixed, the probability of error is always greater than a certain minimum value which increases as the noise level in the feedback channel increases.

B. M Orthogonal Signals

The ith signal vector has the form s_i = (0 ⋯ 0 1 0 ⋯ 0), where the one is in the ith position. Substituting into (8), we have

u_{Kd} = Pr[m_d | z^K],   d = 1, 2, …, D   (24)

Also,

|u_K|² = Σ_{d=1}^{D} u_{Kd}² = Σ_{d=1}^{D} {Pr[m_d | z^K]}²   (25)

If the signals are assumed equiprobable a priori, we can assume without loss of generality that m_1 is the message actually transmitted. Thus

Pr[z^K | m_1] = (πN_0)^{−KD/2} Π_{k=1}^{K} Π_{d=1}^{D} exp(−n_{kd}²/N_0)   (26)


Pr[z*K]=

{πι\0)

Π exp[-(n 4 1 + l) 2 /iV 0 ] k=l

i = 2, 3, . . . , D (27) With the Substitution (28) (25) becomes

(29) The expected energy on the (K + l)th transmission is, from (7), (30) A decision made after K transmissions will be in error if Pr[z* | m j > Pr[z* | m{\ for at least one value of / g: 2. This condition obtains if ξΚί > 0 for at least one value of / ^ 2. It is readily verified that ξΚί, i = 2, 3, . . . , D, are Gaussian random variables with mean and variance of KjN0, and covariance equal to Kj2N0. For the special case where K = 2, we define ξ = ξΚ2· Then EK + 1 becomes

E

-l

1+e

" 24i1 2

(l+e-

)

2

(e« + «"«) 2

on

and P e is given by

p,=

1

{2πΚΙΝογΐ2

exp

(w-K/N0)2 2K/N0

dw = trîc(K/N0)i/2

(32)

It should be noted that this case is equivalent to the case of two signals in one-space, provided the distance between the tips of the signal vectors


is also yJ2 in the latter instance. It is easily verified, by taking the onedimensional approach, that EK + lis also given by 2 Pr[ml | zK] Pr[m2 \ zK]. Schalkwijk (8) has given an informal derivation of the limiting binary case in which the spacing between transmissions and the average energy contained in each transmission become infinitesimal in such a way that the energy per unit time remains finite and nonzero. In other words, the situation considered is that of continuous transmission with zero delay in both the forward and feedback channels. He finds that the average total energy required is bounded, independent of the number of transmissions, by N0 In 2. Thus, if the average power used is P a v , the transmission rate for zero error probability is Pav/N0 In 2 bits per second, which is equal to channel capacity. In the next section, the continuous-transmission binary case is reexamined when, in addition to an average power constraint, a peak power limitation is imposed. It will be seen that inclusion of the latter constraint forces the transmission rate to zero (for the scheme examined) as Pe is made to approach zero. However, a rate equal to a substantial fraction of channel capacity can nevertheless be maintained at quite small values of Pe. Implicit in the approach taken thus far is the assumption that a decision is made after a fixed number of transmissions; i.e., the decision process is nonsequential. In the absence of a peak power constraint, there is little reason to do otherwise since, as indicated above, channel capacity (at zero error rate) is achieved by such a process. With the addition of a peak power constraint, however, the tradeoff between transmission rate and error probability becomes important. We shall see that, for the same error probability, a higher transmission rate is attained if a sequential decision process is employed—i.e., if transmission continues until a fixed reliability threshold is reached.

III. A Time-Continuous Binary System with Peak and Average Power Constraints A. Introduction In this system, the forward channel is delayless, with infinite bandwidth, and is disturbed by additive white Gaussian noise. The log likelihood ratio of the two possible signals is continuously fed back to the transmitter via a noiseless and delayless feedback channel. Peak and


average power constraints, designated by P_peak and P_av, are placed on the forward-channel signals, their ratio being denoted by α = P_peak/P_av. These signals are said to be optimally designed when the feedback information is so utilized that the average (for sequential detection) or fixed (for nonsequential detection) transmission time is minimized, subject to a specified probability of error. The system with which we are concerned is shown in Fig. 2. One of two signal waveforms s_±, representing messages m_±, is transmitted through the forward channel. Let t = 0 be the time at which transmission begins. Then, for all t > 0, the receiver computes and transmits through the feedback channel the quantity

y(t) = ln( Pr[m_+ | z_t] / Pr[m_− | z_t] )   (33)

where z_t is the waveform (i.e., signal plus noise) received in the interval (0, t). It will be assumed that m_+ and m_− are equally likely a priori, so that y(0) = 0.

U_±^{(1)}(y) = Pr[m_∓ | y], which is equivalent to (40).


of U+(y). As a starting point, we set U±(y) = U(+\y) and define γ0 as the corresponding value of y. There are two cases to be considered. Suppose initially that y0 > a. Then, if P p e a k and P a v represent the peak power and average power actually used, *peak/*av

>

*peak/*av

(^) =

Suppose, in addition, that σ has been adjusted so that P′_peak = P_peak. It must then be true that P′_av ≤ P_av, and (45) simplifies to

2S_min/N_0 = Y tanh(Y/2) − 2 ln cosh(Y/2)   (46)

It is also shown in the Appendix that the average time required to reach a decision is given by

T = (N_0 Y / 2σ²) tanh(Y/2)   (47)

The value of T corresponding to the use of the optimal signals is

T_min ≜ S_min/P′_av = S_min/P_av   (48)


Consequently, the maximum σ2 consistent with the use of t/(+pt)(y) is N„YP Y ^ ^ F ^S.t a n h ^ min

(49)

For ymax < Y, the peak power is fully utilized; consequently, σ2ρί can also be written as PpcJU£x = ocPJU£x. For y max = Y, σ2ρί = P peak / 2 Umax = y0Payl(l — Pe) . (The parameter γ0 is identical to α' defined in (51).) For a given value of U'max, U^°(y) is found from (42) and (43), 5 m i n is obtained from (45) or (46), and Tmin and σΙρί are computed from (48) and (49). To complete the solution, U'max must be related to a. If U'm2iX < 1 - Pe, division of (49) by P peak and manipulation of the resulting equation yields a

iv 0 t/: 2 a x ytanh(y/2) ^min(^max)

in which the (known) functional dependence of 5 m i n on U^ax has been emphasized. Equation (50) provides the desired relation when

a

JV0(l-Pe)2ytanh(y/2)

= SFi)

=a

(51)

For α > α′, U′_max = 1 − P_e by definition. Curves of U′_max versus α are drawn in Fig. 4a for several values of P_e. The normalized signal amplitude σ_opt/(P_av)^{1/2} is plotted versus α, with P_e as a parameter, in Fig. 4b. The corresponding values of 2S_min/N_0 are shown in Fig. 5. All three sets of curves are drawn only for α ≤ α′. For larger values of α, the ordinates remain equal to their values at α′. It is of interest to examine the average transmission rate as P_e → 0. We consider first the special case in which there is only an average power constraint; i.e., P_peak = ∞. Since y_max = Y, we find from (46) that

lim_{P_e→0} S_min = lim_{Y→∞} S_min = N_0 ln 2   (52)

The limiting value of the transmission rate, which is given by T~fn, is therefore Pav/N0 In 2 bits/sec. As we have observed before, this is the capacity of an infinite-bandwidth channel corrupted by additive white Gaussian noise. Thus, in the absence of a peak power constraint, the


system can achieve an arbitrarily small error probability while operating at channel capacity. The required peak power is ^peak

=

α

°av

~

PavlnPe r~Ö

which grows without limit at Pe -> 0.

FIG. 4. Optimal signals for sequential detection.

(")


The limiting system behavior is quite different, however, if Pe is made to approach zero before allowing a to become arbitrarily large. In fact, for any finite P p e a k , an arbitrarily small Pe can be achieved only at the expense of a vanishingly small information rate (see Fig. 5). To see

FIG. 5. Minimum energy per bit for sequential detection.

this formally, note from (45) that, for U+(y) = U^(y)9 approaches the limit

lim %= = * + „

J».-O N0

Y

ymax

SmJN0

(54)

s2

(1 + e x p j m a x ) 2

where K is independent of Y. For a fixed value of ymax, the required energy, and hence the average transmission time, grows without limit as Pe -» 0. Furthermore, substitution of (54) into (50) reveals that lim α = f/;2ax(l +ex P < y m a x ) :

Pe-+0

_( UL. Y

V-u'mJ

exp 2ymax

(55)

Since, according to (55), there exists a value of ymax ^ 0 corresponding to each value of P p e a k ^ P a v , we have the desired result.


The asymptotic form of the optimal signals can be obtained as explicit functions of a. From (55) we see that

iimc/;ax=^^

(56)

Also, by combining (49), (54), and (55), we have 1 ^ ^ 7 I

= ^

-

(57)

Despite the fact that the transmission rate tends to zero as Pe -> 0, the system performs very respectably for nonzero, but nevertheless quite small, values of Pe. It can be seen from Fig. 5, for example, that, if a = 8, an error probability of 10 " 6 requires only 2 dB more average power than is needed by a system without a peak power constraint operating at the same rate. Alternatively, if the two systems are allotted the same average power, the system with a = 8 is capable of achieving an error probability of 10 ~ 6 while operating at a rate equal to 0.63 times the capacity of either system. (The capacity of a white-Gaussian-noise, infinite-bandwidth system with an average power constraint is not reduced by the application of a peak power limitation (16).) A sequential system with a = 8 and Pe = 10 ~ 6 requires 10.1 dB less average power than the usual binary transmission system, which operates without feedback and with a fixed transmission time (see Fig. 7 for a = 1 and Pe = 10 " 6 ) . Of this total improvement of 10.1 dB, 5.5 dB could be realized from the use of uncertainty feedback together with nonsequential detection, assuming again that a = 8. On the other hand, a sequential detection system utilizing feedback only for synchronization (a = 1 in Fig. 5) would retain an improvement of 5.1 dB.

C. Nonsequential Detection The derivation of the optimal signals when nonsequential detection is employed is similar to that for sequential detection, with one additional complication. For sequential detection, we were able to eliminate the time dependence as irrelevant. As mentioned earlier, this is not possible for nonsequential detection, so we impose the additional constraint that the signals be factorable in the form s±(y,t) =

±a(t)U±(y)

(58)


It can be shown (12) that the average energy required to reach a decision is given by

Ŝ(U_+, σ) = (σ²/2) ∫_{−∞}^{∞} { U_+²(y) Q_+(y) + [1 − U_+(y)]² Q_−(y) } dy   (59)

where

ο ω

* -9Γ β 4-Κ7 + * ! )] Α 1

ί-τ

^ojo

- 2 (τ)

(60)

rft

(61)

In addition, R is related to P_e by

P_e = erfc(R/2)^{1/2} ≈ e^{−R/4}/(πR)^{1/2}   (62)

Note that Ŝ and P_e depend on σ only through R. Moreover, for a given U_+ function and a given value of R, the peak power is minimized by setting σ = constant. Thus, as in the case of sequential detection, we need only consider the restricted class of signals having the form

s_±(y) = ±σU_±(y)   (63)

An argument similar to that used in the sequential case reveals that the optimal U_±(y) has the form

Û_±^{opt}(y) = 1/(1 + exp(±ŷ_max)),   y > ŷ_max
            = 1/(1 + exp(±y)),        |y| ≤ ŷ_max
            = 1/(1 + exp(∓ŷ_max)),    y < −ŷ_max   (64)

where

ŷ_max = ln[ (1 − Û′_max)/Û′_max ]   (65)

for an appropriately chosen value of Û′_max. The corresponding value of Ŝ, denoted by Ŝ_min, cannot be expressed in closed form. From (61), the optimal σ² is given by

σ̂²_opt = N_0 P_av R / 2Ŝ_min   (66)

Use has been made in (66) of the fact that the allowed average power is


always fully utilized. If Û′_max < 1, (66) can be manipulated to

α = N_0 R Û′²_max / 2Ŝ_min(Û′_max)   (67)

Equation (67) relates Û′_max to α for

FIG. 6. Optimal signals for nonsequential detection.


Curves of Û′_max and σ̂_opt/(P_av)^{1/2} vs. α are shown in Figs. 6a and 6b, respectively. The corresponding values of 2Ŝ_min/N_0 are plotted in Fig. 7. The curves in these three figures are drawn only for α ≤ α̂′. For larger values of α, the ordinates remain equal to their values at α̂′.

FIG. 7. Minimum energy per bit for nonsequential detection.

The limiting transmission rate as P_e → 0 is obtained from (59) and (60). Since Q_±(y) does not depend on α, we have

lim_{P_e→0} Q_±(y) = exp[ (±y − |y|)/2 ]   (69)

Substitution of (64) and (69) into (59) yields lim 5mi„ =

■Ppeak = CO

oo,

^peak
#', it is Ό) = "5(j — Jo)· If (76) is multiplied by C± and integrated over ( — Y, Y), the result is ÔW+ dtn

_ σ2 ÔW+ + N0 dy0

N0

d2W+ dy02

(77)

Note that dW±(t\y0,t0) dt0

dt = -

dW±(t0+T\y0,t0) δτ

= W±(t0\y0,t0)

= C±(y0)

άτ (78)

the last equality following from (74) and the boundary conditions to (76).

26

MICHAEL HORSTEIN

If (77) is now integrated with respect to t, we can write _ σ2 dV+ €

^

σ2 d2V+ (79)

= +Ν-0ΊΪ;-Ν-0ΊΪ7

with boundary conditions V±(Y) = V±(—Y) = 0. The solution to (79) for y0 = 0 can be writtenf

' (0) == §- £ f ' CC±(y)Q (y)

V±±(0)

σ

J-Y

±

±

dy

(80)

where ^_g±,/2«J°h[(y-lyl)/zi 0 y ± W _ e cosh(F/2)

,8n (81)

Note that, if C + ( - j ) = C_(y), Γ + (0) = F_(0). By replacing C ± ( j ) first by unity and then by s±2(y), it is easily established that r

=

S = i [ F + ( 0 ) + V.(0)] =

T\[

-y

N„Y Y iI^Ltanhl G 2

i^ 2 (y)ö + (y)+[i - ^+(y)]2ß-(y)}rfy

(82)

(83)

REFERENCES

i. C. E. Shannon, The zero-error capacity of a noisy channel. IRE Trans. IT-2, 8-19 (1956). 2. H. C. A. Van Duuren, Error probability and transmission speed on circuits using error detection and automatic repetition of signals. IRE Trans. CS-9, 38-50 (1961). 3. B. Reiffen, W. G. Schmidt, and H. L. Yudkin, The design of an error-free data transmission system for telephone circuits. Trans. AIEE 80, Pt. 1, 224-231 (1961). 4. J. J. Metzner and K. C. Morgan, Coded feedback communication systems. Proc. Natl. Electron. Conf. 16, 250-257 (1960). 5. A. B. Fontaine, Queuing characteristics of a telephone data transmission system with feedback. Trans. AIEE 82, 449-455 (1963). 6. F. E. Froehlich and R. R. Anderson, Data transmission over a self-contained error detection and retransmission channel. Bell System Tech. J. 43, 375-398 (1964). f The more general solution for y0 Φ 0, as well as for asymmetric decision thresholds, can be found in Reference (14).

SEQUENTIAL SIGNAL DESIGN

27

7. M. Horstein, Efficient communication through burst-error channels by means of error detection. IEEE. Trans. COM-14, 117-126 (1966). 8. J. P. M. Schalkwijk, Center-of-gravity information feedback. Rept. No. 501. Sylvania Appl. Res. Lab., Waltham, Massachusetts, 1966. 9. J. P. M. Schalkwijk, Coding schemes for additive noise channels with feedback. Rept. No. 10. Stanford Electron. Lab., Stanford, California, 1965. 10. J. P. M. Schalkwijk and T. Kailath, A coding scheme for additive noise channels with feedback—Pt. 1 : No bandwidth constraint. IEEE Trans. IT-12, 172-182 (1966). 11. G. L. Turin, Signal design for sequential detection systems with feedback. IEEE Trans. IT-11, 401-408 (1965). 12. G. L. Turin, Comparison of sequential and nonsequential detection systems with uncertainty feedback. IEEE Trans. IT-12, 5-8 (1966). 13. M. Horstein, On the design of signals for sequential and nonsequential detection systems with feedback. IEEE Trans. IT-12, 448-455 (1966). 14. M. J. Ferguson, Sequential signal design for radar and communication. T R 6761-1. Stanford Electron. Lab., Stanford, California, 1966. 15. J. M. Wozencraft and I. M. Jacobs, " Principles of Communication Engineering," pp. 229-232. Wiley, New York, 1965. 16. C. E. Shannon and W. Weaver, " The Mathematical Theory of Communication," pp. 63, 73. Univ. of Illinois Press, Urbana, Illinois, 1949.

Adaptive Data Compression for Video Signals

R. L. KUTZ, J. A. SCIULLI, AND R. A. STAMPFL
NASA/Goddard Space Flight Center, Greenbelt, Maryland

I. Introduction (29)
II. Theory and Application of an Adaptive Compression System (32)
  A. Prediction of Successive Samples (32)
  B. Prediction with an Operator (33)
  C. Prediction with a Nonlinear Operation (34)
  D. The Fidelity Criterion and a Figure of Merit (35)
  E. Encoding "Compressed Data" for the Noiseless Channel (36)
  F. Channel Noise Considerations (39)
III. Experimental Results (40)
  A. Predictor Simulation Experiments (41)
  B. Linear Prediction Operator Experiments (44)
  C. Nonlinear Prediction Experiments (46)
  D. Coding for the Noiseless Channel (47)
  E. Channel Noise Effects (51)
  F. Conclusions (55)
References (65)

I. Introduction The spin-stabilized T I R O S weather satellites and their operational version, ESSA, produce, along with other data, large quantities of television pictures. The two television cameras on board can televise directly when within communication range of a ground station. More importantly, they can store 32 TV frames per camera on magnetic tape and transmit this information when an appropriate ground command is given. A 62.5-kc video bandwidth is employed for the transmission of 500 line frames, each with a frame duration of 2 sec. The gray scale range of the pictures varies between five and ten levels as a function of camera tube life and telemetry performance. Six levels can be considered an average value. Gray scale here is used as in photography, where the intensity of a given level is simply ^Jl times the next lower level. When pictures are received at ground stations, a composite mosaic consisting of individual overlapping frames is prepared from the 32picture sequence with the help of the satellite attitude data available 29


from other data channels. Meteorologists at the receiving sites transpose the meteorological information content onto weather maps by visual inspection. Cloudiness of the area viewed, types of clouds, and interpretive results such as fronts and storm centers are all noted on maps.

FIG. 1. Sample of a mosaic and the corresponding nephanalysis and teletype message.

The results are conventional nephanalyses, which are transmitted via facsimile links to forecasting centers or translated into teletype messages for distribution. A sample of a mosaic and the corresponding nephanalysis and teletype message are given in Fig. 1. A consideration of the communication system requirements to accommodate ever-increasing demands for large amounts of high


resolution data stresses the need for practical data compression techniques. In this chapter, we report on the application of the theory of adaptive prediction (1) to the compression of video data. In addition, we consider the problem of encoding "compressed'' data for transmission in a noiseless channel. The effect of channel noise on " compressed* ' data is also considered. The choice of these sophisticated adaptive prediction techniques was made because of their inherent optimal performance, even under such undesirable conditions as nonstationarity. This is in contrast to various algorithmic approaches for which no guarantee of optimality usually can be given. Delta modulation, for example, is a simple technique in which the amplitude difference between successive samples is transmitted. The sampling rate is chosen such that no more than one quantum level difference is expected. This technique produces objectionable results because resolution is lost on high-contrast transitions. The simplicity of delta modulation yields a most attractive implementation, but visual inspection of cloud pictures transmitted via this method confirms objectionable loss in picture resolution. Even though picture data can be transmitted by a four-bit code due to a limited number of gray levels, the total number of bits per frame is seldom reduced by more than 2:1 using delta modulation. A large number of possible data compression techniques have been discussed in the literature [(2-5) to name a few]. These include such algorithms as run length coding and wide varieties of " predictors/' interpolators, and averaging techniques, but these methods usually result in the elimination of some significant part of the total information. The appealing properties of the techniques chosen for investigation lie in their ability to adjust to new conditions in the data and in the existence of an error criterion enabling one to judge the quality of performance of the data compression system. As the following discussion will show, it is necessary to include a learning operation to adapt to new conditions in the data. A self-monitoring feature (i.e., an error threshold) is also required. Thus, the system strives to either preserve all data originally available or distort it at a predetermined level. The investigation considered both linear and nonlinear predictors, and determined the pertinent parameters for the predictors experimentally by simulation. The sample pictures shown substantiate the results. Each picture is a 480 scan line frame with 456 samples per line. Each sample is digitized to 6 bits. Thus, the quantization is considerably finer than the average gray scale definition required.


II. Theory and Application of an Adaptive Compression System In this section, we outline the theoretical background necessary for a general understanding of the adaptive compression systems investigated. In addition, we present arguments for the choice of various parameters and techniques used in the application of the theory to video data compression. We begin by presenting two methods of adaptive prediction and applying them for the removal of so-called " redundant samples." We next consider the problem of encoding "compressed" data for transmission in a noiseless channel and propose several possible encoding procedures. Finally, a consideration of the effects of channel noise permits a complete system performance evaluation. Throughout this section there is no attempt to give step by step developments of well-known theoretical results. We simply try to present concise statements of the fundamental theory and show how they may be applied to our specific problem.

A. Prediction of Successive Samples The first step in the design of a sampled-data compression system is the selection of a method which will eliminate "redundant'' samples at the transmitter and reestablish these samples (possibly with some allowable error) at the receiver. The adaptive prediction techniques suggested by Balakrishnan (1) are quite general and easily applied to the performance of these functions. We assume throughout this chapter that the video data is represented in the usual form by sequences of samples (TV elements) arranged in scan lines. Each sample may take on any of Q quantum levels. The problem at the transmitter then is to attempt to predict the present sample after having observed m of the preceding samples. Since the exact statistical structure of an information source is in general not known a priori, it is desirable that the predictor be adaptive (6,7). Therefore, the predictor must learn from the data as well as adjust its operation according to changes in the characteristics of the data. Following this philosophy, let the present sample be denoted by Sk and the corresponding prediction by Sk. Let us further define the memory set associated with Sk by Mk = {Skj | the Sk. are the previous samples included in the memory, = 1, . . . , m} and the learning set,


Lk = {Mk., Sk.\ the Mk. and Sk. are the past memory sets and associated prediction candidates included in the learning set; i — 1, . . . , λ}. These rather abstract definitions of the memory and learning sets will become clearer in the following paragraphs, where we describe the determination of Sk and give specific examples of Mk and Lk.

B. Prediction with an Operator

Here we consider the operator "O" applied to the m preceding samples of M_k surrounding the sample to be predicted (S_k). Since the neighboring samples are correlated to the prediction candidate, the memory should include samples from the candidate's line as well as samples from preceding lines. The nonlinearity of the operator is increased by weighting higher order products of samples in the memory. This allows the prediction to be as nonlinear as is desired. The operator used here, however, is a linear weighting of the m samples in memory plus a constant and is required to minimize the mean square prediction error over the learning set L_k. Specifically, then, Ŝ_k is determined according to

Ŝ_k = O(S_{k1}, S_{k2}, …, S_{km}) = W_0 + W_1 S_{k1} + W_2 S_{k2} + ⋯ + W_m S_{km}   (1)

where the W_j are determined such that

Σ_i (S_{ki} − Ŝ_{ki})²,   S_{ki} ∈ L_k   (2)

is minimized. The Wj are computed by solving a set of λ simultaneous equations in m + 1 unknowns. Early computer simulation experiments showed that a learning set size of 20 samples and a memory set size of 3 samples were advantageous choices. This was in agreement with the theoretical results obtained by Davisson (8). The memory set consisted of the 3 samples immediately preceding and on the same scan line as Sk. The learning set also consisted of 20 preceding samples on the same scan line as Sk. In later experiments, the memory set was reduced to two samples consisting of the sample immediately to the left (on the same scan line as Sk) and the sample directly above Sk (on the preceding scan line). One-fourth of the samples in the learning set were chosen from the present line and one half from the line above. The remainder of the samples were located two lines above the present line.


In order to strictly satisfy the properties of adaptivity, the prediction operator should be updated (i.e., new weights should be computed) before each prediction attempt. To conserve computer time, however, a criterion was established in order to decide when to update the operator. After every λβ prediction attempts, the sum of the squared errors for unpredictable samples is compared to a threshold T. If T is exceeded, the operator is updated by beginning a new learning operation and determining new weights. If T is not exceeded, the operator is permitted to attempt to predict the next λ/2 samples using the same weights.
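As a concrete illustration of the operator predictor of Eqs. (1) and (2) and of the relearning rule just described, here is a small sketch in Python. It is not the original implementation; the previous-samples-only memory, the learning-set size λ, and the thresholds τ and T are the illustrative choices discussed above, and the helper names are invented for the example.

```python
import numpy as np

def fit_weights(memories, targets):
    """Least-squares fit of W0..Wm for the operator of Eq. (1),
    minimizing the squared prediction error of Eq. (2) over the
    learning set (rows of `memories` paired with `targets`)."""
    A = np.hstack([np.ones((len(memories), 1)), memories])
    w, *_ = np.linalg.lstsq(A, targets, rcond=None)
    return w

def predict_line(samples, m=3, lam=20, tau=1, T=4.0):
    """Run the operator predictor along one scan line using the m
    immediately preceding samples as memory (the early-experiment
    configuration).  Returns a boolean mask of unpredictable samples.
    lam = learning-set size, tau = peak-error threshold, T = squared-error
    threshold used to decide when the weights must be relearned."""
    samples = np.asarray(samples, dtype=float)
    unpredictable = np.zeros(len(samples), dtype=bool)
    w = None
    err_acc, attempts = 0.0, 0
    for k in range(len(samples)):
        if k < m + lam:                      # not enough history yet: send raw
            unpredictable[k] = True
            continue
        if w is None:                        # (re)learn over the last lam samples
            M = np.array([samples[i - m:i] for i in range(k - lam, k)])
            w = fit_weights(M, samples[k - lam:k])
        s_hat = w[0] + w[1:] @ samples[k - m:k]
        if abs(samples[k] - s_hat) > tau:    # peak-error fidelity criterion
            unpredictable[k] = True
            err_acc += (samples[k] - s_hat) ** 2
        attempts += 1
        if attempts >= lam // 2:             # every lam/2 attempts, test threshold T
            if err_acc > T:
                w = None                     # relearn before the next prediction
            err_acc, attempts = 0.0, 0
    return unpredictable
```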

C. Prediction with a Nonlinear Operation

Suppose again that we attempt to predict the kth sample, having observed m preceding samples S_{k1}, S_{k2}, …, S_{km}. The "best" nonlinear estimate for S_k in the mean square sense is given by Ŝ_k = E(S_k | S_{k1}, S_{k2}, …, S_{km}), since the conditional expectation minimizes the mean square error. For the sampled-data case, a prediction can be determined by calculating

Ŝ_k = Σ_{i=1}^{Q} i · Pr(S_k = i | S_{k1}, S_{k2}, …, S_{km})   (3)

where Pr( · / · ) is the conditional probability that the &th sample is equal to the zth quantum level, given the ordered set of m preceding samples. Unfortunately these conditional probabilities are not known a priori and therefore must be estimated from the data. In the implementation, however, we need only keep track of the sample conditioned mean for each of the Qm ordered sets {Ski, . . . , Skm}. Storage requirements obviously limit the size of m. Since the predictor is to be adaptive, it must modify its structure to meet changes in the characteristics of the data. This is accomplished by requiring that the prediction statistic be updated after each prediction attempt. That is, if the sample is predictable, the predicted value is used to update the conditioned mean, and if the sample is unpredictable, the actual sample value updates the conditioned mean. In the next section, we discuss the choice of a peak error criterion for deciding whether or not a sample is predictable. The sample conditioned mean prediction statistic is ideally suited to a mean square error rather than a peak error criterion. The statistic which does corre-


spond to a peak error criterion is the sample conditioned mode. That is, the prediction operation becomes Ŝ_k = i, if

Σ_{j=−τ}^{+τ} Pr[S_k = (i + j) | S_{k1}, …, S_{km}] ≥ Σ_{j=−τ}^{+τ} Pr[S_k = (h + j) | S_{k1}, …, S_{km}]   (4)

for all h, 1 ≤ i, h ≤ Q. This predictor is "best" in the sense of minimizing the probability of exceeding a peak error threshold τ. The implementation of this scheme essentially requires the construction of a sample distribution (histogram) for each ordered set {S_{k1}, …, S_{km}}. A prediction is then determined according to Eq. (4). Again the predictor adjusts its structure at each attempt at prediction by updating the sample distribution with either the predicted value or the actual sample value, depending on the success or failure of the prediction attempt. Since there are Q^m possible observations of the memory vector M_k = {S_{k1}, …, S_{km}}, memory set sizes larger than three produce unwieldy storage requirements even for Q as small as 16 quantum levels. Again in early experiments with this prediction method, memory sizes of one and two samples were used. First experiments used the samples immediately preceding S_k, while later experiments again utilized the sample immediately to the left and the sample directly above S_k, as was done in the prediction operator experiments. The learning operation of the sample conditioned statistic predictors differs considerably from that of the operator predictor. Since there are typically several hundred possible observations of M_k, the statistical predictor must learn over a rather large number of samples in order to obtain good estimates. While no attempt was made at optimizing this learning operation, a learning set of approximately 1000 samples was considered a reasonable choice. Several learning schemes suggested by Davisson (9) may be more optimal. These have the effect of permitting the learning operation to depend on the most recent samples.
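A compact sketch of the sample-conditioned-mode predictor of Eq. (4) follows. The histogram-per-memory-vector storage, the uniform initial counts, and the update-on-every-attempt rule mirror the description above; the class and function names are illustrative, not from the original system.

```python
import numpy as np
from collections import defaultdict

class ConditionedModePredictor:
    """Sketch of the nonlinear predictor of Eq. (4): for each observed
    memory vector (S_k1, ..., S_km) keep a histogram of the sample values
    that followed it, and predict the level i whose +/- tau neighborhood
    has the largest accumulated probability mass."""

    def __init__(self, Q=16, m=2, tau=1):
        self.Q, self.m, self.tau = Q, m, tau
        self.hist = defaultdict(lambda: np.ones(Q))   # uniform prior counts

    def predict(self, memory):
        h = self.hist[tuple(memory)]
        # mass of each candidate level's tolerance window, per Eq. (4)
        mass = [h[max(0, i - self.tau): i + self.tau + 1].sum()
                for i in range(self.Q)]
        return int(np.argmax(mass))

    def update(self, memory, value):
        # adaptivity: the histogram is updated after every attempt, with
        # the predicted value on success and the true value on failure
        self.hist[tuple(memory)][value] += 1

def compress_line(samples, predictor):
    """Toy driver for one scan line of Q-level samples; returns the
    binary timing sequence (True = unpredictable)."""
    unpredictable = []
    for k in range(predictor.m, len(samples)):
        mem = samples[k - predictor.m:k]
        s_hat = predictor.predict(mem)
        ok = abs(samples[k] - s_hat) <= predictor.tau
        unpredictable.append(not ok)
        predictor.update(mem, s_hat if ok else samples[k])
    return unpredictable
```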

D. The Fidelity Criterion and a Figure of Merit Now that several methods for obtaining Sk have been established, we may proceed to outline a decision rule for determining whether or not Sk is predictable. A prediction is determined for each sample and we must decide whether or not the sample may be eliminated. This


decision is made by forming the absolute error ε = |S_k − Ŝ_k| and comparing it to an allowable peak error threshold, τ. If ε ≤ τ, the sample is said to be predictable and conversely, if ε > τ, the sample is considered unpredictable. This process generates a binary timing sequence which must be encoded and transmitted along with unpredictable samples so that the receiver may reestablish every sample. A peak error criterion was chosen because of its simplicity as well as its satisfactory performance. While it is possible to construct more complicated decision criteria to achieve better prediction results, the question of an optimal decision criterion was not of primary concern in our experiments. It should be noted that the parameter τ essentially controls the fidelity of the reconstructed data. The actual value of τ corresponding to a minimum acceptable quality of the reconstructed image was determined experimentally. For example, with Q = 16 quantum levels, a τ of one level will provide acceptable quality in the image reconstructed at the receiver. The performance of a predictor must be judged by some figure of merit. During early experiments a rather natural figure of merit evolved, called the Sample Compression Ratio (C_s), defined as

C_s = (Total number of samples) / (Number of unpredictable samples)

The limitation of this quantity is quite obvious, because savings in energy per bit is the ultimate goal. To provide a meaningful measure of the savings, the cost of transmitting timing information must also be included. During later work, encoding of "compressed" data was considered and it became more convenient to define a slightly different measure of predictor performance. This figure of merit is called the average "probability of prediction" (p) and is defined as

p = (Number of predictable samples) / (Total number of samples)

Probability of prediction will be used as the fundamental measure of predictor performance for the remainder of the chapter. Typical values of p attained in simulation experiments range from 0.75 to 0.95.
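For clarity, both figures of merit can be read directly off the predictor's timing sequence; the helper below is an illustrative addition, not part of the original experiments.

```python
def figures_of_merit(unpredictable):
    """Compute the sample compression ratio C_s and the average
    probability of prediction p from a boolean mask of unpredictable
    samples."""
    total = len(unpredictable)
    n_unpred = sum(unpredictable)
    Cs = total / n_unpred if n_unpred else float("inf")
    p = (total - n_unpred) / total
    return Cs, p

# e.g. a sequence that is 90% predictable gives C_s = 10 and p = 0.9
print(figures_of_merit([False] * 9 + [True]))
```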

E. Encoding "Compressed Data" for the Noiseless Channel After the process of eliminating " redundant " (predictable) samples is completed, unpredictable samples as well as timing information must be encoded for transmission to the receiver. The next step in the design of


the sampled-data compression system is the selection of an encoding procedure. For every B (block length) original samples, the output of the predictor consists of the unpredictable samples in addition to a binary sequence (1 = unpredictable sample, 0 = predictable sample) indicating the outcome of the prediction attempt for each original sample. We assume that perfect synchronization is achieved between blocks. This situation is depicted in Fig. 2.

FIG. 2. Predictor input and output.

It has been stated that the sample compression ratio is not an adequate measure of system performance when the encoded bit stream is considered. Thus, for the noiseless channel case, we may define a new figure of merit called the Bit Compression Ratio (C_b), defined as the ratio of the number of bits/sample required to transmit uncompressed data to the number of bits/sample required to transmit the same data "compressed." Before discussing specific encoding procedures, it would be worthwhile to observe the performance of the ideal noiseless code as a function of the average probability of prediction, p. If in the original data all possible sample values (Q quantum levels) are equally likely, the transmission rate per sample prior to prediction is simply log₂ Q. Assuming that prediction is sample to sample independent and that all sample values outside the threshold τ are equally likely when prediction fails, the per sample rate after prediction is then

(1 − p) log₂ Q₀ − (1 − p) log₂(1 − p) − p log₂ p

where Q₀ = Q − 2τ − 1. Under these assumptions, the upper bound on the bit compression ratio (C_b) is

C_b ≤ log₂ Q / [ (1 − p) log₂ Q₀ − (1 − p) log₂(1 − p) − p log₂ p ]

R. L. KUTZ, J. A. SCIULLI, AND R. A. STAMPFL

and provides a measure of the performance (as a function of p) of the data compression system, assuming an ideal code and a noiseless channel. We may now proceed to discuss several possible encoding procedures and compare their performance with the ideal noiseless code. One of the simplest and most obvious encoding procedures might be called the onebit-per-sample code. Here the procedure simply consists of transmitting the output of the predictor (Fig. 2) in some appropriate format. This means that for every B original samples, B(\ — p) \og2Q bits are used to transmit unpredictable samples. In addition, the B bits of the timing sequence at the output of the predictor are sent. This procedure is simple

I 0

i 0.1

i 0.2

i 0.3

i 0.4

i 0.5

i 0.6

i 0.7

i 0.8

i 0.9

I 1.0

AVERAGE PROBABILITY OF PREDICTION (p)

FIG. 3. Bit compression ratio (Cb) vs. probability of prediction for both the ideal code and the one bit-per-sample code for Q = 16 quantum levsls.

but is not very efficient when the probability of prediction^) is high. This is indicated in Fig. 3, where the bit compression ratio is plotted vs. p for both the ideal code and the one-bit-per-sample code. The effect of channel noise is one of the most important considerations in evaluating the performance of a data compression system. Our

ADAPTIVE DATA COMPRESSION FOR VIDEO SIGNALS

39

objective was to experimentally determine the noisy channel performance of a specific video data compression system and compare our findings with the theoretical results obtained by Davisson (10). We, therefore, adopted Davisson's model in our experimental research. One of his assumptions was the use of a specific form of run length coding referred to as a word pair code, where the predictor output is encoded in word pairs. The first word in each pair represents an unpredictable sample value. The second word (run length word) corresponds to the number of predictable samples until the next unpredictable sample. The maximum allowable run length is (Q — 1) samples. If more than (Q — 1) consecutive samples are predictable, the Qth sample must be transmitted and a new run length initiated. Thus, the output of the predictor in Fig. 2 becomes the input to an encoder generating the word pair code in Fig. 4. UNPREDICTABLE SAMPLES


FIG. 4. Word pair encoder.
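A minimal sketch of such a word pair encoder is given below. It assumes the block arrives as a list of quantized sample values together with flags marking which samples the predictor reproduced within the threshold; the function name and this I/O format are illustrative assumptions, not something specified in the chapter.

    # Illustrative word pair encoder (names and I/O format assumed).
    # `samples` holds the quantized values (0..Q-1) for a block; `predictable[i]`
    # is True when the predictor reproduced samples[i] within the threshold.
    # Output: (sample value, run length) pairs with run length <= Q - 1; a run of
    # more than Q - 1 predictable samples forces the Q-th one to be sent anyway.
    def word_pair_encode(samples, predictable, Q=16):
        max_run = Q - 1
        pairs = []
        i = 0
        while i < len(samples):
            value = samples[i]          # first word of the pair: a transmitted sample
            run = 0
            i += 1
            while i < len(samples) and predictable[i] and run < max_run:
                run += 1                # second word of the pair: the run length
                i += 1
            pairs.append((value, run))
        return pairs

At the receiver, each run length word tells the decoder how many predicted samples to insert before the next transmitted value.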

F. Channel Noise Considerations

A number of workers have investigated image data compression techniques, but frequently without consideration of channel noise. In the previous section, we pointed out that the sample compression ratio is an inadequate figure of merit when the noiseless encoding problem is considered and thus proposed that the bit compression ratio was more meaningful. Similarly, if we now wish to include channel error effects, the bit compression ratio is not adequate. We, therefore, make use of another system performance measure due to Davisson (10), called the Energy Compression Ratio (Ce). The ratio Ce is defined as the ratio of the average energy required to send a sample in an uncompressed communication system to that required in a proposed compression system for the same data quality at the receiver. This ratio is a function of the bit error rate in the compression system, and as the error rate goes to zero, the energy compression ratio approaches the bit compression ratio. Davisson used the probability of received sample error as the measure of "data quality." One of our objectives was to determine sample error rate and energy compression ratio by simulating a "compressed" video communication system. Davisson's analysis, and hence our experiments, were based on the following system model:

1. The data is encoded in blocks of B samples, assuming perfect synchronization and that the synchronization word is of negligible length compared to the block length.
2. The source data can be modeled as a first order Markov chain with probability p (probability of prediction) of remaining at the same quantum level, independent of the level. Therefore, a nonadaptive (fixed transition probabilities) predictor was employed. The transition probabilities were estimated from the data.
3. Within each block, the data is encoded according to the word pair coding scheme described in the previous section.
4. The channel is binary symmetric with error rate r bits/bit.

In the next section, we present our experimental results and formulate several basic conclusions necessary to define guidelines for the design of a video data compression system.
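For concreteness, a minimal sketch of items 2 and 4 of this model follows. The uniform choice of a new level when the chain leaves its current one is a simplification introduced here, since only the probability p of remaining at the same level is specified; the function names are likewise assumptions.

    # Illustrative sketch of the assumed source and channel models.
    import random

    def markov_source(n, p=0.8, Q=16, seed=0):
        """First order Markov chain: stay at the current level with probability p,
        otherwise jump to a level drawn uniformly from the remaining Q - 1 levels."""
        rng = random.Random(seed)
        level = rng.randrange(Q)
        out = [level]
        for _ in range(n - 1):
            if rng.random() >= p:
                level = rng.choice([q for q in range(Q) if q != level])
            out.append(level)
        return out

    def bsc(bits, r, seed=0):
        """Binary symmetric channel: flip each bit independently with probability r."""
        rng = random.Random(seed)
        return [b ^ 1 if rng.random() < r else b for b in bits]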

III. Experimental Results

In this section, we present our experimental results and draw on the definitions and theoretical results stated in Section II. Before proceeding to discuss the results, we first give a brief description of our image display facility, since a number of sample pictures will be shown in this section. The pictures to be shown here were produced on a Litton Flying Spot Scanner/Display System. The Litton System is interfaced with an SDS-920 computer so that video data stored on digital magnetic tape may be displayed. In addition, this system can be used to scan 35 mm slides, digitize the video output, and record the samples on magnetic tape. An approximation is made for the nonlinearity of the eye by encoding the cube root of the slide's transmissivity. Thus, the full range of slide transmissivities is converted into sixteen equal psychological steps (11). This conversion is important when data compression is employed and channel noise is encountered. That is, if all quantum levels are equally likely, the effects of both compression and noise would appear to be uniformly distributed over all intensities.

In future work, this experimental facility will be used in a closed-loop, real-time mode to evaluate various compression system models. Since speed is important for this real-time simulation, the data will be passed through an interface between the SDS-920 and an SDS-9300, so that the 9300 may handle the more involved processing. The image output of the simulation will be immediately displayed, thus allowing rapid system evaluation and optimization. Because the theory does not always specify optimum choices for parameters and procedures, one of the experimental objectives was to make such determinations. By far the least expensive and most flexible method for this task is computer simulation. The remainder of this chapter is devoted to the presentation and interpretation of the main results.

All of our experiments were performed on a sample of ten TIROS cloud cover pictures. Starting with standard FM subcarrier modulated magnetic tapes, the data was originally quantized to 6 bits/sample (TV element) with 456 samples/scan line and 480 scan lines/frame, as already stated. First experiments showed that 6-bit quantization was too fine (in view of the six to ten TV gray levels) and that acceptable digital picture quality could be obtained with as few as 4 bits/sample. Therefore, in the following sections only the results with 4-bit quantization will be given. In addition to results on individual pictures, average results over the ten picture set are also quoted.
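The cube-root companding mentioned above can be sketched as follows. The normalization of transmissivity to the range 0-1 and the function name are assumptions made for the illustration, standing in for the actual scanner calibration.

    # Illustrative sketch of cube-root companding into Q = 16 quantization steps.
    # Transmissivity t is assumed normalized to 0.0..1.0; the cube root approximates
    # the eye's nonlinearity so that the steps are roughly equal psychologically.
    def quantize_transmissivity(t, Q=16):
        t = min(max(t, 0.0), 1.0)           # clamp to the assumed normalized range
        level = int(t ** (1.0 / 3.0) * Q)   # cube root, then uniform 16-level split
        return min(level, Q - 1)            # keep t = 1.0 in the top level

    # Example: a dim patch and a bright patch.
    print(quantize_transmissivity(0.05), quantize_transmissivity(0.9))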

A. Predictor Simulation Experiments

We pointed out in Section II that the peak error threshold τ essentially controls the fidelity of the data reconstructed at the receiver. While τ is not completely independent of the prediction method, it should be chosen as the largest value yielding a minimum acceptable picture quality. Figure 5 depicts received picture quality for τ = 0, 1, and 2 quantum levels. (The related picture data, giving where and in which mode the image has been received, is explained in the caption.) This picture has been used because of its nondistinct pattern, showing only an accumulation of cumulus clouds over the tropical region. Thus, it is thought to be a difficult case. In order to achieve a compromise between prediction efficiency and picture quality, a threshold of one quantum level was chosen. As one can see, higher values of τ cause objectionable contouring and streaking.


FIG. 5. Received picture quality for (a) analog original, (b) τ = 0, (c) τ = 1, (d) τ = 2 quantum levels. Picture originated from TIROS III, orbit 102, frame 2, camera 1; taped before transmission from satellite; principal point, 13.3N, 5.6W; subsatellite point, 11.9N, 1.9W.


B. Linear Prediction Operator Experiments

The objective of all our prediction experiments was to maximize the sample compression ratio (and hence the probability of prediction) under the constraint of the fidelity criterion (i.e., τ = one quantum level). Many simulations of the prediction operator were performed for various values of m and λ. Early experiments used m = 3 and λ = 20 with the Mk and Lk containing only past samples on the same line as Sk. These conditions yielded a ten picture average of p = 0.785, corresponding to an average sample compression ratio (Cs) of 4.65. In later experiments, the memory and learning sets have been defined as shown in Fig. 6.


FIG. 6. Geometry of the memory and learning sets.


FIG. 7. (a) Original analog image, (b) Original digital image, Q = 16 quantum levels, (c) Reconstructed image after ideal "compressed" data transmission; adaptive linear operator prediction, p = 0.796; τ = 1 quantum level. Picture originated from TIROS III, orbit 4, frame 3, camera 2; direct transmission from satellite; principal point, 43.4N, 95.0W; subsatellite point, 40.8N, 88.8W.


The obvious intent here is to make up the learning and memory sets from past samples surrounding Sk, and in contrast to the method used before, the learning set samples are contained in a number of TV lines in the proximity of Sk. Using this arrangement, with m = 2 and λ = 16, an average p = 0.809 (Cs = 5.23) was attained. A typical example is shown in Fig. 7. Figure 7c is the result of applying the prediction operator to the source data, transmitting only unpredictable samples and timing information with the ideal code over a noiseless channel, and reconstructing the image at the receiver.
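The chapter's linear prediction operator is not restated here, so the sketch below should be read only as one plausible realization: the m memory-set samples are combined linearly, with the weights refit by a least-squares fit over the λ most recent learning-set examples. Both the fitting rule and the names are assumptions made for the illustration.

    # One plausible realization of an adaptive linear predictor (details assumed).
    import numpy as np

    def adaptive_linear_predict(samples, m=2, lam=16):
        """Return predictions for samples[m:], refitting the m weights at each step
        by least squares over the lam most recent (context, sample) examples."""
        preds = []
        contexts, targets = [], []          # sliding learning set
        w = np.zeros(m)
        for k in range(m, len(samples)):
            ctx = np.array(samples[k - m:k], dtype=float)
            if len(targets) >= m:           # enough examples to fit m weights
                A = np.array(contexts[-lam:])
                y = np.array(targets[-lam:])
                w, *_ = np.linalg.lstsq(A, y, rcond=None)
            preds.append(float(ctx @ w))
            contexts.append(ctx)
            targets.append(float(samples[k]))
        return preds

A predicted sample would then be counted as predictable when the rounded prediction differs from the actual sample by no more than τ, the peak error threshold discussed above.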

C. Nonlinear Prediction Experiments

First experiments with nonlinear prediction used memory set sizes (m) of one and two past samples on the same line as Sk. However, prediction efficiency on the video data was almost completely insensitive to learning set size (λ). This is illustrated in Fig. 8, which shows an early result with 6-bit quantization. Table I summarizes the average compression results for both the sample conditioned mean and the sample conditioned mode predictors. For m = 2, the memory set consists of the two samples immediately preceding Sk. For m = 2*, the asterisk indicates that the memory set (Mk) is defined as in Fig. 6. Table I shows that the mode and mean predictors yield very similar results.

TABLE I
TEN PICTURE AVERAGE PROBABILITY OF PREDICTION (p) FOR BOTH THE MEAN AND MODE PREDICTORS WITH Q = 16 QUANTUM LEVELS; τ = 1; λ = 1000

             Adaptive cond. mean predictor    Adaptive cond. mode predictor
  m = 2                 0.815                            0.822
  m = 2*                0.835                            0.853

A pictorial example is given by Fig. 9. Again this example represents the reconstructed image after noiseless transmission with the ideal code. In the next section, we present some examples which use the word pair code described in Section II. In addition, we summarize the average compression results in the noiseless channel for several coding procedures.
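As a rough illustration of the sample conditioned mode idea, the sketch below predicts each sample as the most frequently observed value that followed the same m-sample context within a sliding window of λ samples. The data structures, the fallback to the previous sample for unseen contexts, and the names are assumptions made for the illustration.

    # Illustrative adaptive sample-conditioned mode predictor (details assumed).
    from collections import Counter, deque

    def conditioned_mode_predict(samples, m=2, lam=1000):
        """Predict each sample from the mode of values previously seen after the
        same m-sample context, learned over a sliding window of lam samples."""
        history = deque(maxlen=lam)          # recent (context, value) pairs
        preds = []
        for k in range(m, len(samples)):
            ctx = tuple(samples[k - m:k])
            matches = Counter(v for c, v in history if c == ctx)
            # Fall back to the previous sample when the context has not been seen.
            preds.append(matches.most_common(1)[0][0] if matches else samples[k - 1])
            history.append((ctx, samples[k]))
        return preds

Replacing the mode by the rounded conditional mean would give the "conditioned mean" variant reported in Table I.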



FIG. 8. Average sample compression ratio vs. learning set size (in scan lines) with Q = 64 quantum levels, m = 1, τ = 2 quantum levels.

D. Coding for the Noiseless Channel

In Section II, we outlined the problem of encoding "compressed" data for transmission in a noiseless channel. Our intent here is to summarize the average bit compression ratio results for several encoding procedures. The average compression results using three different predictors, including a first order Markov predictor, are presented in Table II. Markov prediction results are included because, in the following section where channel noise is considered, we assume that the image data can be modeled as a first order Markov chain. To implement this predictor, estimates of the first order transition probabilities were obtained by averaging over all ten pictures. The most probable sample value conditioned on the previous sample value was used as the prediction for each sample. The transition probability matrix is given in Fig. 10. It is interesting to note the large probabilities along and adjacent to the main diagonal, while the off-diagonal terms become almost insignificant. This explains the respectable performance of the simple zero order (previous sample) predictor on this type of data. The Markov predictor utilizing the matrix of Fig. 10 is nonadaptive, since no attempt was made to adjust the elements of the transition probability matrix based on the success or failure of prediction attempts. One can safely conclude from Table II that an average bit compression ratio between 2.5 and 3.5 can be attained with a rather simple compression system.
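The nonadaptive Markov predictor just described reduces to two small steps, sketched below: estimate the transition counts over the training pictures once, then predict each sample as the most probable successor of its predecessor. The fallback for levels that were never observed is a convention added for the illustration.

    # Illustrative sketch of the nonadaptive first order Markov predictor (names assumed).
    def most_probable_successor(pictures, Q=16):
        """From training pictures (lists of quantized samples), count level-to-level
        transitions and return, for each level, the most probable next level."""
        counts = [[0] * Q for _ in range(Q)]
        for pic in pictures:
            for prev, cur in zip(pic, pic[1:]):
                counts[prev][cur] += 1
        # If a level was never observed, fall back to predicting the level itself.
        return [row.index(max(row)) if any(row) else lvl for lvl, row in enumerate(counts)]

    def markov_predict(samples, successor):
        """Predict each sample (after the first) from its predecessor."""
        return [successor[s] for s in samples[:-1]]

Because the estimated matrix is strongly diagonal, this predictor behaves much like the zero order previous-sample predictor mentioned above.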


FIG. 9. (a) Original analog image, (b) Original digital image, Q = 16 quantum levels, (c) Reconstructed image after ideal "compressed" data transmission; adaptive conditioned mode prediction, p = 0.876, τ = 1 quantum level.



FIG. 10. Transition probability matrix for first order Markov predictor.


TABLE II
SUMMARY OF AVERAGE COMPRESSION RESULTS FOR THE NOISELESS CHANNEL CASE. Q = 16 INTENSITY LEVELS; τ = 1

                     IDEAL CODE          WORD PAIR CODE      1 BIT/SAMPLE CODE
  PREDICTORS(a)      p       Cb          p       Cb          p       Cb
  A                  0.809   2.875       0.784   2.455       0.809   2.283
  B                  0.853   3.493       0.816   2.715       0.853   2.520
  C                  0.813   2.885       0.795   2.468       0.813   2.288

(a) A, adaptive linear operator; m = 2, λ = 16 samples. B, adaptive conditioned mode; m = 2*, λ = 1000 samples. C, first order Markov predictor.

FIG. 11. Reconstructed image after simulated transmission of compressed data in a noiseless channel. Markov prediction and word pair code. Maximum run length = 15 samples; τ = 1 quantum level; p = 0.842; Cb = 2.866 (original as in Fig. 9).


A typical example is shown in Fig. 11, using Markov prediction and the word pair code. In the next section, we attempt to show how channel noise might affect this conclusion.

E. Channel Noise Effects

Assuming the model described in Section II, a number of experiments designed to show both the quantitative and qualitative effects of channel errors were performed. Throughout this section we make use of first order Markov prediction and word pair coding. Intuitively one would expect "compressed" data to be much more sensitive to channel errors than "uncompressed" data. This notion is reinforced by a consideration of Fig. 12. Figure 12a shows the result of transmitting an "uncompressed" picture through a binary symmetric channel at an error rate r = 10⁻¹. Figure 12b shows the reconstructed picture after a "compressed" data transmission over the same channel at the same error rate. Despite the high error rate, the image is still recognizable in Fig. 12a but is completely destroyed in Fig. 12b. The reason for this is quite simple. A single bit error in the "uncompressed" data causes an error in only one sample. A single error in the "compressed" data, however, may cause errors in many reconstructed samples.

Actually two types of errors may occur during a "compressed" data transmission. The first of these is a quantum level error, which can only affect the reconstructed sample value but not its relative position on the scan line. The second type is a timing word error, which can cause errors in future reconstructed samples even beyond the distance specified by the run length word. Both types of errors can be seen in Fig. 12b, where the high error rate has made it impossible for the simulated receiver to reconstruct an image even vaguely resembling the original picture. This sort of subjective reasoning can be complemented by Davisson's quantitative result, which is plotted in Fig. 13 for p = 0.8. One of the objectives of recent experiments was to verify this result using the TIROS video data. In order to remain consistent with the model of Section II, the following choices were made for the experimental simulation:



FIG. 12. Channel error rate = 10⁻¹ bit/bit. (a) Image data transmitted via PCM. (b) Reconstructed image after "compressed" data transmission. First order Markov prediction and word pair code. Maximum run length = 15 samples; τ = 1 quantum level.


1. The data was encoded in blocks of 456 samples (one scan line) with perfect block synchronization. Each original sample was quantized to 16 quantum levels.
2. A first order Markov predictor was used with an allowable error threshold τ = 1.
3. Within each block, the "compressed" data was encoded in the word pair code with a maximum run length of 15 samples.
4. A binary symmetric channel was simulated.

The simulation yielded an average probability of prediction p = 0.795, and the experimental points are plotted in Fig. 13 for various channel error rates. Each point was obtained by computing the exact number of sample errors in the ten reconstructed pictures resulting from a given channel error rate.

FIG. 13. Average probability of sample error in reconstructed data vs. channel bit error rate. Analytical result (10) (p = 0.8; B = 500) and empirical results.

It should be recognized that some of the savings achieved by data compression could be used to reduce the reconstructed sample error rate by increasing the transmitted energy per bit. This philosophy leads naturally to the use of an energy compression ratio as the fundamental system performance measure. Figure 14 shows a plot of energy compression ratio (Ce) vs. channel error rate for p = 0.8 and 0.9.


FIG. 14. Energy compression ratio (Ce) vs. channel bit error rate. Q = 16 quantum levels; maximum run length = 15 samples; block length (B) = 500.
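The error propagation mechanism described above can be made concrete with the small sketch below: a word pair stream is decoded back into a block of samples, and a single corrupted run length word is seen to displace the reconstruction that follows it. The decoder, in which predicted samples are simply filled in with the preceding transmitted value, and the toy numbers are illustrative assumptions, not part of the original simulation.

    # Illustrative word pair decoding and the effect of a timing (run length) error.
    def word_pair_decode(pairs, block_len):
        """Rebuild a block: each pair contributes its sample value followed by
        `run` repetitions of that value (standing in for the predicted samples)."""
        out = []
        for value, run in pairs:
            out.append(value)
            out.extend([value] * run)       # predictable samples filled in
        return out[:block_len]              # truncate if errors made the block too long

    clean = [(9, 5), (4, 3), (12, 6)]       # (sample value, run length) pairs, 17 samples
    corrupted = [(9, 5), (4, 7), (12, 6)]   # one bit error turned run length 3 into 7
    a, b = word_pair_decode(clean, 17), word_pair_decode(corrupted, 17)
    print(sum(x != y for x, y in zip(a, b)), "of", len(a), "reconstructed samples differ")

A single flipped bit in a run length word thus shifts every later sample in the block, which is why timing errors show up as streaks in the reconstructed pictures.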

It is interesting to consider several typical examples showing channel error effects, keeping in mind the quantitative results provided by Figs. 13 and 14. Figure 15 depicts the effects of noise at a channel error rate of r = 10⁻³. The number of sample errors in the reconstructed data is 36,258. This corresponds to approximately 17% of the total samples, but subjectively the picture does not appear to be objectionable. The untimely "streaking" creates an undesirable effect, yet even at this relatively high error rate the picture is by no means worthless. Most of the errors seem to be masked in the content of the scene, but obvious timing errors can be detected. A second example is presented in Fig. 16 at a more reasonable channel error rate of 10⁻⁴. The number of individual sample errors is 1,516, but the effects of these errors nearly defy subjective detection. It should be emphasized that these experiments were performed under the assumption of perfect block synchronization. In addition, there was no attempt to implement any special decoding rules designed to
minimize the effects of errors. Future work, however, will be aimed at including effects of synchronization errors as well as investigating optimum receivers for data compression systems.

F. Conclusions

The objective of our experiments with several prediction schemes was to maximize the probability of prediction (p). The results show that the average values of p attained were nearly the same for all methods considered. This is illustrated in Table II and is especially true if the word pair code is used. In pure prediction capability, the adaptive sample conditioned mode predictor yielded the best performance of all methods. Its superiority is minimized, however, when its implementation requirements are compared with those of the nonadaptive Markov predictor. As a general guideline, then, we can expect to achieve a value of p between 0.8 and 0.9 on a single picture and an average value of p between 0.8 and 0.85.

The effects of noise on compressed data are translated into streaks or contours in the reconstructed image. The exact structure of these effects seems to be primarily dependent upon the encoding procedure chosen. A particular prediction method may tend to slightly improve or degrade the subjective effects of noise, but its effect appears to be minimal. Clearly the choice of a run length type code tends to emphasize the "streaky" noise effects because of its error accumulation property. On the other hand, the one bit per sample code is much less noise sensitive. At high signal to noise ratio, however, the run length type code provides superior coding efficiency and the noise effects become almost undetectable. It is interesting to note that the use of the threshold τ produces the same kind of streaking or contouring as does noise. The difference is that τ controls the range of the prediction error, whereas the noise errors are limited only by the range of the data.

Since all the prediction methods provided nearly the same prediction efficiency, we decided to study the effects of channel errors on one of the simplest possible systems. This system consisted of the Markov predictor and the word pair encoder. These choices also fit Davisson's analytical model and do not require large amounts of computer time to achieve significant results. The summary examples which follow utilize the Markov predictor and word pair encoder to achieve compression. These examples provide the reader with a step by step outline of our results.


FIG. 15. (a) Original analog image, (b) Original digital image, Q = 16 quantum levels, (c) Effects of channel errors at r = 10⁻³ bit/bit. First order Markov prediction and word pair code. Maximum run length = 15 samples; τ = 1 quantum level; p = 0.842; Cb = 2.866.

FIG. 16. Effects of channel errors at r = 10⁻⁴ bit/bit. First order Markov prediction and word pair code. Maximum run length = 15 samples; τ = 1 quantum level; p = 0.842; Cb = 2.866 (original as in Fig. 15).


FIG. 17. (a) Original analog image, (b) Original digital image, (c) "Compressed" picture; Markov prediction, word pair code, no noise; p = 0.822; Cb = 2.579. (d) r = 10⁻². (e) r = 10⁻³. (f) r = 10⁻⁴. Picture originated from TIROS V, orbit 3143, frame 6, camera 1; direct transmission from satellite; principal point, 32.4N, 69.3W; subsatellite point, 33.9N, 73.4W.


FIG. 18. (a) Original analog image, (b) Original digital image, (c) "Compressed" picture; Markov prediction, word pair code, no noise; p = 0.923; Cb = 4.529. (d) r = 10⁻². (e) r = 10⁻³. (f) r = 10⁻⁴. Picture originated from TIROS VI, orbit 3692, frame 31, camera 1; taped before transmission from satellite; principal point, 36.8N, 57.2W; subsatellite point, 33.1N, 48.7W.


The fundamental conclusion to be drawn from the results is that, using a relatively simple system, average energy savings of more than 4 dB can be achieved, and "acceptable" received picture quality can be maintained at channel error rates of 10⁻⁴ and lower. This conclusion may seem to be in conflict with the energy compression ratio arguments and the curve of Fig. 14. The problem here is that the use of the probability of received sample error as an absolute measure of data quality is not ideal. Our conclusion as to received picture quality at r = 10⁻⁴ was reached subjectively. The received sample error rate does provide a neat quantitative measure for analysis purposes, but it contains no information about the structure and distribution of the errors. It cannot, therefore, be used as a basis for the subjective picture quality judgments which appear to be so necessary. Unfortunately, no ideal measure of picture quality is known at present, so we must continue to use subjective arguments. Perhaps the energy compression ratio should be viewed as a lower bound, especially at channel error rates lower than 10⁻⁴.

The two pictures shown in Figs. 17 and 18 are largely self-explanatory. Figure 17 has been selected because of the plentiful fine structure of cumulus type clouds; the large bands are the dominant gross feature. (Pertinent documentary information is given in the caption.) The picture of Fig. 18 shows a large hurricane off the coast of Miami. The original analog reproductions from the magnetic tapes recorded at the receiving stations are given in Figs. 17a and 18a. Figures 17b and 18b are presentations where no error (τ = 0) was allowed and no noise inserted in the channel. For control of contrast reproducibility, a gray wedge is inserted on the right-hand side. Close observation of small specks of clouds convinces the reader that the two images (a and b) are nearly identical. The fiducial marks are straight line features permanently etched onto the vidicons for geometrical calibration. The ragged appearance in the pictures is due to ground equipment synchronization imperfections and, in some cases, spacecraft tape recorder flutter and wow.

The next pictures (Figs. 17c and 18c) allow one quantum level error (τ = 1) and are encoded with the word pair code without insertion of errors in the simulated transmission link. The probability of prediction was p = 0.822 for Fig. 17c and p = 0.923 for Fig. 18c; bit compression ratios of 2.579 and 4.529 were achieved, respectively. The remaining frames use r = 10⁻⁴ and 10⁻³ to show gradual degradation. The predominant error, even without noise in the transmission link, is the streaky appearance which is due to the allowable error threshold. In both pictures, transmission noise produced very few streaks. In each picture,
only a few streaks are really disturbing. Higher noise levels degrade picture quality rapidly.

OUTLOOK: Considerable energy saving can be obtained by application of the techniques discussed in this chapter. Before complete design guidelines can be established for a practical video data compression system, however, an efficient synchronization system must be designed. A method under consideration uses a synchronization word of length equal to that of a word pair, inserted for every B (block length) samples of the original data stream. A synchronization algorithm is being evaluated which has the property that nearly perfect synchronization is maintained at channel error rates less than 10⁻⁴. Preliminary computer simulation results indicate that a block size between 40 and 70 samples allows maximum energy saving for channel error rates from 10⁻³ to 10⁻⁵.

Another practical design question is that of pretransmission buffering of compressed data. The function of the buffer is to accept the variable rate data at the output of the compression system and provide a uniform rate bit stream to a storage device or directly to a transmitter. An investigation is being conducted to establish a criterion for the choice of block size as well as to determine an efficient buffer control algorithm. Preliminary results show that a buffer size slightly larger than one TV scan line should be sufficient for most cases. While these studies were designed to show the feasibility of video data compression with known amplitude accuracy, it must be left to the communication systems engineer to adapt the techniques to specific applications.

REFERENCES

1. A. V. Balakrishnan, An adaptive nonlinear data predictor. Proc. Natl. Telemetering Conf., Washington, D.C., 1962, II, suppl.
2. W. F. Schreiber, C. F. Knapp, and N. D. Kay, Synthetic highs—an experimental television bandwidth reduction system. Presented at the 1958 84th SMPTE Convention, Detroit, Michigan.
3. D. Weber, A synopsis on data compression. Proc. Natl. Telemetering Conf., Houston, Texas, 1965, Paper TA 1-1.
4. G. L. Rega, A unified approach to digital television compression. Proc. Natl. Telemetering Conf., Houston, Texas, 1965, Paper TA 1-4.
5. L. W. Gardenhire, Redundancy reduction—the key to adaptive telemetry. Proc. Natl. Telemetering Conf., Los Angeles, California, 1964, Paper 1-4.
6. A. V. Balakrishnan, R. L. Kutz, and R. A. Stampfl, Adaptive data compression for video signals. NASA Tech. Note D-3395, April 1966.


7. J. A. Sciulli, Compression of video data by adaptive nonlinear prediction. NASA Tech. Note D-3475, August 1966.
8. L. D. Davisson, Theory of adaptive prediction, in "Advances in Communication Systems" (A. V. Balakrishnan, ed.), Vol. 2. Academic Press, 1966.
9. L. D. Davisson, Data compression and its application to video signals, in Final Report of the Goddard Space Flight Center Summer Workshop Program, GSFC Document X-100-65-407, pp. A-35-A-55, 1965.
10. L. D. Davisson, The concept of energy compression ratio and its application to run length coding, in Final Report of the Goddard Space Flight Center Summer Workshop Program, GSFC Document X-70067-94, 1966.
11. S. S. Stevens, The psychophysiology of sensory function, in "Sensory Communication" (W. Rosenblith, ed.), p. 13. M.I.T. Press and John Wiley and Sons, 1962.

Some Aspects of Communications Satellite Systems

S. METZGER
Communications Satellite Corporation, Washington, D.C.

I. Introduction
II. Communications Satellites
   A. Intelsat I
   B. Intelsat II
   C. Intelsat III
   D. Intelsat IV
III. Modulation Methods
   A. General
   B. Frequency Division Multiplex/Frequency Modulation (FDM/FM)
   C. Pulse Code Modulation/Phase Shift Keying (PCM/PSK)
   D. Frequency Division Multiplex/Single Sideband Transmission (FDM/SSB)
References


I. Introduction

In less than 10 years from the launching of Sputnik I, satellite usage has expanded from purely scientific probes to practical applications such as weather satellites (the Tiros series), communications satellites, and manned satellites. The first meeting of organizations that were planning earth stations for testing with the experimental communications satellites Telstar and Relay was held in Paris in July 1961, under NASA auspices. Progress has been more rapid than had been predicted by even the most ardent enthusiasts. By 1965, the Early Bird satellite was launched for the Interim Communications Satellite Committee. This 85 lb operational satellite has a capacity of 240 circuits, almost as many as the combined capacity of the first four transatlantic cables laid during the previous 10 years. By 1966 the ICSC, with some 50 member countries, had 1200 circuit satellites under construction, and plans underway for one with a potential of many thousands. This paper covers two selected aspects of communications satellites which are unique to this field. A description of the present ICSC satellite program, with discussion of satellites in orbit, under construction, and in planning, mainly from the viewpoint of the communication system engineer, is followed by a report on the question of multiple
access, of many stations communicating with one another via a common satellite, and a comparison of the relative merits of various modulation methods and their possible upper bounds of channel capacity.

II. Communications Satellites

The rapid advances in the commercial communications satellite field can be shown by comparing the telephone circuit capacities of the Interim Communications Satellite Committee (ICSC) satellites Intelsat I through IV. Intelsat I ("Early Bird"), planning for which started in late 1963 and which was launched in April, 1965, has a 240 circuit capacity between countries in the Northern Hemisphere. Intelsat II, started in 1965 and launched in late 1966, also has a 240 circuit capacity, but its wider beam antenna permits operation in both the Northern and Southern Hemispheres. Intelsat III, contracted for in the summer of 1966 to be launched in 1968, will have a capacity of 1200 circuits and full hemispheric coverage. Intelsat IV, now in the planning stage, could be launched in about 1970 and could have circuit capacities ranging from approximately 5000 to 15,000 as the antenna beam coverage is narrowed from hemispheric (20° beam) to sectional (2° beam). Intelsat I through III are designed for the Delta rocket family (whose capacity increases year by year), while Intelsat IV is designed for a larger launching rocket. A description of these satellites from the viewpoint of their communications performance, and a discussion of the reasons behind their main features, follows.

A. Intelsat I

This "experimental/operational" satellite ("Early Bird") was launched by a Delta rocket from Cape Kennedy on April 6, 1965. It was "experimental" in that it was to provide needed data regarding: (1) rain margins at earth stations; (2) telephone subscribers' reactions to the time delay of a stationary satellite; (3) long-time performance of control valves in a space environment; (4) the applicability of satellite communications for commercial telephone use; (5) long-term drift rates of a quasistationary satellite. It was "operational" in that the single experimental satellite could
be used for commercial service, assuming answers to the above questions were satisfactory (and they were).

The orbit achieved, after initial adjustments, had an inclination of 0.126°; an apogee of 19,332 nautical miles and a perigee of 19,310 nautical miles at a longitude of 30°W; and a drift rate of 0.056° east/day. The drift continued until, on July 1, 1965, at a longitude of 28°W, the drift direction reversed. On December 2, 1965, with the satellite at a longitude of 38.5°W, commands were sent to propel the satellite in an easterly direction, since a further westward drift would decrease the satellite elevation angle from the most easterly earth station at Raisting, Germany. Since then the satellite has drifted east to 28.5°W longitude, reversed, and as of September, 1966, was at 38.3°W longitude. At that time another maneuver propelled the satellite toward the east with an initial rate of 0.15°/day.

Ideally, it would have been desirable to hold the satellite's position within a fraction of the earth station antenna beamwidth (an 85 ft diameter antenna has a beamwidth of approximately 0.14° between half-power points at 6 GHz). However, a satellite at this altitude undergoes an increase in inclination of approximately 0.9° per year due to the effects of the moon and the sun. To correct this inclination would require the expenditure of about 4 lb of peroxide per year, but the weight limitation of the rocket prevented this approach. Correction of longitude requires less than 0.1 lb of peroxide per year, and the two control systems (one in use and one spare) each had approximately 2.25 lb following initial positioning maneuvers. However, correction of longitude alone would still require the earth station to track the satellite's "figure 8" motion due to inclination (2.18° as of July, 1967), and the typical longitude drift rate of about 0.07°/day is very small compared to inclination changes. Therefore, no attempt has been made at frequent longitude corrections. The Comsat Control Center, based on angle and range data obtained from the Andover, Maine earth station, updates the orbit data daily and sends weekly earth station pointing predictions (10-min intervals) to all earth stations using Intelsat I (Pleumeur Bodou, France; Raisting, Germany; Fucino, Italy; Goonhilly, United Kingdom).

The satellite weighed 85 lb when in orbit (apogee kick rocket fired; peroxide tanks full). Many of the parameters of the system were selected because they permitted use of existing equipment, rather than because they satisfied long-term objectives. For example, the receivers incorporated limiters, and each has a bandwidth of only about 25 MHz. These factors make it difficult to use multiple carriers through each receiver, but they are satisfactory for the mode of operation employed, wherein only two countries use the satellite at one time (the European countries operating in a weekly sequence), so that one carrier is sent through each of the two receivers. The output of these receivers feeds into a common traveling wave tube (a spare tube is provided and can be switched in by ground command).

The satellite transmitting antenna consists of six dipoles arranged in a collinear array. The resulting pattern is omnidirectional in the orbital plane (at right angles to the line of the array, which is the spin axis), but the dipoles are phased so that in elevation the beam (11° between half-power points) is tilted 7° above the orbital plane to concentrate the radiation toward the Northern Hemisphere. The antenna gain at beam center with respect to an isotropic radiator is slightly under 9 dB, and the specification on effective radiated power (ERP) for each of the two carriers is 10 dBW. The flux density at the satellite needed to saturate the limiter is -74 dBW/m².

A synchronous satellite is eclipsed by the earth during the spring and fall. The eclipse season starts approximately three weeks before March 21 (and also before September 23) and continues for six weeks per season. The outages occur at local midnight (at the subsatellite point) and increase in duration each night until a maximum of 70 min is reached on March 21 and September 23, after which they start decreasing. To permit operation during these periods would require the use of storage batteries, but the weight restriction on Intelsat I prevented this. The storage batteries on board the satellite are mainly for supplying the high peak power needed to operate the solenoids for the attitude control system. The greater weight capacity of Intelsat II, III, and IV will allow sufficient battery capacity to operate the communications repeaters during eclipse. This satellite was built for the ICSC by the Hughes Aircraft Company, and is still operating successfully (Nov. 1967) since its launching (April 6, 1965).

Early Bird tests during the first 2 years in orbit provided the following data regarding the experimental aspects of the satellite:

(1) Stations without radomes report up to 2 to 3 dB degradation in carrier-to-noise (C/N) due to rain and wind. Stations using inflated radomes report degradations of up to 6 to 9 dB due to rain.

(2) Time delay alone does not appear to be significant in subscriber reaction (with a path including a single synchronous satellite), but rather
the combined effects of the presently used echo suppressors with this delay are factors. Quantitative measures of the reaction vary widely depending on the test method used. A significant test is that of commercial acceptance and use of over 108 telephone circuits (via Early Bird) across the North Atlantic since June, 1965.

(3) Hydrogen peroxide control valves have worked properly at the latest maneuver in June, 1967, after two years in orbit.

(4) Satellite communications have been accepted as a major means of global communications, and consideration is now being given to their use for regional and domestic service.

(5) Based on data gathered from tracking Early Bird, the position of the earth's bulge now appears to be at 12 to 15°W longitude rather than at 20°W as previously estimated. Using the refined constants, the satellite position can be predicted about a month in advance with an accuracy of approximately 0.05°.
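As a rough numerical check on the antenna figures quoted earlier (an 85 ft dish with a half-power beamwidth of approximately 0.14° at 6 GHz), the common approximation of about 70λ/D degrees for a parabolic reflector, which is an assumption made here rather than a figure from the text, reproduces the quoted value:

    # Rough check (approximation assumed): half-power beamwidth ~= 70 * wavelength / diameter.
    C = 3.0e8                       # speed of light, m/s
    freq_hz = 6.0e9                 # 6 GHz uplink
    diameter_m = 85 * 0.3048        # 85 ft dish converted to meters
    wavelength = C / freq_hz        # 0.05 m
    beamwidth_deg = 70 * wavelength / diameter_m
    print(round(beamwidth_deg, 2))  # ~0.14 degrees between half-power points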

B. Intelsat II

The specifications for this satellite arose from a need by NASA in the summer of 1965 for multichannel communication by late 1966 between their stations in Carnarvon, Australia; Ascension Island; the Canary Islands; tracking ships in the Atlantic, Pacific, and Indian Oceans; and their Manned Space Flight Headquarters in Houston, Texas (via the Comsat earth stations in Brewster, Washington, and Andover, Maine). The Intelsat I satellite was not suitable for this application because its tilted antenna beam covered only one hemisphere, its power output was insufficient (and a wider beam would have decreased it further), and its receiver limiters prevented its application to multicarrier use.

The need to modify the Intelsat I specifications allowed the introduction into Intelsat II of new techniques which showed promise for future communications satellites. These included the concept of a wideband repeater with several stages of tunnel diode amplifiers for the incoming signals in the 6 GHz band, a conversion to 4 GHz, and subsequent amplification by a low level traveling wave tube and the output stage. This approach resulted in a wideband amplifier (130 MHz) with far fewer parts and therefore greater potential reliability than the more conventional IF amplifier type of heterodyne repeater. The bandwidth was made as wide as possible consistent with the performance of available tunnel diodes and the need to limit new development because of
the difficult delivery schedule. This approach has been extended (in Intelsat III) to a bandwidth of 230 MHz, and it appears that it might be increased to 500 MHz in the future, thus covering the entire band allocated to communications satellite use in this part of the spectrum (5925-6425 MHz for earth-to-satellite transmissions and 3700-4200 MHz for satellite-to-earth transmissions).

A second new concept in Intelsat II was that of paralleling several traveling wave tubes to increase output power using existing tubes. A total of four tubes are provided, and any combination of these can be switched on by ground command. It is anticipated that for most applications two or three will be used, the remainder acting as spares. Finally, eliminating the limiter used in Intelsat I provides a quasilinear repeater suitable for multicarrier operation. The input-output characteristic of the repeater is primarily determined by that of the output TWT stage. The flux density at the satellite needed to saturate the TWT is -68 dBW/m².

The transmitting antenna array consists of four stacked biconical horns, with an omnidirectional pattern in azimuth and an elevation beam 17° wide with the peak of the beam aimed at the equator. The specified effective radiated power of the single satellite repeater (single carrier operation) is 15.5 dBW at an angle of ±6° from the center of the beam, corresponding to ±36° in latitude on the earth. The limiting factor in output power is the satellite weight, which in turn is dictated by the launch vehicle capability. It is a tribute to the Delta project to point out that its in-orbit weight capability for a satellite in synchronous orbit has increased from 85 lb (Intelsat I) in April 1965 to 190 lb (Intelsat II) in the last quarter of 1966. This is due to the improved second stage and also to the FW4 third stage, in place of the former "258" third stage. Intelsat II was also built by the Hughes Aircraft Company.

C. Intelsat III

The Intelsat III design reflects an advance beyond Intelsat I, while Intelsat II was only an interim measure to meet a specific application. Intelsat III is to have a capacity of 1200 two-way telephone channels (with earth stations having 85 ft diameter antennas), or a five-fold increase over Intelsat I. This improvement is based on two factors: first, the use of a fully directional antenna covering the visible earth (20° beamwidth in elevation and in azimuth) rather than the
toroid antenna pattern used in Intelsat I and II; and secondly, the continuing increase in launch capability of the Delta rocket. This satellite, a contract for which was let to TRW, Inc., in the summer of 1966, is to be launched in 1968, at which time the Delta will have a synchronous satellite capability of 270 lb. This increase over the 1966 (Intelsat II) Delta is due to the replacement of the FW4 third stage by a Surveyor type TE-364 motor and also to the increase in the size of the first stage.

The antenna is a conical horn coincident with the spin axis, with a flat plate reflector at a 45° angle with respect to this axis, rotating in the opposite sense to the satellite's rotation. IR sensors are used to keep the plate pointed toward the earth. There are two repeaters, each with a 230 MHz bandwidth, so that together they cover 90% of the allocated 500 MHz band. Each repeater has an effective radiated power of 22 dBW measured at the half-power point of the beam. The repeaters use the same general type of tunnel diode/TWT amplifier as that used in Intelsat II. Since the directive array is used for reception as well as transmission, the flux density at the satellite needed to saturate the repeater is reduced to -74 dBW/m². The outside dimensions of Intelsat I, II, and III, including their antennas, are shown in Fig. 1.

FIG. 1. Dimensions of Intelsat I, II, and III.

D. Intelsat IV

Since this satellite is now (1967) still in the planning stage, the following figures are rough estimates. The basic idea for this satellite is that it embody the higher ERP which can be achieved by the use of a larger launch vehicle (Atlas or Titan class rather than Delta) and also use directive antenna beams which illuminate only sectors of the visible earth. The net effect will be to increase the channel capacity of large earth stations and also to provide reasonable capacity to smaller earth stations. For example, it is estimated that ten to twelve repeaters of the Intelsat III type might be used (instead of two), thus giving a total ERP (at the edge of a 20° x 20° global coverage beam) of 33 dBW (2 kW). If the antenna beam(s) be narrowed to 6° x 6°, the total ERP rises to 43 dBW (20 kW); and if multiple beams of width 2° x 2° be used, the ERP becomes 53 dBW (200 kW). If this power be uniformly spread over the entire useful band of 400 MHz (allowing a total of 100 MHz for guard band), the field intensity on the earth's surface is still within the proposed limit of -152 dBW/m²/4 kHz.

Studies are under way for the use of a satellite as a repeater for communication between planes flying transoceanic routes and control stations in various parts of the world. Questions under consideration
involve the choice of frequency for this application, and also the problem of whether a single satellite should combine this purpose with providing telephone service between fixed stations or whether a separate satellite should be used for each.
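The beam-narrowing gains estimated for Intelsat IV can be checked roughly by assuming that antenna gain scales inversely with the beam solid angle (gain ≈ 41,253/(θaz · θel) for beamwidths in degrees); this relation is an assumption introduced for the check, not part of the original estimate.

    # Rough check (approximation assumed): narrowing the beam from 20°x20° to 6°x6° or
    # 2°x2° should raise the ERP by ~10.5 dB and ~20 dB, i.e., 33 -> ~43 -> ~53 dBW.
    from math import log10

    def gain_dbi(az_deg, el_deg):
        return 10 * log10(41253.0 / (az_deg * el_deg))   # ideal pencil-beam estimate

    base_erp_dbw = 33.0                                  # 20° x 20° global beam
    for beam in (20.0, 6.0, 2.0):
        erp = base_erp_dbw + gain_dbi(beam, beam) - gain_dbi(20.0, 20.0)
        print(f"{beam:>4.0f} deg beam: ~{erp:.0f} dBW total ERP")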

III. Modulation Methods

A. General

One of the new problems introduced by communications satellites is that of obtaining "multiple access," wherein a number of earth stations communicate through a common satellite, hopefully with a minimum of satellite power, bandwidth, intermodulation, and equipment. To accomplish this, several means are being studied, each of which has certain advantages and disadvantages with respect to the above parameters. The basic methods include multiple carriers, each frequency modulated by a frequency division multiplexed baseband (FDM/FM); a time division system wherein each station transmits its channels in time sequence using pulse code modulation and phase shift keying of its carrier (PCM/PSK), and the signals from the various stations interleave in time with one another; and single sideband transmission, in which the frequency division multiplexed basebands of each station are translated into the microwave region (6 GHz) for transmission to the satellite and return (SSB). A novel method has been proposed by the Nippon Electric Co. and Hughes Aircraft Co., the "single channel per carrier" system, in which each voice channel frequency modulates a separate carrier. These methods will be briefly discussed, and a more detailed analysis of the first three will then be given to show their possible upper bounds of channel capacity for satellite use.

The satellites being built and planned all use simple wideband frequency translating repeaters, so that any of the above modulation methods, or even a combination of these, may be used in the future. The only methods excluded are those which involve signal processing in the satellite; for example, SSB for the earth-to-satellite path and a conversion in the satellite to FM for the return path. It was felt that though this method has certain advantages over the others, it would eliminate the flexibility of the nonprocessing repeater for accepting different types of modulation and also increase its complexity, and therefore it was not considered.


A basic difference between satellite communications and terrestrial microwave links is in the required margins. Fading in terrestrial line-of-sight links is usually due to multipath transmission, approximated by a Rayleigh distribution, and typical margins allowed for such fades are 35 to 40 dB. Experience with the Early Bird during the past two years shows that simple free space propagation exists except for degradation in carrier-to-noise performance during rain (or wet snow). Stations without radomes report 2-3 dB degradation at worst, while those stations using inflated radomes report at most 6-9 dB due to the combined effects of carrier attenuation and increase in system noise. Operating with such low C/N ratios results in greatly increased frequency deviations, compared to terrestrial link usage, to meet CCIR recommendations of a 50 dB test tone/noise ratio, and correspondingly increased bandwidths.

1. Multiple Carriers, FDM/FM

The method now being used on Intelsat I, and to be used on Intelsat II and III as well, is to have each earth station transmit one (or more) carriers, each of which is frequency modulated by a frequency division multiplexed baseband (FDM/FM). The total number of voice channels transmitted from each station is assigned in prearranged groupings to the various countries with which that station communicates. Thus, in general, each station would have a single radio transmitter for its channels, but a separate radio receiver tuned to each country from which it receives. The low noise amplifier may be sufficiently wideband to be used for a number of radio receivers. It is also possible to have some of the voice channels not permanently assigned, but switched among countries as the need arises. This method of FDM/FM multiple carrier transmission is used because FDM/FM is the common modulation method for terrestrial microwave links throughout the world; it permits the use of a relatively simple wideband repeater in the satellite, and it is also relatively efficient from the viewpoint of channels per watt.

a. Intelligible Crosstalk.† Two major transmission impairments take place when multiple frequency modulated carriers are amplified by a common traveling wave tube. First, a variation in satellite repeater gain across the spectrum of a frequency-modulated carrier will result in amplitude modulation of that carrier. This varying voltage applied to the traveling wave tube will in effect vary the tube's electrical length, thus phase modulating all other carriers passing through the tube with the modulating frequencies of the first carrier (AM to PM conversion). This effect is independent of the spacing between the carriers and is proportional to the modulating frequency. In the case of only two carriers, one for each direction of transmission, with the same voice channel assignment (same frequency location in the baseband) in each direction, the crosstalk will appear as echo. The objectives for echo are less severe than for intelligible crosstalk. For two carriers being amplified by a typical traveling wave tube (Early Bird satellite), measurements made at the French, German, and United Kingdom earth stations show the intelligible crosstalk at 1 MHz baseband frequency to be weaker than 60 dB below the test tone level. When more than two carriers are used, the interference appears as intelligible crosstalk, and in this case it is desirable to limit the number of voice channels per carrier to about 120 for an equal number of channels on each carrier. If one of the carriers is larger than the others, it could have any number of channels, since the crosstalk due to the added channels would not appear on the other carriers' channels.

† See Chapman and Millard (1).

b. Intermodulation. A second impairment is intermodulation due to nonlinearity of the satellite traveling wave tube. This problem has been analyzed by Sunde (2) for the case of a number of equal amplitude carriers, equally spaced. His analytic model of the traveling wave tube assumes a linear input vs. output voltage characteristic up to the saturation level, beyond which the output level is constant. This analysis shows that for the case of many carriers, the combined output power is less than for single carrier operation by about 1.5 dB, and that the total intermodulation power for an input corresponding to the saturation level is 9 dB below the resulting output power.

Mr. Arnold Berman of the Communications Satellite Corporation has extended Sunde's analysis to cover the general case of any number of input carriers with arbitrary amplitudes, frequency spacings, and frequency modulation indices. His TWT model is a four-term Fourier series whose coefficients are determined graphically from the input vs. output characteristics of a typical tube. The input signals are applied to this model, and the output consists of the resulting carrier amplitudes and the intermodulation products. A computer is used to perform the actual calculations. The 500 MHz spectrum used for satellite-earth
transmission (3700-4200 MHz) is divided into 5000 cells (of 100 kHz each), and the intermodulation power falling into each cell due to the various cross products is added. It is sufficient to include only 3rd and 5th order intermodulation products. It is assumed that each carrier is frequency modulated with a band of white noise (representing a multichannel baseband signal), and the resulting power spectral density is given by the probability density function of the amplitude of the modulating signal. This approach can be modified for other types of modulation or baseband signals.

Laboratory measurements were made using eleven carriers, spaced unequally over a 108 MHz band, and with amplitudes differing as much as 12 dB from one another. The resulting carrier output amplitudes and intermodulation products showed good agreement with the computed values (within 1 dB) at drive levels near saturation. The experiment also showed that the intermodulation noise could be treated as thermal noise in so far as its effect on the performance of a threshold extension receiver is concerned (3). In the present procedure for system calculations, the allowable 10,000 pW of channel noise (for 50 dB S/N) is divided approximately equally among up-path, down-path, and intermodulation noise contributions. The exact calculation becomes quite involved, including an estimate of the required margin; the performance of the threshold extension receivers (which varies with the number of channels); and an allowance for nonequal carrier amplitudes, which result in a nonuniform distribution of intermodulation products. The net effect is to require roughly 0.1 W of satellite ERP per voice channel.† Present estimates of bandwidth to be assigned for various numbers of channels are 5 MHz for 12 or 24 channels; 10 MHz for 60 channels; and 20 MHz for 132 channels.

† Based on use of an 85 ft diameter antenna and a system noise temperature of 50°K.

2. PCM/PSK

The several important advantages of this method over FDM/FM, and the disadvantages, follow. From the viewpoint of bandwidth conservation, 8-bit PCM and an 8 kHz sampling rate, as now used extensively in 24-channel PCM cable systems, require 64 kHz of bandwidth per RF channel for two-phase PSK transmission, or 32 kHz for four-phase PSK transmission with the same power as for two-phase. In addition, allowances must be made for guard
time to separate the emissions from the various earth stations†; for time to synchronize RF carriers; for bit timing; and for the RF guard band (taken as 25% of the occupied band). For twelve stations sending a total of 720 voice channels (an average of 60 channels each), a single 32 MHz band would be needed (four-phase transmission), including the above allowances. By comparison, the multiple carrier FDM/FM system would use 12 bands of 10 MHz each, 120 MHz total.

† Recent experiments made by the Communications Satellite Corporation between their earth station at Andover, Maine, and the Department of Transport earth station at Mill Village, Canada, via the Early Bird satellite showed that a guard time of 0.1 μsec appears ample to permit stations to maintain synchronism through a moving satellite. Thus for 12 stations, the total guard time allowance would be only 1% (1.2 μsec out of 125 μsec).

Another advantage is the simplification, and lower cost, of a PCM terminal compared to an FDM one. In the former, the complex equipment such as the coder/decoder, compressor-expander, and timing circuits is common to all channels, and the channel units are relatively simple gating circuits; while in the latter, the bulk of the equipment is in the channel units rather than in common equipment. Further, the digital circuitry used for PCM is well adapted to the use of integrated circuits, making for small, mass produced, low cost yet reliable modules; while the bandpass filters needed for each FDM channel tend to be large, complex, and expensive by comparison.

A further advantage of the PCM system over the FDM appears in the relative amounts of RF equipment needed. In the example given above, in the PCM/PSK case each of the 12 stations would need but one radio receiver (and associated oscillator, down converter, and demodulator), while for FDM each would need 12 receivers and associated equipment, including relatively expensive threshold extension demodulators. Finally, there is a substantial gain in system flexibility for PCM, in that changes in the number of channels allotted to each country (within the total allowable to the system) may be made simply by throwing switches or by patching cables; whereas in the FDM case cited above, changes in frequency allocations, transmitters, and receivers may be required.

The disadvantages of PCM relative to FDM are: (1) for a power limited satellite, the former needs about 0.5 to 2 dB more satellite power than the latter (although, as shown later, PCM requires less power per channel in the bandwidth limited case); (2) the PCM system degrades

more rapidly than FDM below the threshold region; (3) if the earth station is remote from the source of traffic, terrestrial links will be needed to connect the two, and FDM transmission will probably be used for that purpose, requiring both an FDM and a PCM terminal at the earth station; and (4) most important of all, currently, high capacity PCM terminals (of the order of 600 channels) are not available commercially. This is because there has been no need yet expressed, rather than because of technological limitations. Several hundred thousand channels of PCM are in use in the United States (in 24-channel cable systems) and similar systems are in use in other countries. Also, tests are under way in Japan on a 240-channel commercial system. Development is now proceeding on higher capacity terminals for communication satellite application.
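The bandwidth comparison quoted above is easy to retrace. The following Python sketch uses only the figures given in the text (64 kb/s per channel, four-phase PSK, a 1% guard-time allowance, and a 25% RF guard band); the additional allowances for carrier synchronization and bit timing are not itemized in the text, so the sketch comes out a few megahertz below the quoted 32 MHz.

```python
# Rough single-band estimate for 12 stations sharing one PCM/PSK carrier band.
# All figures are those quoted in the text; the calculation is illustrative only.
bits_per_sample = 8
sampling_rate_hz = 8_000
bit_rate_per_channel = bits_per_sample * sampling_rate_hz     # 64 kb/s

channels_total = 720                           # 12 stations x 60 channels each
rf_hz_per_channel = bit_rate_per_channel / 2   # 32 kHz per channel, four-phase PSK

occupied_hz = channels_total * rf_hz_per_channel   # ~23 MHz of modulated spectrum
occupied_hz *= 1.01                                # ~1% guard-time allowance
band_hz = occupied_hz * 1.25                       # 25% RF guard band

fdm_fm_hz = 12 * 10e6                              # 12 FDM/FM carriers of 10 MHz each

print(f"PCM/PSK single band : {band_hz / 1e6:5.1f} MHz (text rounds up to 32 MHz)")
print(f"FDM/FM, 12 carriers : {fdm_fm_hz / 1e6:5.1f} MHz")
```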

3. SSB

Single sideband transmission from earth station to satellite and return has not yet been seriously considered, mainly because of the relatively large satellite power requirements. It is shown in another section of this chapter that the satellite ERP (peak) per channel (for a synchronous satellite and an 85 ft diameter antenna at the earth station with a system temperature of 50°K) is from 7 to 35 W (depending on the linearity of the satellite repeater traveling wave tube), while FDM/FM needs about 0.1 W/channel, an increase of two orders of magnitude. SSB uses an average of but 5 kHz per channel (assuming a 25% guard band between satellite repeaters) as compared to 2 to 3 times this for the limiting cases of FDM/FM and multiphase PCM/PSK. The maximum SSB channel capacity in a 500 MHz band, for a single satellite with all repeaters illuminating the same area on earth, is then 100,000 channels, requiring a satellite ERP (peak) of approximately 2 million watts. If a satellite antenna with a 2° beam were to be used (33 dB gain at edge of beam), covering an area of roughly one-half million square miles, the satellite RF power becomes 1000 W. For comparison, a present day spinning satellite launched by an Atlas/Agena rocket has a capacity to generate about 100 W of RF power (at microwave frequencies). It would appear reasonable, considering the advances being made in launch capability, in oriented solar arrays, and in nuclear power supplies, that a capacity as given above could be realized in the next few years. It is pointed out that the average power per SSB channel (assuming the energy is uniformly distributed over a 3 kHz band) corresponds to a flux density at the equator (worst case)


of −165.5 dBW/4 kHz/m^2, well below the recommended maximum of −152 dBW/4 kHz/m^2. SSB has not been used in terrestrial links, since it is not practical to design repeaters with sufficient linearity to permit operation with multihop systems which can undergo large fades. For satellite use, these considerations no longer apply. Questions relating to amplitude and frequency control needed for an SSB/FM system will be tested in 1967 by NASA in their ATS program, and this data should be of use in evaluating the practicability of SSB transmission. Both the earth-to-satellite and satellite-to-earth frequency bands now being used are shared with terrestrial radio links. FDM/FM as the multiplexing and modulating method forms the basis for present frequency coordination calculations, and these would require reexamination for application to other modulation methods such as SSB.
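The SSB power arithmetic of the preceding paragraphs can be retraced in a few lines. In the Python sketch below, the 20 W per-channel peak ERP is an assumed value chosen from inside the 7 to 35 W range quoted above; the remaining figures are those of the text.

```python
# Illustrative arithmetic for the SSB case, using the figures quoted in the text.
band_hz = 500e6
hz_per_channel = 5e3                         # 4 kHz channel plus 25% repeater guard band
n_channels = band_hz / hz_per_channel        # 100,000 channels

erp_per_channel_w = 20.0                     # assumed value inside the 7-35 W range
total_erp_w = n_channels * erp_per_channel_w # ~2 MW of peak ERP

antenna_gain_db = 33.0                       # 2-degree beam, gain at edge of beam
rf_power_w = total_erp_w / 10 ** (antenna_gain_db / 10)   # ~1 kW transmitter power

print(f"channels in 500 MHz        : {n_channels:,.0f}")
print(f"total peak ERP             : {total_erp_w / 1e6:.1f} MW")
print(f"RF power behind 33 dB gain : {rf_power_w:,.0f} W")
print(f"FDM/FM at 0.1 W/channel    : {0.1 * n_channels / 1e3:.0f} kW of ERP for the same channels")
```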

4. Single Channel per Carrier Method

This method appears best suited for earth stations with relatively few channels which can be switched from station to station as the need arises, as contrasted to large stations with permanently assigned channels for the various other stations. It is an efficient method because of the relatively large threshold extension possible in an FM receiver with a 3 kHz baseband, and also because voice operated switches remove the carrier whenever the subscriber pauses, thus permitting the average satellite power to be reduced by a factor of 3 to 4 (speaker's activity factor). It lends itself to efficient use of the spectrum in that the various carrier frequencies may be assigned to whichever station requests circuits at a given time. To accomplish this sharing requires a central control station, or equivalent means, to do the assigning on demand, using manual, semiautomatic, or fully automatic methods. Also, each station must have equipment to permit a given voice channel to be switched to any of the assignable frequencies, for both transmission and reception. A variation of this method uses digital modulation for each channel in place of FM. Digital techniques applied to the single channel per carrier system have the same advantages and disadvantages compared to FM as previously stated for the multichannel case.

5. Summary

It does not appear possible at this time, with communications satellites in commercial operation for only a few years, to foretell the modulation method of the future. Studies and experiments are under way on all


of the methods mentioned above, and with the quasilinear satellite repeaters now envisaged any of these methods or even combinations of them, each type being used for the application for which it is best suited, may be accommodated. The more detailed analyses in the following sections indicate the approach used and give typical results. In an actual design, the exact figures used for desired signal-to-noise ratio; up-path noise allowance; earth station equipment intermodulation; satellite antenna gain; and filter and transmission line losses may change the results given by a decibel or two.

B. Frequency Division Multiplex/Frequency Modulation (FDM/FM)

The following analysis of channel capacity as a function of satellite and earth station parameters is based on high capacity systems involving many hundreds of voice channels per carrier, each carrier being amplified by a separate satellite repeater; thus a conventional FM receiver will be used (rather than a threshold extension type), and intermodulation between carriers will not be considered.

(1) The mean power of an N-channel multiplexed signal, at a point of zero relative level, is (−15 + 10 log N) dBm, or 32N x 10^-6 W.

(2) A full load sine wave whose peak voltage equals that of the multiplexed signal (whose peak to rms ratio = 13 dB) has a mean power = 320N x 10^-6 W.

(3) Ratio of this full load tone to a single channel test tone of 1 milliwatt = 0.32N.

(4) For a single channel test tone-to-noise ratio of 50 dB, with 2.5 dB noise weighting, 4 dB preemphasis, and 1 dB (typical) allowance for up-path noise, the flat weighted test tone to noise ratio in the top channel is 44.5 dB = 3 x 10^4; which corresponds to a full load tone-to-noise ratio of 3 x 10^4 x 0.32N = N x 10^4.

(5) The signal/noise power ratio in the top channel of an FM system is C/n_6kHz x m^2, where m is the modulation index of the full load tone (top channel), and C/n_6kHz is the carrier-to-noise power ratio at the input to the receiver where the noise is measured in a 6 kHz band. This expression holds only for operation above threshold of the FM receiver.

(6) Equating the above expression with that for the desired full load tone/noise gives N x 10^4 = C/n_6kHz x m^2; or in terms of the


system noise temperature T°K measured at the receiver input,

C/T = 8N x 10^-16/m^2     (1)
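The numerical constant in Eq. (1) follows directly from steps (1) through (6). A minimal check in Python (Boltzmann's constant is the only number not taken from the text):

```python
# Retracing steps (1)-(6): the numerical constant in Eq. (1).
k = 1.38e-23                      # Boltzmann's constant, J/K
full_load_tone_to_noise = 1e4     # step (4): the text rounds 3 x 10^4 x 0.32N to N x 10^4

# Step (6): N x 10^4 = [C / (k T x 6 kHz)] x m^2, hence C/T = 10^4 x k x 6e3 x N / m^2.
coefficient = full_load_tone_to_noise * k * 6e3
print(f"C/T = {coefficient:.2g} x N / m^2   (Eq. (1): 8N x 10^-16 / m^2)")
```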

The expression C/T is very convenient for describing the performance of a satellite system since it includes, in one number, the ERP of the satellite in the direction of the earth station under consideration, path loss, earth station antenna gain less feeder loss, and earth station noise temperature. It is commonly given in terms of dBW/°K, and will so be used later, but at this point in the analysis it is more convenient to leave C in terms of watts.

(7) Equation (1) above gives the C/T needed for the desired channel performance as a function of the number of channels and also of the modulation index. It is also necessary to specify the margin M for the system, such that at extreme satellite range, under maximum rain conditions, the system performance will be just at threshold, 10 dB carrier-to-noise ratio measured in the receiver band where the noise is that due to the earth station receiver plus that due to the up path. This latter contribution is taken as 1 dB. Under nonrain conditions, the carrier-to-noise ratio and the signal-to-noise ratio will both be greater by M (compared to rain conditions) and desired performance will be obtained. Thus, C = 1.26M x 10 kTB, where k is Boltzmann's constant, or

C/T = 1.7MB x 10^-16     (B is bandwidth in MHz)     (2)

(8) Equation (1) is the C/T needed for the desired test tone/noise ratio, and Eq. (2) is the C/T for the desired margin. These may be combined by using the relationship between RF bandwidth B, number of channels N, and modulation index m. For large numbers of 4 kHz frequency divided channels, the maximum modulating frequency (in megacycles) is approximately 4N x 10^-3, and the RF bandwidth B = 2 x 4N x 10^-3 (m + 1). For large m this is approximately

B = 8mN x 10^-3     (3)

(9) Combining Eqs. (1), (2), and (3) yields C/T = 1.13NM^(2/3) x 10^-17, or expressed in terms of decibels,

(C/T) in dBW/°K = −169.5 + 10 log N + (2/3)M_dB     (4)

Because Eq. (3) is an approximation, Eq. (4) is only approximate, but is used because it more clearly shows the relationship among the parameters than the exact expression, which can not be solved explicitly for


C/T. A decrease in the margin required (for example, due to a decrease in the rainfall rate) tends to decrease the required carrier power and therefore the output S/N ratio, but this (for a given output S/N ratio) requires increasing the modulation index and therefore the bandwidth and the carrier power. The net effect is that carrier power is related to the margin with a 2/3 factor. Thus, if the required margin could be reduced from 6 to 3 dB, the carrier power is reduced by 2 dB. For example, assume a synchronous satellite, with 4 GHz path attenuation of 197 dB between isotropic radiators, an 85 ft diameter earth station antenna of gain = 58 dB, and an earth station system temperature of 50°K; then the required satellite

ERP = [−13.5 + 10 log N + (2/3)M_dB] dBW     (5)
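Equation (5) is simply Eq. (4) translated through the link constants just assumed (197 dB path attenuation, 58 dB earth station antenna gain, 50°K system temperature). The Python sketch below makes that translation explicit; the channel counts chosen are merely illustrative, and because Eq. (4) is approximate the results can differ from the exact calculation by a fraction of a decibel.

```python
import math

def required_erp_dbw(n_channels, margin_db,
                     path_loss_db=197.0, antenna_gain_db=58.0, system_temp_k=50.0):
    """Satellite ERP from Eq. (4) plus the link constants of the text."""
    c_over_t_db = -169.5 + 10 * math.log10(n_channels) + (2.0 / 3.0) * margin_db
    # C (dBW) = C/T (dBW/K) + 10 log T; ERP = C + path loss - earth antenna gain.
    return c_over_t_db + 10 * math.log10(system_temp_k) + path_loss_db - antenna_gain_db

# The bracketed constant of Eq. (5): -169.5 + 10 log 50 + 197 - 58 = -13.5 dBW.
print(f"constant term = {-169.5 + 10 * math.log10(50.0) + 197.0 - 58.0:.1f} dBW")

for n in (700, 8350):                       # illustrative channel counts
    print(f"N = {n:5d}, M = 6 dB: ERP = {required_erp_dbw(n, 6.0):.1f} dBW")
# Eq. (4) is approximate; the exact calculation in the text gives 30.3 dBW for N = 8350.
```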

Considering the entire 500 MHz bandwidth allocated to satellite communications, and allowing 100 MHz for the total guard band to be used to separate the several repeaters, leaves 400 MHz of useful band, corresponding to 8350 channels (from Eq. (6) below with the above assumptions including a 6 dB margin). The total satellite ERP is then 30.3 dBW (1070 W), and with a net satellite antenna gain of 13 dB,† the combined repeater output powers become 53.5 W (say 12 repeaters of 4.5 W each for 700 channels/repeater). The limitation of 8350 channels is not a fundamental one. By decreasing the deviation of each carrier, and therefore the spectrum occupied, more channels can be obtained within the 400 MHz available, but at an increasingly high price in satellite ERP. For example, take the expression for m from the relation B = 2 x 4N x 10^-3 (m + 1), and substitute it in Eq. (1), yielding

C/T = 8N x 10^-16 / [(B x 10^3/8N) − 1]^2     (6)

and for B = 400 MHz,

C/T = 8N x 10^-16 / [(5 x 10^4/N) − 1]^2     (7)

† A satellite antenna beam illuminating the globe (approximately 20° x 20° beam width) would have a maximum gain of 18 dB at the beam center and 15 dB at beam edge. Taking the latter figure, and allowing 2 dB for satellite losses gives 13 dB net antenna gain.


This expression no longer has the margin constraint, but it does have a bandwidth constraint (400 MHz). The resulting margins will all be substantially larger than 6 dB, since increasing M forces the required carrier power C to increase in order to maintain the channel signal/noise ratio. The decrease in bandwidth and the increase in C both tend in the direction of increasing the margin. It is also pointed out that the same total satellite ERP is needed for a given total number of channels regardless of the number of repeaters in the satellite. This is seen from Eq. (6) as follows—consider the entire band B to be used for a single carrier modulated by N channels, amplified by a single repeater requiring a certain C/T ratio. If the band B now be split into, say, 10 equal parts, each with its own repeater, and each amplifying a carrier modulated by one-tenth N channels, the resulting C/T per repeater is one-tenth that for the first case, so that the total satellite ERP for all 10 repeaters is the same as before.

As a specific example of the channel capacities obtainable with higher satellite ERP than needed for 6 dB margin, Table I gives C/T (and satellite ERP when used with an 85 ft diameter earth station antenna and 50°K system temperature) for various numbers of one-way telephone channels, as well as the power per channel and bandwidth per channel.

TABLE I
CHANNEL CAPACITIES, SATELLITE POWER, AND BANDWIDTH FOR FDM/FM

N:                               8350          10,000        20,000        30,000        40,000
C/T (ratio):                     2.7 x 10^-13  5 x 10^-13    7.1 x 10^-12  5.4 x 10^-11  5.1 x 10^-10
C/T (dBW/°K):                    -125.7        -123          -111.5        -102.7        -92.9
Sat. ERP (dBW):                  30.3          33            44.5          53.3          63.1
Margin (dB):                     6             8.7           20.2          29            38.8
Sat. ERP (dBW)/chan.:            -8.9          -7            1.5           8.5           17.1
Sat. transmitter power (dBW) for
  sat. antenna gain = 13 dB
  (20° beam):                    17.3          20            31.5          40.3          50.1
  = 33 dB (2° beam):             -2.7          0             11.5          20.3          30.1
BW/chan. (kHz):                  60            50            25            16.7          12.5

Examination of Eq. (7) shows that the limiting value for N approaches 50,000. However, consideration of spectra overlap and repeater filter characteristics will reduce this number somewhat. Finally, note that these capacities are not the maximum obtainable from one satellite, but are based on that number of channels sharing a


common frequency band. If the satellite was to be provided with two directive arrays, aimed sufficiently apart so that their overlap was negligible (and this could be helped by each using an opposite sense of circular polarization), the same frequency band could be used for each antenna, thus doubling the number of channels possible. Further, if the area illuminated on earth by each were half that of the one beam satellite, the satellite transmitter power for each antenna could be halved, while still keeping the ERP per antenna and therefore channel capacity per antenna, as before. Twice the number of repeaters would be needed, but these would be half the power of the one beam satellite repeaters. This process could be continued for even more separate beams.
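The entries of Table I can be regenerated from Eq. (7) together with Eqs. (2) and (5). The Python sketch below assumes only the figures already stated (400 MHz of useful band out of the 500 MHz allocation, 197 dB path loss, a 58 dB earth station antenna, and a 50°K system temperature) and reproduces the table to within rounding.

```python
import math

LINK_DB = 197.0 - 58.0 + 10 * math.log10(50.0)   # path loss - antenna gain + 10 log T

print(" N       C/T        C/T(dBW/K)  ERP(dBW)  margin(dB)  ERP/chan  BW/chan(kHz)")
for n in (8350, 10_000, 20_000, 30_000, 40_000):
    c_over_t = 8 * n * 1e-16 / (5e4 / n - 1) ** 2                # Eq. (7), B = 400 MHz
    c_over_t_db = 10 * math.log10(c_over_t)
    margin_db = 10 * math.log10(c_over_t / (1.7 * 400 * 1e-16))  # inverting Eq. (2)
    erp_dbw = c_over_t_db + LINK_DB                              # as in Eq. (5)
    print(f"{n:6d}  {c_over_t:9.2e}  {c_over_t_db:9.1f}  {erp_dbw:8.1f}"
          f"  {margin_db:9.1f}  {erp_dbw - 10 * math.log10(n):8.1f}"
          f"  {500e3 / n:10.1f}")
# The transmitter-power rows of Table I follow by subtracting the 13 dB or 33 dB
# satellite antenna gain from the ERP column.
```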

C. Pulse Code Modulation/Phase Shift Keying (PCM/PSK)

In this calculation, it is assumed that the speech channel is sampled at an 8 kHz rate, and that each sample is represented by an 8-bit binary code, resulting in a bit rate of 64 kHz/channel and requiring an RF bandwidth of 64 kHz for phase shift keying with two phases. A practical threshold is taken as 10 dB carrier-to-noise ratio, measured in a band equal to the bit rate for two phase keying. For multiphase keying (4, 8, 16, and 32 discrete phases), the relative bandwidth compared to two phase transmission is reduced in the ratio of 1/2, 1/3, 1/4, and 1/5 respectively. The threshold carrier-to-noise ratio for an n phase system is computed on the basis that the carrier phase shift due to the peak noise voltage should be no more than ±(360/2n)°. Based on the above, the bandwidth, threshold carrier-to-noise ratio, and power needed relative to two-phase keying for the various multiphase systems are:

TABLE II

COMPARISON OF MULTIPHASE SYSTEMS

Number of phases:
Bandwidth per channel (kHz):
Threshold C/N (dB):
Power relative to two-phase:

It may be readily checked that a set of J = 4 parity checks orthogonal on the noise bit being decoded can be formed from the syndrome bits, with the property that, excluding that bit itself, bits from time units 0 through (x − 1) are checked by only the first of these checks, and bits from any other x consecutive time units are checked by at most two of these checks. Thus the code is an x-diffuse code. It corrects all bursts confined to x or fewer consecutive time units or any double-error pattern within its effective constraint length of n_E = 12 bits. The required guard space, g, between correctable bursts is g = 3x + 3 time units, since then bits from at most one burst can affect the n_E noise bits appearing in the orthogonal checks. It is interesting to note that n_E = 11 is the minimum possible value for a double-error-correcting R = 1/2 code (1, p. 36), so that this x-diffuse


code is nearly optimum in its random-error mode. Moreover, g = 3x time units is the minimum possible guard space (20) for an R = 1/2 code that corrects all bursts confined to x or fewer time units; so this diffuse code is again nearly optimal in its burst-correcting mode. One obtains the powerful practical advantage of combining both random and burst correcting capabilities without a significant sacrifice of optimality for either mode. Error control units employing such x-diffuse convolutional codes have been built by Codex Corporation and operated over HF-teletype and tropo-scatter-teletype links. The coded systems were found to provide reliable communications at information rates significantly greater than uncoded systems on the same channels, and to reduce drastically the periods of unacceptable teletype copy (21).

3. Diffusing and Interlacing

It should be pointed out that an x-diffuse code is essentially different from an interlaced code. An interlaced code is a convolutional code formed from a basic code by replacing D with D^x in the code-generating polynomials of the basic code (22). Interlacing amounts to using x different basic codes, one code operating on the bits from time units spaced apart by multiples of x. Thus any burst over x or fewer time units looks like an error pattern in a single time unit to the underlying basic codes. The double-error-correcting R = 1/2 code with n_E = 11, having G^(2)(D) = 1 ⊕ D^3 ⊕ D^4 ⊕ D^5, becomes, after interlacing, the R = 1/2 code having G^(2)(D) = 1 ⊕ D^3x ⊕ D^4x ⊕ D^5x. The interlaced code corrects all double random errors as well as all bursts over x or fewer time units, since any such burst looks like a burst confined to one time unit, and hence at most a double error (since n_0 = 2) to the underlying basic code. The guard space required by this interlaced code, however, is g = 5x time units or about 60% greater than that required by the x-diffuse code in the preceding section. This large increase in guard space compared to a diffuse code is typical of what one encounters when attempting to obtain both random and burst protection by interlacing a multiple-error-correcting basic code. It should be mentioned that interlacing is extremely effective when one is interested in pure burst correction (22). It is also possible to interlace multiple-error-correcting convolutional codes in such a way as to obtain multiple-burst-correcting codes, but we shall not pursue these matters here.
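The operation of interlacing, and the guard-space penalty it carries relative to diffusing, can be illustrated mechanically. A small Python sketch (polynomials are represented simply by their lists of exponents, a bookkeeping convention of this sketch rather than of the text):

```python
# Interlacing the double-error-correcting R = 1/2 code of the text:
# G(D) = 1 + D^3 + D^4 + D^5 becomes 1 + D^(3x) + D^(4x) + D^(5x).
def interlace(exponents, x):
    """Replace D by D^x in a code-generating polynomial given by its exponents."""
    return sorted(e * x for e in exponents)

base = [0, 3, 4, 5]          # exponents of 1 + D^3 + D^4 + D^5
for x in (5, 10, 20):
    print(f"x = {x:2d}: interlaced exponents {interlace(base, x)}, "
          f"guard space 5x = {5 * x:3d} vs. 3x + 3 = {3 * x + 3:3d} for the diffuse code")
```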


III. New Results for Block Codes

A. Nonorthogonalizability of the Golay Code

In earlier work (7, p. 109), we conjectured about the possibility that any binary block code could be completely orthogonalized using the generalized L-step orthogonalization procedure. Since then, we have been able to prove that the cyclic Golay (23, 12) code cannot be orthogonalized, i.e., that no set of d_min − 1 = 6 parity checks orthogonal on any noise bit, or sum of noise bits, can be formed. Russian coding theorists (23) have independently observed the same result. We shall not give the proof, which is much more involved than seems warranted for this chapter.

B. Self-Orthogonal Quasi-Cyclic Codes

1. Description of Codes

Consider converting a convolutional code into a block code by the following artifice: Restrict the polynomials I^(j)(D) in Eq. (4) to degree (M − 1), so that there will be a total of Mk_0 information bits in the block. Carry out the formation of the parity sequences as in Eq. (5), but reduce the degree of the right-hand side to (M − 1) or less by repeated application of the rule D^M = 1. The resulting block code still has rate R = k_0/n_0 and has a block constraint length of n = Mn_0 bits. Block codes of this type, though formulated in a slightly different way, were studied by Townsend and Weldon (24), who called them quasi-cyclic codes. The implementation of encoders and decoders for these codes is so similar to that for convolutional codes as given by Massey (1) that we shall not discuss that aspect of the codes here.

2. Self-Orthogonal Codes

Since the structure of quasi-cyclic codes is so closely related to that of convolutional codes, it is not surprising that results for the latter will have their counterparts for the former. In particular, almost all of Section II,B,3 applies with little change for quasi-cyclic codes. The set of differences for G^(j)(D), whose nonzero terms have degrees d_1 < d_2 < ... < d_W, is defined as the set of integers between 0 and M (exclusive) congruent modulo M to the integers in the set {d_i − d_j | i ≠ j}.


The set of differences will be called full if its W^2 − W integers are all distinct. Corresponding to Theorem 2, one obtains then the following theorem due to Townsend and Weldon (24):

THEOREM 5. A quasi-cyclic code with k_0 = 1 or k_0 = n_0 − 1 and block length n = Mn_0 is self-orthogonal if and only if the sets of differences for all of the G^(j)(D) are full and mutually disjoint.

Proof. The proof of Theorem 2 obtains with minor modification. Table I still gives the correct subscripts provided the residues modulo M of the entries are taken. Thus the only change in the proof is that all integers are replaced by their residues modulo M, i.e., by the integers in the set 0, 1, 2, . . . , M — 1 congruent to the given integers.

3. Code Construction

Perfect difference sets again provide a simple and direct means of synthesizing self-orthogonal codes. Indeed, by taking M = W^2 − W + 1 for a perfect difference set {d_1 = 0, d_2, . . . , d_W}, it follows from the definition of a perfect difference set that if we let the d_i's be the degrees in G^(j)(D), then the set of W^2 − W differences is full (and closely packed in the sense that it includes all the integers 1, 2, . . . , W^2 − W). The order of this difference set is W − 1. Since perfect difference sets exist whenever the order is the power of a prime, p^k (17), it follows immediately from Theorem 5 that:

COROLLARY. For every prime p and integer k, there exists an R = 1/2 self-orthogonal quasi-cyclic code with block length n = 2(p^2k + p^k + 1) and J = d_min − 1 = p^k + 1.

The last part of the corollary follows from noting that since G^(j)(D) has W = p^k + 1 terms, each information noise bit will be checked by exactly W syndrome bits, as is readily seen from Table I. Finally, J is always less by one than the minimum distance in any self-orthogonal code. It should also be clear that, in complete parallelism with Section II,B,3, perfect difference sets can be partitioned into n_0 − 1 subsets with the digits in these subsets then used to derive the code-generating polynomials of self-orthogonal quasi-cyclic codes with k_0 = 1 or k_0 = n_0 − 1.
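The condition of Theorem 5 is purely arithmetic and is easily tested by machine. The Python sketch below applies it to the perfect difference set {0, 1, 3} of order 2 (so W = 3 and M = W^2 − W + 1 = 7), which by the corollary yields an R = 1/2 self-orthogonal quasi-cyclic code of block length 14 with J = 3.

```python
def differences_mod(degrees, M):
    """List of differences, modulo M, of the nonzero-term degrees of G(D)."""
    return [(di - dj) % M for di in degrees for dj in degrees if di != dj]

def is_full(degrees, M):
    """'Full' means the W^2 - W differences are all distinct (Theorem 5)."""
    diffs = differences_mod(degrees, M)
    return len(diffs) == len(set(diffs))

taps = [0, 1, 3]                            # a perfect difference set of order 2
M = len(taps) * (len(taps) - 1) + 1         # W^2 - W + 1 = 7
print(sorted(differences_mod(taps, M)))     # [1, 2, 3, 4, 5, 6]: full, closely packed
print(is_full(taps, M))                     # True: self-orthogonal R = 1/2 quasi-cyclic
                                            # code with n = 2M = 14 and J = W = 3
```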


C. Codes from Finite Geometries

1. The Projective Plane PG(2, p^m)

A finite geometry is a collection of "points" which satisfy an axiom system similar in many respects to the axioms of ordinary geometry (25). In particular, two distinct points determine a line, i.e., if two lines in a finite geometry have two points in common, then they must be the same line. An extremely interesting approach to the construction of codes suitable for threshold decoding has been developed by Rudolph (26, 27) using the properties of finite geometries. In this chapter we will discuss only the codes that arise from the projective geometries PG(2, p^m). PG(2, p^m) is the geometry of the projective plane. The 2 denotes 2-space or plane geometry, and the p^m, where p is a prime and m any integer, denotes the fact that this geometry has a concrete representation in terms of the elements of GF(p^m), the finite field with p^m distinct elements. Points of PG(2, p^m) may be represented as nonzero 3-tuples (x_1, x_2, x_3) of elements of GF(p^m), with the convention that (x_1, x_2, x_3) and (kx_1, kx_2, kx_3) denote the same point for every nonzero element k of GF(p^m). Since there are p^3m − 1 distinct nonzero 3-tuples, and since each point has p^m − 1 representations, it follows that there are exactly (p^3m − 1)/(p^m − 1) = 1 + p^m + p^2m distinct points in PG(2, p^m). The lines of PG(2, p^m) are similarly represented by nonzero 3-tuples (u_1, u_2, u_3) of the elements of GF(p^m) with the same convention. Thus, there are also p^2m + p^m + 1 distinct lines in PG(2, p^m). The points on a given line (u_1, u_2, u_3) are the set of points (x_1, x_2, x_3) for which u_1x_1 + u_2x_2 + u_3x_3 = 0. Since this equation has exactly p^2m − 1 nonzero solutions and since sets of p^m − 1 solutions correspond to the same point, it follows that there are exactly (p^2m − 1)/(p^m − 1) = p^m + 1 points on each line of PG(2, p^m).
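These counting properties are easily verified for the smallest case, p^m = 2, where the plane has seven points and seven lines. A Python sketch assuming nothing beyond the definitions just given:

```python
from itertools import product

# PG(2, 2): points and lines are the nonzero binary 3-tuples; a point lies on a
# line when u1*x1 + u2*x2 + u3*x3 vanishes over GF(2).
tuples = [t for t in product((0, 1), repeat=3) if any(t)]
points, lines = tuples, tuples

def on_line(x, u):
    return sum(a * b for a, b in zip(x, u)) % 2 == 0

print(len(points), len(lines))                                 # 7 and 7 = p^2m + p^m + 1
print({sum(on_line(x, u) for x in points) for u in lines})     # {3} = p^m + 1 points/line
print({len([x for x in points if on_line(x, u) and on_line(x, v)])
       for u in lines for v in lines if u != v})               # {1}: two lines, one point
```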

2. The Related Codes

Consider next forming the (p^2m + p^m + 1) by (p^2m + p^m + 1) incidence matrix for PG(2, p^m) by associating columns with distinct lines, and rows with distinct points, and making the i, j entry 1 or 0 according to whether the ith point is or is not contained in the jth line. Now consider that the rows of this matrix are the coefficients of the noise bits in the


parity checks for a block code of length n = p^2m + p^m + 1. Since each line contains p^m + 1 points, p^m + 1 of these parity checks will check any prescribed noise bit, say e_1. But these same J = p^m + 1 parity checks can have no other noise bit checked by more than one member, since distinct lines are forbidden to have two points in common. Thus, from this set of parity checks, one can obtain a set of J = p^m + 1 parity checks orthogonal on every noise bit. The redundancy, or the number of parity bits in the code, is just the number of linearly independent rows of this matrix and was conjectured by Rudolph (27) to be 3^m + 1 for the case p = 2. The correctness of this conjecture was recently verified by Graham and MacWilliams (27a). Finally, Singer (17) has shown that such an incidence matrix always exists in cyclic form, i.e., that the lines can be so ordered that the cyclic shift of any row is another row of the incidence matrix. Thus the corresponding code is just a cyclic code (28). We summarize these results for p = 2 with the following theorem due to Rudolph (27).

THEOREM 6. For every prime p and integer m, there exists a cyclic block code with block length n = p^2m + p^m + 1 for which J = p^m + 1 parity checks orthogonal on each noise bit can be formed. For p = 2, the redundancy of the code is 3^m + 1 bits.
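For p = 2 the parameters promised by Theorem 6 are quickly tabulated; the m = 2 and m = 3 rows of the Python sketch below recover the (21, 11) and (73, 45) codes discussed next.

```python
# Parameters of the PG(2, 2^m) codes of Theorem 6 for p = 2.
for m in range(2, 5):
    n = 4**m + 2**m + 1          # block length p^2m + p^m + 1
    J = 2**m + 1                 # orthogonal parity checks on each noise bit
    redundancy = 3**m + 1        # parity bits (conjectured by Rudolph, proved in (27a))
    print(f"m = {m}: n = {n:4d}, k = {n - redundancy:4d}, J = {J:2d}")
```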

For p = 2, the codes given by Theorem 6 are the (21, 11) and (73, 45) cyclic codes studied earlier by Prange (7) for m = 2 and 3 respectively. The Hamming (7, 4) code is obtained for m = 1. All the remaining codes appear to be new. Rudolph has generated other classes of cyclic codes from finite geometries other than PG(2, p^m). In particular, the codes derived from PG(k, 2) turn out to be just the maximal-length codes (26). It should be pointed out that Singer's proof (17) that the incidence matrix described above can always be chosen cyclic is just his proof of the existence of perfect difference sets of order p^m. There is indeed a very close connection between perfect difference sets and the codes described in this section, but the use of the finite geometries gives a physical insight that would be hard to obtain solely from the number-theoretic properties of perfect difference sets.

IV. Conclusions

The abundance of material available for this chapter shows that threshold decoding remains an active area of research. It seems highly


significant to us that most of the new codes found for implementation by threshold decoders have their origins in number theory rather than in algebra. Further use of number theory in this way promises to provide additional classes of codes suitable for threshold decoding. The results in Section II,C,2 are a useful first step in laying to rest the bogey of error propagation in convolutional coding. Further results along this line can be expected in the future. It seems safe to say that threshold decoding has established itself as a technique of practical importance and of an increasing theoretical importance as well.

ACKNOWLEDGMENT

I am grateful to Codex Corporation, Watertown, Massachusetts, for their release of the material in Section II,D.

REFERENCES

1. J. L. Massey, "Threshold Decoding." M.I.T. Press, Cambridge, Massachusetts, 1963.
2. I. S. Reed, IRE Trans. IT-4, 38-49 (1954).
3. D. E. Muller, IRE Trans. EC-3, 6-12 (1954).
4. R. B. Yale, Error correcting codes and linear recurring sequences. Rept. 34-77. M.I.T. Lincoln Lab., Lexington, Massachusetts, 1958.
5. N. Zierler, On a variation of the first order Reed-Muller codes. Rept. 95. M.I.T. Lincoln Lab., Lexington, Massachusetts, 1958 (originally published as Sc.D. thesis, Dept. of Elec. Eng., M.I.T., Cambridge, Massachusetts, 1960).
6. J. H. Green and R. L. San Soucie, Proc. IRE 46, 1741-1744 (1958).
7. E. Prange, The use of coset equivalence in the analysis and decoding of group codes. AFCRC Rept. 59-164, Bedford, Massachusetts, 1959.
8. M. Mitchell, R. Burton, C. Hackett, and R. Schwartz, Coding and operations research. Rept. on Contract AF19(604)-6183. General Electric, Oklahoma City, Oklahoma, 1961.
9. R. G. Gallager, "Low-Density Parity-Check Codes." M.I.T. Press, Cambridge, Massachusetts, 1963.
10. J. L. Massey, Threshold decoding. Ph.D. Thesis, M.I.T., Cambridge, Massachusetts, 1962.
11. J. R. Macy, Theory of serial codes. Ph.D. Thesis, Stevens Inst. of Technol., Hoboken, New Jersey, 1963.
12. D. W. Hagelbarger, Recurrent codes for the binary symmetric channel. Lecture notes, Summer Conf. Theor. Codes, Univ. of Michigan, Ann Arbor, 1962.
13. A. D. Wyner and R. B. Ash, IEEE Trans. IT-9, 143-156 (1963).
14. J. P. Robinson, Error propagation and definite decoding of recurrent codes. Rept. 47. Dig. System Lab., Princeton, New Jersey, 1965.


15. J. L. Massey and R. W. Liu, IEEE Trans. IT-10, 248-250 (1964).
16. J. P. Robinson, Self orthogonal codes. Rept. 43. Dig. System Lab., Princeton, New Jersey, 1965.
17. J. Singer, Trans. Am. Math. Soc. 43, 377-385 (1938).
18. J. L. Massey, IEEE Trans. IT-12, 132-134 (1966).
18a. D. D. Sullivan, Private communication (1966).
19. A. Kohlenberg, Private communication (1963).
20. R. G. Gallager, IEEE Trans. IT-12, 273 (1966).
21. North Atlantic teletype engineering study, final Rept. I.T.T. Commun. Systems, Paramus, New Jersey, 1964.
22. J. L. Massey, IEEE Trans. IT-11, 416-421 (1965).
23. M. S. Pinsker, Private communication (1966).
24. R. L. Townsend and E. J. Weldon, IEEE Trans. IT-12, 278 (1966).
25. R. D. Carmichael, "Introduction to the Theory of Groups of Finite Order," Chapter XI. Dover, New York, 1956.
26. L. D. Rudolph, Cyclic codes and geometric configurations. Rept. 63MBD12. General Electric, Oklahoma City, Oklahoma, 1963.
27. L. D. Rudolph, Threshold decoding and incomplete block designs. Rept. 63MCD13. General Electric, Oklahoma City, Oklahoma, 1963.
27a. R. L. Graham and F. J. MacWilliams, B.S.T.J. 45, 1057-1070 (1966).
28. W. W. Peterson, "Error-Correcting Codes." M.I.T. Press and Wiley, New York, 1961.

Coding and Synchronization—The Signal Design Problem

J. J. STIFFLER†

Jet Propulsion Laboratory, Pasadena, California

I. Introduction  117
II. The Phase-Locked Loop Approach  118
III. Resolving the Remaining Ambiguities  127
  A. Improving the Search Algorithm  130
  B. Selecting the Signal  133
IV. Rapid Acquisition Sequences  136
V. Summary  141
Appendix A. The Mean-Square Optimum Cross-Correlation Function  143
Appendix B. On the Optimality of the Square-Wave Correlation Function for the First-Order Loop  146
References  148

† Present address: Raytheon Company, Sudbury, Massachusetts.

I. Introduction

In digital communication systems in general, and coded communication systems in particular, timing is of the essence. In such systems, the information is represented by a sequence of digits or symbols, selected by the data source from a finite symbol set sometimes called an alphabet. In coded systems not all possible concatenations of symbols can be transmitted; rather, the symbols are grouped into code words, only certain combinations of symbols comprising legitimate words. Generally, these words are in turn grouped into frames, the significance of a word depending upon its position in the frame. In order to extract the desired information from the received signal, the receiver must know when each new symbol, when each new word, and when each new frame begins. The establishing of this timing or synchronization information is the subject to be investigated in this chapter. Because a general discussion of the synchronization problem, as just outlined, is much too ambitious a project to be attempted in a single chapter, it is necessary to impose some restrictions on the particular aspect of the problem to be investigated here. Specifically, we shall stipulate that the needed synchronization is to be obtained from a


separate communication channel to be used solely for this purpose. The task is to determine the form of the signal to be transmitted over this channel. The goal is to choose the signal so as to minimize the transmitter power required to provide the desired synchronization accuracy in the presence of additive Gaussian noise. Let T indicate the length in seconds of the longest timing uncertainty to be resolved; e.g., T might represent the word or frame period. Similarly, let ΔT be a measure of the maximum acceptable timing uncertainty; e.g., we might insist that the standard deviation of the timing error not exceed ΔT seconds. The problem, then, is to select some sync signal capable of reducing the timing uncertainty by a factor of T/ΔT = N, and to attempt to minimize the signal power needed to accomplish this task. It is apparent that the amount of power needed will be an increasing function of N. Thus, when N is large, the selection of an efficient synchronizing signal becomes especially important. And as the number of symbols included in each code word is increased (in order to obtain more efficient codes), the value of N generally increases, the original timing ambiguity T presumably increasing with the word length, while the maximum acceptable ambiguity ΔT is kept small relative to the symbol period. The problem to which this chapter is addressed, therefore, is of particular relevance when the information is to be encoded into relatively long code words.

II. The Phase-Locked Loop Approach

Perhaps the most obvious method for attaining the goal outlined in the previous section is to transmit a periodic signal x(t) having a minimum period T. The received signal is then y(t) = Ax(t + τ_0) + n(t), with n(t) denoting the additive noise. The problem is to determine τ_0 to within the desired ΔT second accuracy. If we assume the noise is Gaussian, the optimum (maximum-likelihood) receiver is well known, and if in addition, the noise is white, the receiver assumes a particularly simple form. It is only necessary to evaluate the cross-correlation coefficients

c(τ) = (1/mT) ∫_0^mT y(t)x(t + τ) dt = (1/mT) Σ_{i=0}^{m-1} ∫_{iT}^{(i+1)T} y(t)x(t + τ) dt     (1)
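A brute-force digital realization of Eq. (1), namely a bank of correlators spaced ΔT apart with the largest output taken as the estimate, can be pictured with the toy simulation below (Python with NumPy); the square-wave signal, the signal-to-noise ratio, and the grid size are arbitrary illustrative choices rather than values taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy maximum-likelihood timing search: correlate the received waveform against
# shifted replicas and pick the largest coefficient, as in Eq. (1).
T, fs, m = 1.0, 1000, 4                   # period (s), samples/s, periods integrated
t = np.arange(int(m * T * fs)) / fs
x = lambda tt: np.sign(np.sin(2 * np.pi * tt / T))     # unit-power periodic signal

tau0 = 0.317                                            # true (unknown) epoch
y = 0.5 * x(t + tau0) + rng.normal(0.0, 1.0, t.size)    # received: A x(t + tau0) + noise

N = 100                                                 # resolve T into N cells of T/N
taus = np.arange(N) * T / N
c = [np.mean(y * x(t + tau)) for tau in taus]           # c(tau) over an mT-second interval
tau_hat = taus[int(np.argmax(c))]
print(f"true tau0 = {tau0:.3f} s, estimate = {tau_hat:.3f} s (grid step {T / N:.3f} s)")
```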


The largest of these coefficients then corresponds to the maximum-likelihood estimate of τ (1). An mT second integration interval has been indicated. In principle, since the signal is transmitted repetitively, m can be any integer, but in practice m will be restricted by other factors to be taken into consideration. (We shall elaborate upon this shortly.) The parameter τ can generally take on any value in the range 0 ≤ τ < T. This suggests an infinity of correlators of form (1). Since it is sufficient to reduce the timing ambiguity to ΔT seconds, N such correlators, determining the coefficients c(0), c(ΔT), c(2ΔT), . . . , c((N − 1)ΔT) (and selecting the largest of these) might be adequate. But since N will usually be quite large, this approach has definite practical limitations too. An even more serious difficulty with this method is caused by the inevitable time dependence of τ_0, the phase of the received signal. The period of this signal will never exactly equal that of its locally generated replica. Doppler variations, for example, and random oscillator drifts will cause τ_0 to vary with time. The receiver must not only determine the current value of τ_0, but must attempt to track it as it varies with time. This is the reason for the limitation on the maximum correlation interval referred to earlier. One way of overcoming at least some of these difficulties is suggested by the following observation. If c(τ) is to attain its maximum value for τ = τ_0, then dc(τ)/dτ should be zero at this point. We therefore find, as a condition on the optimum estimator of τ_0,

[∂c(τ)/∂τ]_{τ = τ_0} = 0     (2)

Further, since τ_0 corresponds to a maximum of the function c(τ), ∂c(τ)/∂τ will be positive for τ < τ_0 and negative for τ > τ_0. Thus, dc(τ)/dτ could be used as an error signal indicating whether the current phase of the local signal leads or lags the phase of the received signal. This argument, in turn, suggests a tracking loop as shown in Fig. 1. The phase τ of the signal generator output is changed in accordance with the sign of the integrator output, increased when the integrator output is positive and decreased when it is negative. Presumably τ will thereby converge to and track the phase of the received signal. This device is highly reminiscent of the well-known phase-locked loop. In fact, essentially the same device can be realized by replacing the integrator/phase-shifter combination in the Fig. 1 block diagram by a voltage controlled oscillator (VCO). The output of this VCO can then be used as a "clock" to control the period of the signal x′(t). Since


forcing the phase of a periodic signal to be directly proportional to the integral of a time function is equivalent to forcing the frequency of this same signal to be directly proportional to the function itself, this alteration does, indeed, leave the device effectively unchanged.
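The sign-driven adjustment just described (Fig. 1) can be imitated in discrete time. The following simulation (Python with NumPy) is only a caricature of the analog loop: the error signal is formed by correlating the received waveform with the derivative of the local replica, and the step size, integration length, and signal-to-noise ratio are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Caricature of the Fig. 1 loop: estimate d c(tau)/d tau by correlating the received
# waveform with x'(t + tau_hat), then step the local phase by the sign of the result.
T, fs, MT = 1.0, 2000, 2.0                 # period, sample rate, integration time (s)
t = np.arange(int(MT * fs)) / fs
x  = lambda tt: np.sqrt(2) * np.sin(2 * np.pi * tt / T)           # unit-power signal
dx = lambda tt: np.sqrt(2) * (2 * np.pi / T) * np.cos(2 * np.pi * tt / T)

tau0, tau_hat, step = 0.23, 0.0, 0.01      # true epoch, initial guess, step per update
for _ in range(60):
    y = 0.3 * x(t + tau0) + rng.normal(0.0, 1.0, t.size)          # fresh MT-second record
    error = np.mean(y * dx(t + tau_hat))                          # ~ d c(tau)/d tau
    tau_hat = (tau_hat + step * np.sign(error)) % T
print(f"true tau0 = {tau0:.3f}, tracked estimate = {tau_hat:.3f}")
```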

FIG. 1. Signal tracking loop.

The preceding paragraphs lead, then, to the slightly generalized version of the phase-locked loop shown in Fig. 2. Actually, the configuration arrived at was a first-order loop (in which the transfer function H(jω) of the loop filter is a constant, independent of ω), but presumably the loop performance could be improved, at least for some input phase statistics, by incorporating other loop filters. Nevertheless, for reasons which will become apparent shortly, consideration here will be limited to first-order loops.

FIG. 2. Phase-locked loop.


The output of the signal generator in Fig. 2 is indicated as z(t + τ) rather than x′(t + τ). Since we have not precluded the possibility that z(t) = x′(t), this implies no additional limitation. Introducing this flexibility allows us to pose the signal design problem as it now stands in somewhat greater generality. Specifically, we would like to select the signals x(t) and z(t) so as to minimize the phase-locked loop tracking error. We require both x(t) and z(t) to have unity power:

(1/T) ∫_0^T x^2(t) dt = (1/T) ∫_0^T z^2(t) dt = 1     (3)

(This represents no loss of generality, since the received signal is Ax(t + τ_0) with A an arbitrary constant, and since multiplying z(t) by a constant is equivalent to increasing the filter gain, which has not been restricted, by the same constant.) The phase-locked loop, as just argued, is in effect a device for determining the cross-correlation between the received and locally generated signals, and adjusting the phase of the local signal accordingly. Consequently, the received signal affects the phase of the local signal solely through the cross-correlation function

ρ_xz(τ) = (1/T) ∫_0^T x(t)z(t + τ) dt     (4)

The ability of the loop to track the phase of the received signal is clearly dependent upon its ability to determine the sign of this correlation function even in the presence of noise. This strongly suggests that ρ_xz(τ) should be maximally positive for all τ in the range −T/2 < τ < T/2. Since |ρ_xz(τ)| ≤ 1 for all τ, the "optimum" correlation function is apparently a square wave. This contention is indeed proved to be true, in Appendix B, at least insofar as the tracking error in a first-order loop due to the additive noise is concerned. It is well known, but for our purposes unfortunate, that a square wave cannot be realized as a cross-correlation function. One might attempt to determine two functions x(t) and z(t), subject to the constraints (3), so as to best approximate, in some sense, a square-wave correlation function. The best approximation in the mean-squared sense, for example, is derived in Appendix A. The result is only marginally different from a sinusoidal correlation function; the performance of a phase-locked loop using such a correlation function is accordingly nearly identical to that of an ordinary phase-locked loop.


The approach to be taken here, instead, is to evaluate the performance of the "ideal," but physically unrealizable, loop operating on the basis of a square-wave correlation function, and then to attempt to duplicate this performance with a realizable loop. The most extensive analyses of phase-locked loops have been made on the basis of linear approximations to the differential equation governing the loop dynamics [see, for example, Jaffe and Rechtin (2)]. This is possible if the correlation function ρ_xz(τ) can be approximated by a linear function of τ in the region τ = 0, an obvious impossibility when ρ_xz(τ) is a square wave. Exact analyses have been achieved only for the first-order loop (3, 4) and for the loop having the filter H(jω) = K/(a + jω) [see (4)]. It is for this reason that the following discussion is limited to first-order loops. The phase error φ (φ = 2πτ/T) caused by the additive noise in a first-order loop is characterized by the density function (3)

p(φ) = C exp[r g(φ)]     (5)

where g(φ) is determined by the correlation function ρ(τ), C is the constant which normalizes p(φ) to unit area over the interval |φ| ≤ π, and r is proportional to P_s/(N_0 B_L),

and where P_s denotes the power in the received signal (P_s = A^2), N_0 the single-sided (white, Gaussian) noise spectral density, B_L = AK/4 the loop-noise-bandwidth, K the VCO gain constant, and ρ(τ) = ρ_xz(τ) is as previously defined. The loop bandwidth, of course, is chosen to effect a compromise between the various sources of phase error. In particular, if B_L were too small, the loop would be sluggish and unable to track possible random or transient phase drifts in the transmitted signal. Conversely, if B_L were too large, the phase error caused by the additive noise would be excessive. If ρ(τ) were a square wave, g(φ) would have the form

g(φ) = π − |φ|,     |φ| ≤ π     (6)

and

p(φ) = {r / [2(e^{rπ} − 1)]} exp[r(π − |φ|)],     |φ| ≤ π     (7)

and the mean-squared phase error in radians squared is

∫_{−π}^{π} φ^2 p(φ) dφ = 2/r^2 − (π^2 + 2π/r)/(e^{rπ} − 1)     (8)
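Equation (8) is easily evaluated numerically; the Python sketch below tabulates it for a few values of r, together with the 2/r^2 term that dominates when r is large. The choice of r values is arbitrary.

```python
import math

# Numerical evaluation of Eq. (8), the mean-squared phase error of the
# square-wave-correlation loop, compared with its large-r asymptote 2/r^2.
def mean_squared_phase_error(r):
    return 2.0 / r**2 - (math.pi**2 + 2.0 * math.pi / r) / (math.exp(r * math.pi) - 1.0)

for r in (1.0, 2.0, 5.0, 10.0):
    print(f"r = {r:5.1f}: sigma^2 = {mean_squared_phase_error(r):.4e} rad^2, "
          f"2/r^2 = {2.0 / r**2:.4e}")
```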


Now, for purposes of comparison, consider the phase-error variance when the correlation function is a triangular wave rather than a square wave: