Sir Maurice Kendall and J Keith Ord
Time Series Third Edition
A Charles Griffin title
OXFORD UNIVERSITY PRESS
New York
© 1990 Maurice Kendall and J. Keith Ord
First published in Great Britain 1973; second edition 1976; third edition 1990
Published in the USA by Oxford University Press 200 Madison Avenue, New York, NY 10016
Library of Congress Cataloguing-in-Publication Data
Kendall, Maurice G. (Maurice George), 1907-
Time series / Sir Maurice Kendall and J. Keith Ord. 3rd ed.
'A Charles Griffin title.'
Includes bibliographical references.
ISBN 0-19-520706-8
1. Time series analysis. I. Ord, J. K. II. Title.
QA280.K44 1990
Printed in Great Britain
Contents
Preface
1. General ideas
   Discrete time series
   Calendar problems
   The length of a time series
   Some examples of time series
   The objectives of time series analysis
   Decomposition
   Notation
   Plan of the book
   Bibliography
2. Tests of randomness
   Turning points
   Phase length
   Tests for trend
   Testing trend in a seasonal series
   Testing for seasonality
   Exercises
3. Trend
   Moving averages
   End-effects
   Centered averages
   Differencing
   Appendix 3A: properties of the ∇ and B operators
   Exercises
4. Seasonality
   Types of model
   The X-11 method
   Exercises
5. Stationary series
   Stationarity
   Variance stabilization
   The autocorrelation function
   The partial autocorrelation function
   Autoregressive processes
   The Markov scheme
   The Yule scheme
   Yule-Walker equations
   General autoregressive schemes
   Moving average schemes
   Duality between AR and MA schemes
   Autocorrelation generating function
   Mixed ARMA schemes
   Seasonal series
   Mixed seasonal models
   Exercises
   Appendix 5A: The geometric series
   Appendix 5B: Solution of difference equations

6. Serial correlations and model identification
   Definition
   Central limit theorem
   Exact results
   Partial serial correlations
   Model identification
   Detecting nonstationarity
   Data analysis
   Seasonal series
   Exercises

7. Estimation and model checking
   Fitting autoregressions
   Fitting MA schemes
   Fitting ARIMA schemes
   Some examples
   Model checking
   The Box-Pierce-Ljung statistic
   Stationarity and invertibility check
   Goodness-of-fit
   Updating a model
   Seasonal models
   Automatic model selection
   Information criteria
   Autobox
   Exercises

8. Forecasting
   Introduction
   Forecast accuracy
   Best linear predictors
   Nonstationary processes
   Seasonal models
   Other forecasting procedures
   Moving averages
   Exponential smoothing
   Discounted least squares
   Holt's method
   The Holt-Winters seasonal model
   Harrison's seasonal model
   Evaluating forecasting methods
   Empirical comparisons
   Exercises
9. State-space models
   Introduction
   The state-space formulation
   Recursive updating
   Properties of the state-space approach
   Estimation of ARIMA models
   Structural models
   Structural or ARIMA modelling?
   Exercises
10. Spectrum analysis
   Introduction
   Fourier representation
   The spectrum
   The Nyquist frequency
   Aliasing
   Sampling properties
   Examples of power spectra
   Smoothing the estimates
   Side-bands
   Echo effects
   Fast Fourier transform
   Filtering and prewhitening
   Complex demodulation
   Seasonality and harmonic components
   Evolutionary spectra
   Inverse autocorrelations
   Forecasting
   Further reading
   Exercises
11. Transfer functions: models and identification
   Cross-correlations
   Cross-spectra
   Transfer functions
   Form of the transfer function
   Linear systems
   Cars and the FT index
   Several explanatory variables
   Seasonal models
   Comparison with other approaches
   Exercises
12. Transfer functions: estimation, diagnostics and forecasting
   Introduction
   Estimation procedures
   Diagnostic procedures
   Checking the transfer function
   Checking the error structure
   Cars and the FT index
   Forecasting
   Prediction intervals
   Time series regression
   Automatic transfer function modelling
   Exercises
13. Intervention analysis and structural modelling
   Introduction
   Intervention analysis
   Model identification
   Multiple effects
   Estimation
   The airline example
   Automatic intervention detection
   Structural models
   Calendar adjustments
   Exercises
14. Multivariate series
   Introduction
   Autoregressive models
   VMA and VARMA models
   Stationarity and invertibility
   Autocorrelations
   Model identification
   Estimation
   Forecasting
   Exercises
15. Other recent developments
   Introduction
   Seasonal adjustment
   Missing values and unequal spacing
   Fractional differencing
   Robust estimation
   Nonlinear models
   Bispectra
   Bilinear models
   Threshold autoregression
   Random coefficients
   Transformations and growth curves
   Multidimensional processes
Appendix A: Data sets and references
Appendix B:
Appendix C: Weights for fitting polynomial trends
Appendix D: Computer programs
References
Author Index
Subject Index
Preface to the Third Edition
The dramatic advances in computing power made over the last twenty years have combined with theoretical progress to make modern time series analysis a very different subject (as symbolised by dropping the hyphen used in earlier editions). Thus, the third edition has turned out to be quite distinct from its predecessors, apart from the classical elements retained in the opening chapters. Nevertheless, the book's role as an applied companion volume to Kendall's Advanced Theory of Statistics is preserved, and cross-references are given where necessary. The notation is generally similar, but some differences will be noticed, such as the change of signs on the moving average coefficients.

The central part of the book develops univariate models in the time domain and includes an extensive discussion on forecasting. Time spent on leave at the London School of Economics led to interesting discussions on structural modelling with Andrew Harvey and others. As a result, I find myself turning increasingly to the structural approach: a development of which I think MGK would have approved. The chapter on the frequency domain has been extended and updated; this is self-contained and could be read straight after Chapter 5 or even omitted by those seeking only a discussion in the time domain. Later chapters cover the extension to models containing explanatory variables and the multivariate case. The book ends with a brief review of other topics.

Exercises have been added to all but the first and last chapters to make the book more useful as a course text, as well as to help the reader's understanding of the material. A data appendix is provided for those wishing to try model building for themselves, surely the ultimate path to understanding. Another appendix gives a brief guide to computer programs; the different formats of some of the diagrams reflect the use of these several packages in performing the analyses.
Acknowledgements I should like to thank Ildiko Schall for typing and retyping the manuscript. Also, I should like to thank David Reilly for his permission to reproduce
Figure 7.7. Finally, I should like to thank the London School of Economics for their hospitality during my sabbatical leave and several colleagues there, especially Andrew Harvey, for interesting discussions on time series.

Keith Ord
State College, Pennsylvania, USA
New Year's Eve, 1989
1 General ideas
1.1 Time is perhaps the most mysterious concept in a mysterious universe. But its esoteric nature, fortunately, will not concern us in this book. For us, time is what it was to Isaac Newton, a smoothly flowing stream bearing the phenomenal world along at a uniform pace. We can delimit points of time with ease and measure the intervals between them with great accuracy. In fact, although errors and random perturbations are frequent and important for many of the variables with which we shall be concerned, only exceptionally shall we have to consider errors of measurement in time itself. Such difficulties as arise in measuring time intervals are mostly man-made (e.g. the way in which the Christian world fixes Christmas Day but allows Easter Sunday to vary over wide limits).

1.2 From the earliest times man has measured the passage of time with candles, clepsydras or clocks, has constructed calendars, sometimes with remarkable accuracy, and has recorded the progress of his race in the form of annals. The study of time series as a science in itself, however, is of quite recent origin. Recording events on a chart whose horizontal axis is marked with equal intervals to represent equal spaces of time must have occurred a thousand years ago; for example, the early monkish chants recorded on the eleven-line musical stave are a form of time series. In this connection Fig. 1.1 (from Funkhauser, 1936) is of some interest as the earliest diagram known in the Western world containing the essential concepts of a time graph. It dates from the tenth, possibly eleventh, century, and forms part of a manuscript consisting of a commentary of Macrobius on Cicero's In Somnium Scipionis. The graph was apparently meant to represent a plot of the inclinations of the planetary orbits as a function of time. The zone of the zodiac is given on a plane with the horizontal (time) axis divided into thirty parts and the ordinate representing the width of the zodiacal belt. The precise astronomical significance need not detain us; the point is that, even at this early stage, the time abscissa and the variable ordinate were in use, albeit in a crude and limited way.

1.3 Notwithstanding the invention of coordinate geometry by Descartes, the pictorial representation of time series was a late development. As late as 1879,
Stanley Jevons, whose book The Principles of Science was by no means intended for schoolboys, felt it necessary to devote some space to the use of graph paper. Possibly the first (and certainly one of the earliest) writers to display time charts in the modern way was William Playfair, one of his diagrams being reproduced in Fig. 1.2; the diagram was published in 1821. Playfair, the brother of the mathematician known to geometers as the author of Playfair's axiom on parallel lines, made a number of claims to priority in diagrammatic presentation which, whether justified or not, at least demonstrated the general lack of awareness of such procedures.

1.4 In the nineteenth century, theoretical statistics was not the unified subject it has since become. Work in the physical sciences was largely independent of work in economics or sociology, and at that time the ideas of physics were entirely deterministic; that is, a phenomenon tracked through time was imagined as behaving completely under deterministic laws. Any imperfection, any failure of theory to correspond with fact, was either dealt with by modifying the theory in a deterministic direction (as, for example, in the discovery of Neptune) or attributed to errors of observation. During the latter part of the nineteenth century, attempts were made to apply the methods which had been so successful in the physical sciences to the biological and behavioural sciences. The deterministic approach was adopted, together with the rest of the mathematical apparatus which had been developed. At that point the modern theory of statistics began with the realization that, although individuals might not behave deterministically, aggregates of individuals were themselves subject to laws which could often be summarized in fairly simple mathematical terms. For a historical account of these developments see Stigler (1986).
Fig. 1.2 Chart combining a graph and a histogram: from Playfair's A letter on our Agricultural Distress (1821)
1.5 Time series, however, resisted this change in viewpoint longer than any other branch of statistics. Until 1925 or thereabouts, a time series was regarded as being generated deterministically; the evident departures from trends, cycles or other systematic patterns of behaviour that were observed in Nature were regarded as 'errors' analogous to errors of observation. They occurred all too frequently in some fields, but were regarded in much the same way as an engineer regards 'noise', as a fortuitous series of disturbances on a systematic pattern. In particular, fluctuating phenomena such as the trade 'cycles' of the nineteenth century were subjected to Fourier analysis as if they were generated by a number of harmonic terms. The failure of such models to account for much of the observed variation in such things as trade, population and epidemics, though disappointing, did not deter the search for underlying cyclical movements of a strictly harmonic kind. Indeed, the belief in their existence is not yet dead.

1.6 In 1927, Udny Yule (see his Statistical Papers, 1971) broke new ground with a seminal idea that underlies much of the subsequent work in time series analysis. Working on sunspot numbers, which obviously fluctuate in a manner that cannot be due entirely to chance, Yule was struck by the irregularities in the series, both in amplitude and in the distances between successive peaks and troughs. The illustration that he used to explain his fresh approach is classical: if we have a rigid pendulum swinging under gravity through a small arc, its motion is well known to be harmonic, that is to say, it can be represented by a sine or cosine wave, and the amplitudes are constant, as are the periods of the swing. But if a small boy now pelts the pendulum irregularly with peas, the motion is disturbed. The pendulum will still swing, but with irregular amplitudes and irregular intervals. Instead of leading to
behaviour in which any difference between theory and observation is attributable to an evanescent error, the peas provide a series of shocks which are incorporated into the future motion of the system. This concept leads us to the theory of stochastic processes, of which the theory of stochastic time series is an important part. Its usefulness will be illustrated frequently in the course of this book. Stochastic processes also encompass the study of the distribution of times between events, such as the flow of customers joining a queue. This aspect of the subject will not be discussed further here, and the interested reader is referred to Cox and Miller (1968).

1.7 In contrast to other areas of statistics, the characteristic feature of time series analysis is that the observations occur in temporal order, which is not quite as trite as it sounds. The implication is that we shall, among other things, be interested in the relationship between values from one term to the next; that is, in the serial correlations along the series. When we come to consider several series, it becomes necessary to consider not only correlations between series, but also the serial correlations within each series. Furthermore, the extent to which one series leads, or lags, another becomes an important part of statistical modelling.

1.8 The study of relative position in time leads naturally to the thought that processes may exhibit spatial, or both spatial and temporal, dependence. Such questions are beyond the scope of this book, but have been examined by several authors, notably Bennett (1979), Cliff and Ord (1981), and Ripley (1981).
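Yule's pea-pelted pendulum is, in modern terms, a second-order autoregression, and the serial correlations mentioned in Section 1.7 are easy to compute directly. The sketch below (plain Python; the coefficients and series are invented for illustration, not taken from Yule's paper) simulates such a process and estimates its lag-1 serial correlation.

```python
import random

random.seed(7)

# Yule's pelted pendulum as a second-order autoregression:
#   x_t = a1 * x_(t-1) + a2 * x_(t-2) + e_t
# The coefficients are illustrative only: they give a damped
# oscillation with a period of roughly eleven terms.
a1, a2 = 1.6, -0.9
x = [0.0, 0.0]
for _ in range(200):
    x.append(a1 * x[-1] + a2 * x[-2] + random.gauss(0.0, 1.0))

def serial_corr(series, lag=1):
    """Lag-k serial correlation of a series about its mean."""
    n = len(series)
    m = sum(series) / n
    var = sum((v - m) ** 2 for v in series)
    cov = sum((series[t] - m) * (series[t + lag] - m)
              for t in range(n - lag))
    return cov / var

print(round(serial_corr(x, 1), 3))  # strongly positive
```

The series oscillates with irregular amplitudes and irregular spacing between peaks, exactly the behaviour Yule described, and its lag-1 serial correlation is strongly positive (about 0.84 in theory for these coefficients).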
Discrete time series

1.9 When a variable is defined at all points in time, we say that the time series is continuous. Examples include the temperature at a given location, the price of a commodity on an open market, or the position of a projectile. Other variables are created by aggregation over a period, such as rainfall, industrial production or total passenger miles recorded by an airline. Yet others are defined only at discrete points in time, such as annual crop yields at harvest, monthly salaries or the majority of a political party at a general election. Sometimes we cannot choose the times at which data are recorded as, for example, with harvest yields. In other cases the choice may be limited or dictated by convention, as with the monthly publication of many government statistics. Under more controlled conditions, the recording times may be at choice; for example, regular surveys of political opinion may be carried out as often as funds allow, or medical staff may check a patient's pulse every hour. Finally, many continuous records, such as temperature and barometric pressure recorded on a rotating drum, or the alpha rhythm of the brain on an encephalograph, may be digitised; that is, the continuous record is converted into a time series reported at regular intervals. In all these cases, the set of time points is finite, and we speak of a discrete time series, whether the random variable being measured is continuous or discrete. All the statistical methods we shall consider relate to such discrete series; further, we shall usually assume that the data are recorded at regular points, or are aggregated over regular intervals of time. The analysis of irregularly recorded observations will be
discussed briefly in Section 15.8. Nevertheless, it is useful to think of the time series as continuous in time when developing certain theoretical concepts.

1.10 In official statistics, the term 'continuity' is used in a different sense. Index numbers, such as the Index of Retail Prices, were historically based on a fixed set of weights, or 'basket of goods'. Over time, consumers' tastes change and new products become available so that the 'basket' must be changed, leading to a new set of weights. Each revision of the weights technically produces a discontinuity in the series, although the two segments can be spliced together at the point of change. Thus, continuity is taken to mean comparability over the period concerned. We shall not use continuity in this sense, although the notion of comparability finds more formal definition in the concept of stationarity introduced in Chapter 5.

1.11 There are two conflicting factors to be considered when the length of time between observations, or recording frequency, is at choice. First, for economic reasons, we do not want to take more observations than is necessary; yet, on the other hand, we do not want to miss important features of the phenomenon being studied. For example, if we are interested in seasonal variations, we should take several observations (quarterly, monthly) per year. However, if we wish to ignore seasonal variation it may be possible to take only one observation per year or to aggregate over shorter time periods. Likewise, examination of a patient's overall health may require measurement of the pulse rate once an hour, whereas the detection of possible heart irregularities will utilise the continuous scan of an electrocardiograph (ECG), perhaps digitised on intervals of one tenth of a second or less. In time series analysis, as we shall often find, there are few rules of universal application; a great deal depends on the purpose of the study.
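The aggregation mentioned in Section 1.11 is mechanically trivial but worth seeing once. In the sketch below (the two years of monthly figures are invented for illustration), summing each block of twelve monthly values yields an annual series in which the seasonal pattern disappears.

```python
# Aggregating a monthly series into annual totals suppresses the
# seasonal pattern at the cost of a much shorter series.  The
# monthly figures below are invented for illustration.
monthly = [6, 6, 7, 8, 8, 9, 10, 10, 9, 8, 6, 7,    # year 1
           7, 6, 7, 8, 9, 10, 11, 10, 10, 9, 7, 7]  # year 2

annual = [sum(monthly[i:i + 12]) for i in range(0, len(monthly), 12)]
print(annual)  # -> [94, 101]: two totals replace 24 seasonal values
```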
Calendar problems

1.12 Under experimental conditions, we can usually ensure that observations are regularly spaced; for social and economic data, however, problems may arise. To quote only the most obvious examples, the months are of different lengths and Nature failed to make the solar year an integral number of days. Further, a month may contain either four or five weekends. Movable feasts and public holidays contribute their own share of confusion, especially Easter, which may fall in either the first or second quarter of the year. Even series derived from experimental observation on the factory floor are not immune, as interruptions in production may occur due to strikes, mechanical breakdowns, material shortages and even meal breaks.

1.13 A variety of methods for 'cleaning up' data are available, which we note briefly:
(a) Many figures recorded for calendar months can be adjusted by scaling to a standard month of 30 days, e.g. multiplying the figure for February by 30/28, that for March by 30/31 and so on. The time periods for which such data are recorded remain unequal, but this is rarely a major problem. It should be noted that the total for twelve 'corrected' months will not be exactly equal to the annual figure, even if corrected for a year of 360 days.
(b) Adjustments for production and similar series may be made by using the number of working days per month.
(c) Short-term effects may sometimes be eliminated by aggregation. We may work with half-yearly periods rather than quarters to avoid the effects of a movable Easter, or we may record weekly averages to avoid the effects of weekends; and so on.
(d) Data relating to value are problematical because of changes in the value of money. The best approach seems to be that of deflating the value series by a suitable price index.
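Method (a) is a simple rescaling, sketched below (plain Python; the monthly figures are invented, representing a constant rate of 10 units per day, and a non-leap year is assumed).

```python
# Method (a): scale each calendar month's figure to a standard
# 30-day month.  Month lengths assume a non-leap year; the raw
# figures are invented (a constant rate of 10 units per day).
DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def standardise(figures):
    """Multiply each month's total by 30 / (days in that month)."""
    return [x * 30.0 / d for x, d in zip(figures, DAYS)]

raw = [310.0, 280.0, 310.0, 300.0, 310.0, 300.0,
       310.0, 310.0, 300.0, 310.0, 300.0, 310.0]
adj = standardise(raw)
print(adj[0], adj[1])  # Jan scaled down by 30/31, Feb up by 30/28
```

Note that the adjusted figures sum to 3600 while the raw annual total is 3650, illustrating the caveat at the end of (a).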
1.14 As is evident from this discussion, such data-recording problems cannot be simply ignored. Yet much of the time series literature tends to assume these difficulties away. A notable exception is the US Bureau of the Census X-11 seasonal adjustment procedure, which we discuss in Section 4.9. The X-11 system provides a variety of adjustment procedures for trading day and other calendar effects. More recently, Bell and Hillmer (1983b) have developed a regression procedure to adjust for calendar day variations as part of the time series model building paradigm described in Chapters 5-7. This is clearly an area where further development would be desirable.
The length of a time series

1.15 When we refer to the 'length' of a time series, we tend to think of the elapsed time between the recorded start and finish. Indeed, this is appropriate when the phenomenon is recorded continuously. However, common usage in time series analysis decrees that a series is of length 60 when 60 observations have been recorded at regular intervals, whether elapsed time covers one minute, one hour or five years.

1.16 A more important point concerns the amount of information in the series as measured by the number of terms. In ordinary statistical work, we are accustomed to thinking of the amount of information in a random sample as being proportional to the size of the sample. Whether this is a correct usage of the word 'information' is arguable, but it is undoubtedly true that the variance of many of the estimates that we derive from random samples is inversely proportional to the sample size. This idea needs modification in time series analysis because successive values are not independent. A series of 2n values (even if it extends over twice the time) may not provide twice as much information as a series of n values. Further, if we sample a given time period more intensively by recording values at half the previous interval of observation, thereby doubling the number of observations, we do not add much to our knowledge if successive observations are highly positively correlated. The consequence is that n, the number of observations, is not a full measure of the information content. We shall see that the precision of estimates that we obtain from data involves the internal structure of the series as well as the sample size.
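The point of Section 1.16 can be checked by simulation. The sketch below is not from the book; a first-order autoregression with coefficient 0.7 is assumed purely for illustration. It compares the variance of the sample mean for independent observations with that for positively correlated observations of the same length.

```python
import random

random.seed(1)

def ar1_series(n, phi):
    """x_t = phi * x_(t-1) + e_t, with standard normal shocks."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + random.gauss(0.0, 1.0)
        out.append(x)
    return out

def var_of_mean(reps, n, phi):
    """Monte Carlo variance of the sample mean over many replicates."""
    means = [sum(ar1_series(n, phi)) / n for _ in range(reps)]
    m = sum(means) / reps
    return sum((v - m) ** 2 for v in means) / reps

v_iid = var_of_mean(2000, 50, phi=0.0)  # 50 independent observations
v_ar = var_of_mean(2000, 50, phi=0.7)   # 50 positively correlated ones
print(v_ar / v_iid)  # well above 1
```

With 50 positively correlated observations the sample mean is several times more variable than with 50 independent ones, so the effective information content is much smaller than the raw count of observations suggests.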
Some examples of time series

1.17 The statistical methods we shall develop will be illustrated by application to a variety of observed series. We now describe some of the principal series we shall examine.

Table 1.1 Annual yields per acre of barley in England and Wales from 1884 to 1939 (data from the Agricultural Statistics)
Year   Yield per acre (cwt)   Year   Yield per acre (cwt)   Year   Yield per acre (cwt)
1884   15.2                   1903   15.1                   1922   14.0
1885   16.9                   1904   14.6                   1923   14.5
1886   15.3                   1905   16.0                   1924   15.4
1887   14.9                   1906   16.8                   1925   15.3
1888   15.7                   1907   16.8                   1926   16.0
1889   15.1                   1908   15.5                   1927   16.4
1890   16.7                   1909   17.3                   1928   17.2
1891   16.3                   1910   15.5                   1929   17.8
1892   16.5                   1911   15.5                   1930   14.4
1893   13.3                   1912   14.2                   1931   15.0
1894   16.5                   1913   15.8                   1932   16.0
1895   15.0                   1914   15.7                   1933   16.8
1896   15.9                   1915   14.1                   1934   16.9
1897   15.5                   1916   14.8                   1935   16.6
1898   16.9                   1917   14.4                   1936   16.2
1899   16.4                   1918   15.6                   1937   14.0
1900   14.9                   1919   13.9                   1938   18.1
1901   14.5                   1920   14.7                   1939   17.5
1902   16.6                   1921   14.3

Fig. 1.3 Graph of the data of Table 1.1 (yields per acre of barley in England and Wales, 1884-1939, annual)
Table 1.2 Sheep population of England and Wales for each year from 1867 to 1939 (data from the Agricultural Statistics)

Year   Population (10 000)   Year   Population (10 000)   Year   Population (10 000)
1867   2203                  1892   2119                  1917   1717
1868   2360                  1893   1991                  1918   1648
1869   2254                  1894   1859                  1919   1512
1870   2165                  1895   1856                  1920   1338
1871   2024                  1896   1924                  1921   1383
1872   2078                  1897   1892                  1922   1344
1873   2214                  1898   1916                  1923   1384
1874   2292                  1899   1968                  1924   1484
1875   2207                  1900   1928                  1925   1597
1876   2119                  1901   1898                  1926   1686
1877   2119                  1902   1850                  1927   1707
1878   2137                  1903   1841                  1928   1640
1879   2132                  1904   1824                  1929   1611
1880   1955                  1905   1823                  1930   1632
1881   1785                  1906   1843                  1931   1775
1882   1747                  1907   1880                  1932   1850
1883   1818                  1908   1968                  1933   1809
1884   1909                  1909   2029                  1934   1653
1885   1958                  1910   1996                  1935   1648
1886   1892                  1911   1933                  1936   1665
1887   1919                  1912   1805                  1937   1627
1888   1853                  1913   1713                  1938   1791
1889   1868                  1914   1726                  1939   1797
1890   1991                  1915   1752
1891   2111                  1916   1795

Fig. 1.4 Graph of data in Table 1.2 (sheep population)
Table 1.1 gives the yields of barley in England and Wales for the 56 years 1884-1939. Table 1.2 gives the sheep population of England and Wales for the 73 years 1867-1939. Table 1.3 gives the miles flown by British airlines for the 96 months January 1963 to December 1970. Table 1.4 gives the immigration into the United States for the 143 years 1820-1962. Table 1.5 gives the number of births of babies, according to the hour at which they were born, for certain US hospitals. Finally, Table 1.6 gives for 1960-71 the quarterly average index of share prices on the London exchange as compiled by the Financial Times. The six series are plotted in Figs. 1.3-1.8. The selected series are typical of those which arise in practice; other series are presented in Appendix A at the end of the book, so that the reader may attempt to analyse some different examples.

1.18 Barley yields, by definition, occur only once each year. The sheep population, although continuously in existence, is observed only once a year at
Table 1.3 UK airlines: aircraft miles flown, by month (thousands)

        1963    1964    1965    1966    1967    1968    1969    1970
Jan.    6827    7269    8350    8186    8334    8639    9491   10840
Feb.    6178    6775    7829    7444    7899    8772    8919   10436
Mar.    7084    7819    8829    8484    9994   10894   11607   13589
Apr.    8162    8371    9948    9864   10078   10455    8852   13402
May     8462    9069   10638   10252   10801   11179   12537   13103
June    9644   10248   11253   12282   12950   10588   14759   14933
July   10466   11030   11424   11637   12222   10794   13667   14147
Aug.   10748   10882   11391   11577   12246   12770   13731   14057
Sept.   9963   10333   10665   12417   13281   13812   15110   16234
Oct.    8194    9109    9396    9637   10366   10857   12185   12389
Nov.    6848    7685    7775    8094    8730    9290   10645   11595
Dec.    7027    7602    7933    9280    9614   10925   12161   12772

Fig. 1.5 Graph of the data of Table 1.3 (UK airlines: miles flown by month)
Table 1.4 Immigration into USA (Dewey 1963)

1820-29:    8,385    9,127    6,911    6,354    7,912   10,199   10,837   18,875   27,382   22,520
1830-39:   23,322   22,633   48,386   58,640   65,365   45,374   76,242   79,340   38,914   68,069
1840-49:   84,066   80,289  104,565   69,994   78,615  114,371  154,416  234,968  226,527  297,024
1850-59:  295,984  379,466  371,603  368,645  427,833  200,877  200,436  251,306  123,126  121,282
1860-69:  153,640   91,918   91,985  176,282  193,418  248,120  318,568  315,722  277,680  352,768
1870-79:  387,203  321,350  404,806  459,803  313,339  227,498  169,986  141,857  138,469  177,826
1880-89:  457,257  669,431  788,992  603,322  518,592  395,346  334,203  490,109  546,889  444,427
1890-99:  455,302  560,319  579,663  439,730  285,631  258,536  343,267  230,832  229,299  311,715
1900-09:  448,572  487,918  648,743  857,046  812,870  1,026,499  1,100,735  1,285,349  782,870  751,786
1910-19:  1,041,570  878,587  838,172  1,197,892  1,218,480  326,700  298,826  295,403  110,618  141,132
1920-29:  430,001  805,228  309,556  522,919  706,896  294,314  304,488  335,175  307,255  279,678
1930-39:  241,700   97,139   35,576   23,068   29,470   34,956   36,329   50,244   67,895   82,998
1940-49:   70,756   51,776   28,781   23,725   28,551   38,119  108,721  147,292  170,570  188,317
1950-59:  249,187  205,717  265,520  170,434  208,177  237,790  321,625  326,867  253,265  260,686
1960-62:  265,398  271,344  283,763

Note: The years are ended June 30, except for certain earlier years up to 1868, and occasional adjustments have been made to ensure comparability.
Fig. 1.6 Graph of the data of Table 1.4 (immigration into the USA, 1820-1962, annual)
a fixed date (June 4) so that seasonal movements are omitted. The airline data present a characteristic pattern of seasonal variation on a rising trend. Immigration, also on an annual basis, shows fluctuations, some of which can be identified with events such as war. The birth data are exceptional in that we have graphed the square root of the number rather than the number of births themselves (on the grounds that births probably follow a Poisson distribution and the square root transformation stabilises the variance), and in that form they reveal a remarkable cyclical pattern. The F.T. index numbers are typical of fluctuations in the stock market over a period of time.
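The square-root transformation applied to the birth data can be checked by a quick simulation (the Poisson samples below are synthetic, not the hospital series): if X is Poisson, the variance of √X is close to 1/4 whatever the mean, so the transformed series has an approximately stable variance.

```python
import math
import random

random.seed(42)

def poisson(lam):
    """Poisson variate via Knuth's product method (fine for modest lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

def sample_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

for lam in (25, 100):
    counts = [poisson(lam) for _ in range(5000)]
    roots = [c ** 0.5 for c in counts]
    # The raw variance tracks the mean, but the variance of the
    # square roots stays near 1/4 regardless of the mean.
    print(lam, round(sample_var(counts), 1), round(sample_var(roots), 3))
```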
The objectives of time series analysis

1.19 It would be inappropriate to launch into a microscopic analysis of the various reasons for analysing time series, but a few general comments are in order because they often determine the methods of analysis to be used. Broadly speaking, we may identify five major types of investigation.
(a) At the most superficial level, we take a particular series and construct a simple system, usually of a more or less mathematical kind, which describes its behaviour in a concise way.
(b) Penetrating a little deeper, we may try to explain its behaviour in terms of other variables and to develop a structural model of behaviour. Stated another way, we set up the model as a hypothesis to account for the observations.
(c) We may, from either (a) or (b), use the resulting model to forecast the behaviour of the series in the future. From (a) we work on the assumption
Table 1.5 Number of normal human births in each hour in four hospital series, transformed to y = √(number of births) (Bliss 1958; King 1956)

Hour starting    A       B       C       D      Total    Observed y   Expected Y
AM 12          13.56   19.24   20.52   21.14    74.46    18.6150      18.463
AM 1           14.39   18.68   20.37   21.14    74.58    18.6450      18.812
AM 2           14.63   18.89   20.83   21.79    76.14    19.0350      19.129
AM 3           14.97   20.27   21.14   22.54    78.92    19.7300      19.393
AM 4           15.13   20.54   20.98   21.66    78.31    19.5775      19.587
AM 5           14.25   21.38   21.77   22.32    79.72    19.9300      19.697
AM 6           14.14   20.37   20.66   22.47    77.64    19.4100      19.716
AM 7           13.71   19.95   21.17   20.88    75.71    18.9275      19.641
AM 8           14.93   20.62   21.21   22.14    78.90    19.7250      19.479
AM 9           14.21   20.86   21.68   21.86    78.61    19.6525      19.240
AM 10          13.89   20.15   20.37   22.38    76.79    19.1975      18.941
AM 11          13.60   19.54   20.49   20.71    74.34    18.5850      18.602
PM 12          12.81   19.52   19.70   20.54    72.57    18.1425      18.246
PM 1           13.27   18.89   18.36   20.66    71.18    17.7950      17.897
PM 2           13.15   18.41   18.87   20.32    70.75    17.6875      17.579
PM 3           12.29   17.55   17.32   19.36    66.52    16.6300      17.315
PM 4           12.92   18.84   18.79   20.02    70.57    17.6425      17.121
PM 5           13.64   17.18   18.55   18.84    68.21    17.0525      17.011
PM 6           13.04   17.20   18.19   20.40    68.83    17.2075      16.993
PM 7           13.00   17.09   17.38   18.44    65.91    16.4775      17.067
PM 8           12.77   18.19   18.41   20.83    70.20    17.5500      17.229
PM 9           12.37   18.41   19.10   21.00    70.88    17.7200      17.468
PM 10          13.45   17.58   19.49   19.57    70.09    17.5225      17.767
PM 11          13.53   18.19   19.10   21.35    72.17    18.0425      18.106
Total         327.65  457.54  474.45  502.36  1762.00   18.3542
Table 1.6 Financial Times Index of leading equity prices: quarterly averages, 1960-71

 Year    Q1      Q2      Q3      Q4
 1960    323.8   314.1   321.0   312.9
 1961    323.7   349.3   310.4   295.8
 1962    301.2   285.8   271.7   283.6
 1963    295.7   309.3   295.7   342.0
 1964    335.1   344.4   360.9   346.5
 1965    340.6   340.3   323.3   345.6
 1966    349.3   359.7   320.0   299.9
 1967    318.5   343.1   360.8   397.8
 1968    409.1   461.1   491.4   490.5
 1969    491.0   433.0   378.0   382.6
 1970    403.4   354.7   343.0   345.4
 1971    330.4   372.8   409.2   427.6
that, even though we may be unaware of the basic mechanism which is generating the series, there is sufficient momentum in the system to ensure that future behaviour will be like the past. From (b) we have, we hope, more insight into the underlying causation and can make projections into the future more confidently.
(d) Using a structural model, as in (b), we may seek to control a system, either by generating warning signals of future untoward events or by examining what would happen if we alter either the inputs to the system or its parameters.
(e) More generally, we may wish to consider several jointly dependent variables, known as a vector process. In such cases we are approaching the more general subject area of statistical model-building as, for example, in the simultaneous equation systems developed in econometrics (cf. Johnston 1984). Indeed, in recent years the research in multiple time series and in econometrics has drawn much closer together (cf. Harvey 1981; Hendry and Richard 1983).
Decomposition
1.20 A survey of the examples of time series already given, and of the many others which are doubtless known to the reader, suggests that we may usefully consider the general series as a mixture of four components:
(a) a trend, or long-term movement;
(b) fluctuations about the trend of greater or less regularity;
(c) a seasonal component;
(d) a residual, irregular, or random effect.
It is convenient to represent the series as a sum of these four components, and one of the objectives may be to break the series down into its components for individual study. However, we must remember that, in so doing, we are imposing a model on the situation. It may be reasonable to suppose that trends are due to permanent forces operating uniformly in more or less the same direction, that short-term fluctuations about these long movements are due to a different set of causes, and that there is in both some disturbance attributable to random events, giving rise to the residual. But that this is so, and that the effects of the different causes are additive, are assumptions which we must always be ready to discard if our model fails to fit the data.
1.21 Perhaps the easiest components to understand are those which are undoubtedly due to physical factors, e.g. diurnal variations of temperature, the tidal movements associated with the lunar months, and seasonal variation itself. We must be careful not to confuse such effects with fluctuations of a pseudo-cyclical kind such as trade 'cycles', or with sunspot 'cycles' in which there is no known underlying astronomical phenomenon of a periodic kind.* The definition of seasonality, however, is by no means as easy as one might think. A glance at Fig. 1.5 will illustrate one of the problems. In this series of air-miles travelled, there are undoubtedly seasonal effects, a peak around Christmas, another at Easter, and one in the summer, all due to holiday travel. But the recurrence at Easter varies with Easter itself and therefore does not occur at the same date each year; and the pattern of the variation is altering from year to year, owing partly to the increased volume of traffic and partly to the spread of the period over which holidays are now taken. In short, our seasonal effect itself has a trend.
* Although it has been suggested that an apparent four-yearly swing in the British economy is due to the man-made fact that General Elections must be held at no greater than five-yearly intervals.
1.22 As we shall see when we come to a detailed study, it seems that trend and seasonality are essentially entangled, and we cannot isolate one without, at the same time, trying to isolate the other. Conceptually, however, they are distinct enough. Trend is generally thought of as a smooth broad movement, non-oscillatory in nature, extending over a considerable period of time. However, it is a relative term. What appears as a trend in climate to a drainage engineer may be nothing more than a temporary observation or a short-term swing to the geologist, whose time scale is very much longer.
1.23 If we can identify trend and seasonal components and then eliminate them from the data, we are left with a fluctuating series which may be, at one extreme, purely random, or at the other, a smooth oscillatory movement. Usually, we have something between these extremes; some irregularity, especially in imperfect data, but also systematic effects due to successive observations being dependent. We prefer to call this systematic effect an oscillation rather than a cycle, unless it can be shown to be genuinely cyclical in the pattern of recurrence, and in particular, that its peaks and troughs occur at equal intervals of time. Very few economic series are cyclical in this strict sense.
Notation
1.24 As noted earlier, we shall usually assume that the series is observed at equal intervals of time, and as a rule no generality is lost if we take these intervals as units. Hence, we may denote a series by using subscripts, such as y_1, y_2, y_3, etc., the observation at time t being y_t. Here we suppose observations to begin at t = 1, but if necessary we can represent previously occurring values by y_0, y_{-1}, y_{-2}, etc.
Plan of the book
1.25 Having presented an overview of time series analysis, we now outline briefly how we shall chart our path through the subject.
1.26 Chapters 2-4 deal with what might be termed the 'classical' approach to time series analysis. In Chapter 2 we describe various tests of randomness since, if the series is devoid of structure, further analysis is pointless. Chapter 3 describes various methods of trend removal and Chapter 4 discusses seasonality and seasonal adjustment; these methods are still very relevant since they form the basis of nearly all procedures for the adjustment of official series.
1.27 Chapters 5-9 describe the analysis of a single series in the time domain. Chapter 5 contains a development of the widely used class of linear stochastic models, the autoregressive moving-average (ARMA) schemes, which form the basis for much of the theoretical and applied work in the area. Chapter 6 describes the sampling properties of the serial correlations which are used to assess the form of dependence in a series; these coefficients provide a natural basis for model specification, or identification.
The model-building paradigm we shall follow is summarised in Fig. 1.9 and derives from the basic work of Box and Jenkins (1976) in this area. Chapter 7 then goes on to describe estimation procedures for ARMA schemes and Chapter 8 discusses univariate or autoprojective forecasting procedures. Many of the forecasting procedures in current use were developed directly as multiple component forecasting models rather than as integrated models for the time series itself. This leads naturally to the use of the 'state-space' approach, outlined in Chapter 9, which has drawn increasing attention from statisticians in recent years.
1.28 In Chapter 10 and the first part of Chapter 11, we examine the behaviour of time series in the frequency domain, first for single series and then for several series. Frequency-domain analysis provides the natural vehicle for identifying strictly cyclical phenomena and is especially useful in engineering and the physical sciences. In other areas, its primary value is as a descriptive tool since exact cycles are rarely observed.
1.29 In Chapters 11-13 we examine structural models for a single dependent
Fig. 1.9 A general paradigm for univariate time series modelling
series. The use of transfer functions and intervention analysis enables us to combine the benefits of structural modelling from the regression area with the explicit modelling of error processes underlying univariate time series analysis. Chapter 14 provides an overview of recent developments in vector time series models and Chapter 15 summarises other developments in the field which have not been considered explicitly in this book.
Bibliography
1.30 The emphasis in this volume is upon the methodological aspects of time series analysis rather than the formal development of theory. We shall often refer to Kendall et al. (1983), The Advanced Theory of Statistics, Volume 3, Chapters 45-51, as a source for such theoretical developments. Volumes 1 (Stuart and Ord, 1987) and 2 (Kendall and Stuart, 1979) will also be used as standard references for theoretical results. To assist readers who may have access to different editions, we shall refer to sections in the Advanced Theory rather than pages. Other useful books on the theory of time series are T. W. Anderson (1971), Brillinger (1981), Fuller (1976), Hannan (1970) and Priestley (1981). Two useful volumes of papers describing recent research in the frequency domain and the time domain, respectively, are edited by Brillinger and Krishnaiah (1983) and Hannan, Krishnaiah, and Rao (1985). An overview of recent developments is given in the review papers by Newbold (1981, 1984), whereas current developments in forecasting are evaluated in a special issue of the International Journal of Forecasting (1988, number 4).
2
Tests of randomness
2.1 In the opening chapter, we considered the classical decomposition of a time series into trend, seasonal, oscillatory and irregular components. In many cases the presence of such components is very clear, as with the downward trend in the sheep series of Table 1.2 or the marked seasonal pattern in aircraft miles flown, shown in Table 1.3. However, there are cases, such as the barley data of Table 1.1, where the issue is not as clear-cut and a more accurate test is needed. Other instances may arise where some components have been removed, or filtered out, and we wish to test the residuals to see whether any structure remains. In this chapter we shall concentrate upon tests of randomness where the null hypothesis is that the observations are independent and identically distributed. That is, the observations are equally likely to have occurred in any order.
2.2 A considerable variety of possible tests of randomness is available, but certain selection criteria may be specified.
(a) The test should not make any restrictive assumptions about the underlying distribution.
(b) When new observations arise over time, it should be possible to update the test statistics without having to perform the calculations ab initio.
(c) The choice of test should depend upon the alternative hypothesis we have in mind.
In the rest of this chapter, we consider some simple tests corresponding to intuitive ideas about departures from randomness. These tests use only the relative positions of the observations in the series and make no specific distributional assumptions, except that the random variables are continuous.
Turning points
2.3 A simple departure from a random series arises when the series exhibits well-defined turning points. A turning point is defined as either a 'peak' when a
value is greater than its two neighbouring values, or a 'trough' when the value is less than its two neighbours. A simple test is given by counting the number of peaks and troughs, a particularly simple procedure when the series has been plotted. In order to carry out the test we must determine the distribution of the number of turning points in a random series.
2.4 Consider a time series consisting of n values y_1, y_2, ..., y_n. Three consecutive points are needed to define a turning point; thus
 y_i > y_{i-1} and y_i > y_{i+1} defines a peak at time i,
whereas
 y_i < y_{i-1} and y_i < y_{i+1} defines a trough at i.
Turning points are possible at the (n − 2) times 2, 3, ..., n − 1, since only one inequality can be checked for y_1 and y_n. If we define the indicator variables
 U_i = 1, when there is a turning point at time i
     = 0, otherwise,
then the number of turning points is given by
 p = Σ_{i=2}^{n-1} U_i.  (2.1)
Consider now the three values {y_{i-1}, y_i, y_{i+1}}. When the series is random, the six possible orderings are equally likely:
 123  132  213  231  312  321
where '3' denotes the largest value and '1' the smallest. Four of these six yield a turning point, so that the expected value of U_i is
 E(U_i) = 1(4/6) + 0(2/6) = 2/3
and
 E(p) = Σ E(U_i) = (2/3)(n − 2).  (2.2)
By an extension of this argument (see Kendall et al. 1983, Section 45.18), we find that the variance of p is
 var(p) = (16n − 29)/90.
Further, the distribution of p approaches normality rapidly as n increases. Therefore, we may carry out the test using
 z = {p − E(p)}/{var(p)}^{1/2}.  (2.3)
When the null hypothesis is true, the distribution of z is, approximately, the standard normal.
Example 2.1
In the barley data of Table 1.1 there are 56 values, but at two points (1906/7 and 1910/11) the values in successive years are equal. We shall consider each of these as a single point and reduce the number n to 54. There are 35 turning points in the series. The expected number is (2/3)(52) = 34.67. Agreement is so close that no test is necessary. For the record, we note that
 var(p) = {16(54) − 29}/90 = 9.278
and
 z = (35 − 34.67)/3.046 = 0.11.
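The test is straightforward to mechanise. The following sketch (a Python illustration, not part of the original text) applies (2.1)-(2.3) to any series without tied neighbouring values:

```python
import math

def turning_point_test(y):
    """Count peaks and troughs in y and return (p, E(p), var(p), z),
    following (2.1)-(2.3); assumes no tied neighbouring values."""
    n = len(y)
    # A turning point at i needs y[i] to exceed (or fall below) both neighbours
    p = sum(1 for i in range(1, n - 1)
            if (y[i] > y[i - 1] and y[i] > y[i + 1])
            or (y[i] < y[i - 1] and y[i] < y[i + 1]))
    expected = 2.0 * (n - 2) / 3.0          # eq. (2.2)
    variance = (16.0 * n - 29.0) / 90.0     # variance of p
    z = (p - expected) / math.sqrt(variance)  # eq. (2.3)
    return p, expected, variance, z
```

For the barley figures above, n = 54 and p = 35 reproduce E(p) = 34.67, var(p) = 9.278 and z ≈ 0.11.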
Phase-length
2.5 Another feature of interest in a series is the length of time between successive turning points; from a trough to the next peak is a run up, whereas peak to trough represents a run down. Thus, if y_i is a trough and y_{i+d} is the next peak, there is a run up, or phase, of length d. To define a phase, a run up say, of length d, we need the specific pattern
 y_{i-1} > y_i < y_{i+1} < y_{i+2} < ··· < y_{i+d} > y_{i+d+1},
with a trough at time i and a peak at time i + d. Arguments similar to those for the turning points test (Kendall et al. 1983, Section 45.19) show that the number of phases of length d, N_d, has expected value
 E(N_d) = 2(n − d − 2)(d² + 3d + 1)/(d + 3)!  (2.4)
for 1 ≤ d ≤ n − 3. The total number of phases, N, has approximate expected value
 E(N) = (1/3)(2n − 7).  (2.5)
The distribution of N tends to normality for large n; see Levene (1952). Gleissberg (1945) tabulated the actual distribution for n ≤ 25.
2.6 The distributions of the ratios N_d/N do not tend to normality. However, Wallis and Moore (1941) showed that when observed and expected numbers are compared for phases of length d = 1, 2 and ≥3, the usual χ² statistic may be used with the following approximate percentage points for the upper tail:

 α       0.10   0.05   0.01
 value   5.38   6.94   10.28

Example 2.2
In the barley data of Table 1.1, there are 34 phases. As before, we take n = 54; only complete phases are counted, starting at the peak of 1885 and ending at the peak of 1938. Their actual lengths and the theoretical values given by (2.4) are as shown in Table 2.1. The values are so close that a test is hardly necessary; the χ² statistic has the value 0.83, clearly not significant.
Table 2.1

 Phase-length   No. of phases observed   Theoretical
 1              23                       21.25
 2              7                        9.17
 ≥3             4                        3.25
 Total          34                       33.67
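The phase counts and the theoretical column of Table 2.1 can be checked mechanically. The sketch below (Python, not from the original text) extracts complete phase lengths from a series and evaluates (2.4):

```python
import math

def phase_lengths(y):
    """Distances between successive turning points, i.e. the lengths of
    the complete phases in y; tied neighbouring values are not handled."""
    turns = [i for i in range(1, len(y) - 1)
             if (y[i] > y[i - 1] and y[i] > y[i + 1])
             or (y[i] < y[i - 1] and y[i] < y[i + 1])]
    return [b - a for a, b in zip(turns, turns[1:])]

def expected_phases(n, d):
    """Expected number of phases of length d in a random series, eq. (2.4)."""
    return 2.0 * (n - d - 2) * (d * d + 3 * d + 1) / math.factorial(d + 3)
```

For n = 54, expected_phases gives 21.25 for d = 1 and 9.17 for d = 2, as in Table 2.1.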
Tests for trend
2.7 The phase-length and turning point tests could be used to look for trends, but their primary value is in detecting cyclical effects in a series. More direct tests for trend are obtained by comparing successive terms and examining them for decreases or increases. The simplest such test is the difference-sign test, which counts the number of points of increase in the series. That is, we define the indicator variable
 U_i = 1 if y_{i+1} > y_i
     = 0 otherwise,
and count Q = Σ U_i.

If the trend follows the quadratic y_t = β_0 + β_1 t + β_2 t², it follows that
 ∇y_t = β_1 − β_2 + 2β_2 t,  ∇²y_t = 2β_2,  and  ∇^k y_t = 0,  k > 2.
In general, if the trend follows a polynomial of degree d, we have
 ∇^d y_t = constant  and  ∇^k y_t = 0,  k > d.
This provides a straightforward way of removing polynomial trends. Exponential trends such as
 y_t = β_0 e^{β_1 t + β_2 t²}
can be removed by taking logarithms and then differencing:
 z_t = log_e y_t,  ∇²z_t = 2β_2,  ∇^k z_t = 0,  k > 2.
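The effect of differencing on polynomial and exponential trends is easily verified numerically. The sketch below (a Python illustration with arbitrary coefficients, not taken from the text) shows ∇² reducing a quadratic to the constant 2β₂ and ∇³ annihilating it:

```python
import numpy as np

# Quadratic trend y_t = b0 + b1*t + b2*t^2 with illustrative coefficients
t = np.arange(1.0, 21.0)
y = 3.0 + 2.0 * t + 0.5 * t ** 2

d2 = np.diff(y, n=2)   # second difference: constant, equal to 2*b2 = 1.0
d3 = np.diff(y, n=3)   # third difference: identically zero

# Exponential trend: take logarithms first, then difference
z = np.log(4.0 * np.exp(0.1 * t + 0.02 * t ** 2))
dz2 = np.diff(z, n=2)  # constant, equal to 2*b2 = 0.04
```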
3.15 In order to use differencing to remove trends, we must decide how many differences to take. Suppose it is reasonable to consider the model
 y_t = β_0 + β_1 t + ··· + β_k t^k + ε_t,  (3.30)
where the errors are independent and identically distributed with zero mean and variance σ². Since the errors are independent, we obtain, for successive values of k,
 k = 0: σ_0² = var(y_t) = σ²
 k = 1: σ_1² = var(∇y_t) = var(∇ε_t) = var(ε_t) + var(ε_{t-1}) = 2σ²
 k = 2: σ_2² = var(∇²y_t) = var(∇²ε_t) = var(ε_t) + 4 var(ε_{t-1}) + var(ε_{t-2}) = 6σ².
Generally, we find
 σ_k² = var(∇^k y_t) = (2k choose k) σ².  (3.31)
Working with model (3.30), we might select that value of k for which the ratio
 V_k = [sample variance of ∇^k y_t]/(2k choose k)  (3.32)
is minimized. This approach to selecting k is known as variate differencing; see Kendall et al. (1983, Sections 46.24-32) for further details. Unfortunately,
model (3.30) is generally unrealistic as the errors are often correlated. An alternative approach is to consider the sequence of alternative models
 ∇^k y_t = ε_t
and then choose the value of k which minimizes σ̂_k² = var(∇^k y_t). Again, the procedure is affected by correlation among the error terms but gives a lower and usually more realistic value for k.
Example 3.3
From the sheep data of Table 1.2 we obtain Table 3.3. The two criteria give strikingly different results. As we shall see in Chapter 7, there is considerable evidence to favour the analysis based upon model (3.32); the discussion in Sections 6.19-22 provides a more formal criterion for selecting k.

Table 3.3

 k    σ̂_k²      V_k
 0    49 640    49 640
 1     7 001     3 500
 2     8 780     1 463
 3    17 319       866
 4    44 563       637
 5   132 569       526

3.16 The discussion in this chapter serves to demonstrate that trend fitting and trend removal need to be approached with some care. The choice of method and the mode of application require an appreciation of the subject matter of the series being analysed and an element of personal judgement. To a scientist it may always feel like a departure from correctness to incorporate subjective elements into the analysis. The student of time series cannot be a purist in that sense. What can be done is to make available the primary data and to explain unambiguously how the analysis was performed. Anyone who disagrees with what has been done can then carry out his or her own investigation.
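A table such as Table 3.3 is simple to produce. The sketch below (Python, with a hypothetical function name; the book prescribes no code) computes the two criteria for k = 0, ..., kmax:

```python
import numpy as np
from math import comb

def variate_difference_table(y, kmax=5):
    """For k = 0, ..., kmax return pairs (s2_k, V_k): the sample variance of
    the k-th difference and the ratio V_k = s2_k / C(2k, k) of (3.31)-(3.32)."""
    rows = []
    for k in range(kmax + 1):
        d = np.diff(y, n=k) if k > 0 else np.asarray(y, dtype=float)
        s2 = float(d.var(ddof=1))
        rows.append((s2, s2 / comb(2 * k, k)))
    return rows
```

Applied to data generated from (3.30), V_k should settle near σ² once k exceeds the polynomial degree, while σ̂_k² itself grows like the binomial coefficient.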
Exercises
3.1 Derive the weights given in (3.8) for fitting a quadratic to five points.
3.2 Compare the effects of smoothing the barley data, given in Table 1.1, with (a) a cubic fitted to seven points and (b) a simple [3] [5] average.
3.3 Using (3.15), calculate the variance for a random series using each of the moving averages used in Exercise 3.2. Compare these theoretical values with the variances for the smoothed barley series.
3.4 Verify the weights for Spencer's 15-point formula, given in (3.10).
3.5 Calculate the variance of y_t, ∇y_t and ∇²y_t for the Financial Times Index data given in Table 1.6. What order of differencing is appropriate? Compare your results with those from Exercise 2.1. (Note: The random walk model ∇y_t = ε_t, described in Section 5.12, is the basis of models for an efficient stock market. Why? The model was first proposed by Bachelier in 1900 but his work was not followed up until the 1950s and 1960s. The model is now a cornerstone of financial theory.)
3.6 Consider the series 1, 2, 4, 8, 16, 32, 64, 128, .... What happens if you difference to try to remove the trend? Take logarithms (log_2(2^k) = k) and try differencing again.
3.7 Show that ∇_s ∇ y_t = ∇ ∇_s y_t.
Appendix 3A: Properties of the ∇ and B Operators
We assume that c is a constant and {y_t} is a time series. The following properties hold for B:
 Bc = c;  Bcy_t = cy_{t-1} = cBy_t;
 (a_1 B^i + a_2 B^j)y_t = a_1 B^i y_t + a_2 B^j y_t = a_1 y_{t-i} + a_2 y_{t-j};
 B^i B^j y_t = B^{i+j} y_t = y_{t-i-j};
 (1 − aB)^{-1} y_t = (1 + aB + a²B² + ···)y_t = y_t + ay_{t-1} + a²y_{t-2} + ···, provided |a| < 1.
An analogous set of properties holds for ∇, save that ∇c = 0.
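These properties are easily checked numerically. The following fragment (a Python illustration, not part of the appendix) verifies that ∇c = 0 and that the geometric expansion of (1 − aB)^{-1} generates the sequence 1, a, a², ... when applied to a unit pulse:

```python
import numpy as np

# ∇c = 0: differencing a constant series gives zeros
c = np.full(6, 5.0)
grad_c = np.diff(c)                       # all zeros

# (1 - aB)^{-1} applied to the unit pulse (1, 0, 0, ...): the recursion
# out_t = e_t + a*out_{t-1} implements y_t + a*y_{t-1} + a^2*y_{t-2} + ...
a = 0.6
e = np.zeros(8)
e[0] = 1.0
out = np.zeros_like(e)
for t in range(len(e)):
    out[t] = e[t] + (a * out[t - 1] if t > 0 else 0.0)
# out now holds 1, a, a^2, ..., the geometric expansion above
```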
4
Seasonality
4.1 Seasonal effects, although they may vary somewhat in their average time of occurrence during the year, have a degree of regularity which other elements of time series usually do not. When we discuss spectrum analysis, we shall see that it is sometimes possible to isolate the seasonal component. However, in the time domain it is impossible to determine the seasonal effects without some prior adjustments for the trend. Consider, for example, a monthly series consisting of a slowly rising trend, 100 for January 1970, 101 for February 1970, ..., 112 for December 1970, 113 for January 1971 and so on. In any year January is the lowest month and December is the highest. Yet these are not seasonal effects, which, in this case, do not exist. The problem is to distinguish such cases from, for example, the monthly sales of Christmas cards, which presumably also have their highest value in December, such variation being seasonal in any ordinary sense of the word.
4.2 There are several different reasons for wanting to examine seasonal effects, just as there were various reasons for looking at residual effects after the removal of trend:
(a) To compare a variable at different points of the year as a purely intra-year phenomenon; for example, in deciding how many hotels to close out of season, or at what points to allow stocks to run down.
(b) To remove seasonal effects from the series in order to study its other constituents uncontaminated by the seasonal component.
(c) To 'correct' a current figure for seasonal effects, e.g. to state what the unemployment figures in a winter month would have been if customary seasonal influences had not increased them.
These objectives are not the same, and it follows that one single method of seasonal determination may not be suitable to meet them all. This is, perhaps, the reason why different agencies (especially in government) favour different techniques for dealing with the seasonal problem.
Types of model
4.3 We shall consider three types of model, depending on whether the seasonal effect is additive or multiplicative. If m_t is the smooth component of the series (trend and cyclical effects), s_t is the seasonal component and ε_t the error term, we may have
 y_t = m_t + s_t + ε_t,  (4.1)
 y_t = m_t s_t ε_t,  (4.2)
or, the multiplicative-seasonal model
 y_t = m_t s_t + ε_t.  (4.3)
The purely multiplicative model (4.2) may be converted to linear form by taking logarithms
 log y_t = log m_t + log s_t + log ε_t.  (4.4)
In making the transformation (4.4) we assume that ε_t in (4.2) can take on only positive values; otherwise log ε_t is undefined. Thus we may choose to write η_t = log ε_t, where η_t is a random variable with zero mean. Then (4.4) becomes
 log y_t = log m_t + log s_t + η_t,  (4.5)
corresponding to
 y_t = m_t s_t e^{η_t}.  (4.6)
4.4 We shall now concentrate upon the additive model, returning to the mixed model in Section 4.6. Since we wish to separate out the seasonal and trend components, it is reasonable to impose the condition that the sum of the seasonal effects is zero. Thus, for monthly data, the subscript t may be written as t = 12(i − 1) + j, corresponding to the jth month of the ith year. Assuming the seasonal effects are the same in different years, we would impose the condition
 Σ_{j=1}^{12} s_{12i+j} = Σ_{j=1}^{12} s_j = 0  (4.7)
since s_{12i+j} = s_j for all i and j. To determine the trend we may now take a 12-month centred moving average (as suggested in Section 3.11) with weights
 (1/24)[2][12] = (1/24)[1, 2, 2, ..., 2, 1].  (4.8)
For quarterly data, (4.7) is summed over the four components of s_j and (4.8) becomes (1/8)[2][4]. This moving average removes seasonality, as Example 4.1 illustrates. The values (y_t − trend) yield s_j + ε_t, or simply s_j for the error-free series in the Example.
Example 4.1
Consider the quarterly series with values
 y_t = 10 + t + s_j,  t = 4(i − 1) + j,
 s_1 = −3,  s_2 = 1,  s_3 = 4,  s_4 = −2.
The effect of applying the moving average is as shown in Table 4.1.

Table 4.1

 (1) Year   (2) Quarter   (3) Series   (4) (1/8)[2][4]   (5) Col. (3) − Col. (4)
 1          1             8            —                 —
 1          2             13           —                 —
 1          3             17           13                4
 1          4             12           14                −2
 2          1             12           15                −3
 2          2             17           16                1
 2          3             21           17                4
 2          4             16           18                −2
 3          1             16           19                −3
 3          2             21           20                1
 3          3             25           21                4
 3          4             20           22                −2
 4          1             20           23                −3
 4          2             25           24                1
 4          3             29           —                 —
 4          4             24           —                 —
The use of a simple moving average, combined with the restriction (4.7), serves to eliminate a quadratic trend rather than a linear one. In general, the order of polynomial removal is increased by one when restriction (4.7) is imposed.
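Example 4.1 can be reproduced directly. The sketch below (Python, not from the text) applies the (1/8)[2][4] average of (4.8) to the error-free series and recovers the seasonal components exactly:

```python
import numpy as np

def centred_ma4(y):
    """Centred four-quarter moving average with weights (1/8)[1, 2, 2, 2, 1],
    i.e. the (1/8)[2][4] average of (4.8); the ends are left as NaN."""
    w = np.array([1.0, 2.0, 2.0, 2.0, 1.0]) / 8.0
    out = np.full(len(y), np.nan)
    for t in range(2, len(y) - 2):
        out[t] = np.dot(w, y[t - 2:t + 3])
    return out

# The error-free series of Example 4.1: y_t = 10 + t + s_j,
# with s = (-3, 1, 4, -2) repeating over the quarters
s = [-3.0, 1.0, 4.0, -2.0]
y = np.array([10.0 + t + s[(t - 1) % 4] for t in range(1, 17)])
trend = centred_ma4(y)   # reproduces the linear trend 10 + t away from the ends
resid = y - trend        # reproduces the seasonal component s_j exactly
```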
Example 4.2
Consider a quarterly series with a quadratic trend, as shown in Table 4.2. If a fixed seasonal effect was added to the series in column (3), its mean under averaging would be zero and it would be added to column (5). Since we have constrained the seasonal components to sum to zero, the values in column (5) should be adjusted to sum to zero. This gives the same result for seasonality as if we had added 1.5 to the centred average in column (4), which would then reproduce the original quadratic exactly.

The procedure for computing the deseasonalised series may be summarised as follows: we assume that (p + 1) complete years of monthly data are available, giving 12(p + 1) observations in all.
(1) Estimate the trend, m_t, using an appropriate moving average.
(2) Compute
 x_t = y_t − m_t  (4.9)
and estimate the seasonal component by
 s_j = x̄_j − x̄,  (4.10)
where
 x̄_j = (1/p) Σ_{i=1}^{p} x_t  and  x̄ = Σ_{j=1}^{12} x̄_j / 12.  (4.11)
Table 4.2

 (1) Year   (2) Quarter   (3) Series   (4) (1/8)[2][4]   (5) Col. (3) − Col. (4)
 1          1             0            —                 —
 1          2             1            —                 —
 1          3             4            5.5               −1.5
 1          4             9            10.5              −1.5
 2          1             16           17.5              −1.5
 2          2             25           26.5              −1.5
 2          3             36           37.5              −1.5
 2          4             49           50.5              −1.5
 3          1             64           65.5              −1.5
 3          2             81           82.5              −1.5
 3          3             100          101.5             −1.5
 3          4             121          122.5             −1.5
 4          1             144          145.5             −1.5
 4          2             169          170.5             −1.5
 4          3             196          —                 —
 4          4             225          —                 —
The t subscript is defined as
 t = 12(i − 1) + j,  j = 7, ..., 12
 t = 12i + j,  j = 1, ..., 6,
the different subscripts being to allow for the observations omitted at the ends of the series.
(3) The deseasonalized series is then y_t − s_j. The following example illustrates the method.
Example 4.3
Table 4.3 gives the quarterly index numbers of the wholesale price of vegetable food in the United Kingdom for the years 1951-58. For arithmetic convenience the scale is multiplied by 10 and the series is then transferred to origin 300 in Table 4.4.
Table 4.3 Quarterly index numbers of the wholesale price of vegetable food in the United Kingdom, 1951-58 (data from the Journal of the Royal Statistical Society for appropriate years; 1867-77 = 100)

               1951    1952    1953    1954    1955    1956    1957    1958
 First quarter 295.0   324.7   372.9   354.0   333.7   323.2   304.3   312.5
 2nd quarter   317.5   323.7   380.9   345.7   323.9   342.9   285.9   336.1
 3rd quarter   314.9   322.5   353.0   319.5   312.8   300.3   292.3   295.5
 4th quarter   321.4   332.9   348.9   317.6   310.2   309.8   298.7   318.4
Table 4.4 Data of Table 4.3 with origin 300, values multiplied by 10

               1951   1952   1953   1954   1955   1956   1957   1958
 First quarter  −50    247    729    540    337    232     43    125
 2nd quarter    175    237    809    457    239    429   −141    361
 3rd quarter    149    225    530    195    128      3    −77    −45
 4th quarter    214    329    489    176    102     98    −13    184
Table 4.5 gives the residuals after elimination of trend by a centred average of fours. The mean values for each quarter (over seven years) are shown in the last column. These means sum to 24.01, with an overall mean of 6.00. Thus the seasonal effects are measured by subtracting 6.00 from the last column, e.g. 68.46 − 6.00. After division by 10 to restore the original scale we have for the seasonal factors in the four quarters
 6.25,  8.62,  −8.84,  −6.03,
which sum to zero as required. The seasonally adjusted series is given by subtracting these values from y_t in Table 4.3. Table 4.6 gives the similar residuals obtained by fitting a seven-point cubic, (1/21)[−2, 3, 6, 7, 6, 3, −2]. The seasonal adjustments are found to be
 6.81,  6.87,  −8.07,  −5.61.
The differences between the two sets of results are not very great. The seasonal effect is perceptible but not very marked.
4.5 Our development has assumed that the trend values should be determined at every point of the series, except the ends, in order to compute the seasonal effects. In fact, Durbin (1963) showed that seasonal effects can be computed directly from the original series; details are given in Kendall et al. (1983, Section 46.40).
4.6 We now consider the multiplicative-seasonal model given in (4.3). Although the model is nonlinear, the component, m_t, is usually estimated by a moving average, as before. Indeed, the only changes in steps (1)-(3) given in Section 4.4 are that the detrended series is now estimated by
 x_t = y_t / m_t  (4.12)
and the seasonal components are defined as
 s_j = x̄_j / x̄,  (4.13)
where x̄_j and x̄ are as in (4.11). Finally, the deseasonalized series is given by y_t / s_j. There is an element of arbitrariness here in that it seems odd to set the mean of the seasonal effects equal to one, rather than their product. However, no method is likely to resolve the problem completely, and this rule seems to work well.
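The multiplicative steps (4.12)-(4.13) may be sketched as follows (a Python illustration on constructed data; the function name and the chosen trend and factors are hypothetical, not from the book):

```python
import numpy as np

def multiplicative_seasonals(y, period=4):
    """Seasonal factors for the multiplicative model: detrend by the ratio
    x_t = y_t / m_t as in (4.12), average the ratios by season as in (4.13),
    and rescale so that the factors average one."""
    # centred moving average with weights (1/(2*period))[1, 2, ..., 2, 1]
    w = np.r_[1.0, np.full(period - 1, 2.0), 1.0] / (2.0 * period)
    h = period // 2
    m = np.full(len(y), np.nan)
    for t in range(h, len(y) - h):
        m[t] = np.dot(w, y[t - h:t + h + 1])
    x = y / m
    sj = np.array([np.nanmean(x[j::period]) for j in range(period)])
    return sj / sj.mean()

# Constructed example: trend 100 + t with seasonal factors (0.9, 1.1, 1.2, 0.8)
season = np.array([0.9, 1.1, 1.2, 0.8])
y = np.array([(100.0 + t) * season[(t - 1) % 4] for t in range(1, 41)])
s_hat = multiplicative_seasonals(y)   # recovers the factors approximately
```

The deseasonalized series is then y divided by the factor for the matching quarter, as in the text.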
4.7 Up to this point we have assumed that the seasonal effects do not change over time. If, for any reason, the seasonal pattern is thought to be changing, the estimates of the effects must also change over time. Several methods have been developed to handle this problem, the best known being the Census
 y^(λ) = (y^λ − 1)/λ,  λ ≠ 0;  y^(λ) = log_e y,  λ = 0,  (5.10)
where, typically, −1 ≤ λ ≤ 1 and the random variable y is such that the probability of a negative value is negligible. Provided μ² > γ_0, an approximation for the variance of the transformed variable is
 var(y^(λ)) = μ^{2(λ−1)} var(y).  (5.11)
Thus if the variance of y appears to increase linearly with the mean, we should use λ = 0.5. If the variance increases quadratically with the mean, λ = 0, or the logarithmic transformation, is appropriate. Box and Cox (1964) estimate λ by maximum likelihood, but a graphical check for the values λ = −1, −0.5, 0, 0.5, 1 will often be sufficiently accurate. In Fig. 5.1, we give the plots of y_t, √y_t and log y_t for the immigration data of Table 1.4. Fluctuations appear greatest just after the turn of the century when immigration peaked. These fluctuations are less noticeable for √y_t, whereas the plot for log y_t suggests that the fluctuations are virtually unrelated to the level of y_t. Accordingly, a log transform would appear to be appropriate for these data.
5.5 The third requirement for stationarity is the stability of the autocorrelations. There are no formal tests of this assumption in general use although, when the observed series is sufficiently long to justify the step, it is worth dividing the series into two parts and comparing the sample autocorrelations computed separately from each part.
The autocorrelation function
5.6 The set of values ρ_k, and the plot of ρ_k against k = 1, 2, ..., are known as the autocorrelation function (ACF). All the autocorrelations must lie in the range [−1, +1] but, in addition, they must satisfy the conditions
 V_k = var( Σ_{j=0}^{k} a_j y_{t-j} ) ≥ 0  (5.12)
for all k and choices of the constants (a_j); a_0 = 1 without loss of generality. For k = 1, (5.12) implies |ρ_1| ≤ 1. For k = 2, it may be shown that V_2 is minimized when a_1 and a_2 are given by
 a_1 = −ρ_1(1 − ρ_2)/(1 − ρ_1²)  and  a_2 = −(ρ_2 − ρ_1²)/(1 − ρ_1²).  (5.13)
Substituting back into (5.12) we obtain the constraint
 ρ_2 − 2ρ_1² + 1 ≥ 0.  (5.14)
Thus if ρ_1 = 0.8, we must have ρ_2 ≥ 0.28. If successive values are highly correlated, there is still a strong correlation between values two time periods apart.
The partial autocorrelation function
5.7 The carry-over effect just observed suggests that it would be useful to look at the partial autocorrelation between y_t and y_{t−k}, after allowing for the effect of the intervening values y_{t−1}, ..., y_{t−k+1}. It follows from the general expression for partial correlations (Kendall and Stuart, 1979, Section 27.35) that the partial autocorrelation between y_t and y_{t−2}, allowing for y_{t−1}, is

corr(y_t, y_{t−2} | y_{t−1}) = (ρ₂ − ρ₁²)/(1 − ρ₁²),        (5.15)

which is exactly the coefficient −a₂ in (5.13). As we shall see in Section 5.14, this is no accident. Higher-order partial autocorrelations may be defined similarly, but we prefer to develop these expressions in a more intuitively appealing manner in Section 5.11.
Example 5.1
(a) If ρ₁ = 0.8 and ρ₂ = 0.28, corr(y_t, y_{t−2} | y_{t−1}) = −1, confirming the constraint given by (5.14).
(b) If ρ₂ = ρ₁², corr(y_t, y_{t−2} | y_{t−1}) = 0, a property which arises naturally in Section 5.9.
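The lag-2 partial autocorrelation (5.15) is easily checked numerically; a minimal sketch (the function name is ours, not from the text):

```python
def pacf_lag2(rho1, rho2):
    """Partial autocorrelation between y_t and y_{t-2}, allowing for y_{t-1},
    as in (5.15): (rho2 - rho1^2) / (1 - rho1^2)."""
    return (rho2 - rho1 ** 2) / (1 - rho1 ** 2)

# Example 5.1(a): rho1 = 0.8 at the boundary rho2 = 2*rho1^2 - 1 = 0.28
print(round(pacf_lag2(0.8, 0.28), 6))   # -1.0
# Example 5.1(b): rho2 = rho1^2 gives a zero partial autocorrelation
print(round(pacf_lag2(0.8, 0.64), 6))   # 0.0
```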
Autoregressive processes
5.8 An important class of linear time-series models is the set of autoregressive schemes. The pth-order scheme, denoted by AR(p), may be written as

y_t = δ + φ₁y_{t−1} + ⋯ + φ_p y_{t−p} + ε_t,        (5.16)

the coefficients φ₁, ..., φ_p being the (auto)regressive coefficients for y_t on y_{t−1}, ..., y_{t−p}, with δ denoting the constant term. The error term, ε_t, is assumed to have the properties

E(ε_t) = 0,   var(ε_t) = σ²,        (5.17a)
cov(ε_t, ε_{t−k}) = 0,   k ≠ 0,        (5.17b)

and

cov(ε_t, y_{t−k}) = 0,   k > 0.        (5.17c)
Condition (5.17c) states that the new error is independent of past values of the process; this assumption is critical to later developments. We shall now examine two important special cases of (5.16), the Markov scheme (p = 1) and the Yule scheme (p = 2).
The Markov scheme
5.9 The Markov, or AR(1), scheme may be written as

y_t = δ + φ₁y_{t−1} + ε_t.        (5.18)
Taking expected values and recalling that E(y_t) = μ given stationarity, we obtain μ = δ + φ₁μ, so that μ = δ/(1 − φ₁).
1 − φ₁ − φ₂ > 0,   1 + φ₁ − φ₂ > 0   and   |φ₂| < 1.        (5.44)
These are precisely the conditions required for stationarity. In general, the
Yule scheme (5.38) gives rise to the auxiliary equation

φ(x) = 1 − φ₁x − φ₂x² = 0,        (5.45)

and the conditions for stationarity are that the roots of (5.45) should be greater than one in absolute value. These conditions are equivalent to those in (5.44) and are often stated as the requirement that the roots of φ(x) lie outside the unit circle.
Yule–Walker equations
5.15 The pair of equations (5.40) are known as the Yule–Walker equations. It is evident from (5.40) that the first two autocorrelations determine the autocorrelation structure of the Yule process completely. In general, the first p autocorrelations determine uniquely the coefficients of an AR(p) scheme; see Section 5.19. If we multiply (5.38) by u_{t−k} and take expectations, we obtain

ρ_k = φ₁ρ_{k−1} + φ₂ρ_{k−2},        (5.46)

recalling that ρ₀ = 1 and ρ₁ = ρ₋₁. Equation (5.46) is a second-order difference equation, whose general solution is described in Appendix 5B. When φ₁² + 4φ₂ > 0, the roots are real and the autocorrelation function is of damped exponential form, with a steady or oscillatory pattern depending on the signs of the roots. When φ₁² + 4φ₂ < 0, the roots are complex and the ACF is a damped sine wave.
Example 5.3
What is the ACF of the Yule scheme u_t = 0.8u_{t−1} − 0.64u_{t−2} + ε_t?
The model has auxiliary equation 1 − 0.8x + 0.64x² = 0, with roots

x = 1.25(cos(π/3) ± i sin(π/3)).

From Appendix 5B we obtain the solution

ρ_j = (0.8)^j sin(jπ/3 + ω)/sin ω,

where ω = 0.46π. Hence

ρ₁ = 0.49,   ρ₂ = −0.25,   ρ₃ = −0.51,

as may be checked directly from (5.41) and (5.46). A typical ACF for the Yule scheme with complex roots is given in Fig. 6.4. As we shall see in Chapter 6, the general shape of the ACF is more important than the detailed algebraic solution.
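The check against (5.46) is easily automated; a minimal sketch (the function name is ours):

```python
def ar2_acf(phi1, phi2, nlags):
    """ACF of a stationary AR(2) scheme via the Yule-Walker recursion (5.46):
    rho_k = phi1*rho_{k-1} + phi2*rho_{k-2}, with rho_0 = 1 and
    rho_1 = phi1/(1 - phi2)."""
    rho = [1.0, phi1 / (1 - phi2)]
    for k in range(2, nlags + 1):
        rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])
    return rho

# Example 5.3: u_t = 0.8 u_{t-1} - 0.64 u_{t-2} + e_t
rho = ar2_acf(0.8, -0.64, 3)
print([round(r, 2) for r in rho[1:]])   # [0.49, -0.25, -0.51]
```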
Fig. 5.3 Plot of 65 terms of a second-order autoregressive (Yule) scheme

5.16
From (5.42), the PACF for the Yule scheme is

φ₁₁ = ρ₁ = φ₁/(1 − φ₂),   φ₂₂ = φ₂,   φ_jj = 0,   j ≥ 3,

so that the plot of the PACF consists only of two spikes.
5.17 In Fig. 5.3 we plot 65 terms from the Yule scheme with φ₁ = 1.1 and φ₂ = −0.5. The error term is uniformly distributed on the range [−9.5, 9.5]. For comparison, Fig. 5.4 gives the values of 60 terms of the harmonic series

u_t = 10 sin(πt/5) + ε_t,        (5.47)
where ε_t is uniformly distributed on [−5, 5]. The sine wave has regular peaks every 10 terms, but the Yule scheme is clearly less regular in its fluctuations.
5.18 In general, it may be shown that the mean distance between peaks is 2π/θ, where 0 < θ < 2π and

cos θ = corr(∇u_t, ∇u_{t−1});        (5.48)

see Kendall et al. (1983, Sections 46.19–20) for details. For the Markov scheme cos θ = −(1 − φ₁)/2 and for the Yule scheme

cos θ = −(1 − φ₁ + φ₂)/2.

Thus for a purely random process, θ = 2π/3, and the mean distance between peaks is 3. Hence, the mean distance between turning points is 1.5, as noted in Section 2.4. For the Yule series plotted in Fig. 5.3, θ ≈ 0.4π, and the mean distance between peaks is about five time periods. From Fig. 5.3, we see that there are 12 peaks, including a pair of adjacent peaks at t = 9 and 10. The first peak occurs at t = 4 and the last at t = 59, giving an observed mean distance of (59 − 3 − 1)/12 = 4.6, where 1 is subtracted for the tie. Even though the distribution is not normal, the observed and theoretical distances between peaks are in close agreement.
General autoregressive schemes
5.19 Analysis for the general AR(p) scheme follows along the same lines as for the Markov and Yule schemes. Thus, we obtain the set of Yule–Walker equations

ρ_k = φ₁ρ_{k−1} + φ₂ρ_{k−2} + ⋯ + φ_pρ_{k−p},        (5.49)

for k = 1, 2, ..., p. The p equations (5.49) may be used to express the φ_k in terms of the ρ_k or vice versa. For k > p, we obtain the same recurrence relation (5.49) so that higher-order autocorrelations may be calculated. The partial autocorrelations are obtained by solving the sets of equations successively for p = 1, 2, 3, ...; that is,
ρ₁ = φ₁₁;

ρ₁ = φ₂₁ + φ₂₂ρ₁
ρ₂ = φ₂₁ρ₁ + φ₂₂;

ρ₁ = φ₃₁ + φ₃₂ρ₁ + φ₃₃ρ₂
ρ₂ = φ₃₁ρ₁ + φ₃₂ + φ₃₃ρ₁
ρ₃ = φ₃₁ρ₂ + φ₃₂ρ₁ + φ₃₃;
and so on. The PACF is given by φ_jj, j = 1, 2, .... Rather than solve the complete set of equations at each stage, Durbin (1960) showed that these could be solved iteratively as

φ_{k+1,k+1} = (ρ_{k+1} − Σ_{j=1}^{k} φ_{kj}ρ_{k+1−j}) / (1 − Σ_{j=1}^{k} φ_{kj}ρ_j),        (5.50)

φ_{k+1,j} = φ_{kj} − φ_{k+1,k+1}φ_{k,k+1−j},   j = 1, 2, ..., k.        (5.51)

For further details, see Kendall et al. (1983, Section 47.15). This procedure is generally known as the Durbin–Levinson algorithm; for a history and description of its other uses in statistics, see Morettin (1984).
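The recursion (5.50)–(5.51) is straightforward to implement; a minimal sketch (names ours), checked here against the Yule scheme of Example 5.3, for which the PACF should cut off after lag 2:

```python
def pacf_from_acf(rho):
    """Partial autocorrelations phi_{kk}, k = 1, 2, ..., from the
    autocorrelations rho[0..K] (rho[0] = 1), via the Durbin-Levinson
    recursion (5.50)-(5.51)."""
    K = len(rho) - 1
    phi = {(1, 1): rho[1]}
    pacf = [rho[1]]
    for k in range(1, K):
        num = rho[k + 1] - sum(phi[k, j] * rho[k + 1 - j] for j in range(1, k + 1))
        den = 1.0 - sum(phi[k, j] * rho[j] for j in range(1, k + 1))
        phi[k + 1, k + 1] = num / den
        for j in range(1, k + 1):
            phi[k + 1, j] = phi[k, j] - phi[k + 1, k + 1] * phi[k, k + 1 - j]
        pacf.append(phi[k + 1, k + 1])
    return pacf

# ACF of the Yule scheme of Example 5.3, built from (5.46)
phi1, phi2 = 0.8, -0.64
rho = [1.0, phi1 / (1 - phi2)]
for _ in range(3):
    rho.append(phi1 * rho[-1] + phi2 * rho[-2])
pacf = pacf_from_acf(rho)
print([round(v, 2) for v in pacf[:2]])   # [0.49, -0.64]
```

The remaining partial autocorrelations are zero to rounding error, as the two-spike PACF of Section 5.16 requires.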
5.20 The stationarity conditions for the AR(p) scheme follow from (5.49) as the requirement that the roots of the auxiliary equation

φ(x) = 1 − φ₁x − φ₂x² − ⋯ − φ_p x^p = 0        (5.52)

should be greater than one in absolute value. If any root is less than one in absolute value, the process is nonstationary. As before, differencing and transformations may be used to induce stationarity.
Moving average schemes
5.21 In Chapter 2 we used moving averages to represent trends. We shall now use them in a rather different way, to model the persistence of random effects over time. Suppose, for example, y_t denotes weekly sales figures and some sales are not recorded until the following week. A possible model would be

y_t = μ + ε_t − θ₁ε_{t−1},        (5.53)

where the term in ε_{t−1} reflects the carry-over from one week to the next. Model (5.53) represents a first-order moving average scheme, or MA(1). The general MA(q) form is

y_t = μ + ε_t − θ₁ε_{t−1} − ⋯ − θ_qε_{t−q}.        (5.54)
5.22 Since E(ε_t) = 0, we see immediately that

E(y_t) = μ.        (5.55)

Further, since the ε_t are uncorrelated, it follows that

var(y_t) = E[(y_t − μ)²] = E[(ε_t − θ₁ε_{t−1} − ⋯ − θ_qε_{t−q})²]
         = E(ε_t²) + θ₁²E(ε²_{t−1}) + ⋯ + θ_q²E(ε²_{t−q})
         = σ²(1 + θ₁² + ⋯ + θ_q²).        (5.56)

By implication, any MA(q) scheme is stationary.
Example 5.4
Find the ACF and the PACF for the MA(1) scheme. Let u_t = y_t − μ and write the MA(1) scheme (5.53) as

u_t = ε_t − θ₁ε_{t−1}.        (5.57)

If we multiply through by u_{t−k} and take expectations, we obtain

E(u_t u_{t−k}) = γ_k = E{u_{t−k}(ε_t − θ₁ε_{t−1})}.

For k = 1, E(u_t u_{t−1}) = E{(ε_{t−1} − θ₁ε_{t−2})(ε_t − θ₁ε_{t−1})}, yielding

γ₁ = −θ₁σ²   and   γ_k = 0,   k ≥ 2.        (5.58)

From (5.56), γ₀ = σ²(1 + θ₁²), so that

ρ₁ = −θ₁/(1 + θ₁²);        (5.59)

clearly, ρ_k = 0, k ≥ 2. Thus the ACF has a single non-zero term at lag one. The PACF may be found from (5.50) and (5.51) as

φ₁₁ = −θ₁/(1 + θ₁²),   φ₂₂ = −θ₁²/(1 + θ₁² + θ₁⁴),   φ₃₃ = −θ₁³/(1 + θ₁² + θ₁⁴ + θ₁⁶),        (5.60)

and so on. The partial autocorrelations decay at an exponential rate towards zero, all being negative if θ₁ > 0 or with alternating signs if θ₁ < 0. This example provides a general idea of the structure of MA schemes. The MA(q) process possesses an ACF with q non-zero values but an exponentially decaying, possibly harmonic, PACF. In this regard its behaviour is the converse of that for AR schemes and we shall exploit this fact in our model selection procedures in the next chapter.
5.23 One final point worth noting about the MA(1) scheme is that the maximum value of |ρ₁| is 0.5. This is another manifestation of the effect noted in Section 5.6; if observations two time periods apart are uncorrelated, adjacent values cannot be 'too strongly' correlated. Various upper limits are available for higher-order MA schemes; see Chanda (1962) and O. D. Anderson (1975).
5.24 Following the approach of Example 5.4, we find that the kth autocorrelation for an MA(q) scheme is
ρ_k = (θ₀θ_k + θ₁θ_{k+1} + ⋯ + θ_{q−k}θ_q)/(1 + θ₁² + ⋯ + θ_q²),   k ≤ q,
    = 0,   k > q,        (5.61)

where θ₀ = −1 by convention.
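Formula (5.61) can be coded directly; a minimal sketch (the function name is ours):

```python
def ma_acf(theta, k):
    """k-th autocorrelation of an MA(q) scheme with coefficients
    theta = [theta_1, ..., theta_q], following (5.61) with theta_0 = -1."""
    q = len(theta)
    if k == 0:
        return 1.0
    if k > q:
        return 0.0
    t = [-1.0] + list(theta)                   # t[j] = theta_j
    num = sum(t[j] * t[j + k] for j in range(0, q - k + 1))
    den = 1 + sum(th ** 2 for th in theta)
    return num / den

# MA(1) with theta_1 = 0.5: rho_1 = -theta/(1 + theta^2), rho_2 = 0
print(ma_acf([0.5], 1))   # -0.4
print(ma_acf([0.5], 2))   # 0.0
```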
Example 5.5 (The Slutzky–Yule effect)
Consider the purely random series y_t = ε_t and suppose that we form a moving average, in the sense of Chapter 2. The averaged series is a finite MA scheme, so that averaging a purely random series induces autocorrelation; this is the Slutzky–Yule effect.

Duality between AR and MA schemes
Consider the Markov scheme

u_t = φu_{t−1} + ε_t,        (5.63)

where |φ| < 1. Using the B-operator (Appendix 3A), u_{t−1} = Bu_t, and (5.63) becomes

(1 − φB)u_t = ε_t        (5.64)

or

u_t = (1 − φB)^{−1}ε_t.        (5.65)

The term on the right in (5.65) may be expanded as a geometric series (Appendix 5A) to give

u_t = (1 + φB + φ²B² + ⋯)ε_t = ε_t + φε_{t−1} + φ²ε_{t−2} + ⋯,        (5.66)

that is, the AR(1) scheme may be represented as an MA scheme of infinite order. Generally, an AR(p) scheme may be written as

φ(B)u_t = ε_t,        (5.67)

where

φ(B) = 1 − φ₁B − φ₂B² − ⋯ − φ_pB^p        (5.68)

and, given stationarity, this becomes

u_t = ψ(B)ε_t = ε_t + ψ₁ε_{t−1} + ψ₂ε_{t−2} + ⋯,        (5.69)

where

ψ(B) = 1 + ψ₁B + ψ₂B² + ⋯.        (5.70)
Expression (5.69) is known as the random-shock form of the model and its coefficients are known, unsurprisingly, as psi-weights.
5.27 In the same way, any finite-order MA scheme may be represented as an AR scheme of infinite extent; however, there is one problem to be overcome. In (5.65), the geometric series expansion is valid because stationarity ensures that |φ| < 1. However, in the MA(1) scheme

u_t = ε_t − θε_{t−1}        (5.71)

or

ε_t = (1 − θB)^{−1}u_t,        (5.72)

no restrictions have been placed on θ. Indeed, we can show that the two MA(1) schemes (5.71) and

u_t = ε_t − θ^{−1}ε_{t−1}        (5.73)

both yield ρ₁ = −θ/(1 + θ²). To represent the MA(1) scheme in AR form, we must impose the invertibility condition, |θ| < 1, to allow the use of the geometric series expansion. Then (5.72) becomes

ε_t = (1 + θB + θ²B² + ⋯)u_t

or

u_t = −θu_{t−1} − θ²u_{t−2} − ⋯ + ε_t.        (5.74)

In general, the MA(q) scheme u_t = θ(B)ε_t, where θ(B) = 1 − θ₁B − ⋯ − θ_qB^q, may be written as

ε_t = [θ(B)]^{−1}u_t = π(B)u_t = u_t − π₁u_{t−1} − π₂u_{t−2} − ⋯,        (5.75)

provided all the roots of the auxiliary equation

θ(x) = 1 − θ₁x − ⋯ − θ_qx^q = 0        (5.76)

are greater than one in absolute value.
Autocorrelation generating function
5.28 An additional benefit derived from the random-shock form of the model is that we can generate the autocorrelations by a different method. Starting with

u_t = ψ₀ε_t + ψ₁ε_{t−1} + ψ₂ε_{t−2} + ⋯,

where ψ₀ = 1, we have

E(u_t u_{t−k}) = γ_k = σ² Σ_{j=0}^{∞} ψ_j ψ_{j+k}.        (5.77)

Define the autocovariance generating function as

G(z) = Σ_{k=−∞}^{∞} γ_k z^k,        (5.78)

where z is a 'dummy variable' and G(z) is of interest solely because the coefficient of z^k is the kth autocovariance, γ_k. From (5.77) and (5.78),

G(z) = σ² Σ_{k=−∞}^{∞} Σ_{j=0}^{∞} ψ_j ψ_{j+k} z^k.        (5.79)

Putting s = j + k this gives

G(z) = σ² Σ_{s=0}^{∞} ψ_s z^s Σ_{j=0}^{∞} ψ_j z^{−j},        (5.80)

since s − j = k and ψ_k = 0, k < 0. In turn, (5.80) becomes

G(z) = σ²ψ(z)ψ(z^{−1}),        (5.81)

where ψ(z) is given by (5.70) with z replacing B. Finally, the autocorrelations may be generated from

G_ρ(z) = G(z)/γ₀.        (5.82)

In using (5.81), ψ(z) = θ(z) for MA schemes and ψ(z) = [φ(z)]^{−1} for AR schemes.
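The ψ-weight sum (5.77) also lends itself to a direct numerical check, truncating the expansion for AR schemes; a minimal sketch taking σ² = 1 (the function name is ours):

```python
def acov_from_psi(psi, k):
    """Autocovariance gamma_k = sigma^2 * sum_j psi_j psi_{j+k}, as in (5.77),
    with sigma^2 = 1 and a finite (truncated) list of psi-weights."""
    return sum(psi[j] * psi[j + k] for j in range(len(psi) - k))

# MA(1), theta = 0.6: psi-weights (1, -theta) give gamma_0 = 1 + theta^2
# and gamma_1 = -theta.
print(round(acov_from_psi([1.0, -0.6], 0), 4))   # 1.36
print(round(acov_from_psi([1.0, -0.6], 1), 4))   # -0.6

# AR(1), phi = 0.5: psi_j = phi^j (truncated), so gamma_k -> phi^k/(1 - phi^2)
psi = [0.5 ** j for j in range(50)]
print(round(acov_from_psi(psi, 1), 4))           # 0.6667
```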
Example 5.7
The MA(1) scheme has ψ(z) = θ(z) = 1 − θz, so that

G(z) = σ²(1 − θz)(1 − θz^{−1}) = σ²(1 + θ² − θz − θz^{−1}).

Hence γ₀ = σ²(1 + θ²) and γ₁ = γ₋₁ = −θσ², as before.

Example 5.8
The AR(1) scheme has φ(z) = 1 − φz, leading to

G(z) = σ²/[(1 − φz)(1 − φz^{−1})].

Using (5A.7) in Appendix 5A, this becomes

G(z) = [σ²/(1 − φ²)][1/(1 − φz) + φz^{−1}/(1 − φz^{−1})].

Since γ₀ = σ²/(1 − φ²), picking out the coefficient of z^k gives γ_k = φ^kγ₀, so that ρ_k = φ^k, as before.
Fig. 6.4 SACF and SPACF for AR(2) scheme with φ₁ = 1.1, φ₂ = −0.5 and n = 100
Figure 6.5 shows the correlograms for the ARIMA(0, 1, 1) process

∇y_t = ε_t − 0.5ε_{t−1};        (6.34)

the sample PACF is less useful, but will often appear to correspond to an AR(1) scheme. This is not altogether surprising, since model (6.34) and an AR(1) scheme with φ = 1 − δ, δ small, will have similar-looking correlograms despite the different theoretical underpinnings. Taking first differences of the data yields the correlograms in Fig. 6.6. These suggest an MA(1) scheme, as they should. Differencing again yields the plots shown in Fig. 6.7. The large negative value of r₁ and the string of negative partial serial correlations suggest that the series has been over-differenced. Figures 6.5–6.7 taken together indicate that plotting the correlograms for the original series and suitable differences will often guide the investigator to an appropriate degree of differencing.

Fig. 6.5 SACF and SPACF for IMA(1), θ = 0.5, n = 100

6.19 More formal tests are available for testing for nonstationarity, developed by Dickey and Fuller (1976); see Fuller (1976, pp. 366–82; 1985) for details. In particular, suppose that the process can be represented by an AR(p) scheme and that we wish to test whether there is a single unit root, that is, whether we need a first-order difference. We may consider the regression model for y_t on (y_{t−1}, y_{t−1} − y_{t−2}, ..., y_{t−p+1} − y_{t−p}), where the coefficient of y_{t−1} is β₁. A reasonable choice for p may be achieved by using stepwise regression, forcing y_{t−1} into the model and allowing later lags in if they are significant. Given the estimate β̂₁ and its standard error, SE(β̂₁), we may employ the test statistic

τ = (β̂₁ − 1)/SE(β̂₁).        (6.36)
Fig. 6.6 SACF and SPACF for IMA(1), θ = 0.5, n = 100, differenced once
However, the distribution of τ is not that of the Student's t-statistic. Tables of τ, developed by Dickey (1975), are given in Fuller (1976, p. 373). To a close approximation, the percentage points are

lower 1%: −3.43 − 0.08c;   upper 1%: 0.60 + 0.03c;
lower 5%: −2.86 − 0.03c;   upper 5%: −0.07 − 0.02c,

where c = 100n^{−1}.
Example 6.3
Model (6.4) may be represented in AR form as

y_t = y_{t−1} − 0.5∇y_{t−1} − 0.25∇y_{t−2} − ⋯ + ε_t.        (6.37)

For the simulated series of length n = 100 we obtained the fitted values

ŷ_t = 0.976y_{t−1} − 0.487∇y_{t−1} − 0.209∇y_{t−2} − 0.214∇y_{t−3},

where the first coefficient had SE = 0.0464. Thus,

τ = (0.976 − 1)/0.0464 = −0.517,        (6.38)
Fig. 6.7 SACF and SPACF for IMA(1), θ = 0.5, n = 100, differenced twice
Fig. 6.8 SACF and SPACF of barley series (Table 1.1)
yielding τ = (0.935 − 1)/0.0416 = −1.56, confirming the need for differencing (recall that n = 73). The correlograms for the differenced series, Fig. 6.9(b), do not give clear indications for a particular model, but we might consider (i) ARMA(1, 1), (ii) AR(3), or even (iii) MA(4). Further refinement of the choice must await the development of estimation procedures in Chapter 7.
6.22 The correlograms for the FT index series (Table 1.6) are given in Fig. 6.10(a). Again, the combination of slow decay in the sample ACF and AR(1)-like behaviour in the sample PACF suggests possible differencing. Further, the stepwise model reduced to y_t on y_{t−1}, yielding τ = (0.911 − 1)/0.0678 = −1.31, confirming the diagnosis. The correlograms for the differenced series in Fig. 6.10(b) suggest either AR(1) or MA(1), with further activity at a lag of 8 periods (or 2 years). We take this analysis further in Section 7.10.
, 1) or an ARMA(1, 4) scheme, where the AR coefficient row includes possible differencing.
6.28 The ESACF is clearly a method with some potential, although it has yet to be extended to seasonal series. Other alternative procedures have been suggested by Gray et al. (1978), Beguin et al. (1981), and Hannan and Rissanen (1982). In addition, several 'automatic' selection procedures have been proposed. We shall examine these in the next chapter after we have discussed estimation procedures.
Table 6.4 ESACF for sheep series (Table 1.2). MA
AR
0 1 2 3 4
0
1
91 42 51 47
76 3 4 15
11
11
63 38 29 18 19
57 27
56 7
54 17
48
3  17
16 14
10
6
16
15 3 9
Table 6.5 Pattern of zeros in ESACF for sheep series. MA
AR
0
1
2
3
0 1 2
26 .A 26
26 0
26
26
26
26
26
26
0
26 26
3 4
26 0
0 0
0 0
0 0 0 0
0 0 0 0
0 0 0 0
26 0 0
6
11 6 5
Exercises
6.1 Suppose that y_t = 6 − t, t = 1, 2, ..., 11. Compute the first four sample autocorrelations using (6.3) and the two modified versions suggested in Section 6.2.
6.2 Verify (6.1) using (6.8), but ignoring the last term.
6.3 Suppose y_t follows an AR(1) scheme with φ = 0.8. Use (6.18) to compute the standard error of r_j for j = 1, 2, ..., 5, 10, and compare these with the limiting values.
6.4 Estimate the standard errors of r_j, j = 1, 2, ..., 5 using (6.17), given

j     1    2    3    4
r_j   0.6  0.3  0.7  0.2

6.5 Verify that (6.32) holds. (Hint: expand var{Σ(y_i − ȳ)}, which is identically zero.)
6.6 Generate simulated data from an AR(1) scheme with n = 100 and φ = 0.7. Compute the sample correlograms and see how well these conform to expectation. Repeat the process for n = 50 and n = 25.
6.7 Repeat Exercise 6.6 for an MA(2) scheme with θ₁ = 0.9 and θ₂ = 0.4.
6.8 Generate simulated data from an ARIMA(1, 1, 0) scheme with φ = 0.6 and n = 100. Examine the correlograms and perform the Dickey–Fuller test. Decide whether differencing is needed, and determine a provisional model for the data. (Clearly, there is no limit to the number of simulation games one can play. The reader is urged to try a variety of combinations to gain familiarity with examining the sample ACF and PACF and, if available, the sample IACF and the ESACF.)
6.9 Generate simulated data from an ARIMA(1, 0, 1) scheme with φ = 0.6 and θ = 0.65. What form of the model do the correlograms suggest? Why? (Hint: what happens to this scheme when φ = θ?)
6.10 Select provisional models for some or all of the following nonseasonal series (tables marked A are in Appendix A): (a) Kendall's simulated AR(2) series (Table A1); (b) UK gross domestic product (Table A2); (c) Wolfer's sunspots data (Table A3); (d) Canadian lynx trappings (Table A4); (e) US interest rates (Table A5); (f) US immigration levels (Table 1.4).
6.11 Select provisional models for some or all of the following seasonal series given in Appendix A: (a) Chatfield and Prothero's sales data (Table A6); (b) UK unemployment (Table A7); (c) UK whisky production (Table A8).
7
Estimation and model checking
7.1 Up to this point we have taken a somewhat intuitive approach to estimation problems, assuming that sample quantities provide reasonable estimates for their population counterparts. However, in order to estimate the parameters of models such as those identified in the previous chapter, we must consider a more formal approach to model fitting. Some of the statistical issues which arise rapidly become quite complex and lie beyond the scope of this book. Therefore, we shall content ourselves with a rather brief discussion and refer the interested reader to Kendall et al. (1983, Chapter 50) and Fuller (1976, Chapter 8). The first part of this chapter considers estimation problems and the central part addresses model checking; that is, deciding whether the selected model is appropriate and, if not, how it should be modified. Finally, we consider the extent to which the modelling process can be 'automated'. Can the decisions regarding model selection be formulated in such a way that the whole operation can be performed by a computer program, without the investigator being directly involved?
Fitting autoregressions
7.2 We begin by considering purely autoregressive, or AR(p), models where the order p is assumed to be known:

y_t = δ + φ₁y_{t−1} + ⋯ + φ_p y_{t−p} + ε_t,        (7.1)

where the ε_t are independent and identically distributed with mean 0 and variance σ². The model formulation is similar to that of ordinary least-squares (OLS) regression save that the 'independent' variables are now lagged dependent variables. Mann and Wald (1943, reproduced in Wald's Collected Papers, 1955) demonstrated that the OLS method is indeed applicable to (7.1) and that the estimators are asymptotically unbiased, consistent and efficient. Straightforward application of the OLS procedure yields the (p + 1)
estimating equations

Σ y_t = (n − p)δ̂ + φ̂₁ Σ y_{t−1} + ⋯ + φ̂_p Σ y_{t−p},
Σ y_t y_{t−j} = δ̂ Σ y_{t−j} + φ̂₁ Σ y_{t−1}y_{t−j} + ⋯ + φ̂_p Σ y_{t−p}y_{t−j},        (7.2)

where the summations are all over the range t = p + 1 to t = n. Equations (7.2) are similar to the sample version of the Yule–Walker equations (5.49) apart from end-effects. When the process lies well inside the stationary region, the differences between the estimates from (5.49) and (7.2) are slight for moderate or large n. However, the Yule–Walker equations assume stationarity, whereas equations (7.2) do not. Indeed, Tiao and Tsay (1983) show that these estimators remain consistent even when the process is nonstationary. In general, therefore, we shall use the OLS estimates. Then, it follows from Mann and Wald (1943) that the vector of estimators

β̂ = n^{1/2}[δ̂ − δ, φ̂₁ − φ₁, ..., φ̂_p − φ_p]        (7.3)

is asymptotically multivariate normal with mean zero and a finite covariance matrix. Further, the mean square error

s² = Σ (y_t − ŷ_t)²/(n − p − 1)        (7.4)

is a consistent estimator for σ², and the covariance matrix of β̂ is validly estimated by s²H^{−1}, where

H = [ n    m′
      m    C  ],        (7.5)

with m′ = (Σ y_{t−1}, ..., Σ y_{t−p}) and the (j, k)th element of C equal to Σ y_{t−j}y_{t−k}.
7.3 The next question of interest is whether specific distributional assumptions about the error process lead to improved estimation procedures. Specifically, we assume that the ε_t are N(0, σ²). For example, when p = 1 and the mean is known (set δ = 0 with no loss of generality), it follows that the distribution of y₁ is N(0, σ²/(1 − φ²)). Thus, the log-likelihood function for (φ, σ²) given (y₁, y₂, ..., y_n) becomes
l = const − (n/2) ln σ² − (1/2σ²) Σ_{t=2}^{n} (y_t − φy_{t−1})² + (1/2) ln(1 − φ²) − (1 − φ²)y₁²/(2σ²).        (7.6)
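The estimating equations (7.2) amount to an ordinary least-squares regression on lagged values, which is easy to set up directly; a minimal sketch (the function name is ours):

```python
import numpy as np

def fit_ar_ols(y, p):
    """OLS fit of the AR(p) scheme (7.1): regress y_t on (1, y_{t-1}, ..., y_{t-p}),
    with summations over t = p+1, ..., n as in (7.2).
    Returns (delta_hat, phi_hat, s2), with s2 divided by n - p - 1 as in (7.4)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = np.column_stack([np.ones(n - p)] +
                        [y[p - j:n - j] for j in range(1, p + 1)])
    Y = y[p:]
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    s2 = resid @ resid / (n - p - 1)
    return beta[0], beta[1:], s2

# Simulated Markov (AR(1)) scheme with delta = 0, phi = 0.7
rng = np.random.default_rng(42)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.7 * y[t - 1] + rng.standard_normal()
delta, phi, s2 = fit_ar_ols(y, 1)
print(round(phi[0], 2))   # close to 0.7
```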
7.4 Most of the work done on estimation in time series involves the normality assumption or the use of least squares. A notable exception is the work of R. D. Martin, who has developed robust estimators using generalized M procedures; see, for example, Martin and Yohai (1985). Robust procedures will be considered in Chapter 15.
Fitting MA schemes
7.5 The least squares equations provide effective estimators for AR processes, so it might be expected that the first q serial correlations would provide good estimators for the MA(q) scheme. Unfortunately, as Whittle (1953) showed, the estimator θ̂ derived from

r₁ = −θ̂/(1 + θ̂²)        (7.7)

is very inefficient, and similar problems arise for q > 1.
7.6 One method of developing improved estimators is due to Durbin (1959) and will be illustrated for q = 1. From (5.74) we may write the MA(1) scheme

y_t = ε_t − θε_{t−1}

as

y_t + θy_{t−1} + θ²y_{t−2} + ⋯ = ε_t.        (7.8)

If (7.8) is treated as an AR(k) scheme for a suitably large choice of k, this gives rise to the estimates φ̂₁, φ̂₂, ..., φ̂_k. By an appeal to the Mann–Wald theorem, Durbin showed that the estimator

θ̂ = Σ_{j=0}^{k−1} φ̂_j φ̂_{j+1} / Σ_{j=0}^{k} φ̂_j²,        (7.9)

where φ̂₀ = 1, is asymptotically efficient if k → ∞ in such a way that (k/n) → 0 as n → ∞.
Fitting ARIMA schemes
7.7 There have been several other proposals for fitting pure MA schemes, but we now focus attention upon the general problem. Box and Jenkins (1976) developed a nonlinear least-squares procedure and this led to a variety of least squares and approximate likelihood solutions. Later, Newbold (1974) developed an exact likelihood procedure which, with improvements in computational efficiency rendered by Ansley's (1979) use of the Cholesky decomposition, has been widely accepted. Other procedures, based on the Kalman filter, are discussed in Section 9.13. The reader willing to accept the technical details may move on directly to Section 7.10.
7.8 To illustrate the basic ideas of the Newbold–Ansley procedure, consider the ARMA(1, 1) scheme

y_t = φy_{t−1} + ε_t − θε_{t−1},   t = 1, 2, ..., n.        (7.10)
We may write this in matrix form as

e = Ly + Xe*,        (7.11)

where e′ = (ε₁, ..., ε_n), y′ = (y₁, ..., y_n) and e* = (ε₀, y₀)′. Here L is the lower triangular matrix

L = [ 1
      (θ − φ)          1
      θ(θ − φ)         (θ − φ)      1
      ⋮                              ⋱
      θ^{n−2}(θ − φ)   ⋯   θ(θ − φ)  (θ − φ)  1 ]        (7.12)

and

X′ = [ θ     θ²     ⋯   θ^n
       −φ    −θφ    ⋯   −θ^{n−1}φ ].        (7.13)

Noting that E(ε_t²) = E(ε_t y_t) = σ² and using Example 5.9, the covariance matrix of e* is

σ²Δ = σ² [ 1    1
           1    (1 + θ² − 2φθ)/(1 − φ²) ].        (7.14)

We may choose a matrix T such that TΔT′ = I₂; here

T^{−1} = [ 1    0
           1    {(θ − φ)²/(1 − φ²)}^{1/2} ].        (7.15)

Multiplying (7.11) through by

[ T    0
  0    I ],

we obtain expressions for e and u = Te*. The random vectors e and u are independent by construction, which enables us to transform from (e*, e) to (u, y); the details are given in Newbold (1974). Finally, we may integrate out e* to obtain the marginal density for y, which provides the exact log-likelihood

l = const − (n/2) ln σ² − (1/2) ln|Z′Z| − S(θ, φ)/(2σ²),        (7.16)

where

Z = [(T^{−1})′X′]′,        (7.17)
S(θ, φ) = (Ly + Zû)′(Ly + Zû)        (7.18)

and

û = −(Z′Z)^{−1}Z′Ly.        (7.19)

Equations (7.16)–(7.19) provide the basis for an iterative search procedure to derive the estimates. Ansley (1979) shows how to speed up the calculations using the Cholesky decomposition.
7.9 The method outlined here can be applied to general ARMA(p, q) models by setting e* = (ε₀, ..., ε_{1−q}, y₀, ..., y_{1−p})′ and redefining (7.12)–(7.15) appropriately. For example, reworking these expressions with p = 1,
Combining (8.17)–(8.21), we can generate the forecasts ŷ_t(k) successively for k = 1, 2, .... Differenced schemes are readily handled by generating the forecasts for w_{t+k} = ∇^d∇_s^D y_{t+k} and then producing the 'integrated' predictions.
Example 8.1
Consider the model

y_t = δ + φy_{t−1} + ε_t.        (8.22)

From (8.17)–(8.21),

ŷ_t(1) = δ + φy_t   and   ŷ_t(k) = δ + φŷ_t(k − 1),   k ≥ 2.

Hence,

ŷ_t(k) = δ(1 + φ + ⋯ + φ^{k−1}) + φ^k y_t = μ(1 − φ^k) + φ^k y_t → μ   as k → ∞.        (8.24)

Quite generally, the stationarity of a process implies that (8.24) holds for the forecasts. Further, from (8.13) and (8.22),

FMSE(k) = σ² Σ_{j=0}^{k−1} ψ_j² = σ²(1 + φ² + ⋯ + φ^{2k−2}) = σ²(1 − φ^{2k})/(1 − φ²),

which approaches σ²/(1 − φ²) as k → ∞; this is var(y_t), the unconditional variance. Again, the limiting FMSE is var(y_t) for any stationary process.
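The forecast recursion and the FMSE expression above can be sketched directly (the function names are ours):

```python
def ar1_forecast(y_t, delta, phi, k):
    """k-step ahead forecast for the AR(1) scheme (8.22), built from the
    recursion y_t(j) = delta + phi * y_t(j-1), starting from y_t."""
    f = y_t
    for _ in range(k):
        f = delta + phi * f
    return f

def ar1_fmse(sigma2, phi, k):
    """FMSE(k) = sigma^2 (1 - phi^{2k}) / (1 - phi^2), as in Example 8.1."""
    return sigma2 * (1.0 - phi ** (2 * k)) / (1.0 - phi ** 2)

# delta = 1, phi = 0.5: the forecasts approach mu = delta/(1 - phi) = 2 and
# the FMSE approaches var(y_t) = sigma^2/(1 - phi^2) = 4/3.
print(ar1_forecast(10.0, 1.0, 0.5, 1))             # 6.0
print(round(ar1_forecast(10.0, 1.0, 0.5, 30), 6))  # 2.0
print(round(ar1_fmse(1.0, 0.5, 30), 6))            # 1.333333
```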
Example 8.2
Consider the model

y_t = μ + ε_t − θε_{t−1}.        (8.25)

Then

ŷ_t(1) = μ − θε_t   and   ŷ_t(k) = μ,   k ≥ 2.

In general, moving average processes have a finite memory property such that ŷ_t(k) = μ, k ≥ q + 1. From (8.13) and (8.25),

FMSE(k) = σ² Σ_{j=0}^{k−1} ψ_j² = σ²,   k = 1,
        = σ²(1 + θ²),   k ≥ 2.

Generally, for an MA(q) scheme,

FMSE(k) = σ²(1 + θ₁² + ⋯ + θ_m²),        (8.26)

where m = min(k − 1, q).
Nonstationary processes
8.11 When the process is nonstationary, the forecasts cannot revert to a mean level, as the mean is not defined. Similarly, the variance cannot converge to a stable value. The following example indicates the general behaviour.
Example 8.3
Consider the model

w_t = ∇y_t = ε_t − θε_{t−1}.        (8.27)

The forecasts for w_t follow directly from Example 8.2, so that those for y_t are given by

ŷ_t(1) = y_t + ŵ_t(1) = y_t − θε_t = (1 − θ)y_t + θŷ_{t−1}(1),
ŷ_t(2) = ŷ_t(1) + ŵ_t(2) = ŷ_t(1),        (8.28)

and ŷ_t(k) = ŷ_t(1) for all k ≥ 2. The forecast does not, and indeed cannot, revert to a long-run mean level. It follows from (8.27) that

y_{t+k} = y_t + w_{t+k} + w_{t+k−1} + ⋯ + w_{t+1}
        = ε_{t+k} + (1 − θ)ε_{t+k−1} + ⋯ + (1 − θ)ε_{t+1} − θε_t,

so that ψ₀ = 1, ψ_j = (1 − θ) for 1 ≤ j ≤ k − 1, and

FMSE(k) = σ²{1 + (k − 1)(1 − θ)²},        (8.29)

a linear function of k. The nonstationarity of the process means that the
prediction intervals continue to widen as k increases. Indeed, if we difference d times, the FMSE is of order k^{2d−1} times σ²; see Exercise 8.6. From Examples 8.1 and 8.3, we see that the behaviour of forecasts for an ARIMA(1, 0, 0) scheme with φ = 1 − δ, δ small, and an ARIMA(0, 1, 1) scheme with θ = δ will be very similar for small k, but diverge with increasing k.
Seasonal models
8.12 The construction of forecasts for seasonal models proceeds in exactly the same way as for the nonseasonal case, but the pattern of the FMSE may be rather different.

Example 8.4
Consider the following model for monthly data:

∇₁₂ y_t = (1 − θB)(1 − ΘB¹²)ε_t.        (8.30)

The forecasts are

ŷ_t(1) = y_{t−11} − θε_t − Θε_{t−11} + θΘε_{t−12},
ŷ_t(k) = y_{t+k−12} − Θε_{t+k−12} + θΘε_{t+k−13},   2 ≤ k ≤ 12,
ŷ_t(13) = ŷ_t(1) + θΘε_t,
ŷ_t(k) = ŷ_t(k − 12),   k ≥ 14.

The random-shock form of (8.30) is

y_t = {1 − θB + (1 − Θ)B¹² − θ(1 − Θ)B¹³ + (1 − Θ)B²⁴ + ⋯}ε_t,        (8.31)

giving the forecast mean square errors as

FMSE(1) = σ²;
FMSE(k) = σ²(1 + θ²),   k = 2, ..., 12;
FMSE(13) = σ²{1 + θ² + (1 − Θ)²};
FMSE(k) = σ²(1 + θ²){1 + (1 − Θ)²},   k = 14, ..., 24;

and so on. The rate of increase of the FMSE is much slower than in Example 8.3, since the nonstationarity here arises only in the seasonal component.
Other forecasting procedures 8.13 Many of the forecasting procedures still used in practice have been developed on rather intuitive grounds. We shall now examine several of these methods and integrate them into our general framework.
Moving averages
8.14 We could just fit a polynomial to the entire series and use this for extrapolation. As indicated in Chapter 2, this is rarely a good idea, and it is usually better to give greater weight to the most recent observations. In Chapter 2, we discussed fitting local polynomials of order p to the last (2m + 1) points of the series, and the tables given in Appendix C give the weights for forecasting one step ahead, under the column headed '0'. Alternatively, once the coefficients {a_j} have been estimated, the k-step ahead forecasts are simply

ŷ_m(k) = â₀ + â₁(m + k) + ⋯ + â_p(m + k)^p,        (8.32)

where the time origin is set at the centre of the (2m + 1) fitted values, as before.
Example 8.5
Suppose we fit a straight line by taking p = 1. It follows that
a_0 = Σ y_j/(2m + 1),
a_1 = 3 Σ jy_j/{m(m + 1)(2m + 1)},
where the sums are over j = −m to j = +m. Thus, the forecasts become
y_m(k) = a_0 + a_1(m + k) = [1/{m(m + 1)(2m + 1)}] Σ {m(m + 1) + 3j(m + k)}y_j,   (8.33)
a linear function of the y_j. When the series is purely random with constant mean, the estimators a_0 and a_1 are uncorrelated, so that
FMSE(k) = [σ²/(2m + 1)]{1 + 3(m + k)²/m(m + 1)},   (8.34)
which increases at the rate k². Of course, for a purely random process, better forecasts would be given by y_m(k) = a_0 = ȳ, with FMSE = σ²/(2m + 1).
8.15 One of the difficulties with the moving average model is that it is difficult to visualise a process which corresponds to (8.32); in turn, this makes it difficult to evaluate the FMSE for nonrandom schemes. Conversely, if we consider a process of the ARIMA class, we could indeed determine the FMSE corresponding to forecasts such as (8.32), but these forecasts would not be optimal. For these reasons, forecasting using local polynomials derived from moving averages is not recommended.
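The line-fit formulas of Example 8.5 are easy to check numerically. A minimal sketch (the window values below are illustrative) that computes the forecast both from the fitted coefficients and from the explicit weights of (8.33):

```python
import numpy as np

def local_linear_forecast(y_window, k):
    """Forecast k steps beyond the end of a (2m+1)-point window by fitting a
    straight line with origin at the window centre, as in (8.32)."""
    n = len(y_window)
    assert n % 2 == 1
    m = (n - 1) // 2
    j = np.arange(-m, m + 1)
    a0 = y_window.sum() / n
    a1 = 3 * (j * y_window).sum() / (m * (m + 1) * n)
    return a0 + a1 * (m + k)

def weight_form(y_window, k):
    """Equivalent computation using the explicit weights of (8.33)."""
    n = len(y_window)
    m = (n - 1) // 2
    j = np.arange(-m, m + 1)
    w = (m * (m + 1) + 3 * j * (m + k)) / (m * (m + 1) * n)
    return (w * y_window).sum()
```

For an exactly linear window both routines extrapolate the line exactly, and they agree for every lead k.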
Exponential smoothing
8.16 An alternative way of giving most weight to the recent past is to consider a weighted average of the form
y_t(1) = Σ_{j=0}^{∞} w_j y_{t−j},   (8.35)
where Σ w_j = 1, and we might choose the weights to be monotonically decreasing. The simplest such choice is to set
w_j = (1 − β)β^j,   (8.36)
with |β| < 1 but, usually, 0 < β < 1. Predictor (8.35) with weights (8.36) is known as an exponentially weighted moving average (EWMA), and the operation is also known as exponential smoothing. What makes the procedure particularly attractive is that (8.35) then satisfies a simple recurrence relation,
since
y_t(1) = (1 − β){y_t + β(y_{t−1} + βy_{t−2} + ···)} = (1 − β)y_t + βy_{t−1}(1).   (8.37)
Thus, to update the forecast, we need only the latest observation and the previous forecast. The k-step ahead forecast is also given by (8.37). When (8.35) is based upon a finite data record of t values, the weights in (8.36) become
w_j = c_t β^j,   0 ≤ j ≤ t − 1,
    = 0,        j ≥ t,
where c_t = (1 − β)/(1 − β^t). We then have
y_t(1) = c_t y_t + (1 − c_t)y_{t−1}(1)   (8.38)
or
y_t(1) = c_t Σ_{j=0}^{t−1} β^j y_{t−j}.   (8.39)
8.17 Popular use of (8.37) often involves choosing a value for α = 1 − β and a start-up value y_1(1). Recommended values for α are typically in the range 0.05 to 0.30, depending on the volatility of the series, whereas a common choice for the start-up value is to set y_1(1) = y_1. Especially when the series is short or β is near one, we can see that this may be a poor procedure, giving undue weight to the first observation. It is better to use (8.39) in such circumstances. For a fuller discussion of start-up values for EWMA schemes, see Ledolter and Abraham (1984). Returning to (8.37), we note that it is equivalent to (8.28) when β = θ; that is, the EWMA corresponds to the forecast function of an ARIMA (0, 1, 1) scheme. We shall now explain why this is so.
8.18 Recall that, for any ARIMA process,
y_{t+1} = y_t(1) + ε_{t+1}.   (8.40)
From (8.37),
(1 − βB)y_t(1) = (1 − β)y_t;
substituting for y_t(1) using (8.40) yields
(1 − βB)y_{t+1} = (1 − β)y_t + (1 − βB)ε_{t+1}
or
∇y_{t+1} = (1 − βB)ε_{t+1},
which is exactly (8.27) when β = θ. Thus, improved EWMA forecasts may be obtained by estimating β and the start-up value using the ARIMA (0, 1, 1) scheme. In addition, the diagnostics discussed earlier allow us to determine whether the scheme is appropriate. This equivalence is discussed further in Chapter 9.
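The recurrence (8.37), the finite-record form (8.39), and the updating identity (8.38) can be sketched as follows (the series values and β below are illustrative):

```python
def ewma_recursive(y, beta, start):
    """One-step forecasts from (8.37): f_t = (1-beta)*y_t + beta*f_{t-1}."""
    f = start
    out = []
    for obs in y:
        f = (1 - beta) * obs + beta * f
        out.append(f)
    return out

def ewma_finite(y, beta):
    """Finite-record form (8.39): c_t * sum_{j=0}^{t-1} beta^j * y_{t-j}."""
    t = len(y)
    c = (1 - beta) / (1 - beta ** t)
    return c * sum(beta ** j * obs for j, obs in zip(range(t), reversed(y)))
```

The finite form needs no start-up value, which is why the text recommends it for short series; it satisfies the recurrence (8.38) exactly.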
Discounted least squares
8.19 A different approach, suggested by Brown (1963), is to emphasise the most recent observations by use of discounted least squares. That is, we use a discount factor, 0 < β < 1, and choose the estimator, a_t, to minimise
Σ_{j=0}^{∞} (y_{t−j} − a_t)²β^j.   (8.42)
This leads to a_t Σ β^j = Σ β^j y_{t−j}, or
a_t = (1 − β) Σ β^j y_{t−j} = (1 − β)y_t + βa_{t−1},   (8.43)
exactly as in (8.37) when the same β is used. If we consider (8.42) for only t terms, forecast function (8.39) results. The advantage of this approach is that it may be extended to polynomial models. For example, when p = 1, we minimise
Σ_{j=0}^{∞} (y_{t−j} − a_{0t} + a_{1t}j)²β^j.   (8.44)
This leads to the estimating equations
Σ y_{t−j}β^j − a_{0t} Σ β^j + a_{1t} Σ jβ^j = 0
Σ jy_{t−j}β^j − a_{0t} Σ jβ^j + a_{1t} Σ j²β^j = 0.
Now Σ β^j = 1/(1 − β), Σ jβ^j = β/(1 − β)², and Σ j²β^j = β(1 + β)/(1 − β)³, so the estimating equations become
(1 − β) Σ y_{t−j}β^j − a_{0t} + a_{1t}β/(1 − β) = 0
(1 − β)² Σ jy_{t−j}β^j − βa_{0t} + a_{1t}β(1 + β)/(1 − β) = 0.
Let
S_1(t) = (1 − β)y_t + βS_1(t − 1) = (1 − β) Σ β^j y_{t−j}   (8.45)
and define a second smoothing operation by
S_2(t) = (1 − β)S_1(t) + βS_2(t − 1), or (1 − βB)S_2(t) = (1 − β)S_1(t).   (8.46)
From (8.45) and (8.46),
(1 − βB)²S_2(t) = (1 − β)²y_t,
or
S_2(t) = [(1 − β)²/(1 − βB)²]y_t = (1 − β)² Σ (j + 1)β^j y_{t−j} = (1 − β)S_1(t) + (1 − β)² Σ jβ^j y_{t−j}.
The estimating equations become, dropping the arguments in t for convenience,
S_1 − a_0 + βa_1/(1 − β) = 0
and
S_2 − (1 − β)S_1 − βa_0 + β(1 + β)a_1/(1 − β) = 0.
In turn, these expressions yield
a_0 = 2S_1 − S_2 and a_1 = (1 − β)(S_1 − S_2)/β.   (8.47)
That is, the forecasting function is
y_t(k) = a_0 + a_1k = (2S_1 − S_2) + k(1 − β)(S_1 − S_2)/β,   (8.48)
a locally linear function. Because of (8.45) and (8.46), this method is sometimes known as double exponential smoothing; form (8.48) gives rise to the alternative name linear exponential smoothing. Consideration of (8.48) with k = 1, together with (8.39), (8.45) and (8.46), leads to the conclusion that the underlying model may be written as
(1 − B)²y_t = (1 − βB)²ε_t;   (8.49)
that is, an ARIMA (0, 2, 2) scheme with a single parameter. For further discussion of such equivalences, see Abraham and Ledolter (1986).
Example 8.6
Suppose that β = 0.8, y_1(1) = S_1(1) = S_2(1) = 20 and y_2 = 25. Then, for single exponential smoothing,
y_2(1) = (0.2)(25) + (0.8)(20) = 21 and y_2(k) = 21, k ≥ 2.
For double exponential smoothing,
S_1(2) = (0.2)(25) + (0.8)(20) = 21
S_2(2) = (0.2)(21) + (0.8)(20) = 20.2,
whence a_0 = 21.8, a_1 = 0.2, and y_2(1) = 22, y_2(k) = 21.8 + 0.2k.
8.20 As with single exponential smoothing, use of (8.48) historically required preselection of β, S_1(1) and S_2(1). Typical values would be 0.02 ≤ 1 − β ≤ 0.20 and S_1(1) = S_2(1) = y_1, although this can be improved upon by using (8.49) and a least-squares procedure; see also Ledolter and Abraham (1984). An excellent review of recent developments in exponential smoothing is provided by Gardner (1985).
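The arithmetic of Example 8.6 can be reproduced directly from (8.45)–(8.48); a minimal sketch:

```python
def double_exp_smoothing_step(y_new, s1_prev, s2_prev, beta):
    """One update of Brown's double exponential smoothing, (8.45)-(8.48)."""
    s1 = (1 - beta) * y_new + beta * s1_prev   # (8.45)
    s2 = (1 - beta) * s1 + beta * s2_prev      # (8.46)
    a0 = 2 * s1 - s2                           # (8.47)
    a1 = (1 - beta) * (s1 - s2) / beta         # (8.47)
    return s1, s2, a0, a1

# Example 8.6: beta = 0.8, S1(1) = S2(1) = 20, y2 = 25
s1, s2, a0, a1 = double_exp_smoothing_step(25.0, 20.0, 20.0, 0.8)
# s1 = 21, s2 = 20.2, a0 = 21.8, a1 = 0.2; forecast y2(1) = a0 + a1 = 22
```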
8.21 The discounted least-squares procedure may be applied to higher-order polynomials; use of a quadratic in (8.44) produces triple exponential smoothing, with an underlying model which is a special case of an ARIMA (0, 3, 3) scheme of the form
(1 − B)³y_t = (1 − βB)³ε_t.   (8.50)
Use of (8.50) or higher orders is not recommended in general, since the quadratic effects can produce rather wild oscillations in the forecast function. Ameen and Harrison (1984) have developed a method of discount weighted estimation which enables different discount factors to be attached to different components, such as intercept, slope or seasonal effect. This considerably increases the flexibility of discounting methods while still allowing the forecast function to incorporate a start-up phase.
Holt's method
8.22 A key feature in fitting local polynomials and in using discounted least squares is the notion that the forecasts should be 'adaptive', in the sense that the low-order polynomials used for extrapolation have coefficients that are modified with each observation. Holt (1957) took this idea a step further by suggesting a forecast function of the form
y_t(k) = a_0(t) + ka_1(t),   (8.51)
where the a_i(t) are updated according to the relations
a_0(t) = α_1 y_t + (1 − α_1){a_0(t − 1) + a_1(t − 1)}   (8.52)
a_1(t) = α_2{a_0(t) − a_0(t − 1)} + (1 − α_2)a_1(t − 1).   (8.53)
Expression (8.52) is a weighted average of the new observation and its previous one-step ahead forecast, as for single exponential smoothing; (8.53) represents a similar average for the slope coefficient. When α_1 = α(2 − α) and α_2 = α/(2 − α), Holt's model is equivalent to double exponential smoothing. Generally, (8.51)–(8.53) are equivalent to the forecast functions for an ARIMA (0, 2, 2) scheme with two MA parameters; see Exercise 8.10. As noted in Gardner (1985), (8.52)–(8.53) may be expressed in error-correction form as
a_0(t) = a_0(t − 1) + a_1(t − 1) + α_1 e_t,   (8.54)
a_1(t) = a_1(t − 1) + α_1α_2 e_t,   (8.55)
where e_t = y_t − y_{t−1}(1). This form is generally easier to use for updating purposes. Values of α_1, α_2, a_0(1), and a_1(1) are required to start; typical values would be 0.02 ≤ α_1, α_2 ≤ 0.20 and a_0(1) = y_1, a_1(1) = y_2 − y_1. When the length of series allows, better values can be obtained by fitting the ARIMA model; see also Ledolter and Abraham (1984).
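The equivalence of the two forms of Holt's updates is easy to confirm numerically; a sketch with both versions (state values and smoothing constants below are illustrative):

```python
def holt_update(y_new, a0, a1, alpha1, alpha2):
    """Error-correction form (8.54)-(8.55) of Holt's method."""
    e = y_new - (a0 + a1)              # one-step ahead forecast error
    a0_new = a0 + a1 + alpha1 * e      # (8.54)
    a1_new = a1 + alpha1 * alpha2 * e  # (8.55)
    return a0_new, a1_new

def holt_update_direct(y_new, a0, a1, alpha1, alpha2):
    """Original form (8.52)-(8.53), for comparison."""
    a0_new = alpha1 * y_new + (1 - alpha1) * (a0 + a1)
    a1_new = alpha2 * (a0_new - a0) + (1 - alpha2) * a1
    return a0_new, a1_new
```

Substituting a_0(t) − a_0(t − 1) = a_1(t − 1) + α_1e_t into (8.53) recovers (8.55), so the two routines always agree.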
The Holt–Winters seasonal model
8.23 When the data exhibit seasonal behaviour, several alternatives to ARIMA models exist. A direct extension of Holt's method, due to Winters (1960) and often termed the Holt–Winters method, is to consider the forecast function
y_t(k) = {a_0(t) + ka_1(t)}c(t + k − s),   (8.56)
where c(t) is a multiplicative seasonal effect and s is the number of points in the year at which the series is observed (e.g. 12 for monthly data). The updating formulae are
a_0(t) = α_1 y_t/c(t − s) + (1 − α_1){a_0(t − 1) + a_1(t − 1)}   (8.57)
a_1(t) = α_2{a_0(t) − a_0(t − 1)} + (1 − α_2)a_1(t − 1)   (8.58)
c(t) = α_3 y_t/a_0(t) + (1 − α_3)c(t − s).   (8.59)
There is also an additive version of the Holt–Winters model where c(t + k − s) is added, rather than multiplied, in (8.56). Expressions (8.57) and (8.59) then involve the differences y_t − c(t − s) and y_t − a_0(t) rather than the ratios. The multiplicative version is nonlinear and clearly not an ARIMA scheme. The additive version of (8.56)–(8.59) was expressed in ARIMA form by McKenzie (1976), but in a form that would not be identified by any standard procedures. However, the additive model may be expressed in overdifferenced form as
∇∇_s y_t = (1 − θ_1B − θ_2B² − θ_sB^s − θ_{s+1}B^{s+1} − θ_{s+2}B^{s+2})ε_t = θ(B)ε_t,   (8.60)
overdifferenced in that θ(B) in (8.60) has a unit root. If a factor (1 − B) is removed from both sides of (8.60), McKenzie's form results. One advantage of expression (8.60) is that the forecast mean square error can be developed directly using the approach of Section 8.8. The simpler Holt–Winters model, without a slope component, is considered explicitly in Exercises 8.9 and 8.11. Chatfield (1978) considers parameter estimation and start-up values for the Holt–Winters multiplicative scheme, noting that the 'recommended' values of 0.02 ≤ α_i ≤ 0.20 are frequently inappropriate; also see Gardner (1985).
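As a concrete sketch of the multiplicative recursions (8.57)–(8.59) (the parameter values and state below are illustrative, not from the text; seasonal factors are stored one per season):

```python
def holt_winters_mult_update(y_new, a0, a1, c, t, alphas):
    """One update of the multiplicative Holt-Winters recursions (8.57)-(8.59).
    c holds one factor per season; the season of time t is t % len(c)."""
    al1, al2, al3 = alphas
    s = len(c)
    c_old = c[t % s]                                        # c(t - s)
    a0_new = al1 * y_new / c_old + (1 - al1) * (a0 + a1)    # (8.57)
    a1_new = al2 * (a0_new - a0) + (1 - al2) * a1           # (8.58)
    c[t % s] = al3 * y_new / a0_new + (1 - al3) * c_old     # (8.59)
    return a0_new, a1_new

def hw_forecast(a0, a1, c, t, k):
    """Forecast function (8.56): (a0 + k*a1) * c(t + k - s), stored mod s."""
    s = len(c)
    return (a0 + k * a1) * c[(t + k) % s]
```

If an observation is exactly level times seasonal factor, the update leaves the level, slope, and factor unchanged, which is a useful sanity check.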
Harrison's seasonal model
8.24 One of the potential problems of the Holt–Winters procedures is the stability of the seasonal factors. Numerical studies by Sweet (1985) suggest that this is not a major problem when s = 4 but is much more troublesome when s = 12. One solution is to smooth the seasonal factors. Harrison (1965) modified the Holt–Winters model by expressing the seasonal element in terms of harmonics, similar to Burman's treatment of seasonals described in Sections 10.33–34. Let the forecast function be
y_t(k) = a_0(t) + ka_1(t) + c_{j+k}(t),
where the trend values are given by (8.54) and (8.55) with α_1 = α(2 − α) and α_1α_2 = α²; that is, Brown's scheme. The seasonal index j is defined modulo (s); that is, j = j_2 when t = sj_1 + j_2, and the seasonal elements are
c_j(t) = Σ_r {g_r(t)cos rλ_j + h_r(t)sin rλ_j},   (8.61)
with λ_j = (2π/s)(j − 1) − π, for j = 1, 2, .... Recall that cos(2π + λ) = cos λ, and the same for the sine. The summation in (8.61) runs over r = 1, 2, ..., M, where M = s/2 when s is even and (s − 1)/2 when s is odd; h_M = 0 when s is even. This construction ensures that Σ_{j=1}^{s} c_j(t) = 0. Further, the seasonals are smoothed by retaining only those harmonics found to be significant. The coefficients in (8.61) are updated by:
g_r(t) = g_r(t − 1) + β*e_t′ cos rλ_j,   (8.62a)
h_r(t) = h_r(t − 1) + β*e_t′ sin rλ_j,   (8.62b)
where β* = 2β/s, β being a smoothing constant, and e_t′ = y_t − a_0(t) − c_j(t − 1). The constraint Σ c_j(t) = 0 is not met by the Holt–Winters seasonals, which is sometimes seen as a weakness. Roberts (1982) gives a modified HW scheme that incorporates this constraint.
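The zero-sum property of harmonic seasonals is easy to verify numerically. A small sketch (the coefficient values are arbitrary; the phase convention for λ_j varies between presentations, and the version below uses λ_j = 2πj/s, for which the zero-sum property also holds):

```python
import math

def seasonal_components(g, h, s):
    """Seasonal elements built from harmonics as in (8.61); g[r], h[r] are the
    coefficients for harmonic r = 1, ..., M, with lam_j = 2*pi*j/s."""
    M = s // 2 if s % 2 == 0 else (s - 1) // 2
    c = []
    for j in range(1, s + 1):
        lam = 2 * math.pi * j / s
        c.append(sum(g[r] * math.cos(r * lam) + h[r] * math.sin(r * lam)
                     for r in range(1, M + 1)))
    return c
```

Because each cosine and sine sums to zero over a full period, the s seasonal elements sum to zero for any choice of coefficients (with h_M = 0 when s is even).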
Evaluating forecasting methods
8.25 The best way to gauge the reliability of a forecasting method is to consider its performance over a period of time beyond that for which the model was fitted; residuals from fitted values have been found to be generally unreliable as a basis for evaluating methods, since a particular model may fit historical data well but be unreliable for extrapolation purposes. As an illustration, we used the sheep series (Table 1.2); the first 69 observations were used for model fitting and then the values for 1936–39 were forecast using 1935 as the forecast origin. The results are given in Table 8.1, together with the prediction intervals. Similar sets of forecasts may
Table 8.1 Forecasts for 1936-39 for the sheep series, using the ARIMA (3, 1, 0) model fitted over 1867-1935 with 1935 as forecast origin

Year   Forecast   95% lower   95% upper   Actual   Absolute error   Percentage error
1936     1 700       1 571       1 830     1 665         35               2.1
1937     1 777       1 544       2 011     1 627        150               9.2
1938     1 803       1 501       2 105     1 791         12               0.7
1939     1 779       1 449       2 108     1 797         18               1.0
be generated by other methods, and a summary of the errors for several different techniques appears in Table 8.2. These errors may be summarised using the mean square error (FMSE), mean absolute error (FMAE), and mean absolute percentage error (FMAPE) criteria introduced earlier; the results are given in Table 8.3. This procedure is followed by way of illustration only; in general, we should follow the steps outlined in Section 8.28 below. Finally, to assist comparisons, the coefficients for the four y_{t−k} values in y_t(1)
Table 8.2 Actual values and forecast errors for seven different forecasting procedures for the sheep series 1936-39; forecasts made using data for 1867-1935 with 1935 as forecast origin

Year                                     1936    1937    1938    1939
Actual value                            1 665   1 627   1 791   1 797
Forecast errors:
ARIMA (3, 1, 0)                            35     150      12      18
ARIMA (2, 0, 0)                             2      69      63      40
SES Default (α = 0.200)                    22      60     104     110
SES Fitted (α = 1.515)                     16      54     110     116
DES Default (α = 0.106)                    10      26     140     148
DES Fitted (α = 0.762)                     68      76     287     339
Holt Default (α_1 = α_2 = 0.106)           87      51     217     225
Holt Fitted (α_1 = 1.503, α_2 = 0.015)      9      37     137     153

Notes:
(1) SES denotes single exponential smoothing and DES is double exponential smoothing.
(2) 'Default' means parameter values were set to default values; in the SAS procedure FORECAST the default values are α = 0.2 for SES and β (or 1 − α_1) = (0.8)^{0.5} = 0.894 for DES and for Holt's method.
(3) 'Fitted' parameter values for SES and Holt's methods are given by fitting the corresponding ARIMA models; these estimates lie outside the usual ranges for these parameters.
Table 8.3 Comparison of forecast methods using different error criteria

Model             (FMSE)^{1/2}    FMAE    FMAPE
ARIMA (3, 1, 0)       78.1        54.1     3.3
ARIMA (2, 0, 0)       50.9        43.5     2.5
SES: Default          82.3        74.0     4.2
SES: Fitted           84.9        74.0     4.2
DES: Default         103.0        81.1     4.6
DES: Fitted          227.8       192.5    10.9
Holt: Default        164.2       145.0     8.3
Holt: Fitted         104.3        83.8     4.7
Table 8.4 The weights assigned to the first four AR terms by each method [in y_t(1)]

Method             y_t    y_{t−1}    y_{t−2}    y_{t−3}
ARIMA (3, 1, 0)   1.50      0.77       0.05       0.32
ARIMA (2, 0, 0)   1.40      0.50       0.00       0.00
SES: Default      0.20      0.16       0.13       0.10
SES: Fitted       1.51      0.78       0.40       0.21
DES: Default      0.21      0.18       0.15       0.12
DES: Fitted       0.76      0.22       0.19       0.08
Holt: Default     0.12      0.11       0.11       0.10
Holt: Fitted      1.53      0.78       0.40       0.21
are given in Table 8.4. We first compare results for the sheep series, then turn to more general issues.
8.26 The results in Table 8.1 are rather encouraging, although the prediction intervals appear wide at first sight. However, the ARIMA (3, 1, 0) model yields step ahead forecasts generated for each forecast origin. At one time this was a very tedious endeavour, but with modern computing facilities it is no longer unduly onerous. Finally, several criteria should be used to evaluate performance at each step ahead, as well as the overall averages which we used in Tables 8.3 and 8.7.
Table 8.5 Forecasts for 1970 for the airlines series, using the ARIMA (0, 1, 1)(0, 1, 1)_12 model fitted over 1963-69 with December 1969 as forecast origin
Month        Forecast   95% lower   95% upper   Actual
January        10 796      9 334      12 259    10 840
February       10 396      8 910      11 882    10 436
March          12 411     10 902      13 920    13 589
April          11 660     10 128      13 192    13 402
May            13 356     11 802      14 910    13 103
June           14 679     13 102      16 255    14 933
July           14 262     12 663      15 860    14 147
August         14 744     13 124      16 363    14 057
September      15 592     13 951      17 233    16 234
October        12 933     11 271      14 596    12 389
November       11 381      9 699      13 064    11 595
December       12 553     10 849      14 256    12 772
Table 8.6 Errors in forecasting the airlines series for 1970; forecasts made using data for 1963-69, with December 1969 as forecast origin

Month         Error   % error    Error   % error    Error   % error
January          44      0.4       881      8.1       114      1.1
February         40      0.4       851      8.1        34      0.3
March         1 178      8.7     2 172     16.0     1 146      8.4
April         1 742     13.0     2 352     17.5     1 237      9.2
May             253      1.9       616      4.7       657      5.0
June            254      1.7     1 215      8.1       213      1.4
July            115      0.8       764      5.4       659      4.7
August          687      4.9       293      2.1     1 184      8.4
September       642      4.0     1 890     11.6       375      2.3
October         544      4.4       540      4.4       729      5.9
November        214      1.8     1 228     10.6        83      0.7
December        219      1.7     1 816     14.2       373      2.9

Notes: (1) The ARIMA model is (0, 1, 1)(0, 1, 1)_12; the first pair of columns gives its errors (actual minus forecast from Table 8.5).
          | φ_1           |
G_t = G = |  ⋮   I_{r−1}  |,   (9.20)
          | φ_r    0′     |

with η_t = (1, −θ_1, ..., −θ_{r−1})′ε_t and var(η_t) = σ²Q_t. In (9.20), 0 is an (r − 1) × 1 vector of zeros and I_{r−1} is the identity matrix of order (r − 1). If p < r or q + 1 < r, the corresponding φ or θ coefficients are set equal to zero.

Example 9.4
Consider the ARIMA model with p = 2 and q = 1:
y_t = φ_1 y_{t−1} + φ_2 y_{t−2} + ε_t − θ_1 ε_{t−1}.   (9.21)
Representation (9.18)–(9.20) has r = 2 and
μ_t = (μ_{1t}, μ_{2t})′,   (9.22)
G_t = G = | φ_1  1 |
          | φ_2  0 |
and
Q_t = |  1     −θ_1  |
      | −θ_1   θ_1²  |.   (9.23)
Several features are worth noting from this example. The matrices G_t and Q_t involve the AR and MA parameters, so that our present discussion involves updating the system given these parameters. We shall consider parameter estimation briefly in Section 9.13. Secondly, we note that the covariance matrix σ²Q_t is singular; this does not pose any problems since, as we shall note, no matrix inversions are necessary. Finally, we note that the representation for an AR(2) scheme would follow by setting θ_1 = 0, or an ARMA(1, 1) by setting φ_2 = 0; in each case, the number of state variables is r = max(p, q + 1) and the element of redundancy does not affect the basic approach.
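A small sketch constructing the matrices of (9.20) for general orders (the helper name is ours, and the form of Q follows from η_t = (1, −θ_1, ..., −θ_{r−1})′ε_t):

```python
import numpy as np

def harvey_matrices(phi, theta):
    """G and Q of (9.18)-(9.20) with r = max(p, q+1). Since
    eta_t = (1, -theta_1, ..., -theta_{r-1})' * eps_t, Q = v v'."""
    r = max(len(phi), len(theta) + 1)
    phi = np.pad(np.asarray(phi, float), (0, r - len(phi)))
    theta = np.pad(np.asarray(theta, float), (0, r - 1 - len(theta)))
    G = np.zeros((r, r))
    G[:, 0] = phi                    # first column holds the AR coefficients
    G[:-1, 1:] = np.eye(r - 1)       # I_{r-1} above the zero row
    v = np.concatenate(([1.0], -theta))
    Q = np.outer(v, v)
    return G, Q
```

For p = 2, q = 1 this reproduces the matrices of (9.22)–(9.23); setting θ_1 = 0 or φ_2 = 0 gives the AR(2) and ARMA(1, 1) special cases noted above.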
Recursive updating
9.7 A major attraction of the state-space approach is the ease with which estimates and forecasts may be updated. We state the general results here and then illustrate them with an example. For more detailed discussions, see Harvey (1981, pp. 104-110). Consider the process specified by (9.1)–(9.6); that is,
y_t = x_t′μ_t + δ_t   (9.24)
μ_t = G_tμ_{t−1} + η_t,   (9.25)
where δ_t ~ N(0, σ²h_t) and η_t ~ N(0, σ²Q_t), with all appropriate correlations zero and h_t, Q_t known. Let m_{t−1} denote the estimator for μ_{t−1} at time t − 1 and
m_{t−1} ~ N(μ_{t−1}, σ²P_{t−1}),   (9.26)
where P_{t−1} is assumed known. The key steps in the recursive scheme are as follows:
(1) Calculate the one-step ahead forecast for the state variables from (9.25). We write this as
m_{t/t−1} = G_t m_{t−1}.   (9.27)
(2) From (9.25) and (9.26), this has covariance matrix σ²R_t, where
R_t = G_tP_{t−1}G_t′ + Q_t.   (9.28)
(3) From (9.24) and (9.27), the one-step ahead forecast for the next observation is
y_{t−1}(1) = x_t′G_t m_{t−1}.   (9.29)
(4) The one-step ahead forecast error is
e_t = y_t − y_{t−1}(1) = y_t − x_t′m_{t/t−1},   (9.30)
with FMSE σ²f_t, where
f_t = h_t + x_t′R_t x_t.   (9.31)
(5) The updated estimator m_t for μ_t is then given by
m_t = m_{t/t−1} + k_t e_t,   (9.32)
where k_t is sometimes known as the (Kalman) gain vector (or matrix in the multivariate case):
k_t = R_t x_t/f_t.   (9.33)
It then follows that the distribution of m_t is
m_t ~ N(μ_t, σ²P_t)   (9.34)
with
P_t = R_t − R_t x_t x_t′R_t/f_t.   (9.35)
Finally, the one-step ahead forecast for y_{t+1} is
y_t(1) = x_{t+1}′G_{t+1}m_t.   (9.36)
The forecasts are unbiased and the forecast mean square error is
σ²(h_{t+1} + x_{t+1}′R_{t+1}x_{t+1}).   (9.37)
9.8 The effect of equations (9.26)–(9.37) is that, given the starting values m_{t−1} and the new observation y_t, we can derive the new estimate m_t and hence the new forecast y_t(1). The updating expressions (9.27)–(9.33) enable us to move from (m_{t−1}, P_{t−1}, y_{t−1}(1)) to the updated values (m_t, P_t, y_t(1)). Furthermore, this set of calculations may be executed without any matrix inversion operations, thereby enabling high speed and considerable numerical accuracy to be achieved. The overall framework seems cumbersome, but the set of updating equations is so flexible that a large number of different problems can be handled in this way. In order to see what is happening, let us consider a specific example.
Example 9.5
Consider the state-space scheme formulated in Example 9.2:
y_t = μ_t + δ_t
μ_t = μ_{t−1} + η_t
with h_t = 1, Q_t = q = σ_η²/σ², x_t = G_t = 1, and scalars P_t = p_t, R_t = r_t. Then, from (9.27)–(9.33), we obtain
m_{t/t−1} = m_{t−1},  r_t = p_{t−1} + q,  y_{t−1}(1) = m_{t−1},  e_t = y_t − m_{t−1}  and  f_t = 1 + r_t,
leading to m_t = m_{t−1} + k_t e_t, where k_t = r_t/f_t, so that
m_t = m_{t−1} + r_t(y_t − m_{t−1})/f_t.   (9.38)
Further, from (9.35), p_t = r_t − r_t²/f_t, and finally,
y_t(1) = m_t.   (9.39)
Thus, we may start from time zero with values m_0 and p_0 and generate updated estimates and new forecasts as each observation becomes available.
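The scalar recursions of Example 9.5 are short enough to sketch in full; the large p_0 below mimics the diffuse start-up discussed in Section 9.10, and the fixed-point iteration for the steady state is our own device:

```python
def local_level_filter(y, q, m0=0.0, p0=1e6):
    """Scalar Kalman recursions of Example 9.5 for the local level model."""
    m, p = m0, p0
    for obs in y:
        r = p + q                  # (9.28)
        f = 1 + r                  # (9.31)
        e = obs - m                # (9.30)
        m = m + r * e / f          # (9.38)
        p = r - r * r / f          # p_t = r_t - r_t^2 / f_t
    return m, p

def steady_state_alpha(q):
    """Steady-state smoothing constant (9.41): alpha = (p+q)/(1+p+q),
    with p the fixed point of p = r - r^2/(1+r), r = p + q."""
    p = 1.0
    for _ in range(200):
        r = p + q
        p = r - r * r / (1 + r)
    return (p + q) / (1 + p + q)
```

For q = 0.5 the fixed point is p = 0.5, giving α = 0.5; as in (9.40), the filter then behaves exactly like an EWMA with that smoothing constant.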
Equations (9.38) and (9.39) combine to give a forecast that is a weighted average of m_{t−1} and y_t, but the weights change over time. However, if the process approaches a steady state wherein p_t → p, we arrive at the forecast function
y_t(1) = m_t = (1 − α)m_{t−1} + αy_t,   (9.40)
where
α = (p + q)/(1 + p + q).   (9.41)
Expression (9.40) is the forecast function for the ARIMA (0, 1, 1) scheme, in accordance with Example 9.2.
9.9 A further benefit obtained from the state-space formulation is that the k-step ahead estimator is just
y_t(k) = x_{t+k}′G_{t+k}G_{t+k−1} ··· G_{t+1}m_t,   (9.42)
and the FMSE may be obtained in similar fashion.
9.10 In general, the forecasting system may be started up by obtaining the estimates from the first s (≥ r) observations when μ is an (r × 1) vector. Alternatively, rather than obtain such estimates explicitly, we may simply start off the iterative process with P_0 = cI, where c is a large, but finite, number. This is similar to using a diffuse prior in a Bayesian setting. The reason for using a large c-value is to ensure that the effect of the initial conditions dies away rapidly, while maintaining the numerical stability of the updating process over early observations. It is, in effect, a painless way of developing a starting solution from the first r observations, and it solves the problem of start-up values considered earlier in Section 8.17.
Properties of the state-space approach
9.11 An immediate benefit of the state-space formulation is that we may generate one-step ahead recursive residuals:
e_{t+1} = y_{t+1} − y_t(1).   (9.43)
These residuals may be used to check for changing structure in the process over time. This feature is exploited by Brown et al. (1975) in their development of cusum test statistics to check for the constancy of regression coefficients over time.
9.12 Another feature of the state-space formulation is that if we restrict attention to the special case given in Example 9.1, we arrive at updating formulae for the multiple regression model, as developed originally by Plackett (1950). Advances in on-line digital data recording make such updating procedures valuable, whether in aircraft guidance, medical monitoring (cf. Smith and West, 1987), or other continuously operating systems.
Estimation of ARIMA models
9.13 Given that ARIMA models may be represented in state-space form, it is possible to develop estimators in recursive fashion once estimators for the unknown variance elements are included. Harvey and Phillips (1979) derived an exact maximum likelihood estimation procedure for ARIMA schemes based upon Harvey's state-space formulation (see Section 9.6); Gardner et al. (1980) provide a computer algorithm to implement this procedure. The procedure appears to be numerically stable and to be competitive with the standard ARIMA algorithms in terms of running time. One advantage of this approach is the ability to handle missing observations, since we may simply 'update' the estimates; that is, we may replace the missing observation by its one-step ahead forecast. For details of the procedure, see Harvey and Pierse (1984) and Jones (1985).
Structural models
9.14 Thus far in this chapter, we have sought to link state-space models to ARIMA schemes. However, there are obvious attractions in developing such models directly. In particular, Harvey (1984, 1985) and Harvey and Durbin (1986) consider a class of structural models built up as follows. The observation equation is
y_t = μ_t + γ_t + ε_t,   (9.44)
where μ_t, γ_t and ε_t represent trend, seasonal, and irregular components, in much the spirit of Chapters 3 and 4. The state equations for the trend are
level:  μ_t = μ_{t−1} + β_{t−1} + η_{1t},   (9.45)
slope:  β_t = β_{t−1} + η_{2t},   (9.46)
that is, the same as for Holt's method in Section 8.22. The preferred approach for the seasonal component (with s seasons) is the trigonometric version, defined in a similar fashion to Harrison's (1965) scheme in Section 8.24. Let s = 2M or 2M + 1 according as s is even or odd, and then write
γ_t = Σ_{j=1}^{M} γ_{jt}.   (9.47)
Given λ_j = 2πj/s, we have the updating scheme
| γ_{jt}  |   |  cos λ_j   sin λ_j | | γ_{j,t−1}  |   | ω_{jt}  |
| γ*_{jt} | = | −sin λ_j   cos λ_j | | γ*_{j,t−1} | + | ω*_{jt} |,   (9.48)
for j = 1, ..., M, with γ*_{Mt} = 0 for s even. As in our earlier development, the η's are mutually independent white noise errors. In the basic structural model, it is assumed that all the errors are normally distributed with constant variances:
η_{it} ~ IN(0, σ_i²),  i = 1, 2, 3,  and  ω_{jt} (or ω*_{jt}) ~ IN(0, σ_ω²).   (9.49)
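The rotation in (9.48) is what makes each harmonic pair trace out a sinusoid; a minimal sketch of one deterministic step (the ω errors are set to zero here):

```python
import math

def rotate_seasonal(gamma, gamma_star, lam):
    """One deterministic step of (9.48): the pair (gamma, gamma*) is rotated
    through the angle lam, so amplitude is preserved."""
    g = math.cos(lam) * gamma + math.sin(lam) * gamma_star
    gs = -math.sin(lam) * gamma + math.cos(lam) * gamma_star
    return g, gs
```

Applying the step s times with λ_j = 2πj/s returns the pair to its starting value, which is why each γ_{jt} repeats with period s/j in the absence of shocks.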
The variances in (9.49) are known as the hyperparameters of the model.
9.15 The estimation of structural models follows the general ideas outlined in Section 9.13; for further details, see Harvey and Peters (1984). The structural model as described here has been implemented in the computer program STAMP (see Appendix D). This system allows the user to choose between seasonal models or, of course, to drop γ_t for nonseasonal series. Individual variances (σ_i²) can be examined to determine whether the corresponding
Table 9.1 Structural model for airlines data

(a) Variance (or hyperparameter) estimates

                 Stochastic slope           Fixed slope
Component        Estimate     t-ratio      Estimate     t-ratio
level            2.24 x 10^4    1.25       1.69 x 10^4    1.55
trend            50.1           0.72       -              -
seasonal         1.89 x 10^3    2.32       3.05 x 10^3    2.68
irregular        1.80 x 10^5    2.74       1.41 x 10^5    2.18

(b) Component estimates at 12/69

                     Stochastic slope          Fixed slope
Component            Estimate    t-ratio      Estimate    t-ratio
level                12 565       44.1        12 441       3.58
slope                    75.3      2.14           53.5    (6.49)*
harmonics (j, j*):
(1)                   1 431        7.97        1 402       7.30
(1*)                  1 289        6.80        1 381       6.69
(2)                     151        0.90           87       0.48
(2*)                    268        1.54          284       1.48
(3)                       5        0.03           49       0.27
(3*)                    361        2.11          346       1.83
(4)                   1 060        6.41        1 156       6.41
(4*)                     86        0.50          130       0.69
(5)                     174        1.05          206       1.14
(5*)                    276        1.61          324       1.70
(6)                     181        1.28          200       1.28

* t-statistic not comparable with stochastic trend case.

(c) Forecasts for 1970 from 12/69

Month    Actual    Forecast    Standardized residual
Jan.     10 840     10 838            0.00
Feb.     10 436     10 399            0.05
Mar.     13 589     12 808            0.98
Apr.     13 402     11 202            2.69
May      13 103     13 602            0.59
June     14 933     14 997            0.07
July     14 147     14 360            0.24
Aug.     14 057     14 828            0.85
Sept.    16 234     16 010            0.24
Oct.     12 389     13 098            0.75
Nov.     11 595     11 567            0.03
Dec.     12 772     12 945            0.18
components should be stochastic or deterministic. Finally, we may use t-tests to examine the magnitudes of individual components.
9.16 As an illustration of this approach, we once again consider the airlines data given in Table 1.3. The full model (9.44)–(9.48) was fitted for the period 1963-69 and forecasts generated for 1970. The results are given in Table 9.1 and Figs 9.1-9.3. Part (a) of Table 9.1 indicates that the irregular (as expected) and seasonal components are clearly stochastic, although the level and slope could conceivably be treated as deterministic. The results for a fixed slope component are also given in Table 9.1(a); the net effect is to increase the stochastic term for the seasonals, suggesting that some trend variation is now being treated as seasonal. Part (b) of Table 9.1 gives the values of the components at the end of the fitting period. The results for the two models are very similar. As expected, the level and slope terms are significant, as are the harmonics for the yearly cycle (j, j* = 1). The highly significant harmonic at j = 4 is unexpected and indicates a changing seasonal pattern or, possibly, the effects of outliers. We return to the discussion of this point in Section 13.10.
9.17 An advantage of the structural approach is that, once the hyperparameters have been estimated, we can go back through the series and compute smoothed estimates for the individual components. We now present this analysis for the stochastic-slope model. Figure 9.1 shows the trend component superimposed on the original series; it shows a high degree of smoothness except in mid-1968. The seasonal component in Fig. 9.2 starts out smoothly but becomes increasingly erratic as time goes on. In Section 13.14, we examine how far this is due to the outliers and how far to a genuine shift in seasonal structure. Finally, Fig. 9.3 shows the irregular component, clearly highlighting the outliers for June and July 1968 and for April 1969.
Fig. 9.1 Actual values and trend component for structural model of the airline data, 1963-69
Fig. 9.2 Seasonal component for structural model of airline data, 1963-69
Fig. 9.3 Irregular component for structural model of airline data, 1963-69
Returning to part (c) of Table 9.1, we see that the forecasts seem to perform quite well, save that the April 1970 value is distorted by the outlier the year before. This is also reflected in the forecast seasonal pattern in Fig. 9.2.
Structural or ARIMA modelling?
9.18 ARIMA models offer the advantage of a stronger model-building paradigm, whereas structural models offer clearer interpretations through the decomposition into components. Harvey and Todd (1983) and Harvey (1985) provide several applications of structural models and comparisons with ARIMA schemes. The discussion following Harvey and Durbin (1986) provides an appraisal of the methods. What clearly can be said is that both offer major advances over the crude filtering of polynomial trends or regression models that ignore temporal structure.
9.19 In conclusion, the decomposition ability of structural models is a major attraction; however, it should be noted that ARIMA models may also be decomposed, albeit with somewhat greater difficulty, using signal extraction methods. For details, see Burman (1980). Interestingly enough, Burman found that the ARIMA (0, 1, 1)(0, 1, 1)_s scheme often worked well in this context; this is very close to the structural model discussed in Exercise 9.3.
Exercises
9.1 If u_t = δ_t − δ_{t−1} + η_t, where the δ and η terms are independent normal variates with means zero and variances σ_δ² and σ_η² respectively, show that u_t may be represented as an MA(1) scheme with
σ_ε²(1 + θ²) = 2σ_δ² + σ_η²  and  θσ_ε² = σ_δ².
9.2 Verify that (9.15)–(9.17) reduce to an ARIMA (0, 2, 2) process; that is, the forecast function corresponds to Holt's method.
9.3 Consider the seasonal state-space model
y_t = μ_t + γ_t + δ_t
μ_t = μ_{t−1} + η_{1t}
γ_t = γ_{t−s} + η_{2t},
where all errors have zero means and are uncorrelated, with variances σ_δ², σ_1², and σ_2², respectively. Show that ∇∇_s y_t is an MA scheme with nonzero autocorrelations at lags 1, s − 1, s, and s + 1. Compare this scheme with the models considered in Exercises 8.7 and 8.8.
9.4 Extend the model in Exercise 9.3 to include a slope term, as in (9.15)–(9.17). Derive the random shock form of this model.
9.5 Use (9.18)–(9.20) to develop a state-space representation for an ARIMA (0, 2, 2) scheme.
9.6 Consider the damped trend structural model given by
y_t = μ_t + ε_t
μ_t = μ_{t−1} + β_t + η_{1t}
β_t = φβ_{t−1} + η_{2t},
with |φ| < 1. If the error terms are stationary, show that this may be represented as an ARIMA (1, 1, 2) scheme.
9.7 If you have access to a state-space package, develop models for one or more of the series in Appendix Tables A1-A8.
10 Spectrum analysis
Introduction 10.1
In earlier chapters, we have made periodic references to cyclical phenomena, which we defined as processes that repeat themselves in regular fashion. The simplest such example is the sine wave

y_t = a cos αt + b sin αt = c cos(αt − φ),   (10.1)

where c² = a² + b² and cos φ = a/c, since

cos(α + β) = cos α cos β − sin α sin β.   (10.2)
The constant c is the amplitude of the series, as y_t varies between the limits −c and +c; see Fig. 10.1. The coefficient α is the (angular) frequency, since the curve completes α/2π cycles in a unit of time; that is, it completes one cycle in time λ = 2π/α; λ is known as the wavelength, or period, of the sine wave. Finally, φ is the phase angle and indicates how far the wave has been shifted from the natural origin (corresponding to φ = 0). For example, a phase shift of π/2 would convert the cosine function into a sine function. It follows from (10.1) that a phase shift of φ corresponds to a time delay of d = φ/α, since y_{t−d} = c cos{α(t − d)} = c cos(αt − φ). It seems almost trivial to observe that c, α, and φ are fixed numbers, but it is important, as the specification of these terms will change later.

10.2 In this chapter, we use trigonometric functions to describe the time series, known as the frequency-domain representation. In Sections 10.3–10, we consider a finite set of cosine waves, leading to the Fourier representation. As we shall show, this results in both inferential and structural problems, which we resolve by considering all possible frequencies, 0 ≤ α ≤ π. This leads to the notion of the spectrum; the main ideas are developed in Sections 10.11–20 and then estimation procedures are discussed in Sections
Fig. 10.1 A sinusoidal wave with wavelength λ
10.21–31. Seasonal components are examined from the frequency viewpoint in Sections 10.32–36 and several other topics are mentioned briefly in Sections 10.37–41.
Fourier representation

10.3 A natural extension of (10.1) is to consider the time series y_t (t = 1, 2, ..., n) as a sum of sine waves:

y_t = ½a₀ + Σ_{j=1}^{m} (a_j cos α_j t + b_j sin α_j t).   (10.3)
Again, the a, b, and α are fixed quantities, for the moment. We shall set α_j = 2πj/n so that the jth cycle has wavelength λ_j = n/j. Also, we take n to be odd (purely for convenience) and set m = (n − 1)/2, so that the frequencies lie in the range 0 to π. The factor ½ in front of a₀ is also there for convenience and does not affect the basic argument. The selected frequencies, α_j, are known as the harmonic frequencies. The reason that the harmonic frequencies are of such interest derives from the work of Jean-Baptiste Joseph Fourier. Consider some function f(x) defined for a finite range of values, which we can reduce to −π ≤ x ≤ π by a change of scale and origin. Then Fourier showed that f(x) may be represented by the series expansion

f(x) = ½a₀ + Σ_{j=1}^{∞} (a_j cos jx + b_j sin jx)   (10.4)
provided only that f(x) is single-valued, continuous except perhaps at a finite number of points, and possesses only a finite number of maxima and minima. Further, since

∫_{−π}^{π} cos rx sin sx dx = 0   (10.5)

∫_{−π}^{π} cos rx cos sx dx = ∫_{−π}^{π} sin rx sin sx dx = 0,  r ≠ s
                                                           = π,  r = s,   (10.6)
it follows that

a_j = (1/π) ∫_{−π}^{π} f(x) cos jx dx   (10.7)

and

b_j = (1/π) ∫_{−π}^{π} f(x) sin jx dx.   (10.8)
Note that (10.6) holds for r = 0 also. 10.4 The Fourier representation of a time series is thus seen to be a way of expressing the n = 2m + 1 values as a weighted average of sine waves. Mathematically, the result holds for any sequence of values; statistically, it is useful because the successive values belong to a time series. It follows that, in a formal sense, any time series may be decomposed into a set of cycles based on the harmonic frequencies. This does not mean that the phenomenon under study displays cyclical behaviour.
10.5 In order to develop the statistical framework for Fourier analysis, we return to (10.3). In discrete time, (10.5) yields, for the harmonic frequencies:

Σ cos α_j t sin α_k t = 0   (10.9)

Σ cos α_j t cos α_k t = Σ sin α_j t sin α_k t = 0,  j ≠ k
                                               = n/2,  j = k,   (10.10)

the summations being over t = 1, 2, ..., n and j, k = 0, 1, ..., m. Corresponding to (10.7)–(10.8), we find

na₀ = 2 Σ y_t   (10.11)

na_j = 2 Σ y_t cos α_j t,  j = 1, ..., m   (10.12)

nb_j = 2 Σ y_t sin α_j t,  j = 1, ..., m.   (10.13)

When n is even, n = 2m + 2 say, these same results hold except that we add the term

na_{m+1} = Σ y_t (−1)^t,   (10.14)

since α_{m+1} = π; b_{m+1} = 0 since sin πt = 0 for all integer t. Finally, since we shall be interested in the variation within the series rather than its mean level, we may use the deviations from the mean, u_t = y_t − ȳ. It is evident that the term a₀ drops out since ū = 0, whereas (10.12)–(10.14) continue to hold with u_t in place of y_t.

10.6 We saw from Fig. 10.1 that the coefficient c_j is of more natural interest than a_j or b_j. From (10.12) and (10.13), it follows that

c_j² = a_j² + b_j² = (4/n²) Σ_t Σ_s u_t u_s (cos α_j t cos α_j s + sin α_j t sin α_j s).   (10.15)

From (10.2), the cosine terms reduce to cos α_j(t − s). If we write k = t − s and split (10.15) into three parts, for k < 0, = 0, or > 0, we obtain, after some
reduction,

c_j² = (4/n²) [ Σ_{t=1}^{n} u_t² + 2 Σ_{k=1}^{n−1} Σ_t u_t u_{t+k} cos α_j k ].   (10.16)

From (6.1) and (6.2), the autocovariances are

γ₀ = Σ u_t²/n,   γ_k = Σ u_t u_{t+k}/n,

and the autocorrelations are ρ_k = γ_k/γ₀, so that (10.16) may be rewritten as

nc_j²/4 = γ₀ + 2 Σ_{k=1}^{n−1} γ_k cos α_j k   (10.17)

or

nc_j²/4γ₀ = 1 + 2 Σ_{k=1}^{n−1} ρ_k cos α_j k.   (10.18)
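The identity (10.17) is easy to verify numerically. The sketch below (Python with NumPy; the simulated series and the chosen harmonic are illustrative inputs of our own, not data from the book) computes a_j and b_j directly from (10.12)–(10.13) and checks nc_j²/4 against the autocovariance form:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 101                      # odd, so m = (n - 1) / 2
y = rng.normal(size=n).cumsum() * 0.1 + rng.normal(size=n)
u = y - y.mean()             # deviations from the mean
t = np.arange(1, n + 1)

# Autocovariances gamma_k = sum_t u_t u_{t+k} / n  (divisor n, as in the text)
gamma = np.array([u[: n - k] @ u[k:] / n for k in range(n)])

j = 7                        # any harmonic 1 <= j <= m
alpha = 2 * np.pi * j / n
a_j = 2 * np.sum(u * np.cos(alpha * t)) / n      # (10.12)
b_j = 2 * np.sum(u * np.sin(alpha * t)) / n      # (10.13)
c2 = a_j**2 + b_j**2

# Right-hand side of (10.17)
rhs = gamma[0] + 2 * np.sum(gamma[1:] * np.cos(alpha * np.arange(1, n)))
assert abs(n * c2 / 4 - rhs) < 1e-9
```

The same check goes through for every harmonic j, since (10.17) is an algebraic identity rather than a statistical approximation.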
The quantities I(α_j) = nc_j²/4π define the sample intensity function and are considered further in Section 10.18.

10.7 Another way of interpreting our analysis is to think of (10.3) as a linear regression and to fit the (2m + 1) parameters to the (2m + 1) observations by least squares. The coefficients are then given by (10.11)–(10.13) and the sum of squares may be written as

Σ_{t=1}^{n} u_t² = Σ_{t=1}^{n} (y_t − ȳ)² = (n/2) Σ_{j=1}^{m} (a_j² + b_j²).

…

E{ŵ(α)} = 1 = var{ŵ(α)}   (10.41)

and

cov{ŵ(α), ŵ(α′)} = 0,  α ≠ α′,   (10.42)
at α = α₁, ..., α_m. When α = 0 or π, the variance is doubled. Although these results were originally derived under the assumption of normality, they extend to the non-normal case (cf. Bartlett, 1978, although the result first appeared in the 1955 edition). Further, for general stationary processes, Bartlett showed that, asymptotically,

E{ŵ(α)} = w(α)   (10.43)

and

var{ŵ(α)} = w²(α),   (10.44)

and that ŵ(α) is asymptotically distributed as χ²(2). For further details, see Kendall et al. (1983, Sections 49.8–10). These rather remarkable results show that ŵ(α) is an asymptotically unbiased estimator for w(α), but that it is not consistent! As we increase n, we increase the number of harmonics at which w is estimated, but each estimator still has only two degrees of freedom. Since we have n − 1 asymptotically uncorrelated estimators, we are making full use of the available data; the problem arises because we are thinking in terms of the harmonic model (10.3) and allowing the number of parameters to increase directly with n. To overcome this, we must think in terms of the entire function w(α) and develop consistent estimators by some form of smoothing, such as moving averages. This is similar to the use of histogram or kernel methods to produce estimates of the density function.
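The inconsistency is easy to see by simulation. In the sketch below (Python with NumPy; the white-noise setup and sample sizes are our own illustration), the unsmoothed estimator at a fixed frequency keeps both its mean and its variance close to 1, no matter how large n becomes:

```python
import numpy as np

rng = np.random.default_rng(0)

def w_hat(u, j):
    """Unsmoothed spectral estimate n*c_j^2 / (4*gamma_0) at harmonic j."""
    n = len(u)
    t = np.arange(1, n + 1)
    alpha = 2 * np.pi * j / n
    a = 2 * np.sum(u * np.cos(alpha * t)) / n
    b = 2 * np.sum(u * np.sin(alpha * t)) / n
    return n * (a * a + b * b) / (4 * np.mean(u * u))

for n in (64, 256, 1024):          # sample size grows ...
    j = n // 8                     # ... but we track the same frequency pi/4
    w = np.array([w_hat(rng.normal(size=n), j) for _ in range(2000)])
    # mean stays near 1 (asymptotic unbiasedness), but so does the variance:
    assert abs(w.mean() - 1) < 0.1 and abs(w.var() - 1) < 0.25
```

Each estimate behaves like half a χ²(2) variate whatever the value of n, which is exactly the point made above.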
Examples of power spectra

10.19 Early users of spectrum analysis were unaware of the inconsistency of the estimators and this led to somewhat tortuous attempts to interpret the many (spurious) peaks that were observed. We now present several examples to illustrate the difficulties before going on to discuss smoothing procedures. The intensity function I(α) is plotted in these examples together with a smoothed estimator; it is evident that plotting I rather than ŵ involves only a change of scale.
Example 10.5
The results for the simulated Yule scheme, given in Fig. 5.3, are shown in Fig. 10.3. The unsmoothed intensity values show a very erratic pattern, whereas the smoothed estimator shows the general shape, with a maximum near 0.10 cycles per unit time, close to the theoretical peak given by (10.38).

Fig. 10.3 The intensity and smoothed spectrum for the simulated Yule series (φ₁ = 1.1, φ₂ = −0.5). A Parzen window (see (10.58)) was used for smoothing
Example 10.6
Our second example is the time series on wheat prices developed by Lord Beveridge (1921) and reproduced in Table 10.1. An annual series of 370 observations is unprecedented in economic research and the series is deservedly recognised as one of the most famous in the time-series literature. In his original analysis, Beveridge worked with what was then termed the periodogram, the plot of a multiple of the intensity function against the wavelength, although the term is now often used to refer to the (unsmoothed) intensity function plotted against frequency. Nevertheless, the two diagrams convey similar messages and Beveridge, working on the subject before the erratic nature of the estimates was understood, was led to identify 19 periodicities in the series, as illustrated in Fig. 10.4. By contrast, the smoothed estimate suggests that there are at most two. Indeed, the smoothed sample spectrum in Fig. 10.4 is not so very different in its general shape from that in Fig. 10.3. It is interesting to note that both Kendall (1945) and Sargan (1953) concluded that an AR(2) scheme was appropriate for this series; Sargan gives the estimates

φ₁ = 0.73,   φ₂ = −0.31,
Table 10.1 Trend-free wheat-price index (European prices) compiled by Lord (then Sir William) Beveridge for the years 1500–1869

[Annual index values for 1500–1869; the printed seven-column year/index layout is not reproduced here.]
Fig. 10.4 Intensity and smoothed spectrum of the Beveridge wheat-price index series (Table 10.1)
suggesting a maximum at α = 0.69, or about 0.11 cycles per unit time. Granger and Hughes (1971) reanalysed the data, pointing out that the detrended series was computed as a ratio of observed values to a 31-point moving average, which could have induced cyclical behaviour. They found that the raw data exhibited a strong peak in the spectrum with a period of about 13.3 years, or 0.075 cycles per unit time.
Smoothing the estimates

10.20 We now consider the process of smoothing. We define the smoothed estimator

ŵ_A(α) = Σ_j h(α − α_j) ŵ(α_j),   (10.45)

where α_j = 2πj/n and h is a weighting function chosen such that (u = α − α_j):

h(u) = h(−u) ≤ h(0),
Σ h(u) = 1,
h(u) = 0,  |u| ≥ m.
These choices are made on grounds of statistical and computational efficiency, as we shall see later. It follows from (10.45) that, even for large n,

E{ŵ_A(α)} ≠ w(α)   (10.46)

unless h(u) = 0 for all u ≠ 0 or w(α_j) = w(α) for all α_j with non-zero weights. Thus, we are introducing biased estimators to achieve consistency. Assuming w(α) to be reasonably smooth, a small bias requires averaging over only a few adjacent frequencies, whereas a low variance is achieved by averaging over a considerable number of values. This trade-off is achieved through the weighting function h(u), known as a kernel or spectral window.

10.21 In large samples, we may choose m to be of order, say, n^{1/2}, so that the estimators are both asymptotically unbiased and consistent. It then follows that

var{ŵ_A(α)} ≈ w²(α) Σ h²(u)   (10.47)

or

var{log ŵ_A(α)} ≈ Σ h²(u).   (10.48)
These expressions may be used to construct approximate confidence intervals for w, (10.48) having the advantage that the bounds are of constant width. When the logarithm is plotted, it is sometimes desirable to omit the lower part of the ŵ-axis to improve the graphical representation, as the estimates may be very small.

10.22 An alternative specification of the smoothing process is to express ŵ in terms of the serial correlations, so that

ŵ_A(α) = Σ_j h(α − α_j) [1 + 2 Σ_{k=1}^{n−1} r_k cos kα_j].   (10.49)

Changing the order of summation leads to

ŵ_A(α) = λ₀ + 2 Σ_{k=1}^{n−1} λ_k r_k cos kα,   (10.50)

where

λ_k = h(0) + 2 Σ_{u≥1} h(u) cos uk.   (10.51)

Similarly, for −π ≤ u ≤ π,

h(u) = (2π)⁻¹ [λ₀ + 2 Σ_k λ_k cos uk].   (10.52)

The set consisting of the weights λ_k is known as the lag window. Often the λ_k are chosen directly, with the constraint that λ_k = 0, k > M; M is known as the truncation point.

10.23 A considerable variety of lag and spectral windows has been constructed over the years and we now review those in common use.
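As a concrete illustration of the lag-window form (10.50), the sketch below (Python with NumPy) uses the simple triangular weights λ_k = 1 − k/M, Bartlett's choice, purely for illustration; it is not one of the windows reviewed below, and the simulated AR(1) input is our own:

```python
import numpy as np

def lag_window_estimate(y, M, alpha):
    """Smoothed spectral estimate via (10.50), truncated at lag M.

    Triangular weights lambda_k = 1 - k/M (Bartlett) are illustrative only."""
    u = y - y.mean()
    n = len(u)
    # serial correlations r_0 .. r_M
    r = np.array([(u[: n - k] @ u[k:]) / (u @ u) for k in range(M + 1)])
    k = np.arange(1, M + 1)
    lam = 1 - k / M
    return 1 + 2 * np.sum(lam * r[1:] * np.cos(k * alpha))   # (10.50)

# AR(1) with phi = 0.7: the estimate should be largest near alpha = 0
rng = np.random.default_rng(1)
x = np.zeros(2000)
for t in range(1, 2000):
    x[t] = 0.7 * x[t - 1] + rng.normal()
low = lag_window_estimate(x, M=40, alpha=0.1)
high = lag_window_estimate(x, M=40, alpha=3.0)
assert low > high
```

Replacing the triangular weights with any of the windows below changes only the line defining `lam`.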
Daniell window (Daniell, 1946). If we set

h(u) = 1/m,  u = 0, ±1, ..., ±q,  m = 2q + 1,   (10.53)

we find from (10.51) that λ₀ = 1 and

λ_k = sin(πmk/n) / {m sin(πk/n)},  k = 1, 2, ..., n − 1.   (10.54)

This window is designed for use in the frequency domain with (10.45), since there is no truncation in the lag window. It always produces non-negative estimates, as is clear from (10.45).
This window is designed for use in the frequency domain with (10.45), since there is no truncation in the lag window. It always produces nonnegative estimates, as is clear from (10.45). Tukey window (Blackman and Tukey 1958). We start with
X* = (1  2a) + 2a cos(tt£/A/),
k = 0,1,
(10.55)
When a = 0.25, this is called ‘hanning’ (after Julius Von Hann) and when a = 0.23, ‘hamming’ (after R. W. Hamming). This lag window may be represented in the frequency domain as HU (a)  aw i(a  tt/M) + (1  2a)w\(a) + awx(a + tr/M).
(10.56)
for
« = 2try'/M,
y=l,...,Ml
with wA(0) = (1

2a)m(0)
+
2aw\ (tt/M)
Wa(tv) = 2aw\ (tt  7r/M) + (1  2a)wi(ir)
and
H’i(a) = 1 + 2 X] rk cos ak. k=\
(10.57)
That is, we estimate the spectrum using only the first M autocorrelations as in (10.57), and then smooth further using (10.56). The Tukey windows have the advantage of easy application in both domains; there is little practical difference between ‘hanning’ and ‘hamming’. It is, however, possible for the resulting estimates to be negative for some frequencies. Parzen window (Parzen 1961; 1963). Setting
λ_k = 1 − 6(k/M)² + 6(k/M)³,  0 ≤ k ≤ M/2,
    = 2(1 − k/M)³,  M/2 ≤ k ≤ M.   (10.58)

…

Suppose that y_t = φy_{t−1} + ε_t and we apply the filter

v_t = y_t − ay_{t−1}.   (10.66)
It follows from (10.35) that

w_v(α) = const. × (1 − az)(1 − az⁻¹)/{(1 − φz)(1 − φz⁻¹)},

where z = e^{iα}. This reduces to

(1 + a² − 2a cos α)/(1 + φ² − 2φ cos α),   (10.67)

which gets progressively flatter as a gets close to φ and, of course, produces a flat spectrum corresponding to white noise when a = φ. Examination of the numerator of (10.67) indicates the pattern of values in Table 10.3. Thus, when a is near +1, the filter (10.66) is called a high-pass filter, since it 'passes on' the high-frequency signals but severely dampens the signal at the low frequencies. Conversely, when a is near −1, (10.66) represents a low-pass filter. We can see from (10.67) that if a series is non-stationary in the mean, its spectrum will be sharply peaked (theoretically infinite) in the neighbourhood of the origin, stressing the need for trend removal prior to estimation. More generally, filters such as (10.66) may be used to remove dominant autocorrelation structures in the series; such a process is known as prewhitening, since it produces a series closer to white noise. From the viewpoint of estimating the spectrum, the advantage of prewhitening is that the bias of the estimates can be reduced. Parzen (1969; 1983) recommended using a high-order AR scheme to remove the dominant structure in the series; standard methods may then be used to estimate the spectrum of the residuals. The spectrum for the original series is then obtained using (10.65) and may be expected to show less bias. We shall make heavy use of prewhitening in Chapter 11 when it is required for a somewhat different purpose.
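The whitening effect when a equals the autoregressive parameter can be checked directly. A minimal sketch (Python with NumPy; the AR(1) series is simulated for illustration, with the true φ used as the filter coefficient):

```python
import numpy as np

# Prewhitening an AR(1) series: the filter v_t = y_t - a*y_{t-1} with a = phi
# turns the series into (near) white noise, flattening its spectrum.
rng = np.random.default_rng(2)
phi = 0.8
y = np.zeros(5000)
for t in range(1, 5000):
    y[t] = phi * y[t - 1] + rng.normal()

def lag1_corr(x):
    x = x - x.mean()
    return (x[:-1] @ x[1:]) / (x @ x)

v = y[1:] - phi * y[:-1]          # the filter (10.66) with a = phi
assert abs(lag1_corr(y)) > 0.7    # original series strongly autocorrelated
assert abs(lag1_corr(v)) < 0.05   # filtered series close to white noise
```

In practice φ is unknown and a is taken from a fitted (often high-order) autoregression, as Parzen recommended.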
Table 10.3

Value of a    α near zero    α near π
near +1       near zero      near one
near −1       near one       near zero
Complex demodulation

10.32 If interest focuses upon a particular frequency, such as the seasonal frequency, we may 'shift' the series to a new frequency origin by the method of complex demodulation. That is, we define the new series

Z(t) = y_t exp(−iα₀t),   (10.68)

where α₀ is the frequency of interest. Typically, we would centre the series about its mean before making the transformation; note that Z(t) is complex rather than real-valued. A low-pass filter may then be applied to Z(t) prior to estimating its spectrum. This has the effect of enhancing the signal in the region of frequency α₀, much as we enhance a radio signal at the desired frequency. This procedure generates improved estimates of the spectrum in the neighbourhood of α₀. For further details, see Hasan (1983).
Seasonality and harmonic components

10.33 We can now return to a topic which we left on one side in Chapter 4, namely the effects of trend-removal procedures upon the cyclical components of the series. Consider a symmetric moving average of order 2m + 1, which we may represent with weights a_{−m} to a_m. Its frequency-response function reduces to

T(α) = a₀ + 2 Σ_{j=1}^{m} a_j cos jα.   (10.69)

The spectrum for the detrended series is given by

w_v(α) = w_y(α){1 − T(α)}²,   (10.70)

since the detrended values are v_t = y_t − Σ a_j y_{t−j}. For example, consider a centred moving average of 12 terms and a Spencer 15-point average. Table 10.4 shows their frequency-response functions. For example, with an angular frequency of π/4 or 45°, the value of (10.69) for the centred moving average is

(1/24)[2 + 2{2 cos 45° + 2 cos 90° + 2 cos 135° + 2 cos 180° + 2 cos 225° + cos 270°}] = (1/24){2 − 2(2 + √2)} = −0.201.

The values of the transfer function of the centred average fall to 10 per cent or less after an angular frequency of 45°, corresponding to ⅛ of a cycle per unit (month), or a wavelength of 8 months. For cycles of shorter wavelength, the spectrum of the series after trend removal is not greatly affected; for larger cycles the effect is substantial and they would be mostly removed with the trend, if any. The Spencer 15-point is of the same kind and on the whole distorts the residual spectrum rather less. The general effect of either is to remove with the trend some or all of the long cycles, but to leave the shorter ones largely untouched.
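The worked value above is easy to reproduce. A short sketch (Python with NumPy) evaluating (10.69) for the centred 12-term average, whose 13 weights are (1, 2, 2, ..., 2, 1)/24:

```python
import numpy as np

# Frequency-response function (10.69) of the centred 12-term moving average
def T_centred12(alpha_deg):
    a = np.radians(alpha_deg)
    j = np.arange(1, 7)
    a_j = np.array([2, 2, 2, 2, 2, 1]) / 24          # a_1 .. a_6
    a_0 = 2 / 24
    return a_0 + 2 * np.sum(a_j * np.cos(j * a))     # a_0 + 2*sum a_j cos(j*alpha)

assert abs(T_centred12(0) - 1.0) < 1e-12       # passes the trend unchanged
assert abs(T_centred12(45) - (-0.201)) < 5e-4  # the worked value in the text
assert abs(T_centred12(30)) < 1e-12            # zero at the seasonal harmonics
```

The zeros at multiples of 30° are why this average annihilates a stable monthly seasonal pattern exactly.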
Table 10.4 Transfer functions of a centred 12-point average and a Spencer 15-point average (Burman, 1965)

Angular frequency (degrees)   TF centred 12-point   TF Spencer 15-point
  0     1.000     1.000
  5     0.955     1.000
 10     0.824     1.003
 15     0.633     0.984
 20     0.409     0.952
 25     0.188     0.895
 30     0         0.809
 35    −0.133     0.696
 40    −0.198     0.564
 45    −0.201     0.425
 50    −0.155     0.293
 55    −0.080     0.180
 60     0         0.094
 65     0.065     0.037
 70     0.103     0.006
 75     0.109    −0.005
 80     0.086    −0.005
 85     0.045    −0.002
 90     0         0
 95    −0.038     0.002
100    −0.061     0.007
105    −0.064     0.012
110    −0.051     0.015
115    −0.027     0.016
120     0         0.013
125     0.022     0.008
130     0.034     0.003
135     0.034     0
140     0.026    −0.001
145     0.013     0
150     0        −0.003
155    −0.009    −0.005
160    −0.013    −0.005
165    −0.011    −0.004
170    −0.006    −0.003
175    −0.002    −0.001
180     0         0
10.34 With a suitable trend-removal method, therefore, we can proceed to consider the seasonal component secure in the knowledge that the higher frequencies are only slightly affected. Moreover, since by definition seasonality is strictly periodic, we expect to be able to represent it by a sum of harmonics, namely with angular frequencies α_j = 2πj/12 and wavelengths 12, 6, 4, 3, 2.4, and 2 months. Thus, if x_t is the deviation from trend of the tth month in a set of 12p months, we have

a_j = (2/12p) Σ x_t cos α_j t,  j = 1, 2, ..., 5   (10.71)

b_j = (2/12p) Σ x_t sin α_j t,  j = 1, 2, ..., 5   (10.72)

a₆ = (1/12p) Σ x_t cos α₆t.   (10.73)

The seasonal movement will then be represented by a set of 11 constants.

10.35 Now that the trend and seasonal components have been filtered out, we may estimate the spectrum to look for other cyclical behaviour. If the spectrum is estimated without making any seasonal adjustments, we expect a major peak at the seasonal frequency plus echoes at higher frequencies, as in Fig. 10.6. A slight modification enables us to take account of moving seasonal effects. We may, in fact, analyse the series by year, taking p = 1 in (10.71)–(10.73), and hence obtain 11 annual series, one for each of the 11 constants. Each of
these series can be smoothed and extrapolated and the resulting values used to estimate the seasonal component for any given year. Since the Fourier constants are uncorrelated, the smoothing can proceed independently for each series. Burman (1965; 1966) developed such a method, but using a 13-point average instead of the Spencer 15-point. Missing values at the ends of the series, caused by taking the moving average, were fitted using exponential smoothing. More recently, Burman (1980) has developed a signal extraction procedure for seasonal adjustment, based upon a decomposition of the spectrum. This was briefly discussed in Section 9.18. Also, it should be noted that the treatment of seasonal patterns in structural models is based upon the systematic updating of (10.71)–(10.73); see (9.48).

10.36 Spectral methods are also useful in determining the performance of seasonal adjustment procedures such as the US Bureau of the Census X-11 method. Nerlove (1964) showed the then current procedure (X-10) to have certain defects which have since been remedied. In general, any seasonal adjustment process will tend to remove 'too much' from the seasonal frequencies (Grether and Nerlove, 1970), although this does not seem to be a major problem in practice for the procedures now employed. For further discussion of seasonal adjustment procedures, see Cleveland (1983).
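The computation of the 11 constants can be sketched as follows (Python with NumPy; the monthly pattern used is invented purely for illustration). A strictly periodic, zero-mean pattern is recovered exactly, as Section 10.34 implies:

```python
import numpy as np

# Fit the 11 seasonal constants (10.71)-(10.73) to p years of monthly
# deviations-from-trend x_t, and rebuild the implied seasonal pattern.
def seasonal_harmonics(x):
    n = len(x)                      # n = 12p
    t = np.arange(1, n + 1)
    s = np.zeros(n)
    for j in range(1, 6):           # (10.71)-(10.72)
        alpha = 2 * np.pi * j / 12
        a = 2 * np.sum(x * np.cos(alpha * t)) / n
        b = 2 * np.sum(x * np.sin(alpha * t)) / n
        s += a * np.cos(alpha * t) + b * np.sin(alpha * t)
    a6 = np.sum(x * np.cos(np.pi * t)) / n        # (10.73), alpha_6 = pi
    return s + a6 * np.cos(np.pi * t)

# A strictly periodic monthly pattern is recovered exactly
pattern = np.array([3, 1, -2, -4, -1, 0, 2, 5, 1, -3, -2, 0], dtype=float)
x = np.tile(pattern - pattern.mean(), 4)          # p = 4 years
assert np.allclose(seasonal_harmonics(x), x)
```

Fitting the constants year by year (p = 1) and smoothing each of the 11 resulting series gives the moving-seasonal procedure described above.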
Evolutionary spectra

10.37 In Section 10.35 we suggested computing harmonic components a year at a time. This notion has been generalised and extended by Priestley (1965) to what is known as the evolutionary spectrum. For details and further references, see Kendall et al. (1983, Sections 49.37–49.38).
Inverse autocorrelations

10.38 Starting with the spectrum in (10.35) for an ARMA scheme, we may define its inverse as

wi(α) = [w(α)]⁻¹.   (10.74)

The inverse autocorrelations, introduced in Section 6.25, are then defined as

ρi(k) = γi(k)/γi(0),   (10.75)

where

wi(α) = Σ_j γi(j) z^j.   (10.76)

It follows that ρi(k) = ρi(−k).
Example 10.8
The AR(1) scheme has

w(α) = σ²/{γ₀(1 − φz)(1 − φz⁻¹)}.

Hence

wi(α) = σ⁻²γ₀(1 − φz)(1 − φz⁻¹)

and

ρi(1) = −φ/(1 + φ²),   ρi(k) = 0,  k > 1.
In general, the inverse ACF behaves like the partial ACF, but with the signs reversed.

10.39 There are two approaches to estimating the IACF. We may estimate the spectrum and then determine the ρi(k) by numerical inversion of wi(α), using the analogue of (10.28). Alternatively, we may fit a high-order autoregression (following Parzen, 1969) and then generate the ρi(k) from (10.75) and (10.76). For further discussion, see Chatfield (1979) and Bhansali (1983).
Forecasting

10.40 Instead of developing the forecast function using a time-domain model, we may work in the frequency domain and then construct the forecast function from the spectrum; this is known as the Wiener–Kolmogorov filter. For details, see Bhansali and Karavellas (1983).
Further reading 10.41 In addition to the references given earlier, the volume of review papers edited by Brillinger and Krishnaiah (1983) is a valuable source of current information and further references.
Exercises

10.1 Sketch, as functions of t, cos(πt/4) and cos(9πt/4). Observe that the two curves have the same values at t = 0, ±1, ±2, .... (Refer to Section 10.17 for an interpretation.)
10.2 Show that (10.16) follows from (10.15).
10.3 Verify the inversion formula (10.28) by substituting for w(α) using (10.26) and performing the integration. (Hint: It follows from (10.6) that ∫ cos rx cos sx dx = 0, r ≠ s; = π, r = s.)
10.4 Verify that the spectrum for an AR(1) scheme is given by (10.36).
10.5 Verify that the spectrum for an AR(2) scheme is given by (10.37) and that the spectrum shows a peak at α given by (10.38).
10.6 Find the spectrum for an MA(2) and show that it has a maximum (minimum) at cos α = −θ₁(1 − θ₂)/4θ₂ if θ₂ > 0 (θ₂ < 0).
10.7 Find the inverse ACF for the MA(1) scheme.
10.8 Use an available statistical package to estimate the spectrum for one or more of the following series: (a) the sheep data (Table 1.2); (b) the airline data (Table 1.3); (c) the Financial Times index (Table 1.6); (d) any of the series in Appendix A, Tables A1–A8. Recall that the series should be stationary; you should try estimation with and without differencing, and also try different levels of smoothing.
11
Transfer functions: models and identification
11.1 Thus far, we have considered only relationships between successive terms of a single series. By contrast, traditional correlation measures and regression models deal only with cross-sectional relationships between variables that ignore the time dimension. Our purpose in this chapter is to develop measures of association and statistical models that incorporate both time-series and cross-sectional components of dependence. For the most part, we restrict attention to only two series, y₁t and y₂t, although the extension to m > 2 series is straightforward. The association measures are then used to develop model identification procedures.
Cross correlations

11.2 Assume that y₁t and y₂t are stationary series such that

E(y_jt) = μ_j   (11.1)

and

var(y_jt) = γ_j = σ_j²,  j = 1, 2.   (11.2)

We define the cross-covariance between y₁t and y₂,t−k as

cov(y₁t, y₂,t−k) = E(y₁t y₂,t−k) − μ₁μ₂ = γ₁₂(k).   (11.3)

The cross-correlation is then ρ₁₂(k) = γ₁₂(k)/σ₁σ₂.

…

var{r₁₂(k)} ≈ (n − k)⁻¹ Σ_{v=−∞}^{∞} ρ₁₁(v)ρ₂₂(v).   (11.10)
Example 11.1
Suppose that y₁t and y₂t are both AR(1) schemes with parameters φ₁ and φ₂, respectively. It follows from (11.10) that

var{r₁₂(k)} = (1 + φ₁φ₂)/{(n − k)(1 − φ₁φ₂)}.   (11.11)

Further, when φ₁ = φ₂ = φ, we find from (11.9) and (11.11) that

corr{r₁₂(k), r₁₂(k + s)} = φ^s {(s + 1) − (s − 1)φ²}/(1 + φ²).
Table 11.1

Value of φ               0      0.5     0.9      −0.5     −0.9
(n − k) var{r₁₂(k)}      1      1.67    9.53     1.67     9.53
corr, s = 1              0      0.80    0.994    −0.80    −0.994
corr, s = 5              0      0.13    0.900    −0.13    −0.900
The numerical values in Table 11.1 indicate the behaviour of these functions (when φ₁ = φ₂ = φ). These figures illustrate the very high correlations between successive terms in the CCF, a problem we shall address in Section 11.9. Note that var(r) ≈ 1/(n − k) when both series are white noise.
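A quick Monte Carlo check of (11.11) is straightforward (Python with NumPy; the sample size, parameter value, and number of replications are our own choices for illustration):

```python
import numpy as np

# Monte Carlo check of (11.11): for two independent AR(1) series with the
# same parameter phi, (n - k) * var{r12(k)} ~ (1 + phi^2) / (1 - phi^2).
rng = np.random.default_rng(3)

def ar1(phi, n):
    x = np.empty(n)
    x[0] = rng.normal() / np.sqrt(1 - phi * phi)   # stationary start
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

def r12(x, y, k):
    x, y = x - x.mean(), y - y.mean()
    return (x[k:] @ y[: len(y) - k]) / np.sqrt((x @ x) * (y @ y))

phi, n, k = 0.5, 200, 0
r = np.array([r12(ar1(phi, n), ar1(phi, n), k) for _ in range(3000)])
theory = (1 + phi * phi) / ((n - k) * (1 - phi * phi))   # 1.67 / n here
assert abs(r.var() / theory - 1) < 0.2
```

Even though the two series are independent by construction, the sampling variance of r₁₂ is inflated well above 1/n, which is why raw cross-correlations between autocorrelated series are so easily over-interpreted.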
Cross-spectra

11.5 We may define the cross-spectrum between the two series as

w₁₂(α) = Σ_k ρ₁₂(k) e^{−iαk},   (11.12)

where 0 ≤ α ≤ π and w₂₁(α) is the complex conjugate of w₁₂(α). This has the inverse transformation

ρ₁₂(k) = (1/2π) ∫_{−π}^{π} w₁₂(α) e^{iαk} dα,   (11.13)

by analogy with (10.32). The cross-spectrum is a complex function and is not readily interpretable in its original form. However, it may be decomposed into its real and imaginary parts as follows:

w₁₂(α) = ρ₁₂(0) + Σ_{k≥1} {ρ₁₂(k) + ρ₁₂(−k)} cos αk + i Σ_{k≥1} {ρ₁₂(k) − ρ₁₂(−k)} sin αk
       = c(α) + iq(α),   (11.14)

where

c(α) = ρ₁₂(0) + Σ_{k≥1} {ρ₁₂(k) + ρ₁₂(−k)} cos αk   (11.15)

q(α) = Σ_{k≥1} {ρ₁₂(k) − ρ₁₂(−k)} sin αk.   (11.16)
The quantity c(α) is known as the cospectrum or cospectral density, whereas q(α) is known as the quadrature spectrum. The sum of squares c²(α) + q²(α) is called the amplitude of the spectrum. If we standardise by division by the separate spectral densities of the two series, we obtain the coherence, namely

C(α) = {c²(α) + q²(α)}/{w₁(α)w₂(α)} = |w₁₂(α)|²/{w₁(α)w₂(α)}.

…

the y-values would be

y₀ = β₀,  y₁ = β₀ + β₁,  ...,  y_t = β₀ + β₁ + ⋯ + β_k,  t ≥ k.

The quantity

G = β₀ + β₁ + ⋯ + β_k   (11.25)
is termed the gain; that is, the total change in y for a unit change in x. The general term for models such as (11.24) is (time-domain) transfer functions. Note that the terminology is similar to that used in our discussion of cross-spectra in Section 11.6, but we are now working in the time domain.

11.11 Suppose now that the error terms are negligible, so that we may concentrate upon the form

y_t = α + β₀x_t + β₁x_{t−1} + ⋯ + β_k x_{t−k}.   (11.26)

If x_t is a white noise process with variance γ_xx(0) = σ_x², it follows directly that the covariance between y_t and x_{t−j} is

γ_xy(j) = β_j γ_xx(0)

or, for the cross-correlation,

ρ_xy(j) = β_j σ_x/σ_y,   (11.27)

where var(y_t) = σ_y². That is, when x is a white noise process, the cross-correlations have a clear interpretation as being proportional to the terms of the impulse response function. However, when the x_t have autocovariances cov(x_t, x_{t−j}) = γ_xx(j), the set of
Yule–Walker-like equations resulting from (11.26) are of the form:

γ_xy(0) = β₀γ_xx(0) + β₁γ_xx(1) + ⋯ + β_k γ_xx(k)
γ_xy(1) = β₀γ_xx(1) + β₁γ_xx(0) + ⋯ + β_k γ_xx(k − 1)
⋮
γ_xy(k) = β₀γ_xx(k) + β₁γ_xx(k − 1) + ⋯ + β_k γ_xx(0).   (11.28)

Evidently, the relationship between the CCF and the impulse response function is now anything but obvious. Simplicity would be restored, however, if the x_t could be transformed to a white noise process, and we now describe this form of prewhitening.

11.12 Suppose that x_t may be described by an ARMA scheme

φ(B)x_t = θ(B)u_t;   (11.29)

for convenience, we assume that the constant term is zero and assume any differencing has already been performed on X_t, such that x_t = ∇^d X_t is a stationary process. From (11.29), it follows that

u_t = {φ(B)/θ(B)} x_t   (11.30)

is a white noise process. Rewriting (11.24), with α = 0 for convenience, as

y_t = β(B)x_t,   (11.31)

we may multiply both sides of (11.31) by …

… y → x would be complete, with suitable time delays. The identification process we have described depends upon the system being open; unfortunately, many economic systems involve feedback loops, which precludes the effective use of a transfer-function model. In such circumstances, we must turn to a multivariate system; we defer discussion of this until Chapter 14. However, there are many circumstances when an open-loop model is appropriate and we now continue our investigation of transfer functions with some examples.
Fig. 11.9 Linear systems representation of the transfer function model: explanatory variable x_t enters the transfer function β(B), while the random error ε_t follows an ARIMA model; together they produce the dependent variable y_t
Example 11.6
A sample of n = 100 was generated from the model

y_t = 10 + 3x_t + 2x_{t−2} + a_t   (11.42)

with x_t = x_{t−1} + δ_t.

…

… (1 − 0.62B²)x_t.   (11.43)

Step 3: The initial residuals from (11.43) were then examined and produced the serial correlations given in Fig. 11.11. From these results, either AR(1) or MA(1) seems appropriate.
The initial fitted model and resulting changes will be presented in Section 12.10 after we have discussed estimation and diagnostic checking.
Fig. 11.11 (a) SACF and (b) SPACF for residuals from the preliminary model fitted to the simulated series
Cars and the FT index

11.16 We now turn to the more challenging task of applying these procedures to real data. We shall try to identify a transfer-function model for the cars series given in Table 11.2, using the FT index as an explanatory variable. Recall that the CCF obtained after detrending, given in Fig. 11.3, did not allow a clear interpretation.

Step 1:
From (7.20), the prewhitening model for the FT index is

∇x_t = δ_t + 0.256δ_{t−1}.   (11.44)

Model (7.20) was used in preference to (7.33) since the differences are not large and (7.20) preserves eight more observations.

Step 2: The prewhitened series produced the sample CCF shown in Fig. 11.12. From this diagram we tentatively identify the transfer-function component as

y_t = (ω_0 − ω_1B − ω_2B²)x_{t−1}.   (11.45)

Step 3: The SACF and SPACF in Fig. 11.13 suggest either an AR(1) or an MA(1) structure.
This model will be used as the starting point for our model development process in Section 12.12. It is, however, interesting to note that as a result of the prewhitening process we appear to have come up with a much simpler relationship than looked likely from our initial analysis in Section 11.8. Also, it should be noted that (11.45) does not involve differencing either series, although both series originally displayed nonstationarity; such series are called cointegrated; see Granger (1981) and Hendry (1986).
Fig. 11.12 CCF for car production on FT index after prewhitening (positive lags only)
Transfer functions: models and identification
Fig. 11.13 (a) SACF and (b) SPACF for residuals of car production series
Several explanatory variables

11.17 Our identification process was predicated on the assumption that there was a single input variable whose impact was sufficient to dominate the error term. When there are two or more explanatory variables, such assumptions cannot be made. If we try to identify the transfer-function component for each variable in turn, we risk getting incorrect results unless it fortuitously happens that these variables are independent of one another. There are three ways in which we may proceed. One is to develop a common filter for all inputs (Liu and Hanssens, 1982); the second is to find a separate
prewhitening filter for each series and then to apply all the filters to each series (Box and Jenkins, 1976). A third possibility is to filter each series separately (Box and Haugh, 1976). None of these methods is wholly satisfactory in theoretical terms, but all seem to give reasonable results in practice unless there is a high degree of collinearity between the variables, when even straightforward multiple regression runs into difficulties.
Seasonal models

11.18 The extension to seasonal models is straightforward, at least in principle. We may specify the transfer-function component as

y_t = [ω(B)Ω(B^s) / δ(B)Δ(B^s)] x_{t−b} + ε_t,   (11.46)

and the noise component as

ε_t = [θ(B)Θ(B^s) / φ(B)Φ(B^s)] e_t,   (11.47)

where the seasonal polynomials are of the form

Ω(B^s) = 1 − Ω_1B^s − ··· − Ω_LB^{sL},   (11.48)

and so on, Δ being of order R, Θ of order Q, and Φ of order P. The prewhitening and model selection procedures operate in exactly the same manner as before.
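The products of regular and seasonal polynomials in (11.46)-(11.48) expand by ordinary polynomial multiplication. A small sketch of our own, for an assumed regular factor (1 − 0.5B) and seasonal factor (1 − 0.3B⁴) with s = 4:

```python
import numpy as np

# Coefficients in ascending powers of B
regular = np.array([1.0, -0.5])                   # 1 - 0.5B
seasonal = np.array([1.0, 0.0, 0.0, 0.0, -0.3])   # 1 - 0.3B^4

# Polynomial product via convolution of coefficient arrays:
# (1 - 0.5B)(1 - 0.3B^4) = 1 - 0.5B - 0.3B^4 + 0.15B^5
product = np.convolve(regular, seasonal)
print(product)
```

The same convolution expands ω(B)Ω(B^s), δ(B)Δ(B^s), and so on, however high the orders.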
Comparison with other approaches

11.19 Model (11.41) clearly reduces to simple linear regression when ω(B) = δ(B) = 1 and θ(B) = φ(B) = 1; recall that the constant term was omitted for convenience. However, a variety of other schemes proposed in the literature arise as special cases, as follows:

regression with lagged dependent variables: δ(B)y_t = ω(B)x_t + ε_t;

distributed lag models (Almon, 1965): y_t = [ω(B)/δ(B)] x_t + ε_t;

regression with autocorrelated residuals: y_t = ω(B)x_t + [θ(B)/φ(B)] e_t.

11.20 Given the general form of the transfer function model and an identification procedure, we are now ready to consider the estimation and diagnostic phases of our model-building paradigm. These are the topics covered in the next chapter.
Exercises

11.1 If y_t = ε_t + ε_{t−1} + ε_{t−2} + ε_{t−3} and x_t = ε_t − 2ε_{t−1} + 2ε_{t−2}, where the ε are independent and identically distributed with mean zero and variance one, find the form of the CCF.

11.2 A company reports the advertising and sales figures shown in Table 11.4 for 12 successive weeks of operations. Find the sample CCF for lags k = 0, ±1, …, ±4.

11.3 If x_t and y_t are uncorrelated MA(q) processes with the same parameters θ_1, …, θ_q, find the appropriate correlations between the cross-correlations using (11.9) and (11.10).

11.4 Find the theoretical CCF for y_t = x_t + x_{t−1} + ε_t, x_t = δ_t − 0.8δ_{t−1}, where ε and δ are independent with means zero and variances one. Then use the x-model to prewhiten the y-series and recompute the CCF.

11.5 Generate sample series of length n = 100 from the scheme y_t = …, x_t = … .
Table 12.4 Model summaries: ACF (SE), PACF (SE) and CCF (SE); for cars leading FT index, SEs are as for the regular CCF.
δ(B)φ(B)y_t = α* + φ(B)ω(B)x_{t−b} + δ(B)θ(B)e_t,   (12.15)

where α* = αδ(1)φ(1), so that only a finite number of past values is required. If we are forecasting from the origin n, we follow our conventions in Chapter 8 and set
y_n(j) = ŷ_n(j)   if j > 0
       = y_{n+j}  if j ≤ 0,

e_n(j) = 0        if j > 0
       = e_{n+j}  if j ≤ 0.

Clearly, when j ≤ 0, we also set x_n(j) = x_{n+j};
however, two possibilities arise for x when j > 0.

x-known: The set of future x-values may be known. This may arise because the forecasting exercise is based on historical data and the relevant observations were withheld at the fitting stage, or because x is a policy variable which can be specified by the investigator. Known x-values may also be deemed to arise in 'what-if' forecasting, where it is desired to determine the effect of certain x-patterns upon y. This last possibility is particularly useful in assessing different policy options, even though the forecasts are not testable in that the particular x-sequence may never occur. In general, when the x-values are known, we term this ex-post forecasting.

x-unknown: In 'pure' forecasting applications, it is more common for the x-values to be unknown. In accordance with the open-loop model formulated in Chapter 11, we may estimate the values for x using its ARIMA model and substitute these estimates, x̂_n(j). The resulting forecasts for y are known as the ex-ante forecasts.

12.15 A point worth making, but often overlooked, is that if x is truly representable as an ARIMA process then, at least in principle, an identified univariate model for y will perform as well as the ex-ante forecasts and, indeed, may be equivalent. To see this, consider the scheme

y_t = ν(B)x_t + ψ(B)e_t
(12.16)

x_t = ψ_x(B)a_t.   (12.17)

It follows directly that

y_t = ν*(B)a_t + ψ(B)e_t,   (12.18)

where ν*(B) = ν(B)ψ_x(B). Since the a_t and e_t are independent white-noise processes, y_t possesses a univariate ARMA representation, which in a simple first-order case is representable as an ARIMA(1, 0, 2) scheme. However, there is no guarantee that this implied process is invertible. This can be seen by taking, for example,

ω_0 = 1,  ω_1 = 2,  δ_1 = φ_1 = θ_1 = 0.
Also, we note that, in general, the univariate scheme will be rather more complicated. Even so, it is well to keep this equivalence in mind when comparing univariate and transfer-function forecasts; cf. Sections 12.29-32.
Example 12.4

Generate the forecasts for t = 21, 22, 23 from the model

y_t = 10 + [(2 + B)/(1 − 0.6B)] x_{t−1} + [1/(1 − 0.8B)] e_t,   (12.20)

(1 − 0.5B)x_t = 1 + δ_t,   (12.21)

given y_20 = 17, x_20 = 4, y_19 = 15, x_19 = 3, e_20 = 1, x_18 = 2.
First, from (12.15), we have

(1 − 0.6B)(1 − 0.8B)y_t = 10(1 − 0.6)(1 − 0.8) + (2 + B)(1 − 0.8B)x_{t−1} + (1 − 0.6B)e_t   (12.22)

or

y_t = 0.8 + 1.4y_{t−1} − 0.48y_{t−2} + 2x_{t−1} − 0.6x_{t−2} − 0.8x_{t−3} + e_t − 0.6e_{t−1},

so that

ŷ_20(1) = 0.8 + 1.4(17) − 0.48(15) + 2(4) − 0.6(3) − 0.8(2) + 0 − 0.6(1) = 21.4.

In order to obtain ŷ_20(2), we first need

x̂_20(1) = 1 + 0.5x_20 = 3;

similarly

x̂_20(2) = 1 + 0.5x̂_20(1) = 2.5,

and so on. Then

ŷ_20(2) = 0.8 + 1.4(21.4) − 0.48(17) + 2(3) − 0.6(4) − 0.8(3) = 23.8

and

ŷ_20(3) = 0.8 + 1.4(23.8) − 0.48(21.4) + 2(2.5) − 0.6(3) − 0.8(4) = 23.848.
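The calculations of Example 12.4 can be mechanised directly; the fragment below is our own sketch (not from the text), iterating the difference-equation form of the model with future shocks set to their zero means, and it reproduces the point forecasts above.

```python
# Forecast recursion for Example 12.4:
#   y_t = 0.8 + 1.4 y_{t-1} - 0.48 y_{t-2} + 2 x_{t-1} - 0.6 x_{t-2} - 0.8 x_{t-3}
#         + e_t - 0.6 e_{t-1},  with  x_t = 1 + 0.5 x_{t-1} + delta_t
y = {20: 17.0, 19: 15.0}
x = {20: 4.0, 19: 3.0, 18: 2.0}
e = {20: 1.0}          # last observed residual; future e's have mean zero

for t in range(21, 24):
    x[t] = 1 + 0.5 * x[t - 1]                       # ARIMA forecast of x
    y[t] = (0.8 + 1.4 * y[t - 1] - 0.48 * y[t - 2]
            + 2 * x[t - 1] - 0.6 * x[t - 2] - 0.8 * x[t - 3]
            + 0 - 0.6 * e.get(t - 1, 0.0))
print(round(y[21], 3), round(y[22], 3), round(y[23], 3))  # → 21.4 23.8 23.848
```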
12.16 When the process is stationary, we may find the mean (or long-term forecast) by setting B = 1 in (12.14) and replacing each term by its expected value.
Example 12.5

From (12.21), (1 − 0.5)μ_x = 1, or μ_x = 2. In turn, from (12.22),

(0.4)(0.2)μ_y = 10(0.4)(0.2) + (3)(0.2)(2),

so 0.08μ_y = 2, or μ_y = 25.
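The arithmetic of Example 12.5 takes two lines to verify (our own sketch, setting B = 1 in each operator):

```python
# Long-run means for Example 12.5, from (12.21)-(12.22) with B = 1
mu_x = 1 / (1 - 0.5)                        # (1 - 0.5) mu_x = 1
lhs = (1 - 0.6) * (1 - 0.8)                 # operator on y evaluated at B = 1
mu_y = (10 * lhs + (2 + 1) * (1 - 0.8) * mu_x) / lhs
print(round(mu_x, 6), round(mu_y, 6))       # → 2.0 25.0
```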
Prediction intervals

12.17 In order to develop prediction intervals for the point forecasts, we must first evaluate the forecast mean square error (FMSE). Referring back to (12.18), we see that y_{n+k} may be expressed as

y_{n+k} = (ν_0a_{n+k} + ··· + ν_ka_n + ···) + (ψ_0e_{n+k} + ··· + ψ_ke_n + ···),   (12.23)

following the same approach as in Section 8.8; ν_j and ψ_j represent the coefficients of B^j in the expansions of ν(B) and ψ(B), respectively (we have dropped the asterisk from ν for notational ease). Likewise, the k-step ahead ex-ante forecast will be

ŷ_n(k) = (ν_ka_n + ν_{k+1}a_{n−1} + ···) + (ψ_ke_n + ψ_{k+1}e_{n−1} + ···),   (12.24)

so that the forecast error is

e_n(k) = y_{n+k} − ŷ_n(k) = (ν_0a_{n+k} + ··· + ν_{k−1}a_{n+1}) + (ψ_0e_{n+k} + ··· + ψ_{k−1}e_{n+1}).   (12.25)

Since the a's and e's are zero-mean white-noise processes with variances σ_a² and σ_e², it follows that

E[e_n(k)] = 0   (12.26)

and

V_k = V[e_n(k)] = σ_a² Σ_{j=0}^{k−1} ν_j² + σ_e² Σ_{j=0}^{k−1} ψ_j².   (12.27)

If the forecasts are developed ex post, the a's in (12.24) are specified, so that the error reduces to

e_n(k) = ψ_0e_{n+k} + ··· + ψ_{k−1}e_{n+1},   (12.28)

as for the univariate case; see Section 8.8. Conditionally upon the x-values, therefore, the errors have zero expectations and the mean square error reduces to

V_k = V[e_n(k)] = σ_e² Σ_{j=0}^{k−1} ψ_j²,   (12.29)

as in (8.13).
12.18 Using (12.27) or (12.29) as appropriate, or some intermediate form if some of the x's are known, we can specify an approximate 100(1 − α) percent prediction interval in the usual way as

ŷ_n(k) ± z_{α/2}(V_k)^{1/2}.   (12.30)
However, the apparent simplicity of this result must not blind us to the heroic assumptions made to obtain it. In addition to the standard assumptions of stationarity and normality made in order to arrive at the expressions, we are supposing that the estimation errors for the coefficients are negligible. Particularly when the series are short, the failure to include estimation errors will lead to overly narrow prediction intervals. The nature of this difficulty can be seen by considering the simple linear regression model

y_t = β_0 + β_1(x_t − x̄) + ε_t.   (12.31)
It is well known that the prediction mean square error for some new value x_{n+k} is

V_k = σ_ε² [1 + 1/n + (x_{n+k} − x̄)²/S_xx],   (12.32)

where S_xx = Σ(x_t − x̄)². In (12.30), we are, in effect, ignoring the second and third terms in (12.32) and failing to make a small-sample correction by using the t-distribution in place of the normal. Further, as noted in Section 8.9, the errors may not be normally distributed.

12.19 Returning to the car production and FT index series, we refitted the selected model using only the first 11 years' data. The revised estimates were:

y_t = 193.3 + 0.474x_t + (1 − 0.111B)^{−1}ε_t,
∇x_t = δ_t + 0.26δ_{t−1}.

The forecasts for 1971, together with 95 percent prediction intervals, are as shown in Table 12.5. The ex-post forecasts were based on the actual values of the FT index in 1971. Since the relationship between the two series is not overly strong, these forecasts show little improvement over the ex-ante forecasts. Such findings are not uncommon and we discuss this further in Section 12.31. The actual values fall well inside the prediction intervals, reflecting the rather stable behaviour of the series at this time.
Table 12.5

1971   Actual   Ex-post      Ex-ante
Q1     404      448 ± 78     452 ± 82
Q2     416      451 ± 99     439 ± 102
Q3     435      458 ± 110    428 ± 112
Q4     456      459 ± 116    420 ± 118
Transfer functions: estimation, diagnostics and forecasting
Time series regression

12.20 We now step back a little from the general models we have been discussing and ask how the approach of this chapter relates to regression analysis, a topic we touched on briefly in Section 11.18. The simple model

y_t = β_0 + β_1x_t + ε_t   (12.33)
is clearly a special case, as noted in Section 12.5. Its main virtues are its simplicity (in specification and in estimation) and the ability to carry out wide-ranging diagnostic tests for departures from the assumptions of the least-squares model, such as heteroscedasticity, autocorrelated errors and (with several variables) multicollinearity. In addition, observations that are outliers or have considerable influence on the estimates may be identified (cf. Cook and Weisberg, 1982). The key weaknesses are, of course, the total failure to recognise any temporal dependence in the relationship between x and y or to capture any autocorrelation among the errors. A very brief historical review of the steps taken to remedy this will indicate why transfer functions are a natural framework for modelling. The issue of correlated errors was first tackled by Cochrane and Orcutt (1949). They proposed adding to (12.33) a noise model of the form
ε_t = ρε_{t−1} + u_t   (12.34)

or

ε_t = (1 − ρB)^{−1}u_t.   (12.35)
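The Cochrane-Orcutt idea in (12.34)-(12.35) can be sketched in a few lines; the illustration below is our own (a simulated regression with AR(1) errors and no intercept): fit by ordinary least squares, estimate ρ from the residuals, then re-fit on the quasi-differenced data.

```python
import numpy as np

rng = np.random.default_rng(7)
n, beta, rho = 400, 2.0, 0.6

# Simulate y_t = beta x_t + eps_t with eps_t = rho eps_{t-1} + u_t
x = rng.normal(size=n)
eps = np.zeros(n)
u = rng.normal(size=n)
for t in range(1, n):
    eps[t] = rho * eps[t - 1] + u[t]
y = beta * x + eps

# Step 1: OLS of y on x (no intercept for simplicity)
b_ols = np.sum(x * y) / np.sum(x * x)
e = y - b_ols * x

# Step 2: estimate rho from the residuals, then quasi-difference and re-fit
rho_hat = np.sum(e[:-1] * e[1:]) / np.sum(e ** 2)
ys, xs = y[1:] - rho_hat * y[:-1], x[1:] - rho_hat * x[:-1]
b_co = np.sum(xs * ys) / np.sum(xs * xs)
print(round(rho_hat, 2), round(b_co, 2))   # close to the true 0.6 and 2.0
```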
Combining (12.33) and (12.35) we see that the Cochrane-Orcutt scheme is a special case of the transfer-function model. A somewhat different scheme is the mixed autoregressive-regressive model

y_t = α + Σ_{j=1}^{p} φ_j y_{t−j} + Σ_{i=1}^{k} β_i x_{it} + ε_t.   (12.36)
Durbin (1960) extended the Mann-Wald theorem to show that the least-squares estimators for model (12.36) are consistent and asymptotically unbiased even for non-normal errors. Further extensions of this result justify the least-squares estimators considered in Sections 12.2-4. In econometrics particularly, the need arose to incorporate multiple lags of the input variable into the model. This was handled by the introduction of distributed lag models such as

y_t = α + [β/(1 + δB)] x_t + ε_t;   (12.37)

see Griliches (1967) and Almon (1965). The various extensions of this approach clearly lead to the general transfer-function form for the x-relationship. The combination of the ideas underlying (12.34) and (12.36) also leads us to the general transfer-function formulation.

12.21 Approaching the issue from the other direction, a major strength of regression analysis is its set of diagnostic procedures. Perhaps the best-known
procedure for time series regression is the test for residual autocorrelation, proposed by Durbin and Watson (1950, 1951, 1971). The Durbin-Watson statistic d is defined in terms of the residuals by

d = Σ_{t=2}^{n} (e_t − e_{t−1})² / Σ_{t=1}^{n} e_t².   (12.38)

This, apart from the end effect, is 2(1 − r_1), where r_1 is the first serial correlation of the residuals. Alternatively, it can be looked on as the sum of squares of first differences of the residuals divided by their sum of squares, as in the variate-difference method. If the residuals are highly positively correlated, the value of d is near zero; if they are uncorrelated it is near 2.
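The relation between d and the first serial correlation is easy to confirm numerically; the following sketch is our own, using simulated AR(1) residuals.

```python
import numpy as np

def durbin_watson(e):
    """d of (12.38): sum of squared first differences over sum of squares."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(0)
# Positively autocorrelated residuals: e_t = 0.7 e_{t-1} + u_t
e = np.zeros(500)
u = rng.normal(size=500)
for t in range(1, 500):
    e[t] = 0.7 * e[t - 1] + u[t]

d = durbin_watson(e)
r1 = np.sum(e[:-1] * e[1:]) / np.sum(e ** 2)   # first serial correlation
print(round(d, 2), round(2 * (1 - r1), 2))     # the two values nearly coincide
```

As expected for strongly positively correlated residuals, d falls well below 2.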
where ct and b are constants chosen so that d and d have the same mean and variance. Ali (1984) presents a more accurate, but more complex, approxi¬ mation based on Pearson curves. For quarterly series, Wallis (1972) provides an extension of d to examine fourthorder autocorrelation, replacing the numerator in (12.37) by £(e, e,4)2. .. _ . 12.23 A drawback of the test procedure based on the tables in Appendix B is that it does not apply to models which contain autoregressive terms as well as explanatory variables. To test for autocorrelated errors in this case, Durbin (1970) recommends use of the statistic 1/2
h = ( 1 id)
(12.39) 1  nVi
where V̂_1 is the estimated variance of φ̂_1 in the model

y_t = α + φ_1y_{t−1} + Σ_j β_j x_{jt} + ε_t.   (12.40)

For large samples, h is approximately N(0, 1). This test clearly requires nV̂_1 < 1; should this condition be violated, an instrumental variables test due to Wickens (1972) may be used.

12.24 The spirit of these procedures is similar to that underlying the model identification procedures of Sections 11.13-16. The difference lies in the fact that the regression model is pre-specified, so that exact tests are possible. By contrast, our procedures have used the data to screen a large number of
potential models, so that exact tests are seldom possible; although see Anderson (1971) for exact tests of the order of an AR scheme.

12.25 Notwithstanding the lack of exact procedures, the conclusion that arises from this discussion is that the transfer-function approach is to be preferred unless

(a) the series are too short for full identification procedures to be effective; or
(b) the model is already well-specified and of simple form; or
(c) there are too many explanatory variables to be handled by existing identification procedures.
Case (a) offers scant hope to the researcher unless accompanied by a strong dose of (b). In case (c), the best approach might appear to be to run a stepwise regression with a reasonable set of lagged values for each explanatory variable. This idea was developed by Coen et al. (1969). It is open to the objection that it may be unsound when we seek to identify a model involving a number of highly (auto)correlated variables, as noted by Box and Newbold (1971). Indeed, it was this problem that led us to develop more structured identification procedures. However, a relatively minor modification opens up a possible approach along these lines that is reasonably straightforward to implement.

12.26 The essential idea behind Coen et al.'s (1969) work was to detrend and deseasonalise the variables before looking for relationships between them. We can modify this to the extent of prewhitening each series beforehand; in essence, this is the approach of Box and Haugh (1976) described in Section 11.16. A stepwise search can then be performed using the prewhitened series. This procedure tends to overemphasize the autoregressive structure in y at the expense of structural relations between y and x; nevertheless, it often provides an effective screening for situations like (c) above and gives rise to a reasonable starting model of the form

ψ_y(B)y_t = α + Σ ν_{ji} x̃_{j,t−i} + ε_t,   (12.41)

where ψ_y is the prewhitening filter for y, x̃_{j,t} denotes the prewhitened input series, and the sum is taken over all inputs (j) at selected lags (i).
Example 12.6
We shall now consider the data on UK imports and related macroeconomic variables given in Table 12.6. Two alternative models will be developed, one a 'purely predictive' relationship based upon

(I_{t−i}, S_{t−i}, D_{t−i}, F_{t−i},  i = 1, 2, 3, 4)   (12.42)

and a scheme allowing contemporaneous variation which adds (S_t, D_t, F_t) to the set (12.42). Straightforward stepwise regressions, using backward and forward elimination with threshold values of F = 2.0 both to enter and to leave, were run using MINITAB. Other decision rules may, of course, produce different results, as with any identification procedure. The prewhitening models are (with Q
Table 12.6 UK imports for each quarter of the years 1960-1970

Year   Quarter    I      S      D      F
1960   1        1382    149    370   1088
       2        1417    168    342   1081
       3        1432    161    332   1103
       4        1438    150    307   1146
1961   1        1457    153    327   1184
       2        1403    102    331   1205
       3        1389     35    329   1241
       4        1379     37    313   1217
1962   1        1408      7    316   1197
       2        1426     12    361   1221
       3        1460     53    336   1222
       4        1442      6    350   1189
1963   1        1414      3    353   1070
       2        1472     36    398   1247
       3        1520     −1    414   1278
       4        1540    159    416   1317
1964   1        1611    132    417   1373
       2        1612    170    429   1417
       3        1632    141    439   1459
       4        1659    185    440   1476
1965   1        1581     92    453   1486
       2        1643     83    426   1474
       3        1672    112    428   1471
       4        1686     89    417   1529
1966   1        1722     86    447   1500
       2        1681     76    468   1509
       3        1726     86    409   1551
       4        1642      6    375   1552
1967   1        1777     60    392   1587
       2        1787     64    428   1671
       3        1779      1    467   1645
       4        1850     61    492   1621
1968   1        1948   −104    550   1721
       2        1903     76    408   1693
       3        1945     85    434   1714
       4        1937    101    460   1722
1969   1        1992    122    400   1710
       2        1980     76    425   1672
       3        1966     57    444   1705
       4        2024     91    432   1690
1970   1        2026    −16    433   1646
       2        2130    116    457   1759
       3        2078    117    476   1726
       4        2197    112    480   1755

I = Imports of goods and services
S = Value of physical increases in stocks and work in progress
D = Consumer expenditure on durables
F = Gross domestic fixed capital formation

Data from Monthly Digest of Statistics; see also Gudmundsson (1971). All data are seasonally adjusted. Figures in £ million.
based on 12 lags, and t-values in parentheses):

∇I_t = 26.6 − 0.472∇I_{t−1} + e_{1t},   Q_a(11) = 3.2,  s = 44.1
              (3.20)

S_t = 28.2 + 0.341S_{t−1} + 0.331S_{t−2} + e_{2t},   Q_a(10) = 9.5
             (2.31)       (2.25)

∇D_t = 2.8 + e_{3t} − 0.358e_{3,t−1},   Q_a(11) = 8.0
                      (2.45)

∇F_t = 15.6 + e_{4t} − 0.302e_{4,t−1},   Q_a(11) = 14.7.
                       (2.02)

The tentatively identified models are summarised in Table 12.7; stepwise regression was employed both with and without prewhitening. The prewhitening involves differencing all but S_t, and this should be borne in mind in comparing the results. Since the pure autoregressive scheme for I_t yields s = 45.3, it is evident that stepwise procedures based on only the lagged variables produce only marginal gains. When current values are added, some further improvement is possible although the choice of terms is not clear-cut. Note that the stepwise schemes without prewhitening have coefficients that leave the model outside the stationary region (lagged only) or close to the boundary (lagged plus current). However, Tsay and Tiao (1984) have shown that such estimates remain consistent, so this does not present a problem at the estimation stage. In this example, the models selected without prewhitening are quite similar to those chosen after prewhitening. This is largely due to the dominance of the autoregressive elements, and prewhitening should be employed in general.
Automatic transfer function modelling

12.27 At this stage, it would be possible to press ahead with the Box-Jenkins and Liu-Hanssens procedures for model identification described in Section 11.16. In fact, the relatively short series and high correlations mean that no dramatic changes are achieved by such attempts. Instead, we shall apply these procedures in an 'automatic' mode using the transfer-function component of AUTOBOX. The logic of this program for the univariate case was shown in Fig. 7.7; the process is essentially similar for transfer functions; see Reilly and Dooley (1987).

12.28 The results from the AUTOBOX analyses are presented in Table 12.8. Because of its multiple prewhitening operations, the Box-Jenkins model is based on only 37 observations, so the smaller value for s is open to question. The common filter method of Liu-Hanssens failed to produce an estimable model, probably because S alone was a stationary process. With F only, Liu-Hanssens produced rather similar results to the stepwise scheme after prewhitening. Further analysis is certainly possible and the reader is encouraged to try to improve on these results. However, the principal conclusion is that the level of imports is strongly related to its level in the two previous quarters and somewhat related to current levels of other economic indicators.

12.29 An economist's approach to modelling imports would start from a
Φ_1^{(p)}γ_1 + Φ_2^{(p)}γ_0 + ··· + Φ_p^{(p)}γ'_{p−2} = γ_2,   (14.31b)

and eventually

Φ_1^{(p)}γ_{p−1} + Φ_2^{(p)}γ_{p−2} + ··· + Φ_p^{(p)}γ_0 = γ_p,   (14.31c)

where γ'_j = γ_{−j}; see (11.5). When the system is VAR(p), it follows that

Φ_j^{(r)} = 0,   j > p,  r > p.   (14.32)
Equations (14.31) are analogous to the partial autocorrelations in the univariate case and can be reduced to correlation form if we set ρ_j = D^{−1}γ_jD^{−1} and Φ_r* = D^{−1}Φ_rD.   (14.33)
(14.33)
Whittle (1963) has given a recursive method of solution for equations (14 30)Sec,W r? deVe'Td th.e multiva™te of Durbin's algorithm; see Section 5.19. If we replace the yj by their sample estimates, we can, at least in principle, use the resulting partial correlation estimates to identify an appropriate model. y
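For a VAR(1) the Yule-Walker relations reduce to Γ_1 = Φ_1Γ_0, so that Φ_1 = Γ_1Γ_0^{−1}. The sketch below is our own illustration (with an arbitrary stable coefficient matrix, not taken from the text): it recovers Φ_1 from sample autocovariances of a simulated bivariate series.

```python
import numpy as np

rng = np.random.default_rng(1)
phi = np.array([[0.6, 0.2],
                [-0.1, 0.5]])      # true VAR(1) matrix (stable: |eigenvalues| < 1)

n = 20000
y = np.zeros((n, 2))
for t in range(1, n):
    y[t] = phi @ y[t - 1] + rng.normal(size=2)

yc = y - y.mean(axis=0)
gamma0 = yc.T @ yc / n             # Gamma_0 = E[y_t y_t']
gamma1 = yc[1:].T @ yc[:-1] / n    # Gamma_1 = E[y_t y_{t-1}']

# Yule-Walker: Phi_1 = Gamma_1 Gamma_0^{-1}
phi_hat = gamma1 @ np.linalg.inv(gamma0)
print(np.round(phi_hat, 2))        # close to the true matrix phi
```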
Model identification

14.15 The sample autocorrelation and partial autocorrelation coefficients are asymptotically normally distributed and their standard errors may be determined by an extension of the results in Sections 6.10 and 6.12. As a first rough approximation, we may take SE ≈ n^{−1/2} for all the sample coefficients. Model identification procedures based on the sample autocorrelation matrices have been developed by Tiao and Box (1981) among others. The sheer volume of information makes the arrays of coefficients difficult to assess. To simplify the presentation, we may code the values as follows:

> 2SE, write as +
< −2SE, write as −
within ±2SE, write as ·
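The coding rule is trivial to automate. The helper below is our own sketch, using the rough approximation SE ≈ n^{−1/2}; the matrix entries are illustrative values only.

```python
import numpy as np

def code_matrix(r, n):
    """Code each correlation: '+' if > 2SE, '-' if < -2SE, '.' otherwise."""
    se = n ** -0.5
    return np.where(r > 2 * se, '+', np.where(r < -2 * se, '-', '.'))

r = np.array([[0.87, 0.60],
              [0.68, 0.85]])       # illustrative lag-one correlations
print(code_matrix(r, n=82))        # with n = 82, 2SE is about 0.22
```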
Example 14.6
Quenouille (1968) gave data for five variables connected with the hog market in the USA: hog sales (y_1), hog prices (y_2), corn supply (y_3), corn prices (y_4) and farm wages (y_5). The data refer to the period 1867-1948, n = 82 observations in all. The series are listed in Appendix A, Table A.14. Several analyses of these series have appeared
Fig. 14.1 US hog numbers (y_1) and hog prices (y_2) on logarithmic scale (from Quenouille, 1968)
Multivariate series
since Quenouille's original study, notably Box and Tiao (1977) and Tiao and Tsay (1983, 1989). As our aim is expository, we shall consider only the submodel based upon y_1 and y_2. Following Quenouille and later authors, both series are transformed to logarithms before undertaking any analysis. The two series are plotted in Fig. 14.1. The effects of the Great Depression are clearly seen, as is the general upward trend in prices. The series on farm wages might be used as a price deflator in a more complete analysis. Beyond that, it can be seen that the members of the series tend to move together and there is a natural interaction between them. Our analysis is based on y_1(t) and y_2(t) concurrently, although it should be noted that there are some arguments in favour of using y_1(t) and y_2(t + 1); see Box and Tiao (1977). The correlations, for lags 0-3, are summarised in Table 14.1 together with their arrays of coded indicators. There is a slow decay in the cross-correlations but a much more rapid decay in the partials, and we conclude that a reasonable initial model is VAR(2).
Table 14.1 Cross-correlation and partial correlation matrices for US hog supply (y_1) and hog prices (y_2)

(a) Cross-correlations (coding in parentheses)

Lag 0:  1.00  0.63  (+ +)     Lag 1:  0.87  0.60  (+ +)
        0.63  1.00  (+ +)             0.68  0.85  (+ +)
Lag 2:  0.72  0.64  (+ +)     Lag 3:  0.66  0.66  (+ +)
        0.62  0.69  (+ +)             0.60  0.54  (+ +)

(b) Partial correlations

Lag 0:  1.00  0.63  (+ +)     Lag 1:  0.87  0.60  (+ +)
        0.63  1.00  (+ +)             0.68  0.85  (+ +)
Lag 2:  0.19  0.34  (· +)     Lag 3:  0.03  0.09  (· ·)
        0.06  0.32  (· ·)             0.01  0.06  (· ·)
Estimation

14.16 Parameter estimation by least squares or maximum likelihood follows the principles developed in Sections 7.7-9 and 12.2-4. Hillmer and Tiao (1979) developed an exact likelihood method for VARMA schemes; Spliid (1983) gives another procedure that appears to produce considerable savings in computer time. Ansley and Kohn (1985) give an efficient procedure for checking that estimates satisfy invertibility conditions. Akaike (1976) developed an estimation procedure using the state space approach and the AIC criterion; see Sections 7.28-29. Harvey and Peters (1984) provide an algorithm based upon the Kalman filter; they note that it is easier to ensure that the parameter estimates are admissible for structural models than it is to check for invertibility in VARMA schemes.
Example 14.7

(Example 14.6 continued) The model was fitted using the MTS program; see Appendix D. The estimates for the VAR(2) scheme, with t-values in parentheses, are as shown in Table 14.2. Coefficients not significant at the 90 percent level were dropped and are denoted by (·); the model was re-estimated after removing these redundant parameters. The covariance matrix was estimated as

(  2.18  −0.84 )
( −0.84   9.81 ) × 10^{−3},

and the residual cross-correlation matrices are as shown in Table 14.3. These results suggest that it may be desirable to include a first-order moving average component. However, none of the additional parameters proves to be significant and we conclude that the VAR(2) representation is adequate.

14.17 In the example we used t-tests on individual coefficients to decide whether or not to retain each term in the model. Others (e.g., Akaike, 1976) have recommended the use of AIC or some other information criterion. Tiao and Tsay (1989) develop the notion of scalar canonical models (SCMs) which can be used, in conjunction with a canonical correlation analysis, to arrive at a parsimonious representation. As noted earlier, the number of parameters increases with the square of the number of series, and it is clear that the estimation of even moderate-sized systems (k ≥ 10) depends upon the analyst's skill in discarding large numbers of parameters. When some form of prior information is available, this may be used to impose restrictions on the parameter values; this approach has been adopted by Litterman (1986) for his Bayesian VAR, or BVAR, system.

14.18 An alternative approach is to develop the model in block-recursive form so that y_1 depends on x, y_2 on (x, y_1) and so on; x may well include lagged values of both y_1 and y_2. Such an approach reduces the estimation problem
Table 14.2 Estimates for the VAR(2) scheme (t-values in parentheses). The retained coefficients are 0.885 (8.11), 0.099 (3.32) and 1.009 (10.1) in Φ_1, and 0.178 (1.70), 0.621 (4.13) and 0.379 (3.66) in Φ_2; the remaining entries were dropped (·).
Table 14.3
0.27
0.10 0.14
0.06 0.00
0.00 0.15
0.03 0.06
vr\ O
0.04
1 1 o o b b
0.15 1.00
*
1.00 0.15 +
3
2
1
0
*
lag
+
from a large one to several of more manageable size, but does require expert knowledge of the system in order to make the appropriate block specifications. A further possibility is to develop models for each y_j conditionally upon the other y-values. That is, we develop univariate schemes for

y_j | (y_{(j)}, x),

where y_{(j)} denotes the y-variables other than y_j, and then recombine these univariate schemes for final efficient estimation. This approach is discussed in Ord (1983).
Forecasting

14.19 Forecasts for vector schemes may be generated in the same way as for single series, following the conventions given in Sections 8.10 and 12.14. Prediction intervals or regions may then be generated by the appropriate multivariate extensions of the arguments given in Sections 8.9 and 12.17-18.
Example 14.8

(Examples 14.6, 14.7 continued) After deleting the last four observations, the model was re-estimated, yielding

Φ̂_1 = (  0.677  0.132 )     Φ̂_2 = (   ·       ·    )
      ( −0.390  1.142 ),           ( 0.800  −0.462 ).

The h-step ahead forecasts are given by

ŷ_t(h) = Φ̂_1ŷ_t(h − 1) + Φ̂_2ŷ_t(h − 2),

where ŷ_t(h) = y_{t+h}, h ≤ 0. The results are given in Table 14.4.

Table 14.4 One to four period ahead forecasts for hog data

Hog supply (y_1)              90 percent prediction interval
Year   Actual   Forecast      Lower    Upper
1945   774      885.8         823.3    953.0
1946   787      850.7         778.2    930.0
1947   754      827.1         748.8    913.6
1948   737      813.7         729.9    907.0

Hog price (y_2)               90 percent prediction interval
Year   Actual   Forecast      Lower    Upper
1945   1314     1130.0         967.7   1319.4
1946   1380     1123.4         883.6   1428.2
1947   1556     1146.4         875.8   1500.6
1948   1632     1151.5         871.5   1521.5

The forecasts
are not particularly impressive, as supply drops sharply at the beginning of the four-year period with an accompanying sharp increase in price, a shift reflected in the revised parameter estimates. Given the dramatic changes in market conditions occasioned by the end of World War 2, these results are not surprising.
14.20 As a second example, we reanalysed the data on car production and the FT share index, previously considered in Section 12.12. The correlation matrices are given in Table 14.5, giving a clear indication in favour of a VAR(1) scheme. The estimates for the initial model appear in Table 14.5(c); they suggest only a contemporaneous relationship between the series. It should be noted that the
Table 14.5 Cross-correlation and partial correlation matrices for UK car production (y1) and the FT share index (y2)

(a) Cross-correlations (rows y1, y2)
Lag 0: 1.00 0.56 / 0.56 1.00
Lag 1: 0.80 0.53 / 0.49 0.87
Lag 2: 0.70 0.51 / 0.36 0.71
Lag 3: 0.57 0.45 / 0.30 0.56
(all coded significant)

(b) Partial correlations (rows y1, y2)
Lag 0: 1.00 0.56 / 0.56 1.00
Lag 1: 0.80 0.53 / 0.49 0.87
Lag 2: 0.17 0.07 / 0.09 0.07
Lag 3: 0.21 0.23 / 0.14 0.00
(only the lag 0 and lag 1 terms are coded significant)

(c) Initial estimates
Φ1 = [0.796  · ; ·  0.873],  Ω = [1603  307; 307  609]

(d) Residual cross-correlations (rows y1, y2)
Lag 0: 1.00 0.31 / 0.31 1.00
Lags 1 to 3: 0.26 0.29 (coded significant), 0.18 0.07, 0.14 0.03, 0.10 0.11

(e) Final estimates
Φ1 = [0.750  · ; ·  0.873] and Ω = [1487  234; 234  609], together with a lag-1 moving average term (remaining recoverable entries 0.09 and 0.10)
series were not differenced, as it has been argued by several authors (e.g. Tiao and Box, 1981) that one should not difference individual components prior to vector modelling.
The residual cross-correlations, in Table 14.5(d), suggest the possible addition of a first-order moving average component. The final estimates, in Table 14.5(e), show the lag 1 MA component for cars on the FT index to be marginally significant. We may conclude that the FT index has some weak explanatory power for the car production figures, but there is no influence in the other direction. This concurs with the earlier analysis in Section 12.12.
14.21 VARMA schemes provide structural models, a benefit that may be further enhanced by adding explanatory variables. However, whether substantial forecasting gains materialise from using these models is more open to question. McNees (1986) compared the forecasting performance of Litterman's BVAR method with that of several major econometric models; the results are generally encouraging. As more empirical comparisons become available, the strengths and weaknesses of vector methods will be better understood. As we have noted previously, the method of analysis ultimately selected will depend on both the purpose of the study and the subject matter.
Exercises

14.1 A VAR(1) process has coefficient matrix Φ1. Is it stationary?
14.2 Invert the VMA(1) scheme with

Θ1 = [0.4  0.4; 0.2  0.6]

as far as the term in B³. (Check that it is invertible first.)
14.3 Find the cross-correlation function for the VMA(1) scheme with Σ = I and Θ1 as in Exercise 14.2.
14.4 Carry out a VARMA analysis for one or more of the following data sets:
(a) UK imports (Table 12.5)
(b) US flour prices (Appendix A, Table A.12)
(c) Lydia Pinkham sales and advertising (Appendix A, Table A.13)
(d) US hogs (Appendix A, Table A.14)
15

Other recent developments

Introduction

15.1 It is inevitable that an introductory book such as this must treat many topics only briefly or not at all. Yet it is often these omitted areas that represent the most exciting parts of the subject in terms of new developments. In order to overcome this difficulty, at least partially, the present chapter gives brief details of recent developments in a variety of areas, and provides references for further reading.
15.2 Our discussion of seasonal adjustment procedures in Chapter 4 was presented before we had considered ARIMA models. The fusion of classical adjustment procedures with formal modelling techniques is one of the major recent advances in time series. These developments are discussed in Sections 15.5-7.
15.3 The ARIMA models considered so far have assumed complete sets of observations taken at regular intervals. Further, it has been assumed that regular or seasonal differencing will be sufficient to induce stationarity and, finally, that least-squares procedures will generate satisfactory estimates. All of these assumptions may break down; Sections 15.8-11 discuss missing values and unequal spacing, fractional differencing, and robust estimation.
15.4 A further major area that is attracting increasing attention is that of non-linear time series, which we considered only indirectly when using simple transformations. In the frequency domain, the bispectrum may be used to check for non-linearity, and a variety of time-domain methods have been developed to handle non-linearities. These are described in Sections 15.12-21. The chapter concludes with a brief discussion of multidimensional processes in Sections 15.22-25.
Seasonal adjustment

15.5 Our discussion of seasonal adjustment procedures in Chapter 4 revealed both a strength and a weakness in moving average procedures. The strength lay in their local nature, ensuring that observations in the distant past would be adjusted slightly or not at all when a new observation came to hand. Their weakness was the high variability associated with the adjustments to the most recent observations. This led to the development of the X-11-ARIMA method (Dagum, 1975), whereby the series is first forecast using a seasonal ARIMA model; the X-11 smoothing operations are then applied to the extended series. This procedure has been found to produce smaller revisions to adjusted values than the pure X-11 method (e.g. Dagum, 1982). Further, Cleveland and Tiao (1976) showed that the additive X-11 procedure is closely approximated by an ARIMA scheme, thereby bringing the debate on seasonal adjustment procedures into the mainstream of modern time series analysis.
15.6 A second issue of major importance to official statisticians is whether adjustments should be concurrent; that is, whether the adjustment factors should be recalculated every time a new observation comes to hand. In general, most official procedures have been based upon periodic revisions, whereby seasonal adjustment factors are computed once a year and then applied to new observations as they become available. Kenny and Durbin (1982) showed that concurrent adjustment provides major benefits in terms of reducing the magnitude of revisions to the seasonally adjusted series; they also demonstrated that X-11-ARIMA is to be preferred over pure X-11. McKenzie (1984) has also demonstrated the benefits to be derived from concurrent adjustment. Pierce (1983) provides a summary of a report produced for the US Federal Reserve, whose recommendations include moving to concurrent adjustment. Many government agencies are now moving to adopt these recommendations: an encouraging sign that sound statistical studies can lead to changes of policy.
15.7 A comprehensive review of current issues in seasonal adjustment is provided by Bell and Hillmer (1983a,b); this paper is followed by comments from several discussants and a reply by the authors.
Missing values and unequal spacing

15.8 Our discussions have always assumed complete data records and equal time intervals between observations. Modest departures from these assumptions, such as the different lengths of the months, can be ignored if the recorded variable is a stock (e.g. total ownership of cars) or adjusted to a per-day figure if it is a flow (e.g. production of cars). Such modifications have been found to work well in practice (Granger, 1963). When unequal spacing of the observations is more marked, a different approach must be employed. For example, if the underlying process is a continuous AR(1) scheme and observations occur at times t1 < t2 < ..., we may write

y(tk) = a(uk) y(tk−1) + ε(uk),    (15.1)

where uk = tk − tk−1, a(uk) = exp(−α uk), and

E{ε(uk)} = 0,    var{ε(uk)} = σ²uk.
Model (15.1) reduces to the regular AR(1) scheme when uk = 1 and φ = exp(−α). Quenouille (1958) was a pioneer in developing such AR schemes, which received relatively little further attention until 1980 or thereabouts. Since then the Kalman filter has been shown to provide an elegant framework within which such models can be developed and fitted iteratively. In essence, the state space component can be updated repeatedly between observations and the observation component is updated as and when another observation becomes available. This same process can be applied to handle regularly spaced series with missing values. Jones (1985) provides an excellent introduction to this procedure. Wright (1986) provides an extension of Holt's method for forecasting based upon irregularly spaced data.
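Model (15.1) can also be simulated directly, which shows how the autoregressive weight exp(−αuk) shrinks as the gap uk between observations grows. The sketch below is illustrative only (the parameter values are the caller's, and this is a simulator, not the Kalman-filter fitting procedure just described):

```python
import math
import random

def simulate_irregular_ar1(alpha, sigma2, gaps, y0=0.0, seed=42):
    """Simulate y(t_k) = exp(-alpha*u_k) * y(t_{k-1}) + eps(u_k),
    with var{eps(u_k)} = sigma2 * u_k, for the given gaps u_k."""
    rng = random.Random(seed)
    y = [y0]
    for u in gaps:
        weight = math.exp(-alpha * u)                # decays with the gap u
        eps = rng.gauss(0.0, math.sqrt(sigma2 * u))  # error variance grows with u
        y.append(weight * y[-1] + eps)
    return y
```

Setting sigma2 = 0 gives pure exponential decay, so a long gap between observations leaves almost no memory of the previous value.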
Fractional differencing

15.9 When we defined the differencing operation in Section 3.14, we always took the power d to be an integer. However, in principle, we could consider fractional values of d, 0 < d < 1. To see the effects of this operation, consider the model

(1 − B)^d yt = ∇^d yt = εt.    (15.2)

When d = 0, (15.2) is a white noise process and when d = 1 it corresponds to the random walk. For fractional d, we may express (15.2) in random shock form as

yt = (1 − B)^{−d} εt = {1 + dB + [d(d + 1)/2]B² + ···} εt
   = εt + d εt−1 + [d(d + 1)/2] εt−2 + ···    (15.3)
The ratio of successive coefficients in (15.3) approaches one, rather than being strictly less than one as in AR schemes; cf. (5.27). Thus (15.2) allows for strong persistence in a time series. Yet the model is stationary for d < 1/2, leading to

var(yt) = σ²Γ(1 − 2d)/{Γ(1 − d)}².    (15.4)

Thus, model (15.2) may describe a stationary process with a very slow rate of decay in the coefficients, termed a long memory process. Conversely, fractional differencing with d < 1 may be more appropriate as a way of removing apparent non-stationarity. For further details, see Granger and Joyeux (1980) and Hosking (1981).
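The weights in the random-shock form (15.3) obey the simple recursion ψ0 = 1, ψj = ψj−1 (d + j − 1)/j, so their slow decay is easy to inspect numerically. A minimal sketch (not tied to any data set):

```python
def frac_diff_weights(d, n):
    """First n weights psi_j of (1 - B)^(-d) = sum_j psi_j B^j,
    using psi_0 = 1 and psi_j = psi_{j-1} * (d + j - 1) / j."""
    psi = [1.0]
    for j in range(1, n):
        psi.append(psi[-1] * (d + j - 1) / j)
    return psi
```

For d = 1 every weight equals one (the random walk), for d = 0 all weights beyond the first vanish (white noise), and for fractional d the ratio of successive weights tends to one, as noted above.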
Robust estimation

15.10 Throughout this book we have recognised that extreme observations may have a considerable impact upon estimation procedures and subsequent
forecasting performance. Two types of extreme observation may be considered, as noted in Section 13.4. In general, innovation outliers (IO) may even be beneficial in estimation, but additive outliers (AO) can seriously distort the estimation process. Unfortunately, as indicated in the review by Martin and Yohai (1985), the usual M-type robust estimators (cf. Huber, 1981) are ineffective in the AO case. Various modifications are possible, but these tend to work only for AR schemes. Instead, Martin and Yohai (1985) recommend the use of robust smoother-cleaners, whereby extreme values are identified and adjusted by means of a modified Kalman filter operation. This notion is quite close to the idea of testing for outliers and adjusting by intervention variables employed in Section 13.13, except that the robust procedure operates in a smooth fashion. In both cases, the resulting estimators may lack desirable large-sample properties such as Fisher consistency, but this is likely to be outweighed in practice by the ability to avoid the bias caused by large outliers of the AO type.
15.11 Estimates of the spectrum are equally affected by additive outliers. An iterative scheme based on the smoother-cleaner approach may be utilised here also; see, for example, Martin (1983).
Non-linear models

15.12 With the exception of multiplicative seasonal models of the Holt-Winters type in Section 8.23, our development has relied heavily on the assumption that processes are linear. Unfortunately, nature is not always so understanding. We now explore several ways in which non-linear models may be developed, although it has to be admitted that our understanding of such processes is far from complete. There are four principal ways in which we can examine non-linear processes:
(1) consider non-linear functions of past yt and εt;
(2) develop intrinsically non-linear schemes that change as certain boundaries or thresholds are crossed;
(3) introduce random coefficients which enable us to make successive linear approximations to non-linear schemes;
(4) apply transformations which induce linearity (as with the Box-Cox transform introduced in Section 5.4).
In the following sections we review each of these approaches in turn, but first we consider the bispectrum, which provides a frequency-domain description for non-linear processes.
Bispectra

15.13 The regular spectrum uses only the second-order, or covariance, structure of a time series. Frequency-domain analysis may be extended to higher orders using polyspectra, developed by Brillinger (1965). The case of greatest interest is the bispectrum, which may be defined in terms of the third-order moments in the following way. Consider a stationary time series y(t) with

E{y(t)} = μ,    cov{y(t), y(t − s)} = γs,

and define the third-order moments

γ(s1, s2) = E{u(t)u(t − s1)u(t − s2)},    (15.5)

where u(t) = y(t) − μ. The concept of stationarity used now extends to the assumptions that E{y⁶(t)} < ∞ and that all expectations depend only on relative positions in the series, as in (15.5). Then the bispectrum is given by

w(α1, α2) = (1/4π²) ΣΣ γ(s1, s2) exp(−iα1s1 − iα2s2),    (15.6)

where the sums range over −∞ < s1, s2 < ∞. The bispectrum may be estimated from the sample version of (15.6), provided an appropriate two-dimensional window is used; see, for example, Brillinger and Rosenblatt (1967a,b). A test for non-linearity using the bispectrum is described in Subba Rao (1983).
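The third-order moments (15.5) have an obvious sample analogue, which is the starting point for estimating (15.6). The sketch below computes a sample version of γ(s1, s2) for non-negative lags; it is an illustration only, with no smoothing window, so it is not by itself a consistent bispectrum estimator:

```python
def third_moment(y, s1, s2):
    """Sample version of gamma(s1, s2) = E{u(t) u(t - s1) u(t - s2)},
    with u(t) = y(t) - mean, for lags s1, s2 >= 0."""
    n = len(y)
    mean = sum(y) / n
    u = [v - mean for v in y]
    start = max(s1, s2)
    return sum(u[t] * u[t - s1] * u[t - s2] for t in range(start, n)) / n
```

By construction the estimate is symmetric in (s1, s2), and the (0, 0) term is just the sample third central moment, which is near zero for a symmetric series.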
Bilinear models

15.14 Several attempts have been made to develop time series models as non-linear functions of past yt and εt values, but many of these foundered on the problem of being able to specify reasonable conditions for stationarity. However, moving average models for proportions, such as

(yt − yt−1)/yt−1 = εt − θ1εt−1    (15.7)

or

yt = yt−1 + εt yt−1 − θ1εt−1 yt−1,    (15.8)

suggest the class of bilinear models:

yt − φ1yt−1 − ··· − φp yt−p = εt − θ1εt−1 − ··· − θq εt−q + Σ(j=1 to m) Σ(l=1 to k) βjl yt−j εt−l,    (15.9)

which may be denoted by BL(p, q, m, k). This class has been studied in detail by Granger and Andersen (1978) and Subba Rao (1981, 1983).
Example 15.1
Consider the BL(1, 0, 1, 1) scheme

yt = φyt−1 + βyt−1εt−1 + εt,    (15.10)

where the error terms are independent and identically distributed N(0, 1) variables and εt is independent of yt−j for all j > 0. Then it follows that

E(yt) = β/(1 − φ) = μ,    (15.11)

V(yt) = (1 − φ + 2β² + 2φβ²) / {(1 − φ)(1 − φ² − β²)},    (15.12)
and the autocorrelations satisfy ρk = φρk−1 for k ≥ 2, as for an ARMA(1, 1) scheme.    (15.13)

Second-order (asymptotic) stationarity follows from the expressions for the mean and variance; we require that

|φ| < 1    and    φ² + β² < 1.
15.15 General conditions for asymptotic stationarity are given by Subba Rao (1981); he also demonstrates that the BL(p, 0, p, 1) scheme has the same autocorrelation structure as the ARMA(p, 1) process. This can be seen from Example 5.9 and (15.13) when p = 1. In consequence, it is not possible to distinguish between BL and ARMA schemes solely on the basis of their second-order properties. Granger and Andersen (1978) recommend consideration of the ACF of εt², the squared residuals. For linear models, all these autocorrelations have near-zero expectations, but features of interest will show up for non-linear schemes. Higher-order frequency-domain properties may be examined using the bispectrum defined in Section 15.13; see Subba Rao (1983). Estimation and prediction procedures are considered by Gabr and Subba Rao (1981).
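A BL(1, 0, 1, 1) scheme such as (15.10) is easily simulated, and the Granger-Andersen diagnostic (the ACF of the squared residuals) is then simple to compute. A sketch with illustrative values φ = 0.4, β = 0.4, chosen so that φ² + β² < 1, and unit-variance normal errors:

```python
import random

def simulate_bl(phi, beta, n, seed=1):
    """Simulate y_t = phi*y_{t-1} + beta*y_{t-1}*e_{t-1} + e_t, e_t ~ N(0, 1)."""
    rng = random.Random(seed)
    y, y_prev, e_prev = [], 0.0, 0.0
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        y_new = phi * y_prev + beta * y_prev * e_prev + e
        y.append(y_new)
        y_prev, e_prev = y_new, e
    return y

def acf(x, k):
    """Lag-k sample autocorrelation."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    ck = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / n
    return ck / c0
```

Applying acf to the squared series [v * v for v in simulate_bl(0.4, 0.4, 2000)] will typically show clear low-order correlation, whereas a linear series would not.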
Threshold autoregression

15.16 Many systems may be subject to structural change of a more or less predictable type. For example, an economy may be operating under conditions of either labour shortage (full employment) or labour surplus (under-employment). Again, the population dynamics of a wildlife population are different when resources are plentiful than when resources are scarce or under pressure from that population. In such cases, it is natural to consider using different models for the two regimes, switching from one to the other when some threshold is crossed; hence the use of threshold autoregressive (TAR) models.

Example 15.2
A first-order TAR with two submodels is

yt = φ11 yt−1 + εt,    if yt−1 ≤ c,    (15.14a)
yt = φ12 yt−1 + εt,    if yt−1 > c;    (15.14b)

c is the threshold parameter and the usual assumptions are made concerning the error process. The process is stationary provided |φ1j| < 1, j = 1, 2. Extensions to more submodels and higher-order schemes are straightforward. Details and examples are given in Tong and Lim (1980), including a discussion of the lynx data (Appendix A, Table A.4). The frequency-domain properties of TAR schemes are considered by Pemberton and Tong (1983).
Random coefficients

15.17 The state-space models of Chapter 9 allow coefficients that are time-dependent; in that discussion, the coefficients were usually taken to be non-stationary, although that is clearly not necessary. In a similar vein, but with a rather different emphasis, we may consider random coefficients models such as the following first-order scheme:

yt = φ1t yt−1 + εt,    (15.15a)
φ1t = φ1 + γt,    (15.15b)

where εt ~ IIN(0, σ²) and γt ~ IIN(0, ω²), and (εt, γs) are independent for all t and s. Model (15.15) may be extended in the usual way to include both higher-order lags and explanatory variables. A comprehensive discussion is given in Nicholls and Pagan (1985); our discussion follows their general development. Equations (15.15) reduce to

yt = φ1 yt−1 + εt + γt yt−1,    (15.16)

from which it follows that the process is stationary if and only if

φ1² + ω² < 1.    (15.17)

Least-squares estimators may be obtained by the following two-step process:
(1) estimate φ1 by the usual LS estimator

φ̂1 = Σ yt yt−1 / Σ y²t−1;    (15.18)

(2) compute the residuals

ut = yt − φ̂1 yt−1    (15.19)

and fit the regression model

vt = ut² = α0 + α1 y²t−1 + ε*t,    (15.20)

where α̂0 estimates σ² and α̂1 estimates ω², since

E(ut² | yt−1) = σ² + ω² y²t−1.    (15.21)

A negative value for α̂1 implies that we set ω̂² = 0.
15.18 Forecasts may be generated in the usual way, with prediction intervals of the form

φ̂1 yt ± z(σ̂² + ω̂² yt²)^{1/2};    (15.22)

these intervals clearly widen as yt departs increasingly from the mean value of zero, reflecting the relative lack of knowledge of the process.
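The two-step procedure (15.18)-(15.20) amounts to ordinary least squares followed by a regression of the squared residuals on y²t−1. A sketch in plain Python (illustrative; a negative slope estimate is truncated to zero, since ω² cannot be negative):

```python
def two_step_rca(y):
    """Two-step LS for the random coefficient AR(1) of (15.15):
    returns (phi1_hat, alpha0_hat, alpha1_hat), estimating
    phi1, sigma^2 and omega^2 respectively."""
    n = len(y)
    # Step 1: usual LS estimate of phi1, per (15.18)
    phi = (sum(y[t] * y[t - 1] for t in range(1, n)) /
           sum(y[t - 1] ** 2 for t in range(1, n)))
    # Step 2: regress squared residuals u_t^2 on y_{t-1}^2, per (15.19)-(15.20)
    u2 = [(y[t] - phi * y[t - 1]) ** 2 for t in range(1, n)]
    x = [y[t - 1] ** 2 for t in range(1, n)]
    m = len(x)
    xbar, ubar = sum(x) / m, sum(u2) / m
    a1 = (sum((xi - ui and 0 or 0) for xi, ui in []) if False else
          sum((xi - xbar) * (ui - ubar) for xi, ui in zip(x, u2)) /
          sum((xi - xbar) ** 2 for xi in x))
    a1 = max(a1, 0.0)          # a negative alpha1 estimate is set to zero
    a0 = ubar - a1 * xbar
    return phi, a0, a1
```

For data generated with ω² > 0 the slope estimate picks up the heteroscedasticity in (15.21); for a pure AR(1) it should be near zero.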
Transformations and growth curves

15.19 The Box-Cox transform, introduced in Section 5.4, is a useful data-analytic device for stabilising variances which also tends to produce more nearly linear processes. However, the power transform is clearly restricted in the types of non-linearity it can approximate. In particular, there is no convenient way of allowing for upper or lower bounds on the range of the random variable, yet such bounds often occur in practice. A good example is the logistic growth curve

yt = L0 + (L − L0) / [1 + exp{−β(t − t0)}],    (15.23)

where L0, L and β are all positive. This curve has the limits yt → L0 as t → −∞ and yt → L as t → ∞; also there is an inflection at t = t0, when yt = (L + L0)/2. The logistic has been used widely as a model for growth processes since it has finite upper and lower bounds and a natural interpretation for β as a rate-of-change parameter. Suitably scaled, the logistic is very similar to the normal distribution function except in the extreme tails. Differentiating (15.23) with respect to t we obtain, after some manipulation,

dyt/dt = β(yt − L0)(L − yt).    (15.24)

In this form the logistic is often used to describe epidemiological processes, usually with L0 = 0. In this context, the derivative represents the rate of increase in infectives, y represents the number of infectives, L − y the number of susceptibles and β the rate of new infections per contact between infectives and susceptibles. The rate peaks at y = L/2, corresponding to the inflection mentioned above. The logistic has also been used to describe the rate of technological change (Martino, 1983) and market penetration for new products (Bass, 1969).
15.20 The problem that the logistic poses for the time-series analyst is how to incorporate the random error component. For example, Bass (1969) used (15.24), approximating the derivative by a difference and reparameterising to the form

∇yt = β0 + β1 yt−1 + β2 y²t−1 + εt.    (15.25)

Unfortunately, this model often fails to perform satisfactorily in this form, as a peak is reached and then yt is projected to decline, contrary to the known nature of the process (Heeler and Hustad, 1980).
15.21 In growth curve studies, the four-parameter form (15.23) is often used with an additive error term satisfying the usual assumptions; that is, ε ~ (0, σ²). This appears to work quite well in those circumstances where overall fit is more critical than extrapolation. For time series studies, it is more appropriate to use local trends than global ones, as for linear processes. When processes are cumulative, it is better to consider the increments over time, thereby using the previous level as the new starting point. A variety of such schemes have been suggested (see Ord and Young, 1989), but one particular approach that seems promising is to consider the transformation
ht = h(yt) = ln{(yt^γ − L0^γ) / (L^γ − yt^γ)}    (15.26)

and to set

ht = β0 + β1 t + ε′t,    (15.27)

or

∇ht = β1 + ε″t.    (15.28)

When γ = 1, (15.26) corresponds to the logistic curve and when γ = 0, to the Gompertz curve:

yt = L exp{−exp(β0 − β1 t)},    (15.29)

another popular model for such processes. Thus, (15.26) offers a family of transformations similar to the Box-Cox transform for the unbounded case; see Section 5.4. Further, (15.27) can easily handle unequally spaced data if the error variance is made proportional to the time between observations; (15.28) can be adapted in similar fashion. An attraction of (15.28) is that it allows the investigator to consider the whole range of ARMA schemes for modelling the error process. For further details, see Ord and Young (1989).
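The transformation family (15.26) and its inverse are easily coded; the sketch below takes illustrative bounds L0 and L from the caller, with γ = 1 giving the logistic case:

```python
import math

def h_transform(y, gamma, L0, L):
    """h(y) = ln{(y^gamma - L0^gamma) / (L^gamma - y^gamma)}, per (15.26);
    requires L0 < y < L and gamma > 0."""
    return math.log((y ** gamma - L0 ** gamma) / (L ** gamma - y ** gamma))

def h_inverse(h, gamma, L0, L):
    """Recover y from h by solving (15.26):
    y^gamma (1 + e^h) = L0^gamma + e^h L^gamma."""
    w = math.exp(h)
    return ((L0 ** gamma + w * L ** gamma) / (1.0 + w)) ** (1.0 / gamma)
```

With L0 = 0, L = 100 and γ = 1, h is zero at the midpoint y = 50, and the round trip h_inverse(h_transform(y, ...), ...) returns y.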
Multidimensional processes

15.22 Processes may be defined spatially as well as, or in place of, the time dimension. Assumptions of stationarity may then be applicable in some or all dimensions. For example, consider a two-dimensional spatial process observed on the regular grid of locations (i, j), i = 1, ..., M, j = 1, ..., N. An array of autocorrelations may be defined by

ρ(s1, s2) = γ(s1, s2)/γ(0, 0),    (15.30)

where γ represents the covariances. The sample analogue is

r(s1, s2) = c(s1, s2)/c(0, 0),    (15.31)

where the covariance terms are given by

(M − s1)(N − s2) c(s1, s2) = ΣΣ u(i, j)u(i − s1, j − s2),    (15.32)

where u(i, j) = y(i, j) − ȳ and the sums are taken over i = s1 + 1 to M, j = s2 + 1 to N. Guyon (1982) showed that the corrections (M − s1)(N − s2) are necessary for two- and higher-dimensional schemes. The spectrum is

w(α1, α2) = (1/4π²) ΣΣ γ(s1, s2) exp(−iα1s1 − iα2s2),    (15.33)

where the sums range over −∞ < s1, s2 < ∞. As in other cases, the sample analogue must be smoothed in order to obtain consistent estimators. Ripley (1981, pp. 81-7) gives several examples of spatial spectra.
15.23 Considerable attention has been devoted to isotropic or direction-invariant spatial processes, which depend only on the distance between two points, y(x1) and y(x2), defined as

d12 = {(x11 − x21)² + (x12 − x22)²}^{1/2},    (15.34)

where xj′ = (xj1, xj2).
The autocovariance term then becomes

c(d) = Σ u(xi)u(xj)/N(d),    (15.35)

where the sum is taken over all pairs (i, j) such that dij = d or, because of data limitations, all pairs that fall in some range dL ≤ d ≤ dU, there being N(d) such pairs. The detailed analysis of spatial dependence, for both regular lattice and irregularly located data, is discussed in Cliff and Ord (1981).
15.24 Spatio-temporal processes that assume both spatial and temporal dependence are considered by Aroian and his co-workers in a series of papers (see Aroian, 1980) and by Pfeiffer and Deutsch (1980, and references cited therein); see also the special issue of Communications in Statistics, Series B (1980) devoted to this topic.
15.25 It is often much more difficult either to justify the assumption of spatial stationarity or to induce stationarity by suitable filtering operations. It is then more appropriate to consider the process as a set of multivariate time series schemes (Bennett, 1979) or as a spatial econometric model (Anselin, 1988).
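The sample autocorrelations (15.31), with the edge corrections of (15.32), can be sketched as follows (plain Python, for an M × N grid and lags s1, s2 ≥ 0; illustrative only):

```python
def spatial_acf(grid, s1, s2):
    """r(s1, s2) = c(s1, s2)/c(0, 0), where c uses the (M - s1)(N - s2)
    divisor of (15.32); grid is an M x N list of lists."""
    M, N = len(grid), len(grid[0])
    mean = sum(sum(row) for row in grid) / (M * N)
    def c(a, b):
        total = sum((grid[i][j] - mean) * (grid[i - a][j - b] - mean)
                    for i in range(a, M) for j in range(b, N))
        return total / ((M - a) * (N - b))
    return c(s1, s2) / c(0, 0)
```

For a grid whose columns are identical, spatial_acf(grid, 0, 1) equals one, as it should.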
Appendix A Data sets and references
Table A.1 Values of series ut = 1.1ut−1 − 0.5ut−2 + εt, where εt is a rectangular random variable with range −9.5 to 9.5, rounded off to the nearest unit
1 2 3 4 5
6 7
8 9 10 11 12 13 14 15 16 17 18 19 20
21 22 Source:
Value of series
Number of term
7 6 6 4 3 4 5 1 10 10 6 4 4 7 2 6 17 24 17 4 1 5
23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
Kendall, 1946.
Value of series 4 5 9 4 4 3 9 4 8 6 3 2 0 1 3 3 1 8 3 8 10  16
Number of term
Value of series
45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65
 13 1 6 4 11 15 9 8 4  1 4 7 11 0 1 0 5 11 8 3 5
Table A.2 Gross domestic product, at constant factor prices, for United Kingdom (1980= 100) 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967
48.0 50.2 52.6 53.8 53.3 56.3 58.8 60.9 61.4 62.5 62.4 65.2 68.6 69.9 70.7 73.1 77.3 79.6 81.0 82.2
1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987
85.8 87.5 88.9 90.7 93.2 98.1 92.8 92.6 96.1 97.2 100.0 102.1 100.0 98.9 99.9 103.7 105.5 109.5 112.8 117.6
Source: Central Statistical Office.
Table A.3 Wolfer sunspot numbers: yearly, based on 100 observations 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794
101 82 66 35 31 7 20 92 154 125 85 68 38 23 10 24 83 132 131 118 90 67 60 47 41
1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819
21 16 6 4 7 14 34 45 43 48 42 28 10 8 2 0 1 5 12 14 35 46 41 30 24
1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844
16 7 4 2 8 17 36 50 62 67 71 48 28 8 13 57 122 138 103 86 63 37 24 11 15
Source: Schuster (1906). Discussions: Box and Jenkins (1976), who also cite earlier studies.
1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869
40 62 98 124 96 66 64 54 39 21 7 4 23 55 94 96 77 59 44 47 30 16 7 37 74
Table A.4 Numbers of lynx trapped in Mackenzie River district of NW Canada from 1821 to 1934 182140 269 321 585 871 1 475 2 821 3 928 5 943 4 950 2 577 523 98 184 279 409 2 285 2 685 3 409 1 824 409
184160
186180
151 45 68 213 546 033 129 536 957 361 377 225 360 731 638 725 871 119 684 299
236 245 552 623 311 721 254 687 255 473 358 784 594 676 251 426 756 299 201 229
1 2 2
1 2 2 2
1 3 6 4
1 1 2 1
18811900
2 2 4 2
1 4 3
469 736 042 811 431 511 389 73 39 49 59 188 377 292 031 495 587 105 153 387
190120
192134
758 307 465 991 313 794 836 345 382 808 388 713 800 091 985 790 674 81 80 108
229 399 132 432 574 935 537 529 485 662 000 590 657 396
1 3 6 6 3 1
1 2 3 3 2 3
1 2 3 2 1
1 1 2 3
Source: Elton and Nicholson (1942). Discussions include: Tong (1977), Pemberton and Tong (1983).
Table A.5 US Government Treasury Bill Rate 1974-80: monthly average figure (per cent) Year Month
1974
1975
1976
1977
1978
1979
1980
Jan. Feb. Mar. Apr. May June July Aug. Sept. Oct. Nov. Dec.
7.76 7.06 7.99 8.23 8.43 8.14 7.75 8.74 8.36 7.24 7.58 7.18
6.49 5.58 5.54 5.69 5.32 5.18 6.16 6.46 6.38 6.08 5.47 5.50
4.96 4.85 5.05 4.88 5.18 5.44 5.28 5.15 5.08 4.93 4.81 4.35
4.60 4.66 4.61 4.54 4.94 5.00 5.15 5.50 5.77 6.19 6.16 6.06
6.45 6.46 6.32 6.31 6.43 6.71 7.07 7.04 7.84 8.13 8.79 9.12
9.35 9.27 9.46 9.49 9.58 9.05 9.26 9.45 10.18 11.47 11.87 12.07
12.04 12.81 15.53 14.00 9.15 7.00 8.13 9.26 10.32 11.58 13.89 15.66
Source: US Dept of Commerce. Discussion: Kendall, Stuart and Ord (1983). Note: There was a policy change in late 1979; it is suggested that initial analysis be confined to 1974-79.
Table A.6 Sales Data for company X
1965 1966 1967 1968 1969 1970 1971
Jan.
Feb.
Mar.
Apr.
May
June
July
Aug.
Sept.
Oct.
Nov.
Dec.
154 200 223 346 518 613 628
96 118 104 261 404 392 308
73 90 107 224 300 273 324
49 79 85 141 210 322 248
36 78 75 148 196 189 272
59 91 99 145 186 257
95 167 135 223 247 324
169 169 211 272 343 404
210 289 335 445 464 677
278 347 460 560 680 858
298 375 488 612 711 895
245 203 326 467 610 664
Source: Chatfield and Prothero (1973). Discussion: In and following Chatfield and Prothero (1973); Raveh (1985).
Table A.7 UK unemployment, quarterly, 194980 Quarter Year 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980
1
2 1.45 1.38 1.92 2.07 2.78
2.12 1.86 2.14 3.67
2.12 1.71 1.52 2.65 2.69
2.68 2.80 3.29 4.20 3.33 2.73 3.37 5.73
6.00 6.30 6.03 6.13 10.17 12.73 13.60 13.33 13.67 12.13
3
1.20 1.21
1.02 1.22
1.56
2.21
1.27 2.16
2.29
2.01
1.68
1.46 1.46
1.46
2.02 2.51 1.71 1.44 1.30 2.33 2.49 2.40 2.62 3.45 3.83 2.80 2.43 3.66 5.57 5.93
6.00 5.53 6.46 10.70 12.60 12.87 12.83 13.33 11.87
Source: Central Statistical Office. Discussion: Koot, Young and Ord (1989).
2.11 2.21 1.60 1.39 1.33 2.26 2.43 2.48 2.70 3.82 3.87 2.56 2.70 4.67 6.30 6.87 6.57 5.97 8.17 12.13 13.73 12.87 13.13 12.96 11.97
1.12 1.37 1.55 2.49 2.04 1.67 1.79 2.46 2.17 1.59 1.49 2.30 2.57 2.51 2.55
2.68 4.13 3.50 2.26 2.75 5.00 5.80 6.30 5.83 5.63 8.90 12.27 13.44 12.89 13.40 11.87 11.67
Table A.8 Thousand hectolitres of homeproduced whisky, UK
1980 1981 1982 1983 1984 1985 1986 1987
Jan.
Feb.
Mar.
Apr.
May
June
July
Aug.
Sept.
Oct.
Nov.
Dec.
34.6 53.5 31.5 10.4 10.7 17.1 12.3 11.7
59.1 67.9 57.4 50.6 54.7 33.4 35.7 32.5
82.5 50.5 61.9 73.6 78.3 90.7 91.1 52.1
9.3 6.6 7.1 8.7 7.5 7.6 7.6 14.0
12.2 12.3 12.4 16.2 12.9 16.8 14.4 21.5
19.5 18.5 21.9 23.4 17.6 21.7 22.3 30.6
28.5 24.0 19.7 20.9 23.5 25.7 24.3 32.3
29.3 29.6 26.8 27.2 26.9 29.6 28.1 29.0
35.1 34.6 30.9 31.4 26.9 32.2 34.5 35.2
58.9 53.8 49.7 50.2 56.2 56.3 53.8 53.6
79.8 73.3 72.4 82.8 73.8 78.2 77.9 80.6
52.8 52.6 55.9 49.3 44.8 51.6 54.2 53.0
Source: HM Customs and Excise. Note: Figures prior to April 1983 included other homeproduced spirits.
Table A.10 Retail sales of variety stores in US, 1967-79 (monthly)

Year   Jan. Feb. Mar. Apr. May  June July Aug. Sept. Oct. Nov. Dec.
1967   296  303  365  363  417  421  404  436  421   429  499   915
1968   331  361  402  426  460  457  451  476  436   464  525   939
1969   345  364  427  445  478  492  469  501  459   494  548  1022
1970   370  378  453  470  534  510  485  527  536   553  621  1122
1971   394  411  482  484  550  525  494  537  513   521  596  1069
1972   393  425  503  529  581  558  547  588  549   593  649  1191
1973   463  459  554  576  615  619  589  637  601   642  737  1279
1974   490  490  598  615  681  654  637  694  645   684  749  1245
1975   489  511  612  623  726  692  623  734  662   684  781  1386
1976   503  537  636  560  607  585  559  608  556   596  665  1229
1977   427  450  573  579  615  601  608  617  550   616  673  1199
1978   438  458  548  584  639  616  614  647  588   648  713  1261
1979   483  483  593  620  672  650  643  702  654

Source: US Bureau of the Census. Discussion: Bell (1983). Notes: The intervention variable is defined with respect to April 1976, when a large chain (W. T. Grant)
NO\OO^NninOO(ioMfn'J!,irt Oo>fNOcot000 v©roOfOo>rtoOVO'.Ot'~ rooo>n 0(Nm ‘ Monitoring renal transplants: an pphcation of the multipass Kalman filter. Biometrics, 39, 867878. mith R. L. and Miller, J. E. (1986). A nonGaussian statespace model and application to prediction of records. /. Roy. Statist. Soc. B 48 7988 PT^, AaTk 33n434?radUati0n °f the rateS 0f SiCkneSS ;"d mortaIi‘>'
Spliid, H. (1983). A fast estimation method for the vector autoregressive moving average model with exogenous variables. J. Amer. Statist. Ass., 78.
Stigler, S. M. (1986). The History of Statistics: The Measurement of Uncertainty Before 1900. Harvard University Press, Boston.
Stuart, A. and Ord, J. K. (1987). Kendall's Advanced Theory of Statistics, Volume 1, 5th edition. Charles Griffin, London.
Subba Rao, T. (1981). On the theory of bilinear time series models. J. Roy. Statist. Soc. B, 43, 244-255.
Subba Rao, T. (1983). The bispectral analysis of nonlinear stationary time series with reference to bilinear time series models. In Brillinger, D. R. and Krishnaiah, P. R. (eds) q.v., pp. 293-319.
Sweet, A. L. (1985). Computing the variance of the forecasting error for the Holt-Winters seasonal models. J. Forecasting, 4, 235-243.
Texter, P. A. and Ord, J. K. (1988). Automatic Forecasting Using Explanatory Variables: A Comparative Study. Penn State University, Dept. of Management Science Working Paper.
Texter, P. A. and Ord, J. K. (1989). Automatic selection of forecasting methods for nonstationary series. Int. J. Forecasting, 5.
Theil, H. (1966). Applied Economic Forecasting. North Holland, Amsterdam.
Theil, H. (1971). The Principles of Econometrics. John Wiley & Sons, New York and Chichester, England.
Tiao, G. C. and Box, G. E. P. (1981). Modelling multiple time series with applications. J. Amer. Statist. Ass., 76, 802-816.
Tiao, G. C. and Tsay, R. S. (1983). Consistency properties of least squares estimates of autoregressive parameters in ARMA models. Ann. Statist., 11, 856-871.
Tiao, G. C. and Tsay, R. S. (1989). Model misspecification in multivariate time series (with discussion). J. Roy. Statist. Soc. B, 51, 157-213.
Tong, H. (1977). Some comments on the Canadian lynx data. J. Roy. Statist. Soc. A, 140, 432-436.
Tong, H. and Lim, K. S. (1980). Threshold autoregression, limit cycles and cyclical data (with discussion). J. Roy. Statist. Soc. B, 42, 245-292.
Tsay, R. S. and Tiao, G. C. (1984). Consistent estimates of autoregressive parameters and extended sample autocorrelation function for stationary and nonstationary ARMA models. J. Amer. Statist. Ass., 79, 84-96.
Velu, R. P., Reinsel, G. C. and Wichern, D. W. (1986). Reduced rank models for multiple time series. Biometrika, 73, 105-118.
Wald, A. (1955). Selected Papers in Statistics and Probability. McGraw-Hill, New York.
Wallis, K. F. (1972). Testing for fourth-order autocorrelation in quarterly regression equations. Econometrica, 40, 617-636.
Wallis, K. F. (1974). Seasonal adjustment and relations between variables. J. Amer. Statist. Ass., 69, 18-31.
Wallis, W. A. and Moore, G. H. (1941). A significance test for time series analysis. J. Amer. Statist. Ass., 36, 401-409.
Whittle, P. (1953). The analysis of multiple stationary time series. J. Roy. Statist. Soc. B, 15, 125-139.
Whittle, P. (1963). On the fitting of multivariate autoregressions, and the approximate canonical factorization of a spectral density matrix. Biometrika, 50, 129-134.
Wichern, D. W. (1973). The behaviour of the sample autocorrelation function for an integrated moving average process. Biometrika, 60, 235-239.
Wickens, M. R. (1972). A comparison of alternative tests for serial correlation in the disturbances of equations with lagged dependent variables. University of Bristol Working Paper.
Winkler, R. L. and Makridakis, S. (1983). The combination of forecasts. J. Roy. Statist. Soc. A, 146, 150-157.
Winters, P. R. (1960). Forecasting sales by exponentially weighted moving averages. Management Science, 6, 324-342.
Working, H. (1960). Note on the correlation of first differences of averages in a random chain. Econometrica, 28, 916-918.
Wright, D. J. (1986). Forecasting data published at irregular time intervals using an extension of Holt's method. Management Science, 32, 499-510.
Young, P. and Ord, J. K. (1985). The use of discounted least squares in technological forecasting. Technological Forecasting and Social Change, 28, 263-274.
Yule, G. Udny (1927). On a method of investigating periodicities in disturbed series, with special reference to Wolfer's sunspot numbers. Phil. Trans. A, 226, 267-298.
Yule, G. Udny (1971). Statistical Papers: Selected by Alan Stuart and M. G. Kendall. Charles Griffin, London.
Author Index
Abraham, B. 130, 132, 133 Akaike, H. 117,147,238,239 Ali, M. M. 213 Almon, S. 197, 212 Ameen, J. 132 Andersen, A. P. 247, 248 Anderson, A. 140 Anderson, O. D. 64, 82, 256 Anderson, R. L. 81 Anderson, T. W. 17, 214 Anselin, L. 252 Ansley, C. 106, 107, 238 Armstrong, J. S. 123, 139, 140, 219 Aroian, L. A. 252 Bachelier, L. 40 Bartlett, M. S. 78, 165, 173, 182 Bass, F. M. 250 Bates, J. 140 Beguin, J. M. 102 Bell, W. R. 6, 47, 226, 228, 244, 260 Bennett, R. J. 4, 252 Beveridge, W. H. 166 Bhansali, R. J. 117, 173, 180 Bhattacharya, M. N. 230, 263 Birkhoff, G. D. 52 Blackman, R. B. 170, 175 Bliss, C. I. 12 Box, G. E. P. 17, 52, 69, 72, 96, 106, 111, 112, 197, 214, 218, 221, 225, 237, 238, 242, 254, 264 Brillinger, D. R. 17, 180, 246, 247 Brown, R. G. 131 Brown, R. L. 150 Bucy, R. S. 144 Burman, J. P. 47, 155, 178, 179 Burn, D. A. 78 Chan, H. 58 Chanda, K. C. 64 Chatfield, C. 97, 134, 180, 256 Cleveland, W. P. 244 Cleveland, W. S. 48, 96, 179, 228
Cliff, A. D. 4, 252 Cochrane, D. 212 Coen, P. G. 214 Cook, R. D. 202, 212 Cooley, J. W. 174 Cooper, J. P. 218 Cowden, D. J. 35, 269 Cox, D. R. 4, 26, 52 Dagum, E. 244 Daniell, P. J. 170 Daniels, H. E. 81 De Jong, P. 236 Deutsch, S. J. 252 Devlin, S. J. 48 Dewey, E. R. 10 Dickey, D. A. 90, 91, 92 Dixon, W. J. 81 Dooley, K. 216 Duncan, D. B. 145 Durbin, J. 45, 49, 62, 82, 106, 150, 151,155, 212, 213, 221, 244 Efron, B. 79 Elton, C. 255 Evans, J. M. 150
Fildes, R. 141, 145 Findley, D. F. 79, 140 Ford, E. D. 265 Foster, F. G. 22 Friedman, M. 24 Fuller, W. A. 17, 66, 76, 78, 90, 91, 92, 104, 125, 160 Funkhauser, H. G. 1 Gabr, M. M. 248 Gardner, E. S. 125, 132, 133, 134, 141 Gardner, G. 151 Gleissberg, W. 20 Gomme, E. D. 214 Gourieroux, C. S. 102
Granger, C. W. J. 139, 140, 146, 164, 168, 184, 195, 244, 245, 247, 248 Gray, H. L. 102 Grether, D. M. 179 Griliches, Z. 212 Gudmundsson, G. 215 Guyon, X. 251 Hannan, E. J. 17, 102, 173, 234 Hanssens, D. M. 196 Harrison, P. J. 132, 134, 145, 151 Harvey, A. C. 14, 113, 145, 147, 148, 151, 154, 155, 201, 221, 238 Hasan, T. 177 Hasza, D. P. 92, 125 Hatanaka, M. 164, 184 Haugh, L. D. 197, 214 Hayya, J. C. 58 Heeler, R. M. 250 Hendry, D. F. 14, 195, 218 Hibon, M. 139 Hill, G. 141 Hillmer, S. C. 6, 47, 201, 228, 238, 244, 260 Holbert, D. 79 Holt, C. C. 133 Horn, S. D. 145 Hosking, J. R. M. 24 Huber, P. 246 Hughes, A. O. 168 Hustad, T. P. 250 Jenkins, G. M. 17, 69, 72, 96, 106, 197, 254 Jevons, W. S. 2 Johnston, J. 14, 232 Jones, R. H. 151, 245 Joyeux, R. 245 Kalman, R. E. 144 Kang, H. 58 Karavellas, D. 180 Kelley, G. D. 102 Kendall, M. G. 17, 19, 20, 22, 24, 30, 32, 38, 45, 47, 49, 55, 61, 62, 76, 78, 80, 81, 83, 104, 117, 120, 145, 146, 160, 165, 166, 179, 182, 214, 218, 253, 255, 258 Kenny, P. B. 244 Khintchine, A. Ya. 52 King, P. D. 12 Kohn, R. 238 Koot, R. S. 259 Krishnaiah, P. R. 17, 180 Ledolter, J. 130, 132, 133 Leipnik, R. P. 81 Levene, H. 20 Lewis, P. A. W. 174 Libert, G. 141 Lim, K. S. 248
Litterman, R. B. 239 Liu, L. M. 196 Ljung, G. M. 111 Lusk, E. J. 140
Madow, W. G. 81 Makridakis, S. 139, 140 Mann, H. B. 104 Martin, R. D. 106, 175, 246, 265 Martino, J. P. 123, 250 McGee, V. 140 McIntire, D. D. 102 McKenzie, E. 134, 141 McKenzie, S. K. 244 McNees, S. K. 242 Miller, H. D. 4, 26 Miller, J. E. 23 Monfort, A. 102 Moore, G. H. 20, 21 Moran, P. A. P. 80 Morettin, P. 62 Morris, M. J. 146 Murphy, M. J. 49 Nelson, C. R. 58, 218 Nerlove, M. 179 Newbold, P. 17, 106, 107, 139, 214, 218 Nicholls, D. F. 249 Nicholson, M. 255 Orcutt, G. H. 212 Ord, J. K. 4, 17, 19, 20, 22, 24, 30, 38, 45, 47, 49, 58, 61, 62, 76, 78, 80, 104, 120, 140, 141, 145, 146, 160, 165, 179, 182, 219, 240, 250, 252, 255, 259, 265 Pagan, A. R. 249 Pagano, M. 77 Palda, K. S. 170, 176, 180 Pemberton, J. 248, 255 Peters, S. 151, 238 Pfeiffer, P. E. 252 Phillips, G. D. A. 151, 201 Phillips, P. C. B. 81 Pierce, D. A. 111, 112, 203, 244 Pierse, R. G. 151 Plackett, R. L. 144, 150 Playfair, W. 2 Poirer, D. J. 105 Priestley, M. B. 17, 160, 173, 179, 184 Prothero, D. L. 256 Quenouille, M. H. 79, 81, 111, 245, 261, 264
Rao, M. M. 17 Raveh, A. 256 Reid, D. J. 139 Reilly, D. P. 119, 120, 216
Reinsel, G. C. 264 Richard, J. F. 14, 218 Ripley, B. D. 4, 251 Rissanen, J. 102 Roberts, S. A. 135 Robinson, P. M. 175 Rosenblatt, M. 247 Sargan, J. D. 77, 83, 166 Schuster, A. 254 Schwartz, G. 117 Shiskin, J. 47 Slutzky, E. 65 Smith, A. F. M. 150 Smith, R. L. 23 Son, M. S. 79 Spencer, J. 31 Spliid, H. 238 Stevens, C. F. 145 Stigler, S. M. 2 Stuart, A. 17, 19, 20, 22, 24, 30, 32, 38, 45, 47, 49, 55, 61, 62, 76, 78, 80, 104, 117, 145, 146, 160, 165, 179, 182, 218, 255 Subba Rao, T. 247, 248 Sweet, A. L. 134 Texter, P. A. 120, 141, 219 Theil, H. 113, 232
Tiao, G. C. 99, 100, 105, 201, 216, 221, 225, 234, 237, 238, 239, 242, 244, 260, 262, 264 Todd, P. H. J. 154 Tong, H. 248, 255 Tsay, R. S. 99, 100, 105, 216, 234, 238, 239, 262, 264 Tukey, J. W. 170, 174, 175 Velu, R. P. 264
Wald, A. 104 Wallis, K. F. 47, 213 Wallis, W. A. 20, 21 Watson, G. S. 213 Weisberg, S. 202, 212 Welch, P. D. 174 West, M. 150 Whittle, P. 106, 236 Wichern, D. W. 88, 264 Wickens, M. R. 213 Winkler, R. L. 140 Winters, P. R. 133 Working, H. 58 Wright, D. J. 245 Yohai, V. J. 106, 246 Young, P. 250, 259, 265 Yule, G. U. 3, 65
Subject Index
Abbe's contributions 81
adaptive forecasting, see forecasting
additive models 42, 134
aggregation 58, 146
AIC, see information criteria
alias, in spectrum analysis 164
amplitude, in spectrum analysis 156, 183
angular frequency 156
ARIMA schemes 68-72, 101, 106-8, 135-40, 146, 162-5, 190
  multivariate 232
Autobox 119-20, 216-7, 226-7
autocorrelation 52
  function (ACF) 54, 56, 60, 63, 72
  generating function 67-8
  sample ACF 77, 83-7
automatic model selection 116-20, 141, 226-7
autoregressive models 55-63, 72, 78, 80-1, 82-7, 104-6, 162-3
  multivariate 230-2
  threshold 248
autoregressive (integrated) moving-averages, see ARIMA schemes
Backshift operator 37, 40
Bartlett window 173
Bayesian forecasting, see state-space models
best linear predictor 123, 124-8
Beveridge wheat series 166-8
bias, in SACF 79
  in spectrum 169
bibliography 17
BIC, see information criteria
bilinear models 247-8
bispectra 246-7
bootstrap 79
Box-Cox transform 52-4, 249
Box-Pierce-Ljung statistic 111-2, 114
Brown's method 131-2, 139
Calendar problems 5, 227-8
Census X11 method 47-8, 179, 184, 244
central limit theorem 77-9
centred averages 36
checking, see diagnostics
closed system 192, 230
coherence 183
cointegration 196
complex demodulation 177
continuous time series 4
correlogram, see sample ACF
cospectrum 183
cross-correlation 181-3
  function (CCF) 182-3, 190
cross-spectra 183-9
cyclical fluctuations 14
Damping of SACF 83
Daniell window 170
data sets and analyses
  airline traffic 9, 23, 24, 48, 96-9, 114-6, 137-9, 152-4, 225-9
  barley yields 7, 19, 39
  births, by hour of day 12
  Financial Times Index 13, 26, 39, 93-5, 108-10, 114, 185-7, 195-6, 205-7
  hog prices, in USA 237-42, 264
  immigration into USA 10, 53-4, 171-3, 229
  imports into UK 214-8
  motor vehicles 184-7, 195-6, 205-7
  sheep population 8, 22, 32-34, 39, 92-4, 102, 109-10, 114, 135-7
  vegetable prices 26, 44-6
decomposition 14, 155
delay 191
diagnostics 110-6, 201-6
difference equations 74
difference operator 37, 40
difference-sign test 21
differencing 37-39
discounted least squares 131-3
discrete time series 4
distributed lag models 198, 212
duality between AR and MA schemes 65-7
Durbin-Levinson algorithm 62, 100, 236
Durbin-Watson test 213
Echo effects in spectrum 173
end effects in trend fitting 35
ergodic theorem 52
errors, autocorrelated in regression 198
error-reducing power 32
ESACF, see extended autocorrelations
estimation 104-10
  ARIMA schemes 150-1
  interventions 225
  maximum likelihood 106-8
  robust 245-6
  transfer functions 200-1
EWMA, see exponential smoothing
ex-ante forecasts 208, 210, 219
explanatory variables 189, 196-7
exponential smoothing 129-33, 136-8
ex-post forecasts 208, 210, 219
extended autocorrelations 99-102
Filtering 173, 175-6, 197-8
forecasting 122-43, 180
  evaluation of methods 135-41, 219-20
  K-steps ahead 140
  multivariate 240-2
  transfer functions 206-11
forecast mean square error (FMSE) 123-7, 142
Fourier analysis 157-61
Fourier transform 161
  fast 174-5
fractional differencing 245
frequency domain 156
Gain 189
  in cross-spectra 183
General election 15
geometric series 73-4
goodness-of-fit 112-3, 203
Hamming 170
Hanning 170
harmonic analysis 157-61, 177-9
harmonic series 61
Harrison's method 134-5, 139
Holt's method 133, 136-7, 142, 155, 245
Holt-Winters method 133-4, 137-9, 142-3, 246
hyperparameters 151
Identification 17, 82-102, 112, 116-20
  multivariate 237-8
  of transfer functions 190-8
  with interventions 223-4
impulse response function 189
information criteria 117-8, 238
integrated models 69
intensity, of spectrum 159
intervention analysis 221-7
inverse autocorrelations 96-8, 179-80
invertibility 67
  multivariate 233-5
isotropy 251
Kalman filter 144-55, 238, see also structural modelling
kernel, in smoothing spectrum 169
Lagged relationships 198
lag window 169-71
length of a time series 6
linear systems 192-5
logistic curve 250-1
long-term forecasting 122-3
Mann-Wald theorem 104-5, 212
Markov scheme 55-8, 162
medium-term forecasting 122
missing values 244-5
mixed ARIMA schemes, see ARIMA schemes
model checking, see diagnostics
model identification, see identification
modelling paradigm 16
moving average 28-37, 128-9
  effects of 36
  models 63-7, 73, 82-7, 101, 106, 163
multidimensional processes 251-2
multiplicative models 42, 71-2, 134
multivariate series 230-42
  autocorrelations 235-6
  estimation 238-40
  forecasting 240-2
  identification 237-8
Nonlinear models 246-51
nonstationarity 88-92, 127-8
notation 15
Nyquist frequency 164
Objectives of time series analysis 14
open system 192, 230
oscillation 15
outliers 202, 226-7
Parsimony, principle of 69
partial autocorrelation 62
  function (PACF) 54, 57, 61, 63, 72
  sample PACF 81-2, 83-7
Parzen window 170
peaks and troughs 18-20
periodogram 166
phase, in spectrum analysis 183
phase length 20
plan of book 16
polynomial trends 28-31
portmanteau tests 111-2, 114, 203-4
prediction, see forecasting
prewhitening 175-6, 190, 214, 216-8
Quadrature spectrum 183
Random coefficients model 248-9
random effects model 160
randomness, tests of 18-26
random walk 39-40, 57
records test 22
regression 198, 211, 212-6
  and seasonality 213
  autocorrelated errors 198, 212-3
  random coefficients 248-9
  stepwise 214-7
reliability, in forecasting 135-41
residuals, analysis of, see diagnostics
robust estimation 245-6
Seasonal adjustment 45-8, 179, 243-4
seasonal models 42, 49, 70-2, 96-9, 128, 133-5, 142-3, 151-5
  transfer functions 198
seasonality 14, 41-50, 177-9
  tests for 24-6
serial correlation 4, 76-82
  sampling theory 77-81
  see also sample ACF
short-term forecasting 122
sidebands, in spectrum 173
Slutzky-Yule effect 64-5
spatial models 4, 251-2
spectral density function 161-4
  smoothing 168-79
spectral window 169-73
spectrum 156
  analysis 156-80
  evolutionary 179
  examples of 165-8
  multivariate 183-9
  sampling theory 165
Spencer's smoothing formulae 31-3, 39, 65
  transfer function of 177-9
state-space models 144-55
  and ARIMA 146-8
  recursive updating 148-50
stationarity 5, 51-2, 63, 112
  see also nonstationarity
stationary processes 55-71
stochastic processes 4, 26
structural models 151-5, 227-8
Tapering 175
tau test for trend 22
technological forecasting 123, 249-51
tests of randomness 18-26
tests for trend 21-3
  in seasonal series 23
threshold autoregression 248
time-dependent parameters 248-9
transfer functions 175-6, 189-200
  diagnostics 201-6
  estimation 200-1
  forecasting 206-11
  form of 190-2
transformations 52-4, 249-51
trend 14, 27-40, 151
trend removal 32-4, 43-5, 173
truncation point 169
Tukey window 170
turning points test 18-20
Unequal time intervals 5-6, 244-5
unidentifiability 232
Variance stabilisation 52-4
variate difference 38
vector schemes, see multivariate
Wavelength 156
white noise 162
windows, lag and spectral 169-71
X11 method, see Census X11 method
Yule scheme 59-62, 163
Yule-Walker equations 60-2, 104-5, 190
  multivariate 236
This third edition of Time Series is a thorough revision of the classic text by the late Sir Maurice Kendall. The statistical analysis of time series has undergone changes in the last twenty years. Keith Ord presents an integrated, up-to-date treatment that uses some of the classical methods, such as data-analytic devices, before proceeding to consider modern approaches to model building. The mathematics is kept at a reasonable level to make the book accessible to undergraduates and postgraduate researchers in many applied fields (such as economics, heliology, oil prospecting, etc.).
Key Features
★ Both ARIMA and structural approaches to model building are discussed.
★ Now includes worked examples using real and simulated data.
★ Contains end-of-chapter exercises; sixteen data sets for further analysis; and a discussion of spectral methods, including fast Fourier transforms, intervention analysis and transfer function models.
★ Includes an introduction to multiple time series and new sections on traditional forecasting procedures; comparative evaluations; and automatic model selection procedures.
About the Author Keith Ord is Professor of Management Science and Statistics at Pennsylvania State University.