Statistics for Astrophysics: Time Series Analysis

This book is the result of the 2019 session of the School of Statistics for Astrophysics (Stat4Astro) that took place on 6-11 October 2019 in Autrans, France.



Statistics for Astrophysics: Time Series Analysis

Didier Fraix-Burnet, Gérard Grégoire, Eds.
6-11 October 2019, Autrans, France

Printed in France
ISBN (print): 978-2-7598-2740-4 / ISBN (ebook): 978-2-7598-2741-1

All rights relative to translation, adaptation and reproduction by any means whatsoever are reserved, worldwide. In accordance with the terms of paragraphs 2 and 3 of Article 41 of the French Act dated March 11, 1957, "copies or reproductions reserved strictly for private use and not intended for collective use", on the one hand, and, on the other hand, analyses and short quotations for example or illustrative purposes, are allowed. Otherwise, "any representation or reproduction, whether in full or in part, without the consent of the author or of his successors or assigns, is unlawful" (Article 40, paragraph 1). Any representation or reproduction, by any means whatsoever, will therefore be deemed an infringement of copyright punishable under Articles 425 and following of the French Penal Code.

© EDP Sciences, 2022

Organisers
Didier Fraix-Burnet, Univ. Grenoble Alpes, CNRS, IPAG, France
Gérard Grégoire, Univ. Grenoble Alpes, LJK, Grenoble, France
Stéphane Girard, Inria Grenoble Rhône-Alpes, Grenoble, France

Lecturers
Gérard Grégoire, Univ. Grenoble Alpes, LJK, Grenoble, France
Éric Moulines, École Polytechnique / Académie des Sciences, Paris, France
Marianne Clausel, Université de Lorraine, Nancy, France

Acknowledgments

We thank our sponsors:
- Institut de Planétologie et d'Astrophysique de Grenoble (IPAG)
- Inria Grenoble Rhône-Alpes
- Université Grenoble Alpes
- Idex Communauté Grenoble Alpes
- Grenoble INP
- Laboratoire Jean Kuntzmann (LJK)
- Programme National de Cosmologie et Galaxies (PNCG), Institut National des Sciences de l'Univers (INSU), CNRS
- Programme National Gravitation, Références, Astronomie, Métrologie (PNGRAM), Institut National des Sciences de l'Univers (INSU), CNRS
- Programme National de Physique Stellaire (PNPS), Institut National des Sciences de l'Univers (INSU), CNRS
- Programme National Hautes Énergies (PNHE), Institut National des Sciences de l'Univers (INSU), CNRS
- PERSYVAL-lab, Grenoble
- GdR MaDICS (CNRS)

In addition to the participants listed below, also present in the photo are: Gérard Grégoire, Marianne Clausel, Didier Fraix-Burnet, Stéphane Girard.

List of Participants

BHATTA Gopal, Astronomical Observatory, Jagiellonian University, Poland
BLAINEAU Tristan, Laboratoire d'Astrophysique de Lyon, France
GÓMEZ-GARRIDO Miguel, Observatorio Astronómico Nacional, Spain
KRISHNAN Saikruba, Centrum Astronomiczne im. Mikołaja Kopernika, Poland
LACEDELLI Gaia, Dipartimento di Fisica e Astronomia, University of Padova, Italy
LECLERC Aurelia, Institut de Planétologie et d'Astrophysique de Grenoble, France
MANTHOPOULOU Eleni-Evangelia, Dipartimento di Fisica e Astronomia, University of Padova, Italy
MARQUETTE Jean-Baptiste, Laboratoire d'Astrophysique de Bordeaux, France
MERLE Thibault, Institut d'Astronomie et d'Astrophysique, Université Libre de Bruxelles, Belgium
MONTOLI Alessandro, Physics Department, Università degli Studi di Milano, Italy
ORLITOVA Ivana, Astronomical Institute, Czech Academy of Sciences, Czech Republic
PERES Ricardo, University of Zurich, Switzerland
ROGGERO Noemi, Institut de Planétologie et d'Astrophysique de Grenoble, France
ROQUETTE Julia, Department of Physics & Astronomy, University of Exeter, UK
SREEJITH Sreevarsha, Laboratoire de Physique de Clermont-Ferrand, France

Table of Contents

Foreword
1 - AN OVERVIEW OF THIS BOOK. Examples of time series. Modelization by linear regression. Gérard Grégoire
2 - TIME SERIES, FUNDAMENTAL CONCEPTS. Gérard Grégoire
3 - ARMA AND ARIMA TIME SERIES. Gérard Grégoire
4 - COMPLEMENTS AND APPLICATIONS. Gérard Grégoire
5 - KALMAN FILTER AND TIME SERIES. Gérard Grégoire
6 - AN INTRODUCTION TO STATE SPACE MODELS. Randal Douc, Éric Moulines & David Stoffer

Foreword

The topic of the 2019 session was time series (including variabilities and transient events) which, from celestial mechanics to gravitational waves, from exoplanets to quasars, concern nearly all of astrophysics. Variable phenomena are ubiquitous in the Universe: periodic (orbits, cycles, pulses, rotations...), transient (explosions, bursts, stellar activity...), random (accretion, ejection...) or regular (apparent motions...). The detection, characterization and classification of these variabilities fall within a discipline of statistics called time series analysis. In astrophysics, detection may need to be immediate, in order to alert other telescopes, or very detailed, to identify some exoplanets or probe the interior of stars. Characterization is required for physical modelling and understanding. Classification is of course necessary to organize the observations. Time series analysis is not new in astrophysics, but it is widespread in many other disciplines (meteorology, finance, economics, medical sciences...), so it is an important branch of statistics with huge developments that astronomers often ignore.

Three statisticians gave lectures during the live session. We are very much indebted to Gérard Grégoire, who devised the scientific content of the school and provided most of the lectures. He did a considerable job writing five of the six chapters of this book. We are grateful to Éric Moulines and Marianne Clausel for having taken the time to prepare and give their fascinating lectures. We also thank Stéphane Girard, without whom the organization of the school would not have been possible.

The first chapter is intended to give a detailed overview of the contents of the present book. It also provides, through a list of examples, an introduction to the field of time series we are interested in. In a last part, some elementary models using deterministic components to account for a trend and/or a seasonal behavior are analyzed by way of multiple linear regression. This is also an opportunity to recall the basics of the linear regression model, which is constantly involved in time series analysis methods.

The second chapter presents the fundamental concepts needed to deal with time series. It introduces stationarity; the autocovariance, autocorrelation and partial autocorrelation functions; the spectral density; the discrete Fourier transform; linear, causal and invertible series; and linear forecasting. It then goes on with the statistical tools used to describe a stationary process (among which the periodogram, well known to astronomers) and finishes with vector time series.

The third chapter is devoted, in a first part, to a rather extensive presentation of ARMA (AutoRegressive Moving Average) series, with developments of the simplest and most used cases. In a second part, the generalizations ARIMA, SARMA and SARIMA are introduced, which tackle some forms of non-stationarity or seasonality while accounting for randomness. Then several sections deal with the forecasting of ARMA series, the estimation of the parameters of the models, and a detailed presentation of the strategies to build an ARIMA model. Finally, using R procedures, a comprehensive study of two examples is provided.


The fourth chapter gives useful complements to the previous chapters, in particular the procedures to decide whether the series is non-stationary or not, and whether the non-stationarity is due to a deterministic trend or to a random trend deriving from the autoregressive polynomial having a unit root. Long memory time series, ARMAX series, transfer function models and their applications are also addressed in this chapter, together with a brief treatment of multivariate modelling.

The last two chapters are devoted to state space models for time series. The fifth chapter, by Gérard Grégoire, deals with the particular case of linear Gaussian state space models: the observed variable $Y_t$ is a noisy linear function of a hidden variable $X_t$ whose behaviour is Markovian. Gérard Grégoire shows that ARIMA series and some other time series models can be represented as state space models. He presents the Kalman filtering and smoothing algorithms, and provides several practical exercises to better grasp the theory. The issue of missing observations is also addressed, together with examples and a discussion of some R libraries devoted to the Kalman methodology. This chapter can be seen as a preliminary introduction to the advanced and more general presentation given in the sixth and final chapter.

The last chapter, written by Randal Douc, Éric Moulines and David Stoffer, corresponds to the presentation given by Éric Moulines during the live session. It addresses the case of discrete-valued state space HMM (Hidden Markov Models) as well as non-linear and/or non-Gaussian continuous-valued HMM. Expressions analogous to the Kalman filter and Kalman smoother algorithms are provided for the conditional distribution of the state. Although the recursions appear simple, approximations by numerical methods are needed. Importance sampling and sequential importance sampling, without or with resampling, are sequential Monte Carlo methods which approximate the sequence of a posteriori distributions of the state variable by a set of particles and associated weights, recursively updated. Some examples cast light on the methods.

The reader can find some of the R codes used in this book on the school website http://stat4astro2019.sciencesconf.org/.

Didier Fraix-Burnet
Gérard Grégoire

CHAPTER 1

AN OVERVIEW OF THIS BOOK. Examples of time series. Modelization by linear regression.

Gérard GRÉGOIRE

Univ. Grenoble Alpes, CNRS, Grenoble INP, LJK, 38000 Grenoble, France

Abstract. This chapter serves as an introduction to the book. Firstly, it is intended to situate this presentation within the field of time series. For each chapter, a detailed overview is given which focuses on the main issues addressed. Next, examples of data sets used in this text or in other documents are presented. A third part deals with the use of multiple linear regression in time series. It is worth noting that the linear model underlies much of the text. We recall the basic assumptions of the linear model and the main results concerning statistical inference. Using R, we fit a regression model on a data set showing a series with an apparent deterministic trend and seasonality. Several exercises are proposed to extend this example.

To Serge Dégerine
An expert known and recognized in the world of researchers in the field of time series, as witnessed, among others, by the publications Dégerine S. [1987], [1990] and [1994]. In memory of my ever-astonished admiration for luminous intuitions and an immediate clear-sightedness in complex time series problems. In memory also of an enduring friendship, an unfailing generosity and human qualities equal to every test.

Figure 1 reproduces what appears to be the oldest known example of a time series plot, dating from the tenth (or possibly eleventh) century and showing the inclinations of the planetary orbits (Tufte [1983]). Commenting on this artifact, Tufte says "It appears as a mysterious and isolated wonder in the history of data graphics, since the next extant graphic of a plotted time-series shows up some 800 years later."


Figure 1: Inclinations of the planetary orbits. 10th-11th century.

This paragraph is reprinted from the book of Cryer and Chan [2008]. It is interesting to see that astrophysicists were in some way pioneers in the world of time series.

Foreword

The first five chapters of this document are concerned with basic notions in time series. As far as possible we have tried to give the theoretical basics of the field, with rigorous arguments as well as modelling motivations, detailed examples using R, and exercises. The writing of the document has benefited from the reading of many authors. Among others, we would like to cite, alphabetically, Aragon Y. [2011], Brockwell P. and Davis R. [2014], Cryer J. and Chan K. [2008], Douc R., Moulines E. and Stoffer D. [2014], Durbin J. and Koopman S. [2012], Harvey A. [1994], Shumway R. and Stoffer D. [2017], and Wei W. [2005]. A particular mention is due to Shumway and Stoffer: some parts of our document are inspired by their book, and their detailed R code has proved precious in writing our own procedures.

The first part of this chapter situates our presentation within the field of time series and gives a relatively detailed overview of the document, while trying to avoid technical and theoretical aspects. Next, a list of data sets from various domains is presented; some of them are used throughout the text to illustrate the modelings under study or to demonstrate a particular method. As we are in a linear context all along the document, we often resort to the projection of an element $Y$ of a Hilbert space $H$ onto a subspace generated by a finite set $X_1, \dots, X_k$ of other elements of $H$, that is, to performing a linear regression. Hence we recall the basic elements of the usual multiple regression for observations of a real variable and values of $k$ deterministic regressors. We also recall the main results concerning statistical inference about the parameters. Using an actual data set, we demonstrate the procedure to build a model accounting linearly for a deterministic trend and seasonal components in a time series.


While Chapter 6 is concerned with nonlinear state space models, Chapters 1-5 are mainly devoted to linear time series. The expression "time series" refers to a sequence of values of a time-varying variable, generally observed at regular times, for instance each hour, each day, or each year. It is necessary to clarify what is meant by linear time series models. In fact two definitions can be encountered:

• $(Y_t)$ is a linear time series when we can write, for any $t$,
$$Y_t = \sum_{j=-\infty}^{\infty} \psi_j \varepsilon_{t-j}, \quad \text{where} \quad \sum_{j=-\infty}^{\infty} \psi_j^2 < \infty \quad \text{and} \quad (\varepsilon_t) \sim \text{i.i.d.}(0, \sigma^2_\varepsilon), \qquad (1)$$
where $(\varepsilon_t) \sim \text{i.i.d.}(0, \sigma^2_\varepsilon)$ means that the $\varepsilon_t$'s are independent and identically distributed, with mean zero and common finite variance $\sigma^2_\varepsilon$.

• Some authors prefer to say that $(Y_t)$ is linear when, roughly speaking, for $m \geq 0$, $E(Y_{t+m} \mid Y_s, -\infty < s \leq t-1)$ is a linear function of the past of $Y_t$, that is, of $\{Y_s, -\infty < s \leq t-1\}$. This means that the best prediction of $Y_{t+m}$ is linear in the past of $Y_t$. We make this more precise in Chapters 2 and 3.

Let us note that when $(Y_t)$ is linear in the first sense, it also satisfies the second definition; see Appendix A of Chapter 2. Note also that the assumption "i.i.d." in (1) is quite determinant: if we replace it by "uncorrelated", linearity in the second sense is no longer ensured, except when $(\varepsilon_t)$ is Gaussian.

Frequently in time series models, $(\varepsilon_t)$ appears as a perturbation and satisfies, for any $t$, $E(\varepsilon_t) = 0$ and $Var(\varepsilon_t)$ constant. Furthermore, it is also assumed that, for $s \neq t$, $Cov(\varepsilon_s, \varepsilon_t) = 0$, that is, the $\varepsilon_t$'s are uncorrelated. We then say that $(\varepsilon_t)$ is a white noise.

Finally, note in this preamble that it is often said that there are two different families of methods in time series analysis: time domain methods and frequency domain methods. Although we give the basics of the frequency domain (we present the spectral density, together with first elements of statistical inference in the univariate as well as multivariate context, and treat some examples using R), we put more emphasis on time domain methods.

1 An overview of the contents.

Chapter 2 develops the basic tools of $L^2$ time series. In a first part we develop fundamental notions for univariate $L^2$ series. The probability distribution of such a series and second-order stationarity are defined. This weak form of stationarity requires that the moments $E(Y_t)$, $Var(Y_t)$ and $Cov(Y_{t+h}, Y_t)$ are independent of $t$ for any $h$. This justifies the notations $\mu = E(Y_t)$, $\sigma^2 = Var(Y_t)$ and $\gamma(h) = Cov(Y_{t+h}, Y_t)$. A trivial and important example of a stationary series is the white noise: $(\varepsilon_t)$ is a white noise when it is a collection of uncorrelated variables with common mean and variance. In some cases we assume that the noise is i.i.d. (independent and identically distributed), or i.i.d. Gaussian. The function $\gamma(\cdot)$ is called the autocovariance function, and $\rho(\cdot) = \gamma(\cdot)/\gamma(0)$ the autocorrelation function. These functions are fundamental for analyzing the behavior of $L^2$ time series. Also introduced is the important notion of the partial autocorrelation function, denoted $\phi_{hh}$, which gives the correlation between $Y_t$ and $Y_{t+h}$ once the linear dependence of $Y_t$ and $Y_{t+h}$ on the intermediate variables $Y_{t+1}, Y_{t+2}, \dots, Y_{t+h-1}$ has been removed. The Fourier transform of $\gamma(\cdot)$, when $\gamma(\cdot)$ is absolutely summable, defines the spectral density of a stationary time series, which is a key tool for analyzing trigonometric periodicities in the signal.
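To fix ideas, here is a minimal R sketch (ours, not the book's code) computing these descriptive tools on a simulated AR(1) series:

set.seed(42)
y <- arima.sim(model = list(ar = 0.7), n = 500)   # stationary AR(1) with phi = 0.7

acf(y)                     # sample autocorrelation function rho-hat(h)
pacf(y)                    # sample partial autocorrelations phi-hat_hh
spec.pgram(y, log = "no")  # raw periodogram, an estimator of the spectral density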


Basic statistics for estimation and tests concerning the mean, variance, autocorrelation and partial autocorrelation are presented. The periodogram, an estimator of the spectral density, is defined in terms of the discrete Fourier transform, and some first properties, such as convergence of the mean and the asymptotic $\chi^2$ distribution, are presented.

Vector time series are the subject of Section 6. For simplicity we focus on bivariate series; the generalization to larger dimensions is straightforward. The second-order stationarity of $Z_t = (X_t, Y_t)'$ means that $(X_t)$ and $(Y_t)$ are stationary, and also that the cross-covariance $Cov(X_{t+h}, Y_t)$ is independent of $t$. The autocovariance function is now a matrix function $\Gamma(h)$ which, in general, is nonsymmetric; we have in fact $\Gamma(h) = \Gamma(-h)'$. When $Z_t$ is stationary and $\Gamma(h)$ is absolutely summable, the spectral density is a matrix whose elements are the spectral densities associated with the elements of $\Gamma(h)$. Partial autocorrelation is also generalized. Notice that some care must be taken, as some properties of the univariate case are not preserved. Finally, we address statistical inference for the mean, variance, autocovariance and autocorrelation functions and the spectral density. We point out that, for two independent stationary series, $\sqrt{n}\,\hat\rho_{XY}(h)$ is asymptotically Gaussian with zero mean and variance $1 + 2\sum_{j=1}^{\infty} \rho_X(j)\rho_Y(j)$. When $(X_t)$ and $(Y_t)$ are two white noises the variance is equal to 1, but the variance may be seriously inflated when each of $(X_t)$ and $(Y_t)$ is significantly autocorrelated. This leads to a whitening transformation of $(X_t)$, see Chapter 4, when studying the relationship between $(Y_t)$ and $(X_t)$ by regressing $Y_t$ on $X_t$.

Chapter 3 is concerned with autoregressive moving average series, namely ARMA series. A series $(Y_t)$ is said to be ARMA(p, q) when we have
$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}, \qquad (2)$$
where $(\varepsilon_t)$ is a white noise. When $\theta_1 = \theta_2 = \cdots = \theta_q = 0$ and $\phi_p \neq 0$, the series is said to be AR(p), i.e. autoregressive of order $p$. When $\phi_1 = \phi_2 = \cdots = \phi_p = 0$ and $\theta_q \neq 0$, $(Y_t)$ is a moving average series of order $q$, in short an MA(q) series. It is convenient to write (2) in the form $\phi(B) Y_t = \theta(B) \varepsilon_t$, where $\phi(z) = 1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p$, $\theta(z) = 1 + \theta_1 z + \theta_2 z^2 + \cdots + \theta_q z^q$, and $B$ is the lag operator defined by $B Y_t = Y_{t-1}$.

Equation (2) is a model for a variable which depends on its recent past and on random shocks which also occurred at preceding times. The class of ARMA models is a very important one for several reasons. First, it provides parsimonious and flexible models for many time series data sets. Also, in some sense this class is dense in the set of stationary time series. Another reason is that the properties of these series have been extensively studied. Moreover, ARMA series are a basis for building series models which are not stationary but can be transformed into stationary ARMA series by differencing. Precisely, we consider series $(Y_t)$ such that $(\Delta Y_t) \sim$ ARMA(p, q), where $\Delta Y_t = Y_t - Y_{t-1} = (I - B) Y_t$. We then say that $(Y_t) \sim$ ARIMA(p, 1, q), where the value 1 means that we difference only once. More generally, for an integer $d$, ARIMA(p, d, q) means that we difference $d$ times to get an ARMA(p, q) series. Note that the order $d$ could also be fractional, meaning $-0.5 < d < 0.5$ and $d \neq 0$. This provides stationary series, denoted ARFIMA(p, d, q) and called fractionally integrated autoregressive moving average series, whose autocorrelation decreases slowly. We then speak of long memory time series; we deal with these series in Chapter 4. Seasonality can also be introduced in ARMA and ARIMA models, which gives rise to SARMA and SARIMA models.

In this chapter, we also found it convenient to describe moving average filtering methods to extract the seasonal and trend components of a time series. Note that no ARMA modeling is involved in this part, but it can be interesting to compare the results of the two ways of accounting for seasonality. Several other sections are devoted to forecasting, statistical inference, and the strategy for building ARIMA models, and these are the subject of significant developments.

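As a minimal illustration of model (2) in practice (a sketch using base R only, not the book's procedures), one can simulate an ARMA(1,1) series and recover its parameters:

set.seed(1)
y <- arima.sim(model = list(ar = 0.6, ma = 0.4), n = 1000)  # ARMA(1,1) with Gaussian white noise
fit <- arima(y, order = c(1, 0, 1))   # order = (p, d, q); d = 1 would fit an ARIMA(p,1,q)
fit                                   # estimates of phi_1, theta_1 and sigma^2_eps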

A key point in developing forecasting and statistical inference for ARMA models is that, roughly speaking, for any $t$ we are able to write $Y_t$ as a linear expression of $\varepsilon_t, \varepsilon_{t-1}, \dots$ (causality), and $\varepsilon_t$ as a linear expression of $Y_t, Y_{t-1}, \dots$ (invertibility). All along this chapter, simulations and examples are provided. In a final section, two data sets are extensively investigated by fitting ARIMA models: first the well-known Wolf sunspot data, and next a series reporting monthly employment of young males in the US, available under the name man_empl_W9 on the school website. This latter data set is from W. Wei [2005].

Chapter 4 deals with several subjects related in some way to ARMA models. In the univariate case we are concerned with stationarity tests, unit root tests, long memory time series, ARMAX models and the regression of one series on another, transfer function models, and intervention analysis. Considering multivariate series, we are led to address multivariate linear regression and vector ARMA models, that is, VARMA models.

Concerning stationarity and unit root tests, in the simplest case we assume that the series at hand is an AR(1) series, $Y_t = \rho Y_{t-1} + \eta_t$, and we want to test $H_0: \rho = 1$ against $H_1: |\rho| < 1$. That is, roughly speaking, we are testing a random walk against a zero mean stationary AR(1) series. The statistic $n(\hat\rho_n - 1)$, called the Dickey-Fuller statistic, can be used. Its distribution under $H_0$ is not a standard one, but it is tabulated and critical values are provided by software. More general situations can be dealt with; in particular, if instead of assuming an AR(1) we consider an AR(p) model with $p > 1$, we can resort to the error correction form of the model. These tests take non-stationarity as the null hypothesis. Some other tests, for instance KPSS, focus on the reverse, that is, stationarity is the null hypothesis. A minimal sketch of both tests is given below.

The long memory time series we are interested in are named ARFIMA(p, d, q), where $-0.5 < d < 0.5$. Differencing of order $d$ is said to be fractional, and the difference operator is defined through the series expansion of $(1 - z)^d$, applied with $z$ replaced by $B$. As in the case of ARIMA series, after differencing we get a stationary ARMA(p, q) series. The original series itself is stationary, with an autocorrelation function satisfying, for large $h$, $\rho(h) \sim c \cdot h^{2d-1}$ with $c \neq 0$. This is to be compared with $|\rho(h)| \leq c \cdot |a|^h$, with $|a| < 1$ and $c > 0$, for any ARMA series. Assuming the orders $p$ and $q$ are known, we can estimate $d, \phi_1, \dots, \phi_p, \theta_1, \dots, \theta_q$ by the maximum likelihood method.

ARMAX models can be seen as regression models with ARMA errors. Two methods are presented to estimate the parameters of the regression. The first one consists in estimating the covariance matrix of the error using the residuals of a preliminary ordinary regression, without accounting for the ARMA structure, and then using GLS, i.e. the generalized least squares method. The sketch of the second one is the following: we start by running an ordinary regression and estimating the ARMA model of the residuals; next, the estimated whitening transformation $\hat\phi(B)/\hat\theta(B)$ is applied to both sides of the regression equation; we are then left with a regression equation whose noise is deemed to be white, and we can apply ordinary least squares. As expected, this latter method is more efficient than the first one.
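A minimal sketch of the unit root and stationarity tests just mentioned, using the tseries library on a simulated random walk (assumed data; critical values are supplied by the package):

library(tseries)
set.seed(2)
rw <- cumsum(rnorm(200))   # random walk: AR(1) with rho = 1
adf.test(rw)               # augmented Dickey-Fuller, H0: non-stationarity (unit root)
kpss.test(rw)              # KPSS, H0: stationarity (rejected here)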
When we are faced with $(Y_t)$ and $(X_t)$ related by
$$Y_t = \sum_{j=0}^{\infty} \nu_j X_{t-j} + \eta_t = \nu(B) X_t + \eta_t,$$
it appears that it often makes sense to write $\nu(B)$ in the form $\frac{\omega(B)}{\delta(B)} B^d$ with
$$\omega(B) = \omega_0 + \omega_1 B + \cdots + \omega_s B^s, \qquad \delta(B) = 1 - \delta_1 B - \cdots - \delta_r B^r.$$
This representation is called a transfer function. The method for dealing with this kind of model can be sketched as follows. We first whiten $(X_t)$ and use the relevant operator to filter $(Y_t)$. Then the cross-correlation function between the filtered series $\tilde Y_t$ and the whitened $\tilde X_t$ allows us to identify $d$, $r$


and $s$. We can then write a regression equation with $Y_t$ as response variable and $Y_{t-1}, \dots, Y_{t-r}$ and $X_{t-d-1}, \dots, X_{t-d-s}$ as regressors, and fit an ARMAX model.

Intervention models aim at modeling the effect of an event occurring at time $\tau$. A feasible assumption is that the effect is a change in the mean of the series at hand. We can resort to a transfer function to model this kind of series, and also to account for possible outliers. We develop this point in Subsection 4.1 and demonstrate the method on the data set related to the effect of the 11th of September 2001 on airline traffic.

For modeling the joint behavior of several series, a solution is to use vector ARMA series, named VARMA. We give an introduction to these models, together with a presentation of a form of multivariate regression. For simplicity we focus on the VAR(1) model and consider as an example a bidimensional series, i.e. only two time series.

Chapter 5 deals with the linear Gaussian state space model and Kalman filtering. It is intended to provide an introduction and/or motivation for Chapter 6, contributed by Douc, Moulines and Stoffer. Roughly speaking, in a state space model we observe the series $(Y_t)$, which is related to an unobserved series $(X_t)$ through $Y_t = A X_t + V_t$. The series $(X_t)$, called the state series, is a Markovian process defined by $X_t = \phi X_{t-1} + W_t$. The series $(V_t)$ and $(W_t)$ are independent Gaussian noises. Numerous time series models can be represented as state space models, in particular ARIMA series, and we give several examples of such representations. A particularly simple state space model is the local level model, defined by $Y_t = \mu_t + V_t$ and $\mu_t = \mu_{t-1} + W_t$, where again $(V_t)$ and $(W_t)$ are independent Gaussian noises; the state space representation is given by $X_t = \mu_t$, $A = 1$ and $\phi = 1$. We take advantage of the simplicity of the local level model to derive the Kalman filtering and forecasting algorithms. The arguments used to derive the recursive formulas are essentially based on handling conditional distributions and are generally considered a "Bayesian" approach. Naturally, the smoothing algorithm is also given, and an example where the algorithm can be run by hand is exhibited. The algorithm easily provides $\hat Y_t^{t-1}$, the forecast of $Y_t$ given $Y_{t-1}, Y_{t-2}, \dots, Y_1$, and accordingly the innovations $Y_t - \hat Y_t^{t-1}$. Hence this gives the likelihood function and enables us to carry out maximum likelihood estimation. The complete method is demonstrated on two examples. The first one is a simulation of an ARMA(1,1) series: two state space representations are used, from which we get estimates of $\phi_1$, $\theta_1$ and $\sigma^2_\varepsilon$. It turns out that the estimates for the two representations are rather close, and also close to what is provided by the R procedure arima, which is not unexpected since arima also resorts to Kalman algorithms. The second example deals with the benchmark dataset reporting the yearly Nile flow over 100 years. A demonstration of how the Kalman filter can treat missing data is also given, and some R libraries devoted to state space models and the Kalman filter are reviewed.

Chapter 6 addresses non-linear and/or non-Gaussian state space models. Thus the relation between $Y_t$ and $X_t$ may be non-linear, and the noises $(V_t)$ and $(W_t)$, when present, may be non-Gaussian. In fact the authors, R. Douc, E. Moulines and D. Stoffer, consider two sorts of state space models. First, the discrete-valued state space HMM (Hidden Markov Models): the state variable $(X_t)$ takes its values in a discrete space $\mathcal{X}$ and, similarly, $(Y_t)$ in a discrete space $\mathcal{Y}$ (this latter assumption about $\mathcal{Y}$ could be relaxed). The sequence $(X_t)$ is in fact a Markov chain. The authors show that $(X_t, Y_t)$ is a bidimensional Markov chain and, following an approach initiated by Baum et al. [1970], develop expressions analogous to the Kalman filter (resp. Kalman smoother) algorithms for the recursive computation of the a posteriori distribution of the state $X_t$ when $Y_0, \dots, Y_t$ (resp. $Y_0, \dots, Y_n$) are observed. The complexity of evaluating the likelihood of $Y_0, \dots, Y_n$ in this way grows linearly with the number of observations $n$ and quadratically with the number of states, while the complexity of the direct evaluation behaves like $O(m^n)$. The a posteriori most likely sequence of states, i.e. $\hat X_{0:n} = \operatorname{argmax}_{x_{0:n} \in \mathcal{X}^{n+1}} P(X_{0:n} = x_{0:n} \mid Y_{0:n})$, can be


efficiently computed using the Viterbi algorithm, based on the dynamic programming principle. When the discreteness assumption is relaxed, that is, in the continuous case, the expressions for the recursive computations can be generalized, but in practice they require approximation by numerical methods. Sequential Monte Carlo methods to approximate the sequence $(\phi_t(x))$ of conditional distributions of the state are developed in the second part of the chapter. Importance sampling and sequential importance sampling, without or with resampling, are used to approximate the sequence $(\phi_t(x))$ by a set of particles with associated weights, which are recursively updated. Some theoretical properties of these methods, such as convergence and asymptotic normality, are presented, and some examples help to get a better grasp of the methodology.
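To make these recursions concrete, here is a minimal bootstrap particle filter (sequential importance sampling with resampling) for the local level model of Chapter 5; this is a sketch with assumed noise levels, not code from the book. In this linear Gaussian case the Kalman filter gives the exact filtering distribution, so it can serve as a check.

set.seed(3)
n  <- 100; sw <- 1; sv <- 2            # state and observation noise s.d. (assumed values)
mu <- cumsum(rnorm(n, 0, sw))          # hidden level: mu_t = mu_{t-1} + W_t
y  <- mu + rnorm(n, 0, sv)             # observations: Y_t = mu_t + V_t

N  <- 1000                             # number of particles
p  <- rnorm(N, 0, 10)                  # initial particle cloud
xf <- numeric(n)                       # filtered mean E(mu_t | y_1, ..., y_t)
for (t in 1:n) {
  p  <- p + rnorm(N, 0, sw)            # propagate through the state equation
  w  <- dnorm(y[t], p, sv)             # weight by the observation likelihood
  p  <- p[sample.int(N, N, replace = TRUE, prob = w)]  # multinomial resampling
  xf[t] <- mean(p)
}
plot(y, col = "grey"); lines(mu); lines(xf, col = 2)   # truth vs particle estimate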

2 Some examples of time series

We give here some examples of time series observed in concrete situations. We begin with two historical examples of interest for modeling in astrophysics.

2.1 Wolf Sunspot data.

The series gives the Wolf yearly sunspot numbers between 1700 and 2001, that is, $n = 302$ observations; see Figure 2. This data set has stimulated many scientists, since sunspots are suspected to affect the Earth's weather and thereby human activities. The sunspot number reflects the intensity of solar activity. Statisticians took an interest in the subject very early; for instance, among others, Yule [1927], Bartlett [1950] and Whittle [1954]. The file is available in the basic R library datasets. This data set is used in Chapter 3, devoted to ARIMA series. We show that computation of the spectral density reveals two periodic components. Elementary analysis suggests nonstationarity; a preliminary square root transformation is applied, as well as computation of first order differences. We demonstrate the model building strategy using autoregressive integrated moving average models.

2.2 Magnitude of a star.

The data in Figure 3 are the magnitude (i.e. brightness) of a star taken at midnight for 600 consecutive days. The data are taken from the classic text, The Calculus of Observations, a Treatise on Numerical Mathematics, by E.T. Whittaker and G. Robinson (1923, Blackie & Son, Ltd.). They are often used to illustrate some points of spectral analysis. The periodogram shows two important peaks at frequencies $\omega_1 = 0.035$ and $\omega_2 = 0.041$, that is, approximately a 29-day cycle and a 24-day cycle. Some less prominent peaks are present; they correspond to harmonics of the two frequencies. An important remark is that the proximity of the two prominent peaks can be interpreted as an amplitude modulated signal, and Figure 3 suggests such a behavior. Loosely speaking, this comes from the identity
$$2\cos(2\pi\omega t)\cos(2\pi\tau t) = \cos(2\pi(\omega + \tau)t) + \cos(2\pi(\omega - \tau)t),$$
which gives rise to two close peaks when $\tau$ is very small.
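A small synthetic sketch (not the star data themselves) of this amplitude modulation effect: two close frequencies yield a beating signal and two close periodogram peaks.

t <- 1:600
x <- cos(2 * pi * 0.035 * t) + cos(2 * pi * 0.041 * t)  # two close frequencies
plot(t, x, type = "l")      # slowly beating envelope, as in the star series
spec.pgram(x, log = "no")   # two prominent peaks near 0.035 and 0.041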

2.3 Public transit boardings related to gasoline price.

Figure 4 shows plots of both series. The file named boardings contains monthly data on the number of people who boarded transit vehicles (mostly light rail trains and city buses) in the Denver, Colorado, region from August 2000 through December 2005, that is, $n = 68$. The file price reports the gasoline

Figure 2: Wolf yearly Sunspots data. 1700-2001.

Figure 3: Magnitude of a star. 600 consecutive daily observations.

price during the same period. We deal with this data set in Chapter 2. Both variables have been transformed by applying the logarithm. Both series turn out to be nonstationary, and we compute the differenced series. Then the relationship between the series is studied through cross-correlation, spectral densities and the squared coherence function.

Figure 4: Plots of log.price at left side and log.boardings at right side.

2.4 Employment of young men in the US.

This data set concerns monthly employment (in thousands) for males between ages 16 and 19 in the US from January 1971 to December 1981, that is, 132 observations. These data are presented by W. Wei in his book [2005]. The time series seems nonstationary and shows some clear periodicity. We analyze these data in Chapter 3: we will see that we are led to transform the variable by differencing twice, namely at orders one and twelve, and then to fit an ARMA model.

Figure 5: Young man employment in the US.

2.5 St Lawrence river annual flow.

This data set reports the annual flow of the river St. Lawrence at Ogdensburg, New York State, from 1860 to 1956, i.e. 97 observations, cf. Vujica M. Yevdjevich [1963]. K. Hipel and A. McLeod [1994]


provide an extensive study using time series to model the flow of a number of rivers. Figure 6 presents the observations. The interest of these data lies in the fact that the autocorrelation function decreases rather slowly, so they can be considered an example of real data showing long memory. The data set is used in Chapter 4, together with a simulated data set, to illustrate long memory time series.


Figure 6: St. Lawrence river annual flow from 1860 to 1956.
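A minimal sketch of a long memory analysis with the fracdiff and forecast libraries, run here on a simulated ARFIMA(0, d, 0) series rather than the flow data (which are only introduced at this point):

library(fracdiff)
library(forecast)
set.seed(4)
y <- fracdiff.sim(500, d = 0.3)$series  # fractionally integrated noise, d = 0.3
acf(y, lag.max = 50)                    # slow, hyperbolic decay of the sample ACF
fit <- arfima(y)                        # estimates d together with the ARMA part
summary(fit)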

2.6 El Niño and fish recruitment.

The data set is related to the relationship between El Niño and fish recruitment (monthly data, from January 1950 to September 1987, $n = 453$, available in astsa). The data were used by Shumway and Stoffer. Two monthly registered time series are of interest: SOI (Southern Oscillation Index) is an index which measures changes in air pressure in the Central Pacific Ocean, related to changes in temperature resulting from the El Niño effect; REC is the recruitment, that is, the quantity of new fish in the given month. We are interested in the link between $Y = REC$ and $X = SOI$. The relationship is investigated in Chapter 4 as an illustration of ARMAX and transfer function models.

2.7 The 11th of September 2001 effect on airline traffic.

The airlines data set is related to the effect of the 11th of September 2001 on airline traffic. It shows the logarithm of monthly airline passenger-miles in the US from January 1996 through May 2005, that is, $n = 113$ observations; see Figure 8. The data set is provided with the library TSA under the name airmiles. We use it in Chapter 4 to illustrate intervention models through appropriate transfer functions.

2.8 Cardiovascular mortality and Temperature.

We consider the data on weekly cardiovascular mortality in Los Angeles County, extracted from a study by R. Shumway et al. (1988), over a 10-year period; see Figure 9. The authors were interested in the relationship between the three series mortality, temperature and particulate levels. The three series can be downloaded from astsa under the names "cmort", "tempr" and "part". Here, for simplicity, we limit ourselves to the pair mortality and temperature. We note the strong seasonal components in both series.

Figure 7: Southern Oscillation Index (top) and fish recruitment (bottom). 1950-1987.

Figure 8: Logarithm of monthly airline passenger-miles in the US from January 1996 to May 2005. The September 11th 2001 effect.

In Chapter 4 we consider the vector series $Y_t = (Y_{t1}, Y_{t2})' = (\text{cmort}_t, \text{tempr}_t)'$, where $(\text{cmort}_t)$ is the cardiovascular mortality series and $(\text{tempr}_t)$ the temperature series. As an illustration of the presentation of VARMA models, we investigate the link between the two variables by fitting


a VAR(1) model with a linear trend.

Figure 9: Cardiovascular mortality and Temperature.

2.9 Johnson and Johnson series

The file JohnsonJohnson from the library datasets reports quarterly earnings per share for the U.S. company Johnson & Johnson. This series is often used as a benchmark data set. There are 84 observations, from the first quarter of 1960 to the last quarter of 1980, i.e. 21 years. We see in Figure 10 that the series shows some periodicity, that the amplitude of the seasonal effect grows with time, and that the series seems to be driven by a quadratic trend. In Chapter 5 we use the Kalman smoothing algorithm to fit a BSM (basic structural model) to these observations.

2.10 Yearly counts of major earthquakes.

We consider the series of annual counts of major earthquakes for the years 1900-2006, shown in Figure 11. These data can be found in the book of W. Zucchini, I. MacDonald and R. Langrock [2016]. As the values of the series are counts, it would be inappropriate to fit an autoregressive moving average model. Instead, a state space model can be proposed, based on a Markov chain with two or three states and Poisson distributions conditionally on the state. Let us observe that we are no longer in the framework of linear time series, and the methods developed in Chapter 6 would be relevant to deal with these data.
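A minimal sketch (with made-up parameters) of the kind of model suggested here: a hidden two-state Markov chain switches between a quiet and an active regime, and counts are Poisson conditionally on the state.

set.seed(5)
n <- 107                                # one count per year, 1900-2006
P <- rbind(c(0.9, 0.1), c(0.2, 0.8))    # assumed transition matrix
lambda <- c(13, 26)                     # assumed Poisson means per state
s <- numeric(n); s[1] <- 1
for (t in 2:n) s[t] <- sample(1:2, 1, prob = P[s[t - 1], ])
counts <- rpois(n, lambda[s])           # observed yearly counts
plot(1900:2006, counts, type = "h")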


Figure 10: Quarterly earnings per share. Johnson and Johnson.


Figure 11: Yearly counts of major earthquakes in the world, from 1900 to 2006.

3 Time series and Regression

A first elementary model for time series could be of the following form. For $t = 1, \dots, n$,
$$Y_t = \beta_0 + \beta_1 x_{t1} + \cdots + \beta_r x_{tr} + \varepsilon_t = x_t' \beta + \varepsilon_t,$$
where $(\varepsilon_t) \sim \text{i.i.d.}(0, \sigma^2_\varepsilon)$, that is, the $\varepsilon_t$ are independent and identically distributed with zero mean and common variance $\sigma^2_\varepsilon$; such a series is called a white noise, see Chapter 2 for more details. The values at time $t$ of the $r$ non-random variables $x_1, \dots, x_r$ are gathered in the vector $x_t = (1, x_{t1}, \dots, x_{tr})'$ and


$\beta = (\beta_0, \dots, \beta_r)'$ is the vector of the model parameters. The simplest model of this kind is obtained for $r = 1$ and is called simple linear regression. We first present this model.

Simple linear regression

The simple linear model is a model with only one variable $x$, defined by $Y_t = \beta_0 + \beta_1 x_t + \varepsilon_t$, where the $\varepsilon_t$ are i.i.d. $N(0, \sigma^2_\varepsilon)$ and $x_t$ is the value of $x$ at time $t$. The assumption that the noise is Gaussian is worth noting, as it has some importance for statistical inference about the model. Note that, in a time series context, we are often interested in $x_t = t$, that is, a model with a linear trend and an additional noise. We are given observations $(x_t, y_t)$, $t = 1, \dots, n$, from which we have to estimate $\beta = (\beta_0, \beta_1)'$ and $\sigma^2_\varepsilon$. We can use the least squares method. It consists in looking for $(\hat\beta_0, \hat\beta_1)$ minimizing:
$$S(\beta_0, \beta_1) = \sum_{t=1}^{n} (y_t - \beta_0 - \beta_1 x_t)^2.$$
The solution can be written:
$$\hat\beta_1 = \frac{\sum (y_t - \bar y)(x_t - \bar x)}{\sum (x_t - \bar x)^2} = \frac{\widehat{cov}(y, x)}{\widehat{var}(x)}, \qquad (3)$$
$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x, \qquad (4)$$
where $y = (y_1, \dots, y_n)'$, $x = (x_1, \dots, x_n)'$, and $\bar y = \frac{1}{n}\sum y_t$ and $\bar x = \frac{1}{n}\sum x_t$ are the sample means. The fitted values and the residuals are defined by:
$$\hat y_t = \hat\beta_0 + \hat\beta_1 x_t, \qquad e_t = y_t - \hat y_t,$$
and an unbiased estimate of $\sigma^2_\varepsilon$ is:
$$\hat\sigma^2_\varepsilon = \frac{1}{n-2} \sum_{t=1}^{n} e_t^2.$$
$\hat\beta = (\hat\beta_0, \hat\beta_1)'$ is unbiased, convergent, Gaussian and with minimum variance in the class of unbiased estimators. When the Gaussian assumption for the noise $(\varepsilon_t)$ is dropped, $\hat\beta$ is unbiased, convergent, asymptotically Gaussian and with minimum variance in the class of linear unbiased estimators.
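A minimal sketch of the case $x_t = t$, i.e. a linear trend fitted by least squares with lm on simulated data (not the book's code):

set.seed(6)
n <- 100; tt <- 1:n
y <- 2 + 0.05 * tt + rnorm(n)       # linear trend plus i.i.d. Gaussian noise
fit <- lm(y ~ tt)                   # least squares estimates of beta_0, beta_1
summary(fit)                        # coefficients, standard errors s_i, R^2
sum(resid(fit)^2) / (n - 2)         # unbiased estimate of sigma^2_eps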

Multiple linear regression

Let us consider the general case with $r$ variables, $x_1, \dots, x_r$; that is, we observe the model
$$Y_t = \beta_0 + \beta_1 x_{t1} + \cdots + \beta_r x_{tr} + \varepsilon_t, \qquad (5)$$
with $(\varepsilon_t) \sim \text{i.i.d. } N(0, \sigma^2_\varepsilon)$. The observations are given by $\big((x_{t1}, \dots, x_{tr})', y_t\big)$, $t = 1, \dots, n$. Thus we are interested in statistical inference about $\beta = (\beta_0, \beta_1, \dots, \beta_r)'$ and $\sigma^2_\varepsilon$.


The basic principle is similar to the one for simple linear regression; that is, the criterion is the minimum sum of squared errors:
$$S(\beta) = \sum_{t=1}^{n} (y_t - \beta_0 - \beta_1 x_{t1} - \cdots - \beta_r x_{tr})^2. \qquad (6)$$
Placing the observations in the space $\mathbb{R}^n$, we can see the issue as a linear projection procedure. We need some notation:
$$y = (y_1, \dots, y_n)', \qquad (7)$$
$$\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)', \qquad (8)$$
and, for $t = 1, \dots, n$, $x_t = (1, x_{t1}, \dots, x_{tr})'$; that is, $x_t$ is the vector of the values of all the $(r+1)$ independent variables for the $t$-th observation. Then we set the design matrix $X$:
$$X = (x_1, \dots, x_n)'.$$
$X$ is an $n \times (r+1)$ matrix and $X_{tj}$ is the $t$-th observed value of the variable $x_j$. These notations provide a vector form of the observations: $y = X\beta + \varepsilon$. In the following, for convenience, we also use the notation $x_j = (x_{1j}, \dots, x_{nj})'$, $1 \leq j \leq r$; that is, $x_j$ is the vector of the $n$ observed values of the $j$-th variable, and $x_0$ is the $n$-dimensional vector $\mathbf{1} = (1, 1, \dots, 1)'$. Thus we use the same notation for the generic variable $x_j$ and for the vector of its $n$ observations. Note that this allows us to write $X = (\mathbf{1}, x_1, \dots, x_r)$. The sum of squared errors (6) can be rewritten $\|y - X\beta\|^2$, where, for an $n$-dimensional vector $y$, $\|y\| = \big(\sum_{t=1}^{n} y_t^2\big)^{1/2}$.

Clearly the optimal $\beta$ is such that $X\beta$ is the projection in $\mathbb{R}^n$ of $y$ onto the subspace generated by the $(r+1)$ columns of $X$. This means that $\hat\beta$ is such that
$$(y - X\hat\beta) \perp x_j, \quad j = 0, \dots, r, \qquad (9)$$
with $x_0 = \mathbf{1}$. Assuming that $X'X$ is regular, the solution is unique and given by
$$\hat\beta = (X'X)^{-1} X' y. \qquad (10)$$
In a way similar to simple linear regression, the fitted values and the residuals are given by:
$$\hat y_t = \hat\beta_0 + \hat\beta_1 x_{t1} + \cdots + \hat\beta_r x_{tr}, \qquad (11)$$
$$e_t = y_t - \hat y_t. \qquad (12)$$


With $\hat y = (\hat y_1, \dots, \hat y_n)'$ and $e = (e_1, \dots, e_n)'$, this gives in matrix notation:
$$\hat y = X\hat\beta, \qquad (13)$$
$$e = y - \hat y. \qquad (14)$$
An unbiased estimate of $\sigma^2_\varepsilon$ is defined by
$$\hat\sigma^2_\varepsilon = \frac{SS_{err}}{n - (r+1)},$$
where $SS_{err} = S(\hat\beta) = \sum e_t^2$, see (6).

Note that, from (9) for $j = 0$, we have $\sum e_t = 0$, which implies $\widehat{var}(e) = \sum e_t^2 / n$ and $\widehat{var}(y) = \widehat{var}(\hat y) + \widehat{var}(e)$. A useful measure of the quality of the model is $R^2$, the ratio between the sample variances $\widehat{var}(\hat y)$ and $\widehat{var}(y)$, that is,
$$R^2 = \frac{\widehat{var}(\hat y)}{\widehat{var}(y)} = 1 - \frac{\sum e_t^2 / n}{\widehat{var}(y)}.$$
Thus $R^2$ is the part of the variance explained by the regression. Using $SS_{Tot} = \sum (y_t - \bar y)^2 = n \times \widehat{var}(y)$ and $SS_{reg} = \sum (\hat y_t - \bar y)^2 = n \times \widehat{var}(\hat y)$, we have $SS_{Tot} = SS_{reg} + SS_{err}$, and this yields
$$R^2 = \frac{SS_{reg}}{SS_{Tot}} = 1 - \frac{SS_{err}}{SS_{Tot}}.$$
$R^2$ is always provided in linear regression fitting results by software, together with the adjusted version $R^2_a$, which takes into account the number of parameters and is defined by:
$$R^2_a = 1 - \frac{SS_{err}/(n-r-1)}{SS_{Tot}/(n-1)} = 1 - \frac{n-1}{n-r-1}(1 - R^2).$$

Some basic properties

We limit ourselves to some essential properties. For a more detailed but concise presentation, see for example the courses given in the foregoing schools "Statistics for Astrophysics", Grégoire [2014a, 2014b]; see also the basic vademecum on inferential statistics, Grégoire [2016]. Note that there is a vast literature dealing with linear regression.

A main fact is that linear regression relies on a geometrical principle: the response random vector $y$, as well as the vectors of values of each variable $x_0, \dots, x_r$, "live" in an $n$-dimensional Hilbert space, and we are to determine the projection of $y$ onto the subspace generated by $x_0, \dots, x_r$. Recall also that we assumed that the error series $(\varepsilon_t)$ is i.i.d. Gaussian; however, we also indicate below which statistical properties are preserved when the Gaussian assumption is not satisfied.

• $\hat\beta$ and $\hat\sigma^2_\varepsilon$ are unbiased and convergent. $\hat\beta$ is a maximum likelihood estimator and as such is optimal in the sense that its variance is minimum in the class of unbiased estimators. The same optimality property holds true for $\hat\sigma^2_\varepsilon$.


• $\hat\beta$ is Gaussian with covariance matrix $\sigma^2_\varepsilon (X'X)^{-1}$, that is,
$$\hat\beta \sim N\big(\beta, \sigma^2_\varepsilon (X'X)^{-1}\big). \qquad (15)$$
Moreover, $\hat\beta$ and $\hat\sigma^2_\varepsilon$ are independent, and $(n - r - 1)\hat\sigma^2_\varepsilon / \sigma^2_\varepsilon$ is $\chi^2_{n-r-1}$-distributed.
• For the $i$-th component we get:
$$\hat\beta_i \sim N(\beta_i, \sigma^2_\varepsilon C_{ii}), \qquad (16)$$
and the covariance between $\hat\beta_i$ and $\hat\beta_j$ is $\sigma^2_\varepsilon C_{ij}$, where $C$ denotes the matrix $(X'X)^{-1}$.
• Substituting $\hat\sigma^2_\varepsilon$ for $\sigma^2_\varepsilon$ in the variance expression $\sigma^2_\varepsilon C_{ii}$ given above, and denoting by $s_i$ the resulting standard error of $\hat\beta_i$, yields:
$$\frac{\hat\beta_i - \beta_i}{s_i} \sim T_{n-r-1}, \quad \text{with} \quad s_i = \sqrt{\hat\sigma^2_\varepsilon C_{ii}}, \qquad (17)$$
which is the basis for designing tests and confidence intervals for $\beta_i$.

Comparing models and testing linear hypotheses

Relevant issues in multiple regression often concern the significance of one specific variable, of a group of variables, or more generally of a linear relationship between some variables. Clearly (17) provides a way to test the significance of $x_i$ since, under $H_0: \beta_i = 0$, $\hat\beta_i / s_i$ is $T_{n-r-1}$-distributed. Testing the significance of a subset of variables in fact amounts to comparing two nested models. Given a model $M_0$, a submodel $M_{sm}$ is defined by a relationship $T\beta = 0$, where $T$ is an $s \times (r+1)$ matrix with $s < r + 1$. When $T$ is of rank $s$, $T\beta = 0$ defines a submodel whose dimension is $r + 1 - s$; roughly speaking, $s$ is the dimension reduction. Under $H_0: T\beta = 0$ the statistic
$$F = \frac{\big(SS_{err}(M_{sm}) - SS_{err}(M_0)\big)/s}{SS_{err}(M_0)/(n - r - 1)} \qquad (18)$$

Model choice Finally, note that we are often faced with multiple regression with a number of variables and have to select a sparse model. This can be done using iterative nested models (forward, backward, stepwise methods,...).


AIC, AICc and BIC information criteria are also used. The principle is to penalize the maximized log-likelihood with a function of the number of variables in the model: this leads to a balance between the adequacy of the fitted model and the number of variables used. Let us recall that fitting a model with a large number of parameters may result in imprecise estimates. Forms adapted to the linear model are given below:
$$\text{AIC} = \log \hat\sigma^2_\varepsilon(k) + \frac{n + 2k}{n}, \qquad \text{AICc} = \log \hat\sigma^2_\varepsilon(k) + \frac{n + k}{n - k - 2}, \qquad \text{BIC} = \log \hat\sigma^2_\varepsilon(k) + \frac{k \log n}{n},$$
where $k$ is the number of parameters, $\hat\sigma^2_\varepsilon(k)$ a maximum likelihood estimator of $\sigma^2_\varepsilon$, and $n$ the number of observations. The simple expression for the log-likelihood comes from the fact that $(\varepsilon_t)$ is Gaussian. This implies that the maximum likelihood estimator of the variance is $\hat\sigma^2_\varepsilon = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat y_i)^2$, from which it follows that, up to an additive constant, the log-likelihood behaves like $-\log \hat\sigma^2_\varepsilon$. Therefore maximizing the likelihood is equivalent to minimizing $\log \hat\sigma^2_\varepsilon$, and for the three criteria we look for the value $k^*$ which minimizes the criterion.

Finally, it must be said that the information criteria have been the subject of many papers. It is usually admitted that AIC has a tendency to select models with too many variables. BIC is the Bayesian information criterion; simulations have shown that, for large samples, BIC often selects the correct model. The AICc criterion, a correction of AIC, generally turns out to give better results for small samples with a reasonably large number of parameters.

Non-Gaussian case

When the errors are non-Gaussian and $n$ is not large, the $\hat\beta_i$ are no longer Gaussian, $\hat\sigma^2_\varepsilon$ is no longer $\chi^2$-distributed, and $\hat\beta$ and $\hat\sigma^2_\varepsilon$ are generally not independent. But the $\hat\beta_i$ are still unbiased, and the same holds true for $\hat\sigma^2_\varepsilon$. Furthermore, the $\hat\beta_i$ are asymptotically Gaussian and, more generally, it can be proved under reasonable conditions that $(\hat\beta, \hat\sigma^2_\varepsilon)$ is asymptotically Gaussian. Practically, for $n$ large enough we have $(\hat\beta_i - \beta_i)/s_i \approx N(0, 1)$, where $s_i$ was given in (17), and $\hat\sigma^2_\varepsilon \approx N\big(\sigma^2_\varepsilon, (\mu_4 - \sigma^4_\varepsilon)/n\big)$, where $\mu_4 = E(\varepsilon_t^4)$. Finally, note also that $\hat\beta$ is optimal in the class of linear unbiased estimators.

Example 1. Beer dataset. Fitting with R a linear model on a time series data set with a deterministic linear trend and seasonal components.

The beer dataset from Wei [2005] provides the figures of quarterly U.S. beer production from the first quarter of 1975 to the fourth quarter of 1982 (millions of barrels). For convenience we copy the 32 observations at the beginning of the R code. We first present the data: raw data and seasonal plot. We fit a linear trend by linear regression. Next we fit a model with a linear trend and seasonal components. Several methods can be used to take seasonality into account in a regression model. Here our approach is the following. We can write
$$Y_t = \alpha_0 + \alpha_1 t + c_1 S_1(t) + c_2 S_2(t) + c_3 S_3(t) + c_4 S_4(t) + \varepsilon_t, \qquad (19)$$
where $S_i(t) = 1$ when $t$ is an $i$-th quarter of the year and $S_i(t) = 0$ otherwise. Clearly $\sum_{i=1}^{4} S_i(t) = 1$. This means that our model is not identifiable. To overcome this, the coefficients $c_i$ are constrained to


satisfy $c_1 + \cdots + c_4 = 0$. The model becomes:
$$Y_t = \alpha_0 + \alpha_1 t + c_1 (S_1(t) - S_4(t)) + c_2 (S_2(t) - S_4(t)) + c_3 (S_3(t) - S_4(t)) + \varepsilon_t \qquad (20)$$
$$= \alpha_0 + \alpha_1 t + c_1 \tilde S_1 + c_2 \tilde S_2 + c_3 \tilde S_3 + \varepsilon_t, \qquad (21)$$
with $\tilde S_j = S_j - S_4$, $j = 1, 2, 3$. Thus we have a model with $5 + 1$ parameters to estimate: $\beta_0 = \alpha_0$, $\beta_1 = \alpha_1$, $\beta_2 = c_1$, $\beta_3 = c_2$, $\beta_4 = c_3$, and $\sigma^2_\varepsilon$. Note that we could fit a model with an additional quadratic term $t^2$, but one can check that the result is not significantly different.
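The constraint $c_1 + \cdots + c_4 = 0$ corresponds to sum-to-zero contrasts; here is a minimal sketch of how model (21) can be encoded in R with contr.sum, on simulated quarterly data (hypothetical values, not the beer figures):

set.seed(8)
qe <- c(2, 4, 1, -7)                     # made-up seasonal effects, summing to 0
y  <- 35 + 0.1 * (1:32) + rep(qe, 8) + rnorm(32)
quarter <- factor(rep(1:4, 8))
contrasts(quarter) <- contr.sum(4)       # enforces c1 + c2 + c3 + c4 = 0
fit <- lm(y ~ seq_along(y) + quarter)    # estimates alpha_0, alpha_1, c1, c2, c3
summary(fit)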

par(mfrow=c(1,2))
library(caschrono)
library(forecast)
library(astsa)

In the general model $\phi(B)(1 - B)^d Y_t = \theta_0 + \theta(B)\varepsilon_t$, $\theta_0$ is a deterministic trend mean of the differenced series. $\theta_0$ can be considered as an additional parameter in the model and its significance tested after the fitting of the model. Another way is to consider $Z_t = (1 - B)^d Y_t$ and to test that $(Z_t)$ is a mean zero ARMA process. This can be done by an approximate t-test based on $\bar Z / S_{\bar Z}$. The standard error $S_{\bar Z}$ is usually approximated by:
$$S_{\bar Z} = \left[ \frac{\hat\gamma_Z(0)}{n} \left( 1 + 2\hat\rho_Z(1) + 2\hat\rho_Z(2) + \cdots + 2\hat\rho_Z(k) \right) \right]^{1/2},$$
where $\hat\gamma_Z(0)$ is the sample variance of $(Z_t)$ and $\hat\rho_Z(1), \hat\rho_Z(2), \ldots, \hat\rho_Z(k)$ are the first $k$ significant terms of the sample ACF of $(Z_t)$.

6.2 Checking adequacy of the tentative model.

Given that the data have possibly been transformed and/or differenced, and that a first choice of $p$, $d$ and $q$ has been made, we investigate the adequacy of this tentative model. The main tool is the examination of the residuals. The residuals are intended to estimate the $\varepsilon_t$'s. Several possibilities have been proposed in the literature. From the writing $\phi(B) Y_t = \theta(B) \varepsilon_t$, with the usual assumptions, we get $\varepsilon_t = \frac{\phi(B)}{\theta(B)} Y_t$, which yields $e_t = \frac{\hat\phi(B)}{\hat\theta(B)} Y_t$.


Another choice is generally preferred: $e_t = Y_t - \hat Y_t^{t-1}$, that is, the prediction error. For an ARMA series, (65) and (66) of Subsection 4.4 allow us to get the predictions $\hat Y_t^{t-1}$ and hence the residuals $e_t$.

When normalized by the standard error $(\hat P_t^{t-1})^{1/2}$, these residuals should be approximately:
• uncorrelated under the usual assumption that $(\varepsilon_t) \sim WN(0, \sigma^2_\varepsilon)$,
• independent when $(\varepsilon_t) \sim \text{i.i.d.}(0, \sigma^2_\varepsilon)$,
• independent $N(0, 1)$ when $(\varepsilon_t) \sim \text{i.i.d. } N(0, \sigma^2_\varepsilon)$.

For checking these properties it is recommended:
• to plot the residuals to detect visually any departure from these properties,
• to check normality visually by drawing histograms and Q-Q plots,
• to use Box-Pierce and Ljung-Box tests to detect any inappropriate ACF behavior (the $\hat\rho_e(i)$ should be approximately independent and $N(0, 1/n)$-distributed),
• to check that the ACF and PACF of the residual series do not show any evidence of an AR, MA, or ARMA structure, which could lead to questioning the tentative model and trying a new one in which the residual structure would be accounted for.

These procedures to evaluate some characteristics of the residuals are part of what is often called a diagnostics study. Using the standard arima{stats} function to fit a model and then tsdiag{stats} yields plots of the residuals, their ACF, and p-values of the Ljung-Box test for a given range of lags; an illustrative sketch follows this list. summary provides the estimated coefficients with their standard errors, the estimated error variance, the log-likelihood, the AIC information criterion and some measures of the quality of the fit. These latter measures, called "training set error measures", are the following:
• $ME = \frac{1}{n} \sum e_t$: the Mean Error.
• $RMSE = \sqrt{\frac{1}{n} \sum e_t^2}$: the Root Mean Square Error, i.e. the square root of the MSE.
• $MAE = \frac{1}{n} \sum |e_t|$: the Mean Absolute Error.
• $MPE = \frac{1}{n} \sum p_t$, where $p_t = 100\,(e_t / y_t)$: the Mean Percentage Error.
• $MAPE = \frac{1}{n} \sum |p_t|$: the Mean Absolute Percentage Error.
• $MASE = \frac{1}{n} \sum |\tilde p_t|$, where $\tilde p_t = e_t / \big( \frac{1}{n-1} \sum_{j=2}^{n} |y_j - y_{j-1}| \big)$: the Mean Absolute Scaled Error. The MASE is intended to correct a shortcoming of MPE: when $y_t$ is close to 0, the contribution of $p_t$ to MPE can be unreasonably high.
• $ACF1$: the first autocorrelation of the residuals, that is, $\hat\rho_e(1)$.
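A minimal sketch of such a diagnostics step on a simulated ARMA series (our example, not the book's code); fitdf accounts for the p + q estimated parameters in the Ljung-Box test:

set.seed(9)
y <- arima.sim(model = list(ar = 0.6, ma = 0.4), n = 500)
fit <- arima(y, order = c(1, 0, 1))
tsdiag(fit)                     # standardized residuals, their ACF, Ljung-Box p-values
Box.test(residuals(fit), lag = 12, type = "Ljung-Box", fitdf = 2)
qqnorm(residuals(fit)); qqline(residuals(fit))   # visual check of normality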

7 Examples.

Below, using R, we analyze two data sets. The main objective is more to illustrate the basic methods and procedures presented in this chapter than to provide a deep and complete study of the data sets. Note also that, for convenience, in some parts we anticipate Chapter 4 and, for instance, perform unit root tests or fit ARIMA models with exogenous variables. These can be seen as first introductory examples of subjects which are presented in some detail in the next chapter. In processing with R, in this chapter and the following one, we resort to several libraries: stats, astsa, TSA, forecast, caschrono, tseries and urca.


Without any doubt we would most often have resorted to stats, but the other libraries used either provide complementary results or give more practical or more friendly outputs at some points. For instance, acf2 from astsa provides a simultaneous plot of the sample ACF and PACF. sarima from astsa fits a differenced arima model with, by default, a drift term included; sarima gives diagnostics by default, while we have to use tsdiag after arima. We use auto.arima from forecast to get a first model with a high-level fit, even though we next investigate other models around it. arfima from forecast is also used in Chapter 4 to fit fractionally differenced ARFIMA models. periodogram and Arimax from TSA allow us, respectively, to get a raw periodogram and to fit a model with exogenous variables, transfer functions and outliers. The library tseries is required for unit root tests such as adf.test and kpss.test. In Chapter 4 the library urca is used for similar tests. acf2y and t_stat are from caschrono: acf2y provides a friendly joint presentation of the sample ACF and PACF, and t_stat gives p-values for significance tests of estimated coefficients resulting from an arima fit.

7.1 Study of a data set reporting the Wolf yearly sunspot numbers.

We study here a well-known dataset, the series of yearly registered sunspot numbers from 1700 to 2001, i.e. $n = 302$. This dataset has stimulated many scientists, since sunspots are suspected to affect the Earth's weather and thereby human activities. The file is available in the library datasets. It is convenient to use two versions of the data set: as a vector Sunspot and as a ts class object Sunspot_ts.
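A minimal sketch of these first steps, assuming the sunspot.year series from datasets (which in base R stops in 1988, so it is slightly shorter than the 302-point series studied in the book):

Sunspot_ts <- sunspot.year            # ts class object, yearly from 1700
Sunspot    <- as.numeric(Sunspot_ts)  # plain vector version
plot(Sunspot_ts)
y  <- sqrt(Sunspot_ts)                # preliminary square root transformation
dy <- diff(y)                         # first order differences
acf(dy); pacf(dy)                     # starting point for ARIMA identification
spec.pgram(Sunspot_ts, log = "no")    # periodogram reveals the periodic components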