The Elements of Financial Econometrics 9787030433985, 703043398X

Language: Chinese. Hardcover. Pub Date: 2015-3-1. Publisher: Science Press. Pages: 36


English Pages [386] Year 2015


The Elements of Financial Econometrics

 Mathematics Monograph Series

30

The Elements of Financial Econometrics (计量金融精要) Jianqing Fan Qiwei Yao

SCIENCE PRESS

Beijing

Responsible Editor: Yuzhuo Chen

Copyright © 2015 by Science Press
Published by Science Press
16 Donghuangchenggen North Street
Beijing 100717, P. R. China
Printed in Beijing
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the copyright owner.

ISBN 978-7-03-043398-5 (Beijing)

Preface

This is an introductory textbook for financial econometrics at the master’s level. Readers are assumed to have some background in calculus, linear algebra, statistics, and probability, all at the undergraduate level. Knowledge of economics and finance is beneficial but not essential.

This book grew out of the lecture notes for the “Financial Econometrics” course taught by Jianqing Fan for Master in Finance students at Princeton University since 2003 and for Master in Financial Engineering students at Fudan University since 2011. The audiences are always very broad, with diverse backgrounds in mathematics, physics, computer science, economics, or finance. These talented students wish to learn the fundamentals of financial econometrics in order to work in the financial industry or to pursue a Ph.D. degree in related fields. The challenge is to give all of them sufficient knowledge of financial econometrics upon completion of the class. This textbook is written with the aim of achieving such an ambitious goal.

We trust that the book will be of interest to those coming to the area for the first time, to readers who already have an economics and finance background but would like to further sharpen their quantitative skills, and also to those who have strong quantitative skills and would like to learn how to apply them to finance. Application-oriented analysts will also find this book useful, as it focuses on methodology and includes numerous case studies with real data sets. We purposely keep the level of mathematics moderate, as the power of the quantitative methods can be understood without sophisticated technical details. We also avoid the cookbook style of writing, as a good understanding of statistical and econometric principles in finance enables readers to apply the knowledge far beyond the problems stated in the book. Numerical illustrations with real financial data appear throughout the book. We also indicate, whenever possible, where to find the relevant R code to implement the various methods. Due to the nature of the subject, it is inevitable that we occasionally step into more sophisticated techniques which rely on more advanced mathematics; such sections are marked with “∗” and can be skipped by beginners. Most technical arguments are collected in a “Complements” section at the end of some chapters, but key ideas are left within the main body of the text.

What is financial econometrics? Broadly speaking, it is an interdisciplinary subject that uses statistical methods and economic theory to address a variety of quantitative problems in finance. These include building financial models, testing financial economics theory, simulating financial systems, volatility estimation, risk management, capital asset pricing, derivative pricing, portfolio allocation, proprietary trading, and portfolio and derivative hedging, among others. Financial econometrics is an active field that integrates finance, economics, probability, statistics, and applied mathematics. Financial activities generate many new problems and products, economics provides useful theoretical foundations and guidance, and quantitative methods such as statistics, probability and applied mathematics are essential tools for solving quantitative problems in finance. Professionals in finance now routinely use sophisticated statistical techniques and modern computational power in portfolio management, proprietary trading, derivative pricing, financial consulting, securities regulation, and risk management.

When the class was first taught in 2003, there were very few books on financial econometrics. The books that had a strong impact on our preparation of the lecture notes are Campbell et al. (1997) and Fan and Yao (2003). With one semester of teaching, we can only cover the important elements of financial econometrics. We use this name as the title of the book, as it also reflects the modular aspect of the book. This allows us to expand this textbook to cover other fundamental materials in the future. For example, “Simulation methods in finance” and “Econometrics of continuous time finance” are taught at Princeton, but are covered neither in this book nor in Fudan’s financial engineering class due to time constraints. Another important topic that we wish to cover is the analysis of high-frequency financial data.
The book consists of two integrated parts: the first four chapters are on time series aspects of financial econometrics, while the last five chapters are on cross-sectional aspects. The introduction in Chapter 1 sets the scene for the book: using two financial price time series, we illustrate the stylized features in financial returns. The efficient markets hypothesis is discussed together with statistical tests for random walks and white noise. A compact view of linear time series models is given in Chapter 2, including ARMA models, random walks, and inference with trends. We also include a brief introduction to the exponential smoothing based forecasting techniques for trends and momentum, which are widely used in the financial industry. Chapter 3 introduces various heteroscedastic volatility models. A compact introduction to state space models, including techniques such as the Kalman filter and particle filters, is included as an appendix. Chapter 4 contains selected topics in multivariate time series analysis. Within the context of vector autoregressive models, we also introduce topics such as Granger causality, impulse response functions, and cointegration, as they play important roles in economics and finance.

The second part begins with Chapter 5, which introduces portfolio theory and derives the celebrated capital asset pricing model. We also introduce statistical techniques to test this model and provide extensive empirical studies. Chapter 6 extends the capital asset pricing model to multi-factor pricing models. The applications of the factor models and econometric tests of the validity of such pricing models are introduced. In addition, principal component analysis and factor analysis are briefly discussed. Chapter 7 touches on several practical aspects of portfolio allocation and risk management. The highlights of this chapter include risk assessments of large portfolios, portfolio allocation under gross-exposure constraints, and large volatility matrix estimation using factor models and covariance regularization. Chapter 8 derives the capital asset pricing model from the point of view of consumption, investment, and saving. This gives students a different perspective on where financial prices come from and a chance to appreciate how this differs from pricing financial derivatives. Chapter 9 calculates the prices implied by the models of returns. It gives us an idea of what the fundamental price of a stock is and how the prices are related to the dividend payments and short-term interest rates.

Many people have been of great help to our work on this book. Early drafts of this book have been taught to over five hundred students, who are our enthusiastic readers. In particular, we are grateful to Yingying Fan, Yue Niu, Jingjing Zhang, Feng Yang, Weijie Gu, Xin Tong, Wei Dai, Jiawei Yao, Xiaofeng Shi, and Weichen Wang for their gracious assistance in teaching the Financial Econometrics class at Princeton. René Carmona provided us with his course outlines on Financial Econometrics, which helped with the selection of the topics of the course. Many treatments of ARIMA models are inspired by the lecture notes of George Tiao. Alex Furger and Michael Lachans spent a great amount of their precious time proofreading the final version of the book. We are very grateful for their contributions and generosity. We would like to thank Yacine Ait-Sahalia and Yazhen Wang for stimulating discussions on the topic and Shaojun Guo for formatting the references of the book. We are indebted to Jiaan Yan for his encouragement and support in publishing this book in China and to Xiongwen Lu for inviting us to teach the course at Fudan University. We would also like to thank Yuzhuo Chen for providing various editorial assistance. Jianqing Fan’s research was generously supported by the National Science Foundation and the National Institutes of Health of the USA, and by the Academy of Mathematics and System Science and the National Center for Mathematics and Interdisciplinary Sciences, Chinese Academy of Sciences.

December 2014

Jianqing Fan, Princeton
Qiwei Yao, London

The Elements of Financial Econometrics

Contents

Preface
Chapter 1 Asset Returns
  1.1 Returns
    1.1.1 One-period simple returns and gross returns
    1.1.2 Multiperiod returns
    1.1.3 Log returns and continuously compounding
    1.1.4 Adjustment for dividends
    1.1.5 Bond yields and prices
    1.1.6 Excess returns
  1.2 Behavior of financial return data
    1.2.1 Stylized features of financial returns
  1.3 Efficient markets hypothesis and statistical models for returns
  1.4 Tests related to efficient markets hypothesis
    1.4.1 Tests for white noise
    1.4.2 Remarks on the Ljung-Box test∗
    1.4.3 Tests for random walks
    1.4.4 Ljung-Box test and Dickey-Fuller test
  1.5 Appendix: Q-Q plot and Jarque-Bera test
    1.5.1 Q-Q plot
    1.5.2 Jarque-Bera test
  1.6 Further reading and software implementation
  1.7 Exercises
Chapter 2 Linear Time Series Models
  2.1 Stationarity
  2.2 Stationary ARMA models
    2.2.1 Moving average processes
    2.2.2 Autoregressive processes
    2.2.3 Autoregressive and moving average processes
  2.3 Nonstationary and long memory ARMA processes
    2.3.1 Random walks
    2.3.2 ARIMA model and exponential smoothing
    2.3.3 FARIMA model and long memory processes∗
    2.3.4 Summary of time series models
  2.4 Model selection using ACF, PACF and EACF∗
  2.5 Fitting ARMA models: MLE and LSE
    2.5.1 Least squares estimation
    2.5.2 Gaussian maximum likelihood estimation
    2.5.3 Illustration with gold prices
    2.5.4 A snapshot of maximum likelihood methods∗
  2.6 Model diagnostics: residual analysis
    2.6.1 Residual plots
    2.6.2 Goodness-of-fit tests for residuals
  2.7 Model identification based on information criteria
  2.8 Stochastic and deterministic trends
    2.8.1 Trend removal
    2.8.2 Augmented Dickey-Fuller test
    2.8.3 An illustration
    2.8.4 Seasonality
  2.9 Forecasting
    2.9.1 Forecasting ARMA processes
    2.9.2 Forecasting trends and momentum of financial markets
  2.10 Appendix: Time series analysis in R
    2.10.1 Start up with R
    2.10.2 R-functions for time series analysis
    2.10.3 TSA – an add-on package
  2.11 Exercises
Chapter 3 Heteroscedastic Volatility Models
  3.1 ARCH and GARCH models
    3.1.1 ARCH models
    3.1.2 GARCH models
    3.1.3 Stationarity of GARCH models
    3.1.4 Fourth moments
    3.1.5 Forecasting volatility
  3.2 Estimation for GARCH models
    3.2.1 Conditional maximum likelihood estimation
    3.2.2 Model diagnostics
    3.2.3 Applications of GARCH modeling
    3.2.4 Asymptotic properties∗
    3.2.5 Least absolute deviations estimation∗
  3.3 ARMA-GARCH models
  3.4 Extended GARCH models
    3.4.1 EGARCH models
    3.4.2 Asymmetric power GARCH
    3.4.3 Excess returns and GARCH-in-Mean
    3.4.4 Integrated GARCH model
  3.5 Stochastic volatility models
    3.5.1 Probabilistic properties
    3.5.2 Parameter estimation
    3.5.3 Leverage effects
  3.6 Appendix: State space models∗
    3.6.1 Linear models
    3.6.2 Kalman recursions for Gaussian models
    3.6.3 Nonlinear models
    3.6.4 Particle filters
  3.7 Exercises
Chapter 4 Multivariate Time Series Analysis
  4.1 Stationarity and auto-correlation matrices
    4.1.1 Stationary vector processes
    4.1.2 Sample cross-covariance/correlation matrices
  4.2 Vector autoregressive models
    4.2.1 Stationarity
    4.2.2 Parameter estimation
    4.2.3 Model selection and diagnostics
    4.2.4 Illustration with real data
    4.2.5 Granger causality
    4.2.6 Impulse response functions
  4.3 Cointegration
    4.3.1 Unit roots and cointegration
    4.3.2 Engle-Granger method and error correction models
    4.3.3 Johansen’s likelihood method∗
    4.3.4 Illustration with real data
  4.4 Exercises
Chapter 5 Efficient Portfolios and Capital Asset Pricing Model
  5.1 Efficient portfolios
    5.1.1 Returns and risks of portfolios
    5.1.2 Portfolio optimization
    5.1.3 Efficient portfolios and Sharpe ratios
    5.1.4 Efficient frontiers
    5.1.5 Challenges of implementation
  5.2 Optimizing expected utility function
  5.3 Capital asset pricing model
    5.3.1 Market portfolio
    5.3.2 Capital asset pricing model
    5.3.3 Market β and its applications
  5.4 Validating CAPM
    5.4.1 Econometric formulation
    5.4.2 Maximum likelihood estimation
    5.4.3 Testing statistics
  5.5 Empirical studies
    5.5.1 An overview
    5.5.2 Fama-French portfolios
    5.5.3 Further remarks
  5.6 Cross-sectional regression
  5.7 Portfolio optimization without a risk-free asset
  5.8 CAPM with unknown risk-free rate
    5.8.1 Validating the Black version of CAPM
    5.8.2 Testing statistics
  5.9 Complements
    5.9.1 Proof of (5.43)
    5.9.2 Proof of (5.48)
  5.10 Exercises
Chapter 6 Factor Pricing Models
  6.1 Multifactor pricing models
    6.1.1 Multifactor models
    6.1.2 Factor pricing models
  6.2 Applications of multifactor models
  6.3 Model validation with tradable factors
    6.3.1 Existence of a risk-free asset
    6.3.2 Estimation of risk premia
    6.3.3 Testing statistics
    6.3.4 An empirical study using Fama-French portfolios
    6.3.5 Absence of a risk-free asset∗
  6.4 Macroeconomic variables as factors∗
  6.5 Selection of factors
    6.5.1 Principal component analysis
    6.5.2 Factor analysis∗
  6.6 Exercises
Chapter 7 Portfolio Allocation and Risk Assessment
  7.1 Risk assessment of large portfolios
    7.1.1 Stability of a portfolio
    7.1.2 Stability and risk approximations
    7.1.3 Errors in risk assessments
    7.1.4 Representative portfolios with a given exposure
  7.2 Estimation of a large volatility matrix
    7.2.1 Exponential smoothing
    7.2.2 Regularization by thresholding
    7.2.3 Projections onto semi-positive and positive definite matrix spaces
    7.2.4 Regularization by penalized likelihood∗
    7.2.5 Factor model with observable factors
    7.2.6 Approximate factor models with observable factors
    7.2.7 Approximate factor models with unobservable factors
  7.3 Portfolio allocation with gross-exposure constraints
    7.3.1 Portfolio selection with gross-exposure constraint
    7.3.2 Relation with covariance regularization∗
  7.4 Portfolio selection and tracking
    7.4.1 Relation with regression
    7.4.2 Portfolio selection and tracking
  7.5 Empirical applications
    7.5.1 Fama-French 100 portfolios
    7.5.2 Russell 3000 stocks
  7.6 Complements
    7.6.1 Proof of Theorem 7.2
    7.6.2 Proof of Theorem 7.3
    7.6.3 Proof of (7.48)
  7.7 Exercises
Chapter 8 Consumption based CAPM
  8.1 Utility optimization
  8.2 Consumption-based CAPM
    8.2.1 CCAPM
    8.2.2 Power utility
  8.3 Mean-variance frontier∗
  8.4 Exercises
Chapter 9 Present-value Models
  9.1 Fundamental price
  9.2 Rational bubbles
  9.3 Time-varying expected returns
  9.4 Empirical evidence
  9.5 Linear regression under dependence
  9.6 Exercises
References
Author Index
Subject Index

List of Figures

1.1 Plots of log returns against simple returns of the Apple Inc
1.2 Yield spreads and returns of bonds
1.3 Daily, weekly and monthly returns of S&P 500 index
1.4 Daily, weekly and monthly returns of Apple stock
1.5 Histograms and Q-Q plots of log returns of S&P 500
1.6 Histograms and Q-Q plots of log returns of Apple stock
1.7 ACF of log-, squared and absolute returns of S&P 500
1.8 ACF of log-, squared and absolute returns of Apple stock
1.9 Tail distributions of S&P 500 index
1.10 VIX and S&P 500 index
1.11 Relationship among different processes
1.12 Ljung-Box and Dickey-Fuller tests
2.1 Time series plot and sample ACF plot of four moving-average processes of different orders
2.2 Time series plot and sample PACF plot of 4 autoregressive processes
2.3 Five stationary time series with the same marginal distribution
2.4 Sample ACF and PACF for five stationary time series plotted in Figure 2.3
2.5 Relationship among different processes
2.6 Time series plot and ACF of a random walk
2.7 The yields of a basket HY bonds and their differences
2.8 A schematic overview of time series models
2.9 Plots for the daily gold prices and their differences
2.10 Daily gold prices and their one-step-ahead predicted prices
2.11 Good and bad residual patterns
2.12 Diagnostic plots for daily gold prices
2.13 Diagnostic plots for daily gold prices
2.14 Q-Q plot of the residuals from fitted ARIMA models for the daily gold prices


2.15 A random walk plus white noise with a linear trend
2.16 log daily S&P 500 index prices and their sample ACF
2.17 Time series plot of the residuals
2.18 Quarterly earnings of IBM and Johnson and Johnson
2.19 ACFs for the earnings of Johnson and Johnson
2.20 Illustration of the best predictor
2.21 MACD technical indicators
2.22 RSI technical indicators
3.1 Figures of an ARCH(1) model
3.2 Features of a simulated ARCH(1) model
3.3 Relationship among different white noise processes
3.4 Features of GARCH(1,1) model
3.5 Features of GARCH(1,1) model
3.6 Diagnostic plot of GARCH(1,1) fit
3.7 Daily prices of the S&P 500 index, Goldman Sachs and Ford
3.8 Daily returns of the S&P 500 index, Goldman Sachs and Ford and their predictive intervals
3.9 ACFs for squared and the absolute residuals GARCH(1,1) fit
3.10 Daily returns and their predicted VaRs
3.11 Absolute errors of Gaussian MLE and LADE
3.12 Fitted volatility using EGARCH for S&P 500, Goldman Sachs and Ford
3.13 Q-Q plots of standardized residuals
3.14 Fitted volatility based on APGARCH(1,1) fit
3.15 VaRs based on APGARCH(1,1) fit
3.16 Weights in exponential smoothing and its equivalent window size
3.17 Predicted VaR from stochastic volatility models
4.1 Daily log close prices of FTSE 100 index, FTSE MidCap index, and FTSE SmallCap index
4.2 Sample cross-correlations of the log returns
4.3 Sample cross-correlations of the residuals
4.4 Daily returns and their predictions
4.5 Daily log prices of S&P 500 index and JP Morgan
4.6 Impulse response functions of the fitted AR(1) model
4.7 A random walk path of a drunk and an associated path of his dog
4.8 U.S. Treasury real yield curve with different maturities
4.9 Time series and ACF plots for the five candidate cointegrated variables


5.1 Efficient portfolio frontier and Sharpe ratios
5.2 Exponential utility functions
5.3 The market betas
5.4 CAPM and 6 Fama-French testing portfolios
5.5 Minimum-variance portfolios without a risk free asset
6.1 Several market risk factors
6.2 Fama-French three factors
6.3 Multifactor models and 6 Fama-French testing portfolios
6.4 Multifactor models and 25 Fama-French testing portfolios
6.5 Market risks and its nearly independent factors
6.6 Proportion of variances explained by the first two principal components
6.7 Multiple R2 of the 100 Fama-French portfolios regressed on the returns of S&P 500
7.1 Simulated daily returns of the four assets
7.2 Actual risks of the minimum variance portfolios
7.3 Volatility matrix between the returns of S&P 500 and the changes of VIX
7.4 SCAD and its local approximation
7.5 Average losses of estimating covariance matrices
7.6 Average of estimating precision matrix
7.7 Spectral distributions of estimated covariance matrices
7.8 Risks of portfolios against gross-exposure
7.9 Illustration of risk improvement by using the penalized least-squares
7.10 Characteristics of invested portfolios as a function of gross exposure
8.1 Consumer price index and inflation rate
8.2 US inflation adjusted GDP
8.3 Consumption growth, inflation adjusted returns of the CRSP index and risk-free interest rates
9.1 Rational bubble
9.2 Inflation adjusted S&P 500 index and its dividend
9.3 log-prices and its approximation by approximate present-value model
9.4 Dividend yield against future returns
9.5 Dividend yield against future returns in the period 1951-1980
9.6 Dividend yield against future returns in the period 1981-2010


9.7 Dividend yield against future returns in the period 1927-1950
9.8 Changes of the yields of 3-month treasury
9.9 Changes of short-term rates against future returns in 1981-2010
9.10 Changes of short-term rates against future returns in 1951-1980

Chapter 1 Asset Returns

The primary goal of investing in a financial market is to make profits without taking excessive risks. Most common investments involve purchasing financial assets such as stocks, bonds or bank deposits, and holding them for certain periods. Positive revenue is generated if the price of an asset at the end of the holding period is higher than that at the time of purchase (for the time being we ignore transaction charges). Obviously the size of the revenue depends on three factors: (i) the initial capital (i.e. the number of assets purchased), (ii) the length of the holding period, and (iii) the changes of the asset price over the holding period. A successful investment pursues the maximum revenue with a given initial capital, which may be measured explicitly in terms of the so-called return. A return is a percentage defined as the change of price expressed as a fraction of the initial price. It turns out that asset returns exhibit more attractive statistical properties than asset prices themselves. Therefore it also makes more statistical sense to analyze return data rather than price series.

1.1 Returns

Let $P_t$ denote the price of an asset at time $t$. First we introduce various definitions of the returns for the asset.

1.1.1 One-period simple returns and gross returns

Holding an asset from time $t-1$ to $t$, the value of the asset changes from $P_{t-1}$ to $P_t$. Assume that no dividends are paid over the period. Then the one-period simple return is defined as

$$R_t = (P_t - P_{t-1})/P_{t-1}. \tag{1.1}$$

It is the profit rate of holding the asset from time $t-1$ to $t$. Often we write $R_t = 100R_t\%$, as $100R_t$ is the percentage of the gain with respect to the initial capital $P_{t-1}$. This is particularly useful when the time unit is small (such as a day or an hour); in such cases $R_t$ typically takes very small values. The returns for less risky assets such as bonds can be even smaller in a short period and are often quoted in basis points, i.e. as $10{,}000R_t$. The one-period gross return is defined as $P_t/P_{t-1} = R_t + 1$. It is the ratio of the new market value at the end of the holding period over the initial market value.

1.1.2 Multiperiod returns

The holding period for an investment may be more than one time unit. For any integer $k \ge 1$, the returns over $k$ periods may be defined in a similar manner. For example, the $k$-period simple return from time $t-k$ to $t$ is $R_t(k) = (P_t - P_{t-k})/P_{t-k}$, and the $k$-period gross return is $P_t/P_{t-k} = R_t(k) + 1$. It is easy to see that the multiperiod returns may be expressed in terms of one-period returns as follows:

$$\frac{P_t}{P_{t-k}} = \frac{P_t}{P_{t-1}} \frac{P_{t-1}}{P_{t-2}} \cdots \frac{P_{t-k+1}}{P_{t-k}}, \tag{1.2}$$

$$R_t(k) = \frac{P_t}{P_{t-k}} - 1 = (R_t + 1)(R_{t-1} + 1) \cdots (R_{t-k+1} + 1) - 1. \tag{1.3}$$

If all one-period returns $R_t, \cdots, R_{t-k+1}$ are small, (1.3) implies the approximation

$$R_t(k) \approx R_t + R_{t-1} + \cdots + R_{t-k+1}. \tag{1.4}$$
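The relations (1.1), (1.3) and (1.4) can be checked numerically. The following is a minimal sketch in Python, assuming a short made-up price series; the function names `simple_returns` and `k_period_return` are illustrative and not part of the text.

```python
import math

def simple_returns(prices):
    """One-period simple returns R_t = (P_t - P_{t-1}) / P_{t-1}, as in (1.1)."""
    return [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]

def k_period_return(one_period):
    """Exact k-period simple return built from one-period returns, as in (1.3)."""
    gross = math.prod(1.0 + r for r in one_period)
    return gross - 1.0

# Hypothetical prices, for illustration only.
prices = [100.0, 101.0, 100.5, 102.0, 101.2]
R = simple_returns(prices)

exact = k_period_return(R)             # compounding of one-period returns, (1.3)
direct = prices[-1] / prices[0] - 1.0  # (P_t - P_{t-k}) / P_{t-k}
approx = sum(R)                        # additive approximation, (1.4)

print(f"exact  k-period return: {exact:.6f}")
print(f"direct k-period return: {direct:.6f}")
print(f"sum of one-period returns: {approx:.6f}")
print(f"last one-period return in basis points: {10_000 * R[-1]:.1f}")
```

With one-period returns of about one percent in size, the sum differs from the exact compounded return only in the fourth decimal place; the gap grows as the individual returns become larger.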

The approximation (1.4) is useful when the time unit is small (such as a day, an hour or a minute).

1.1.3 Log returns and continuously compounding

In addition to the simple return $R_t$, the commonly used one-period log return is defined as

$$r_t = \log P_t - \log P_{t-1} = \log(P_t/P_{t-1}) = \log(1 + R_t). \tag{1.5}$$

Note that a log return is the logarithm (with the natural base) of a gross return, and $\log P_t$ is called the log price. One immediate convenience in using log returns is the additivity of multiperiod log returns, i.e. the $k$-period log return $r_t(k) \equiv \log(P_t/P_{t-k})$ is the sum of the $k$ one-period log returns:

$$r_t(k) = r_t + r_{t-1} + \cdots + r_{t-k+1}. \tag{1.6}$$

An investment at time $t-k$ with initial capital $A$ yields at time $t$ the capital

$$A \exp\{r_t(k)\} = A \exp(r_t + r_{t-1} + \cdots + r_{t-k+1}) = A e^{k\bar r},$$

where $\bar r = (r_t + r_{t-1} + \cdots + r_{t-k+1})/k$ is the average one-period log return. In this book returns refer to log returns unless specified otherwise.

Note that the identity (1.6) is in contrast with the approximation (1.4), which is only valid when the time unit is small. Indeed when the values are small, the two returns are approximately the same: $r_t = \log(1 + R_t) \approx R_t$. However, $r_t < R_t$ whenever $R_t \ne 0$. Figure 1.1 plots the log returns against the simple returns for the Apple Inc share prices in the period of January 1985 – February 2011. The returns are calculated based on the daily close prices for three holding periods: a day, a week and a month. The figure shows that the two definitions result in almost the same daily returns, especially for those with values between $-0.2$ and $0.2$. However when the holding period increases to a week or a month, the discrepancy between the two definitions is more apparent, with a simple return always greater than the corresponding log return.
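The additivity of log returns and the inequality between log and simple returns can be illustrated with a few lines of Python. This is a sketch on a made-up price path with deliberately large moves; the variable names are illustrative only.

```python
import math

# Hypothetical prices with large moves, for illustration only.
prices = [100.0, 104.0, 98.0, 110.0]

simple = [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]
log_ret = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

# Additivity: the k-period log return equals the sum of one-period log returns.
k = len(log_ret)
k_log = math.log(prices[-1] / prices[0])
print(f"sum of one-period log returns: {sum(log_ret):.6f}")
print(f"k-period log return:           {k_log:.6f}")

# r_t = log(1 + R_t) is below R_t whenever R_t is nonzero.
for R_t, r_t in zip(simple, log_ret):
    print(f"R_t = {R_t:+.4f}   r_t = {r_t:+.4f}   R_t - r_t = {R_t - r_t:.4f}")

# Capital after k periods from the average one-period log return: A * exp(k * rbar).
A = 1000.0
rbar = sum(log_ret) / k
final_capital = A * math.exp(k * rbar)
print(f"final capital: {final_capital:.2f}")
```

The gap $R_t - r_t$ is roughly $R_t^2/2$ for small returns, which is why the daily points in Figure 1.1 lie close to the diagonal while weekly and monthly points drift away from it.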

Figure 1.1 Plots of log returns against simple returns of the Apple Inc share prices in January 1985 – February 2011. The blue straight lines mark the positions where the two returns are identical.

The log return $r_t$ is also called the continuously compounded return due to its close link with the concept of compound rates or interest rates. For a bank deposit account, the quoted interest rate is often referred to as 'simple interest'. For example, an interest rate of 5% payable every six months will be quoted as a simple interest of 10% per annum in the market. However if an account with the initial capital \$1 is held for 12 months and the interest rate remains unchanged, it follows from (1.2) that the gross return for the two periods is $1 \times (1 + 0.05)^2 = 1.1025$, i.e. the annual simple return is $1.1025 - 1 = 10.25\%$, which is called the compound return and is greater than the quoted annual rate of 10%. This is due to the earning from 'interest-on-interest' in the second six-month period.

Now suppose that the quoted simple interest rate per annum is $r$ and is unchanged, and the earnings are paid more frequently, say, $m$ times per annum (at the rate $r/m$ each time of course). For example, the account holder is paid every quarter when $m = 4$, every month when $m = 12$, and every day when $m = 365$. Suppose $m$ continues to increase, so that eventually the earnings are paid continuously. Then the gross return at the end of one year is

$$\lim_{m \to \infty} (1 + r/m)^m = e^r.$$
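The convergence of the gross return to $e^r$ as the payment frequency grows can be seen numerically. The sketch below uses the quoted rate of 10% from the example above; the function name `gross_return` is an illustrative choice.

```python
import math

def gross_return(r, m):
    """One-year gross return when a quoted annual rate r is paid m times a year."""
    return (1.0 + r / m) ** m

r = 0.10  # quoted simple annual rate, as in the example above
for m in (1, 2, 4, 12, 365):
    print(f"m = {m:>3}: gross return = {gross_return(r, m):.6f}")
print(f"continuous compounding: exp(r) = {math.exp(r):.6f}")
```

The case $m = 2$ reproduces the gross return 1.1025 of the semi-annual example, and the values increase with $m$ towards $e^{0.1}$.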

More generally, if the initial capital is $C$, invested in a bond that compounds interest continuously at annual rate $r$, then the value of the investment at time $t$ is $C \exp(rt)$. Hence the log return per annum is $r$, which is the logarithm of the gross return. This indicates that the simple annual interest rate $r$ quoted in the market is in fact the annual log return if the interest is compounded continuously. Note that if the interest is only paid once at the end of the year, the simple return will be $r$, and the log return will be $\log(1 + r)$, which is always smaller than $r$. In summary, a simple annual interest rate quoted in the market has two interpretations: it is the simple annual return if the interest is only paid once at the end of the year, and it is the annual log return if the interest is compounded continuously.

1.1.4 Adjustment for dividends

Many assets, for example some blue-chip stocks, pay dividends to their shareholders from time to time. A dividend is typically allocated as a fixed amount of cash per share. Therefore adjustments must be made in computing returns to account for the contribution of dividends towards the earnings. Let $D_t$ denote the dividend payment between time $t-1$ and $t$. Then the returns are now defined as follows:

$$R_t = (P_t + D_t)/P_{t-1} - 1, \qquad r_t = \log(P_t + D_t) - \log P_{t-1},$$

$$R_t(k) = (P_t + D_t + \cdots + D_{t-k+1})/P_{t-k} - 1,$$

$$r_t(k) = r_t + \cdots + r_{t-k+1} = \sum_{j=0}^{k-1} \log\Big(\frac{P_{t-j} + D_{t-j}}{P_{t-j-1}}\Big).$$

The above definitions are based on the assumption that all dividends are cashed out and are not re-invested in the asset.
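As a small illustration of the dividend-adjusted definitions, the sketch below computes them for a made-up price series with a single dividend. The function name `returns_with_dividends` and the convention that `dividends[t]` holds the payment between times $t-1$ and $t$ are assumptions made for this example.

```python
import math

def returns_with_dividends(prices, dividends):
    """One-period simple and log returns adjusted for dividends.

    dividends[t] is the payment D_t made between time t-1 and t;
    dividends[0] is not used.
    """
    simple, log_ret = [], []
    for t in range(1, len(prices)):
        gross = (prices[t] + dividends[t]) / prices[t - 1]
        simple.append(gross - 1.0)
        log_ret.append(math.log(gross))
    return simple, log_ret

# Hypothetical data: one dividend of 0.5 paid in the second period.
prices = [50.0, 51.0, 50.0, 52.0]
dividends = [0.0, 0.0, 0.5, 0.0]

simple, log_ret = returns_with_dividends(prices, dividends)

# k-period simple return: dividends are cashed out and simply added up.
k_simple = (prices[-1] + sum(dividends[1:])) / prices[0] - 1.0
# k-period log return: the sum of the adjusted one-period log returns.
k_log = sum(log_ret)

print("one-period simple returns:", [round(x, 6) for x in simple])
print(f"k-period simple return: {k_simple:.6f}")
print(f"k-period log return:    {k_log:.6f}")
print(f"exp(k-period log return) - 1: {math.exp(k_log) - 1.0:.6f}")
```

Because dividends are assumed to be cashed out rather than re-invested, the $k$-period simple return (0.05 here) need not equal $\exp\{r_t(k)\} - 1$ (0.0504 here), unlike in the dividend-free case.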

1.1.5 Bond yields and prices

Bonds are quoted in annualized yields. A so-called zero-coupon bond is a bond bought at a price lower than its face value (also called par value or principal), with the face value repaid at the time of maturity. It does not make periodic interest payments (i.e. coupons), hence the term 'zero-coupon'. Now we consider a zero-coupon bond with the face value \$1. If the current yield is $r_t$ and the remaining duration is $D$ units of time, with continuous compounding, its current price $B_t$ should satisfy the condition $B_t \exp(D r_t) = 1$ (in dollars), i.e. the price is $B_t = \exp(-D r_t)$ dollars. Thus, the annualized log-return of the bond is

$$\log(B_{t+1}/B_t) = D(r_t - r_{t+1}). \tag{1.7}$$

Here, we ignore the fact that $B_{t+1}$ has a maturity one unit of time shorter than $B_t$. Suppose that we have two baskets, one of high-yield bonds and one of investment-grade bonds (i.e. bonds with relatively low risk of default), with an average duration of 4.4 years each. Their yield spreads (i.e. the differences) over the Treasury bond with similar maturity are quoted and plotted in Figure 1.2. The daily returns of the bonds can then be deduced from (1.7) as the change of yields multiplied by the duration. The daily changes of Treasury bond yields are typically much smaller. Hence, the changes of yield spreads can be used directly as proxies for the changes of yields. As expected, the high-yield bonds have higher yields than the investment-grade bonds, but have higher volatility too (about 3 times). The yield spreads widened significantly in a period after the financial crisis following Lehman Brothers filing for bankruptcy protection on September 15, 2008, reflecting higher default risks in corporate bonds.
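To make the link between yield changes and bond returns concrete, here is a minimal sketch for a zero-coupon bond with the 4.4-year duration mentioned above. The yield levels (6.00% rising to 6.05%) are made up, and the function names are illustrative.

```python
import math

def zero_coupon_price(y, duration, face=1.0):
    """Price of a zero-coupon bond under continuous compounding: face * exp(-D * y)."""
    return face * math.exp(-duration * y)

def bond_log_return(y_old, y_new, duration):
    """Log return implied by (1.7): D * (y_old - y_new), with the duration held fixed."""
    return duration * (y_old - y_new)

D = 4.4                   # duration in years, as in the example above
y0, y1 = 0.0600, 0.0605   # hypothetical yields: a rise of 5 basis points

b0 = zero_coupon_price(y0, D)
b1 = zero_coupon_price(y1, D)

print(f"price before: {b0:.6f}, price after: {b1:.6f}")
print(f"log return from prices: {math.log(b1 / b0):.6f}")
print(f"log return from (1.7):  {bond_log_return(y0, y1, D):.6f}")
```

A rise of 5 basis points in the yield translates into a log return of about $-22$ basis points, i.e. the yield change scaled by the duration, with the sign reversed.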

Figure 1.2 Time series of the yield spreads (the top panel) of high-yield bonds (blue curve) and investment-grade bonds (red curve), and their associated daily returns (the 2nd and 3rd panels) in November 29, 2004 – December 10, 2014.

1.1.6 Excess returns

In many applications, it is convenient to use an excess return, which is defined in the form $r_t - r_t^*$, where $r_t^*$ is a reference rate. The commonly used reference rates are, for example, bank interest rates, LIBOR rates (London Interbank Offered Rate: the average interest rate that leading banks in London charge when lending to other banks), log returns of a riskless asset (e.g. yields of short-term government bonds such as the 3-month US treasury bills) or of a market portfolio (e.g. the S&P 500 index or the CRSP value-weighted index, which is the value-weighted index of all stocks traded in three major stock exchanges, created by the Center for Research in Security Prices of the University of Chicago). For bonds, the yield spread is an excess yield defined as the difference between the yield of a bond and the yield of a reference bond such as a US treasury bill with a similar maturity.

1.2 Behavior of financial return data

In order to build useful statistical models for financial returns, we collect some empirical evidence first. To this end, we look into the daily closing indices of the S&P 500 and the daily closing share prices (in US dollars) of Apple Inc in the period of January 1985 – February 2011. The data were adjusted for all splits and dividends, and were downloaded from Yahoo!Finance.

The S&P 500 is a value-weighted index of the prices of 500 large-cap common stocks actively traded in the United States. Its present form has been published since 1957, but its history dates back to 1923 when it was a value-weighted index based on 90 stocks. It is regarded as a bellwether for the American economy. Many mutual funds, exchange-traded funds, pension funds etc. are designed to track the performance of the S&P 500. The first panel in Figure 1.3 is the time series plot for the daily closing indices of the S&P 500. It shows clearly that there was a slow and steady upward momentum in 1985-1987. The index then reached an all-time high on March 24, 2000 during the dot-com bubble, and subsequently lost about 50% of its value during the stock market downturn in 2002. It peaked again on October 9, 2007 before suffering during the credit crisis stemming from subprime mortgage lending in 2008-2010. The other three panels in Figure 1.3 show the daily, the weekly and the monthly log returns of the index. Although the profiles of the three plots are similar, the monthly return curve is a 'smoothed' version of, for example, the daily return curve, which exhibits more volatile fluctuations. In particular, the high volatilities during 2008-2010 are more vividly depicted in the daily return plot. In contrast to the prices, the returns oscillate around a constant level close to 0. Furthermore, high oscillations tend to cluster together, reflecting more volatile market periods. These features of return data are also apparent in the Apple stock displayed in Figure 1.4.
The share prices of Apple Inc are also non-stationary in time, in the sense that the price movements over different time periods are different. For example, in 1985 – 1998 the prices stayed almost at a low level. They then experienced a steady increase until September 29, 2000, when Apple's value was sliced in half owing to the earnings warning in the last quarter of the year. The more recent surge in the price was largely due to Apple's success in the mobile consumer electronics market via its products the iPod, iPhone and iPad, in spite of fluctuations during the subprime mortgage credit crisis.

Figure 1.3 Time series plots of the daily indices, the daily log returns, the weekly log returns, and the monthly log returns of S&P 500 index in January 1985 – February 2011.

Figure 1.4 Time series plots of the daily prices, the daily log returns, the weekly log returns, and the monthly log returns of the Apple stock in January 1985 – February 2011.

We plot the normalized histograms of the daily, the weekly and the monthly log returns of the S&P 500 index in Figure 1.5. For each histogram, we also superimpose the normal density function with the same mean and variance. Also plotted in Figure 1.5 are the quantile-quantile plots for the three returns. (See an introduction to Q-Q plots in Section 1.5.) It is clear that the returns within the given holding periods are not normally distributed. In particular, the tails of the return distributions are heavier than those of the normal distribution, which is highlighted explicitly in the Q-Q plots: the left tail (red circles) is below (negatively larger than) the blue line, and the right tail (red circles) is above (larger than) the blue line. We also notice that when the holding period increases from a day to a week and then to a month, the tails of the distributions become lighter. In particular the upper tail of the distribution for the monthly returns is about as heavy as that of a normal distribution (red circles and blue line are about the same). All the distributions are skewed to the left due to a few large negative returns. The histograms also show that the distribution for the monthly returns is closer to a normal distribution than those for the weekly returns and the daily returns. Similar patterns are also observed in the Apple return data; see Figure 1.6.

Figure 1.5 Histograms (the top panels) and Q-Q plots (the bottom panels) of the daily, weekly, and monthly log returns of S&P 500 in January 1985 – February 2011. The normal density with the same mean and variance is superimposed on the histogram plots.

Figure 1.6 Histograms (the top panels) and Q-Q plots (the bottom panels) of the daily, weekly, and monthly log returns of the Apple stock in January 1985 – February 2011. The normal density with the same mean and variance is superimposed on the histogram plots.

Figures 1.7 and 1.8 plot the sample autocorrelation function (ACF) $\hat\rho_k$ against the time lag $k$ for the log returns, the squared log returns and the absolute log returns. Given a return series $r_1, \cdots, r_T$, the sample autocorrelation function is defined as $\hat\rho_k = \hat\gamma_k / \hat\gamma_0$, where

$$\hat\gamma_k = \frac{1}{T} \sum_{t=1}^{T-k} (r_t - \bar r)(r_{t+k} - \bar r), \qquad \bar r = \frac{1}{T} \sum_{t=1}^{T} r_t. \tag{1.8}$$

Here $\hat\gamma_k$ is the sample autocovariance at lag $k$. The sample autocorrelation $\hat\rho_k$ is (about) the same as the sample correlation coefficient of the paired observations $\{(r_t, r_{t+k})\}_{t=1}^{T-k}$ (the difference is in the definition of the sample mean used in the calculation of the sample covariance). The sample autocorrelation functions for the squared and the absolute returns are defined in the same manner but with $r_t$ replaced by, respectively, $r_t^2$ and $|r_t|$. For each ACF plot in Figures 1.7 and 1.8, the two dashed horizontal lines, which are at $\pm 1.96/\sqrt{T}$, are the bounds of the 95% confidence interval for $\hat\rho_k$ if the true value is $\rho_k = 0$. Hence $\rho_k$ would be viewed as not significantly different from 0 if its estimator $\hat\rho_k$ is between those two lines. It is clear from Figures 1.7 and 1.8 that all the daily, weekly and monthly returns for both the S&P 500 and the Apple stock exhibit no significant autocorrelation, supporting the hypothesis that the returns of a financial asset are uncorrelated across time. However there are some small but significant autocorrelations in the squared returns and more in the absolute returns.
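The estimator (1.8) and the $\pm 1.96/\sqrt{T}$ bounds are straightforward to code. The sketch below is a plain-Python illustration; the function name `sample_acf` is an assumption, and the input is simulated i.i.d. Gaussian noise standing in for returns, not the S&P 500 or Apple data.

```python
import math
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat_1, ..., rho_hat_max_lag as defined in (1.8)."""
    T = len(x)
    mean = sum(x) / T
    dev = [v - mean for v in x]
    gamma0 = sum(d * d for d in dev) / T
    acf = []
    for k in range(1, max_lag + 1):
        # Both the lag-k and lag-0 autocovariances are divided by T, not T - k.
        gamma_k = sum(dev[t] * dev[t + k] for t in range(T - k)) / T
        acf.append(gamma_k / gamma0)
    return acf

random.seed(0)
T = 2000
# Simulated i.i.d. noise used as a stand-in for a return series (illustration only).
returns = [random.gauss(0.0, 0.01) for _ in range(T)]

bound = 1.96 / math.sqrt(T)  # 95% bounds under zero autocorrelation
series = {
    "returns": returns,
    "squared returns": [r * r for r in returns],
    "absolute returns": [abs(r) for r in returns],
}
for name, x in series.items():
    acf = sample_acf(x, 10)
    outside = sum(abs(a) > bound for a in acf)
    print(f"{name}: {outside} of 10 lags outside +/-{bound:.4f}")
```

For simulated i.i.d. data one expects few, if any, lags outside the bounds for all three series. Real return data typically behave differently for the squared and absolute series, which is the point made by Figures 1.7 and 1.8.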

Figure 1.7 Autocorrelations of the daily, weekly, and monthly log returns, the squared daily, weekly, and monthly log returns, and the absolute daily, weekly, and monthly log returns of S&P 500 in January 1985 – February 2011.

Figure 1.8 Autocorrelations of the daily, weekly, and monthly log returns, the squared daily, weekly, and monthly log returns, and the absolute daily, weekly, and monthly log returns of the Apple stock in January 1985 – February 2011.

Furthermore the autocorrelations are more pronounced and more persistent in the daily data than in the weekly and monthly data. Since the correlation coefficient is a measure of linear dependence, the above empirical evidence indicates that the returns of a financial asset show no linear dependence on one another, although there exist nonlinear dependencies among the returns at different lags. In particular, the daily absolute returns exhibit significant and persistent autocorrelations, a characteristic of so-called long memory processes.

1.2.1 Stylized features of financial returns

The above findings from the two real data sets are in line with the so-called stylized features of financial return series, which are observed across different kinds of assets including stocks, portfolios, bonds and currencies. See, e.g., Rydberg (2000). We summarize these features below.

(i) Stationarity. The prices of an asset recorded over time are often not stationary due to, for example, the steady expansion of the economy, the increase of productivity resulting from technological innovation, and economic recessions or financial crises. However their returns, denoted by $r_t$ for $t \ge 1$, typically fluctuate around a constant level, suggesting a constant mean over time. See Figures 1.3 and 1.4. In fact most return sequences can be modeled as a stochastic process with at least time-invariant first two moments (i.e. weak stationarity; see Section 2.1). A simple (and perhaps over-simplistic) approach is to assume that all the finite-dimensional distributions of a return sequence are time-invariant.

(ii) Heavy tails. The probability distribution of the return $r_t$ often exhibits heavier tails than those of a normal distribution. Figures 1.5 and 1.6 provide the quantile-quantile plot, or Q-Q plot, for graphical checking of normality. See Section 1.5 for details. A frequently used test for checking normality (including tail-heaviness) is the Jarque-Bera test presented in Section 1.5. Nevertheless $r_t$ is typically assumed to have at least two finite moments (i.e. $E(r_t^2) < \infty$), although it is debatable how many moments actually exist for a given asset. The density of the t-distribution with $\nu$ degrees of freedom is given by

$$f_\nu(x) = d_\nu^{-1} \Big(1 + \frac{x^2}{\nu}\Big)^{-(\nu+1)/2}, \tag{1.9}$$

where $d_\nu = B(0.5, 0.5\nu)\sqrt{\nu}$ is the normalization constant and $B$ is the beta function. This distribution is often denoted as $t(\nu)$ or $t_\nu$. Its tails are of polynomial order $f_\nu(x) \asymp |x|^{-(\nu+1)}$ (as $|x| \to \infty$), which are heavier than those of the normal density. Note that for any random variable $X \sim t(\nu)$, $E\{|X|^\nu\} = \infty$ and $E\{|X|^{\nu-\delta}\} < \infty$ for any $\delta \in (0, \nu]$. When $\nu$ is large, $t(\nu)$ is close to a normal distribution. In fact, based on a sample of size 2500 (approximately 10 years of daily data), one cannot differentiate $t(10)$ from a normal distribution based on, for example, the Kolmogorov-Smirnov test (the function ks.test in R). However their tail behaviors are very different: a 5-standard-deviation (SD) event occurs once in every 14000 years under a normal distribution, once in every 15 years under $t_{10}$, and once in every 1.5 years under $t_{4.5}$. The calculation goes as follows.
The probability of getting a −5 SD daily shock or worse under the normal distribution is $2.8665 \times 10^{-7}$ (which is $P(Z < -5)$ for $Z \sim N(0, 1)$), or 1 in 3488575 days. Dividing this by approximately 252 trading days per year yields the result of 13844 years. A similar calculation can be done with different kinds of t-distributions. If the tails of stock returns behave like those of $t_{4.5}$, which resembles the left tail of typical daily S&P 500 returns, a −5 SD event occurs more often than one would expect. Figure 1.9 plots the quantiles of the S&P 500 returns in the period January 1985 – February 2011 against the quantiles of $t(\nu)$ distributions with $\nu = 2, 3, \cdots, 7$. It is clear that the tails of the S&P 500 returns are heavier than the tails of $t(5)$, and are thinner than the tails of $t(2)$ (and perhaps also $t(3)$). Hence it is reasonable to assume that the second moment of the return of the S&P 500 is finite while the 5th moment is infinite.
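The return-period arithmetic above can be checked numerically. The following is a minimal Python sketch (not the book's R code) that evaluates the normal tail with the complementary error function and the t tail by integrating the density (1.9) with Simpson's rule. It assumes that a "−5 SD event" under $t(\nu)$ is read as $X < -5$ on the raw t scale, which appears to match the orders of magnitude quoted in the text, and that a year has 252 trading days. All function names are illustrative.

```python
import math

def t_density(x, nu):
    """Density (1.9) of the t-distribution with nu degrees of freedom."""
    log_beta = math.lgamma(0.5) + math.lgamma(0.5 * nu) - math.lgamma(0.5 * (nu + 1))
    d_nu = math.exp(log_beta) * math.sqrt(nu)      # normalization constant
    return (1.0 + x * x / nu) ** (-(nu + 1) / 2.0) / d_nu

def t_lower_tail(c, nu, n=20000):
    """P(X < -c) for X ~ t(nu), computed as 0.5 minus the integral of the
    density over [0, c] (composite Simpson's rule with n even subintervals)."""
    h = c / n
    total = t_density(0.0, nu) + t_density(c, nu)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_density(i * h, nu)
    return 0.5 - total * h / 3.0

def normal_lower_tail(c):
    """P(Z < -c) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(c / math.sqrt(2.0))

def years_between_events(p, trading_days=252):
    """Average waiting time, in years, for a daily event of probability p."""
    return 1.0 / (p * trading_days)

if __name__ == "__main__":
    print("normal:", years_between_events(normal_lower_tail(5.0)))
    for nu in (10, 4.5):
        print("t(%s):" % nu, years_between_events(t_lower_tail(5.0, nu)))
```

The waiting time is simply the reciprocal of the daily tail probability, converted from days to years, so the large differences between the three distributions come entirely from the tail probabilities.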

Figure 1.9 Plots of log returns against simple returns of the Apple Inc share prices in January 1985 – February 2011. The blue straight lines mark the positions where the two returns are identical.


(iii) Asymmetry. The distribution of the return $r_t$ is often negatively skewed (Figures 1.5 and 1.6), reflecting the fact that the downturns of financial markets are often much steeper than the recoveries. Investors tend to react more strongly to negative news than to positive news.

(iv) Volatility clustering. This term refers to the fact that large price changes (i.e. returns with large absolute values) occur in clusters. See Figures 1.3 and 1.4. Indeed, large price changes tend to be followed by large price changes, and periods of tranquility alternate with periods of high volatility. The time-varying volatility can easily be seen in Figure 1.3. The standard deviation of S&P 500 returns from November 29, 2004 to December 31, 2007 (3 months before JP Morgan Chase offered to acquire Bear Stearns at a price of USD 2 per share on March 17, 2008) is 0.78%, and the standard deviation of the returns since 2008 is 1.83%. The volatility of the S&P 500 in 2005 and 2006 is merely 0.64%, whereas the volatility at the height of the 2008 financial crisis (September 15, 2008 to March 16, 2009, i.e. from a month before to 6 months after Lehman Brothers' fall) is 3.44%.

(v) Aggregational Gaussianity. Note that a return over k days is simply the aggregation of k daily returns; see (1.6). When the time horizon increases, the central limit law sets in and the distribution of the returns over a long time horizon (such as a month) tends toward a normal distribution. See Figures 1.5 and 1.6.

(vi) Long range dependence. The returns themselves hardly show any serial correlation, which, however, does not mean that they are independent. In fact, both daily squared and absolute returns often exhibit small but significant autocorrelations. Those autocorrelations are persistent for absolute returns, indicating possible long-memory properties. It is also noticeable that those autocorrelations become weaker and less persistent when the sampling interval is increased from a day to a week to a month.
See Figure 1.7.

(vii) Leverage effect. Asset returns are negatively correlated with the changes of their volatilities (Black 1976, Christie 1982). As asset prices decline, companies become more leveraged (debt to equity ratios increase) and riskier, and hence their stock prices become more volatile. On the other hand, when stock prices become more volatile, investors demand higher returns and hence stock prices go down. Volatilities caused by price decline are typically larger than the appreciations due to declined volatilities. To examine the leverage effect, we use the VIX, which is a proxy of the implied volatility of a basket of at-the-money S&P 500 options maturing in one month, as the proxy of the volatility of the S&P 500 index. Figure 1.10 shows the time series of the VIX and the S&P 500 index (left panel) and the returns of the S&P 500 index against the changes of volatility (right panel). The leverage effect is strong, although the VIX is not a perfect measure of the volatility of the S&P 500 index, as it involves the volatility risk premium (Ait-Sahalia, Fan, and Li, 2013).

Figure 1.10 Time series plot of VIX (red) and the S&P 500 index (blue) in Nov. 29, 2004 – Dec. 14, 2011 (the left panel), and the plot of the daily S&P 500 returns (in percent) against the changes of VIX (the right panel).

1.3 Efficient markets hypothesis and statistical models for returns

The efficient markets hypothesis (EMH) in finance assumes that asset prices are fair, that information is accessible to everybody and is assimilated rapidly to adjust prices, and that people (including traders) are rational. Therefore the price $P_t$ incorporates all relevant information up to time $t$, and individuals do not have comparative advantages in the acquisition of information. The price change $P_t - P_{t-1}$ is only due to the arrival of “news” between $t - 1$ and $t$. Hence individuals have no opportunities for making an investment with a return greater than a fair payment for undertaking the riskiness of the asset. A shorthand for the EMH: the price is right, and there exist no arbitrage opportunities.

The above describes the strong form of the EMH: security prices of traded assets reflect instantly all available information, public or private. A semi-strong form states that security prices reflect efficiently all public information, leaving room for the value of private information. The weak form merely assumes that security prices reflect all past publicly available information.

Under the EMH, an asset return process may be expressed as

$$r_t = \mu_t + \varepsilon_t, \qquad \varepsilon_t \sim (0, \sigma_t^2), \qquad (1.10)$$


where $\mu_t$ is the rational expectation of $r_t$ at time $t - 1$, and $\varepsilon_t$ represents the return due to unpredictable “news” which arrives between time $t - 1$ and $t$. In this sense, $\varepsilon_t$ is an innovation, a term used very often in the time series literature. We use the notation $X \sim (\mu, \nu)$ to denote that a random variable $X$ has mean $\mu$ and variance $\nu$. The assumption that $E\varepsilon_t = 0$ reflects the belief that on average the actual change of log price equals the expectation $\mu_t$.

By combining the EMH model (1.10) with some stylized features outlined in Section 1.2, the most frequently used statistical models for financial returns admit the form

$$r_t = \mu + \varepsilon_t, \qquad \varepsilon_t \sim \mathrm{WN}(0, \sigma^2), \qquad (1.11)$$

where $\mu = E r_t$ is the expected return, which is assumed to be a constant. The notation $\varepsilon_t \sim \mathrm{WN}(0, \sigma^2)$ denotes that $\varepsilon_1, \varepsilon_2, \cdots$ form a white noise process with $E\varepsilon_t = 0$ and $\mathrm{var}(\varepsilon_t) = \sigma^2$; see (i) below. Here we assume that $\mathrm{var}(\varepsilon_t) = \mathrm{var}(r_t) = \sigma^2$ is a finite positive constant, noting that most of the varying volatilities in the return plots in Figures 1.3 & 1.4 can be represented by conditional heteroscedasticity under the martingale difference assumption (ii) below. The assumption (1.11) is reasonable, supported by the empirical analysis in Section 1.2.

There are three different types of assumptions about the innovations $\{\varepsilon_t\}$ in (1.11), from the weakest to the strongest.

(i) White noise innovations: $\{\varepsilon_t\}$ are white noise, denoted as $\varepsilon_t \sim \mathrm{WN}(0, \sigma^2)$. Under this assumption, $\mathrm{Corr}(\varepsilon_t, \varepsilon_s) = 0$ for all $t \ne s$.

(ii) Martingale difference innovations: $\varepsilon_t$ form a martingale difference sequence in the sense that for any $t$

$$E(\varepsilon_t | r_{t-1}, r_{t-2}, \cdots) = E(\varepsilon_t | \varepsilon_{t-1}, \varepsilon_{t-2}, \cdots) = 0. \qquad (1.12)$$

One of the most frequently used formats for martingale difference innovations is of the form

$$\varepsilon_t = \sigma_t \eta_t, \qquad (1.13)$$

where $\eta_t \sim \mathrm{IID}(0, 1)$ (see (iii) below), and $\sigma_t$ is a predictable volatility process, known at time $t - 1$, satisfying the condition $E(\sigma_t | r_{t-1}, r_{t-2}, \cdots) = \sigma_t$. Note that ARCH and GARCH processes are special cases of (1.13).

(iii) IID innovations: $\varepsilon_t$ are independent and identically distributed, denoted as $\varepsilon_t \sim \mathrm{IID}(0, \sigma^2)$.

The assumption of IID innovations is the strongest. It implies that the innovations are martingale differences. On the other hand, if $\{\varepsilon_t\}$ satisfies (1.12), it holds that for any $t > s$,

$$\mathrm{cov}(\varepsilon_t, \varepsilon_s) = E(\varepsilon_t \varepsilon_s) = E\{E(\varepsilon_t \varepsilon_s | \varepsilon_{t-1}, \varepsilon_{t-2}, \cdots)\} = E\{\varepsilon_s E(\varepsilon_t | \varepsilon_{t-1}, \varepsilon_{t-2}, \cdots)\} = 0.$$

Hence $\{\varepsilon_t\}$ is a white noise series. Therefore the relationship among the three types of innovations is as follows:

IID ⇒ Martingale differences ⇒ White noise.

The relationship is summarized in Figure 1.11.
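The gap between white noise and IID can be made concrete with a small simulation. The sketch below, written in Python with only the standard library, generates innovations of the form (1.13) with an ARCH(1) volatility, $\sigma_t^2 = \omega + a\,\varepsilon_{t-1}^2$, and compares the lag-1 sample autocorrelation of $\varepsilon_t$ with that of $\varepsilon_t^2$. The parameter values ($\omega = 0.7$, $a = 0.3$), the Gaussian choice for $\eta_t$, and the function names are assumptions made for illustration only.

```python
import random

def simulate_arch1(n, omega=0.7, a=0.3, seed=0):
    """Simulate eps_t = sigma_t * eta_t with sigma_t^2 = omega + a * eps_{t-1}^2
    and eta_t ~ IID N(0, 1); sigma_t depends only on the past, as in (1.13)."""
    rng = random.Random(seed)
    eps, prev = [], 0.0
    for _ in range(n):
        sigma = (omega + a * prev * prev) ** 0.5   # known at time t - 1
        prev = sigma * rng.gauss(0.0, 1.0)
        eps.append(prev)
    return eps

def sample_acf(x, k):
    """Lag-k sample autocorrelation, using the divisor n for both autocovariances."""
    n = len(x)
    mean = sum(x) / n
    g0 = sum((v - mean) ** 2 for v in x) / n
    gk = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n
    return gk / g0

if __name__ == "__main__":
    eps = simulate_arch1(20000)
    print("lag-1 ACF of eps:  ", sample_acf(eps, 1))
    print("lag-1 ACF of eps^2:", sample_acf([e * e for e in eps], 1))
```

In such a run the innovations themselves should show essentially no autocorrelation, while their squares should be clearly autocorrelated: the sequence is a martingale difference (hence white noise) but not IID.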

Figure 1.11 Relationship among different processes: Stationary processes are the largest set, followed by white noise, martingale difference (MD), and i.i.d. processes. There are many useful processes between stationary processes and white noise processes, to be detailed in Chapters 2 and 3.

The white noise property is widely observed in financial return data. It is consistent with the stylized features presented in Figures 1.7 and 1.8. It is implied by the EMH, as the existence of a non-zero correlation between $\varepsilon_{t+1}$ and its lagged values leads to an improvement on the prediction for $r_{t+1}$ over the rational expectation $\mu$. This violates the hypothesis that $\varepsilon_{t+1}$ is unpredictable at time $t$. To illustrate this point, suppose $\mathrm{Corr}(\varepsilon_{t+1}, \varepsilon_s) = \rho \ne 0$ for some $s \le t$. Then $\tilde r_{t+1} = \mu + \rho(r_s - \mu)$ is a legitimate predictor for $r_{t+1}$ at time $t$. Under the EMH, the fair predictor for $r_{t+1}$ at time $t$ is $\hat r_{t+1} = \mu$. Then it is easy to see that

$$E\{(\hat r_{t+1} - r_{t+1})^2\} = \mathrm{var}(\varepsilon_{t+1}) = \sigma^2,$$

whereas, since $E(\varepsilon_s \varepsilon_{t+1}) = \rho\sigma^2$,

$$E\{(\tilde r_{t+1} - r_{t+1})^2\} = E\{(\rho\varepsilon_s - \varepsilon_{t+1})^2\} = \rho^2\sigma^2 - 2\rho^2\sigma^2 + \sigma^2 = (1 - \rho^2)\sigma^2 < \sigma^2,$$

i.e. the mean squared predictive error of $\tilde r_{t+1}$ is smaller. Hence the white noise assumption is appropriate and arguably necessary under the EMH. It states merely that the asset returns cannot be predicted by any linear rules. However it says nothing beyond the first two moments and remains silent on the question of whether the traded asset returns can be predicted by nonlinear rules or by other complicated strategies.

On the other hand, the empirical evidence reported in Section 1.2 indicates that the IID assumption is too strong and too restrictive to be true in general. For example, the squared and the absolute returns of both the S&P 500 index and the Apple stock exhibit significant serial correlations, indicating that $r_1, r_2, \cdots$, and therefore also $\varepsilon_1, \varepsilon_2, \cdots$, are not independent of each other; see Figures 1.7 & 1.8.

Note that $r_t = \log(P_t/P_{t-1})$. It follows from (1.11) that

$$\log P_t = \mu + \log P_{t-1} + \varepsilon_t. \qquad (1.14)$$

Hence under the assumption that the innovations $\varepsilon_t$ are IID, the log prices $\log P_t$, $t = 1, 2, \cdots$, form a random walk, and the prices $P_t$, $t = 0, 1, 2, \cdots$, are a geometric random walk. Since the future is independent of the present and the past, the EMH holds in the strictest sense and nothing in the future can be predicted based on the available information up to the present. If we further assume $\varepsilon_t$ to be normal, $P_t$ follows a log normal distribution. Then the price process $P_t$, $t = 0, 1, 2, \cdots$, is a log normal geometric random walk. As the length of the time unit shrinks to zero, the number of periods goes to infinity and the appropriately normalized random walk $\log P_t$ converges to a Brownian motion, and the geometric random walk $P_t$ converges to a geometric Brownian motion, under which the celebrated Black-Scholes formula is derived. The concept that stock market prices evolve according to a random walk can be traced back at least to the French mathematician Louis Bachelier in his PhD dissertation in 1900.

A weaker form of random walks relaxes $\varepsilon_t$ to be, for example, martingale differences. The martingale difference assumption offers a middle ground between white noise and IID. While retaining the white noise (i.e. the linear independence) property, it does not rule out the possibility of some nonlinear dependence, i.e. $\{r_t\}$ are uncorrelated but $\{r_t^2\}$ or $\{|r_t|\}$ may be dependent. Under this assumption model (1.11) may accommodate conditional heteroscedasticity as in (1.13). In fact, many volatility models including ARCH, GARCH and stochastic volatility models are special cases of (1.11) and (1.13) with $\varepsilon_t$ being martingale differences. The martingale difference assumption retains the hypothesis that the innovation $\varepsilon_{t+1}$ is unpredictable at time $t$, at least as far as the point prediction is concerned.
(Later we will learn that the interval predictions for $\varepsilon_{t+1}$, or more precisely the risks of $\varepsilon_{t+1}$, may be better predicted by incorporating the information from its lagged values.) The best point predictor for $r_{t+1}$ based on $r_t, r_{t-1}, \cdots$ is the conditional expectation

$$\hat r_{t+1} = E(r_{t+1} | r_t, r_{t-1}, \cdots) = \mu + E(\varepsilon_{t+1} | \varepsilon_t, \varepsilon_{t-1}, \cdots) = \mu,$$

which is the fair expectation of $r_{t+1}$ under the EMH. The last equality in the above expression is guaranteed by (1.12). We call $\hat r_{t+1}$ the best in the sense that it minimizes the mean squared predictive error among all the point predictors based on $r_t, r_{t-1}, \cdots$. See Section 2.9.1 for additional details.

In summary, the martingale hypothesis, which postulates model (1.11) with a martingale difference sequence $\{\varepsilon_t\}$, assures that the returns of assets cannot be predicted by any rules, but allows volatility to be predictable. It is the most appropriate mathematical form of the efficient markets hypothesis.

1.4 Tests related to efficient markets hypothesis

One fundamental question in financial econometrics is whether the efficient markets hypothesis is consistent with empirical data. One way to verify this hypothesis is to test if returns are predictable. In the sequel we will introduce two statistical tests which address this issue from different angles.

1.4.1 Tests for white noise

From the discussion in the previous section, we have learned that if returns are unpredictable, they should be at least white noise. On the other hand, the assumption that returns are IID is obviously too strong. The autocorrelations in squared and absolute returns shown in Figures 1.7 & 1.8 clearly indicate that returns at different times are not independent of each other. In spite of a large body of statistical tests for IID (e.g. see the rank-based test of Hallin and Puri (1988), and also Section 2.2 of Campbell et al. (1997)), we focus on testing the white noise hypothesis, i.e. returns are linearly independent but may depend on each other in some nonlinear manner.

The test for white noise is one of the oldest and the most important tests in statistics, as many testing problems in linear modelling may be transformed into a white noise test. There exist quite a few testing methods; see Section 7.4 of Fan and Yao (2003) and the references within. We introduce below a simple and frequently used omnibus test, i.e. the Ljung-Box portmanteau test.

The linear dependence between $r_t$ and $r_{t-k}$ is comprehensively depicted by the correlation function between $r_t$ and $r_{t-k}$:

$$\rho_k \equiv \mathrm{Corr}(r_t, r_{t-k}) = \frac{\mathrm{cov}(r_t, r_{t-k})}{\sqrt{\mathrm{var}(r_t)\,\mathrm{var}(r_{t-k})}}.$$

In fact if $\rho_k = 0$, $r_t$ and $r_{t-k}$ are linearly independent, and $\rho_k = \pm 1$ if and only if $r_t = a + b r_{t-k}$ for some constants $a$ and $b$. When $\{r_t\}$ is a white noise sequence, $\rho_k = 0$ for all $k \ne 0$. In practice, we do not know $\rho_k$. Based on observed returns $r_1, \cdots, r_T$, we use the estimator $\hat\rho_k = \hat\gamma_k/\hat\gamma_0$ instead, where $\hat\gamma_k$ is defined in (1.8). The Ljung-Box $Q_m$-statistic is defined as

$$Q_m = T(T + 2) \sum_{j=1}^{m} \frac{1}{T - j}\, \hat\rho_j^2, \qquad (1.15)$$

where $m \ge 1$ is a prescribed integer. Note that $Q_m$ is essentially a weighted sum of the squared sample ACF over the first $m$ lags, though the weights are approximately the same when $T \gg m$. Intuitively we reject the white noise hypothesis for large values of $Q_m$. How large is large depends on the theoretical distribution of $Q_m$ under the null hypothesis, which turns out to be problematic; see below. In practice a chi-square approximation is used: for $\alpha \in (0, 1)$, let $\chi^2_{\alpha,m}$ denote the upper $\alpha$-th percentile of the $\chi^2$-distribution with $m$ degrees of freedom.

The Ljung-Box portmanteau test: Reject the hypothesis that $\{r_t\}$ is a white noise at the significance level $\alpha$ if $Q_m > \chi^2_{\alpha,m}$ or, equivalently, if its P-value, computed as $P(Q > Q_m)$ with $Q \sim \chi^2_m$, is smaller than $\alpha$.

In practice one needs to choose $m$ in (1.15). Conventional wisdom suggests using a small $m$, as serial correlation is often at its strongest at small lags. This also avoids the large estimation errors in the sample ACF at large lags and the error accumulation issue in the summation (1.15). However using too small an $m$ may miss the autocorrelations beyond lag $m$. It is not uncommon in practice that the Ljung-Box test is performed simultaneously with different values of $m$. The R-function to perform the Ljung-Box test is Box.test(x, lag=m, type= "Ljung"), where x is a data vector. The command Box.test(x, lag=m, type= "Box") performs the Box-Pierce test with the statistic $Q^*_m$ (defined in Section 1.4.2).

We apply the Ljung-Box test to the monthly log returns of the S&P 500 index displayed in the bottom panel of Figure 1.3. The sample size is $T = 313$. The testing results are shown in Table 1.1. We cannot reject the white noise hypothesis for the log return data, as the tests with $m$ = 1, 6, 12 and 24 are all not significant. In contrast, the tests for the absolute returns are statistically significant with P-values not greater than 0.3%, indicating that the absolute returns are not white noise. The tests for the squared returns are less clear-cut, with the smallest P-value 0.019 for $m = 1$ and the largest P-value 0.492 for $m = 24$. Indeed, the monthly data exhibit much weaker correlation than the daily data. See also the three panels on the right in Figure 1.7.

Table 1.1 P-values based on the Ljung-Box test for the S&P 500 data

| m | returns: Qm | returns: P-value | squared returns: Qm | squared returns: P-value | absolute returns: Qm | absolute returns: P-value |
|---|---|---|---|---|---|---|
| 1 | 2.101 | 0.147 | 5.517 | 0.019 | 8.687 | 0.003 |
| 6 | 5.149 | 0.525 | 12.292 | 0.056 | 39.283 | 0.000 |
| 12 | 8.958 | 0.707 | 16.964 | 0.151 | 49.721 | 0.000 |
| 24 | 14.080 | 0.945 | 23.474 | 0.492 | 76.446 | 0.000 |
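The Ljung-Box computation is short enough to spell out. The following Python sketch is an illustration, not the implementation behind R's Box.test: it evaluates (1.15) and the chi-square P-value using only the standard library. It assumes that the sample autocovariance in (1.8) uses the divisor $T$, and it computes the chi-square tail probability through the recurrence $Q(a + 1, z) = Q(a, z) + z^a e^{-z}/\Gamma(a + 1)$ for the regularized upper incomplete gamma function; the function names are hypothetical.

```python
import math

def sample_acf(x, k):
    """Lag-k sample autocorrelation (autocovariances use the divisor T)."""
    n = len(x)
    mean = sum(x) / n
    g0 = sum((v - mean) ** 2 for v in x) / n
    gk = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n
    return gk / g0

def chi2_sf(x, df):
    """P(Q > x) for Q ~ chi-square with integer df >= 1 degrees of freedom."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2:                      # odd df: start from df = 1
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:                           # even df: start from df = 2
        a, q = 1.0, math.exp(-z)
    while a < df / 2.0 - 1e-9:      # step df up by 2 until a = df / 2
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q

def ljung_box(x, m):
    """Return (Q_m, P-value) of the Ljung-Box test (1.15) with m lags."""
    T = len(x)
    q = T * (T + 2) * sum(sample_acf(x, j) ** 2 / (T - j) for j in range(1, m + 1))
    return q, chi2_sf(q, m)

if __name__ == "__main__":
    # P-values for the Q_m values reported for the returns in Table 1.1
    for m, q in [(1, 2.101), (6, 5.149), (12, 8.958), (24, 14.080)]:
        print(m, q, round(chi2_sf(q, m), 3))
```

Feeding the $Q_m$ values of Table 1.1 into `chi2_sf` gives a quick consistency check on the reported P-values; applying `ljung_box` to squared or absolute returns corresponds to the other columns of the table.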

A different but related approach is to consider the normalized $Q^*_m$-statistic:

$$\frac{1}{\sqrt{2m}} \left\{ T \sum_{j=1}^{m} \hat\rho(j)^2 - m \right\}.$$

The asymptotic normality of this statistic under the condition that $m \to \infty$ and $m/T \to 0$ has been established for IID data by Hong (1996), for martingale differences by Hong and Lee (2003) (see also Durlauf (1991) and Deo (2000)), and for other non-IID white noise processes by Shao (2011), and Xiao and Wu (2011). However, those convergences are typically slow or very slow, resulting in size distortion of the tests based on the asymptotic normality. In addition, as pointed out above, when $j$ is large, $\rho_j$ tends to be small. Therefore, including those terms $\hat\rho_j^2$ adds noise to the test statistic without increasing the signal. How to choose a relevant $m$ adds a further complication in using this approach. Horowitz et al. (2006) proposed a double blockwise bootstrap method to perform the tests with the statistic $Q^*_m$ for non-IID white noise.

1.4.2 Remarks on the Ljung-Box test∗

The chi-square approximation for the null distribution of the Ljung-Box test statistic is based on the fact that when $\{r_t\}$ is an IID sequence, $\hat\rho_1, \cdots, \hat\rho_m$ are asymptotically independent, and each of them has an asymptotic distribution $N(0, 1/T)$ (Theorem 2.8(iii) of Fan and Yao (2003)). Hence

$$Q^*_m \equiv T \sum_{j=1}^{m} \hat\rho_j^2 \sim \chi^2_m \quad \text{approximately for large } T.$$

Now, it is easy to see that $Q_m$ is approximately the same as $Q^*_m$ when $T$ is large, since $(T + 2)/(T - j) \approx 1$. Hence $Q_m$ also approximately follows the $\chi^2_m$-distribution under the null hypothesis. In fact $Q^*_m$ is the test statistic proposed by Box and Pierce (1970). However Ljung and Box (1978) subsequently discovered that the $\chi^2$-approximation to the distribution of $Q^*_m$ is not always adequate even for $T$ as large as 100. They suggested using the statistic $Q_m$ instead, as its distribution is closer to $\chi^2_m$. See also Davies, Triggs and Newbold (1977).

A more fundamental problem in applying the Ljung-Box test is that the statistic itself is designed to detect departures from white noise, but the asymptotic $\chi^2$-distribution can only be justified under the IID assumption. Therefore, as formulated above, it should not be used to test the hypothesis that the returns are white noise but not IID, as then the asymptotic null distributions of $\hat\rho(k)$ depend on the high moments of the underlying distribution of $r_t$. These asymptotic null distributions may typically be too complicated to be directly useful, in the sense that the asymptotic null distributions of $Q_m$ or $Q^*_m$ may then not be of known forms for fixed $m$; see, e.g. Romano and Thombs (1996). Unfortunately this problem also applies to most (if not all) other omnibus white noise tests.

One alternative is to impose an explicit assumption on the structure of the white noise process (such as a GARCH structure); then some resampling methods may be employed to simulate the null distribution of $Q_m$. Furthermore, if one is also willing to impose some assumptions on the parametric form of a possible departure from white noise, a likelihood ratio test can be employed, which is often more powerful than an omnibus nonparametric test, as the latter tries to detect departures (from white noise) in all different directions. The analogy is that an all-purpose tool is typically less powerful on a particular task than a customized tool. An example of the customized tool is the Dickey-Fuller test in the next section. However, finding relevant assumptions is itself a challenge. This is why omnibus tests such as the Ljung-Box test are often used in practice, in spite of their potential problems in misspecifying significance levels.

1.4.3 Tests for random walks

Another way to test the EMH is to look at the random walk model (1.14) for log prices $X_t \equiv \log P_t$. In general we may impose an autoregressive model for the log prices:

$$X_t = \mu + \alpha X_{t-1} + \varepsilon_t. \qquad (1.16)$$

To test the validity of model (1.14) is equivalent to testing the hypothesis $H_0: \alpha = 1$ in the above model. This is a special case of the unit-root test which we will revisit later. We introduce here the Dickey-Fuller test, which in fact deals with three different cases: (i) the model (1.16) with a drift $\mu$, (ii) the model without drift

$$X_t = \alpha X_{t-1} + \varepsilon_t, \qquad (1.17)$$

and (iii) the model with both drift and a linear trend

$$X_t = \mu + \beta t + \alpha X_{t-1} + \varepsilon_t. \qquad (1.18)$$


Based on observations $X_1, \cdots, X_T$, let $\hat\alpha$ be the least squares estimator for $\alpha$, and $\mathrm{SE}(\hat\alpha)$ be the standard error of $\hat\alpha$. These can easily be obtained from any least-squares package. Then the Dickey-Fuller statistic is defined as

$$W = (\hat\alpha - 1)/\mathrm{SE}(\hat\alpha). \qquad (1.19)$$

We reject $H_0: \alpha = 1$ if $W$ is smaller than a critical value determined by the significance level of the test and the distribution of $W$ under $H_0$. The intuition behind this one-sided test may be understood as follows. This random walk test is only relevant when the evidence for $\alpha < 1$ is overwhelming. Then we reject $H_0: \alpha = 1$ only if the statistical evidence is in favor of $H_1: \alpha < 1$. The hypothesis $H_1$ implies that $X_t$ is a stationary and causal process (see Section 2.2.2 below) for models (1.16) and (1.17), and, furthermore, that the changes $\{X_t - X_{t-1}\}$ form an autocorrelated process. In the context of model (1.14), this implies that the returns $r_t = \log P_t - \log P_{t-1}$ are autocorrelated and, therefore, are not white noise. When $\alpha > 1$, the process $X_t$ is explosive, which implies $r_t = \mu + \gamma \log P_{t-1} + \varepsilon_t$ for some positive constant $\gamma$. The latter equation has little bearing in modelling real financial prices except that it can be used as a tool for modeling financial bubbles; see Phillips and Yu (2011).

The least squares estimators $\hat\alpha$ may easily be evaluated explicitly. For example, under (1.16), the least-squares estimate is

$$\hat\alpha = \sum_{t=2}^{T} (X_t - \bar X_T)(X_{t-1} - \bar X_{T-1}) \Big/ \sum_{t=2}^{T} (X_{t-1} - \bar X_{T-1})^2,$$

where

$$\bar X_T = \frac{1}{T-1} \sum_{t=2}^{T} X_t, \qquad \bar X_{T-1} = \frac{1}{T-1} \sum_{t=2}^{T} X_{t-1}.$$

Furthermore let $\hat\mu$ be the least squares estimator for $\mu$ in (1.16). Then

$$\mathrm{SE}(\hat\alpha)^2 = \frac{1}{T-3} \sum_{t=2}^{T} (X_t - \hat\mu - \hat\alpha X_{t-1})^2 \Big/ \sum_{t=2}^{T} (X_{t-1} - \bar X_{T-1})^2.$$

Under model (1.17),

$$\hat\alpha = \frac{\sum_{t=2}^{T} X_t X_{t-1}}{\sum_{t=2}^{T} X_{t-1}^2}, \qquad \mathrm{SE}(\hat\alpha)^2 = \frac{\sum_{t=2}^{T} (X_t - \hat\alpha X_{t-1})^2}{(T-2) \sum_{t=2}^{T} X_{t-1}^2}.$$

There also exists the Dickey-Fuller coefficient test, which is based on the test statistic $T(\hat\alpha - 1)$. The asymptotic null distributions are complicated, but can be tabulated.


At the 5% significance level, the critical values of the coefficient test are −8.347 and −13.96 respectively for testing model (1.17) (without drift) and model (1.16) (with drift).

Although the Dickey-Fuller statistic is of the form of a t-statistic (see (1.19)), t-distributions cannot be used for this test as all the three models under $H_0$ are nonstationary (see Section 2.1 below). In fact the Dickey-Fuller test statistic admits certain non-standard asymptotic null distributions, and those distributions under models (1.16) – (1.18) are different from each other. Fortunately the quantiles or critical values of those distributions have been tabulated in many places; see, e.g. Fuller (1996). Table 1.2 lists the most frequently used critical values, evaluated by simulation with the sample size $T = 100$. Larger sample sizes will result in critical values that are slightly smaller in absolute value and smaller sample sizes will result in somewhat larger critical values.

Table 1.2 The critical values of the (augmented) Dickey-Fuller test, by significance level

| Model | 10% | 5% | 1% |
|---|---|---|---|
| (1.17) or (2.66): no drift, no trend | −1.61 | −1.95 | −2.60 |
| (1.16) or (2.67): drift, no trend | −2.25 | −2.89 | −3.51 |
| (1.18) or (2.68): drift & trend | −3.15 | −3.45 | −4.04 |

The R-code “aDF.test.r” defines a function aDF.test which implements the (augmented) Dickey-Fuller test: aDF.test(x, kind=i, k=0), where x is a data vector, and i should be set at 2 for model (1.16), 1 for model (1.17), and 3 for model (1.18).

We now apply the Dickey-Fuller test to the log daily, weekly, and monthly prices displayed in Figure 1.3. Since the returns (i.e. the differenced log prices) fluctuate around 0 and show no linear trend, we tend to carry out the test based on either model (1.16) or model (1.17). But for illustration purposes, we also report the tests based on model (1.18). The P-values of the test with the three models for the daily, weekly, and monthly prices are listed below.

| Model used | daily | weekly | monthly |
|---|---|---|---|
| (1.17) | > 0.9 | > 0.9 | > 0.9 |
| (1.16) | 0.392 | 0.336 | 0.413 |
| (1.18) | 0.646 | 0.698 | 0.791 |

Since none of those tests are statistically significant, we cannot reject the hypothesis that the log prices for the S&P 500 follow a random walk. This applies to daily, weekly, and also monthly data. We also repeat the above exercise for the daily, weekly and monthly returns (i.e. the differenced log prices), obtaining P-values smaller than 0.01 for all the cases. This shows that the returns are not random walks across different frequencies.
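For readers who want to see the arithmetic behind the statistic, the following Python sketch computes $\hat\alpha$, $\mathrm{SE}(\hat\alpha)$ and $W$ in (1.19) for the model with drift (1.16), using the explicit least squares formulas of Section 1.4.3. It is an illustration only, not the book's aDF.test function; the helper names and the simulated AR(1) series are assumptions, and the resulting $W$ would be compared with the critical values in Table 1.2 rather than with t-distribution quantiles.

```python
import math
import random

def dickey_fuller_drift(x):
    """Return (alpha_hat, SE, W) for model (1.16): X_t = mu + alpha X_{t-1} + eps_t."""
    y, z = x[1:], x[:-1]                 # X_t and X_{t-1}, t = 2, ..., T
    n = len(y)                           # n = T - 1 usable observations
    ybar, zbar = sum(y) / n, sum(z) / n
    szz = sum((v - zbar) ** 2 for v in z)
    szy = sum((u - ybar) * (v - zbar) for u, v in zip(y, z))
    alpha = szy / szz
    mu = ybar - alpha * zbar
    rss = sum((u - mu - alpha * v) ** 2 for u, v in zip(y, z))
    se = math.sqrt(rss / (n - 2) / szz)  # n - 2 = T - 3 degrees of freedom
    return alpha, se, (alpha - 1.0) / se

def simulate_ar1(n, alpha, mu=0.0, sigma=1.0, seed=0):
    """Simulate X_t = mu + alpha X_{t-1} + eps_t with Gaussian eps_t and X_1 = 0."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(mu + alpha * x[-1] + rng.gauss(0.0, sigma))
    return x

if __name__ == "__main__":
    for a in (1.0, 0.5):                 # random walk versus stationary AR(1)
        alpha, se, w = dickey_fuller_drift(simulate_ar1(1000, a))
        print("true alpha = %.1f: alpha_hat = %.4f, W = %.2f" % (a, alpha, w))
```

For a clearly stationary series such as the AR(1) with $\alpha = 0.5$, $W$ should fall far below the 5% critical value −2.89 for the model with drift, whereas for a simulated random walk it will usually stay above it.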

The Dickey-Fuller test was originally proposed in Dickey and Fuller (1979). It has been further adapted to handle situations where there are some autoregressive terms in models (1.16) – (1.18); see Section 2.8.2 below.

1.4.4 Ljung-Box test and Dickey-Fuller test

Both the Ljung-Box and Dickey-Fuller tests can be used to validate some aspects of the efficient markets hypothesis. First of all, the input of the Ljung-Box test is the returns of assets, whereas the Dickey-Fuller test utilizes the log prices. Secondly, the alternative hypothesis of the Ljung-Box test is nonparametric and merely requires stationary correlated processes, whereas the Dickey-Fuller test is designed to test against a parametric alternative hypothesis, namely a stationary AR(1) process. Thus, the Ljung-Box test is more omnibus whereas the Dickey-Fuller test is more specific. Putting both in terms of asset returns, the null hypotheses of both problems are the same: returns behave like uncorrelated white noise. However, the alternative of the Ljung-Box test is larger, as shown in Figure 1.12.

Figure 1.12 In terms of returns, the null hypotheses of both Ljung-Box and Dickey-Fuller tests are the same. However, the alternative of Ljung-Box is larger.

1.5 Appendix: Q-Q plot and Jarque-Bera test

1.5.1 Q-Q plot

A Q-Q plot is a graphical method for comparing two probability distribution functions based on their quantiles. It is particularly effective in revealing the differences in the tail-heaviness of the two distributions. For any probability distribution function $F$ and $\alpha \in (0, 1)$, the $\alpha$-th quantile of $F$ is defined as

$$F^{-1}(\alpha) = \max\{x : F(x) \le \alpha\}. \qquad (1.20)$$

For any two probability distribution functions $F$ and $G$, the quantile-quantile plot, or simply the Q-Q plot, of $F$ and $G$ is a curve on a two-dimensional plane obtained by plotting $F^{-1}(\alpha)$ against $G^{-1}(\alpha)$ for $0 < \alpha < 1$.


It can be shown that if one distribution is a location-scale transformation of the other, i.e.

$$F(x) = G\left(\frac{x - \mu}{\sigma}\right)$$

for some constants $\mu$ and $\sigma > 0$, their Q-Q plot is a straight line. (In fact the converse is also true.) Hence it is useful to draw a straight line passing through the first and third quartiles to highlight the differences in the tails of the two distributions.

We illustrate the usefulness of a Q-Q plot by an example using the daily S&P 500 returns. The lower-left panel in Figure 1.5 is the Q-Q plot of $F$ and $G$, where $F$ is the standard normal distribution and $G$ is the empirical distribution of the daily returns of the S&P 500 index. It gives basically the scatter plot of the pairs

$$\left( F^{-1}\left(\frac{i - 0.5}{n}\right),\; x_{(i)} \right), \qquad i = 1, \cdots, n,$$

where $x_{(i)}$ is the $i$th smallest value of the data $\{x_i\}_{i=1}^{n}$, representing the empirical $i/n$-quantile, and $F^{-1}((i - 0.5)/n)$ is its corresponding theoretical quantile modulo a location-scale transform. We do not use $F^{-1}(i/n)$ as its theoretical quantile, to avoid $F^{-1}(i/n) = \infty$ for $i = n$. Different software has slightly different modifications from what is presented above, but the key idea of comparing the empirical quantiles with those of the reference distribution remains the same. The blue straight line marks the position where the points would lie if the two distributions were identical up to a location-scale transformation. The points on the left in the graph are the lower quantiles, corresponding to $\alpha$ close to 0 (see (1.20) above). Since those points are below the blue line, the lower quantiles of $G$ (empirical quantiles) are smaller (i.e. negatively larger) than their expected values if $G$ were a location-scale transformation of $F$. This means that the left tail of $G$ is heavier than that of $F$, namely the daily returns of the S&P 500 index have a heavier left tail than the normal distribution. Similarly, it can be concluded that the daily returns of the S&P 500 index have a heavier right tail than the normal distribution.
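The construction of the plotted pairs can be sketched in a few lines. The Python fragment below computes the points $(F^{-1}((i - 0.5)/n), x_{(i)})$ for a normal Q-Q plot, with $F^{-1}$ taken from `statistics.NormalDist` in the standard library (Python 3.8 or later). It only builds the coordinates; the function name is hypothetical, and plotting as well as the reference line through the quartiles are left out.

```python
from statistics import NormalDist

def normal_qq_pairs(data):
    """Return the list of (theoretical normal quantile, ordered observation) pairs
    (F^{-1}((i - 0.5) / n), x_(i)), i = 1, ..., n, with F the standard normal."""
    xs = sorted(data)                      # x_(1) <= ... <= x_(n)
    n = len(xs)
    std_normal = NormalDist()              # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i - 0.5) / n), xs[i - 1]) for i in range(1, n + 1)]

if __name__ == "__main__":
    for q, x in normal_qq_pairs([0.3, -1.2, 0.8, -0.1, 2.5]):
        print("%8.4f  %6.2f" % (q, x))
```

Points falling below the reference line on the left and above it on the right would indicate tails heavier than normal, as described for the S&P 500 returns.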
Note, however, that the daily returns of the S&P 500 index have lighter tails than the t-distribution with 3 degrees of freedom, as shown in Figure 1.9. Q-Q plots may be produced using the R-functions qqplot, qqline or qqnorm.

1.5.2 Jarque-Bera test

Q-Q plots check normality assumptions by informal graphical inspection. They are particularly powerful in revealing the tail behavior of data. Formal tests can also be constructed. For example, one can employ the Kolmogorov-Smirnov test for testing normality. A popular test for normality is the Jarque-Bera test, which is defined as

$$\mathrm{JB} = \frac{n}{6} \left[ S^2 + \frac{(K - 3)^2}{4} \right],$$

where for a given sequence of data $\{x_i\}_{i=1}^{n}$,

$$S = \frac{\hat\mu_3}{\hat\sigma^3} = \frac{\sum_{i=1}^{n} (x_i - \bar x)^3 / n}{\left\{ \sum_{i=1}^{n} (x_i - \bar x)^2 / n \right\}^{3/2}}$$

is the sample skewness and

$$K = \frac{\hat\mu_4}{\hat\sigma^4} = \frac{\sum_{i=1}^{n} (x_i - \bar x)^4 / n}{\left\{ \sum_{i=1}^{n} (x_i - \bar x)^2 / n \right\}^{2}}$$

is the sample kurtosis. Therefore, the JB-statistic really checks only whether the skewness and kurtosis match those of normal distributions. Under the null hypothesis that the data are drawn independently from a normal distribution, the JB-statistic approximately follows a $\chi^2_2$-distribution. Therefore, the P-value can easily be computed by using the $\chi^2_2$-distribution.
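The Jarque-Bera statistic is simple to compute directly from its definition. The following Python sketch evaluates $S$, $K$ and JB as defined above and obtains the P-value from the fact that the survival function of the $\chi^2_2$-distribution is $e^{-x/2}$. The function name is illustrative, and the sketch makes no small-sample correction.

```python
import math

def jarque_bera(x):
    """Return (S, K, JB, P-value) for the data x, using moments with divisor n."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5                  # sample skewness S
    kurt = m4 / m2 ** 2                    # sample kurtosis K
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)          # P(Q > JB) for Q ~ chi-square(2)
    return skew, kurt, jb, p_value

if __name__ == "__main__":
    print(jarque_bera([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Because the chi-square approximation is asymptotic, a P-value from a sample as small as the one in this toy call should not be taken seriously; the example only illustrates the arithmetic.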

1.6 Further reading and software implementation

The books that have had a strong impact on our writing are Fan and Yao (2003) and Campbell et al. (1997). The former emphasizes advanced theory and methods for nonlinear time series and has influenced our writing on the time series aspect. The latter emphasizes the economic interpretation of econometric results on financial markets and has shaped our writing on the cross-sectional aspect of the book. Since we prepared the first draft of the lecture notes on Financial Econometrics taught at Princeton University in 2004, a number of books on the subject have been published. For an introduction to financial statistics, see Ruppert (2004, 2010), Carmona (2004, 2013), and Franke et al. (2015). Tsay (2010, 2013) provides an excellent and comprehensive account of the analysis of financial time series. Gourieroux and Jasiak (2001) provide an excellent introduction to financial econometrics for those who are already familiar with econometric theory. For financial econometrics with an emphasis on investments, see Rachev et al. (2013). Most of the computation in this book was carried out using the software package R, which is free and publicly available. See Section 2.10 for an introduction and installation instructions. The books by Ruppert (2010), Carmona (2013) and Tsay (2013) also use R.

It is our hope that readers will be stimulated to use the methods described in this book for their own applications and research. Our aim is to provide information in sufficient detail so that readers can produce their own implementations. This will be a valuable exercise for students and readers who are new to the area. To assist this endeavor, we have placed all of the data sets and codes used in this book on the following web site: http://orfe.princeton.edu/~jqfan/fan/FinEcon.html

1.7 Exercises

1.1 Download the daily, weekly and monthly prices for the Nasdaq index and the IBM stock from Yahoo!Finance. Reproduce Figures 1.3 – 1.8 using the Nasdaq index and the IBM stock data instead.

1.2 Consider a path-dependent payoff function $Y_t = a_1 r_{t+1} + \cdots + a_k r_{t+k}$, where $\{a_i\}_{i=1}^k$ are given weights. Suppose that the return time series is weakly stationary in the sense that $\mathrm{cov}(r_t, r_{t+j}) = \gamma(j)$. Show that

$$\mathrm{var}(Y_t) = \sum_{i=1}^k \sum_{j=1}^k a_i a_j \gamma_{i-j},$$

where $\gamma_{i-j} = \gamma(i-j)$. A natural estimate of this variance is the following substitution estimator:

$$\widehat{\mathrm{var}}(Y_t) = \sum_{i=1}^k \sum_{j=1}^k a_i a_j \hat\gamma_{i-j},$$

where $\hat\gamma_{i-j}$ is defined by (1.8). Show that $\widehat{\mathrm{var}}(Y_t) \ge 0$.

1.3 Consider the following quote from Eugene Fama, who was Myron Scholes' thesis adviser:

If the population of price changes is strictly normal, on the average for any stock · · · an observation more than five standard deviations from the mean should be observed about once every 7000 years. In fact such observations seem to occur about once every three or four years. (See Lowenstein, 2001, page 71.)

For $X \sim N(\mu, \sigma^2)$, $P(|X - \mu| > 5\sigma) = 5.733 \times 10^{-7}$. Deduce how many observations per year Fama was implicitly assuming to be made. If a year is defined as 252 trading days and daily returns are normal, how many years is it expected to take to get a 5 standard deviation event? How does the answer to the last question change when the daily returns follow the t-distribution with 4 degrees of freedom?

1.4 Is the (marginal) distribution of log-returns over a long time horizon (e.g. monthly or quarterly) close to normal? Explain briefly.

1.5 Generate a random sample of size 1000 from the t-distribution with ν degrees of freedom and another random sample of size 1000 from the standard normal distribution. Apply the Kolmogorov-Smirnov test to check if they come from the same distribution. Report the results for ν = 5, 10, 15 and 20.

1.6 Report the P-values for applying the Jarque-Bera test to the data given in Exercise 1.1. What can you conclude based on these P-values?

1.7 Generate a random sample of size 100 from the t-distribution with ν degrees of freedom for ν = 5, 15 and ∞ (i.e. normal distribution). Apply the Jarque-Bera test to check the normality and report the P-values.

1.8 According to the efficient market hypothesis, is the return of a portfolio predictable? Is the volatility of a portfolio predictable? State the most appropriate mathematical form of the efficient market hypothesis.

1.9 If the Ljung-Box test is employed to test the efficient market hypothesis, what null hypothesis is to be tested? If the autocorrelations for the first 4 lags of the monthly log-returns of the S&P 500 are $\hat\rho(1) = 0.2$, $\hat\rho(2) = -0.15$, $\hat\rho(3) = 0.25$, $\hat\rho(4) = 0.12$ based on the past 5 years of data, is the efficient market hypothesis reasonable?

1.10 Generate 400 time series from the independent Gaussian white noise $\{r_t\}_{t=1}^T$ with T = 100. Compute $Z = \sqrt{T}\hat\rho(1)$, $Q_m$ and $Q_m^*$ for m = 3, 6, and 12. Plot the histograms of $Z$, $Q_3$, $Q_3^*$ and $Q_6$ and compare them with their asymptotic distributions. Report the first, fifth and tenth percentiles of the statistics $|Z|$, $Q_3$, $Q_3^*$, $Q_6$, $Q_6^*$, $Q_{12}$ and $Q_{12}^*$ among the 400 simulations and compare them with their theoretical (asymptotic) percentiles.

1.11 Repeat the experiment in Exercise 1.10 when T = 400 and $r_t$ is generated from the t-distribution with 5 degrees of freedom.

1.12 What is the alternative hypothesis of the Dickey-Fuller test for the random walk? Suppose that, based on the last 120 quarterly observations of the US GDP, it was computed that $\hat\alpha = 0.95$. Does the US GDP follow a random walk with a drift? Answer the question at the 5% significance level (the critical value is −13.96) using the Dickey-Fuller coefficient test for the model with drift.

1.13 (Implication of martingale hypothesis). Let $S_t$ be the price of an asset at time t. One version of the EMH assumes that the prices of any asset form a martingale process in the sense that $E(S_{t+1} \mid S_t, S_{t-1}, \cdots) = S_t$ for all t. To understand the implication of this assumption, we consider the following simple investment strategy. With initial capital $C_0$ dollars, at time t we hold $\alpha_t$ dollars in cash and $\beta_t$ shares of an asset at the price $S_t$. Hence the value of our investment at time t is $C_t = \alpha_t + \beta_t S_t$. Suppose that our investment is self-financing in the sense that $C_{t+1} = \alpha_t + \beta_t S_{t+1} = \alpha_{t+1} + \beta_{t+1} S_{t+1}$, and our investment strategy $(\alpha_{t+1}, \beta_{t+1})$ is entirely determined by the asset prices up to time t. Show that if $\{S_t\}$ is a martingale process, there exist no strategies such that $C_{t+1} > C_t$ with probability 1.

Chapter 2 Linear Time Series Models

Data obtained from observations collected sequentially over time are common in this information age. For example, we have collections of daily stock prices, weekly interest rates, monthly sales figures, quarterly consumer price indices (CPI) and annual gross domestic product (GDP) figures. Such data collected over time are called time series. The purpose of analyzing time series data is in general two-fold: to understand the stochastic mechanism that generates the data, and to predict or forecast the future values of a time series. This chapter introduces a class of linear time series models, or more precisely, a class of models which depict the linear features (including linear dependence) of time series. Those linear models and associated inference techniques provide the basic framework for the study of the linear dynamic structure of financial time series and for forecasting future values based on linear dependence structures.

2.1 Stationarity

One of the important aspects of time series analysis is to use the data collected in the past to forecast the future. How can historical data be useful for forecasting a future event? This is through the assumption of stationarity, which refers to some time-invariance properties of the underlying process. For example, we may assume that the correlation between the returns of tomorrow and today is the same as that between any two successive days in the past. This enables us to aggregate the information from the data in the past to learn about the correlation. This correlation invariance over time is a typical characteristic of the so-called weak stationarity or covariance stationarity. It facilitates linear prediction, which is essentially based on the correlation between a predicted variable and its predictor (such as in linear regression). A stronger time-invariance assumption is that the joint distribution of the returns in a week in the future is the same as that in any week in the past. In other words, prediction is always based on some invariance properties over time, although the invariance may refer to some characteristics of the probability distribution of the process, or to the law governing the change of the distribution. We introduce the concept of stationarity more formally below.

A sequence of random variables $\{X_t, t = 0, \pm 1, \pm 2, \cdots\}$ is called a stochastic process and serves as a model for a set of observed time series data. It is convenient to refer to $\{X_t\}$ itself as a time series. It is known that the complete probability structure of $\{X_t\}$ is determined by the set of all the finite-dimensional distributions of $\{X_t\}$. Fortunately most linear features concerned depend on the first two moments of $\{X_t\}$, which are the main objects depicted in linear time series models. Of course if $\{X_t\}$ is a Gaussian process in the sense that all its finite-dimensional distributions are normal, the first two moments then determine the probability structure of $\{X_t\}$ completely and $\{X_t\}$ is a linear process.

A time series $\{X_t\}$ is said to be weakly stationary (or second order stationary or covariance stationary) if $E(X_t^2) < \infty$ and both $EX_t$ and $\mathrm{cov}(X_t, X_{t+k})$, for any integer k, do not depend on t. For a weakly stationary time series $\{X_t\}$, let $\mu = EX_t$ denote its common mean. We define the autocovariance function (ACVF) as

$$\gamma(k) = \mathrm{cov}(X_t, X_{t+k}) = E\{(X_t - \mu)(X_{t+k} - \mu)\}, \tag{2.1}$$

and the autocorrelation function (ACF) as

$$\rho(k) = \mathrm{Corr}(X_t, X_{t+k}) = \gamma(k)/\gamma(0) \tag{2.2}$$

for $k = 0, \pm 1, \pm 2, \cdots$. Note that $\gamma(0) = \mathrm{var}(X_t)$ is independent of t. For simplicity, we drop the adverb "weakly" and call $\{X_t\}$ stationary if it is weakly stationary, i.e. $\{X_t\}$ has finite and time-invariant first two moments. It is easy to see that $\rho(0) = 1$ and $\rho(k) = \rho(-k)$ for any stationary process, and that the variance-covariance matrix of the vector $(X_{t+1}, \cdots, X_{t+k})$ is

$$\mathrm{var}(X_{t+1}, \cdots, X_{t+k}) = \begin{pmatrix}
\gamma(0) & \gamma(1) & \gamma(2) & \cdots & \gamma(k-1) \\
\gamma(1) & \gamma(0) & \gamma(1) & \cdots & \gamma(k-2) \\
\vdots & \vdots & \vdots & & \vdots \\
\gamma(k-2) & \gamma(k-3) & \gamma(k-4) & \cdots & \gamma(1) \\
\gamma(k-1) & \gamma(k-2) & \gamma(k-3) & \cdots & \gamma(0)
\end{pmatrix}.$$

Therefore, for any linear combination,

$$\mathrm{var}\Big(\sum_{i=1}^k a_i X_{t+i}\Big) = \sum_{i=1}^k \sum_{j=1}^k a_i a_j \,\mathrm{cov}(X_{t+i}, X_{t+j}) = \sum_{i=1}^k \sum_{j=1}^k a_i a_j \gamma(i-j) \ge 0. \tag{2.3}$$

As such, the function $\gamma(\cdot)$ is referred to as a positive semi-definite function.

A very special class of processes, which plays a role similar to that of zero in number theory, is white noise. When $\rho(k) = 0$ for any $k \ne 0$, $\{X_t\}$ is called a white noise, and is denoted by $X_t \sim \mathrm{WN}(\mu, \sigma^2)$, where $\sigma^2 = \gamma(0) = \mathrm{var}(X_t)$. In other words, a white noise is a sequence of uncorrelated random variables with the same mean and the same variance. White noise processes are building blocks for constructing general stationary processes.

In practice we use an observed sample $X_1, \cdots, X_T$ to estimate the ACVF and ACF by the sample ACVF and sample ACF. They are basically the sample covariance and sample correlation of the lagged pairs $\{(X_{t-k}, X_t)\}_{t=k+1}^T$. Formally, they are defined as follows:

$$\hat\gamma(k) = \frac{1}{T} \sum_{t=k+1}^{T} (X_t - \bar X)(X_{t-k} - \bar X), \qquad \hat\rho(k) = \hat\gamma(k)/\hat\gamma(0), \tag{2.4}$$

where $\bar X = T^{-1} \sum_{1 \le t \le T} X_t$. In the estimator $\hat\gamma(k)$, we use the divisor T instead of T − k. This is a common practice adopted by almost all statistical packages. It ensures that the function $\hat\gamma(\cdot)$ is positive semi-definite (Exercise 2.2), a property given by (2.3). See Fan and Yao (2003), p. 42, for further discussion on this choice.

Weak stationarity is indeed a very weak notion of stationarity. For example, if $\{X_t\}$ is weakly stationary, it does not follow that $\{X_t^2\}$ is weakly stationary. Yet, the latter time series has very strong connections with the volatility of financial returns. Therefore, we need a stronger version of stationarity as follows.

A time series $\{X_t, t = 0, \pm 1, \pm 2, \cdots\}$ is said to be strongly stationary or strictly stationary if the k-dimensional distribution of $(X_1, \cdots, X_k)$ is the same as that of $(X_{t+1}, \cdots, X_{t+k})$ for any $k \ge 1$ and t. This assumption is needed in the context of nonlinear prediction. Obviously strict stationarity implies (weak) stationarity provided $E(X_t^2) < \infty$. In addition, the strong stationarity of $\{X_t, t = 0, \pm 1, \pm 2, \cdots\}$ implies that the time series $\{g(X_t), t = 0, \pm 1, \pm 2, \cdots\}$ is also strongly stationary for any function g.

2.2 Stationary ARMA models

One of the most frequently used time series models is the stationary autoregressive moving average (ARMA) model. It is frequently used in modeling the dynamics of the returns of financial assets and other time series.

2.2.1 Moving average processes

Perhaps the simplest stationary time series are moving average (MA) processes. Their autocovariance functions are also easy to compute. A simple example is the k-period return (1.6). Let $\varepsilon_t \sim \mathrm{WN}(0, \sigma^2)$. For a fixed integer $q \ge 1$, we write $X_t \sim \mathrm{MA}(q)$ if $X_t$ is defined as a moving average of the successive innovations $\varepsilon_t, \varepsilon_{t-1}, \cdots, \varepsilon_{t-q}$ as follows:

$$X_t = \mu + \varepsilon_t + a_1 \varepsilon_{t-1} + \cdots + a_q \varepsilon_{t-q}, \tag{2.5}$$

where $\mu, a_1, \cdots, a_q$ are constant coefficients. In the above definition, $\varepsilon_t$ stands for the innovation at time t, and the innovations $\varepsilon_t, \varepsilon_{t-1}, \cdots$ are unobservable. The intuition behind the moving average equation (2.5) may be understood as follows: the innovation $\varepsilon_t$ stands for the shock to the market at time t and $X_t$ is the impact on the return from the innovations up to time t. The coefficient $a_k$ is regarded as a "discount" factor on the k-lagged innovation $\varepsilon_{t-k}$. For example, for $a_k = b^k$ and $|b| < 1$, the impact of $\varepsilon_t$ fades away exponentially over time. In fact $X_t$ defined by (2.5) is always stationary with $EX_t = \mu$, as the coefficients $a_1, \cdots, a_q$ do not vary over time. We first use a simple example to illustrate how to calculate the ACVF and ACF for MA processes.

Example 2.1 For the MA(1) model $X_t = \mu + \varepsilon_t + a\varepsilon_{t-1}$, it holds that

$$\gamma(0) = \mathrm{var}(X_t) = \mathrm{var}(\varepsilon_t) + \mathrm{var}(a\varepsilon_{t-1}) + 2\,\mathrm{cov}(\varepsilon_t, a\varepsilon_{t-1}) = (1 + a^2)\sigma^2.$$

Similarly,

$$\gamma(1) = \mathrm{cov}(X_t, X_{t-1}) = \mathrm{cov}(\varepsilon_t + a\varepsilon_{t-1},\ \varepsilon_{t-1} + a\varepsilon_{t-2}) = a\sigma^2,$$

since there is only one common term, $\varepsilon_{t-1}$, in both $X_t$ and $X_{t-1}$. Now for the ACVF of lag two, we have $X_t = \mu + \varepsilon_t + a\varepsilon_{t-1}$, which does not share any innovations with $X_{t-2} = \mu + \varepsilon_{t-2} + a\varepsilon_{t-3}$. Therefore $\gamma(2) = 0$. Similarly, $\gamma(k) = 0$ for any $k \ge 2$. Consequently,

$$\rho(1) = a/(1 + a^2), \qquad \rho(k) = 0 \ \text{ for any } |k| > 1. \tag{2.6}$$

Since $2|a| \le 1 + a^2$, $|\rho(1)| \le 0.5$ for any MA(1) process.
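The calculation in Example 2.1 can be checked numerically. The sketch below is in Python rather than the book's R, and the helper name `ma_acvf` is an assumption made for illustration. It uses the standard fact that for an MA(q) process with $a_0 = 1$, $\gamma(k) = \sigma^2 \sum_{j=0}^{q-k} a_j a_{j+k}$ for $0 \le k \le q$ and $\gamma(k) = 0$ for $k > q$, which reduces to the formulas above when q = 1.

```python
def ma_acvf(a, sigma2, k):
    """gamma(k) for X_t = mu + e_t + a[0] e_{t-1} + ... + a[q-1] e_{t-q}, e_t ~ WN(0, sigma2)."""
    coef = [1.0] + list(a)          # a_0 = 1 multiplies the current innovation
    k = abs(k)                      # gamma(k) = gamma(-k)
    if k >= len(coef):              # no shared innovations beyond lag q
        return 0.0
    return sigma2 * sum(coef[j] * coef[j + k] for j in range(len(coef) - k))

# MA(1) with a = 0.9 and unit innovation variance, as in Example 2.1.
gamma0 = ma_acvf([0.9], 1.0, 0)     # (1 + a^2) sigma^2
gamma1 = ma_acvf([0.9], 1.0, 1)     # a sigma^2
gamma2 = ma_acvf([0.9], 1.0, 2)     # zero: the ACVF cuts off after lag 1
rho1 = gamma1 / gamma0

# |rho(1)| never exceeds 0.5, whatever the MA(1) coefficient is.
grid = [i / 10.0 for i in range(-50, 51)]
max_abs_rho1 = max(abs(ma_acvf([a], 1.0, 1) / ma_acvf([a], 1.0, 0)) for a in grid)
```

The maximum over the grid is attained at $a = \pm 1$, where $|\rho(1)| = 0.5$.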

The above calculation can easily be extended to more general cases. For an MA(q) process defined as (2.5),

$$\mathrm{var}(X_t) = E\{(X_t - \mu)^2\} = E\{(\varepsilon_t + a_1\varepsilon_{t-1} + \cdots + a_q\varepsilon_{t-q})^2\} = (1 + a_1^2 + \cdots + a_q^2)\sigma^2.$$

[…]

[…] for $k > p$, the best fit to (2.28) is obviously (see Exercise 2.6)

$$b_{k0} = c, \quad b_{k1} = b_1, \ \cdots, \ b_{kp} = b_p, \quad b_{k,p+1} = 0, \ \cdots, \ b_{kk} = 0.$$

Hence, $\pi(k) = 0$. For a stationary AR(p) process, the PACF cuts off at p, i.e. $\pi(k) = 0$ for any $k > p$.

The sample PACF is defined by least squares estimation: $\hat\pi(k) = \hat b_{kk}$, where $(\hat b_{k0}, \hat b_{k1}, \cdots, \hat b_{kk})$ minimizes the sum

$$\sum_{t=k+1}^{T} (X_t - b_{k0} - b_{k1}X_{t-1} - \cdots - b_{kk}X_{t-k})^2.$$

This is clearly a sample version of (2.28). However, the sample PACF will not admit an exactly zero cut-off. For a stationary AR(p) process defined by (2.15) with $\varepsilon_t \sim \mathrm{IID}(0, \sigma^2)$ and $E(\varepsilon_t^4) < \infty$, it holds that as $T \to \infty$,

$$\sqrt{T}\,\hat\pi(k) \xrightarrow{D} N(0, 1), \quad \text{for any } k > p. \tag{2.29}$$

See Proposition 3.1 of Fan and Yao (2003). Figure 2.2 displays the time series plots of four stationary AR series (with length T = 100) together with their sample PACF plots. For each PACF plot, we also superimpose the confidence bounds $\pm 1.96/\sqrt{T}$. By (2.29), we expect that for an AR(p) process, $\hat\pi(k)$ for most $k > p$ are within those two bounds. Figure 2.2 shows that this criterion is satisfied by all four time series considered here.

Example 2.5 For the monthly returns of the Center for Research in Security Prices (CRSP) value-weighted index in January 1926 - December 1997 (T = 864), it can be computed that

Lag     1      2      3      4      5      6      7      8      9      10
PACF    0.11   -0.02  -0.12  0.04   0.07   -0.06  0.02   0.06   0.06   -0.01

Since the confidence bounds are $\pm 1.96/\sqrt{864} = \pm 0.067$, we may select the order 5, since $|\hat\pi(k)| < 0.067$ for $k > 5$. Noting that $\hat\pi(5)$ is only slightly larger than 0.067, we may also choose the order p = 3, resulting in a more parsimonious model.
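The order selection in Example 2.5 amounts to comparing each sample PACF value with the bound $1.96/\sqrt{T}$. A minimal Python sketch of that comparison follows (the book's computations are in R); the variable names are illustrative, and the PACF values are the rounded ones from the table above.

```python
import math

# Sample PACF of the CRSP value-weighted index returns, lags 1 to 10 (Example 2.5).
pacf = {1: 0.11, 2: -0.02, 3: -0.12, 4: 0.04, 5: 0.07,
        6: -0.06, 7: 0.02, 8: 0.06, 9: 0.06, 10: -0.01}
T = 864

bound = 1.96 / math.sqrt(T)                     # half-width of the 95% bounds from (2.29)
significant = [k for k, v in sorted(pacf.items()) if abs(v) > bound]

largest_order = max(significant)                # the last lag falling outside the bounds
parsimonious_order = max(k for k in significant if abs(pacf[k]) > 1.2 * bound)
```

The factor 1.2 in the last line is an arbitrary illustrative margin for "clearly outside the bounds"; it is not a rule from the text, which simply observes that the lag-5 value exceeds the bound only slightly.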

Figure 2.2 Time series plot and sample PACF plot of (from top to bottom) 4 autoregressive processes. AR(1): $X_t = 0.7X_{t-1} + \varepsilon_t$, AR(1): $X_t = -0.7X_{t-1} + \varepsilon_t$, AR(2): $X_t = 0.26 + 0.5X_{t-1} + 0.24X_{t-2} + \varepsilon_t$, and AR(4): $X_t = 0.5X_{t-1} + 0.24X_{t-2} + 0.2X_{t-3} - 0.8X_{t-4} + \varepsilon_t$, where $\varepsilon_t$ are independent $N(0, 1)$ random variables. The horizontal line in each time series plot marks the position of the mean $EX_t$. The two dashed lines in each PACF plot are the confidence bounds $\pm 1.96/\sqrt{T}$.

2.2.3 Autoregressive and moving average processes

Combining AR and MA together, a general autoregressive and moving average (ARMA) model with the order (p, q) admits the form

$$X_t = c + b_1 X_{t-1} + \cdots + b_p X_{t-p} + \varepsilon_t + a_1 \varepsilon_{t-1} + \cdots + a_q \varepsilon_{t-q}, \tag{2.30}$$

where $\varepsilon_t \sim \mathrm{WN}(0, \sigma^2)$, and $c, b_1, \cdots, b_p, a_1, \cdots, a_q$ are parameters. Let

$$a(x) = 1 + a_1 x + \cdots + a_q x^q \quad \text{and} \quad b(x) = 1 - b_1 x - \cdots - b_p x^p.$$

Then, model (2.30) can be more compactly written as $b(B)X_t = c + a(B)\varepsilon_t$, where B denotes the backshift operator. To ensure that (p, q) are the genuine orders of the model, we assume that the two equations $a(x) = 0$ and $b(x) = 0$ do not have common roots. Otherwise, we can cancel the common factors from the above equation. Based on the discussions in section 2.2.1 and section 2.2.2, the following assertions hold.

Stationarity When the p roots of the AR characteristic equation $b(x) = 0$ are outside of the unit circle, equation (2.30) defines a stationary process $\{X_t, t = 0, \pm 1, \pm 2, \cdots\}$ which admits an MA(∞) representation (see also (2.22) above)

$$X_t = \mu + b(B)^{-1} a(B)\varepsilon_t \sim \mathrm{MA}(\infty),$$

i.e. such a stationary solution is also causal. Furthermore $\mu = EX_t = c/(1 - b_1 - \cdots - b_p)$, and the ACF $\rho(k) \to 0$ at an exponential rate as $k \to \infty$. In other words, ARMA models are suitable for short memory processes.

Yule-Walker equation For the stationary process defined by (2.30), multiplying both sides by $(X_{t-k} - \mu)$ and then taking expectations, with the use of $\gamma(k) = E\{X_t(X_{t-k} - \mu)\}$, we have

$$\gamma(k) = b_1\gamma(k-1) + \cdots + b_p\gamma(k-p), \quad \text{for any } k > q.$$

Here, the condition $k > q$ is imposed so that $X_{t-k}$ lies further in the past than $\varepsilon_t, \cdots, \varepsilon_{t-q}$ and is therefore uncorrelated with them. Dividing both sides by $\gamma(0)$, we have a similar equation for the ACF:

$$\rho(k) = b_1\rho(k-1) + \cdots + b_p\rho(k-p), \quad \text{for any } k > q. \tag{2.31}$$

From this equation, it can also be shown that the autocorrelation function decays exponentially fast, $\rho(k) = O(\alpha^k)$, where $\alpha$ is the reciprocal of the minimum modulus of the roots of the characteristic equation $b(x) = 0$. See Exercise 2.5.
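The recursion (2.31) and the exponential decay of the ACF can be illustrated with a pure AR(2) model (q = 0). The Python sketch below is an illustration only; the function name `ar2_acf` is an assumption, and the coefficients (0.5, 0.24) are those of the AR(2) series in Figure 2.2. The starting value $\rho(1) = b_1/(1 - b_2)$ follows from (2.31) at k = 1 together with $\rho(-1) = \rho(1)$.

```python
def ar2_acf(b1, b2, max_lag):
    """ACF rho(0..max_lag) of a stationary AR(2) process via the recursion (2.31)."""
    rho = [1.0, b1 / (1.0 - b2)]            # rho(0) and rho(1)
    for k in range(2, max_lag + 1):
        rho.append(b1 * rho[k - 1] + b2 * rho[k - 2])
    return rho[: max_lag + 1]

rho = ar2_acf(0.5, 0.24, 20)

# Roots of b(x) = 1 - 0.5 x - 0.24 x^2 are 1.25 and -10/3, so the smallest
# modulus is 1.25 and the ACF should eventually shrink by a factor of 0.8 per lag.
decay_ratio = rho[20] / rho[19]
```

The ratio of successive autocorrelations settles at the reciprocal of the smallest root modulus, which is the rate $\alpha$ in $\rho(k) = O(\alpha^k)$.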

Invertibility When the q roots of the MA characteristic equation $a(x) = 0$ are outside of the unit circle, the process $\{X_t\}$ defined by (2.30) is invertible in the sense that $X_t \sim \mathrm{AR}(\infty)$.

Figure 2.3 presents the time plots for five stationary time series (with the length T = 100 for each of them). The top panel is for normal white noise $X_t \sim_{\text{i.i.d.}} N(1, 1.81)$. The other four are taken from the following ARMA(1,1) model

$$X_t - \mu = b(X_{t-1} - \mu) + \varepsilon_t + a\varepsilon_{t-1}, \qquad \varepsilon_t \sim_{\text{i.i.d.}} N(0, 1),$$

with (b, a) = (0, 0.9) for MA(1), (b, a) = (0.669, 0) for AR(1), (b, a) = (−0.669, 0) for another AR(1), and (b, a) = (0.5, 0.279) for ARMA(1,1). We always set the mean $\mu = E(X_t) = 1$. The values of a and b are specified such that the marginal distributions of $X_t$ under the five different settings are the same. We add a horizontal line at 1 in each panel to mark the mean position of the series.

Figure 2.3 Plots of $X_t$ against t for five stationary time series with the same marginal distribution $X_t \sim N(1, 1.81)$. Panels, from top to bottom: normal white noise; MA(1) with a = 0.9; AR(1) with b = 0.669; AR(1) with b = −0.669; ARMA(1,1) with b = 0.5 and a = 0.279.

The top panel is patternless as the points fluctuate randomly around the mean. In contrast, two successive points of the MA(1) process in the second panel are positively correlated as $\rho(1) = 0.9/(1 + 0.9^2) = 0.497$, and there is a tendency for $X_t$ and $X_{t-1}$ to stay on the same side of the mean. This tendency is much more pronounced for the AR(1) process with b = 0.669 in the 3rd panel and the ARMA(1,1) process in the bottom panel. In both cases $X_t$ is effectively an MA(∞) process with positive coefficients, and it tends to take long excursions before it returns to the mean, a clear indication of serial dependence in the data. The 4th panel shows a different pattern for AR(1): since b = −0.669, $X_t$ oscillates around the mean in such a way that $X_t$ and $X_{t-1}$ tend to lie on opposite sides of the mean.

While the evidence for serial dependence in time plots can be subtle, it is much more apparent in ACF and PACF plots. Figure 2.4 presents the sample ACF and sample PACF based on the five time series data sets plotted in Figure 2.3. An ACF plot plots $\hat\rho(k)$ against k with two added varying confidence bounds at

$$\pm \frac{1.96}{\sqrt{T}} \Big(1 + 2\sum_{j=1}^{k-1} \hat\rho(j)^2\Big)^{1/2}, \quad k = 1, 2, \cdots. \tag{2.32}$$

For each k, $\hat\rho(k)$ should be sandwiched by the above two bounds with approximate probability 0.95 if $X_t \sim \mathrm{MA}(k-1)$; see (2.10). Hence if indeed $X_t \sim \mathrm{MA}(q)$, we expect about 95% of the $\hat\rho(k)$ with $k > q$ to lie between the two bounds. Figure 2.4 shows that this is indeed the case. For MA(1), $\hat\rho(k)$ is beyond the bounds only at lag k = 1, reflecting the property that the ACF for MA(1) cuts off at lag 1. The sample ACFs for the two AR(1) and the ARMA(1,1) processes decay exponentially. A PACF plot plots $\hat\pi(k)$ against k with the two added constant confidence bounds at $\pm 1.96/\sqrt{T}$. When $X_t \sim \mathrm{AR}(p)$, we expect about 95% of the $\hat\pi(k)$ with $k > p$ to be sandwiched by those two bounds; see (2.29). Figure 2.4 shows that for the two AR(1) processes, $\hat\pi(1)$ is beyond the bounds, reflecting the property that the PACF for AR(1) cuts off at lag 1.
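The sample ACF with divisor T from (2.4) and the varying bounds (2.32) are easy to compute by hand. The following Python sketch is illustrative only (in R the function acf produces the plot directly); the function names and the simulated MA(1) series with a = 0.9 and mean 1 are assumptions chosen to mimic the second panel of Figure 2.3.

```python
import math
import random

def sample_acvf(x, k):
    """Sample autocovariance at lag k with divisor T, as in (2.4)."""
    T = len(x)
    mean = sum(x) / T
    return sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, T)) / T

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat(0), ..., rho_hat(max_lag)."""
    g0 = sample_acvf(x, 0)
    return [sample_acvf(x, k) / g0 for k in range(max_lag + 1)]

def acf_bounds(rho, T):
    """Half-widths of the bounds (2.32) for k = 1, ..., len(rho) - 1."""
    bounds = []
    for k in range(1, len(rho)):
        s = sum(r * r for r in rho[1:k])        # sum of rho_hat(j)^2 for j = 1..k-1
        bounds.append(1.96 / math.sqrt(T) * math.sqrt(1.0 + 2.0 * s))
    return bounds

# Hypothetical simulated MA(1) series: X_t = 1 + e_t + 0.9 e_{t-1}, T = 100.
rng = random.Random(1)
T = 100
eps = [rng.gauss(0.0, 1.0) for _ in range(T + 1)]
x = [1.0 + eps[t] + 0.9 * eps[t - 1] for t in range(1, T + 1)]

rho_hat = sample_acf(x, 20)
bounds = acf_bounds(rho_hat, T)
outside = [k for k in range(1, 21) if abs(rho_hat[k]) > bounds[k - 1]]
```

For an MA(1) series one would typically expect `outside` to contain lag 1 and few other lags, although any single simulated path can deviate from this.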

Figure 2.4 Sample ACF and PACF for five stationary time series plotted in Figure 2.3. Two dashed curves in each ACF plot are the confidence bounds (2.32), and in each PACF plot are the confidence bounds $\pm 1.96/\sqrt{T}$.

Example 2.6 Let us compute the autocovariance function for the ARMA(1,1) model. For $k > 1$, we have

$$\gamma(k) = b_1\gamma(k-1) = \cdots = b_1^{k-1}\gamma(1),$$

which decays exponentially. To determine the values of $\gamma(0)$ and $\gamma(1)$, we use the fact that the present noise and the past data are uncorrelated to obtain

$$\gamma(1) = \mathrm{Cov}(X_t, X_{t-1}) = \mathrm{Cov}(b_1X_{t-1} + \varepsilon_t + a_1\varepsilon_{t-1},\ X_{t-1}) = b_1\gamma(0) + a_1\sigma^2,$$

noticing that $X_{t-1}$ contains a term $\varepsilon_{t-1}$. A similar calculation yields

$$\gamma(0) = \mathrm{var}(X_t) = b_1^2\gamma(0) + (1 + a_1^2)\sigma^2 + 2a_1b_1\sigma^2.$$

Solving the above two equations yields

$$\gamma(0) = \frac{1 + a_1^2 + 2a_1b_1}{1 - b_1^2}\,\sigma^2, \qquad \gamma(1) = \frac{(1 + a_1b_1)(a_1 + b_1)}{1 - b_1^2}\,\sigma^2 \tag{2.33}$$

and

$$\gamma(k) = \gamma(1)\,b_1^{k-1}, \quad \text{for } k \ge 1. \tag{2.34}$$
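Formulas (2.33) and (2.34) can be used to confirm that the four ARMA(1,1) settings of Figure 2.3 share (up to rounding) the marginal variance 1.81. The Python sketch below is illustrative; the function name `arma11_acvf` and the dictionary of settings are assumptions made for this example.

```python
def arma11_acvf(b, a, sigma2, max_lag):
    """gamma(0..max_lag) for X_t = c + b X_{t-1} + e_t + a e_{t-1}, |b| < 1, via (2.33)-(2.34)."""
    g0 = (1.0 + a * a + 2.0 * a * b) / (1.0 - b * b) * sigma2
    g1 = (1.0 + a * b) * (a + b) / (1.0 - b * b) * sigma2
    gam = [g0, g1]
    for k in range(2, max_lag + 1):
        gam.append(b * gam[k - 1])              # gamma(k) = b * gamma(k - 1) for k > 1
    return gam[: max_lag + 1]

# (b, a) pairs used for Figure 2.3, with innovation variance 1.
settings = {
    "MA(1)": (0.0, 0.9),
    "AR(1), b = 0.669": (0.669, 0.0),
    "AR(1), b = -0.669": (-0.669, 0.0),
    "ARMA(1,1)": (0.5, 0.279),
}
variances = {name: arma11_acvf(b, a, 1.0, 0)[0] for name, (b, a) in settings.items()}

# Cross-check of the first equation in Example 2.6: gamma(1) = b gamma(0) + a sigma^2.
g = arma11_acvf(0.5, 0.279, 1.0, 5)
identity_gap = abs(g[1] - (0.5 * g[0] + 0.279 * 1.0))
```

The same function gives the theoretical ACF as `[v / g[0] for v in g]`, which can be compared with sample ACF plots such as those in Figure 2.4.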

Example 2.7 To gain further insight on ARMA models, let us express the ARMA(1,1) model in terms of an AR(∞) model. Write the MA polynomial as $a(B) = 1 - aB$ (i.e. $a_1 = -a$ in (2.30)). Assuming $|a| < 1$, we have

$$a(B)^{-1} = (1 - aB)^{-1} = 1 + aB + a^2B^2 + \cdots.$$

Using this and the ARMA(1,1) equation (2.30), we have

$$(1 + aB + a^2B^2 + \cdots)(X_t - b_1X_{t-1} - c) = \varepsilon_t.$$

In other words,

$$X_t + aX_{t-1} + a^2X_{t-2} + \cdots - b_1(X_{t-1} + aX_{t-2} + \cdots) - \frac{c}{1-a} = \varepsilon_t.$$

Simplifying the above equation, we have

$$X_t = \frac{c}{1-a} + (b_1 - a)(X_{t-1} + aX_{t-2} + a^2X_{t-3} + \cdots) + \varepsilon_t.$$

In other words, the present value is a weighted average of the past values, with weights that decay exponentially. This kind of weighted average is referred to as exponential smoothing.

Stationarity, Causality and Stability* In this book, we call an ARMA process, defined by (2.30), stationary if the roots of the corresponding AR characteristic equation $b(x) = 0$ are outside of the unit circle. In this case, the process is MA(∞), i.e. $X_t$ is a linear combination of $\varepsilon_t, \varepsilon_{t-1}, \cdots$. Such a process is called causal as $X_t$ is caused by the innovation at time t and its lagged values. In fact as long as no roots of the equation $b(x) = 0$ are on the unit circle, there exists a unique stationary solution $\{X_t, t = 0, \pm 1, \pm 2, \cdots\}$ from model (2.30) (see §2.1.2 of Fan and Yao (2003)), although we deal with causal stationary processes only in this book. The processes with unit roots (such as random walks and ARIMA models) are nonstationary, which will be discussed further in section 2.8.1 below. Note that the aforementioned stationary solution is the process which satisfies model (2.30) for all $-\infty < t < \infty$. It should not be confused with a process starting with p arbitrary initial values $X_1, \cdots, X_p$. For the latter, $\{X_t, t > p\}$ defined by (2.30) is stable only if all the roots of $b(x) = 0$ are outside of the unit circle. This stable process will merge with the stationary process as t increases. On the other hand, if there are no unit roots but at least one root inside the unit circle, $\{X_t, t > p\}$ defined by (2.30) with arbitrary initial values $X_1, \cdots, X_p$ is explosive, and it will not converge to the stationary solution of (2.30).

So far, we have introduced a number of stationary models. To help us comprehend them, Figure 2.5 depicts their relationship. Stationary time series models are a large class, containing all processes with time-invariant first two moments. We specify a subclass of useful models, namely stationary ARMA models. Within the context of ARMA models, no attempts are made to depict the structure of a white noise process beyond its first two moments. However, GARCH processes introduced in the next chapter are white noise. The conditional variances of GARCH processes play important roles in understanding the volatility of financial time series. Of course all i.i.d. sequences form a subset of GARCH processes, which exhibit no serial dependence at all.

Figure 2.5 Relationship among different processes: Stationary processes are the largest set, followed by ARMA, ARMA-GARCH, and i.i.d. white noise processes.

2.3 Nonstationary and long memory ARMA processes

Stationarity requires some time-invariant properties of a time series, which are not always observed in practice. For example, according to the efficient market hypothesis, the logarithm of the price process behaves like a random walk.

2.3.1 Random walks

A random walk without drift [see also (1.17)] is defined as

$$X_t = X_{t-1} + \varepsilon_t, \quad t = 1, 2, \cdots, \tag{2.35}$$

where $\varepsilon_t \sim \mathrm{WN}(0, \sigma^2)$. Hence a random walk is an AR(1) model with the AR coefficient equal to 1, whose characteristic equation $b(x) = 1 - x = 0$ has a unit root, i.e. a root on the unit circle. We may assume that the process starts at an arbitrary point $X_0 = c$; then

$$\mathrm{var}(X_t) = \mathrm{var}(X_{t-1} + \varepsilon_t) = \mathrm{var}(\varepsilon_1 + \cdots + \varepsilon_t) = t\sigma^2,$$

which increases linearly with time t. Therefore $X_t$ is not stationary.

A random walk is a mathematical model for the position of a 'drunk' who walks a random step on a line with size $\varepsilon_t$ at time t. When $\{\varepsilon_t\}$ are independent and normally distributed, it is a discrete version of Brownian motion. Although the structure of random walks is extremely simple, it is one of the most frequently used models for some nonstationary components of time series with complex structures.

Figure 2.6 displays a random walk of length 300 generated from (2.35) with standard normal $\varepsilon_t$ together with its sample ACF. The time series plot resembles patterns of financial asset prices. See, for example, the top panel in Figure 1.3. As $X_t$ is not stationary, the autocorrelation $\mathrm{Corr}(X_t, X_{t+k})$ depends on both t and k. Since $X_t = c + \sum_{j=1}^{t} \varepsilon_j$,

$$\mathrm{cov}(X_t, X_{t+k}) = \mathrm{var}(X_t) = t\sigma^2, \quad \text{for all } k \ge 0.$$

Hence, it is easy to see for $k \ge 1$ that

$$\mathrm{Corr}(X_t, X_{t+k}) = \frac{t\sigma^2}{\sqrt{t\sigma^2 \cdot (t+k)\sigma^2}} = \{t/(t+k)\}^{1/2}, \tag{2.36}$$

which is close to 1 for small k and large t. This explains why the sample ACF, defined as in (2.4), is close to 1 at small time lags, and decays slowly when the lag increases; see the second panel in Figure 2.6. This is in fact a common feature for nonstationary processes with unit roots. Note that the differenced random walk $\nabla X_t = X_t - X_{t-1}$ is white noise, where $\nabla$ denotes the difference operator, and for any integer $d \ge 2$, $\nabla^d X_t = \nabla(\nabla^{d-1} X_t)$.

Figure 2.6 Time series plot and ACF of a random walk.
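Formula (2.36) can be checked by simulation. The Python sketch below is only an illustration (the book's figures were produced in R); the function names, the choice t = 50, k = 5, the number of paths and the seed are all assumptions. It simulates many independent Gaussian random walks started at 0 and computes the correlation between $X_t$ and $X_{t+k}$ across paths.

```python
import math
import random

def rw_corr(t, k):
    """Theoretical Corr(X_t, X_{t+k}) of a random walk, from (2.36)."""
    return math.sqrt(t / (t + k))

def simulated_rw_corr(t, k, n_paths=2000, seed=0):
    """Cross-path sample correlation of X_t and X_{t+k} for Gaussian random walks."""
    rng = random.Random(seed)
    x_t, x_tk = [], []
    for _ in range(n_paths):
        level = 0.0                              # X_0 = 0
        for step in range(1, t + k + 1):
            level += rng.gauss(0.0, 1.0)         # X_step = X_{step-1} + eps_step
            if step == t:
                x_t.append(level)
        x_tk.append(level)
    mean_a = sum(x_t) / n_paths
    mean_b = sum(x_tk) / n_paths
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(x_t, x_tk))
    var_a = sum((a - mean_a) ** 2 for a in x_t)
    var_b = sum((b - mean_b) ** 2 for b in x_tk)
    return cov / math.sqrt(var_a * var_b)

theory = rw_corr(50, 5)
estimate = simulated_rw_corr(50, 5)
```

Because the correlation depends on t as well as k, repeating the comparison with a small t (say t = 5) gives a visibly smaller value, in contrast to a stationary process.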

2.3.2 ARIMA model and exponential smoothing

For a nonstationary time series such as the log-prices of financial assets, one often takes differences and hopes that the resulting time series is stationary. Autoregressive integrated moving average (ARIMA) models form a useful class of nonstationary models. We say $X_t \sim \mathrm{ARIMA}(p, d, q)$ if $\nabla^d X_t$ is a stationary ARMA(p, q) process. Of course the models with small p, d, q are practically more useful. One such example is the ARIMA(0,1,1) model with innovations $\varepsilon_t \sim \mathrm{WN}(0, \sigma^2)$ […]

b^ —

Hence { Z ^ } is then an MA(