Model Risk in Financial Markets: From Financial Engineering to Risk Management
ISBN 9789814663403, 9814663409



Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

Library of Congress Cataloging-in-Publication Data
Tunaru, Radu.
  Model risk in financial markets : from financial engineering to risk management / Radu Tunaru (University of Kent, UK).
  pages cm
  Includes bibliographical references and index.
  ISBN 978-9814663403 (alk. paper)
  1. Financial risk management. 2. Risk management. 3. Financial engineering. I. Title.
HD61.T86 2015
332'.0415011--dc23
2015017268

British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.

Copyright © 2015 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.

In-house Editor: Li Hongyan

Printed in Singapore


“I will never die for my beliefs, because I might be wrong...” Bertrand Russell

To my dear daughters Filipa and Joana, who need me to be always right or at least almost all the time.


Preface

Finance has been part of the evolution of humanity for thousands of years: follow the money and you will understand the course of history. The nascency of mathematics was triggered by the need to solve problems related to money and finance. Modern finance experienced a meteoric rise in the 1980s, coupled with the large-scale introduction of computers and the liberalization of financial markets. Scientists from many other disciplines, such as Mathematics, Statistics, Physics, Mechanics, Engineering and Economics, found uncharted territory in Modern Finance and embraced the new scientific “gold” race.

After a sunrise there is a sunset, and exuberance quite often masks a lack of full understanding of the complexity of problems that may surface at any moment in time. The series of crises in Finance culminated in the subprime-liquidity crisis of 2007, which was reminiscent of the financial crash of 1929. Who was to blame and what really happened is still the subject of intensive research, and valuable lessons are to be learned overall. While everybody is offering an opinion about toxic assets and liquidity measures and trying to design measures of systemic risk impact, not enough attention is paid, in my opinion, to another source of future problems that could also reach catastrophic and endemic levels: the risk carried by the models themselves, or model risk for short.

What is model risk? Is it important? Can we measure it? These questions will receive a suite of answers in this book, although it would be unrealistic to say it covers the entire spectrum of problems related to model risk in financial markets. This book is written to help graduate students in Finance, MBA and postgraduate students in Quantitative Finance, risk managers, analysts in product control and model validation teams in investment banks, regulators, academics and other parties working as consultants or in rating


agencies understand the difficulty of financial modelling. Since those working with models must have a minimum level of knowledge in quantitative finance, I assume that the reader has standard knowledge of probability theory, statistics, econometrics, asset pricing, derivatives, risk management and financial engineering.

Financial modelling intrinsically introduces model risk, and this book aims to highlight the diversity of model risk facets in finance. It should help the reader develop an inquiring approach when dealing with models in finance.

Radu Tunaru, London 2014


List of Notations

p(t, T) : the price at time t of a zero-coupon bond with maturity T

r_t : the short rate at time t, unless otherwise stated

r_f : the continuously compounded constant risk-free rate per annum

Q : a generic risk-neutral probability measure, also called the martingale pricing measure

P : the physical (also called objective or real-world) probability measure

{S_t}_{t≥0} : the stochastic process associated with an asset value

{X_t}_{t≥0} : the stochastic process associated with an asset value or its return

L(t; T1, T2) : the forward Libor rate at time t for the future period [T1, T2]

VaR_α(X) : the value-at-risk at critical level α for the risk of an asset X; typically α = 1% or 5%

ES_α(X) : the expected shortfall at critical level α for the risk of an asset X

L : the likelihood function

(·)^+ = max(·, 0) : the positive part function

F or F_X : the cumulative distribution function of a random variable X

φ : the probability density function of the standard Gaussian distribution

ϕ(a, δ) : the probability density function of the Gaussian distribution with mean a and variance δ

Φ : the cumulative distribution function of the standard Gaussian distribution

∼_a : distributed asymptotically the same as

o(·) : the “little-o” Landau notation: g(n) = o(f(n)) iff g(n)/f(n) → 0 as n → ∞

O(·) : the “big-O” Landau notation: g(n) = O(f(n)) iff there is a constant c such that g(n)/f(n) → c as n → ∞

W_t or Z_t : generally used for the Wiener process at time t

∀ : “for all”

∃ : “there exists”

∝ : the expressions on its left and right sides are equal up to a proportionality constant

1_A(·) : the indicator function, 1_A(x) = 1 if x ∈ A and zero otherwise
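Conventions for value-at-risk and expected shortfall differ across the literature; purely as an illustrative convention (not necessarily the one adopted in every chapter), one common pair of definitions is

    \mathrm{VaR}_{\alpha}(X) = -\inf\{\, x \in \mathbb{R} : F_X(x) > \alpha \,\}, \qquad
    \mathrm{ES}_{\alpha}(X) = \frac{1}{\alpha} \int_{0}^{\alpha} \mathrm{VaR}_{u}(X)\, du,

under which, for α = 1%, the value-at-risk is the loss threshold exceeded with probability at most 1% and the expected shortfall averages the value-at-risk over all levels below α.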



Contents

Preface

List of Notations

List of Figures

List of Tables

1. Introduction

2. Fundamental Relationships
   2.1 Introduction
   2.2 Present Value
   2.3 Constant Relative Risk Aversion Utility
   2.4 Risk versus Return: The Sharpe Ratio
       2.4.1 Issues related to non-normality
       2.4.2 The Sharpe ratio and negative returns
   2.5 APT
   2.6 Notes and Summary

3. Model Risk in Interest Rate Modelling
   3.1 Introduction
   3.2 Short Rate Models
   3.3 Theory of Interest Rate Term Structure
       3.3.1 Expectations Hypothesis
       3.3.2 A reexamination of Log EH
       3.3.3 Reconciling the arguments and examples
   3.4 Yield Curve
       3.4.1 Parallel shift of a flat yield curve
       3.4.2 Another proof that the yield curve cannot be flat
       3.4.3 Deterministic maturity independent yields
       3.4.4 Consol modelling
   3.5 Interest Rate Forward Curve Modelling
   3.6 One-factor or Multi-factor models
   3.7 Notes and Summary

4. Arbitrage Theory
   4.1 Introduction
   4.2 Transaction Costs
   4.3 Arbitrage
       4.3.1 Non-convergence financial gain process
       4.3.2 Distortion operator with arbitrage
   4.4 Notes and Summary

5. Derivatives Pricing Under Uncertainty
   5.1 Introduction to Model Risk
       5.1.1 Parameter estimation risk
       5.1.2 Model selection risk
       5.1.3 Model identification risk
       5.1.4 Computational implementation risk
       5.1.5 Model protocol risk
   5.2 Uncertain Volatility
       5.2.1 An option pricing model with uncertain volatility
   5.3 Option Pricing under Uncertainty in Complete Markets
       5.3.1 Parameter uncertainty
       5.3.2 Model uncertainty
       5.3.3 Numerical examples
       5.3.4 Accounting for parameter estimation risk in the Black-Scholes model
       5.3.5 Accounting for parameter estimation risk in the CEV model
   5.4 A Simple Measure of Parameter Uncertainty Risk
   5.5 Bayesian Option Pricing
       5.5.1 Modelling the future asset value under physical measure
       5.5.2 Modelling the current asset value under a risk-neutral measure
   5.6 Measuring Model Uncertainty
       5.6.1 Worst case risk measure
   5.7 Cont's Framework for Model Uncertainty
       5.7.1 An axiomatic approach
       5.7.2 A coherent measure of model risk
       5.7.3 A convex measure of model risk
   5.8 Notes and Summary

6. Portfolio Selection under Uncertainty
   6.1 Introduction to Model Risk for Portfolio Analysis
   6.2 Bayesian Averaging for Portfolio Analysis
       6.2.1 Empirical Bayes priors
       6.2.2 Marginal likelihood calculations
   6.3 Portfolio Optimization
       6.3.1 Portfolio optimisation with stochastic interest rates
       6.3.2 Stochastic market price of risk
       6.3.3 Stochastic volatility
   6.4 Notes and Summary

7. Probability Pitfalls of Financial Calculus
   7.1 Introduction
   7.2 Probability Distribution Functions and Density Functions
   7.3 Gaussian Distribution
   7.4 Moments
       7.4.1 Mean-median-mode inequality
       7.4.2 Distributions are not defined by moments
       7.4.3 Conditional expectation
   7.5 Stochastic Processes
       7.5.1 Infinite returns from finite variance processes
       7.5.2 Martingales
   7.6 Spurious Testing
       7.6.1 Spurious mean reversion
       7.6.2 Spurious regression
   7.7 Dependence Measures
       7.7.1 Problems with the Pearson linear correlation coefficient
       7.7.2 Pitfalls in detecting breakdown of linear correlation
       7.7.3 Copulas
       7.7.4 More general issues
       7.7.5 Dependence and Levy processes
   7.8 Notes and Summary

8. Model Risk in Risk Measures Calculations
   8.1 Introduction
   8.2 Controlling Risk in Insurance
       8.2.1 Diversification
       8.2.2 Variance
   8.3 Coherent Distortion Risk Measures
   8.4 Value-at-Risk
       8.4.1 General observations
       8.4.2 Expected shortfall and expected tail loss
       8.4.3 Violations ratio
       8.4.4 Correct representation
       8.4.5 VaR may not be subadditive
       8.4.6 Artificial improvement of VaR
       8.4.7 Problems at long horizon
   8.5 Backtesting
       8.5.1 Uncertainty in risk estimates: A short overview
       8.5.2 Backtesting VaR
   8.6 Asymptotic Risk of VaR
       8.6.1 Normal VaR
       8.6.2 More general asymptotic standard errors for VaR
       8.6.3 Exact confidence intervals for VaR
       8.6.4 Examples
       8.6.5 VaR at different significance levels
       8.6.6 Exact confidence intervals
       8.6.7 Extreme losses estimation and uncertainty
       8.6.8 Backtesting expected shortfall
   8.7 Notes and Summary

9. Parameter Estimation Risk
   9.1 Introduction
   9.2 Problems with Estimating Diffusions
       9.2.1 A brief review
       9.2.2 Parameter estimation for the Vasicek model
       9.2.3 Parameter estimation for the CIR model
   9.3 Problems with Estimation of Jump-Diffusion Models
       9.3.1 The Gaussian-Poisson jump-diffusion model
       9.3.2 ML Estimation under the Merton Model
       9.3.3 Inexistence of an unbiased estimator
   9.4 A Critique of Maximum Likelihood Estimation
   9.5 Bootstrapping Can Be Unreliable Too
   9.6 Notes and Summary

10. Computational Problems
   10.1 Introduction
   10.2 Problems with Monte Carlo Variance Reduction Techniques
   10.3 Pitfalls in Estimating Greeks with Pathwise Monte Carlo Simulation
   10.4 Pitfall in Options Portfolio Calculation by Approximation Methods
   10.5 Transformations and Expansions
       10.5.1 Edgeworth expansion
       10.5.2 Computational issues for MLE
   10.6 Calculating the Implied Volatility
       10.6.1 Existence and uniqueness of implied volatility under Black-Scholes
       10.6.2 Approximation formulae for implied volatility
       10.6.3 An interesting example
   10.7 Incorrect Implied Volatility for Merton Model
   10.8 Notes and Summary

11. Portfolio Selection Using the Sharpe Ratio

12. Bayesian Calibration for Low Frequency Data
   12.1 Introduction
   12.2 Problems in Pricing Derivatives for Assets with a Slow Business Time
   12.3 Choosing the Correct Auxiliary Values
   12.4 Empirical Exemplifications
       12.4.1 A mean-reversion model with predictability in the drift
       12.4.2 Data augmentation
   12.5 MCMC Inference for the IPD model
   12.6 Derivatives Pricing
   12.7 Notes and Summary

13. MCMC Estimation of Credit Risk Measures
   13.1 Introduction
   13.2 A Short Example
   13.3 Further Analysis
       13.3.1 Bayesian inference with Gibbs sampling
   13.4 Hierarchical Bayesian Models for Credit Risk
       13.4.1 Model specification of probabilities of default
       13.4.2 Model estimation
   13.5 Standard&Poor's Rating Data
       13.5.1 Data description
       13.5.2 Hierarchical model for aggregated data
       13.5.3 Hierarchical time-series model
       13.5.4 Hierarchical model for disaggregated data
   13.6 Further Credit Modelling with MCMC Calibration
   13.7 Estimating the Transition Matrix
       13.7.1 MCMC estimation
       13.7.2 MLE estimation
   13.8 Notes and Summary

14. Last But Not Least. Can We Avoid the Next Big Systemic Financial Crisis?
   14.1 Yes, We Can
   14.2 No, We Cannot
   14.3 A Non-technical Template for Model Risk Control
       14.3.1 Identify the type of model risk that may appear
       14.3.2 A guide for senior managers
   14.4 There is Still Work to Do

15. Notations for the Study of MLE for CIR process

Bibliography

Index


List of Figures

1.1 Two worlds of Finance through a mathematical eye

3.1 Comparison of probability density functions under the Vasicek and CIR models for the value rate r at T = 10

3.2 Comparison of probability density functions under the Vasicek and CIR models for the value rate r at T = 5

3.3 Comparison of probability density functions under the Vasicek and CIR models for the value rate r at T = 20

3.4 First comparison of simulated paths for the Vasicek model and the CIR model having the same parameters

3.5 Second comparison of simulated paths for the Vasicek model and the CIR model having the same parameters

5.1 Posterior densities of the Black-Scholes parameters and the European call and put option price for the FTSE100 index. The strike price is K = 5500, initial index value is S_t0 = 5669.1, risk-free rate is r = 0.075 and time to maturity is T = 1 year

5.2 Posterior densities of the Black-Scholes market price of risk (μ − r)/σ and the Greek delta parameter for the European call and put option prices for the FTSE100

5.3 Posterior densities for the parameters of the CEV model calculated using data on the FTSE100 index with MCMC from a sample of 20,000 values. The strike price is K = 5500, initial index value is S_t0 = 5669.1, risk-free rate is r = 0.075 and time to maturity is T = 1 year

5.4 Posterior surface for the European call price on the FTSE100 generated by the parameter uncertainty on γ and σ_CEV. The strike price is K = 5500, initial index value is S_t0 = 5669.1, risk-free rate is r = 0.075 and time to maturity is T = 1 year

5.5 Posterior surface for the European put price on the FTSE100 generated by the parameter uncertainty on γ and σ_CEV. The strike price is K = 5500, initial index value is S_t0 = 5669.1, risk-free rate is r = 0.075 and time to maturity is T = 1 year

6.1 The stock market crash in October 1987

7.1 Calculating the conditional correlation coefficient as a function of the marginal correlation coefficient and the ratio between the marginal variance and the conditional variance

7.2 The correlation between comonotonic lognormal variables ln(X) ∼ N(0, 1) and ln(Y) ∼ N(0, σ)

8.1 Comparison of VaR and ES for Gaussian and Student's t distributions under Solvency II and the Swiss Solvency Test

8.2 Daily FTSE100 returns and 1% Value-at-Risk calculations for the period 22 May 2012 to 1 November 2013 using the adjusted closed price series

8.3 Daily FTSE100 returns and 1% Value-at-Risk calculations for the period 22 May 2012 to 1 November 2013 using the low price series

8.4 Daily FTSE100 returns and 1% Expected Shortfall calculations for the period October 2012 to November 2013 using the adjusted closed price series

10.1 A symmetric butterfly payoff constructed from trading two long European call options with exercise prices K1 and K3 and two short European call options with strike price K2 = (K1 + K3)/2

10.2 A comparison of the options portfolio valuation using the delta and the delta-gamma approximation methods. The portfolio of options has six long European call options with exercise price 105, two short European call options and four short European put options with exercise price 95

10.3 A comparison of the options portfolio valuation using the delta and the delta-gamma approximation methods. The portfolio of options has six long European call options with exercise price 105, five short European call options and one short European put option with exercise price 95

10.4 A comparison of the options portfolio valuation using the delta and the delta-gamma approximation methods. The portfolio of options has six long European call options with exercise price 105, two short European call options and two short European put options with exercise price 95

10.5 Calculating the implied volatility for a stock with current value $34.14 from the market price $4.7 of a European call option with maturity T = 0.45 years and a strike price of K = $30.00, assuming that the risk-free rate is 2.75%

10.6 Calculating the implied volatility for a stock with current value $34.14 from the market price $4.7 of a European call option with maturity T = 0.45 years and a strike price of K = $30.00, assuming that the risk-free rate is 5%

12.1 Historical trend of the IPD Annual UK commercial property index for the period 1980-2009

12.2 Posterior densities of the main parameters of interest of the mean-reverting model for the IPD index. θ is the mean-reversion parameter, α is the long-run mean on the log scale and σ is the volatility per annum. All densities are constructed from a sample of observations collected from 50,000 MCMC iterations

12.3 Posterior densities of fair forward prices on the IPD index. Calculations are representative for the year 2009 and all future five year maturities. All densities are constructed from a sample of vector observations collected from 50,000 MCMC iterations following a burn-in period of 150,000 iterations

12.4 Posterior densities of fair prices of European vanilla call options on the IPD UK All Property index. Calculations are for 2009 and all future five year maturities. All densities are constructed from a sample of vector observations collected from 50,000 MCMC iterations following a burn-in period of 150,000 iterations

13.1 Comparison of mean default probabilities: observed versus log-linear and logistic models for corporates rated by Moody's over the period 1993 to 2000

13.2 Comparison of calibration results for default probabilities: log-linear and logistic models versus observed. All credit ratings are used for corporates rated by Moody's over the period 1993 to 2000

13.3 Comparison of calibration results for default probabilities: Bayesian log-log and Bayesian logistic models versus observed. All credit ratings used are for corporates rated by Moody's over the period 1993 to 2000

13.4 Posterior kernel density estimates for investment grade default probabilities using the logistic link model and the S&P data for the aggregate number of defaults over the horizon 1981-2004

13.5 Posterior kernel density estimates for non-investment grade default probabilities using the logistic link model and the S&P data for the aggregate number of defaults over the horizon 1981-2004

13.6 Posterior kernel density estimate for the ratio between the cumulative default probability in the speculative grade categories and the cumulative default probability in the investment grade categories. Aggregated Standard&Poor's 1981-2004 data, logistic model

13.7 Observed values and posterior means with credible intervals for investment grade default probabilities, Standard&Poor's yearly rating data on all corporates between 1981-2004

13.8 Observed values and posterior means with credible intervals for non-investment grade default probabilities, Standard&Poor's yearly rating data on all corporates between 1981-2004

13.9 The posterior mean and credible interval for the yearly factor y_t of the Bayesian Panel Count Data model and Standard&Poor's data between 1981 and 2004

13.10 Correlation matrix of probabilities of default p1, p2, ..., p7 corresponding to the Standard&Poor's seven rating categories: AAA, AA, A, BBB, BB, B, and CCC/C. The data used is the Standard&Poor's yearly rating data on all corporates between 1981-2004


List of Tables

2.1 Example of good asset with bad Sharpe ratio

2.2 Comparison of Sharpe ratios when risk free rate is r_f = 5%

2.3 Comparison of Sharpe ratios when risk free rate is r_f = 5%

5.1 MCMC analysis of Bayesian option pricing under the Black-Scholes GBM model. Posterior inference statistics for mean, standard deviation, median and 2.5% and 95% quantiles from a sample of 50,000 values

5.2 Posterior estimates of the parameters of the CEV model from the FTSE100 data. Inference is obtained with MCMC from a sample of 20,000 values

5.3 Posterior estimates of parameters of the CEV model from the FTSE100 data. The strike price is K = 5500, initial index value is S_t0 = 5669.1, risk-free rate is r = 0.075 and time to maturity is T = 1 year

8.1 Example of a bivariate random vector with correlation coefficient equal to 1/24

8.2 Comparative violations ratio calculations for the FTSE100 for the period 22 May 2012 to 1 November 2013 using the adjusted closed price series

8.3 Comparative violations ratio calculations for the FTSE100 for the period 22 May 2012 to 1 November 2013 using the low price series

8.4 Backtesting for the four methods of calculating VaR for the FTSE100 for the period 22 May 2012 to 1 November 2013

8.5 Example of a confidence interval for a Gaussian VaR using the simulation of asymptotic distribution method from [Dowd (2000b)]

8.6 FTSE 100 daily returns: summary statistics for the period 2000 to 2010, and for the subsamples for the periods 2004-2006 and 2008-2010

8.7 VaR estimates and 90% confidence interval bounds for FTSE 100 daily returns. Results are presented for the entire sample 2000-2011 and also for the subperiods 2004-2006 and 2008-2010

11.1 Performance of volatility-diversified US portfolios. The performance statistics are of the daily relative returns on the different portfolios based on equity and VIX futures positions. The portfolios are rebalanced weekly, and the notional of the futures contracts is assumed to be held in cash (no collateralization of the futures)

11.2 Performance of volatility-diversified US portfolios. The performance statistics are of the daily relative returns on the different portfolios based on equity, bonds and VIX futures positions. The portfolios are rebalanced weekly, and the notional of the futures contracts is assumed to be held in cash (no collateralization of the futures)

11.3 Performance of volatility-diversified European portfolios. The performance statistics are of the daily relative returns on the different portfolios based on equity and VSTOXX futures positions. The portfolios are rebalanced weekly, and the notional of the futures contracts is assumed to be held in cash (no collateralization of the futures)

11.4 Performance of volatility-diversified European portfolios. The performance statistics are of the daily relative returns on the different portfolios based on equity, bonds and VSTOXX futures positions. The portfolios are rebalanced weekly, and the notional of the futures contracts is assumed to be held in cash (no collateralization of the futures)

12.1 Posterior analysis summary for the MCMC analysis of the IPD index on the log-scale, for the model with non-constant fundamental level. θ is the mean-reversion parameter, α and β are the intercept and slope parameters of the linear long-run mean on the log scale and σ is the volatility per annum. Posterior estimates are the mean, standard deviation, and quantiles including the median. All estimates are calculated from a sample of 50,000 MCMC iterations

12.2 Posterior analysis summary for the MCMC analysis of the IPD index for the model with a constant fundamental level. θ is the mean-reversion parameter, α is the long-run mean on the log scale and σ is the volatility per annum. Posterior estimates are the mean, standard deviation, and quantiles including the median. All estimates are calculated from a sample of 50,000 MCMC iterations

12.3 Posterior statistics of the augmented data representing the proper bridge between data points. Posterior estimates are the mean, standard deviation, and quantiles including the median. All estimates are calculated from a sample of 50,000 MCMC iterations following a burn-in period of 150,000 iterations

12.4 Posterior analysis summary for the MCMC analysis of the forward prices on the IPD index, calculated for 2009 and five annual future maturities. Calculations are performed under a given term structure of the market price of risk λ1 = 2.58, λ2 = 0.73, λ3 = 0.70, λ4 = 0.82, λ5 = 1.00. Estimates are for the mean, standard deviation, and quantiles including the median. All estimates are calculated from a sample of 50,000 MCMC iterations following a burn-in period of 150,000 iterations

12.5 Posterior analysis summary for the MCMC analysis of the European vanilla call prices on the IPD index, with an at-the-money strike K ≈ 1219, calculated for 2009 and five annual future maturities. Calculations are performed under the given term structure of the market price of risk λ1 = 2.58, λ2 = 0.73, λ3 = 0.70, λ4 = 0.82, λ5 = 1.00. Estimates are for the mean, standard deviation, and quantiles including the median. All estimates are calculated from a sample of 50,000 MCMC iterations following a burn-in period of 150,000 iterations

13.1 Corporate default probabilities implied by the log-linear regression model for corporates rated by Moody's over the period 1993 to 2000

13.2 Corporate default probabilities implied by the logistic linear regression model for corporates rated by Moody's over the period 1993 to 2000

13.3 Posterior estimates of the mean, standard deviation, median and quantiles for the Bayesian logistic model. All credit ratings used here are for corporates rated by Moody's over the period 1993 to 2000

13.4 Posterior estimates of the mean, standard deviation, median and quantiles for the Bayesian log-log model. All credit ratings used are for corporates rated by Moody's over the period 1993 to 2000

13.5 Comparison of the DIC measure for the Bayesian logistic regression model and the Bayesian log(-log) regression model. All credit ratings used are for corporates rated by Moody's over the period 1993 to 2000

13.6 Bayesian MCMC posterior estimates from the Standard&Poor's data for the aggregate number of defaults over the horizon 1981-2004. The hierarchical model (13.8) is estimated with the logit, probit, and log(-log) link functions

13.7 Bayesian MCMC posterior estimates of default probabilities p1, ..., p7 for each rating category, obtained from fitting the Bayesian hierarchical model to the Standard&Poor's 1981-2004 aggregated data with the logit, probit, and log-log link functions

13.8 Bayesian estimates for Standard&Poor's yearly rating data on all corporates between 1981-2004; parameters of the time series model (13.15)

13.9 Bayesian MCMC posterior estimates of correlations of probabilities of default, based on the logistic link model, Standard&Poor's yearly rating data on all corporates between 1981-2004

13.10 Bayesian MCMC posterior inference for the Bayesian Panel Count Data model using Standard&Poor's yearly rating data on all corporates between 1981-2004

13.11 Posterior means of parameters b. Their value on the main diagonal is not relevant because of the identifiability constraint

13.12 Posterior medians of transition probabilities using data from Standard&Poor's between 1981-2004

13.13 Maximum likelihood estimators of transition probabilities using data from Standard&Poor's on all corporates between 1981-2004


Chapter 1

Introduction

Why do I consider this book to be relevant to the community of people involved in one way or another with financial modelling? Because the next big crisis already has its seeds planted, and its roots are growing stronger and bigger by the day. There have already been manifestations of model risk that led to substantial losses, and it may sound cliché to say that this is only the tip of the iceberg.

In 1987, Merrill Lynch reported losses of 300 million USD on stripped mortgage-backed securities because of an incorrect pricing model, and five years later, in 1992, J.P. Morgan lost about 200 million USD in the mortgage-backed securities market because of inadequate modelling of prepayments. Bank of Tokyo/Mitsubishi announced in March 1997 that its New York subsidiary dealing in derivatives had incurred an $83 million loss because its internal pricing model overvalued a portfolio of swaps and options on USD interest rates. [Dowd (2002)] pointed out that the loss was caused by wrongly using a one-factor Black-Derman-Toy (BDT) model to trade swaptions. The model was calibrated to market prices of ATM swaptions but used to trade out-of-the-money (OTM) Bermudan swaptions, which was not appropriate. With the benefit of hindsight it is now known that pricing OTM swaptions and Bermudan swaptions requires multi-factor models.

Also in 1997, NatWest Capital Markets reported a £50 million loss because of a mispriced portfolio of German and U.K. interest rate derivatives on the book of a single derivatives trader in London, who fed his own estimates of volatility into a model pricing long-maturity OTC interest rate options. The estimates were high and led to fictitious profits. It is not clear whether the trader simply inflated the volatility estimate or came up with the estimate that was most “convenient”. [Elliott (1997)] pointed out that these losses were directly linked to model risk. [Williams (1999)] remarked that model risk was not included


in standard risk management software and that in 1999 about 5 billion USD of losses were caused by model risk.

The recent advances of algorithmic trading add another dimension to model risk. It is difficult to say what exactly is happening and who is to blame in this new type of superfast trading, most of it being opaque and difficult to control. A Deutsche Bank subsidiary in Japan used some “smart” models to trade electronically that went wild in June 2010, going into an infinite loop and taking out a $183 billion stock position. The thing about computers is that any mistakes are now executed thousands of times faster than before. There is no doubt in my mind that the next big financial crisis will be generated by model risk.

The website gloriamundi.com had accumulated, by the summer of 2014, about 7,500 papers dedicated to financial risk management, covering hundreds of different models and methods to calculate Value-at-Risk (VaR), for example. This flurry of papers was very much the result of an effervescence in the 1990s. Nevertheless, this mountain of research could not stop the Enron disaster and the dotcom bubble of 2002. The next chapter in financial evolution came in the 2000s with an explosion of research focused on credit risk, once the credit markets took off on the back of the credit default swap (CDS) concept. As of July 2014, more than 1,600 credit risk papers were available to download from www.defaultrisk.com, over 250 of which were papers on credit risk models. Add to that the thousands of papers on derivatives pricing and hedging across various asset classes and you get the picture, very much that of a jungle.

This book does not aim to offer a prescriptive science for finance problems. The main purpose is to illustrate pitfalls that may be obscured in the specialised literature but not known to a wider audience. Hence, the focus in this book is on Model Risk in Finance, covering theoretical as well as practical issues related to options pricing and financial engineering, risk management and model calibration, and computational and heuristic methods applied to finance.

Models in general are described through mathematical concepts such as equations, probability distributions, stochastic processes and so on. The model is a simplified version of the complex realities observed in financial markets. Essential to the modelling process is the determination of parameters fixing the coordinates of the evolution of asset prices, hedging ratios, risk measures, performance measures and so on. The uncertainty inherently present in parameter estimation is one major source of what we call model risk. Should the volatility parameter σ be 35% or 25%? Maybe both values are feasible, but one is more likely than the other.


Model risk is more than parameter estimation uncertainty, but at the same time parameter estimation is a procedure performed daily by thousands of financial houses and banks around the world, so the exposure to this type of risk is arguably the highest among the many facets of model risk. Parameter estimation may cause direct losses to a financial investor or institution, as hinted at by some of the examples above, but model risk is much more than the uncertainty in parameter estimates. It also includes model identification and model selection, as well as incompatibilities with known theoretical results or empirical evidence. Model risk has been identified previously in all asset classes: see [Gibson (2000)] and [Morini (2011)] for interest rate products, [Satchell and Christodoulakis (2008)] and [Rösch and Scheule (2010)] for portfolio applications, [Satchell and Christodoulakis (2008)], [Rösch and Scheule (2010)] and [Morini (2011)] for credit products, and [Campolongo et al. (2013)] for asset backed securities. It has also been recognized in relation to measuring market risk; see [Figlewski (2004)], [Escanciano and Olmo (2010)], [Boucher et al. (2014)].

The concepts of risk and uncertainty are intertwined. Since the backbone of this book is quantitative finance, I consider risk as being associated with a given, fully specified set of possible outcomes, the question being with what probability a possible outcome may occur. Uncertainty, on the other hand, is a recognition of the existence of unspecified outcomes that may still occur and with which we have no way of associating a probability. Playing cards or roulette falls in the first category; saying whether there is life on a faraway planet is an example of the latter; and predicting the next type of fish you will encounter when diving deep in the ocean is an example where risk and uncertainty are combined. Likewise, we can make statements about the possible future value of the share price of Apple, but we cannot say very much about the source of the next big crash in financial markets. In other words, the share price of Apple is risky while the source of the next big financial collapse is uncertain. The important distinction between risk and uncertainty goes back to [Knight (1921)], who pointed out that risk stems from situations where we do not know the outcome of a scenario but can accurately measure the probability of occurrence. In contrast, uncertainty is associated with scenarios where it is not possible to know all the information required to determine the scenario probabilities a priori.

With the nascency of modern finance in the 1960s and 1970s, model risk and uncertainty in general have preoccupied researchers in relation to


various problems studied. Thus, early mentions of model risk and uncertainty in finance include [Merton (1969)], [Jorion (1996)], [Derman (1996)], [Crouhy et al. (1998)], [Green and Figlewski (1999)]. Further notable contributions can be found in [Cairns (2000)], [Hull and Suo (2002)], [Brooks and Persand (2002)], [Talay and Zheng (2002)], [Charemza (2002)], [Alexander (2003)], [Cont (2006)], [Kerkhof et al. (2010)], [Boucher et al. (2014)].

There are many previous definitions of model risk. Here are a few from various authors, just to show the breadth of perspectives on model risk in the literature. [Gibson et al. (1999)] state “Model risk results from the inappropriate specification of a theoretical model or the use of an appropriate model but in an inadequate framework or for the wrong purpose.”

while for [McNeil et al. (2005)] model risk can be defined as “the risk that a financial institution incurs losses because its risk-management models are misspecified or because some of the assumptions underlying these models are not met in practice.”

For [Barrieu and Scandolo (2013)] “The hazard of working with a potentially not well-suited model is referred to as model risk”

and [Boucher et al. (2014)] define model risk as “the uncertainty in risk forecasting arising from estimation error and the use of an incorrect model”.

Model risk has also been identified to some extent by the Basel Committee on Banking Supervision in the Basel II framework, see [Basel (2006)] and [Basel (2011)]. Financial institutions ought to gauge their model risk. Furthermore, model validation is one component of the Pillar 1 Minimum Capital Requirements and Pillar 2 Supervisory Review Process. Unfortunately, in the Basel III framework (see [Basel (2010)]) it is stated that there are “a number of qualitative criteria that banks would have to meet before they are permitted to use a models-based approach”. Hence, the “Model validation and backtesting” guidelines, which focus mainly on counterparty credit risk management, allow qualitative or subjective decisions. For example, insurance or reinsurance companies can compute their solvency capital requirement using an internal risk model if it is approved by the supervisory authorities. This can be interpreted in many ways and it allows room for discretion, which may compound model risk rather than control it.


In this book I distinguish between model risk and operational risk, even if the latter has multiple facets and may be bundled together with model risk by some financial operators. One important aspect of operational risk that should not be considered part of model risk is data input error. The Vancouver Stock Exchange started a new index, initialized at the level of 1000.000, in 1982. However, less than two years later it was observed that the index had steadily decreased to about 520, despite the exchange setting records in value and volume, as described in the Wall Street Journal in 1983. Upon further investigation it was revealed that the index, which was updated after every transaction, was recalculated by truncating the result after the third decimal place instead of rounding it. Hence, the correct value of 1098.892 became the published value of 520. Although it was a computational error, this is an example of operational risk after all and not of model risk.
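The mechanism is easy to reproduce. A minimal sketch in Python (the update sizes and counts below are invented for illustration, not the exchange's actual data) shows how truncating rather than rounding at each recalculation builds a steady downward bias:

    import random

    def truncate3(x: float) -> float:
        """Keep only three decimal places, dropping (not rounding) the rest."""
        return int(x * 1000) / 1000.0

    random.seed(1)
    index_truncated = 1000.0
    index_rounded = 1000.0
    for _ in range(1_000_000):             # hypothetical number of recalculations
        change = random.gauss(0.0, 0.005)  # hypothetical small zero-mean update
        index_truncated = truncate3(index_truncated + change)
        index_rounded = round(index_rounded + change, 3)

    # Truncation discards about 0.0005 per update on average, so after a
    # million updates the truncated index sits roughly 500 points below
    # the correctly rounded one, echoing the 520 versus 1098.892 episode.
    print(index_truncated, index_rounded)

The point of the sketch is that each individual truncation is invisible at the quoted precision; only the accumulation over very many updates reveals the error.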


Model Risk in Financial Markets

lated to S is done. It is also possible to work with a discrete time process, S = {St }t∈{t−n ,...,t−1 ,t0 ,t1 ,...,tN } , that is closer to reality to some extent. Historical data lives under the physical measure P while options data lives mainly under a risk-neutral measure Q. Parameter estimation is usually realised under the physical measure P from a historical sample St−n , . . . , St−1 , St0 . Various market risk measures such as value-at-risk or expected shortfall are also calculated in this region. The risk-neutral measure Q covers forward-looking calculations usually associated with contingent claims on S. Thus, in this region we use data for parameter calibration. There is a subtle difference between estimation and calibration. The former is part of the process of model searching, in other words we still do not know the model and tests may be carried out before selecting one or more models. The latter takes place after a model or models have been selected, usually based on external theoretical considerations, and parameters are sought such that some criteria are met. today

RND Q

physical measure P

t−n

...

t− 3

t− 2

t− 1

t0

t1

t2

t3

...

tN = T

Fig. 1.1: Two worlds of Finance through a mathematical eye.

Model risk appears under both P and Q. Backtesting is a very useful exercise that can be applied to get assurance that the models do not contradict reality. However, backtesting is done under P because by definition it requires historical data. On the other hand, model validation is a


more complex process that embeds backtesting but also goes beyond using historical data, and hence it covers both P and Q.

Other books on model risk have been published over the years; see [Gibson (2000)], [Satchell and Christodoulakis (2008)], [Rösch and Scheule (2010)] and [Morini (2011)]. Each offers valuable contributions to the issue of model risk. Nevertheless, the product innovation and associated model development of the last decade have generated even more model risk than was previously recognized. Furthermore, there is currently a rift between academia and the finance industry, which is also noticeable in the books mentioned above. In this book I am trying to bridge this gap and offer examples of model risk from both sides. In my view, model risk is intrinsically built in as soon as we propose models as simplified mechanisms of reality. Furthermore, I also draw attention to the model risk associated with risk measures, something that is usually swept under the carpet. Risk managers and model validators are just as exposed to model risk as financial engineers and quants helping trading desks. My book also departs from previous books on the topic of model risk in that I strongly advocate the use of Bayesian modelling coupled with Markov Chain Monte Carlo techniques for extracting inference on financial markets.

From the outset I recognise that in this book I shall not reveal which model is “the best” for pricing or hedging, for a particular asset class or financial product. Quite the contrary. The main aim is to illustrate, by offering sound formal theoretical arguments but also practical examples, that a lot of uncertainty and risk is inherent in financial modelling. This book presents a wide perspective on model risk related to financial markets, from financial engineering to risk management, from financial mathematics to financial statistics, from theory to practice, from classical concepts to some of the latest concepts being introduced for financial modelling.

The book is aimed at graduate students, researchers, practitioners, senior managers in financial institutions and hedge funds, regulators and risk managers who are keen to understand the pitfalls of financial modelling. Quantitative finance is a relatively new science and much has been written on various directions of research and industry applications. In this book the reader gradually learns to develop a critical view on fundamental theories and new models being proposed. The book is most useful for those looking for a career in model validation, product control and risk management functions primarily, but it is also useful for traders, structured finance analysts and postgraduate researchers.


This book is divided into three main parts¹: one dedicated to the financial engineering and financial mathematics aspects of models and methods, and hence dealing more with pricing; a second part covering the financial statistics side of models, and therefore linked more directly to risk management; and a last part showing some examples discussed at length. Evidently, the book is based on a large set of examples that I have collected over the years from various sources. The results presented in this book are collected mainly from various academic and non-academic sources, but some results are also presented for the first time to an audience. While it would be impossible to cover extensively all asset classes, all financial products, and all theories in finance, I have deliberately covered in my book a wide range of models, results and applications in financial markets, aiming to show that model risk is widely spread and we need to do more to deter its proliferation in the future.

¹ I left out of this book the programming or implementation part associated with model risk. I hope others will complement the discussion in this text with an incisive view on programming issues related to models in finance. The research described in [Aruoba and Fernandez-Villaverde (2014)] shows that the speed of computation can vary widely across different programming languages. This is an area that deserves more future research, particularly with the advancement of algo trading and high frequency finance.

When I was writing this monograph I was fortunate to receive help in various forms from many great people who deserve acknowledgement. First and foremost I am greatly indebted to Stuart Heath, Byron Baldwin and the team at EUREX-Deutsche Börse for their support over the years. Some of the interesting examples in this book were possible only because I could get some very interesting data from EUREX. Many of the ideas contained in this book came to my mind following fruitful discussions with many colleagues and friends working in Stochastic Analysis, Statistics, Operational Research, Financial Mathematics, Risk Management and Financial Econometrics. Hence, I am grateful for various insights to Dan Crisan, Frank Fabozzi, Stuart Hodges and Catalina Stefanescu. This book was started in my study leave term at Kent Business School, University of Kent, and I am grateful for this opportunity. I have also used the book for a course on Model Risk delivered in the summer of 2015 to postgraduate research students at the Swiss Finance Institute, University of Zurich.

Many people helped me revise and improve the initial manuscript. Here I say a big thank you to George Albota, Catalin Cantia, Walter Farkas,


Arturo Leccadito, Ciprian Necula, Natalie Packham, Tommaso Paletta, Ekaterini Panopoulou, Silvia Stanescu, Huamao Wang and Sherry Zheng. I am indebted to several people who helped with various issues related to improving the quality of the manuscript, and wish to express my appreciation to Tamara Galloway, and to Hongyan Li and D. Rajesh Babu for their expertise. Last but not least, I wish to thank my editor-in-chief, Yubing Zhai, who believed in this project from the beginning. Needless to say, all mistakes are mine and I would appreciate it if you would let me know of any, or if you have some comments about the book. My personal e-mail address is [email protected] and my current working address is [email protected].

I am also extremely grateful to my family for accepting my tempestuous moods when writing this book and for the family time I have sacrificed over three years to try and write my first book. My seven-year-old daughter already wants to follow in her father's footsteps; she would like to become an author. She has "models" in mind too, mermaids and fairies, all more exciting and sophisticated stuff...


Chapter 2

Fundamental Relationships

2.1 Introduction

When undergraduate students first come across the subject of Finance, the most fundamental concepts they learn are the present value of discounted cash flows, utility, and investment performance measures such as the Sharpe ratio. Evidently, over time these concepts were expanded upon, so that from net present value one quickly gets to real options, from utility to indifference pricing and from the Sharpe ratio to portfolio optimisation. However, the fundamental concepts are still taught in schools around the world. In this chapter some important concepts, models and theories from classical finance are reviewed and some pitfalls are pointed out. Thus, in Sec. 2.2 it is illustrated that the present value of an asset can be infinite even when the risk-free rate is finite and constant. The examples in Sec. 2.3 show that the expected utility may not be finite. Other sections focus on issues related to the Sharpe ratio.

2.2 Present Value

Present value calculations are fundamental in finance, being applied to fair value calculations of assets, in testing the permanent income hypothesis or in developing inventory models. The present value, or PV, relies on forecasting a sequence of future values of the variable under study at finite or infinite horizons. [Pesaran et al. (2007)] point out that these predictions are linked to uncertainty and may vary widely, depending on the assumptions on the parameters as well as the form and stability properties of the data generating process associated with the cash flow sequence. The following result indicates that when the growth rate is greater than the discount rate


the present value of an infinite cash-flow series can be infinite.

Proposition 2.1. Consider an economy with a riskless asset providing a positive, constant and equal rate of return ρ for any maturity, and a risky asset $\{S_t\}_{t\geq 0}$ with the price at time t given by a rational expectation model based on a cash-flow generating process $\{x_{t+j}\}_{j\in\mathbb{N}}$ that follows the dynamics of a geometric random walk model

$$\Delta \ln x_j = \mu + \sigma \varepsilon_j \qquad (2.1)$$

where μ and σ are given constants applying for any time period, and the errors ε are i.i.d. with zero mean and unit variance. If $M_\varepsilon(\sigma)$ is the moment generating function of $\varepsilon_t$, assuming it does exist, and if $1 + \rho < e^{\mu} M_\varepsilon(\sigma)$, then, for any time t, the present value of the risky asset is infinite.

Proof. The data generating process is in discrete time so it is natural to assume that type of dynamics for all subsequent probability calculations. For any t > 0,

$$S_t = \lim_{N\to\infty} \sum_{j=1}^{N} \frac{1}{(1+\rho)^j}\, E(x_{t+j}|\mathcal{F}_t) \qquad (2.2)$$

Using (2.1), for any given values of the parameters μ, σ, we can deduce that

$$E(x_{t+j}|\mathcal{F}_t; \mu, \sigma) = e^{j\mu}\, [M_\varepsilon(\sigma)]^j\, x_t$$

with $M_\varepsilon(\sigma)$ the moment generating function of the error ε, when this exists. Replacing in (2.2), it can be observed that the limit in (2.2) is finite only when $\frac{1}{1+\rho}\, e^{\mu} M_\varepsilon(\sigma) < 1$.

Notice that the phenomenon occurring in the above example is unrelated to arbitrage; it is a manifestation of an investment assumption scenario.
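As a quick numerical illustration of Proposition 2.1, the following sketch uses hypothetical parameter values with Gaussian errors, for which $M_\varepsilon(\sigma) = e^{\sigma^2/2}$, and shows the truncated sums in (2.2) growing without bound once the divergence condition holds:

```python
import numpy as np

# Hypothetical parameters; with eps ~ N(0,1), M_eps(sigma) = exp(sigma**2 / 2)
mu, sigma, rho, x_t = 0.02, 0.25, 0.04, 1.0
growth = np.exp(mu + sigma**2 / 2)      # e^mu * M_eps(sigma)
print(growth > 1 + rho)                 # True: the condition of Prop. 2.1 holds

for N in (50, 100, 200, 400):
    j = np.arange(1, N + 1)
    partial_pv = x_t * np.sum((growth / (1 + rho)) ** j)
    print(N, round(partial_pv, 1))      # truncated present value sums keep growing
```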

2.3 Constant Relative Risk Aversion Utility

Consider that for a random consumption x an economic agent has the expected Constant Relative Risk Aversion (CRRA) utility function

$$E(U) = \begin{cases} (1-\alpha)^{-1}\, E[x^{1-\alpha}], & \alpha > 0,\ \alpha \neq 1; \\ E(\log(x)), & \alpha = 1. \end{cases} \qquad (2.3)$$


where α is the risk aversion parameter. If F is the cumulative distribution function of z = log(x), then whenever the moment generating function $M_z(t)$ of z exists at t = 1 − α it follows that

$$E(U) = (1-\alpha)^{-1} M_z(1-\alpha)$$

Therefore, for α ≠ 1, if $M_z(t)$ is not defined for t = 1 − α then the expected utility E(U) does not exist. [Geweke (2001)] showed that the expected utility may not exist in the case of power utility functions associated with random consumption, and he provided examples when the distribution of log consumption is either known, i.e. it has fixed known parameters, or unknown, that is, when the parameters have their own distributions. The next example shows that there are known distributions for the log of random consumption such that the expected utility function does not exist.

Proposition 2.2. If the log of the random consumption z is non-standardized Student t(μ, σ²; n) distributed then for any α ≠ 1 the expected CRRA utility is not finite.

Proof. When α > 1,

$$\lim_{c\to-\infty} (1-\alpha)^{-1} \int_c^0 \exp[(1-\alpha)z]\, dF(z) = -\infty$$

and when α < 1,

$$\lim_{c\to\infty} (1-\alpha)^{-1} \int_0^c \exp[(1-\alpha)z]\, dF(z) = +\infty$$
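A minimal Monte Carlo sketch of this failure, with hypothetical values for α and the degrees of freedom: because the Student-t distribution has no moment generating function, the sample analogue of $(1-\alpha)^{-1}E[\exp((1-\alpha)z)]$ never settles down and its magnitude keeps growing with the sample size.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, nu = 3.0, 5.0                       # hypothetical risk aversion and t degrees of freedom

for n in (10**4, 10**5, 10**6, 10**7):
    z = rng.standard_t(nu, size=n)         # log-consumption draws
    est = np.mean(np.exp((1 - alpha) * z)) / (1 - alpha)
    print(n, est)                          # no convergence: the target expectation is infinite
```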

This kind of example may also occur in more complex situations such as Bayesian learning. Assume now that F ≡ F(z; θ) and θ ∼ G(θ). When α = 1, if $E(z) = \int E[z|\theta]\, dG(\theta)$ exists then E(U) = E(z). For α ≠ 1,

$$E(U) = \frac{1}{1-\alpha}\, E_\theta[M_z(1-\alpha;\theta)] = \frac{1}{1-\alpha} \int M_z(1-\alpha;\theta)\, dG(\theta)$$

if $M_z(1-\alpha;\theta)$ exists for all θ and is finitely integrable with respect to the distribution of θ. However, Bayesian learning does not improve the situation, as the following example from [Geweke (2001)] shows.

Proposition 2.3. Suppose that the random consumption of an agent is log normal, z ∼ N(μ, σ²). Furthermore, assume that σ² has the inverted gamma distribution $\nu_0 s_0^2/\sigma^2 \sim \chi^2(\nu_0)$, and $\mu|\sigma^2 \sim N(\mu_0, q_0\sigma^2)$, where $\nu_0, s_0, q_0$ are positive numbers and $\mu_0 \in \mathbb{R}$, all given. Then for α ≠ 1 the expected utility fails to exist.


Proof. If α = 1 it is easy to see that $E(U) = \mu_0$, as long as $\nu_0 > 1$ to avoid degeneracy. Now consider α ≠ 1. First, let us assume that σ² is known. Then

$$E(U) = \frac{1}{1-\alpha} \exp\left\{(1-\alpha)\mu_0 + [(1-\alpha)^2(1+q_0)]\sigma^2/2\right\}$$

When $\nu_0 s_0^2/\sigma^2 \sim \chi^2(\nu_0)$, it follows that the expectation

$$E[\exp(t\sigma^2)] = \int_0^\infty \exp(t\sigma^2)\, p(\sigma^2)\, d\sigma^2$$

exists if and only if t ≤ 0. This implies that

$$E(U) = \frac{1}{1-\alpha}\, E_{\sigma^2}\left[\exp\left\{(1-\alpha)\mu_0 + [(1-\alpha)^2(1+q_0)]\sigma^2/2\right\}\right]$$

fails to exist.

2.4 Risk versus Return: The Sharpe Ratio

An important measure to gauge the relative importance of an investment against its risk is the Sharpe ratio, defined as the expected value of excess return divided by the standard deviation of excess return:

$$SR = \frac{E(R_P) - r_f}{\sigma_P} \qquad (2.4)$$

where $E(R_P)$ is the portfolio expected return, $r_f$ is the risk-free rate and $\sigma_P$ is the expected standard deviation, over the same period. The Sharpe ratio is used intensively in finance and the investments industry to rank portfolios and thus identify investment opportunities.

2.4.1 Issues related to non-normality

One would expect that, ceteris paribus, investments with larger positive excess returns would be preferable. The following example proves otherwise¹.

Proposition 2.4. It is possible to have an investment asset Y that performs no worse than another investment asset X in all states, but the Sharpe ratio of Y to be less than the Sharpe ratio of X.

Proof. Consider the assets with the excess returns described in Table 2.1.

¹ An insightful discussion of this phenomenon as well as a technical solution are offered in [Cerny (2009)].


Table 2.1: Example of good asset with bad Sharpe ratio.

    Probability                  0.3     0.5     0.2
    excess return of asset X     -2%      3%      4%
    excess return of asset Y     -2%      3%     40%

Table 2.2: Comparison of Sharpe ratios when the risk-free rate is r_f = 5%.

                               Portfolio A    Portfolio B
    Expected return E(R_P)         4.5%           4.5%
    σ_P                             20%            14%
    Sharpe ratio                 -0.025        -0.0357

It is easy to see that E(X) = 0.017 and $\sigma_X$ = 0.024, so that the Sharpe ratio for asset X is equal to 0.693. Similarly E(Y) = 0.089 and $\sigma_Y$ = 0.157, so that the Sharpe ratio for asset Y is equal to 0.567. Hence, using the Sharpe ratio as an indicator of investment profitability leads to the conclusion that asset Y is less attractive than asset X, although clearly asset Y performs at least as well as asset X. The explanation behind this example is non-Gaussianity, or more exactly the difference in (positive) skewness, since the distribution which is more skewed to the right has a higher standard deviation.
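The numbers in the proof are easy to reproduce; a short sketch:

```python
import numpy as np

p = np.array([0.3, 0.5, 0.2])              # state probabilities from Table 2.1
x = np.array([-0.02, 0.03, 0.04])          # excess returns of asset X
y = np.array([-0.02, 0.03, 0.40])          # excess returns of asset Y

def sharpe(excess):
    mean = p @ excess
    std = np.sqrt(p @ (excess - mean) ** 2)
    return mean / std

print(sharpe(x), sharpe(y))   # ~0.693 vs ~0.567, although Y dominates X state by state
```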

2.4.2 The Sharpe ratio and negative returns

The Sharpe ratio favors portfolios that have large excess returns and small risk as represented by the standard deviation. However, this idea works when the excess returns are positive. The following examples show that investors may draw the wrong conclusions when using the Sharpe ratio. In the first example, the expected returns are identical but less than the risk-free rate. Considering the two portfolios A and B with summary data in Table 2.2, we can see that the portfolio with the higher Sharpe ratio is portfolio A. However, portfolio B has the same expected return but lower risk.

One can argue that this situation is caused by negative excess returns, but when this is the case the investor could simply move all the money into cash. However, there are other situations when changing the asset class is not as simple as it may seem. For example, consider that portfolios A and


Table 2.3: Comparison of Sharpe ratios when the risk-free rate is r_f = 5%.

                               Portfolio X    Portfolio Y
    Expected return E(R_P)          -5%            -4%
    σ_P                             20%            15%
    Sharpe ratio                   -0.5           -0.6

B are directly linked to two projects in mining companies in different parts of the world, and the investor knows that what differentiates the projects is the initial stage of ramping up mining to full production. Once this is done, it is expected that both mines produce the same quantity of a commodity that is then subsequently sold internationally for the same price. Hence, the two projects are differentiated by the risk of getting the mines into full production from the initial investment and by the costs of doing so, which differ from country to country. Because of the initial exploitation set-up costs, the expected returns until the horizon when production is set up and fully operational are less than the risk-free rate.

We can also set up an example involving different negative expected returns and different risks, but leading to a similar problem. The portfolios described in Table 2.3 indicate that portfolio X has a higher Sharpe ratio than portfolio Y. However, the latter has higher return and lower risk.

While these examples have more of an academic flavor they point out real problems, particularly in the aftermath of the subprime crisis. Not only is it possible to experience periods when many pre-existing portfolios are expected to produce negative returns, due to systemic risk for example, but also the risk-free rate may not be risk-free anymore. The last point is related to the downgrade of sovereign bonds by major rating agencies, so for an international investor it is difficult to see what asset will yield a truly risk-free rate.

2.5 APT

[Ross (1976)] provided an example of a market where an equilibrium pricing formula does not converge to the formula given by arbitrage pricing theory as the number of assets spanning the market increases. The set-up is quite simple, assuming the existence of a riskless asset with expected rate of return $r_f$ and a family of risky assets with i.i.d. Gaussian distributed returns

$$R_i = \rho_i + \varepsilon_i \qquad (2.5)$$


where $\rho_i$ is the expected ex-ante rate of return corresponding to asset i, with $E(\varepsilon_i) = 0$ and $E(\varepsilon_i^2) = \sigma^2$, for all i ∈ {1, ..., n}. In equilibrium, the no-arbitrage theory implies that there is no more idiosyncratic risk and therefore

$$\rho_i \approx r_f, \quad \text{for all } i \in \{1,\dots,n\}. \qquad (2.6)$$

Proposition 2.5. Consider an economy where the representative market agent has a von Neumann-Morgenstern utility function of the constant absolute risk aversion (CARA) form

$$U(z) = -\exp(-Az) \qquad (2.7)$$

where A is a positive constant real number. Assume that on this market there is exactly one unit of the riskless asset and each risky asset provides a random numeraire amount $c_i$ with mean $m_i$ and variance $s^2$, for all i ∈ {1, ..., n}. Then the model described in equation (2.5) will be in contradiction with the no-arbitrage principle.

Proof. Considering the obvious choice of the riskless asset as the numeraire, assume that the agent's wealth is W and that the portfolio Π of the risky assets is identified by $\pi_i$, the proportion of wealth invested in the i-th risky asset. Then, if $R = [R_1, \dots, R_n]$ and $\delta = [\rho_1, \dots, \rho_n]$,

$$E\{U[W(r_f + \Pi'[R - r_f 1_n])]\} = -\exp(-AWr_f)\, E\{\exp(-AW\Pi'[R - r_f 1_n])\} = -\exp(-AWr_f)\, \exp(-AW\Pi'[\delta - r_f 1_n]) \times \exp\left(\frac{\sigma^2 (AW)^2}{2}\,\Pi'\Pi\right)$$

The maximum is then found as the solution to the system of equations

$$\sigma^2 (AW)\Pi_i = \rho_i - r_f \qquad (2.8)$$

Walras' law implies that

$$W = \sum_{i=1}^n \Pi_i W + 1 = \frac{1}{A\sigma^2}\sum_{i=1}^n (\rho_i - r_f) + 1 \qquad (2.9)$$

For asset i, if $p_i$ is the current price in the chosen numeraire then $R_i = c_i/p_i$. Assuming that all risky assets are in unit supply leads to $p_i = \Pi_i W$ and therefore $W = \sum_{i=1}^n p_i + 1$.


Solving equation (2.8) for the unknowns $p_i$ gives

$$p_i = \frac{1}{r_f}\,[m_i - As^2]$$

so the expected returns are calculated as

$$\rho_i \equiv \frac{m_i}{p_i} = r_f\, \frac{m_i}{m_i - As^2} \qquad (2.10)$$

Remark that $\rho_i$ does not change when n changes, in contradiction with the no-arbitrage relationship (2.6). In addition to breaking the sought no-arbitrage condition, if $m_i > As^2$ it follows that the wealth W and the relative risk aversion AW increase when n increases.

[Ross (1976)] provided a solution to the above counterexample, suggesting that it is sufficient to assume that there is an increasing number of trades but a fixed number of assets.

2.6 Notes and Summary

When reporting large Sharpe ratios the analyst should bear in mind that these could be just the outcome of non-Gaussian returns, or simply arise from trading strategies having ex ante high negative skewness. Strategies based on options are a typical example. In this case, Sharpe ratios should not be used. Dealing with these non-normalities is the subject of future research. Even more worrying, Sharpe ratios are heavily influenced by the volatility of the strategy, but this may not give a true measure of the embedded risk.

In practice, when doing backtesting on performance measures of trading strategies, it is common to adjust the reported Sharpe ratios by roughly 50%. The main reason for this is data mining and data snooping. [Harvey and Liu (2014)] advocate against the standard 50% haircut for Sharpe ratios, and they develop a multiple testing technology that provides better haircut values, with the highest Sharpe ratios being penalized more softly while the marginal Sharpe ratios are penalized a lot more heavily.

Over the years some clear problems emerged regarding the applicability of the Sharpe ratio as a performance measure. The most important, outlined in [Dybvig and Ingersoll (1982)], is that this measure can be unreliable in the case of a portfolio with nonlinear payoffs. [Cerny (2009)] extended the


definition of the Sharpe ratio from quadratic utility to an entire family of CRRA (Constant Relative Risk Aversion) utility functions.

There could be problems arising even in relation to fundamental ideas and theories. Here are some of the highlights of this chapter.

(1) [Pesaran et al. (2007)] showed that stock prices can be quite sensitive to the assumptions on the uncertainty and instability of the parameters of the dividend process. They point out that in order to understand the dynamics of stock prices one should deal with the uncertainty surrounding the underlying fundamental process.
(2) Moreover, [Pesaran et al. (2007)] argue that the impact of the uncertainty about the growth rate in different market states on the present valuations is directly linked to the finance literature, see [Timmermann (1993)], on how investors learning about the dividend growth process can generate the excess volatility behaviour patterns observed for asset prices.
(3) The infinite horizon is not essential, since even with a finite horizon present values can be very sensitive to small changes in the estimated growth rate, particularly as the growth rate comes closer to the discount rate.
(4) The Sharpe ratio does not work in standard form as a portfolio selection measure when it takes negative values, that is, when expected returns are lower than the risk-free rate.
(5) Equilibrium pricing may be in contradiction with the no-arbitrage principle.


Chapter 3

Model Risk in Interest Rate Modelling

3.1 Introduction

We shall denote by p(t, T) the price at time t of a zero-coupon bond paying 1 dollar at maturity T. Then the continuously compounded yield to maturity at time t on this bond is

$$R(t,T) = -\frac{\ln p(t,T)}{T-t}. \qquad (3.1)$$

For $t < T_1 < T_2$ we define the forward LIBOR rate as the interest rate $L(t; T_1, T_2)$ such that

$$L(t; T_1, T_2) = \frac{1}{T_2 - T_1}\, \frac{p(t,T_1) - p(t,T_2)}{p(t,T_2)} \qquad (3.2)$$

Similarly, one can define the continuously compounded forward yield $\rho(t; T_1, T_2)$ as

$$\rho(t; T_1, T_2) = -\frac{\ln p(t,T_2) - \ln p(t,T_1)}{T_2 - T_1} \qquad (3.3)$$

The instantaneous forward rate at time t for future maturity T is defined as the interest rate given by

$$\rho(t,T) = -\frac{1}{p(t,T)}\, \frac{\partial p(t,T)}{\partial T} \qquad (3.4)$$

It is evident then that

$$R(t,T) = \frac{1}{T-t} \int_t^T \rho(t,u)\, du. \qquad (3.5)$$

Moreover, when the limits exist as T ↓ t, we can define the instantaneous interest rate, or short rate, as

$$r(t) \equiv R(t,t) = \rho(t,t) \qquad (3.6)$$

The bond market models considered in this chapter are defined on a filtered probability space (Ω, F; F; Q) where $\mathbb{F} = \{\mathcal{F}_t\}_{t\geq 0}$. We also assume that the filtration $\mathbb{F}$ is the internal one generated by W, and that Q is the martingale pricing measure.
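To make the definitions concrete, here is a small sketch that evaluates (3.1)–(3.4) on a toy discount curve; the functional form of p(t, T) below is purely illustrative, not taken from the text.

```python
import numpy as np

def p(t, T):
    """Hypothetical smooth discount curve, for illustration only."""
    m = T - t
    return np.exp(-0.03 * m - 0.001 * m**2)

t, T1, T2 = 0.0, 1.0, 2.0
R = -np.log(p(t, T2)) / (T2 - t)                              # (3.1) yield to maturity
L = (p(t, T1) - p(t, T2)) / ((T2 - T1) * p(t, T2))            # (3.2) forward LIBOR rate
rho_fwd = -(np.log(p(t, T2)) - np.log(p(t, T1))) / (T2 - T1)  # (3.3) forward yield
h = 1e-6
rho_inst = -(p(t, T1 + h) - p(t, T1)) / (h * p(t, T1))        # (3.4) instantaneous forward
print(R, L, rho_fwd, rho_inst)
```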


3.2 Short Rate Models

Short rate models are still used for simulating future interest rates embedded in cash-flow analysis. The Vasicek and CIR models are two of the best-known and most widely used short rate models. Here we highlight the differences between the interest rate paths that the two can generate, after calibration on the same data set. [Vasicek (1977)] developed a model for the risk-free rate of interest $\{r_t\}_{t\geq 0}$, given by the following continuous-time SDE

$$dr_t = k(b - r_t)\,dt + \sigma\, dW_t \qquad (3.7)$$

where k, b, σ > 0. The model is employed usually under a risk-neutral measure Q, but there is no reason why the model given by (3.7) cannot represent a one-factor model with state $r_t$ under the physical measure. Hence, $r_t$ can be seen as either a short rate or a short-term interest rate. The difference is slightly subtle, the first one being unobservable whereas the latter is observable, such as 3-month Libor rates.

This was the first important continuous-time model that is different from GBM. As is well known, b is interpreted as the long-run mean risk-free rate¹, k represents the speed of mean reversion to b and σ is the local volatility parameter. It is not difficult to show that the conditional distribution of $r_{t+u}$ given the value of $r_t$, where u > 0, is Gaussian with

$$E[r_{t+u}|r_t] = b + (r_t - b)e^{-ku}, \qquad \mathrm{var}[r_{t+u}|r_t] = \sigma^2\, \frac{1 - e^{-2ku}}{2k}$$

Hence, the long-term standard deviation of $r_t$ is $\sigma/\sqrt{2k}$. Thus, a major difference between a mean-reverting model like Vasicek and a GBM model is that for the Vasicek model the variance increases with time but converges asymptotically to a constant level, while for GBM the variance increases proportionally with time indefinitely.

¹ It can be proved that $\lim_{t\to\infty} E[r_t] = b$.

The great appeal of the Vasicek model is that it gives directly the formula for zero-coupon bond prices at time t for maturity T:

$$p(t,T) = \exp[A(t,T) - B(t,T)\, r_t]$$

where

$$B(t,T) = \frac{1 - e^{-k(T-t)}}{k}$$

and

$$A(t,T) = [B(t,T) - (T-t)]\left(b - \frac{\sigma^2}{2k^2}\right) - \frac{\sigma^2 B(t,T)^2}{4k}.$$
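The analytical bond price can be sanity-checked against a Monte Carlo estimate of $E[\exp(-\int_0^T r_s\, ds)]$, simulating r with its exact Gaussian transition. A sketch, assuming the illustrative parameter values used for the density comparison later in this section:

```python
import numpy as np

rng = np.random.default_rng(1)
k, b, sigma, r0, T = 0.0151, 0.04, 0.02, 0.03, 5.0

# Analytical Vasicek zero-coupon bond price
B = (1 - np.exp(-k * T)) / k
A = (B - T) * (b - sigma**2 / (2 * k**2)) - sigma**2 * B**2 / (4 * k)
p_analytic = np.exp(A - B * r0)

# Monte Carlo: exact Gaussian stepping, left-point quadrature of the rate path
n_paths, n_steps = 200_000, 500
dt = T / n_steps
r = np.full(n_paths, r0)
integral = np.zeros(n_paths)
for _ in range(n_steps):
    integral += r * dt
    m = b + (r - b) * np.exp(-k * dt)
    s = sigma * np.sqrt((1 - np.exp(-2 * k * dt)) / (2 * k))
    r = m + s * rng.standard_normal(n_paths)

print(p_analytic, np.exp(-integral).mean())   # the two prices should agree closely
```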


Once again taking advantage of the Gaussian terminal distribution, it is easy to determine the price of a European call, respectively put, option on a zero-coupon bond maturing at time T* > T, with strike K and option maturity T. The formula under the Vasicek model for the call is

$$Call_t = p(t,T^*)\,\Phi(d_1) - K\, p(t,T)\,\Phi(d_2) \qquad (3.8)$$

where

$$d_1 = \frac{1}{\sigma^*}\,\ln\!\left(\frac{p(t,T^*)}{K\, p(t,T)}\right) + \frac{\sigma^*}{2}, \qquad d_2 = d_1 - \sigma^*$$

and where we denote $\sigma^* = \frac{\sigma}{k}\left[1 - e^{-k(T^*-T)}\right]\sqrt{\frac{1 - e^{-2k(T-t)}}{2k}}$.

One major shortcoming of the Vasicek model is that the rate $r_t$ can become negative, depending on the parameters' values, with quite significant probabilities. This issue was circumvented by the Cox, Ingersoll and Ross (CIR) model given by

$$dr_t = k(b - r_t)\,dt + \sigma\sqrt{r_t}\, dW_t \qquad (3.9)$$

where k, b, σ > 0. As with the Vasicek model, the CIR model produces analytical formulae for the zero-coupon bond prices:

$$p(t,T) = \exp[a(T-t) - b(T-t)\, r_t] \qquad (3.10)$$

where, with $\gamma = \sqrt{k^2 + 2\sigma^2}$,

$$b(u) = \frac{2(e^{\gamma u} - 1)}{(\gamma + k)(e^{\gamma u} - 1) + 2\gamma}, \qquad a(u) = \frac{2kb}{\sigma^2}\,\ln\!\left(\frac{2\gamma e^{(\gamma+k)u/2}}{(\gamma + k)(e^{\gamma u} - 1) + 2\gamma}\right).$$

As opposed to Vasicek, we are out of the Gaussian framework here. If $q = \frac{\sigma^2(1 - e^{-kT})}{4k}$, it can be proved that, conditional on $r_0$, the distribution of $r_T/q$ is a non-central chi-squared distribution with $d = \frac{4kb}{\sigma^2}$ degrees of freedom and non-centrality parameter $\alpha = \frac{4kr_0}{\sigma^2(e^{kT} - 1)}$. This result allows the derivation of a closed-form formula for the prices of European call options on zero-coupon bonds with maturity T*, exercise date T and strike price K. The formula is given by

$$Call_t = p(0,T^*)\,\chi^2(d, \alpha_1; v_1) - K\, p(0,T)\,\chi^2(d, \alpha_2; v_2) \qquad (3.11)$$

where $\chi^2(d, \alpha; v)$ is the cumulative distribution function of the non-central chi-squared distribution with d degrees of freedom and non-centrality


parameter α, and

$$d = \frac{4kb}{\sigma^2}, \qquad \gamma = \sqrt{k^2 + 2\sigma^2}$$

$$\alpha_1 = \frac{8\gamma^2 e^{\gamma T} r_t}{\sigma^2(e^{\gamma T}-1)\left(2\gamma + (\gamma + k + \sigma^2 b(T^*-T))(e^{\gamma T}-1)\right)}$$

$$\alpha_2 = \frac{8\gamma^2 e^{\gamma T} r_t}{\sigma^2(e^{\gamma T}-1)\left(2\gamma + (\gamma + k)(e^{\gamma T}-1)\right)}$$

$$\delta = \frac{a(T^*-T) - \ln K}{b(T^*-T)}$$

$$v_1 = \frac{2\delta\left[2\gamma + (\gamma + k + \sigma^2 b(T^*-T))(e^{\gamma T}-1)\right]}{\sigma^2(e^{\gamma T}-1)}$$

$$v_2 = \frac{2\delta\left[2\gamma + (\gamma + k)(e^{\gamma T}-1)\right]}{\sigma^2(e^{\gamma T}-1)}$$

The great advantage of the CIR model is that if $r_0 > 0$ and $2kb \geq \sigma^2$ then $r_t > 0$ almost surely. However, please note that when $2kb < \sigma^2$ there is a time t such that $r_t = 0$ almost surely, so the strict positivity is lost. Moreover, looking at the zero-coupon bond prices given by (3.10) it is easy to see that

$$p(T, T^*) < \exp\{a(T^*-T)\}$$

and if it also happens that $a(T^*-T) < 0$ at the maturity T of the option, then we get that, for an exercise price K close enough to 1,

$$p(T, T^*) < \exp\{a(T^*-T)\} < K < 1$$

so exercise is impossible and the call option price is automatically equal to zero! The Vasicek model does not behave well in the proximity of exercise price 1 either, since it will provide a positive call option price for K = 1. This option price inflation is caused by the fact that $r_t$ can occasionally be negative under the Vasicek model.
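The Vasicek side of this remark is easy to check numerically. A sketch implementing (3.8), with the illustrative Vasicek parameters used below for the density comparison: at strike K = 1 the option can only pay off if the bond price exceeds par, which essentially requires the rate to go negative, yet the model charges a strictly positive price.

```python
import numpy as np
from scipy.stats import norm

def vasicek_zcb(r, tau, k, b, sigma):
    """Vasicek zero-coupon bond price for time to maturity tau."""
    B = (1 - np.exp(-k * tau)) / k
    A = (B - tau) * (b - sigma**2 / (2 * k**2)) - sigma**2 * B**2 / (4 * k)
    return np.exp(A - B * r)

def vasicek_call(r, T, T_star, K, k, b, sigma):
    """European call (3.8) with exercise date T on a bond maturing at T_star."""
    p_T = vasicek_zcb(r, T, k, b, sigma)
    p_Ts = vasicek_zcb(r, T_star, k, b, sigma)
    sig_star = (sigma / k) * (1 - np.exp(-k * (T_star - T))) \
               * np.sqrt((1 - np.exp(-2 * k * T)) / (2 * k))
    d1 = np.log(p_Ts / (K * p_T)) / sig_star + sig_star / 2
    d2 = d1 - sig_star
    return p_Ts * norm.cdf(d1) - K * p_T * norm.cdf(d2)

# Strictly positive price for a claim that pays only when p(T, T*) > 1,
# i.e. only when the Vasicek rate has fallen below (approximately) zero
print(vasicek_call(r=0.03, T=1.0, T_star=2.0, K=1.0, k=0.0151, b=0.04, sigma=0.02))
```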


[Fig. 3.1: Comparison of probability density functions under the Vasicek and CIR models for the value of the rate r at T = 10. Panels show the density of r(10) for r(0) = 0.01, 0.05 and 0.10.]

Taking advantage of the analytical formulae available, we can plot the probability densities of the interest rate r under each model at some horizon T = 10. In order to have a meaningful comparison we need the parameters of the two models to be calibrated over the same set of data. Hence the parameter estimates for the Vasicek model will likely be different from the parameters for the CIR model. This point, and an ad-hoc solution to get equivalent sets of parameters for the two short rate models, has been discussed in [Cairns (2004)]. As an example I consider below the values $\sigma_{Vasicek} = 0.02$, $b_{Vasicek} = 0.04$, $k_{Vasicek} = 0.0151$. Then for CIR we get $\sigma_{CIR} = 0.10$, $b_{CIR} = 0.04$, $k_{CIR} = 0.018$. The densities shown in Fig. 3.1 were calculated for different initial values $r_0$. It is clear that Vasicek gives a symmetric density, taking possible negative values, having the same dispersion irrespective of the value of $r_0$ and only shifting to the right as $r_0$ increases. On the contrary, the CIR model produces a skewed distribution, taking only positive values and getting a longer right tail as $r_0$ increases.

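A sketch of how the densities in Figs. 3.1–3.3 can be computed: the Vasicek density is Gaussian with the conditional moments given earlier, while the CIR density follows from the non-central chi-squared result above (here using scipy for the distributions).

```python
import numpy as np
from scipy.stats import norm, ncx2

kV, bV, sV = 0.0151, 0.04, 0.02      # Vasicek parameters from the text
kC, bC, sC = 0.018, 0.04, 0.10       # CIR parameters from the text
r0, T = 0.05, 10.0

# Vasicek: r_T | r_0 is Gaussian
m = bV + (r0 - bV) * np.exp(-kV * T)
v = sV**2 * (1 - np.exp(-2 * kV * T)) / (2 * kV)
r = np.linspace(-0.2, 0.6, 801)
pdf_vasicek = norm.pdf(r, loc=m, scale=np.sqrt(v))

# CIR: r_T / q | r_0 is non-central chi-squared; density of r_T by change of variables
q = sC**2 * (1 - np.exp(-kC * T)) / (4 * kC)
d = 4 * kC * bC / sC**2
alpha = 4 * kC * r0 / (sC**2 * (np.exp(kC * T) - 1))
r_pos = np.linspace(1e-4, 0.6, 600)
pdf_cir = ncx2.pdf(r_pos / q, df=d, nc=alpha) / q

# Probability mass that Vasicek assigns to negative rates at T
print(norm.cdf(0.0, loc=m, scale=np.sqrt(v)))
```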

[Fig. 3.2: Comparison of probability density functions under the Vasicek and CIR models for the value of the rate r at T = 5. Panels show the density of r(5) for r(0) = 0.01, 0.05 and 0.10.]

What happens for shorter or longer maturities than T = 10? We repeat the calculation of the densities under the two models for the same three initial values of $r_0$ and for T = 5 and T = 20, respectively. The graphs in Fig. 3.2 depict the first set of densities. Similar comments can be made as before, but now we can see that the two models are getting a lot closer together. On the other hand, looking at the graphs in Fig. 3.3 where T = 20, it is evident that the two models have very different distributions, in spite of having almost identical means.

[Fig. 3.3: Comparison of probability density functions under the Vasicek and CIR models for the value of the rate r at T = 20. Panels show the density of r(20) for r(0) = 0.01, 0.05 and 0.10.]

In fixed income markets where cash-flow models are employed for risk management policies, it is quite important to be able to simulate a large batch of paths that can be used for what-if scenario analysis and for testing whether the current portfolio can sustain eventual losses due to the combination of market factors, including interest rate changes along the path. For simulation purposes we shall work with the discrete-time versions of

Fig. 3.2: Comparison of probability density functions under the Vasicek and CIR models for the value rate r at T = 5. What happens for shorter or longer maturities than T = 10? We repeat the calculation of the densities under the two models for the same three different values of r0 and for T = 5 and T = 20, respectively. The graphs in Fig. 3.2 depict the first set of densities. Similar comments can be made as before but now we can see that the two models are getting a lot closer together. On the other hand, looking at the graphs in Fig. 3.3 where T = 20 it is evident that the two models have very different distributions, in spite of having almost identical means. In fixed income markets where cash-flow models are employed for risk management policies it is quite important to be able to simulate a large batch of paths that can be used for what-if scenario analysis and testing whether the current portfolio can sustain eventual losses due to the combination of market factors including interest rate changes along the path. For simulation purposes we shall work with the discrete-time versions of

page 26

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Interest Rate Modelling 18

CIR Vasicek

18

16

16

Probability density

14 12 10 8 6 4 2 0 −0.2

0

0.2

0.4

r(20) when r(0)=0.01

0.6

14

12

12

8 6

10 8 6

4

4

2

2

0 −0.2

0

0.2

0.4

r(20) when r(0)=0.05

0.6

CIR Vasicek

16

14

10

27

18

CIR Vasicek

Probability density

20

Probability density

April 28, 2015

0 −0.2

0

0.2

0.4

r(20) when r(0)=0.10

0.6

Fig. 3.3: Comparison of probability density functions under the Vasicek and CIR models for the value rate r at T = 20.

the two models. Thus, for Vasicek the evolution equation² is

$$\Delta r_t = k(b - r_t)\Delta t + \sigma\, \varepsilon_t \sqrt{\Delta t} \qquad (3.12)$$

while for the CIR model the corresponding equation is

$$\Delta r_t = k(b - r_t)\Delta t + \sigma \sqrt{r_t}\, \varepsilon_t \sqrt{\Delta t} \qquad (3.13)$$

where $\varepsilon_t \sim N(0,1)$ for all t. For application purposes we consider the initial rate to be $r_0$ = 3%, the simulation horizon to be T = 1 year with a discrete time step Δt = 0.004, that is daily, the speed of mean reversion k = 0.07, the long-run level b = 2.50% and the volatility parameter σ = 2.25%. Using this data


information on the parameters we can simulate paths in a spreadsheet from both data generating processes. An interesting question here is whether the regulators should ask investment banks to demonstrate that they use the same interest rate models for generating future paths as they use for pricing the assets they have on their book. After all, in a simplified way, if the bank is using the Vasicek model, say, for showing that it passes cash-flow risk management tests, why should it sell an exotic interest rate option with the price coming from a three-factor interest rate model? Interest rate models give results that can be very different in many ways. They should not be used selectively depending on the task in hand.

The graphs in Figs. 3.4(a)–3.5(c) describe a few possible examples of such paths. The paths illustrated in Fig. 3.4(a) are as expected, going up and down around the long-run level. However, Fig. 3.4(b) shows an example where the paths can stay entirely above the long-run mean for the horizon of the simulation, while Fig. 3.4(c) shows a down-trending scenario. One of the well-known problems with the Vasicek model is the fact that it can generate negative short rates. In Fig. 3.5(a) we can see such an example where the Vasicek path goes slightly negative but then recovers. Figure 3.5(b) shows that once in the negative domain the Vasicek model path may not recover to the positive range. Finally, in Fig. 3.5(c) we observe that even the CIR model can generate paths that go very close to zero, although it is proved theoretically that they stay positive.

² An exact discretization can be obtained using the analytical formulae above, but for small Δt the Euler-Maruyama discretization used here works fine. I have employed the Euler-Maruyama scheme because it is easier to compare the two models under the same discretization scheme and also because later in the book I will highlight problems that can appear with this approach.
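A minimal sketch of the path simulation with these parameters, using the Euler-Maruyama schemes (3.12)–(3.13); note that truncating r at zero inside the CIR square root is one common ad-hoc fix, and is itself a source of model risk:

```python
import numpy as np

rng = np.random.default_rng(42)

r0, T, dt = 0.03, 1.0, 0.004          # parameters from the text
k, b, sigma = 0.07, 0.025, 0.0225
n_steps, n_paths = int(T / dt), 10_000

vas = np.full(n_paths, r0)
cir = np.full(n_paths, r0)
vas_min, cir_min = vas.copy(), cir.copy()
for _ in range(n_steps):
    eps = rng.standard_normal(n_paths)            # common shocks for both models
    vas += k * (b - vas) * dt + sigma * eps * np.sqrt(dt)
    # the sqrt in (3.13) needs r >= 0; truncation at zero keeps the Euler step real
    cir += k * (b - cir) * dt + sigma * np.sqrt(np.maximum(cir, 0.0)) * eps * np.sqrt(dt)
    vas_min = np.minimum(vas_min, vas)
    cir_min = np.minimum(cir_min, cir)

print("share of Vasicek paths that went negative:", np.mean(vas_min < 0))
print("lowest CIR path value:", cir_min.min())
```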


[Fig. 3.4: First comparison of simulated paths for the Vasicek model and the CIR model having the same parameters. Each panel plots simulated short-term interest rates over 256 daily steps against the equilibrium line b. Panels: (a) expected behaviour; (b) all rates above the long-run level; (c) downward trend.]


[Fig. 3.5: Second comparison of simulated paths for the Vasicek model and the CIR model having the same parameters. Panels: (a) Vasicek goes negative and then recovers; (b) Vasicek goes negative and does not recover; (c) Vasicek negative and CIR goes to zero.]


3.3 Theory of Interest Rate Term Structure

3.3.1 Expectations Hypothesis

The expectations hypothesis states that the expected returns on risk-free bonds of different maturities are equal. This is equivalent to saying that the future spot interest rates will equal the forward rates. Continuously compounded interest rate returns have been used mainly in the literature, and then the expectations hypothesis is called the Logarithmic Expectations Hypothesis (Log EH), to differentiate it from other forms of returns calculations. Considering two zero-coupon bonds with maturity dates $T_1 < T_2$, by equating the expected continuously compounded returns on the two bonds from time t to the bonds' maturities, we can express the Log EH as

$$E_t[\log p(T_1, T_2)] = \log \frac{p(t, T_2)}{p(t, T_1)} \qquad (3.14)$$

Equivalent identities to the Log EH can be obtained after some algebra:

$$\rho(t,T) = E_t[r(T)] \qquad (3.15)$$

or

$$R(t,T) = \frac{1}{T-t} \int_t^T E_t[r(u)]\, du \qquad (3.16)$$

Here we revise the argument³ made in [Cox et al. (1981)] that the Expectations Hypothesis in the bonds market is linked to three or four mutually contradictory "equilibrium" models, and that the Expectations Hypothesis will not in general describe equilibrium among uniformly risk-neutral investors.

³ [Cox et al. (1981)] provide a theoretical argument implying that the Log EH is "incompatible . . . with any continuous-time rational expectations equilibrium whatsoever".

Since default-free coupon-bearing bonds can be decomposed as portfolios of risk-free zero-coupon bonds, it is sufficient to consider the latter as the main bond instrument in our exposition. [Cox et al. (1981)] make the following set of assumptions characterizing the economy:

A1 There is a set of N state variables $Y = \{Y_n\}_{n\in\{1,\dots,N\}}$ whose current values completely specify all relevant information for investors.


A2 The state variables are jointly Markov with movements determined by the system of stochastic differential equations

$$dY_n(t) = \mu_n(Y,t)\,dt + g_n(Y,t)\,dZ(t)$$

where $\mu_n$ is the expected change per unit time for the nth state variable, Z(t) is a K-dimensional standardized Wiener process, and $g_n$ is a K-dimensional vector measuring the response of the nth state variable to each of the K ≤ N sources of uncertainty in the economy. $g_n' g_m$ is the covariance of changes in the nth and mth state variables, and the variance-covariance matrix with these elements is positive semidefinite and of rank K (or positive definite if K = N).

A3 Investors are nonsatiated, strictly preferring more wealth to less.

A4 Investors are sufficiently risk tolerant that they are willing to hold the risky assets at finite expected rates of return.

A5 All investors believe the economy to be as described in (A1) and (A2), and have the same homogeneous assessments of the parameters.

A6 All markets are competitive, and each investor acts like a price taker. There is a market for riskless instantaneous borrowing and lending at the endogenously determined equilibrium interest rate r, and markets for longer term borrowing and lending at endogenously set prices.

A7 There are no taxes, transactions costs, or other institutional sources of market friction. Markets are open continuously, and trading takes place only at equilibrium prices.

Let p(Y, t, T) denote the price at time t of a pure discount bond promising to pay one dollar at time T. Standard Itô calculus applied in a contingent claim context implies that

$$dp(Y,t,T)/p(Y,t,T) = \alpha(Y,t,T)\,dt + \delta'(Y,t,T)\,dZ(t)$$

with α and δ′δ the expected rate of return and the variance of return, respectively. If the Expectations Hypothesis, that the expected rates of return on all bonds are equal to the current instantaneous rate of interest, were true, then the expected rates of return would be empirically observable as α(Y,t,T) = r(Y,t). If the Expectations Hypothesis is not true then the expected rate of return might vary with maturity and would require further specification. Based on standard arguments one can show that there is some vector function $\tilde{\lambda}(Y,t)$, independent of maturity T, such that

$$\alpha(Y,t,T) = r(Y,t) + \tilde{\lambda}'(Y,t)\,\delta(Y,t,T) = r + \sum_{k=1}^K \tilde{\lambda}_k \delta_k \qquad (3.17)$$


This can be rewritten, with the notation $\lambda_n = \sum_{k=1}^K \tilde{\lambda}_k g_{nk}$, as

$$\alpha(Y,t,T) = r(Y,t) + \sum_{n=1}^N \lambda_n\, p_{Y_n}(Y,t,T)/p(Y,t,T) \qquad (3.18)$$

where $p_{Y_n}(Y,t,T) = \frac{\partial p(Y,t,T)}{\partial Y_n}$.

3.3.1.1 General view

The Expectations Hypothesis implies in its broadest interpretation that equilibrium is characterized by an equality among expected holding period returns on all possible default-free bond investment strategies over all holding periods. Consider the (re)investment periods $t_0 < t_1 < \dots < t_n$ and arbitrary bonds with associated maturities $T_i$, $T_i \geq t_i$. Then the theory says that, for any holding period $t_0$ to $t_n$, the expected return must be independent of the arbitrary reinvestment times and the bonds selected, or in other words:

$$E\left[\frac{p(Y,t_1,T_1)}{p(Y,t_0,T_1)}\,\frac{p(Y,t_2,T_2)}{p(Y,t_1,T_2)}\cdots\frac{p(Y,t_n,T_n)}{p(Y,t_{n-1},T_n)}\right] = \varphi(t_n, t_0). \qquad (3.19)$$

Nevertheless, equation (3.19) cannot be generally valid since, for example, it requires that the expected return over the period $(t_0,t_1)$ on a bond maturing at $t_2$ must equal the certain return on a bond maturing at $t_1$:

$$\frac{E[p(Y,t_1,t_2)]}{p(Y,t_0,t_2)} = \frac{1}{p(Y,t_0,t_1)} \qquad (3.20)$$

Similarly, the return expected over the period $(t_0,t_2)$ on a bond maturing at $t_1$ and rolled over at maturity into a bond maturing at $t_2$ must equal the guaranteed return on the bond maturing at $t_2$:

$$\frac{1}{p(Y,t_0,t_1)}\, E\left[\frac{1}{p(Y,t_1,t_2)}\right] = \frac{1}{p(Y,t_0,t_2)} \qquad (3.21)$$

The two equations imply that

$$E\left[\frac{1}{p(Y,t_1,t_2)}\right] = \frac{1}{E[p(Y,t_1,t_2)]} \qquad (3.22)$$

which contradicts Jensen's inequality. This proves the following important result.

Proposition 3.1. In equilibrium it is impossible that all expected returns for all holding periods are equal.
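The failure of (3.22) is just Jensen's inequality at work: for any non-degenerate random bond price, E[1/p] > 1/E[p]. A two-line numerical sketch with a hypothetical lognormal bond price:

```python
import numpy as np

rng = np.random.default_rng(7)
p = 0.95 * np.exp(0.1 * rng.standard_normal(1_000_000) - 0.005)  # hypothetical random p(t1, t2)
print(np.mean(1 / p), 1 / np.mean(p))                            # E[1/p] > 1/E[p]: (3.22) fails
```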


In order to avoid expressing the Expectations Hypothesis in terms of a "term premiums" formulation, one interpretation of the Expectations Hypothesis suggested by [Cox et al. (1980)] is that expected holding period returns are equal only for one specific holding period. The choice of holding period as the immediate "shortest" interval makes a link to continuous time finance, and it is sometimes called the Risk-Neutral Expectations Hypothesis, which can be formulated as follows:

$$\frac{E[dp(Y,t,T)]}{p(Y,t,T)} = r(t)\,dt \qquad (3.23)$$

[Cox et al. (1981)] called it the Local Expectations Hypothesis (L-EH), advocating that this is still an equilibrium condition and not a derivation of universal risk-neutrality. The L-EH implies the pricing equation

$$p(Y,t,T) = E\left[\exp\left(-\int_t^T r(s)\,ds\right)\Big|\, Y(t)\right]$$

The traditional EH, that the guaranteed return from holding any discount bond to maturity is equal to the return expected from rolling over a series of single-period bonds, in the sense of [Lutz (1940)], is expressed by the identity

$$\frac{1}{p(Y,t,T)} = E\left[\exp\left(\int_t^T r(s)\,ds\right)\Big|\, Y(t)\right]$$

This is referred to as the Return-to-Maturity Expectations Hypothesis (RTM-EH). [Malkiel (1966)] expressed the hypothesis through yields:

$$-\frac{1}{T-t}\,\ln[p(Y,t,T)] = E\left[\frac{1}{T-t}\int_t^T r_s\, ds\,\Big|\, Y(t)\right]$$

and this is called the Yield-to-Maturity Expectations Hypothesis (YTM-EH). A different formulation simply states that forward rates and expected spot rates are equal, which can be shown to be equivalent to

$$-\ln[p(Y,t,T)] = \int_t^T E[r(s)|Y(t)]\, ds$$

and is more traditional in the literature. This formulation is called the Unbiased Expectations Hypothesis (U-EH). As pointed out in [Cox et al. (1981)], the L-EH, RTM-EH, and YTM-EH are pairwise incompatible. Moreover, only the L-EH is equivalent to the absence of term premiums in the expected instantaneous rates of return on long bonds. For the other three versions, term premiums are always positive.


3.3.1.2 Expectations hypothesis and compatible equilibrium

It is possible to have a form of expectations hypothesis holding true but no equilibrium model that underlies rational expectations. [Cox et al. (1981)] claim that only the L-EH is compatible with equilibrium. Moreover, the authors also claim that "the other versions are incompatible with any continuous-time rational expectations equilibrium whatsoever". From (3.18) it follows that the L-EH holds in an economy if there are no premiums. Recall that term premiums under the other traditional forms of the EH are positive. Taking advantage of the continuous-time model for the L-EH and applying Itô's calculus,

$$d(1/p) = (-\alpha + \delta'\delta)(1/p)\,dt - (1/p)\,\delta'\,dZ(t)$$

Under the RTM-EH the expected change in the reciprocal of the bond price is $E[d(1/p)] = -r(1/p)\,dt$, so

$$\alpha(Y,t,T) = r(Y,t) + \delta'(Y,t,T)\,\delta(Y,t,T)$$

Hence the premium of each bond is equal to its variance of returns. Applying Itô's lemma again,

$$d(\ln p) = (\alpha - \tfrac{1}{2}\delta'\delta)\,dt + \delta'\,dZ(t)$$

Under the U-EH or the YTM-EH, $E[d(\ln p)] = r\,dt$, implying that

$$\alpha(Y,t,T) = r(Y,t) + \tfrac{1}{2}\,\delta'(Y,t,T)\,\delta(Y,t,T)$$

Under both hypotheses, it is necessary that

$$\tilde{\lambda}(Y,t)'\,\delta(Y,t,T) = \rho\, \delta'(Y,t,T)\,\delta(Y,t,T)$$

where ρ = 1 for the RTM-EH and ρ = 0.5 for the YTM-EH and U-EH. This identity ought to hold in all states at all times. Recall that the state variables Y are driven by a set of K independent sources of uncertainty, so in order to impose the relationship at both times t and t + dt there must be a functional relation among the K realizations of the Wiener processes. However, as these processes are independent, we reach a contradiction. The above argument has been used to state the following result.

Proposition 3.2. The RTM-EH, YTM-EH, and U-EH are not compatible with a continuous-time rational expectations equilibrium.


Moreover, [Cox et al. (1981)] also advocated the following general result.

Proposition 3.3. The L-EH is not compatible with a continuous-time rational expectations equilibrium in the generally cited case of universal risk-neutrality, unless interest rates are nonstochastic or other special circumstances obtain.

3.3.2 A reexamination of the Log EH

Here we present the example developed in [McCulloch (1993)] of a continuous time economy in which the Log EH holds for all pairs of maturities, which obviously circumvents the general argument in [Cox et al. (1981)]. The economy has a single consumption good with competitive markets and no transaction costs. There is an infinite number of identical agents. If X(t) is the representative agent's real endowment, output X(t) is in stochastic supply and there are no production decisions. Storage is infeasible for this good, as it is assumed to be perishable. The utility density function has the CRRA form

$$u(C) = \begin{cases} \log C, & \eta = 1; \\ \frac{1}{1-\eta}\, C^{1-\eta}, & \eta \in (0,\infty)\setminus\{1\}. \end{cases}$$

Each agent maximizes his/her expected utility $\int_t^\infty E_t[u(C(T))e^{-\theta(T-t)}]\, dT$. Here θ is the pure rate of time preference and η is the relative rate of risk aversion. For simplicity we shall denote x(t) = log X(t) and $w(t,T) = E_t[x(T)]$. It is clear that $\{w(t,T)\}_{0\leq t\leq T}$ is a martingale, so the increments w(t+dt, T) − w(t, T) are uncorrelated innovations. [McCulloch (1993)] also assumes that these innovations are Gaussian with instantaneous variance g(m) depending on the time to maturity m = T − t, not on t:

$$g(m) = \lim_{dt \downarrow 0} \frac{1}{dt}\, \mathrm{var}_t\big(w(t+dt,\, t+m)\big) \qquad (3.24)$$

It can be shown (see [McCulloch (1993)] for details) that

$$h(m) \equiv \mathrm{var}_t(x(T)) = \int_t^T g(T-u)\, du = \int_0^m g(s)\, ds$$

so if g(m) ≥ 0 then h will be an increasing function. The assumption that agents are identical and the market is in equilibrium implies that C(t) = X(t). Without going into too much detail, see [McCulloch (1993)], the first order condition for expected utility maximization gives

$$p(t,T) = \frac{E_t[u'(X(T))\, e^{-\theta(T-t)}]}{u'(X(t))}$$


At time t, x(T) is a random variable, $x(T) \sim N(w(t,T),\, h(T-t))$, and under the CRRA assumption $\log u'(X(T)) \sim N(-\eta w(t,T),\, \eta^2 h(T-t))$. It follows then that

$$\log p(t,T) = \eta[x(t) - w(t,T)] - \theta(T-t) + \frac{1}{2}\eta^2 h(T-t)$$

Differentiating with respect to T leads to

$$\rho(t,T) = \eta\, \frac{\partial w(t,T)}{\partial T} + \theta - \frac{1}{2}\eta^2 g(T-t) \qquad (3.25)$$

Defining the instantaneous logarithmic premium as $\pi(m) = \rho(t,t+m) - E_t[r(t+m)]$, the Log EH is equivalent to π(m) = 0 for all m. Taking the limit as T ↓ t gives

$$r(t) = \eta\, \frac{\partial w(t,T)}{\partial T}\Big|_{T=t} + \theta - \frac{1}{2}\eta^2 g(0)$$

$$E_t[r(T)] = \eta\, \frac{\partial w(t,T)}{\partial T} + \theta - \frac{1}{2}\eta^2 g(0)$$

Hence

$$\pi(m) = \frac{1}{2}\eta^2\, [g(0) - g(m)] \qquad (3.26)$$

Therefore, a sufficient condition for the Log EH to hold in the economy described above is that

$$g(m) = g_0, \quad \forall m \qquad (3.27)$$

Remark that this condition is not necessary. If investors are risk-neutral, i.e. η = 0, the term premium will vanish and the Log EH will hold, regardless of the values of g(m). Nevertheless, in the economy model presented, this will also imply that the interest rates are non-stochastic and all equal to θ. Here is a constructive example given in [McCulloch (1993)] of processes w(t, T) that satisfy the Log EH and also lead to a stochastic interest rate term structure.

Proposition 3.4. It is possible to have a continuous time economy model such that the Log EH is compatible with the continuous-time rational expectations equilibrium.


Proof. Let $\{Z_t^{(1)}\}_{t\geq 0}$ and $\{Z_t^{(2)}\}_{t\geq 0}$ be two uncorrelated Wiener processes with zero means and unit volatilities, and let

$$dw(t,T) = \psi_1(m)\, dZ_t^{(1)} + \psi_2(m)\, dZ_t^{(2)} \qquad (3.28)$$

where $\psi_1$ and $\psi_2$ are nonnegative differentiable functions, $\psi_1$ increasing and $\psi_2$ decreasing, such that

$$\psi_1(m)^2 + \psi_2(m)^2 = g_0 \qquad (3.29)$$

with $g_0$ a constant. Then it follows that g(m), the instantaneous variance, is equal to $g_0$ for any m. Concrete examples of the basis functions $\psi_1$ and $\psi_2$ are

$$\psi_1(m) = \sqrt{g_0}\, (1 - e^{-m}), \qquad \psi_2(m) = \sqrt{g_0}\, \sqrt{2e^{-m} - e^{-2m}}$$

Then one can calculate that

$$d\rho(t,T) = \eta\left[\psi_1(m)\, dZ_t^{(1)} + \psi_2(m)\, dZ_t^{(2)}\right]$$

The forward rates are stochastic for all maturities, and likewise R(t,T) and r(t).

3.3.3 Reconciling the arguments and examples

The two analyses presented above seem to be contradictory. There are certain nuances that must be clarified. For the two-process example (3.28) presented in [McCulloch (1993)], after some standard stochastic calculus one can show that

$$dp(t,T) = [r_t + \tilde{\lambda}'\delta(m)]\, p(t,T)\, dt + \delta'(m)\, p(t,T)\, dZ_t \qquad (3.30)$$

where $\delta_k(m) = \eta(\psi_k(0) - \psi_k(m))$ and $\tilde{\lambda}_k = \eta\psi_k(0)$. This equation is a particular case of the general example followed in [Cox et al. (1981)]. Under the L-EH the expected compounded instantaneous return on a bond with any maturity T is $E_t[d \ln p(t,T)] = r_t\, dt$, which implies that

$$\tilde{\lambda}'\delta(m) = \frac{1}{2}\, \delta'(m)\delta(m). \qquad (3.31)$$

[Cox et al. (1981)] advocate that if this identity is true for all states of the world at all points in time t, then there must exist a functional relationship between the driving processes $Z_1$ and $Z_2$, which is impossible since these are taken to be independent from the outset. Hence the conclusion that disequilibrium arbitrage opportunities are permitted under the L-EH.
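Both (3.29) and the identity (3.31) can be verified numerically for the explicit basis functions of Proposition 3.4; a short sketch (the values of η and $g_0$ are arbitrary and purely illustrative):

```python
import numpy as np

eta, g0 = 2.0, 1.0
m = np.linspace(0.0, 30.0, 1001)
psi1 = np.sqrt(g0) * (1 - np.exp(-m))
psi2 = np.sqrt(g0) * np.sqrt(2 * np.exp(-m) - np.exp(-2 * m))

# (3.29): constant instantaneous variance g(m) = g0
assert np.allclose(psi1**2 + psi2**2, g0)

# (3.31): lambda' delta(m) = 0.5 * delta(m)' delta(m) for every m
delta1 = eta * (psi1[0] - psi1)       # psi1(0) = 0
delta2 = eta * (psi2[0] - psi2)       # psi2(0) = sqrt(g0)
lhs = eta * psi1[0] * delta1 + eta * psi2[0] * delta2
rhs = 0.5 * (delta1**2 + delta2**2)
assert np.allclose(lhs, rhs)
```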


[McCulloch (1993)] remarked that, when condition (3.29) is true, it follows directly that $\frac{1}{2}\delta'(m)\delta(m) = \tilde{\lambda}'\delta(m)$ for all m, and therefore no arbitrage is introduced in this particular case. The difference between the two analyses is explained by the fact that [Cox et al. (1981)] assume that the economy at any time t is explained by a finite set of state variables $Y_n$, while [McCulloch (1993)] assumes an infinite-dimensional set of state variables. This crucial assumption is in addition to the rational expectations, general equilibrium, and continuous time assumptions. The continuous time rational expectations equilibrium does not require the diffusion coefficients of the pricing model to depend nontrivially on all the state variables.

While theoretically there seems to be logical support that the L-EH is a mathematically correct equilibrium characterization of the term structure in continuous time, empirically the L-EH has been rejected for short term maturities (see [Roll (1970); McCulloch (1975); Fama (1984)]).

3.4 Yield Curve

3.4.1 Parallel shift of a flat yield curve

Very often in the financial literature it is assumed that the term structure of interest rates is flat and that possible modifications consist in parallel shifts of the curve only. Unfortunately the combination of these two assumptions leads to arbitrage, as the following example described in [Baz and Chacko (2004)] shows.

Proposition 3.5. A barbell bond portfolio with maturities t − δt and t + δt has higher convexity than a t-year zero-coupon bond with identical present value and identical modified duration.

Proof. Let p(0, t) be the price of a zero-coupon bond with maturity t. The interest rates are all assumed to be equal to r > 0. Then the modified duration of the bond with maturity t is t/(1+r), for any maturity t. The barbell is a bond portfolio with maturities on each side of the target maturity t, such that its value is equal to the position in the bond with maturity t and its modified duration matches the duration of this bond too. If $n_-$, n and $n_+$ are the numbers of bonds with maturities t − δt, t and t + δt, respectively, then the barbell bond portfolio is identified from the


following equations:

$$\begin{cases} n_-\, p(0, t-\delta t) + n_+\, p(0, t+\delta t) = n\, p(0,t) \\ \frac{t-\delta t}{1+r}\, n_-\, p(0,t-\delta t) + \frac{t+\delta t}{1+r}\, n_+\, p(0,t+\delta t) = n\, \frac{t}{1+r}\, p(0,t) \end{cases}$$

Solving this system leads to the solution

$$n_-\, p(0,t-\delta t) = n_+\, p(0,t+\delta t) = \frac{n\, p(0,t)}{2}$$

However, the convexity of the barbell portfolio is equal to

$$\mathrm{conv}_{barbell} = \frac{t^2 + t + (\delta t)^2}{(1+r)^2}$$

and the convexity of the target bond is only

$$\mathrm{conv}_{bond} = \frac{t^2 + t}{(1+r)^2}$$

which is smaller. The mismatch in convexity allows the introduction of arbitrage.
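A numerical sketch of the convexity mismatch, with illustrative values for r, t and δt:

```python
r, t, d = 0.05, 10.0, 2.0    # flat rate, target maturity, barbell offset (illustrative)

def convexity(mat):
    """Convexity of a zero-coupon bond under annual compounding."""
    return mat * (mat + 1) / (1 + r) ** 2

# The value- and duration-matched barbell holds equal value in each leg (see the proof)
conv_barbell = 0.5 * convexity(t - d) + 0.5 * convexity(t + d)
print(conv_barbell, convexity(t))   # (t**2 + t + d**2)/(1+r)**2 exceeds (t**2 + t)/(1+r)**2
```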

3.4.2 Another proof that the yield curve cannot be flat

Here we shall provide another example showing that the yield curve cannot be flat. A similar example is described in [McDonald (2006)]. We assume that the interest rates can change stochastically, with short rates following the general equation

$$dr_t = \mu(r_t)\,dt + \sigma(r_t)\,dW_t \qquad (3.32)$$

where $\{W_t\}_{t\geq 0}$ is a Wiener process. Suppose that at any point in time the zero-coupon bonds at all maturities have identical yield to maturity. The interest rates are allowed to change, but when they do, the yields for all bonds change uniformly in such a way that the yield curve stays flat and is equal to $y_t > 0$ at time t for all future maturities. Hence $y_t = y_t(r_t)$, so applying Itô's formula leads to a generic equation of the following type

$$dy_t = [\dots]\,dt + \sigma(r_t)\,dW_t \qquad (3.33)$$

Remark that this equation implies that the yield at time t is the same for all maturities, but as we move in time the yield to maturity changes from constant level to constant level. For simplicity, the prices of the zero-coupon bonds are evidently given by

$$p(t,T) = e^{-y_t(T-t)}$$


Consider then a bond portfolio Π with n zero coupon bonds with maturity T1 and one bond with maturity T2 . The value of this portfolio at time t is Πt = np(t, T1 ) + p(t, T2 ) Applying Itˆ o’s formula again and taking into consideration that the price of the bond depends only on the level of yield yt and time to maturity dΠt = ndp(t, T1 ) + dp(t, T2 ) = −[n(T1 − t)p(t, T1 ) + (T2 − t)p(t, T2 )]dyt 1 + σ 2 [n(T1 − t)2 p(t, T1 ) + (T2 − t)2 p(t, T2 ) + yt (np(t, T1 ) + p(t, T2 ))]dt 2 Taking (T2 − t)p(t, T2 ) n=− (T1 − t)p(t, T1 ) the exposure to change in interest rate yield dyt is eliminated. Hence dΠt = rt Πt dt Replacing in the above formula on both sides, after some algebra we get that σ 2 (T2 − t)(T2 − T1 ) = 0 Since the volatility σ 2 = 0 and T2 = t it implies that the only possibility left is T2 = T1 , which is a contradiction. Remark that if σ = 0 then we would not get a contradiction. However, we would then have deterministic evolution of interest rates. The next examples show that it is not possible to have yields that are independent of maturity and deterministic, nor are the interest rates constant rt ≡ r > 0. 3.4.3

3.4.3 Deterministic maturity independent yields

Here we assume that the yield to maturity is independent of maturity and deterministic, so that we know from the beginning (time 0) the constant level y_t of the yield that will prevail at any future time t. The following example, presented in [Capinski and Zastawniak (2003)], points to a contradiction vis-a-vis the historical evolution of yields in financial markets rather than a mathematical contradiction.

Proposition 3.6. If the yields {y_t}_{t≥0} are at each point in time independent of the maturities of bonds and are deterministic, then y_t = y_0 for any t > 0.


Proof. Assume that y_0 > y_t for some t > 0, and consider the following strategy, where interest rate calculations are done with continuous compounding. Borrow one dollar until time t and deposit this dollar for maturity t + 1; both transactions are done at the current yield y_0. Later, at time t, pay back the loan of e^{y_0 t} by borrowing this amount for one extra year; the borrowing rate is now y_t. At time t + 1 the strategy provides e^{y_0 (t+1)} against a liability of e^{y_0 t} e^{y_t}. The net balance is

  e^{y_0 t} [e^{y_0} − e^{y_t}] > 0

which shows that there is arbitrage. Similarly it cannot be true that y_0 < y_t, so the only option is y_0 = y_t for any t. □

However, there is ample evidence that the yields determined from bond prices vary with time. This means that the only way to model yields that are maturity independent is to consider them to be random.
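The cash flows of the strategy in the proof above can be replayed numerically. A minimal sketch, assuming continuous compounding and illustrative values y_0 = 4%, y_t = 3% and t = 5:

```python
import math

y0, yt, t = 0.04, 0.03, 5.0     # hypothetical yields and horizon

loan_due_at_t = math.exp(y0 * t)            # repay the dollar borrowed at time 0
deposit_at_t1 = math.exp(y0 * (t + 1))      # the dollar deposited until t + 1
rolled_debt = loan_due_at_t * math.exp(yt)  # refinance the loan for one more year

net_at_t1 = deposit_at_t1 - rolled_debt
print(net_at_t1)    # equals e^{y0 t}(e^{y0} - e^{yt}) > 0 whenever y0 > yt
```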

3.4.4 Consol modelling

Consol bonds have been traded for more than 150 years. By 1850 the Paris Stock Exchange had become the world market for perpetual government bonds; see [Taqqu (2001)] for an inspiring discussion. This market influenced Louis Bachelier to study probabilistic modelling for financial assets. [Brennan and Schwartz (1982b)] is one of the earliest pieces of research on the use of consols in the fixed-income area in modern times. They introduced an early version of stochastic two-factor models, modelling both the spot interest rate and the consol rate. Their model was an equilibrium model for bond pricing aimed at testing market efficiency. Unfortunately, the joint dynamics of the short and long rates can be unstable, in the sense that the long rate may explode in finite time, as pointed out by [Hogan (1993)]. This is not very useful for Monte Carlo simulations using interest rate modelling embedded in other valuation and risk management frameworks such as actuarial pricing. Under risk neutrality the price of a consol equals the expectation of the payoffs discounted along the short rate paths. Therefore it is necessary that the model parameters satisfy some constraints for consistency between the short and consol rate processes. [Duffie et al. (1994)] identified a particular choice of volatility such that the consol and short rate processes are compatible.

Present value calculations are fundamental to all areas of finance. For portfolio selection, risk management, asset management and structured finance, calculating the present value of assets such as stocks and bonds is


paramount. A large number of textbooks and articles in finance make the explicit or tacit assumption of constant interest rates. Usually this is done in order to simplify the exposition and keep the mathematics at a simple level. Unfortunately, this assumption is then transferred to applications in the finance industry and even to academic research output. The main “advantage” of assuming a flat constant interest rate is a straightforward discount factor calculation. With continuous compounding and a constant interest rate r per annum, this would be e^{−rT} for maturity T, and (1 + r/m)^{−mT} when discrete compounding with m periods per year is used. Complex risk management exercises that focus on market risk factors other than interest rates “benefit” computationally from this simplification. Here we show that assuming constant interest rates while working at the same time with discrete compounding leads to arbitrage in consol markets.

Suppose that interest rates are constant and equal to r for any maturity, and that interest calculations are applied with discrete compounding, with m > 1 periods in one year. Consider two perpetual bonds. The first bond pays a fixed coupon C_a annually while the second perpetual bond pays a coupon C_m = C_a/m at the end of each period. The price of the first bond at time zero is

  p_a(0, ∞) = Σ_{i=1}^∞ C_a/(1+r)^i = C_a/r    (3.34)

The price of the second bond is

  p_m(0, ∞) = Σ_{j=1}^∞ C_m/(1 + r/m)^j = C_m/(r/m) = C_a/r    (3.35)

We can then state the following result.

Proposition 3.7. In a market where interest rates are constant and equal to r > 0 for any maturity, and calculations are realised with discrete compounding, there is arbitrage.

This counterexample is based on showing that the price of a consol paying a fixed sum C_a annually is equal to the price of a consol paying a fixed sum C_m = C_a/m every period, with m periods per year. Therefore the two bonds have an identical price. However, it is clear from the time value of money principle that the second bond is more valuable since, each year, the second bond pays pro-rata earlier the same amount of money that the first bond pays only at the end of the year. This is a paradoxical situation, since the present value of the cash flow paid over any single year


by the consol paying more frequently is larger than the corresponding present value for the consol paying only annually. However, this inequality cannot be summed to infinity, over all years, as the result above proves.

If the interest rate is still constant but calculations are done with continuous compounding, we get the following. For the first bond

  p_a(0, ∞) = Σ_{i=1}^∞ C_a e^{−ri} = C_a/(e^r − 1)    (3.36)

while for the second bond

  p_m(0, ∞) = Σ_{j=1}^∞ C_m e^{−rj/m}    (3.37)
            = C_m/(e^{r/m} − 1) = C_a/[m(e^{r/m} − 1)]    (3.38)
            = p_a(0, ∞) · (e^r − 1)/[m(e^{r/m} − 1)]    (3.39)

The two bonds now have different prices and there is no paradox: the second bond is more expensive than the first bond.

Proposition 3.8. Consider a market where interest rates are constant and equal to r > 0 for any maturity and calculations are realised with continuous compounding. Then the price of a consol paying a fixed sum C_a annually is less than the price of a consol paying a fixed sum C_m = C_a/m every period, with m periods in one year.

Here is an elementary proof of this result. We need to prove that

  (e^r − 1)/[m(e^{r/m} − 1)] > 1    (3.40)

Since we are working only with positive interest rates r, the above inequality is equivalent to e^r − 1 > m(e^{r/m} − 1). With the notation x = e^{r/m}, the transformed inequality is

  x^m − mx + (m − 1) > 0

Now we have the following series of equivalent inequalities:

  x^m − mx + (m − 1) > 0
  x(x^{m−1} − 1) − (m − 1)(x − 1) > 0
  (x − 1)(x^{m−1} + x^{m−2} + ... + x − (m − 1)) > 0
  (x − 1)^2 (x^{m−2} + 2x^{m−3} + ... + (m − 2)x + (m − 1)) > 0


Since x = e^{r/m} > 1 whenever r > 0, both factors in the last expression are strictly positive and the result follows. Remark that the inequality becomes an equality only when r = 0, in which case the price of any consol is infinite.

Where is the puzzle coming from, or what lies behind it? The explanation is simple and resides in the concept of speed of convergence of series of real numbers. In other words, we have two infinite series that converge to the same number, but one series does it faster than the other. One can easily calculate the rate of convergence for the two series: for the annual coupon series the ratio of consecutive terms is μ = 1/(1 + r), while for the more frequent payment series it is μ = 1/(1 + r/m). Over a full year the latter series discounts by (1 + r/m)^{−m} < (1 + r)^{−1}, so it converges faster than the first series. Requiring that each year the same cumulative payment is made to bond holders leads to arbitrage because of the time value of money.
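A short numerical illustration of Propositions 3.7 and 3.8, with illustrative values r = 5%, C_a = 100 and m = 12 monthly payments:

```python
import math

r, Ca, m = 0.05, 100.0, 12      # hypothetical rate, annual coupon, periods/year
Cm = Ca / m

# Discrete compounding: both consols price identically -- the paradox.
pa_disc = Ca / r
pm_disc = Cm / (r / m)
print(pa_disc, pm_disc)                       # 2000.0  2000.0

# Continuous compounding: the frequent payer is strictly more valuable.
pa_cont = Ca / (math.exp(r) - 1)
pm_cont = Cm / (math.exp(r / m) - 1)
print(pa_cont, pm_cont)
print(pm_cont / pa_cont)    # (e^r - 1)/(m(e^{r/m} - 1)) > 1, as in (3.40)
```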

3.5 Interest Rate Forward Curve Modelling

Since we are also going to focus on forward curve modelling, it is useful to use the Musiela parameterisation of the HJM framework for short rate models, with x denoting the time to maturity. Hence, p(t, x) is the price at time t of a zero-coupon bond maturing at time t + x. Then the instantaneous forward rates are defined as

  r(t, x) = −∂ log p(t, x)/∂x    (3.41)

The short rate is then R(t) = r(t, 0) and the money market account B is given by B(t) = exp{∫_0^t R(s) ds}. This set-up follows [Bjork (2001)]. Given an initial (observed) forward curve {r°(0, x); x ≥ 0}, a forward rate model is described by the system

  dr(t, x) = β(t, x) dt + σ(t, x) dW(t),    (3.42)
  r(0, x) = r°(0, x).

The well-known HJM drift condition is translated under the Musiela parameterisation, and under the Q measure the dynamics followed by the forward rate curve are described by

  dr(t, x) = [∂r(t, x)/∂x + σ(t, x) ∫_0^x σ(t, u) du] dt + σ(t, x) dW(t)
  r(0, x) = r°(0, x)

[Bjork (2001)] and [Bjork and Christensen (1999)] looked at the important issue of consistency between the dynamics of a given interest rate


model and the forward curve family employed. What they meant by consistency is that, given an interest rate model M and a particular family G of forward rate curves used to calibrate that model, the pair (M, G) is consistent if all forward curves which may be produced by the interest rate model M are contained within the family G. Otherwise, the pair (M, G) is inconsistent.

The concept of consistency between interest rate models and families of forward curves is important because a consistent interest rate model actually produces forward curves which belong to the relevant family. If M and G are inconsistent, then the interest rate model will produce forward curves outside the family used for calibration purposes, resulting in a recalibration of the model parameters all the time, not because the model is an approximation to reality, but because the family does not fit with the model. In this context it is very important to define the concept of an invariant manifold.

Definition 3.1. Consider the forward rate process with dynamics given in (3.43). A fixed family (manifold) of forward rate curves G is locally invariant under the action of r if, for each point (s, r) ∈ R_+ × G, the condition r_s ∈ G implies that r_t ∈ G on a time interval of positive length. If r stays forever on G, then G is called globally invariant.

If G : A × R_+ → R is a mapping on a finitely parameterized family of forward rate curves, then a forward curve manifold is defined as G = Im(G); a more detailed technical description is given in [Bjork (2001)]. If the volatility function σ = σ(r, x) is a functional of the infinite-dimensional r-variable and a function of the real variable x, we can define the operator

  Hσ(r, x) = ∫_0^x σ(r, s) ds

and then the dynamics of the forward rates can be rewritten as

  dr(t, x) = [∂r(t, x)/∂x + σ(r_t, x)Hσ(r_t, x)] dt + σ(r_t, x) dW(t)    (3.43)

The next result, formally proved in [Bjork and Christensen (1999)], is crucial for deciding whether a forward rate model is consistent with a given family of forward rate curves.

Theorem 3.1. Let G_z and G_x denote the Frechet derivatives of G with respect to z and x, respectively. Then the forward curve manifold G is


locally invariant for the forward rate process r(t, x) in M if and only if

  G_x(z, x) + σ(r)Hσ(r) − (1/2)σ_r(r)σ(r) ∈ Im[G_z(z, x)]    (3.44)
  σ(r) ∈ Im[G_z(z, x)]    (3.45)

hold for all z ∈ A with r = G(z), where A ⊆ R^d is the parameter domain and σ_r denotes the Frechet derivative of σ with respect to r.

The condition (3.44) is called the consistent drift condition, while the condition (3.45), interpreted componentwise for σ, is called the consistent volatility condition. From an applied finance point of view, consider the Nelson-Siegel (NS) forward curve, a manifold G parameterized by z ∈ R^4, the curve being described by the application x → G(z, x),

  G(z, x) = z_1 + z_2 e^{−z_4 x} + z_3 x e^{−z_4 x}    (3.46)

For z_4 ≠ 0, the Frechet derivatives are

  G_z(z, x) = [1, e^{−z_4 x}, x e^{−z_4 x}, −(z_2 + z_3 x)x e^{−z_4 x}]    (3.47)
  G_x(z, x) = (z_3 − z_2 z_4 − z_3 z_4 x) e^{−z_4 x}    (3.48)

The natural parameter space is A = {z ∈ R^4 : z_4 ≠ 0, z_4 > −γ/2}, and when z_4 = 0 it follows that G(z, x) = z_1 + z_2 + z_3 x.

The extended Vasicek model proposed by Hull and White (HW) is given by the short rate SDE

  dR(t) = [ψ(t) − αR(t)] dt + σ dW(t)    (3.49)

with α, σ > 0. The equivalent forward rate equation is

  dr(t, x) = β(t, x) dt + σ e^{−αx} dW_t    (3.50)

This implies that the volatility function is σ(x) = σ e^{−αx}. The conditions that must be verified for consistency are given in Theorem 3.1 and in this case they are

  G_x(z, x) + (σ^2/α)(e^{−αx} − e^{−2αx}) ∈ Im[G_z(z, x)]    (3.51)
  σ e^{−αx} ∈ Im[G_z(z, x)]    (3.52)

(Bjork considers a weighted Sobolev space for the forward rate curves, a generic point of which is denoted by r. If γ > 0 is a fixed real number, the norm

  ||r||_γ^2 = ∫_0^∞ r^2(x) e^{−γx} dx + ∫_0^∞ (dr/dx (x))^2 e^{−γx} dx

makes the space of all differentiable functions r : R_+ → R satisfying ||r||_γ < ∞ a Hilbert space.)


Checking whether the Nelson-Siegel (NS) manifold of forward curves is invariant under the HW dynamics, we start with the second condition and fix a vector of parameters z. Then we search for constants a, b, c, d such that for all x ≥ 0

  σ e^{−αx} = a + b e^{−z_4 x} + c x e^{−z_4 x} − d(z_2 + z_3 x)x e^{−z_4 x}    (3.53)

Since this identity can hold only when z_4 = α, the consistent volatility condition is contradicted for a generic parameter vector z. Hence an important result for model validation follows directly from the discussion above.

Proposition 3.9 (Nelson-Siegel vs Hull-White). The Hull-White extended Vasicek model is inconsistent with the Nelson-Siegel family of forward curves.

Hence the NS manifold is not large enough for the HW model. If the initial forward rate curve is on the manifold, then the HW dynamics will force the term structure off the manifold within an arbitrarily short period of time! Setting α = 0 in (3.49) leads to the continuous-time version of the Ho-Lee model. Therefore, the next similar result can be derived under the same theory developed by [Bjork (2001)].

Proposition 3.10 (Nelson-Siegel vs Ho-Lee). (1) The full Nelson-Siegel family is inconsistent with the Ho-Lee model. (2) The degenerate family G(z, x) = z_1 + z_3 x is in fact consistent with Ho-Lee.

These two examples raise the question whether any interest rate model is consistent with the Nelson-Siegel family of forward rate curves. [Filipovic (1998)] proved the following important result.

Proposition 3.11. There is no non-trivial Wiener-driven model that is consistent with the Nelson-Siegel family of forward curves.
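The failure of the consistent volatility condition can also be seen numerically: projecting σ e^{−αx} onto the span of the columns of G_z leaves a visible residual unless z_4 = α. A least-squares sketch, with illustrative parameter values (numpy assumed available):

```python
import numpy as np

sigma, alpha = 0.01, 0.2        # hypothetical HW volatility parameters
z2, z3 = 0.02, -0.01            # hypothetical NS coefficients

x = np.linspace(0.0, 30.0, 300)
target = sigma * np.exp(-alpha * x)       # HW forward volatility sigma*e^{-alpha x}

def span_residual(z4):
    # the four columns of G_z(z, x) from (3.47), evaluated on the grid
    B = np.column_stack([
        np.ones_like(x),
        np.exp(-z4 * x),
        x * np.exp(-z4 * x),
        -(z2 + z3 * x) * x * np.exp(-z4 * x),
    ])
    coef, *_ = np.linalg.lstsq(B, target, rcond=None)
    return np.linalg.norm(B @ coef - target)

print(span_residual(alpha))   # ~0: identity (3.53) holds when z4 = alpha
print(span_residual(0.35))    # clearly non-zero for a generic z4
```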

3.6 One-factor or Multi-factor models

[Longstaff et al. (2001)] analysed the financial impact of employing single-factor interest rate models to determine exercise strategies for American swaptions when the term structure is actually driven by multiple factors. They showed that the costs of using a suboptimal exercise strategy due


to one-factor models could be very large, of the order of several billion dollars. Moreover, they also pointed out that when the dynamic specification of a model does not match actual market dynamics, the American exercise strategy implied by this model will be suboptimal no matter how frequently the analysts recalibrate the parameters from the latest cross-section of current option prices. Furthermore, if the model is misspecified, the American option values implied by the model will be biased estimates of the actual present value of the cash flows generated by following the exercise strategy based on the model. Hence, it is wrong to overfit misspecified models and to frequently recalibrate simplistic models to overcome the lack of a more appropriate alternative model. Another important point made there was that using one-factor models will introduce a significant discrepancy between the value of a derivative and the value of the dynamic hedging portfolio that attempts to replicate its payoff. [Longstaff et al. (2001)] also emphasize that the wrong principle of employing simple models that do not fit well over long periods but that are “updated” as needed was common practice in many options markets. As an example, they hinted that practitioners used to continually update implied Black-Scholes volatilities to correct for calibration issues.

Calibrating one-factor models from data may be a less than straightforward process. The BDT model has been used for many years in the industry as the main interest rate model. A BDT model can be calibrated either with the short rate volatility method, by employing the current term structure of zero coupon bond prices and the term structure of future short rate volatilities, or with the yield rate volatility method, by employing the current term structure of zero coupon bond prices and the term structure of yields on zero coupon bonds. For the former case, [Sandmann and Sondermann (1993)] identified conditions for the calibration to be feasible. [Boyle et al. (2001)] cover the latter case and derive mathematical conditions under which calibration to the yield volatility is feasible. Comparing the two approaches, it is clear that it is technically more difficult to work with the yield volatilities. Furthermore, [Boyle et al. (2001)] also give some examples where the calibration based on the yield volatility breaks down for apparently plausible inputs.

[Black and Karasinski (1991)] generalized the BDT model. If X_t = ln r_t, then the Black-Karasinski model is described by the SDE, under a risk-neutral measure,

  dX_t = k(t)(ln b(t) − X_t) dt + σ(t) dW_t    (3.54)

where k(t), b(t) and σ(t) are all deterministic functions of time. These


functions are calibrated using the zero-coupon yield curve, the associated curve of volatilities and one interest rate derivatives curve, such as the ATM caps. The model does not lead to analytical solutions for either bond or option prices, and analysts must resort to numerical solutions. However, [Hogan and Weintraub (1993)] discovered that under the Black-Karasinski model E_t[B(t + s)|B(t)] = ∞, where B is the money market account and t, s > 0. This technical drawback has the practical effect that the Black-Karasinski model cannot be used to price Eurodollar futures contracts.

One-factor models are still taught in business schools and used in industry by various participants. The main reason is analytical availability and computational speed. For this class of interest rate models it is known that all interest rates of various maturities are positively correlated. Moreover, the impact of the drift on a daily basis is not significant, and in studying the evolution of yield curves the drift can be ignored. Taking these conditions into account, [van Deventer (2011)] pointed out that when a positive shift occurs in the single factor, all zero coupon yields at all maturities automatically rise. In contrast, when a negative shift occurs in the single factor, all zero coupon yields fall, and if the single factor stays unchanged then no change is expected in any of the other zero coupon yields either. This observation allows the risk manager to validate a one-factor model for interest rates by gathering empirical evidence on the percentage of the time all interest rates either rise together, fall together, or stay unchanged (a minimal classifier along these lines is sketched below). If evidence is found that indicates some inconsistency with actual yield movements, then the one-factor model should not be validated for use in interest rate risk management and asset and liability management.

[van Deventer (2011)] considered the same data as [Dickler and van Deventer (2011b)] and [Dickler and van Deventer (2011a)]: the U.S. Treasury yields reported by the Federal Reserve in its H15 statistical release between 2 January 1962 and 22 August 2011. The daily changes in yields were classified into four categories: negative shift only, positive shift only, zero shift only and other. The latter category corresponds to combinations of different shifts for different rates, and such moves are called twists. The results obtained by [van Deventer (2011)] reveal that 7,716 of the 12,386 days of data showed yield curve twists. This is 62.3% of all data points and is clear evidence of inconsistency with the assumptions of one-factor term structure models. This observation has important implications for finance, because using one-factor models for the interest rate term structure means that interest rate risk will be calculated incorrectly, and the risk manager is


not even aware of the presence or magnitude of the error. The simulated yield curves from one-factor models will not exhibit a twist, yet in almost two thirds of the situations the realised curves were determined by twists. Even for ABS desks, using one-factor interest rate models can be problematic because most of the time the hedging programmes focus on duration. But the duration approach in a one-factor term structure model implies that interest rates increase or decrease in a unidirectional joint way. This means that 62% of the time this hedging strategy would have been inappropriate. Needless to say, all other firm-wide risk measures such as economic capital, counterparty credit risk or liquidity risk will be mis-calculated.

It is a similar story when the modelling is centred on the forward rate curve rather than on the yield curve. [Buraschi and Corielli (2005)] describe the time inconsistencies that may appear when using models from the HJM family. This raises a very important question for the risk manager: is the model selected by a bank or financial institution complex enough to generate curves that cover the observed or realised term structure curves from the past? On the other hand, is the model too complex, generating curves that have never been observed in practice? Furthermore, [Filipovic (2009)] discusses the nonexistence of HJM models with proportional volatility, which apparently was one of the major reasons for the introduction of LIBOR market models in fixed income markets.

There is increasing evidence, see [Jarrow (2009)], that multi-factor models with more than three factors are actually required in practice, particularly when exotic interest rate derivatives are traded. At the same time, having more factors also means increased parameter estimation risk.
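The validation test suggested by [van Deventer (2011)] is easy to automate. A minimal sketch, assuming `yields` is a days-by-maturities array supplied by the user; the random walk below merely stands in for the H15 history:

```python
import numpy as np

def classify_moves(yields, tol=1e-12):
    """Classify daily yield-curve moves as joint shifts or twists."""
    counts = {"positive": 0, "negative": 0, "zero": 0, "twist": 0}
    for day in np.diff(yields, axis=0):
        if np.all(day > tol):
            counts["positive"] += 1
        elif np.all(day < -tol):
            counts["negative"] += 1
        elif np.all(np.abs(day) <= tol):
            counts["zero"] += 1
        else:
            counts["twist"] += 1    # mixed signs: inconsistent with one factor
    return counts

rng = np.random.default_rng(0)
demo = 0.05 + 0.001 * rng.standard_normal((1000, 8)).cumsum(axis=0)
print(classify_moves(demo))
```

A high proportion of twists, as in the 62.3% found on the H15 data, is direct evidence against a one-factor specification.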

3.7 Notes and Summary

A great review of affine term structure models, including empirical evidence and problems related to parameter estimation, is given by [Piazzesi (2010)]. [Longstaff (2000)] points out a series of examples indicating that fixed income markets may not be complete. He also argues that “if fixed income markets are incomplete, the traditional forms of expectation hypothesis can hold without arbitrage”. Thus, whether or not the expectation hypothesis holds is an empirical matter.

In a thought-provoking work, [Osborne (2014)] reveals that the algebraic equation for finding the internal rate of return, or the yield of a bond, resulting from the time value of money equation may have multiple solutions. He


shows some interesting puzzles in fixed income markets and shows that the simple interest rate can be more useful for understanding consumer loans, for example, if the whole spectrum of algebraic solutions is taken into consideration.

What is the impact of using a single-factor term structure model instead of a “real” multi-factor one? [Longstaff et al. (2001)] analysed the costs of following single-factor exercise strategies for Bermudan swaptions under the assumption that the actual term structure framework has multiple factors. They first simulated paths of the term structure using the multi-factor model and calculated the value of the Bermudan swaption. Then, upon the same simulated paths of the term structure, they recalculated the value of the American-style swaption by recalibrating a single-factor model to the market at each exercise date and determining whether exercise was implied by the single-factor model at that date. They concluded that the loss from suboptimal exercise when using single-factor models can be of the order of magnitude of billions of dollars. The main cause was the fact that single-factor models imply perfectly correlated interest rates across the term structure. Nevertheless, [Andersen and Andreasen (2001)] provided a counterargument showing that single-factor models with time-dependent volatility are a viable solution for pricing Bermudan swaptions. The examples discussed by [Longstaff et al. (2001)] were focused on the Vasicek model, so the issues stemmed from the particular characteristics of this one-factor model rather than the entire class of single-factor interest rate models.

Modelling the yield curve is one of the most important building blocks of quantitative finance. There are several frameworks available, from short-rate models to the HJM family of forward curves, from consol models to multi-factor models, from LIBOR models to market models. Recently the OIS discounting approach has taken interest rate modelling in a new direction, paying more attention to the organisation of financial markets. It is difficult to say which model performs best; there is a vast literature in this area, and some of the difficulties with some of these models have been highlighted in this chapter. Some great surveys of the models used for modelling the term structure of interest rates are provided in [Lemke (2006)], [Filipovic (2009)] and [Gibson et al. (2010)]. Other useful readings, particularly with a view on model risk related to the term structure of interest rates, are [Bjork and Christensen (1999)], [Gibson et al. (1999)], [Cairns (2000)], [Buraschi and Corielli (2005)] and [Morini (2011)].

[Matthies (2014)] followed a purely statistical data-driven approach and compared different sets of variables, estimation and forecasting methods for


statistical dynamic factor models. He found that including macroeconomic variables improves the forecasting ability of a factor model.

There are many pitfalls when modelling interest rate term structures. Some valuable points for model validation are:

(1) Different short rate models may generate very different interest rate paths under the same set of parameters.
(2) Paths of interest rates may stay for long periods above the “equilibrium” long-run mean level.
(3) The Vasicek model can generate paths of short rates that get into negative territory. The corresponding curve of zero coupon bond prices is well-behaved, including for those problematic paths.
(4) General versions of the expectations hypothesis theories are incompatible with a continuous-time rational expectations equilibrium in the generally cited case of universal risk-neutrality, unless interest rates are non-stochastic or other special circumstances obtain. This conclusion can be overturned when it can be assumed that there is an infinite number of agents.
(5) It is wrong to assume that the term structure of interest rates is flat. This is shown to introduce arbitrage related to bonds with different convexities. The method of calculating interest compounding can have an important effect in fixed income markets. I have shown here that consol pricing with constant risk-free rates may generate a new form of arbitrage.
(6) It is not easy to select a particular family of models for interest rates, whether for short rates, forward rates or zero spot rates. There are theoretical problems that invalidate some models in some circumstances, and there are empirical findings that lead to a rejection of one-factor models.
(7) One-factor models are not sufficient to account for the variability observed in the markets for the term structure of interest rates. Using them to price derivatives may lead to suboptimal exercise, wrong hedging calculations and general losses that can be huge.


Chapter 4

Arbitrage Theory

4.1 Introduction

One of the main ideas stemming from the Black-Scholes pricing methodology is portfolio replication. By the law of one price, two portfolios that have identical payoffs must have the same initial price. This crucial idea is the bedrock of modern finance and it opened the area of hedging in financial markets. Transaction costs were assumed negligible in the Black-Scholes model, but in practice they cannot be ignored. How to take transaction costs into account is still very much open to debate, and there are models that, while they may circumvent the transaction costs problem, may unintentionally introduce arbitrage.

The interplay between discrete-time processes and continuous-time processes is fascinating in finance. Although the former are more appropriate by design for real applications, the latter are preferred theoretically for mathematical convenience. Hence, it has become customary, when a discrete-time model is proposed in the literature, to investigate its convergence as the time interval becomes infinitesimally small. Quite surprisingly, even when a discrete-time pricing process converges to a continuous-time one as the time interval shrinks, other quantities such as trading portfolios may lack convergence at the same time.

Distortion operators have been used with some degree of success in the actuarial field, particularly in relation to risk management. It has been thought that they may provide a new methodology useful for pricing and risk analysis. However, there are some important pitfalls in doing so, which are pointed out in this chapter.

4.2 Transaction Costs

[Melnikov and Petrachenko (2005)] proposed a binomial market model that covers transaction costs. The model is grafted on a binomial tree with N time steps, a path being identified with a sequence δ_1, ..., δ_n, where n ∈ {0, 1, ..., N} and each δ can take the values 1 or 0, representing up and down movements. The market is spanned by a series of bond prices B_n^cred, B_n^dep, for short and long positions respectively in the money market, and a series of bid-ask stock prices S_n^bid, S_n^ask. By assumption, B_n^cred, B_n^dep are F_{n−1}-measurable while S_n^bid, S_n^ask are F_n-measurable, such that

  0 < S_n^bid ≤ S_n^ask,  0 < B_n^dep ≤ B_n^cred,
  S_n^bid(δ_1, ..., δ_n, 1) > S_n^ask(δ_1, ..., δ_n, 0)

For a European-style contingent claim with general payoff at maturity given by f_N = (f_N^1, f_N^2), [Melnikov and Petrachenko (2005)] prove that there exists a unique replicating self-financing strategy θ = {θ_n}_{n∈{0,1,...,N}} = {(β_n, α_n)}_{n∈{0,1,...,N}} satisfying f_N = θ_{N+1} and the self-financing condition

  β_{n+1} ⋄ (B_n^dep|B_n^cred) − β_n ⋄ (B_n^dep|B_n^cred) + (α_{n+1} − α_n) ⋄ (S_n^ask|S_n^bid) = 0    (4.1)

where the operation “⋄” is defined as

  a ⋄ (b_1|b_2) = a b_1 if a ≥ 0; a b_2 if a < 0.    (4.2)

The general contingent claim pricing formula derived from this binomial market in this way is

  C_N(f_N) = β_1 ⋄ (B_0^dep|B_0^cred) + α_1 ⋄ (S_0^ask|S_0^bid)    (4.3)

The following example, provided by [Roux and Zastawniak (2006)], shows that the option pricing method for the binomial market model covering transaction costs described above is open to arbitrage.

Proposition 4.1. Consider the one-step binomial model with the following bid and ask stock prices

  S_0^ask = 5,  S_0^bid = 1
  S_1^{ask,up} = 6,  S_1^{bid,up} = 4
  S_1^{ask,down} = 3,  S_1^{bid,down} = 2


and the constant bond prices

  B_0^dep = B_0^cred = B_1^dep = B_1^cred = 1

These prices for the stock and for the bond satisfy the assumptions of the binomial market model covering transaction costs. Then, for the contingent claim f_1 = (f_1^1, f_1^2) with the following cash payoff, represented synthetically as a position in the bond and a position in the stock,

  f_1^up = (f_1^{1,up}, f_1^{2,up}) = (2, 0)    (4.4)
  f_1^down = (f_1^{1,down}, f_1^{2,down}) = (0, 0)    (4.5)

there is arbitrage if the claim is priced with the method described in [Melnikov and Petrachenko (2005)].

Proof. The replicating strategy illustrated in [Melnikov and Petrachenko (2005)] gives the following bi-dimensional binomial tree for the positions in bonds and stock, respectively:

  (β_1, α_1) = (−2, 1), with (β_2^up, α_2^up) = (2, 0) in the up state and (β_2^down, α_2^down) = (0, 0) in the down state.

Therefore, the option price calculated from formula (4.3) is in this case

  C_1(f_1) = β_1 ⋄ (B_0^dep|B_0^cred) + α_1 ⋄ (S_0^ask|S_0^bid) = −2 × 1 + 1 × 5 = 3.

It can easily be seen that selling the claim for 3 and keeping as a hedge only 2 bonds, at a cost of 2, super-replicates the option. The minimum profit that will be obtained is 1. □

This counterexample is linked to a general idea discussed by [Dermody and Rockafellar (1991)] and [Bensaid et al. (1992)]: when transaction costs are taken into account, it is possible to find a super-replication strategy that is cheaper than a replicating strategy. An insightful possible explanation of the above example, for which I thank Natalie Packham, goes along the following lines. For the one-period binomial model there is no arbitrage if and only if d < 1 + r < u, where as usual d and u are the down and up factors applied to the stock price and r is the risk-free rate. If S_0 is set equal to the ask price S^ask and S_1 is equal to u × S^bid, then there is an arbitrage.
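The numbers in the proof can be verified directly. A minimal sketch of the pricing rule (4.3) and the resulting super-replication, using the values from Proposition 4.1:

```python
def diamond(a, b1, b2):
    """The operation a <> (b1|b2) from definition (4.2)."""
    return a * b1 if a >= 0 else a * b2

beta1, alpha1 = -2.0, 1.0            # replicating strategy from the proof
B0_dep = B0_cred = 1.0
S0_ask, S0_bid = 5.0, 1.0

price = diamond(beta1, B0_dep, B0_cred) + diamond(alpha1, S0_ask, S0_bid)
print(price)                          # 3.0: the Melnikov-Petrachenko price

# Arbitrage: sell the claim for 3, buy two bonds for 2, keep 1 upfront.
cash_now = price - 2 * 1.0
for payoff in (2.0, 0.0):             # up and down state cash payoffs
    print(cash_now + 2 * 1.0 - payoff)   # profit 1 in the up state, 3 in the down
```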

4.3 Arbitrage

4.3.1 Non-convergence of the financial gain process

Continuous-time finance helps the development of finance theory. However, there are many models in finance that are set up as discrete-time models. For many of those models one can show some kind of convergence to a continuous-time model when the number of time periods per unit of real time goes to infinity. The widely popular discrete-time binomial option pricing formula, suggested by Sharpe, was shown by [Cox et al. (1979)] to converge to the famous Black-Scholes formula. It is important to have conditions ensuring that the convergence is realised for particular models. A user guide to these technical conditions is provided by [Duffie and Protter (1992)]. Here we describe two examples indicating that, even though a trading strategy θ^(n) may converge in law to a trading strategy θ and a price process S^(n) may converge as well in law to a price process S, it is not necessarily true that the financial gain process ∫θ^(n) dS^(n) will converge in law to the financial gain process ∫θ dS.

The first example has its source in [Kurtz and Protter (1991)].

Proposition 4.2. Consider the trading strategies defined by θ^(n) = θ = 1_{(T/2, T]} and consider the price processes given by S^(n) = 1_{[T/2+1/n, T]} for n > 2/T, and S = 1_{[T/2, T]}. Then θ^(n) converges in law to θ and S^(n) converges in law to S, but ∫θ^(n) dS^(n) does not converge in law to ∫θ dS.

Proof. It is easy to see that for all n > 2/T and t > T/2 + 1/n

  ∫_0^t θ^(n) dS^(n) = 1

but also that, for any time t,

  ∫_0^t θ dS = 0

since S jumps at T/2, where θ takes the value 0. □

page 58

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Arbitrage Theory

59

This second series of returns will be called staled returns. Starting from some given initial price S_0, the corresponding price process S^(n) derived from the staled returns series is defined by S_t^(n) = S_0 E(R^(n))_t, where the operator E is the stochastic exponential. Next, we assume that an investor chooses to invest his total wealth X_t, at time t, by placing a fraction α(X_t) in this risky investment and keeping the remainder invested in a risk-free bond paying, for simplicity, a zero interest rate. We assume for regularity that α is bounded with a bounded derivative. For un-staled returns the wealth process is given by

  X_t = X_0 + ∫_0^t α(X_s) X_s dR_s

where X_0 is initial wealth. Similarly, with staled returns, the wealth process X^(n) is given by

  X_t^(n) = X_0 + ∫_0^t α(X_s^(n)) X_s^(n) dR_s^(n)

Although the staled cumulative return process R^(n) converges in law to R, the corresponding wealth process X^(n) does not converge in law to X.

Proposition 4.3. The wealth process X^(n) converges in law to the process Y given by

  Y_t = X_t + (1/2) ∫_0^t α(X_s) X_s^2 [1 + dX_s/ds] ds    (4.6)

Proof. It can be assumed without loss of generality that σ = 1. Then

  R_t^(n) = R_t + [R_t^(n) − R_t] = V_n(t) + Z_n(t)

where V_n(t) = R(t) for all n and Z_n(t) = R_t^(n) − R_t. It follows that V_n converges in law to R while Z_n converges in law to 0. Since

  H_n(t) = ∫_0^t Z_n(s) dZ_n(s) = (1/2) Z_n(t)^2 − (1/2) t

applying the continuous mapping theorem implies that Z_n^2 will also converge in law to 0, so H_n(t) will converge in law to −t/2. Following a similar line of argument for K_n = ⟨V_n, Z_n⟩, we have K_n(t) = −t. Then U_n = H_n − K_n converges in law to U, where U_t = t/2. Applying Theorem 5.10 from [Kurtz and Protter (1991)] implies that X^(n) converges in law to Y. □
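The key step, H_n(t) converging to −t/2, can be checked by simulation. A minimal sketch with σ = 1 (grid size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
t_end, steps = 1.0, 200_000
dt = t_end / steps
R = np.concatenate([[0.0], np.cumsum(np.sqrt(dt) * rng.standard_normal(steps))])

for n in (10, 50, 250):
    lag = int(round(1 / (n * dt)))          # moving-average window of length 1/n
    csum = np.concatenate([[0.0], np.cumsum(R)])
    Rn = (csum[lag:] - csum[:-lag]) / lag   # n * integral of R over (t - 1/n, t]
    Zn = Rn - R[lag - 1:]                   # Z_n on the aligned grid
    Hn = np.sum(Zn[:-1] * np.diff(Zn))      # forward (Ito-style) sum
    print(n, Hn)                            # approaches -t/2 = -0.5
```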


4.3.2 Distortion operator with arbitrage

Starting from a physical distribution given as a cumulative transition distribution function F(t, x; T, y) for a traded asset {X_t}_{t≥0}, [Wang (2000, 2002)] proposed to price the asset X and derivatives contingent on it by applying a distortion operator defined by

  F^D(t, x; T, y) = Φ[Φ^{−1}(F(t, x; T, y)) − λ(t, T)]    (4.7)

where λ(t, T) is a deterministic function that is determined by modifying F^D(t, x; T, y) until E^D[X_T|X_t = x] = x for all T > t. Numerically, one has to solve the equation

  x = E^D[X_T|X_t = x]    (4.8)
    = ∫_{−∞}^{∞} X_T dF^D(t, x; T, y)    (4.9)
    = ∫_{−∞}^{∞} X_T dΦ[Φ^{−1}(F(t, x; T, y)) − λ(t, T)]    (4.10)

which has a unique solution in λ(t, T) because F^D is monotonic in λ for all T > t, with the initial condition λ(t, t) = 0. Here we shall illustrate that this procedure may in fact be inconsistent with the no-arbitrage principle. The discussion here follows [Pelsser (2008)]. Assume that the underlying asset follows the process with the following general SDE

  dX_t = μ(t, X_t) dt + σ(t, X_t) dW_t    (4.11)
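For a concrete case, the calibration equation (4.8)-(4.10) can be solved numerically. A sketch assuming X follows a GBM under P, so that Φ^{−1}(F) is available in closed form; the parameter values are illustrative, scipy supplies Φ and a root finder, and for a lognormal X_T the root should equal −μ√τ/σ:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

x, mu, sigma, tau = 100.0, 0.08, 0.25, 1.0     # hypothetical GBM inputs
m = np.log(x) + (mu - 0.5 * sigma**2) * tau    # mean of ln X_T under P
s = sigma * np.sqrt(tau)

y = np.linspace(1.0, 1000.0, 200_001)          # integration grid for X_T

def distorted_mean(lam):
    FD = norm.cdf((np.log(y) - m) / s - lam)   # F^D = Phi(Phi^{-1}(F) - lam)
    ymid = 0.5 * (y[1:] + y[:-1])
    return np.sum(ymid * np.diff(FD))          # E^D[X_T | X_t = x]

lam = brentq(lambda l: distorted_mean(l) - x, -5.0, 5.0)
print(lam, -mu * np.sqrt(tau) / sigma)         # numerical root vs closed form
```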

A change of probability measure, from the physical measure P to a risk-neutral measure Q, is associated with a Girsanov kernel K_t with ∫_0^t K_s^2 ds < ∞ almost surely, such that

  dQ/dP = exp{ ∫_0^t K_s dW_s − (1/2) ∫_0^t K_s^2 ds }

and under Q the process W_t^Q = W_t − ∫_0^t K_s ds is a Q-Brownian motion. The SDE becomes

  dX_t = [μ(t, X_t) + σ(t, X_t)K_t] dt + σ(t, X_t) dW_t^Q    (4.12)

By analogy, the distortion change of measure will imply a similar equation

  dX_t = [μ(t, X_t) + σ(t, X_t)K_t^D] dt + σ(t, X_t) dW_t^D    (4.13)

where W_t^D is the Brownian motion under the distorted measure and K_t^D denotes the associated Girsanov kernel.


Now, remarking that

  F^D(t, x; T, y) = E^D[ 1_{X_T ≤ y} | X_t = x ]

it is easy to see that F^D satisfies Kolmogorov's backward equation

  ∂F^D/∂t + [μ(t, x) + σ(t, x)K_t^D] ∂F^D/∂x + (1/2)σ^2(t, x) ∂^2F^D/∂x^2 = 0    (4.14)

Thus, rearranging and using formula (4.7), we get after some algebra, with λ = λ(t, T) and K_t^D the corresponding Girsanov kernel, that

  σ(t, x)K_t^D = [φ(Φ^{−1}(F)) / (∂F/∂x)] ∂λ/∂t − (1/2)σ^2(t, x) [(∂F/∂x) / φ(Φ^{−1}(F))] λ    (4.15)

The distortion change of measure is free of arbitrage if K_t^D makes the process driftless, that is, only if K_t^D = −μ(t, x)/σ(t, x). Rearranging this ratio in the above equation leads to

  ∂λ(t, T)/∂t = −μ(t, x) (∂F/∂x)/φ(Φ^{−1}(F)) + (1/2) [σ(t, x) (∂F/∂x)/φ(Φ^{−1}(F))]^2 λ(t, T)    (4.16)

This PDE can be solved for λ(t, T) if and only if the coefficients in the equation depend only on time and not on the state x. Hence, the distortion operator change of measure is consistent with the no-arbitrage principle only if the following conditions are satisfied:

  ∂/∂x [ μ(t, x) (∂F/∂x)/φ(Φ^{−1}(F)) ] = 0    (4.17)
  ∂/∂x [ σ(t, x) (∂F/∂x)/φ(Φ^{−1}(F)) ] = 0    (4.18)

One implication of these two conditions is that the ratio μ(t, x)/σ(t, x) depends only on time. However, the following example demonstrates that pricing with a distortion operator may lead to arbitrage.

Proposition 4.4. Consider the stock (paying no dividends) process {S_t}_{t≥0} following the dynamics given by the SDE

  dS_t = [μ + a(μt − ln S_t)] S_t dt + σ S_t dW_t    (4.19)

and assume that the money market account evolves according to

  dB_t = r B_t dt    (4.20)

with r > 0 constant. Pricing with the distortion operator given in (4.7) is then inconsistent with the no-arbitrage principle.


Proof. Consider X_t = S_t/B_t. Then

  dX_t = [μ − r + a((μ − r)t − ln X_t)] X_t dt + σ X_t dW_t

With standard stochastic calculus it can be proved that ln(X_T) has a Gaussian distribution with mean

  m(t, x; T) = e^{−a(T−t)} x + ∫_t^T e^{−a(T−s)} [μ − r − (1/2)σ^2 + a(μ − r)s] ds

where x denotes ln X_t, and variance

  σ_X^2 = (σ^2/2a)(1 − e^{−2a(T−t)})

Under the physical measure P this variance is bounded by σ^2/(2a). Remark that under P the transition cdf is

  F(t, x; T, y) = Φ( (y − m(t, x; T)) / √[(σ^2/2a)(1 − e^{−2a(T−t)})] )    (4.21)

It is evident that the ratio of drift over volatility does depend on the state variable X, which violates the necessary condition discussed above. Alternatively, one can see that replacing (4.21) into conditions (4.17) and (4.18) leads to a contradiction.

Now, applying the distortion operator means that we change to a measure Q^D such that the process X is a martingale. The Girsanov kernel in this case is

  K_t = −[μ − r + a((μ − r)t − ln X_t)]/σ    (4.22)

and with the Girsanov change we get

  dX_t = σ X_t dW_t^D    (4.23)

The distortion parameter function λ(t, T) can be calculated from (4.10), and using (4.21) gives the solution

  λ(t, T) = (x − m(t, x; T)) / √[(σ^2/2a)(1 − e^{−2a(T−t)})]    (4.24)

which is state dependent, in contradiction with the model setup. Moreover, as emphasized by [Pelsser (2008)], under Q^D

  F^D(t, x; T, y) = Φ( (y − x) / √[σ^2(T − t)] )    (4.25)

and the variance under Q^D is σ^2(T − t), which is unbounded as T → ∞. □
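The state dependence in (4.24) is easy to exhibit numerically: evaluating λ(t, T) at several values of the log state x gives different numbers. A minimal sketch with illustrative parameters:

```python
import numpy as np

mu, r, a, sigma, t, T = 0.08, 0.03, 0.5, 0.2, 0.0, 5.0   # hypothetical inputs

def lam(x):
    s_grid = np.linspace(t, T, 10_001)
    integrand = np.exp(-a * (T - s_grid)) * (
        mu - r - 0.5 * sigma**2 + a * (mu - r) * s_grid)
    m = np.exp(-a * (T - t)) * x + np.sum(integrand[:-1] * np.diff(s_grid))
    sd = np.sqrt(sigma**2 / (2 * a) * (1 - np.exp(-2 * a * (T - t))))
    return (x - m) / sd       # formula (4.24)

for x in (-0.5, 0.0, 0.5):    # x plays the role of ln X_t
    print(x, lam(x))          # lambda changes with x: state dependent
```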

Therefore, pricing under the risk-neutral measure Q^D given by the distortion operator method is not only incompatible with the no-arbitrage principle but also misleading for risk-management purposes. The example above showed how the model switches from an asymptotically constant variance to an asymptotically unbounded variance.


4.4 Notes and Summary

The no-arbitrage principle is one of the cornerstones of modern finance. There are many technical problems that have been highlighted over the years for various facets of arbitrage. The Black-Scholes formula for the price of a European call option can be proved either with the bond replication method, as used in the seminal paper by Black and Scholes, or with the call replication method, as proposed by Merton. [Rosu and Stroock (2003)] proved that the two methods are not equivalent and that the bond replication method fails if the call option delta is equal to one. The Black-Scholes formula has been widely criticised because it is based on the assumption of deterministic volatility. However, [El-Karoui et al. (1998)] showed that, under the assumption that contingent claims have convex payoffs and that the only source of randomness in the misspecified volatility is a dependence on the current price of the stock, the Black-Scholes formula can be robust to misspecification of volatility. Furthermore, they point out that option prices can behave erratically when the underlying stock has stochastic and path-dependent volatility. [Mykland (2010)] is an excellent reading that tries to solve the problem of finding the bounds one can set on derivatives prices when we know that there is statistical uncertainty. This problem is considered from the point of view of a single investor or regulator. This research continues the seminal discussion in [Avellaneda et al. (1995)] and [Lyons (1995)].

The Wang transform has been applied extensively in actuarial circles. [Kijima and Muromachi (2008)] extended the Wang transform pricing method using the well-known equilibrium asset pricing model developed by Buhlmann. Furthermore, [Kijima (2006)] generalized the Wang transform to a multivariate set-up and proved that in the Gaussian case this coincides with the multivariate Esscher transform. Starting from the classical argument of Kahneman and Tversky, [Nguyen et al. (2009)] provide two possible motivations that may theoretically support Wang's transform in valuing assets. Hence, I think more research is needed regarding the Wang transform to make sure the theoretical lapses are bridged or circumvented.

The second fundamental theorem of asset pricing connects market completeness to the uniqueness of martingale measures. [Artzner and Heath (1995)] describe an example of an economy with a complete market and an infinite series of martingale pricing measures. In their example, traders who select different measures may not agree on which claims can be approximated by attainable ones. Therefore, the trader will believe that the


market is complete only when the equivalent martingale measure selected is one of the two extremal martingale measures. [Battig (1999)] designed a different pricing set-up trying to solve the Artzner-Heath paradox. He provides an example where the existence of an equivalent martingale measure may imply lack of completeness. This idea was revisited in [Battig and Jarrow (1999)], where the authors showed that, under the new definitions of market completeness, a market can be complete while arbitrage opportunities still exist. [Jarrow et al. (1999)] refine the definitions of no-arbitrage and completeness, separating them such that the notion of completeness is independent of the concept of martingale measures. Hence, this type of completeness is different from the completeness defined by [Harrison and Pliska (1983)]. Thus, under their definitions, [Jarrow et al. (1999)] show that economies with no arbitrage preserve the second fundamental theorem.

Some important points from this chapter are:

(1) Some option pricing methods for the binomial market model covering transaction costs are open to arbitrage.
(2) Even though a trading strategy θ^(n) may converge in law to a trading strategy θ and a price process S^(n) may converge as well in law to a price process S, it is not necessarily true that the financial gain process ∫θ^(n) dS^(n) will converge in law to the financial gain process ∫θ dS.
(3) Pricing assets with the Wang distortion operator may introduce arbitrage and give misleading risk-management results about volatility.


Chapter 5

Derivatives Pricing Under Uncertainty

5.1 Introduction to Model Risk

Model risk and uncertainty have preoccupied academics in finance for some time; see [Merton (1969)]. For derivatives, the best known model is the Black-Scholes model for pricing equity derivatives. The model has been criticised for its assumptions, although the same critical yardstick does not seem to be applied to models that claim to be an improvement on it. [Black (1989)] recognised some of the shortcomings of the model but also discussed ways to circumvent these in practice. Moreover, [El-Karoui et al. (1998)] showed that the Black-Scholes model is robust and derived conditions for superhedging under this model.

Model risk has been studied more thoroughly in statistical science where, from a given set of data, some information about a model, parametric or not, is extracted. This information can be used prospectively to generate future outcomes of a quantity of interest, or retrospectively to understand the evolution of a quantity of interest. Parametric models offer a dimensional shortcut: using only the values of a small set of parameters allows us to reach a high degree of understanding of the phenomenon under study. [Bernardo and Smith (1994)] and [Draper (1995)] enumerate the main sources of uncertainty related to the quantity under study:

(1) uncertainty caused by the stochastic specification of the model; deterministic models, as used in mechanics for example, do not carry any degree of uncertainty;
(2) uncertainty in the estimated values of the parameters underpinning the model; from a finite set of data we may not be sure about the true population value of the parameters;
(3) uncertainty in the model used; it is difficult to know with certainty


that a given model is the correct one. This category can be classified further:

(a) the true model belongs to a known class of models;
(b) the class of models used is known to be an approximation to a more complex model that is cumbersome to work with;
(c) the class of models may provide a proxy for a more complex true model about which the modeller has no prior knowledge.

Identifying the concept and the boundaries of model risk in financial markets has been attempted earlier by [Barry et al. (1991)], Crouhy et al. (1998), [Jorion (1996)] and Derman (1996). The latter pointed out three main reasons why a model may be considered inadequate:

(1) the model parameters may not be estimated correctly;
(2) it may be mis-specified;
(3) it may be incorrectly implemented.

This classification can be considered as an initial starting point by any risk management or model validation team, but other authors have pointed out other directions of model risk. For clarity, we shall briefly enumerate the categories of model risk, describing the source of risk or uncertainty.

The first category is commonly referred to as parameter estimation risk. The idea is that we are certain about the model specification and the corresponding set of parameters, but we do not know for sure the value of those parameters vis-a-vis a set of observed data. Parameter estimation risk may surface when we do not know exactly which method of estimation to use, say maximum likelihood versus the generalized method of moments, but also when we are sure about the method of estimation yet recognize that any estimator may at best be in the vicinity of the true parameter value. Another subtle manifestation of parameter estimation risk is the necessity to produce auxiliary intermediary values for a series of asset prices or returns when the data is observable only at distant time points but intermediary values from the same data generating process are needed in practice. These auxiliary values can be treated as parameters under a Bayesian framework, and their model values are subject to uncertainty. An example of this type of risk is presented in Sec. 12.

The second category deals with model selection risk within a given family of models. Here the class of models is known, but we are not sure which model or models from this class represent the data. For example, for binary


discrimination we may know that the model generating the data is either the logistic regression model, the probit regression model or the Tobit regression model, but we are not sure which model it is. Likewise, for equity derivatives we may consider as certain that the true model belongs to the class of stochastic volatility models, such as the Heston model ([Heston (1993)]) or the Schöbel and Zhu model ([Schöbel and Zhu (1999)]), but we are not sure which one it is.

The next level in model risk is constituted by not even knowing whether the model or models investigated are the right ones. This is the most general type of model risk and in essence it deals with model identification risk. This category is more related to Knightian uncertainty than to model risk per se. However, the analyst can still streamline models that otherwise would look feasible by asking questions that are specific to financial theory, in other words by employing exogenous model validation criteria.

While the above three categories have a sequential logic and are usually presented together, there are other forms of model risk in quantitative finance. One such risk is computational implementation risk, which is generated by overlooking the technical conditions under which particular computational mathematical techniques work. While the first three types of risk can most often be dealt with using Bayesian inference, this type of risk is of a different nature and arises from the analyst's lack of knowledge. In this category we also include problems caused by approximation techniques wrongly applied to problems in finance, some examples and discussion being available later in Chapter 10.

The four categories described above (we consider that generating auxiliary data is still a form of a parameter estimation problem) seem to be comprehensive. Yet I would like to add another small category that seems to me to be growing in recent years, and that is model protocol risk. This category contains situations when two or more market agents exchange information about a particular quantity, not necessarily a model, about which they have different understandings. This is perfidiously risky since on the surface there seems to be total agreement between the two parties, whereas in reality the same “name” is given to different meanings. The understanding of each party may come in relation to an in-house model.

Let us now review in more detail each category of model risk, illustrating where possible with examples and also pointing out other literature covering this subject.


5.1.1 Parameter estimation risk

In the financial literature, estimation risk, defined as the risk associated with inaccurate estimation of parameters, has been discussed in various papers; see for instance [Barry et al. (1991)], [Gibson et al. (1999)], [Green and Figlewski (1999)], [Talay and Zheng (2002)], [Bossy et al. (2000)] and [Berkowitz and O'Brien (2002)]. [Karolyi (1993)] pioneered volatility estimates for given stocks using prior information on the cross-sectional patterns in return volatilities, which increased the precision of stock option price estimates. Hence, the uncertainty in the volatility estimate was shown to have an impact on applications in financial markets. Other earlier works that focused on volatility uncertainty are [Avellaneda et al. (1995)] and [Lyons (1995)]. [Green and Figlewski (1999)] revisited this issue from the perspective of a financial institution writing options. Using simulation, they showed that imperfect models and inaccurate volatility forecasts may cause contracts to be sold for too low a price or purchased for too high a price. Moreover, they may also lead to an inefficient hedging strategy being carried out, which may cause market risk and credit risk measures to be severely in error.

The majority of estimation methods for diffusion processes applied in finance coagulated around the maximum-likelihood principle, but other improved techniques started to appear. [Yoshida (1990)] marked the development of a new stream of literature focused on the estimation of diffusion processes from discrete data, observed at a fixed time distance δt, which is more realistic. Other notable contributions in this direction include [Dacunha-Castelle and Florens-Zmirou (1986); Lo (1988); Ait-Sahalia (2002)]. Parametric methods for fixed and not necessarily small δt were developed in [Hansen et al. (1998)], where scalar diffusions were characterised via their spectral properties. For general Itô diffusion processes, [Jiang and Knight (1997)] proposed an estimator based on discretely sampled observations that, under certain regularity conditions, is pointwise consistent and asymptotically follows a Gaussian mixture distribution. They also provided an example for short rate interest rate modelling. [Gobet et al. (2004)] looked at the nonparametric estimation of diffusions based on discrete data when the sampling tenor is fixed and cannot be made arbitrarily small. Remarkably, they also proved that the problem of estimating both the diffusion coefficient (the volatility) and the drift in a nonparametric setting is ill-posed.

[Fan and Zhang (2003)] expanded the research on model validation

page 68

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Derivatives Pricing Under Uncertainty

Carte˙main˙WS

69

sidering the well known problem of estimating the drift function μ(·) and the diffusion function σ(·) of a time-homogeneous diffusion model described by the stochastic differential equation dSt = μ(St )dt + σ(St )dWt

(5.1)

with {W_t}_{t∈[0,T]} a standard one-dimensional Brownian motion. [Stanton (1997)] developed estimators based on a higher-order approximation scheme and nonparametric kernel estimation for both drift and diffusion coefficients, arguing that the higher the order of the approximation, the faster it will converge to the true drift and diffusion of the process described in equation (5.1), as the interval between observations of the variable S gets smaller and smaller. In a nutshell Stanton claims that “even with daily or weekly data, we can achieve gains by using higher order approximations compared with the traditional first order discretizations.” Although these claims are to some extent correct, [Fan and Zhang (2003)] proved that they can also be misleading because the variance inflation in the statistical estimation due to the higher-order approximation is overlooked. The variance inflation phenomenon, which is known to appear with nonparametric methods, is also present in this context with parametric models. [Fan and Zhang (2003)] indicated that higher-order approximations lead to a reduction of the numerical approximation error within the asymptotic bias, as stated by [Stanton (1997)], but at the same time the approximation suffers from an asymptotic variance growing almost exponentially with the order of the approximation. Hence, the higher-order approximation scheme may lead to less reliable results.

[Fan and Zhang (2003)] proposed using the local linear estimation that overcomes the spurious “boundary effects” of Stanton's kernel estimation, and they applied this improved estimation to test two very important hypotheses in finance. First, they investigated the hypothesis posed by [Chapman and Pearson (2000)] that the short-rate drift is actually nonlinear. This is done by testing the model given by (5.1) against the model given by

dS_t = (α + βS_t)dt + σ(S_t)dW_t.   (5.2)

The improved methodology was applied to the T-bill dataset covering the period January 8, 1954 to December 31, 1999, at a weekly frequency. The U.S. Treasury bill secondary market rates were the averages of the bid rates quoted on a bank discount basis by a sample of primary dealers who report to the Federal Reserve Bank of New York. The rates reported are based on quotes at the official close of the U.S. government securities market for each business day. [Fan and Zhang (2003)] found that there was no strong evidence against the null hypothesis of linear drift.

In the second application they looked at the significance of structural shifts of the S&P 500 series based on previously studied models. Using the dataset for the period January 4, 1971 to April 8, 1998, the logarithmic index series is used to test against the hypothesis of linear drift and also against various non-linear specifications for the drift and for the diffusion. The models compared for testing were the well known GBM

dS_t = μS_t dt + σS_t dW_t,   (5.3)

a Vasicek type model

dS_t = (α + βS_t)dt + σ dW_t,   (5.4)

its more complex CIR type variant

dS_t = (α + βS_t)dt + σ S_t^{1/2} dW_t,   (5.5)

and a restricted CIR type form

dS_t = σ S_t^{3/2} dW_t.   (5.6)

The empirical evidence suggested that there was no strong evidence against the linear drift. This result is in opposition to the conclusion arrived at earlier by [Ait-Sahalia (1996)] and [Stanton (1997)]. At the same time, the test done on the volatility function indicated that no volatility specification in the models (5.3)–(5.6) is correct.
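As a quick illustration of how the four candidate specifications (5.3)–(5.6) differ, the following minimal Euler-Maruyama simulation sketch can be used; the parameter values are illustrative assumptions, not estimates from [Fan and Zhang (2003)]:

    import numpy as np

    def simulate(drift, diffusion, s0=0.05, dt=1/52, n=2400, seed=42):
        """Euler-Maruyama simulation of dS = drift(S)dt + diffusion(S)dW."""
        rng = np.random.default_rng(seed)
        s = np.empty(n + 1); s[0] = s0
        dw = rng.normal(0.0, np.sqrt(dt), n)
        for j in range(n):
            s[j + 1] = s[j] + drift(s[j])*dt + diffusion(s[j])*dw[j]
        return s

    # Illustrative parameter values (assumptions for the sketch only)
    mu, sigma = 0.08, 0.20          # GBM (5.3)
    alpha, beta = 0.01, -0.15       # linear drift in (5.4)-(5.5)

    models = {
        "GBM (5.3)":      (lambda s: mu*s,           lambda s: sigma*s),
        "Vasicek (5.4)":  (lambda s: alpha + beta*s, lambda s: 0.02),
        "CIR (5.5)":      (lambda s: alpha + beta*s, lambda s: 0.1*np.sqrt(np.abs(s))),
        "CIR 3/2 (5.6)":  (lambda s: 0.0,            lambda s: 0.8*np.abs(s)**1.5),
    }
    for name, (a, b) in models.items():
        path = simulate(a, b)
        print(f"{name}: terminal value {path[-1]:.4f}")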

5.1.2 Model selection risk

[Kerkhof et al. (2010)] provide an enhanced framework for model risk decomposition and measurement. Incorrect model specification resulting from model selection could be detected by the use of standard econometric methods. One well-known problem is that using R² as the model selection criterion may compromise the modelling process, since models with added irrelevant variables still give higher R² values. A classical statistical modelling argument to circumvent this problem is to use model selection criteria that penalize models with a large number of parameters, such as the Akaike information criterion or the Schwarz Bayesian information criterion.

One important class of solutions that has emerged recently as a possible answer to the risk posed by model identification is model averaging. This has been used by [Bunnin et al. (2002)] for options pricing. The model error and parameter uncertainty related to the Black-Scholes model were also studied in [Jacquier and Jarrow (2000)]. They make specific assumptions about the distribution of model errors, use likelihood based estimators for inference, and also generalize the error structure by nesting the Black-Scholes model in a polynomial expansion of its inputs. This route does not seem to yield any out-of-sample improvement over the benchmark Black-Scholes model.
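As a toy illustration of penalized model selection, the sketch below compares a deliberately underspecified model with a linear one on synthetic data using AIC and BIC; the data generating process is an assumption made purely for illustration:

    import numpy as np

    def gaussian_ic(resid, n_params):
        """AIC and BIC from the Gaussian log-likelihood of model residuals."""
        n = resid.size
        sigma2 = np.mean(resid**2)
        loglik = -0.5*n*(np.log(2*np.pi*sigma2) + 1.0)
        k = n_params + 1                      # +1 for the noise variance
        return 2*k - 2*loglik, k*np.log(n) - 2*loglik

    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    y = 0.5*x + rng.normal(scale=0.3, size=500)   # synthetic data

    # Model 1: mean only; Model 2: linear regression y = a + b x
    r1 = y - y.mean()
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r2 = y - X @ beta

    for name, resid, k in [("mean only", r1, 1), ("linear", r2, 2)]:
        aic, bic = gaussian_ic(resid, k)
        print(f"{name}: AIC={aic:.1f}  BIC={bic:.1f}")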

5.1.3 Model identification risk

This is the most general and most encountered form of model risk. In a seminal paper, [Cont (2006)] developed a framework to capture model risk conceptually and also numerically, proposing a model risk measure that has desirable theoretical properties and is capable of isolating model risk from market risk. A subtle point is made by [Kerkhof et al. (2010)] who considered that identification risk may also arise when observationally indistinguishable models have different consequences for capital reserves. [Cairns (2000)] highlighted a Bayesian approach to account for both parameter and model uncertainty when making inferences about some quantity of interest. He showed through some examples that including parameter uncertainty, and model uncertainty respectively, can have a significant impact on the final results.

Traditional market risk is associated with the physical² measure. Hence, it is not surprising that parameter uncertainty is mainly related to the physical measure P. [Branger and Schlag (2004)] argued that model risk is similar but not identical to market incompleteness. They pointed out that under market incompleteness, the true model (or probability measure) is assumed to be known, but the equivalent martingale measure is not unique. Under model risk, even the physical measure is unknown, and the market can be either incomplete or complete.

² This is also called in the literature the real-world or objective or empirical measure.

This is a very important point that deserves more discussion in the literature in my opinion. Hence, I shall reiterate here the points advocated by [Branger and Schlag (2004)]. Although model risk seems to be quite similar to market incompleteness, the latter does not induce the former. In an incomplete market, the true data generating process is assumed to be known but the number of risk factors defining the model is just too high relative to the number of linearly independent traded assets in that market, and therefore the equivalent martingale measure is not unique. The usual solution to this problem is to select a risk measure such as the expected shortfall or variance, and then to find the risk-minimizing hedge. The solution arrived at in this way is intrinsically dependent on the data generating process. If there is model risk in the data generating process the hedging strategy cannot be calculated since the probability distribution of the hedging errors is unknown. Therefore, under model risk, there is no perfect hedge for the contingent claim.

The market may be complete or incomplete and we may still have model risk. An easy example of a complete market with model risk is given by a geometric Brownian motion with known volatility, but unknown drift. Then, the model is by definition complete. However, if the hedging strategies are calculated as a functional of the true drift, like the quantile hedging strategies developed in [Föllmer and Leukert (1999)] for example, then model risk still persists. Regarding the example of a complete market with model risk, we can consider a complete market given by a one-dimensional diffusion for the returns X_t,

dX_t = μ(X_t)dt + σ dW_t   (5.7)

where σ > 0 is unknown and either μ(X_t) = α, that is a geometric Brownian motion, or μ(X_t) = κ(θ − X_t), that is a mean-reverting process. Under both scenarios the market is complete and, more importantly, since the drift does not matter, both models should lead to the same pricing solutions. Yet, as emphasized by [Lo and Wang (1995)], the estimation of σ is influenced by the specification of the drift. Thus, model risk can be present directly through the estimation risk. More specifically, [Lo and Wang (1995)] considered a trending OU log-price process p_t ≡ ln P_t satisfying the stochastic differential equation

dp_t = [−γ(p_t − μt) + μ]dt + σ dW_t

where γ ≥ 0, p_0 is constant and W_t is the usual Wiener process under the real-world probability measure. The observation made by [Lo and Wang (1995)] is that since the drift should not matter for options pricing purposes the Black-Scholes formula should still prevail. However, they pointed out that the data generating process influences the estimators of the model parameters and therefore the standard Black-Scholes formula does not apply because the variance parameter should be different. The formula for pricing a European call on the underlying price process P_t with strike K in this case is

C_OU = C_BS(P_t, t; K, r, σ)   (5.8)

where r is the risk-free rate, C_BS is the standard Black-Scholes formula and

σ² ≡ [ln(1 + 2ρ_τ(1))/τ] · s²(r_τ)/[(1 + 2ρ_τ(1))^{1/τ} − 1]   (5.9)

provided the first-order autocorrelation coefficient of the τ-period returns r_τ satisfies ρ_τ(1) ∈ (−1/2, 0], where s²(r_τ) is the unconditional variance of r_τ. Since for the OU trend process

ρ_τ(1) = −(1/2)[1 − e^{−γτ}]   (5.10)

the condition is easy to verify.
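A quick numerical check of (5.8)–(5.10), with illustrative values for γ, τ and the return variance (not taken from [Lo and Wang (1995)]):

    import numpy as np
    from scipy.stats import norm

    def bs_call(s, k, t, r, sigma):
        d1 = (np.log(s/k) + (r + 0.5*sigma**2)*t) / (sigma*np.sqrt(t))
        return s*norm.cdf(d1) - k*np.exp(-r*t)*norm.cdf(d1 - sigma*np.sqrt(t))

    def lo_wang_sigma2(s2_r, rho1, tau=1.0):
        """Adjusted variance as in eq. (5.9), given the unconditional variance
        s2_r of tau-period returns and their first-order autocorrelation rho1,
        with rho1 in (-1/2, 0]."""
        return (np.log(1 + 2*rho1)/tau) * s2_r / ((1 + 2*rho1)**(1/tau) - 1)

    gamma_, tau = 0.5, 1.0                     # illustrative values
    rho1 = -0.5*(1 - np.exp(-gamma_*tau))      # eq. (5.10)
    s2 = 0.04                                  # unconditional variance of returns
    sig_ou = np.sqrt(lo_wang_sigma2(s2, rho1, tau))
    print(f"rho_tau(1) = {rho1:.4f}, adjusted sigma = {sig_ou:.4f} vs naive {np.sqrt(s2):.4f}")
    print("OU-trend call :", bs_call(100, 100, 1.0, 0.05, sig_ou))
    print("naive GBM call:", bs_call(100, 100, 1.0, 0.05, np.sqrt(s2)))

With negative return autocorrelation the adjusted volatility exceeds the naive estimate, which is consistent with the observation below.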

It is not difficult to observe that the option prices under the OU trend model are always greater than option prices under the standard GBM specification.

Secondly, in a statistical sense, a model is associated with one probability measure. For contingent claims, this measure would be the physical measure P and one can see clearly how model risk surfaces from this source, via parameter estimation for example. However, contingent claims pricing and hedging are linked to a risk-neutral or martingale pricing measure Q. Some models are even specified directly under the risk-neutral measure (which is wrong in my opinion). The switch between the two is done under clear rules and, more often than not, there could be many pricing measures Q to choose from. This measure Q represents the second or complementary part of the model in financial markets. Different Q measures will likely produce different pricing and hedging results and therefore carry model risk. In my opinion, in financial markets, for contingent claims pricing and/or hedging the model is given by the pair of measures (P, Q) and model risk can appear under each of these two measures.

One way to derive Q is using a particular assumption on the pricing kernel. Denoting generically by f_Q(x; θ) the risk-neutral density for a particular asset log-return X, and by f_P(x; μ) the associated real-world (physical) density, the two densities are linked by the well-known relationship

f_P(x; μ) = [f_Q(x; θ)/M(x; λ)] / ∫ [f_Q(u; θ)/M(u; λ)] du   (5.11)

Here M(x; λ) is the projected pricing kernel, that is, a stochastic discount factor projected onto the space of the asset log-return. Remark that the physical density f_P may depend on a set of parameters μ and the risk-neutral density f_Q may depend on another³ set of parameters θ. Last but not least, the pricing kernel is also parameterised. The following three choices are consistent with the risk aversion behaviour explained by financial economics.

• M(x; λ) = e^{−γx} is given by the CRRA utility functions, γ being the relative risk aversion coefficient, and therefore λ = γ.
• M(x; λ) = (α S_t e^x/(1 − γ) + β)^{γ−1} is given by the HARA utility function, S_t being the current value of the stock price, and therefore λ = (α, β, γ).
• M(x; λ) = exp(Σ_{i=1}^N α_i T_i(x)), where T_i(·) is the i-th order Chebyshev polynomial. The utility corresponding to this specification was discussed in [Rosenberg and Engle (2002)]. Here λ = (α_1, ..., α_N).

³ To be precise, f_P and also f_Q may be derived non-parametrically as well. In addition, when they are given by vectors of parameters μ and θ, respectively, there could be common parameters in the two sets but we may also have totally disjoint sets of parameters. The rich literature on the estimation of the risk-neutral density covers all these possibilities.

The formula (5.11) links three objects: the RND f_Q, the physical density f_P and the pricing kernel M. Knowing any two of them will allow a direct calculation of the third one. Hence, one way to pinpoint the RND in incomplete markets would be to fully specify the physical density f_P and the pricing kernel M(x; λ). But as the three examples above show, various parametric specifications linked to various utility functions lead to different pricing kernels, which in turn will lead to different possible RNDs. Thus, it is still possible to have model risk in incomplete markets. Moreover, the model risk may be dual, first generated by the choice of utility function and second by the actual estimation of λ. This is not surprising in my opinion. Any choice that the modeler has in using one model or some values for some parameters may introduce model risk.

5.1.4 Computational implementation risk

This type of risk is pervasive and is usually not reported. Most of the time it is associated with operational risk, but I will dissociate implementation risk from operational risk because the former does not involve intent to deceive; it is rather the lack of knowledge or genuine mistakes that may lead to it. [Derman (1997)] mentioned the real example of a convertible bond pricing model that was capable of capturing the various options associated with such a contract but which sometimes miscounted the future number of coupon payments.

The calibration process is important since it makes the link between the historical and current data, between risk management, usually concerned with the goodness-of-fit of the econometric aspect of the model and hence backward looking, and the pricing of contingent claims, mainly organised around the concept of risk-neutral measure and hence forward looking. [Buraschi and Corielli (2005)] rightly criticised the widespread approach regarding practical yield curve asset pricing models relying on the periodic recalibration of their parameters and initial conditions, in order to eliminate discrepancies between model-implied and market prices. No-arbitrage interest rate models are centered on solutions that can usually be written in terms of the entire initial yield curve. This technique may be generally time inconsistent since the model at time t = 0 prescribes the set of possible term structures for subsequent t > 0. [Buraschi and Corielli (2005)] emphasize that incorrect calibration may introduce errors that violate the self-financing argument of the standard replication strategy. On the same line of research, [Hull and Suo (2002)] investigated model risk resulting from model mis-calibration. Working with compound options and barrier options they also provided evidence that using a wrongly calibrated model may lead to large pricing and hedging errors.

5.1.5 Model protocol risk

Consider a model that says the share price of a company follows the standard GBM equation as in (5.3). At a different bank, it is believed that the share price of the same company follows an arithmetic Brownian motion,

dS_t = μ dt + σ dW_t.   (5.12)

When agreeing on a derivative deal with S as the underlying asset, the traders from the two banks communicate the price in terms of the volatility of the underlying asset. A price equivalent to a volatility of 30% is agreed. The model protocol risk arises when the trader using the model (5.12) believes that σ = 30%. While for the first bank the volatility is indeed equal to σ and it is 30%, this is not true for the second bank. The reason for this lies in the definition of volatility as the standard deviation of asset returns per unit of time. Hence, the trader from the second bank must rewrite the model in terms of returns. This is easily done in equivalent form as

dS_t/S_t = (μ/S_t) dt + (σ/S_t) dW_t.   (5.13)


Therefore the volatility of S at time t for this trader would be equal to σ/S_t, which is equal to σ only if S_t = 1. The idea behind this simple example appears also when more complex models are considered, in particular jump-diffusion models, stochastic volatility models, or even a combination of the two. The simple lesson here is that volatility is not simply the standard deviation of the conditional distribution of the asset as specified by the model.

Another interesting example is the concept of the short rate that was briefly introduced in Sec. 3.1 and is defined by (3.6). To understand the difficulty in grasping this mathematical object in its full meaning, it is best to compare the equations for a zero coupon bond price using the continuously-compounded yield rate {R(t,T)}, the instantaneous forward rate ρ(t,T), the forward LIBOR rate L(t, T_i, T_{i+1}) and the short rate r(t). For the first three, considering also that t = T_0 < T_1 < ... < T_N = T, we get

p(t,T) = exp(−R(t,T)(T − t))

p(t,T) = exp(−∫_t^T ρ(t,s) ds)

p(t,T) = ∏_{i=0}^{N−1} 1/[1 + L(t, T_i, T_{i+1})(T_{i+1} − T_i)]

Thus, any knowledge of the values of the interest rates (paths are sufficient) on the right side allows the calculation of the zero coupon bond prices. However, for the short rate

p(t,T) = E^Q_t[exp(−∫_t^T r(s) ds)]

where Q is a risk-neutral measure corresponding to the bank account numeraire⁴ B(t) = exp(∫_0^t r(s) ds). Therefore, it is impossible to reverse engineer the short rate from the zero coupon bond prices. It is also impossible to determine the zero coupon bond prices without knowing the dynamics of the short rate process under the equivalent martingale measure Q.

⁴ This is perhaps the reason why [Cairns (2004)] called this the instantaneous risk-free rate.

The short rate should not be confused with the short term riskless rate employed by [Chan et al. (1992)], even if the models tested in that paper are generally called short rate models in the literature. There is a clear difference between the two. If the Vasicek model given as

dr(t) = κ(θ − r(t))dt + σ dW(t)

is proposed for the short rate r(t), then, once the parameters κ, θ and σ are estimated, we have the entire analytics needed to price bonds, options on bonds and so on. On the other hand, if the same Vasicek model is proposed for one-month Treasury bill yields, then the dynamics of the model refer exclusively to this interest rate. In other words, calibrated short rate models will produce, via bond prices, longer maturity yields, whereas short term interest rate models will never be able to determine other rates than the one with the pre-specified maturity.

From its very definition the short rate is an instantaneous rate. It seems then paradoxical to me that empirical research conducted for testing or validating short rate models is based on interest rate data such as one-month interbank rates, or three-month dealer bill rates, or even the OIS rates. Since these models will in the end give the entire spectrum of bond prices and yields, it seems logical that they should be calibrated to a full set of market bond prices.

The very brief discussion in Sec. 3.1 also reveals how easy it is to have what I call protocol risk when we consider forward rates. The lack of clarity extends also to forward volatility, which could be taken as the volatility of a forward rate (which one?!) or as the volatility of a spot interest rate process relative to some future time period.
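To make the first point concrete, here is a minimal sketch of the Vasicek bond pricing analytics; the closed-form expressions are standard, while the parameter values are illustrative assumptions:

    import numpy as np

    def vasicek_zcb(r0, tau, kappa, theta, sigma):
        """Closed-form zero coupon bond price p(t, t+tau) when the Vasicek
        model dr = kappa*(theta - r)dt + sigma*dW drives the short rate
        under the risk-neutral measure Q."""
        b = (1 - np.exp(-kappa*tau))/kappa
        a = (theta - sigma**2/(2*kappa**2))*(b - tau) - sigma**2*b**2/(4*kappa)
        return np.exp(a - b*r0)

    # Illustrative parameters: one calibrated short rate model delivers
    # the whole maturity spectrum of bond prices and yields.
    r0, kappa, theta, sigma = 0.02, 0.5, 0.04, 0.01
    for tau in (1, 2, 5, 10):
        p = vasicek_zcb(r0, tau, kappa, theta, sigma)
        print(f"tau={tau:>2}y  p={p:.4f}  yield={-np.log(p)/tau:.4%}")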

5.2 Uncertain Volatility

The assumption of constant volatility in the Black-Scholes model caused concerns early on, in the literature and also in financial markets. We should know that all models are false,⁵ but inaccurate parameter estimates for volatility can have a great impact on the risk taken by option underwriters. As highlighted by [Green and Figlewski (1999)], if the estimate of volatility based on plain historical data is “too low” then model risk can be limited by using a larger volatility estimate when pricing options. The following quote from that paper is a combination of things that I fully agree and subscribe to, with recommendations that simply give more ammunition to Warren Buffett's claim that derivatives are weapons of mass destruction.

⁵ By false we mean that they are not a 100% true representation of reality or the data generating process. Hence, if they are only 99% true we still say the models are false.

“This leaves delta hedging based on a valuation model implemented with forecasted volatilities as the only viable trading and risk management strategy for most institutional option writers. Our results show that this can be expected to involve considerable risk exposure. Two important components of an overall risk management strategy for a derivatives book, therefore, should be use of the best pricing models and volatility estimators, and also diversification of (delta hedged) risk exposures across a variety of derivatives markets and instruments, with the hope that a “worst year” for one asset class may be mitigated by good years for others. A third component is simply to charge higher prices than the model indicates for the options that are written. Our results suggest that increasing the volatility input by one-quarter to one-half would substantially increase mean returns and reduce the fraction of losing trades. However, the worst losses to the strategy would remain very large – there would just be fewer of them.”

The idea of charging higher prices than the model indicates is not something that would be recommendable nowadays, although I can see the computational shortcut⁶. While this could be an easy temporary fix, it will change the behavior of market traders and it will allow more model risk to accumulate over a long period of time. An explanation of a possible source of this problem, and an elegant solution, has been offered by [Avellaneda et al. (1995)]; it is described in the next section as one of the first models recognizing the uncertain character of a model parameter.

⁶ Another point here is that models in general cannot produce values that reflect the risk of crashophobia in equity markets, as suggested by Mark Rubinstein.

5.2.1 An option pricing model with uncertain volatility

The starting point is the observation that option prices reflect not only the market's view on the future value of the underlying asset but also its path of volatility. The problem is caused by the continuous updating of the information on the underlying asset, which in turn changes the projection of its future path of volatility. This phenomenon renders the market incomplete because of this volatility risk. Under the physical measure, the model proposed by [Avellaneda et al. (1995)] can be described by the following system of equations relative to a stock price process {S_t}_{t≥0}:

dS_t/S_t = μ_t dt + σ_t dW_t,
dB_t = r B_t dt,   (5.14)

where r > 0 is the risk-free rate, W is the usual Wiener process, and the processes {μ_t}_{t≥0} and {σ_t}_{t≥0} are non-anticipative stochastic processes.

Moreover, there exist positive constant scalars σ_min and σ_max such that

σ_min ≤ σ_t ≤ σ_max,   ∀t > 0   (5.15)

The values σ_min and σ_max are exogenous to the model and can be established by market agents from sources such as historical highs and lows, or internal risk management values related to stress testing or similar⁷. Let H(S) be a payoff depending on the stock price S and paid at time T. In the absence of arbitrage, the risk-neutral Itô SDE is the well-known

dS_t/S_t = r dt + σ_t dW_t   (5.16)

⁷ Using implied volatilities from other liquid derivatives can be self-contradictory when it has already been assumed that volatility can be more than a pointwise estimate.

Consider P^Q to be the class of all probability measures on the set of paths {S_t, 0 ≤ t ≤ T} for which there is a non-anticipative process {σ_t}_{t≥0} satisfying (5.15) such that the SDE (5.16) has a solution. The no-arbitrage asset pricing theory [Dana and Jeanblanc (2007)] implies that the value at time t < T of the derivative paying H at maturity lies in the interval

( inf_{Q∈P^Q} E^Q_t[e^{−rT} H(S_T)],  sup_{Q∈P^Q} E^Q_t[e^{−rT} H(S_T)] )   (5.17)

However, this interval can be too large for all practical purposes and is relevant more theoretically. Nevertheless, [Avellaneda et al. (1995)] were able to go a step further, observing that each margin can be viewed as a stochastic control problem with control variable σ_t. Hence, denoting for simplicity M⁺(S_t, t) = sup_{Q∈P^Q} E^Q_t[e^{−rT} H(S_T)] and M⁻(S_t, t) = inf_{Q∈P^Q} E^Q_t[e^{−rT} H(S_T)], one can solve the dynamic programming partial differential equation given by

∂M(S,t)/∂t + r(S ∂M(S,t)/∂S − M(S,t)) + (1/2) σ[∂²M(S,t)/∂S²]² S² ∂²M(S,t)/∂S² = 0   (5.18)

M(S,T) = H(S)   (5.19)

The non-linear PDE (5.18) is the Black-Scholes-Barenblatt equation. The limiting cases M⁺ and M⁻ are obtained by specifying

σ[∂²M(S,t)/∂S²] = σ_max if ∂²M(S,t)/∂S² ≥ 0, and σ_min if ∂²M(S,t)/∂S² < 0   (5.20)

for M⁺, while by analogy for M⁻

σ[∂²M(S,t)/∂S²] = σ_max if ∂²M(S,t)/∂S² ≤ 0, and σ_min if ∂²M(S,t)/∂S² > 0.   (5.21)

It is straightforward to see that the Avellaneda-Levy-Paras model falls back onto the Black-Scholes model when σ_min ≡ σ_max. If the future path of volatility is given by

σ_t = σ[∂²M⁺(S_t, t)/∂S²]   (5.22)

where σ[·] is given by (5.20), then the usual Black-Scholes argument of constructing a replicating portfolio with Δ_t shares and B_t bonds, with

Δ_t = ∂M⁺(S_t, t)/∂S   (5.23)

B_t = M⁺(S_t, t) − S_t Δ_t   (5.24)

is self-financing. [Avellaneda et al. (1995)] point out that for any other volatility process σ_t satisfying the initial model condition (5.15), the associated self-financing portfolio worth initially M⁺(S_t, t), and then constrained by (5.23) but not by the condition (5.24), will result in a nonnegative final value after paying the derivative, almost surely. The volatility given by (5.22) represents the worst-case volatility path under their model. The strategy given by M⁺(S_t, t) is the minimal initial ask value and ∂M⁺(S_t, t)/∂S is the hedge ratio for managing a long position in the option H.
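A minimal explicit finite-difference sketch of this worst-case pricing rule follows; the grid sizes, rates and the volatility band are illustrative assumptions, and for a convex payoff such as a vanilla call the rule collapses to Black-Scholes with σ_max:

    import numpy as np

    def bsb_price(payoff, s_max=200.0, ns=200, t=1.0, nt=20000,
                  r=0.05, sig_min=0.15, sig_max=0.25, upper=True):
        """Explicit finite differences for the Black-Scholes-Barenblatt PDE
        (5.18): at each node the volatility is sig_max or sig_min according
        to the sign of the second derivative, as in (5.20)/(5.21)."""
        ds, dt = s_max/ns, t/nt
        s = np.linspace(0.0, s_max, ns + 1)
        v = payoff(s)
        for _ in range(nt):
            gamma = (v[2:] - 2*v[1:-1] + v[:-2]) / ds**2
            delta = (v[2:] - v[:-2]) / (2*ds)
            if upper:   # worst case for the seller: M+
                sig = np.where(gamma >= 0, sig_max, sig_min)
            else:       # M-
                sig = np.where(gamma <= 0, sig_max, sig_min)
            v_new = v.copy()
            v_new[1:-1] += dt*(0.5*sig**2*s[1:-1]**2*gamma
                               + r*s[1:-1]*delta - r*v[1:-1])
            v_new[0] = v[0]*(1 - r*dt)            # S = 0 boundary
            v_new[-1] = 2*v_new[-2] - v_new[-3]   # linear boundary at S_max
            v = v_new
        return s, v

    k = 100.0
    s, v_up = bsb_price(lambda x: np.maximum(x - k, 0.0))
    s, v_lo = bsb_price(lambda x: np.maximum(x - k, 0.0), upper=False)
    i = np.searchsorted(s, 100.0)
    print(f"ask (M+) = {v_up[i]:.3f}, bid (M-) = {v_lo[i]:.3f} at S = {s[i]:.0f}")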

∂M + (St , t) (5.22) σt = σ ∂S 2 where σ[·] is given by (5.20) then the usual Black-Scholes argument of constructing a replication portfolio with Δt shares and Bt bonds, with ∂M + (St , t) (5.23) Δt = ∂S + Bt = M (St , t) − St Δt (5.24) is self-financing. [Avellaneda et al. (1995)] point out that for any other volatility process σt satisfying the initial model condition (5.15), the associated self-financing portfolio worth initially M + (St , t), and then constrained by (5.23) but not by the condition (5.24), will result in a nonnegative final value after paying the derivative, almost surely. The volatility given by (5.22) represents the worst-case volatility path under their model. The + (St ,t) strategy given by M + (St , t) is the minimal initial ask value and ∂M ∂S is the hedge ratio for managing a long position in the option H. 5.3

Option Pricing under Uncertainty in Complete Markets

From a mathematical perspective pricing a European option on a stochastic underlying S representing the price of a financial asset means calculating an expectation under a suitable pricing probability measure. Numerical algorithms that can calculate this expectation are presented in [Tunaru (2010)], [Tunaru (2011)] and [Fabozzi et al. (2012a)]. In a nutshell, the price at time t is given by u(St , t) = B(t, T )EQ t [ψ(ST )]

(5.25)

where B(t, T ) is a discount factor for the period [t, T ] with T the maturity of the option, Q is the risk-neutral pricing measure and ψ describes the payoff function. If the risk-neutral transition probability density function of S is known, p(ST , T |St , t) then the valuation formula can be refined as  ∞ ψ(ST )p(ST , T |St , t)dST (5.26) u(St , t) = B(t, T ) 0

page 80

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Derivatives Pricing Under Uncertainty

Carte˙main˙WS

81

This formula works well for complete markets. In incomplete markets, the mathematical framework must be refined further. One can look at option prices as statistics calculated under a given model. [Jacquier et al. (1994c)] is the seminal paper in the finance literature that proposes Bayesian modelling as a solution to modelling under uncertainty. One of the earliest studies on option pricing under uncertainty is [Bunnin et al. (2002)]. They apply earlier methodologies related to model selection under uncertainty and the statistical model averaging discussed in [Draper (1995)] and [Hoeting et al. (1999)]. The [Bunnin et al. (2002)] results apply only to a complete market set-up and they differentiate between model risk and parameter estimation risk. In their view model risk arises in a situation when the investor considers two plausible models such as the geometric Brownian motion (GBM) model and the constant elasticity of variance (CEV) model. This example is not a fortunate one, since the GBM model is nested within the CEV model and therefore one could simply test for the significance of the value of the elasticity parameter. A better example would be comparing the GBM process with a mean-reverting process driven by an Ornstein-Uhlenbeck (OU) process. Nevertheless, one could still consider all three models and try to use the flow of data arrival to decide which model(s) are most likely to fit the data well.

5.3.1 Parameter uncertainty

For option pricing the most important quantity driving the prices is the volatility. Since the seminal paper of Black and Scholes in 1973, a myriad of papers have been published on models and methods put forward to price various options under various asset dynamics specifications. Restricting our discussion to the estimation of volatility, various classes of models emerged for the estimation of the same quantity, the volatility σ. As in [Bunnin et al. (2002)] one can classify them into the following groups:

(1) Implied volatility. A pointwise estimate of σ is derived as an inverse problem; the option price is given and the Black-Scholes formula is used to retrieve the σ that makes the formula match the market option price, see [Beckers (1981b)], [Corrado and Miller (1996)], and [Navas (2003)].
(2) Discrete time GARCH models. GARCH models were primarily developed for the evolution of variance but calculating the value of σ is straightforward. [Engle (2001)] provides an excellent review of the GARCH literature.


(3) Frequentist econometric pointwise estimation of σ. One can use the historical series and some error specification obtained after discretizing a continuous-time model, and then use OLS, MLE or GMM to estimate σ.
(4) Bayesian pointwise estimation. One could use solely the pointwise posterior estimators such as the posterior mean or posterior median, see [Lancaster (2004)] for some examples.
(5) Stochastic volatility models. An important step in the evolution of modelling for financial markets was marked by the introduction of stochastic volatility models, in discrete time and continuous time, see [Heston (1993)], [Schöbel and Zhu (1999)], and [Cox (1975)].
(6) Semi-parametric models. Not that many are available but they allow a very high degree of uncertainty, since σ is constrained to a finite interval but no other specification of volatility is made. One important paper in this class is [Avellaneda et al. (1995)].
(7) Volatility surfaces. There is great research in this area recognizing that at any one time the option maturity spectrum is defined by a term structure. Combine that with assets requiring the modelling of a term structure of prices, such as bonds, and the cross combination leads to a volatility surface that needs to be estimated, see [Gatheral (2006)].

The pricing equation (5.26) tacitly assumes full knowledge of the vector of parameters θ describing the model. More formally, the pricing equation should be rewritten as

u(S_t, t) = B(t,T) ∫_0^∞ ψ(S_T) p(S_T, T|S_t, t, θ) dS_T   (5.27)

making explicit the conditioning on θ. The important observation made by [Bunnin et al. (2002)] is that the transition probability density for the case when the parameters “are known” should be replaced with the predictive density

p(S_T, T|S_t, t) = ∫_Θ p(S_T, T|S_t, t, θ) p(θ|Y_t) dθ   (5.28)

where p(θ|Y_t) is the posterior density of θ given the occurrence of Y_t, which is all observed data such as returns or changes or level prices⁸ of S, up to time t. Then (5.27) can be rewritten as

u(S_t, t) = B(t,T) ∫_0^∞ ψ(S_T) (∫_Θ p(S_T, T|S_t, t, θ) p(θ|Y_t) dθ) dS_T   (5.29)

⁸ For example, if the observed data refers to level prices then Y_t = {S_t, S_{t−1}, ..., S_0}. If however the data refers to logarithmic returns, then Y_t = {log(S_t/S_{t−1}), ..., log(S_1/S_0)}.


This is a big change in the way option prices are determined. Rather than assuming parameters are known and using pointwise estimates, the arrival of new data is recurrently used to update the distribution of the parameters under a given model. Then, the knowledge about the parameter space and the parameter distribution is averaged and the end result, the predictive distribution, is readily available for other calculations, such as options pricing. In this manner, the tricky and troublesome problem of estimating the parameters correctly is elegantly circumvented.

For some option payoffs it may be possible to derive closed-form solutions for a given value of the parameter θ. When this is the case, denoting by u*(S_t, t, θ) = B(t,T) E^Q_t[ψ(S_T)|θ] such a solution, with Fubini's theorem formula (5.29) becomes

u(S_t, t) = B(t,T) ∫_Θ (∫_0^∞ ψ(S_T) p(S_T, T|S_t, t, θ) dS_T) p(θ|Y_t) dθ
          = ∫_Θ B(t,T) E^Q_t[ψ(S_T)|θ] p(θ|Y_t) dθ
          = ∫_Θ u*(S_t, t, θ) p(θ|Y_t) dθ
          ≈ (1/M) Σ_{i=1}^M u*(S_t, t, θ_i)   (5.30)

where the last approximation is calculated by drawing parameter values from their posterior distribution, θ_i ∼ p(θ|Y_t).

Hence all that is needed is to generate samples of possible parameter values from the posterior distribution of the parameter θ given the observed data, p(θ|Y_t). The Bayesian approach described above is capable of updating calculations over longer periods of time. Suppose that an option was sold a month ago and another one may be issued today. Then, one should take into account the entire path of share prices in the last month in addition to previous data. For clarity, let us assume that Y_t = {S_t, S_{t−1}, ..., S_0}, ∀t ≥ 0, and denote Y_{[s,t]} = Y_t \ Y_s for any s < t. Many models in finance used for option pricing are Itô diffusions given by the general SDE, under the physical measure,

dS_t = a(S_t, t)dt + b(S_t, t)dZ_t   (5.31)

where {Z_t}_{t≥0} is a Wiener process. Unfortunately, many practitioners decide on the values to employ for the parameter θ from processes applied under the risk-neutral pricing measure.
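Before turning to the sampling algorithms, here is a minimal sketch of the averaging step in (5.30) for the simplest case, when a closed-form solution u*(S_t, t, θ) is available; the lognormal posterior draws below are hypothetical stand-ins for a real MCMC sample:

    import numpy as np
    from scipy.stats import norm

    def bs_call(s, k, t, r, sigma):
        d1 = (np.log(s/k) + (r + 0.5*sigma**2)*t) / (sigma*np.sqrt(t))
        return s*norm.cdf(d1) - k*np.exp(-r*t)*norm.cdf(d1 - sigma*np.sqrt(t))

    rng = np.random.default_rng(1)
    # Hypothetical posterior draws of the volatility given data up to time t
    sigma_draws = rng.lognormal(mean=np.log(0.20), sigma=0.10, size=5000)

    prices = bs_call(100.0, 100.0, 1.0, 0.05, sigma_draws)   # u*(S_t, t, theta_i)
    print(f"posterior mean price {prices.mean():.3f}")
    print(f"plug-in price        {bs_call(100.0, 100.0, 1.0, 0.05, 0.20):.3f}")

Because the pricing operator is non-linear in σ, the posterior-averaged price generally differs from the plug-in price computed at a point estimate.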


In order to avoid questions related to parameter estimation, the process is called “calibration”, with the clear intent to keep newly issued option prices in sync with the prices of other options on the same underlying that are off the run. So let us say the first option is issued for a volatility σ = 45% when the real parameter value is 30%. Then, for the second option on the same underlying, not necessarily even issued by the same trading desk, traders may try to calibrate their models to what is seen on the market as option 1, with volatility 45%. Even if the second issuer has the best pricing model from a financial engineering point of view, the model parameter error is building in. Any inference on the parameters governing the evolution of the underlying process should be done under the physical measure.

Since Itô diffusions are Markov processes, applying Bayes' formula gives

p(θ|Y_t) = p(Y_{[s,t]}|θ, S_s) p(θ|Y_s) / ∫_Θ p(Y_{[s,t]}|θ, S_s) p(θ|Y_s) dθ   (5.32)

[Bunnin et al. (2002)] present two distinct algorithms for sampling from p(θ|Y_t), which is needed in order to compute the sample option prices.

5.3.1.1 Models with an exact SDE solution

Assume that the SDE (5.31) has a solution S_t = g(t, S_0, θ, Z_t) such that g has an inverse g⁻¹. Then the likelihood is derived from the closed form transition probability density function

p(S_t, t|S_0, 0, θ) = (1/√(2πt)) exp(−g⁻¹(t, θ, S_t, S_0)²/(2t)) |∂g⁻¹/∂S_t|   (5.33)

5.3.1.2 Models without an exact SDE solution

When the SDE does not have an exact solution, one can employ the [Pedersen (1995)] approximation algorithm using the Euler-Maruyama discretization of the SDE, or the approach proposed in [Bunnin et al. (2002)], which uses the fact that the transition probability density is the solution of Kolmogorov's forward equation and can be computed numerically. Thus, generically,

p(Y_t|θ) = ∏_i p(S_{t_{i+1}}, t_{i+1} | S_{t_i}, t_i, θ)   (5.34)

where each p(S_{t_{i+1}}, t_{i+1} | S_{t_i}, t_i, θ) is the solution to the PDE

∂p/∂t = ∂²(b²p)/∂S² − ∂(ap)/∂S   (5.35)


with the initial condition based on the Dirac function, p(S, t_i|S_{t_i}, t_i) = δ(S − S_{t_i}).

After solving the problem of how to simulate from p(Y_t|θ), we need only to be able to draw samples from p(θ|Y_t) in order to calculate the predictive density. In order to avoid the calculation of the prior density, [Bunnin et al. (2002)] suggest applying for this important step the sampling importance resampling (SIR) algorithm, which goes through the following procedure:

Step 1 Sample θ_i ∼ p(θ|Y_s), i = 1, ..., n.
Step 2 Given new data Y_{[s,t]}, calculate p(Y_{[s,t]}|θ_i, S_s).
Step 3 For each θ_i calculate the importance weight w_i = p(Y_{[s,t]}|θ_i, S_s) / Σ_i p(Y_{[s,t]}|θ_i, S_s).
Step 4 Resample from the θ_i obtained, using the importance weighting given by w_i. This will result in M < n samples from p(θ|Y_t).

The resampling is done through the following subroutine:

(1) Split the interval (0, 1] into n subintervals (a_i, b_i], where the end values are a_i = Σ_{j=1}^{i−1} w_j and b_i = Σ_{j=1}^{i} w_j.
(2) Draw M i.i.d. Uniform(0, 1) random numbers {U_k}_{k∈{1,...,M}}.
(3) If U_k ∈ (a_i, b_i] then θ_i becomes the k-th sample value.

All this trouble is necessary in order to make sure we have a sample of values θ_i drawn correctly from their posterior distribution p(θ|Y_t). The last step in order to produce the option price that does take into account parameter uncertainty is sampling from the predictive density of S_T. Once again, when the SDE of the Itô diffusion has a closed form solution then we can proceed by drawing⁹ W_T ∼ N(0, T) and then calculating

S_T^{(i)} = g(W_T, S_0, θ_i)

⁹ Remark the change in probability measure to the pricing measure.

When the SDE does not have a closed form solution, the Euler-Maruyama discretization seems to be the only feasible route. The procedure is the following. First discretize the SDE:

S_{t_{j+1}} − S_{t_j} = a(S_{t_j}, t_j)Δt + b(S_{t_j}, t_j)(W_{t_{j+1}} − W_{t_j})

where t_1 = t, t_m = T, so the simulation will be pathwise of length m between the current valuation time t and maturity T. Then

(1) Take a sample of size M with θ_i ∼ p(θ|Y_t),
(2) Generate m standard Gaussian random draws that will help create the path to maturity,


(3) For each θ_i, generate an entire path of S values, leading to the final one, S_T^{(i)}, which will be drawn from the predictive density p(S_T, T|S_t, t).

Then, the value of the option under parameter estimation uncertainty is given by

u(S_t, t) = B(t,T) E^Q_t[ψ(S_T)]
          = B(t,T) ∫_0^∞ ψ(S_T) p(S_T, T|S_t, t) dS_T
          ≈ B(t,T) (1/M) Σ_{i=1}^M ψ(S_T^{(i)})

where the last relationship reflects the Monte Carlo integration.
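A compact sketch of the SIR procedure above; the Gaussian “likelihood” is a hypothetical stand-in for p(Y_{[s,t]}|θ_i, S_s):

    import numpy as np

    def sir_resample(theta, log_w, m, rng):
        """Steps 3-4: normalised importance weights, then the inverse-CDF
        resampling subroutine (1)-(3) described above."""
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        b = np.cumsum(w)                       # interval ends b_i
        u = rng.uniform(size=m)                # (2) draw M uniforms
        idx = np.searchsorted(b, u)            # (3) locate the (a_i, b_i]
        return theta[idx]

    rng = np.random.default_rng(2)
    # Step 1: draws from p(theta | Y_s) (hypothetical sample)
    theta = rng.normal(0.2, 0.05, size=10000)
    # Step 2: log-likelihood of the new data Y_[s,t] under each theta;
    # here a toy Gaussian likelihood stands in for p(Y_[s,t] | theta_i, S_s)
    new_obs = rng.normal(0.25, 0.05, size=30)
    log_w = np.array([np.sum(-0.5*((new_obs - th)/0.05)**2) for th in theta])

    posterior = sir_resample(theta, log_w, m=2000, rng=rng)
    print(f"updated posterior mean {posterior.mean():.4f}")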

5.3.2 Model uncertainty

When more than one model is suitable for the problem at hand, Bayesian model averaging is a technique that accounts for the lack of precise knowledge of which model is best. Consider now that the market agent has a finite suite of models {M_i}_{i=1,...,k} at her disposal¹⁰. A priori the trader does not know which model will perform best so, ceteris paribus, the only reasonable thing she could do is to derive an option price that averages across the uncertainty regarding model selection. Hence,

u(S_t, t|{M_i}_{i=1,...,k}) = B(t,T) E^Q_t[ψ(S_T)|{M_i}_{i=1,...,k}]
 = B(t,T) ∫_0^∞ ψ(S_T) Σ_{i=1}^k p(S_T, T|S_t, t, M_i) p(M_i|Y_t) dS_T
 = B(t,T) Σ_{i=1}^k (∫_0^∞ ψ(S_T) p(S_T, T|S_t, t, M_i) dS_T) p(M_i|Y_t)
 = B(t,T) Σ_{i=1}^k E^Q_t[ψ(S_T)|M_i] p(M_i|Y_t)   (5.36)

¹⁰ This is a realistic scenario at large investment banks. Traders usually have several models built in and they may decide to use one or another depending on the task in hand.

where p(M_i|Y_t) is the posterior probability associated with model M_i, so when this probability is high then the option price received from this model receives a larger weight in the final valuation. Bayes' formula gives the recursive calculation of the model posterior probabilities in the light of new data:

p(M_i|Y_t) = p(Y_{[s,t]}|M_i, S_s) p(M_i|Y_s) / Σ_{j=1}^k p(Y_{[s,t]}|M_j, S_s) p(M_j|Y_s)   (5.37)
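A minimal sketch of how (5.36)–(5.37) combine model prices; the prices, marginal likelihoods and priors below are hypothetical placeholders:

    import numpy as np

    # Hypothetical inputs: prices of the same claim under two candidate models
    # and the log marginal likelihood of the new data under each model.
    prices = np.array([774.9, 760.2])          # discounted E_t^Q[psi(S_T) | M_i]
    log_ml = np.array([-512.3, -515.8])        # log p(Y_[s,t] | M_i, S_s), assumed
    prior = np.array([0.5, 0.5])               # p(M_i | Y_s)

    # Eq. (5.37): posterior model probabilities (computed in log space)
    log_post = log_ml + np.log(prior)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()

    # Eq. (5.36): model-averaged price
    print("posterior model probabilities:", np.round(post, 4))
    print(f"model-averaged price: {np.dot(post, prices):.2f}")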

For practical purposes, each model is given some informative or noninformative prior probability and then the calculation of the posterior model probabilities can be carried out as described above for the SIR algorithm, using, for each model M_i,

p(Y_{[s,t]}|M_i, S_s) ≈ (1/n) Σ_{j=1}^n p(Y_{[s,t]}|θ_j, S_s),   θ_j ∼ p(θ|Y_s)

5.3.3 Numerical examples

Here we are reworking the example on option pricing described in [Bunnin et al. (2002)]. To this end we consider the two models discussed there, namely the Black-Scholes geometric Brownian motion (GBM) model and the constant elasticity of variance (CEV) model of [Cox and Ross (1976)]. The GBM model is described by the following equations in continuous time:

dS_t = μ S_t dt + σ_BS S_t dZ_t   (5.38)
     = r S_t dt + σ_BS S_t dW_t   (5.39)

where W_t = Z_t + λt, with λ = (μ − r)/σ_BS the market price of risk, and r is the constant risk-free rate. Evidently W is the Wiener process associated with the risk-neutral pricing measure while Z is the Wiener process associated with the physical probability measure. The CEV model is given by

dS_t = μ S_t dt + σ_CEV S_t^γ dZ_t   (5.40)
     = r S_t dt + σ_CEV S_t^γ dW_t   (5.41)

and for this model one can prove that the elasticity of the instantaneous return variance with respect to price is equal to 2(γ − 1). In order to avoid technical problems related to arbitrage it is usually assumed that γ ∈ [0, 1). Bayesian analysis and Markov Chain Monte Carlo (MCMC) are going to be used, first to extract inference on the parameters of the two models and secondly to calculate no-arbitrage European call and put prices for options contingent on the FTSE100 index. Following [Bunnin et al. (2002)] we use historical data covering 50 weekly levels of the FTSE100 from 30 December 1997 to 9 December 1998. The dividend yield is initially ignored and the data for the options is as follows: the strike price is K = 5500, the initial index value is S_{t_0} = 5669.1, the risk-free rate is r = 0.075 and the time to maturity is T = 1 year.

However, as opposed to [Bunnin et al. (2002)] who assumed μ = 0, I have allowed this parameter to be Gaussian distributed with zero mean and a very large variance of 10,000. While the drift parameter μ should not have an impact on the option prices, which are priced under the riskless pricing measure, it does have an impact on measuring the market price of risk. Since the computational effort of allowing μ to be random, letting the data convey the information about its possible values, is minimal, I have decided to proceed in this way with the option pricing analysis. Another difference from the model choices in [Bunnin et al. (2002)] is related to the prior distribution for the volatility parameter, which is taken in that paper by the authors as σ_BS ∼ Uniform(0.1, 0.3). I found this choice too restrictive, or to put it differently too informative, and hence I have used an inverse-gamma distribution that is very flat and covers a wide range. In other words, I will let the data decide on the most likely values for the volatility parameter.

5.3.4 Accounting for parameter estimation risk in the Black-Scholes model

For this analysis I am using WinBUGS 1.4, simulating two chains. The convergence is very rapid and after a burn-in period¹¹ of 50,000 simulations I run another set of 50,000 simulations from which I extract the summary inferential results in Table 5.1. The posterior mean and median of σ_BS are about 0.20, confirming the analysis detailed in [Bunnin et al. (2002)]. The posterior mean and median for μ is 0.11, but the credibility interval constructed from the 2.5% and the 97.5% quantiles includes the zero value and therefore the idea of inferring that this value could equal zero is not wrong.

¹¹ This is like a training period that is discarded for the actual inference.

Table 5.1: MCMC analysis of Bayesian option pricing under the Black-Scholes GBM model. Posterior inference statistics for mean, standard deviation, median and 2.5% and 97.5% quantiles from a sample of 50,000 values.

Variable   mean     s.d.      MC error    2.5%      median    97.5%
μ          0.1108   0.2094    6.648E-4    -0.3001   0.1107    0.5237
σ_BS       0.2023   0.02119   7.036E-5    0.1659    0.2006    0.249
call       774.9    39.63     0.1315      708.5     771.1     863.6
put        208.4    39.63     0.1315      142.0     204.6     297.1
λ          0.1783   1.029     0.003299    -1.833    0.1788    2.196

Fig. 5.1: Posterior densities of the Black-Scholes parameters and the European call and put option prices for the FTSE100 index. Panels: (a) μ, (b) σ, (c) call, (d) put. The strike price is K = 5500, the initial index value is S_{t_0} = 5669.1, the risk-free rate is r = 0.075 and time to maturity is T = 1 year.
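As a rough open-source analogue of this WinBUGS run, the following minimal random-walk Metropolis sketch works on synthetic weekly log-returns; the data, priors and tuning constants are illustrative assumptions, not the exact setup used above:

    import numpy as np
    from scipy.stats import norm

    def bs_call(s, k, t, r, sigma):
        d1 = (np.log(s/k) + (r + 0.5*sigma**2)*t) / (sigma*np.sqrt(t))
        return s*norm.cdf(d1) - k*np.exp(-r*t)*norm.cdf(d1 - sigma*np.sqrt(t))

    rng = np.random.default_rng(3)
    dt = 1/52
    # Hypothetical weekly log-returns standing in for the FTSE100 sample
    returns = rng.normal(0.11*dt, 0.20*np.sqrt(dt), size=50)

    def log_post(mu, sigma):                   # flat priors on mu and log(sigma)
        if sigma <= 0:
            return -np.inf
        z = (returns - (mu - 0.5*sigma**2)*dt) / (sigma*np.sqrt(dt))
        return np.sum(-0.5*z**2 - np.log(sigma))

    draws, state = [], np.array([0.1, 0.2])
    lp = log_post(*state)
    for _ in range(60000):                     # random-walk Metropolis
        prop = state + rng.normal(0, [0.05, 0.01])
        lp_prop = log_post(*prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            state, lp = prop, lp_prop
        draws.append(state.copy())
    mu_s, sig_s = np.array(draws[10000:]).T    # discard burn-in

    s0, k, t, r = 5669.1, 5500.0, 1.0, 0.075
    calls = bs_call(s0, k, t, r, sig_s)        # posterior of the call price
    lam = (mu_s - r)/sig_s                     # posterior of the market price of risk
    print("call 2.5%/97.5%  :", np.percentile(calls, [2.5, 97.5]).round(1))
    print("lambda 2.5%/97.5%:", np.percentile(lam, [2.5, 97.5]).round(2))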

Secondly, the credibility interval for σ_BS is [0.1659, 0.249]. The uncertainty in the parameter estimation is captured and depicted beautifully by the posterior densities in the graphs in Fig. 5.1. Feeding these possible values for the volatility parameter σ_BS into the Black-Scholes option pricing operators automatically gives a posterior distribution of feasible values for the European call and put option prices. For example, the posterior mean of the call is 774.9, the posterior median is 771.1, and the credibility interval for the European call price is [708.5, 863.6], which covers the value provided by [Bunnin et al. (2002)]. Likewise for the European put, the posterior mean is 208.4, the posterior median is 204.6 and the credibility interval is [142.0, 297.1]. Remark that since there is a slight difference between the posterior mean and the posterior median, we expect the posterior densities generated by the parameter estimation uncertainty to have a degree of skewness. This phenomenon is clearly illustrated in Figures 5.1 and 5.2, where the posterior densities of the European call and put options on the FTSE100 and of the market price of risk and delta for call and put, respectively, are illustrated. Last but not least, the market price of risk has the credibility interval [−1.833, 2.196] and therefore we cannot reject the hypothesis that this might be equal to zero. Looking at its posterior density one can observe that zero is also the most likely value.

Consider now other quantities that are calculated under a model such as the Black-Scholes model. All these important quantities are nothing but statistics depending on the parameters, so by feeding the parameters into the formulae employed for calculating them it is easy to obtain a posterior sample of values reflecting the risk in the parameter estimation. In Fig. 5.2 we show the posterior density of the market price of risk, defined as λ = (μ − r)/σ_BS. As in [Bunnin et al. (2002)] we consider the dividend yield equal to zero and the risk-free rate r = 0.075. The possible values for μ and σ_BS will generate a sample of values for λ. Interestingly, the posterior distribution of the market price of risk for the FTSE100 as inferred under the Black-Scholes model appears to be very close to a standard Gaussian distribution.

The MCMC output can be utilised to calculate the posterior densities of the Greek parameters such as Delta. From Fig. 5.2 we can say that the most likely value for the Delta parameter of the call is 0.735, but values such as 0.7 or 0.77 are also possible. Likewise, for the put the most likely Delta is −0.265 but values like −0.3 or −0.22 are also feasible, albeit less likely. Incidentally, calculating the Delta for the European call and put with the Black-Scholes model, assuming that we somehow have calculated the estimate of volatility as σ_BS = 0.20, we get Δ(call) = 0.7345 and Δ(put) = −0.26552.

Fig. 5.2: Posterior densities of the Black-Scholes market price of risk (μ − r)/σ and the Greek delta parameter for the European call and put option prices for the FTSE100. Panels: (a) λ, (b) Δ(call), (c) Δ(put).

5.3.5 Accounting for parameter estimation risk in the CEV model

We now proceed with the parameter estimation for the CEV model, by first discretizing (5.40) and then estimating the parameters. We consider here the case of zero dividend yield as in [Bunnin et al. (2002)] but also treat the continuously compounded annual dividend yield q as one extra parameter of the model. Although the CEV model has only one extra parameter compared to the Black-Scholes model, there are already complications when dealing with model parameter estimation. Under the Bayesian MCMC framework, choosing the prior distributions for the parameter γ, the diffusion parameter¹² σ_CEV or even the dividend yield q can have a strong impact on the speed of the MCMC techniques and ultimately on the inference process. The prior distributions that I found to work well from all points of view were a very flat uniform distribution for σ_CEV, take (0, 100) as an example; a uniform distribution covering (0, 0.30) for the dividend yield q; and a beta distribution for the parameter γ, which is constrained to be between 0 and 1 in order to avoid technical problems related to absorption at zero or explosion if other values were allowed.

¹² Please note that σ_CEV is not the volatility of the CEV model. The volatility changes with the current state of the underlying process S_t.

The same routine as described above for the Black-Scholes model is followed to obtain a stationary chain from the posterior distribution of all parameters. The autocorrelation plots, Gelman-Rubin statistics and trace dynamics all point to convergence before we select a sample from the last part of the chain that has become stationary. The summary statistics representing the posterior estimates in Table 5.2 were calculated from a sample of 20,000 values. One cannot reject the hypothesis that the drift parameter μ is zero. Interestingly, the dividend yield is significant and its posterior mean and median are equal to 0.15 or 15%. This is the value that traditionally investors think is representative for the UK market. The diffusion parameter σ_CEV has a posterior mean of 2 and a posterior median of 1.8. The CEV elasticity parameter γ is also significant and its posterior mean is 0.29 while its posterior median is 0.2573. The power of the MCMC approach is that we can visualise the entire posterior distribution for each parameter. The graphs in Fig. 5.3 show the posterior densities of all parameters driving the process of the FTSE index as modelled by the CEV model dynamics.
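A minimal sketch of the Euler-discretized CEV likelihood that such an MCMC run is built on; the simulated path and the grid search below are illustrative, not the WinBUGS implementation:

    import numpy as np

    def cev_euler_loglik(s, mu, sigma, gamma, dt):
        """Euler-discretized log-likelihood of the CEV model (5.40):
        S_{t+dt} - S_t ~ N(mu*S_t*dt, (sigma*S_t**gamma)**2 * dt)."""
        ds = np.diff(s)
        mean = mu*s[:-1]*dt
        sd = sigma*np.abs(s[:-1])**gamma*np.sqrt(dt)
        return np.sum(-0.5*((ds - mean)/sd)**2 - np.log(sd))

    rng = np.random.default_rng(4)
    dt, s = 1/52, [5669.1]
    for _ in range(50):                        # hypothetical weekly index path
        s.append(s[-1] + 0.1*s[-1]*dt + 2.0*s[-1]**0.3*np.sqrt(dt)*rng.normal())
    s = np.array(s)

    # Profile the likelihood over a small grid of (gamma, sigma)
    for gamma in (0.1, 0.3, 0.5):
        ll = max(cev_euler_loglik(s, 0.1, sig, gamma, dt)
                 for sig in np.linspace(0.5, 5, 40))
        print(f"gamma={gamma}: best grid log-lik {ll:.1f}")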

Table 5.2: Posterior estimates of the parameters of the CEV model from the FTSE100 data. Inference is obtained with MCMC from a sample of 20,000 values.

Variable   mean     s.d.      MC error    2.5%       median    97.5%
γ          0.2904   0.1801    0.002925    0.03987    0.2573    0.7229
μ          0.2441   0.2216    0.001651    -0.1903    0.2431    0.6824
q          0.1503   0.08642   6.688E-4    0.007853   0.1502    0.2919
σ_CEV      2.0      1.241     0.01737     0.2403     1.801     4.71

Fig. 5.3: The posterior densities for the parameters of the CEV model calculated using data on the FTSE100 index with MCMC from a sample of 20,000 values. Panels: (a) dividend yield q, (b) drift parameter μ, (c) elasticity γ, (d) σ_CEV. The strike price is K = 5500, initial index value is S_{t_0} = 5669.1, risk-free rate is r = 0.075 and time to maturity is T = 1 year.

Looking at the posterior densities is informative, to see whether a distribution has by any chance multiple modes and to observe any skewness. For example, from the graphs of the posterior densities for γ and σ_CEV it can be directly concluded that the most likely values would be roughly γ = 0.2 and σ_CEV = 0.25. Remark that this is far away from both the posterior mean and the posterior median. Other values are also possible and, when considering model risk, it would be wrong to simply ignore them. Hence, when pricing the European put and call options as before, it is useful to look at the option prices that result from all possible combinations of parameters. Nevertheless, parameters should not simply be considered on a grid of values; the likelihood of the parameter values should be taken into consideration.

In Fig. 5.4 and its counterpart Fig. 5.5 I show the surfaces of option prices, calls and puts respectively, that are obtained by combining a sample of 500 values for γ and 500 values for σ_CEV from the stationary part of the MCMC distribution. The dividend yield is taken as zero, for comparison with the results in [Bunnin et al. (2002)]. For European calls, the maximum obtained value was 5583.7 and the smallest was 566.51, while for the puts the maximum obtained value was 5017.2 and the minimum was zero. It is clear from the figures that under the CEV model, most of the time the European option values form a flat plateau, but at the same time very large values can be obtained when γ is about 0.5 and σ_CEV is small. Therefore, model risk posed by parameter estimation is prevalent in this situation and care must be taken that parameters are re-estimated periodically to capture any changes in parameter estimates.
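A sketch of how such a price distribution can be produced by pushing posterior (γ, σ_CEV) draws through a Monte Carlo CEV pricer; the draws below are synthetic stand-ins for the actual MCMC sample:

    import numpy as np

    def cev_mc_call(s0, k, t, r, sigma, gamma, n_paths=2000, n_steps=52, rng=None):
        """Monte Carlo European call under risk-neutral CEV dynamics (5.41),
        using an Euler scheme with absorption at zero."""
        rng = rng or np.random.default_rng(0)
        dt = t/n_steps
        s = np.full(n_paths, s0)
        for _ in range(n_steps):
            z = rng.normal(size=n_paths)
            s = np.maximum(s + r*s*dt
                           + sigma*np.maximum(s, 0.0)**gamma*np.sqrt(dt)*z, 0.0)
        return np.exp(-r*t)*np.mean(np.maximum(s - k, 0.0))

    rng = np.random.default_rng(5)
    # Hypothetical joint posterior draws for (gamma, sigma_CEV); an actual run
    # would reuse the MCMC sample behind Table 5.2.
    gammas = np.clip(rng.normal(0.29, 0.18, 50), 0.02, 0.98)
    sigmas = np.clip(rng.normal(2.0, 1.2, 50), 0.25, 6.0)

    prices = [cev_mc_call(5669.1, 5500.0, 1.0, 0.075, sg, gm, rng=rng)
              for gm, sg in zip(gammas, sigmas)]
    print("call price quantiles:", np.percentile(prices, [2.5, 50, 97.5]).round(1))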

page 94

April 28, 2015 12:28 BC: 9524 - Model Risk in Financial Markets

Derivatives Pricing Under Uncertainty 95

Carte˙main˙WS

Fig. 5.4: Posterior surface for the European call price on the FTSE100 generated by the parameter uncertainty on γ and σCEV . The strike price is K = 5500, initial index value is St0 = 5669.1, risk-free rate is r = 0.075 and time to maturity is T = 1 year.

page 95

April 28, 2015 12:28

96

BC: 9524 - Model Risk in Financial Markets

Model Risk in Financial Markets

Carte˙main˙WS

Fig. 5.5: Posterior surface for the European put price on the FTSE100 generated by the parameter uncertainty on γ and σCEV . The strike price is K = 5500, initial index value is St0 = 5669.1, risk-free rate is r = 0.075 and time to maturity is T = 1 year.

page 96

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Derivatives Pricing Under Uncertainty

97

Table 5.3: Posterior estimates of parameters of the CEV model from the FTSE100 data.The strike price is K = 5500, initial index value is St0 = 5669.1, risk-free rate is r = 0.075 and time to maturity is T = 1 year. Option price call put

mean 588.8461 22.3353

sd 195.7536 195.7536

2.5% 566.5108 0.0000

median 566.5108 0.0000

97.5% 716.4599 149.9491

In order to gauge the impact of parameter estimation risk it is very useful to calculate the summary statistics of the posterior densities for the European call and put options, calculated under the CEV model. We take here a dividend yield equal to the posterior mean as estimated above as 0.1503 or 15% per annum. For parameters γ and σCEV we allow them to vary when calculating the option prices. 5.4

A Simple Measure of Parameter Uncertainty Risk

Risk managers can greatly benefit from having the entire distribution of a price function such as the European call option price, generated by variation of parameters in a given parametric framework. In addition, our approach solves both problems under a Bayesian paradigm and benefits from the inferential power of Markov Chain Monte Carlo methods. The methods presented here are essential for risk management of assets exhibiting low frequency data. Our empirical analysis shows that the risk associated with parameter uncertainty can be (a) substantial even for vanilla products such as European call and put options, and (b) asymmetric for the buyer and the seller in the contract, even when the same parametric model class is used by both. Following the methodology presented above we can propose a new measure of model risk related to parameter uncertainty. We shall proceed by analogy with the way value-at-risk was introduced for quantifying market risk. Definition 5.1. Given a model defined unambiguously by a vector of parameters ϑ, for any contingent claim price function Π(H; ϑ) with payoff H, we define the parameter uncertainty model risk (PUMR) measure corresponding to Π(H; ϑ), at the 100(1 − α%) level of confidence, as the α% quantile in the direction of risk. To focus the discussion, consider the European call option with posterior

page 97

April 28, 2015

12:28

98

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

densities of the fair price as represented in Fig. 5.1. In addition, suppose that α% = 2.5%. For the seller of the derivative, the risk is represented by the right tail of the posterior distribution since if trading for a contract is done outside this area when the true price is actually in this area then the seller will incur a loss. In other words, the seller is exposed to feasible higher prices that he/she is not taking into consideration when choosing a point estimate of the fair price. Hence the PUMR for the seller is the 100(1 − α%) quantile, or the α% right quantile. Similarly, for the buyer of the derivative, the parameter uncertainty risk is represented by the left tail of the posterior distribution. If the real fair value of the contract is exactly equal to the α% quantile level, then any trading done at a value higher than this benchmark will result in a loss. For the numerical example discussed above the 97.5% quantile in Table 5.1 quantifies the PUMR for the seller while the 2.5% quantile measures the PUMR for the buyer of the derivative. Thus, under the Black-Scholes model, the PUMR for the call is 863.6 for the seller and 708.5 for the buyer, whereas the PUMR for the put is 297.1 for the seller and 142.0 for the buyer. Under the CEV model, from Table 5.3, the PUMR for the call is 716.45 for the seller and 566.51 for the buyer, whereas the PUMR for the put is 149.95 for the seller and 0.0000 for the buyer. One way to compare different models with respect to the parameter estimation risk embedded in derivatives pricing is to consider as a discrepancy measure the PUMR for the seller and the buyer. Models with a smaller PUMR should be preferred because that is equivalent with posterior distributions that are narrowly spread. I shall call this discrepancy measure the PUMR distance. For the Black-Scholes model this distance is equal to 155 roughly for both put and call, and for the CEV model this distance is equal to 149.95. It is remarkable that the two models come quite close in this particular case. Based on these results the CEV model shows a slight superiority despite having one extra parameter. The inference presented in Tables 5.1 and 5.3 reveals that model risk due to parameter uncertainty can be quite large. Furthermore, given the skewness of the posterior densities, the two parties in the financial contract do not have the same magnitude of exposure to model risk. For the European call option in discussion the seller takes on more model risk of the parameter estimation type. This is correct since call option contracts have no downside and variation comes from the upside and also because the process used for modelling the underlying index cannot become non-positive. This point is very important for investment banks and financial institutions

page 98

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Derivatives Pricing Under Uncertainty

99

where both long and short positions may be simultaneously present on the balance sheet due to multiple counterparties. Model risk will not cancel out over the same contract for opposite positions. This answers a question posed by [Gibson et al. (1999)] as to whether model risk is symmetric. My answer is that it is not. 5.5

Bayesian Option Pricing

Bayesian updating methods have been used for pricing options in agricultural markets where news such as weather events may cause large swings in agricultural futures and consequently on options on these futures. One of the earliest application is described by [Foster and Whiteman (1999)] for options on soybean futures. In order to capture more accurately the evolution of the commodity markets one should consider time-series models incorporating seasonality and possibly other known factors. The model rapidly becomes quite complex and parameter estimation risk is a genuine problem. The solution to this computational statistical inferential problem is to employ Markov Chain Monte Carlo (MCMC) techniques. Gibbs sampling is arguably the most widely known method falling into the MCMC class. This can be described briefly as follows. Assuming that all parameters of interest, eventually including missing data if needed and latent variables, are represented by the vector θ = (θ1 , θ2 , . . . , θd ) (0)

(0)

(0)

the algorithm starts from some initial values θ(0) = (θ1 , θ2 , . . . , θd ) and then it goes through the following loop at step j (j+1)

.. .

(j)

(j)

from p(θ1 | θ2 , . . . , θd , Y ) • draw a value θ1 (j+1) (j+1) (j) (j) • θ2 from p(θ2 | θ1 , θ3 . . . , θd , Y ) • (j+1) (j+1) (j+1) from p(θd | θ1 , . . . , θd−1 , Y ) • θd

The vectors θ(0) , θ(1) , ..., θ(n) , .... form a Markov chain having the transition probability to move from θ∗ to θ given by Ker(θ∗ , θ) = p(θ1 |θ2∗ , ..., θd∗ , Y )p(θ2 |θ1 , θ3∗ ..., θd∗ , Y ) × p(θ3 |θ1 , θ2 , θ4∗ , ..., θd∗ , Y ) · · · p(θd |θ1 , ..., θd−1 , Y )

page 99

April 28, 2015

12:28

100

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

Various conditions for convergence of the joint distribution of (i) (i) (i) (θ1 , θ2 , . . . , θd ) have been developed by [Chan (1993); Geman and Geman (1984); Liu et al. (1994, 1995); Tierney (1994); Roberts and Polson (1994)]. An excellent introduction to this subject can be found in [Lancaster (2004)] and in a financial context in [Johannes and Polson (2010)]. 5.5.1

Modelling the future asset value under physical measure

[Foster and Whiteman (1999)] were among the first to use Bayesian techniques to solve the parameter estimation problem and determine the predictive density of the derivative price, under the physical measure, and then they derive the risk-neutral density by minimising the Kullback-Leibler information discrepancy measure. In this section {St }t≥0 denotes the soybean price and {Ft (T )}0≤t≤T denotes the futures price at time t with maturity T . For simplicity we shall drop the dependence of T , thinking that T may denote one of the nearest maturities for which there is enough liquidity such that the maturity is longer than the options maturity. In other words, if the investor is considering pricing options on futures with one-month maturity then T would be the futures contract with maturity longer than one month, say two or three months, depending on the physical asset. [Foster and Whiteman (1999)] consider using the historical data on the vector Yt = (log(St /St−1 ), log(Ft /St )) and estimate a vector autoregressive model given by Yt = α + βt + a(L)Yt−1 + εt ,

iid

εt ∼ N (0, Σ)

where α, β are scalar vectors and a(L) is a vector of λ-degree polynomials in the lag operator. Hence the vector of parameters is θ = (α, β, a(L), Σ). Moreover, let us denote by Y the matrix obtained by stacking up all the rows Yt , and similarly by X the matrix obtained by stacking up the rows  ). (1, t, Yt−1 Considering that the historical data gives the first λ values for Yt , the conditional sampling density of Y is p(Y |θ) ∝

T '

 |Σ|−1/2 exp εt Σ−1 ε t

(5.42)

t=λ+1

Using the trace operator we can re-arrange the above expression as

 1 −(T −λ−1)/2  −1 p(Y |θ) ∝ |Σ| exp − tr(Y − XB) (Y − XB)Σ (5.43) 2

page 100

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Derivatives Pricing Under Uncertainty

101

which shows that the model can be interpreted as a multivariate regression model Y XB + V

(5.44)

where the (2 + nλ) × n-dimensional matrix B encapsulates all VAR coefficients and all rows of V are i.i.d. N (0, Σ). Prior elicitation can be done based on expert knowledge or other considerations13 . In the absence of any other knowledge the standard Bayesian inferential routine is to use non-informative distributions. In this case p(B, Σ) ∝ |Σ|−(n+1)/2

(5.45)

Combining the prior with the likelihood gives the posterior distribution

 1 p(B, Σ|Y, X) ∝ |Σ|−(T −λ+n)/2 exp − tr(Y − XB) (Y − XB)Σ−1 2 (5.46) 5.5.2

Modelling the current asset value under a risk-neutral measure

Here we highlight, see [Foster and Whiteman (1999)], how to obtain the values of vanilla instruments such as futures and European call/put options. What we need are the values of the asset we are trying to price at some maturity T , under a risk-neutral pricing measure. This can be achieved in two steps. First a sample is generated from the predictive distribution at T , under the physical measure used to calibrate the parameters with MCMC techniques. Then a measure change is operated by calculating the adjusted probabilities such that the probabilistic discrepancy measure (KullbackLeibler) between the risk-neutral and physical probabilities is minimized. For the first part, the formula (5.46) can be used to draw a sample for Σ and then a sample for B. For each draw of (B, Σ) a predictive sample is generated by drawing m error values εT +1 , . . . , εT +m from the distribution N (0, Σ). Then the VAR model is employed to simulate new values based on the value of B and the new error terms. Using the current value of the futures Ft and a sample from the predictive distribution representing possible draws of future values at the desired maturity FTi it is possible to generate a sample of futures return distributions RT1 from FTi = Ft RTi 13 If

(5.47)

previous estimation results are available one can use empirical Bayes methods to construct prior distributions resembling the empirical output.

page 101

April 28, 2015

12:28

102

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

These values are equally probable, so if a predictive sample of M = 10000 1 is the physical probability of drawing values are predicted then pi = 10000 i the value RT . In order to get the risk-neutral probabilities, we are searching for a probability set given by {qi }i=1,...,M such that M 

qi RTi = 1

(5.48)

i=1

and that minimizes the Kullback-Leibler information divergence function

M  qi l(p, q) = qi ln (5.49) pi i=1 As pointed out in [Foster and Whiteman (1999)] the risk-neutral distribution obtained is Gibb’s canonical distribution exp(λRTi ) qi = M j j=1 exp(λRT )

(5.50)

where λ = argmin α

M 

exp [α(RTj − 1)]

(5.51)

j=1

Now we can price any other derivatives such as call options. If the option is directly written on the futures, as is the case in commodity markets, a European put option with strike price K has the price given by

CT = DFT

M 

qi max[K − Ft RTi , 0]

(5.52)

i=1

where DFT is simply the risk-free discount factor to maturity T . 5.6

Measuring Model Uncertainty

A theoretical framework for capturing model risk has been provided by [Cont (2006)]; expanding and deriving from this framework, [Kerkhof et al. (2010)] presented an excellent practical methodology that can be useful to adjust the calculations of capital requirements for trading activities in a market, depending on the degree of modelling reliability. The latter paper advocates for risk management purposes the consideration of a class of models rather than a single model, that will permit gauging model risk on top of

page 102

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Derivatives Pricing Under Uncertainty

Carte˙main˙WS

103

nominal market risk. Focusing on model risk associated with uncertainties in econometric modelling, the overall model risk is then disentangled into three components: estimation risk, misspecification risk, and identification risk. Another very important point is made by [Branger and Schlag (2004)] who distinguish between a risk measure subject to model risk, such as a VaR model under estimation uncertainty, and a risk measure for capturing model risk. The former is an illustration of model risk of another type of risk such as market risk. [Jorion (1988)] also contains a very interesting discussion in this respect. The measure I introduced in 5.1 is an example of the latter. Before we present a review of various approaches that can be used to measure model risk we need to fix the boundaries conceptually, otherwise the discussion will digress into philosophical territory, which is not the purpose of this book. Consider a contingent claim C with a payoff H by some maturity T . The modeler has a suite of models represented by risk-neutral measures Qi where i ∈ I with I representing an indexing set. In theory one would like I to cover all possible models. For all intents and purposes this is possible only theoretically and one may question even that. The second more pragmatic approach would define I such that its parameters belong to some convex but uncountable domain. For example a geometric Brownian motion where the volatility parameter σ ∈ (0, ∞) or σ ∈ (0, 3]. At the minimum there would be at least a finite suite of competing models and in this case I is a finite set such as {1, . . . , n}. Each of the above approaches is possible in a live trading environment. [Branger and Schlag (2004)] argues that for the last two approaches “we can never be sure that the true model is among the candidate models”. Actually when we consider only a finite set of models, if historical data is available one could verify whether there is any error between model values and market values. If there are errors then we are certain that the true model does not belong to the finite set. Moreover, since by de facto the modeler is building a model based on assumptions that may or may not be always true, the models encountered in financial markets are approximations at best of the true data generating process. 5.6.1

Worst case risk measure

If ViQ (C) is a valuation operator of contingent claim C under model Qi then, the worst case approach is calculating ViQ (C) for all i ∈ I and taking

page 103

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

104

Carte˙main˙WS

Model Risk in Financial Markets

the most conservative value. Hence if V is a pricing valuator then, under model risk measured by the worst case, the value of the contingent claim is V (C) = sup{ViQ (C)}

(5.53)

i∈I

If, however, V represents a market risk measure such as VaR then the worst case approach would imply that V (C) = inf {ViQ (C)} i∈I

(5.54)

The worst case approach has a lot of theoretical appeal. Interesting points for and against have been made by [Kirch (2002)] [F¨ollmer and Schied (2002)], [Talay and Zheng (2002)], and [Kerkhof et al. (2010)]. Nevertheless, from a practical standpoint the results can be significantly biased by the introduction into the set of models of a model that produces, even occasionally, wild results. For example, the Vasicek model for interest rates is known to produce negative interest rates, albeit with a small probability. The worst case approach over the set of Vasicek models, with I given by the compact domain of the parameter set, will take into consideration paths of interest rates drifting into negative territory. 5.7 5.7.1

Cont’s Framework for Model Uncertainty An axiomatic approach

Consider a market given by an asset S adapted to a stochastic basis (Ω, F) generating various market scenarios. From a probabilistic point of view the underlying asset {S}t∈[0,T ] is a measurable mapping floating into the space of right continuous functions with a finite left limit. On this underlying there could be many contingent claims with a payoff H. [Cont (2006)] defines a model as any arbitrage-free option pricing rule, represented by a risk-neutral probability measure Q on (Ω, F) such that14 {St }t∈[0,T ] is a martingale under Q. The value of such a contingent claim at time zero is subject to the chosen risk-neutral pricing measure C0Q = EQ [H]. Then Cont defines the model uncertainty of the contingent claim valuation C Q (.) as the uncertainty on the value of C Q resulting from the uncertainty in the selection of Q. Before proceeding to the axiomatic characterisation of model uncertainty we shall revise the “plain English” requirements advocated by [Cont (2006)] that a measure of model uncertainty should verify: 14 Discount

factors are ignored in [Cont (2006)] for simplicity.

page 104

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Derivatives Pricing Under Uncertainty

105

1. For liquidly traded options, the price is determined by the market: there is no model uncertainty on the value of a liquid option. 2. Any measure of model uncertainty must take into account the possibility of setting up (total or partial) hedging strategies in a model-free way. If an instrument can be replicated in a model-free way, then its value involves no model uncertainty. If it can be partially hedged in a model-free way, this should also reduce the model uncertainty on its value. 3. When some options (typically, call or put options for a short maturities and strikes near the money) are available as liquid instruments on the market, they can be used as hedging instruments for more complex derivatives. A typical example of a model-free hedge using options is of course a static hedge using liquid options, a common approach for hedging exotic options. 4. If one intends to compare model uncertainty with other, more common, measures of (market) risk of a portfolio, the model uncertainty on the value of a portfolio should be expressed in monetary units and normalized to make it comparable to the market value of the portfolio. 5. As the set of liquid instruments becomes larger, the possibility of setting up static hedges increases which, in turn, should lead to a decrease in model uncertainty on the value of a typical portfolio.

As a starting point Cont assumes the existence of a finite set I of benchmark instruments with payoffs {Hi }i∈I and corresponding market prices, either pointwise {Ci∗ }i∈I , or a range [Cibid , Ciask ]. It is also assumed that a set Q of arbitrage-free pricing models exist, such that ∀i ∈ I either ∀Q ∈ Q,

EQ [|Hi |] < ∞, EQ [Hi ] = Ci∗

(5.55)

EQ [Hi ] ∈ [Cibid , Ciask ].

(5.56)

or ∀Q ∈ Q, ∀i ∈ I,

For simplicity we shall assume that the set of possible models is finite, that is |I| < ∞. The set of contingent claims is simply the set C of terminal payoffs which have a well-defined value under any of the alternative pricing models: C = ∩nk=1 L1 (Ω, FT , Qk ). For any contingent claim C ∈ C with payoff H, the model uncertainty risk measure is a mapping ψ : C → [0, ∞] that satisfies the technical conditions stated by [Cont (2006)]: (1) For liquid contracts, model uncertainty equals the uncertainty on market value ∀i ∈ I,

ψ(Hi ) ≤ |Ciask − Cibid |

(5.57)

page 105

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

106

Carte˙main˙WS

Model Risk in Financial Markets

(2) Let S be the set of admissible trading strategies such that the stochastic t integral Gt (φ) = 0 φdS is well-defined and is a Q-martingale bounded from below Q-a.s. for each Q ∈ Q. Then the effect of hedging the claim with the underlying positions has no impact on model risk    T φt dSt = ψ(C) (5.58) ∀φ ∈ S, ψ C + 0

If the contingent claim can be replicated in a model free way by trading only in the underlying then there is no model risk.    T φt dSt = 1, If ∃x0 ∈ R, ∃ψ ∈ S, ∀Q ∈ Q, Q C = x0 + 0

then

ψ(C) = 0.

(5.59)

(3) Model uncertainty decreases upon asset diversification ∀C1 , C2 ∈ C, ∀α ∈ [0, 1], ψ(αC1 + (1 − α)C2 ) ≤ αψ(C1 ) + (1 − α)ψ(C2 ) (5.60) (4) Impact of static hedging with traded options ∀X ∈ C, ∀u ∈ Rd , ψ(C +

d 

ui Hi ) ≤ ψ(C) +

i=1

d 

|ui (Ciask − Cibid )|

i=1

(5.61) This condition implies that if a contingent claim C is exactly statically replicated with traded liquid derivatives then the model uncertainty for C is upper bounded by the sum of the bid-ask spreads of the derivatives If ∃u ∈ Rd , C =

d 

ui Hi , then ψ(C) ≤

d 

i=1

5.7.2

|ui ||Ciask −Cibid | (5.62)

i=1

A coherent measure of model risk

Coherent risk measures are superior theoretically to other risk measures. By definition a risk measure ψ : C → R is called coherent if it satisfies the following properties: Monotonicity if C1 ≥ C2 then ψ(C1 ) ≤ ψ(C2 ) Cash is riskless ∀α ∈ R+ and ∀C ∈ C, ψ(C + α) = ψ(C) − α Subadditivity ∀C1 , C2 ∈ C ψ(C1 + C2 ) ≤ ψ(C1 ) + ψ(C2 ).

page 106

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Derivatives Pricing Under Uncertainty

Positive homogeneity

Carte˙main˙WS

107

∀λ ∈ R+ ψ(λC) = λψ(C).

Importantly, [Artzner et al. (1999)] proved a representation result for coherent measures. They showed that that any coherent measure of risk can be represented as the highest expected payoff in a family P of models ψ(C) = sup EP [−C]

(5.63)

P∈P

Now we shall review how Cont derived a coherent model risk measure. The key assumption is the fact that any C ∈ C has a finite value under any of the models given by Qi , i ∈ {1, . . . , n}. Then, by analogy with calculations in incomplete markets, define the upper and lower price bounds: π(C) =

sup

EQi [C],

π=

i∈{1,...,n}

inf i∈{1,...,n}

EQi [C] = −π(−C)

(5.64)

Then the application ψ : C → R given by ψ(C) = π(−C) defines a coherent risk measure and for any model associated with a given Q ∈ Q it follows that the value V Q (C) ∈ [π(C), π(C)]. Evidently when there is no model risk π(C) = π(C). Cont defines a risk measure of the impact of model uncertainty on the value of the contingent claim C, relative to the family of models Q as ψQ (C) = π(C) − π(C).

(5.65)

Moreover, in [Cont (2006)] the following important result for model uncertainty measurement is proved. Proposition 5.1. Consider a contingent claim C with payoff H revealed at time T and a finite family of models Q given by the risk-neutral probability measures {Qi }i∈I . Assume that there are market15 bid-ask prices Cibid , Ciask . Then ∀i ∈ I,

Cibid ≤ π(Hi ) ≤ π(Hi ) ≤ Ciask

(5.66)

Moreover, the measure ψQ given by (5.65) satisfies all properties (5.57)– (5.62). Proof. Since we know that ∀Q ∈ Q, ∀i ∈ I,

EQ [Hi ] ∈ [Cibid , Ciask ].

then by taking sup and inf over all measures Q ∈ Q we get that Cibid ≤ π(Hi ) ≤ π(Hi ) ≤ Ciask . 15 This has to be understood as a competitive bidding system where several parties post simultaneously bid and ask prices for the same contingent claim C. Each investment bank works with a particular model and they publish prices resulting from that model.

page 107

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

108

Carte˙main˙WS

Model Risk in Financial Markets

Clearly then ψ(Hi ) ≤ |Ciask − Cibid |. For any φ ∈ S and any Q ∈ Q the associated gains process is a martingale and therefore  T Q φt dSt ] = EQ [C] E [C + 0

implying that 

T

π[C +

φt dSt ] = π[C]

0

and



T

π[C + 0

and combining the two leads to  ψ[C +

φt dSt ] = π[C]

T 0

φt dSt ] = ψQ [C]

Making the particular case C = x0 leads to the stated relationship. For the homogeneity and subadditivity properties, consider two claims C1 , C2 ∈ C, α ∈ [0, 1] and a model given by Q ∈ Q. Then it is true that α inf EQ [C1 ] + (1 − α) inf EQ [C2 ] ≤ EQ [αC1 + (1 − α)C2 ] Q∈Q

Q∈Q

≤ sup EQ [C1 ] + (1 − α) sup EQ [C2 ] Q∈Q

Q∈Q

Taking again sup and inf over the family of measures Q we obtain απ(C1 ) + (1 − α)π(C2 ) ≤ π(αC1 + (1 − α)C2 )) ≤ π(αC1 + (1 − α)C2 ) ≤ απ(C1 ) + (1 − α)π(C2 )) For the last two results, suppose that we have a long position in a contingent claim C, and positions αi , i = 1, . . . , d in d benchmark contingent claims Hi , i = i = 1, . . . , d. Without loss of generality the first k positions are long positions, the others being short positions. Based on the previous results EQ [C] +

k  i=1

αi Cibid +

d 

αi Ciask ≤ EQ [C +

d 

α i Hi ]

i=1

i=k+1

≤ EQ [C] +

d  i=1

αi Ciask +

d  i=k+1

αi Cibid

page 108

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Derivatives Pricing Under Uncertainty

109

Taking sup and inf over Q leads to π[C] +

k 

αi Cibid

i=1

+

d 

αi Ciask

≤ π[C +

d 

α i Hi ]

i=1

i=k+1

π[C +

d 

α i Hi ] ≤

i=1

d 

αi Ciask

i=1

+

d 

αi Cibid

i=k+1

Combining the two gives π[C +

d 

αi Hi ] − π[C +

i=1

d 

αi Hi ] ≤ π[C] − π[C] +

i=1

d 

|αi (Ciask − Cibid )|

i=1

Last result is obtained by taking C = 0.

The clever thing about this approach is that since the value of the contingent claim, under all models Qi , is between π(C) and π(C), then one can look jointly at the market value and the model uncertainty interval and compare relative values of various contingent claims on the market. It is easy to see that two financial products with an identical market value may have very different model uncertainty measures. This should be of particular interest to auditors, exchanges and regulators. In contrast, it is also possible to have a situation where one claim X dominates another Y by market value, i.e. X(ω) ≥ Y (ω) almost surely but ψ(X) ≥ ψ(Y ). This is a more tricky situation that may become even more complex when time dynamics is taken into account16 . 5.7.3

A convex measure of model risk

Conventional risk measures are considered unreliable mainly if they fail to satisfy the subadditivity property. [Fabozzi and Tunaru (2006)] provided an example when estimation risk may change the subadditivity of a risk measure, highlighting essentially the hidden model risk associated with traditional risk management measures. By relaxing the positive homogeneity condition, [F¨ ollmer and Schied (2002)] merged this condition and the subadditivity condition into a convexity condition ∀α ∈ [0, 1],

ψ(αC1 + (1 − α)C2 ) ≤ ψ(αC1 ) + ψ((1 − α)C2 ).

(5.67)

16 See [Cont (2006)] for an interesting discussion of updating model uncertainty measures with the arrival of new information.

page 109

April 28, 2015

12:28

110

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

Risk measures satisfying this condition in turn are called convex risk measures. [F¨ ollmer and Schied (2002)] determined, under an extra continuity condition, the formula for a convex risk measure. For a family of models P  (5.68) ψ(C) = sup EP [−C] − η(P) P∈P

where η(P) is a positive penalising function. Coherent risk measures are therefore a subclass of convex risk measures, for the particular case when η = 0 or η = ∞. [Cont (2006)] constructed also convex risk measures of model uncertainty that may prove to be very useful practically since they can relax the requirement that any model should calibrate the liquid benchmark market prices precisely. The idea is to allow other models that do not necessarily calibrate exactly these liquid contingent claims, but penalise the models in the model risk uncertainty measure by the discrepancy between the payoffs Hi and the observed market prices Ci∗ . As before, for simplicity of exposition let us consider that the family of models is finite, that is I = {1, . . . , n}, Q = {Qi }i∈I . The convex risk measure of model uncertainty is constructed in [Cont (2006)] by first calculating   n  Q ∗ Q ψ(C) = sup E [−C] − |Ci − E [Hi ]| (5.69) Q∈Q

i=1

Then determine the margins of model uncertainty   n  π ∗ (C) = ψ(−C) = sup EQ [C] − |Ci∗ − EQ [Hi ]| Q∈Q



π∗ (C) = −ψ(C) = inf

Q∈Q

Q

E [C] +

i=1 n 

(5.70)

 |Ci∗

Q

− E [Hi ]|

(5.71)

i=1

The convex measure of model uncertainty is defined by ∀C ∈ C,

ψ∗ (C) = π ∗ (C) − π∗ (C).

(5.72)

The following result, proved by [Cont (2006)], shows the main properties of this convex model uncertainty risk measure. Proposition 5.2. Consider the applications π ∗ , π∗ and ψ∗ as above. Then (1) π ∗ is upper bounded by the observed market price ∀i ∈ I,

π ∗ (Hi ) ≤ Ci∗

(5.73)

page 110

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Derivatives Pricing Under Uncertainty

111

(2) π∗ is lower bounded by the observed market price ∀i ∈ I,

π∗ (Hi ) ≥ Ci∗

(5.74)

(3) If Q contains at least one model that calibrates exactly the benchmark contingent claims, then (a) ∀C ∈ C, π ∗ (C) ≥ π∗ (C). (b) Moreover, ψ∗ will also satisfy in this case all ties (5.57), (5.58), (2) and (5.60). (c) Diversifying a position using long positions in benchmark tives reduces model uncertainty. For any system of positive d i=0 αi = 1   d  ψ∗ α 0 C + αi Hi ≤ ψ∗ (C).

(5.75) properderivaweights

(5.76)

i=1

(d) Finally, any position which can be replicated by a convex combination of available derivatives has no model uncertainty. This n n means that if there is j=1 βj = 1 such that C = j=1 βj Hj then ψ∗ (C) ≤ 0. Proof. The following inequalities are clearly true ∀i ∈ I, ∀Q ∈ Q  |Cj∗ − EQ [Hj ]| ≤ EQ [Hi ] − |Ci − EQ [Hi ]| EQ [Hi ] − j∈I

≤ EQ [Hi ] + Ci∗ − EQ [Hi ] ≤ Ci Taking the sup over Q gives π ∗ (Hi ) ≤ Ci∗ . Similarly  j ∈ I|Cj∗ − EQ [Hj ]| ≥ EQ [Hi ] + |Ci∗ − EQ [Hi ]| EQ [Hi ] + ≥ EQ [Hi ] + Ci∗ − EQ [Hi ] ≥ Ci∗ Taking now the inf over Q gives π∗ (Hi ) ≥ Ci∗ . Since ψ given above in (5.69) is a convex risk measure we can write 1 ψ(0) ≤ [ψ(C) + ψ(−C)] 2 By definition π ∗ (C) = ψ(−C) and π∗ (C) = −ψ(C) and therefore ∀C ∈ C, π ∗ (C) ≥ π∗ (C) + 2ψ(0).

page 111

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

112

Carte˙main˙WS

Model Risk in Financial Markets

But −ψ(0) = inf Q∈Q i∈I |Ci∗ − EQ [Hi ]| is the smallest17 calibration error for the given set of benchmark values {Ci∗ }i∈I . Hence, if Q contains at least one model Q that calibrates perfectly these benchmark prices, then clearly ψ(0) = 0. Thus ∀C ∈ C, π ∗ (C) ≥ π∗ (C). Clearly ∀i ∈ I, Ci∗ ≤ π∗ (Hi ) ≤ π ∗ (Hi ) ≤ Ci∗ implies that π∗ (Hi ) = π ∗ (Hi ) and therefore ψ∗ (Hi ) = 0, ∀i. As with the coherent measure proof ∀φ ∈ S the gains process is a Qmartingale, then  T φt dSt ] = x0 . EQ [C] = x0 + EQ [ 0

Hence, ψ(C) = −x0 and π ∗ (C) = π∗ (C) = 0, which implies that ψ∗ (C) = 0. Similarly      T  T  Q ∗ Q ψ C+ φt dSt = sup E [C + φt dSt ] + |Ci − E (Hi )| Q∈Q

0

= sup Q∈Q

0

 Q

E [C] +



i∈I

|Ci∗



Q

− E (Hi )|

i∈I

Now we can use the convexity property of ψ, so ∀C1 , C2 ∈ C and ∀α ∈ [0, 1] ψ∗ (αC1 + (1 − α)C2 ) = ψ(αC1 + (1 − α)C2 ) + ψ(−αC1 − (1 − α)C2 ) ≤ αψ(C1 ) + (1 − α)ψ(C2 ) + αψ(−C1 ) + (1 − α)ψ(−C2 ) = αψ∗ (C1 ) + (1 − α)ψ∗ (C2 ) so ψ∗ is a convex application. d Lastly, for any system of positive weights j=0 βj = 1, from the convexity of ψ∗ just ⎛ proved we get ⎞ ψ ∗ ⎝β 0 C +

d 

βj Hj ⎠ ≤ β0 ψ∗ (C) +

j=1

d  j=1

But ψ∗ (Hj ) = 0, ∀j ∈ I so ψ∗ (β0 C +

d 

βj Hj ≤ ψ∗ (C)

j=1

Taking C = 0 and using ψ∗ (0) = 0 gives d  β j Hj ) ≤ 0 ψ∗ ( j=1 17 Assuming

as before that I is finite.

βj ψ∗ (Hj )

page 112

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Derivatives Pricing Under Uncertainty

5.8

Carte˙main˙WS

113

Notes and Summary

Model risk has been associated usually with operational risk in investment banking. Although there were important financial events causing important losses blamed on model risk errors, the research on model risk in financial markets has not been given the importance it deserves. With the expansion of product innovation came the even greater expansion of model complexity. Model risk is too large now to be simply ignored. MCMC techniques have been applied before to Finance, although they did not get the recognition they deserve in financial markets analytics. Some important readings can be found in [Polson and Roberts (1994)], [Bunnin et al. (2002)], [Eraker (2001)], [Jacquier et al. (1994a)], [Jacquier et al. (2004)] and [Polson and Stroud (2003)]. An excellent reading where several models applied in finance are reviewed jointly with MCMC techniques that can be used for those models can be found in [Johannes and Polson (2010)]. The theoretical outline of MCMC framework as it can be applied to finance is described in the excellent work by [Johannes and Polson (2010)] that covers standard models such as Black-Scholes, Merton but also stochastic volatility models such as Heston. In addition, they also describe the Bayesian extensions of the term structure models such as Vasicek and Vasicek with jumps, CIR and regime switching models. [Barrieu and Scandolo (2013)] also consider a general framework for measuring model risk in relation to market risk calculus. They propose three measures of model risk for selecting a model from a given class: the absolute measure of model risk, the relative measure of model risk and the local measure of model risk. These measures are calculated where possible for VaR and ES. In a beautifully crafted paper [Detering and Packham (2013)] proposed value-at-risk and expected shortfall type risk measures for the potential losses arising from using mis-specified models when pricing and hedging contingent claims. Model uncertainty is expressed in their framework by a set of pricing models, relative to which potential losses are determined. For given market data a unified loss distribution is shown to be the best estimate of losses across models. On some examples it is illustrated that the model risk calculations are necessary and sufficient to isolate the claims from losses arising from model risk. Here are some lessons from this chapter:

page 113

April 28, 2015

12:28

114

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

(1) One major problem is parameter calibration and there is a growing need in the literature to address the uncertainty associated with the estimation process. (2) Market incompleteness should not be confused with model risk and uncertainty. (3) Bayesian option pricing is an efficient way to get an insight into parameter estimation risk. (4) It is difficult to have a one size fits all type of solution but problems can be grouped into classes depending on the main characteristics of the financial assets under investigation. (5) From a theoretical perspective Cont’s framework and measures of model risk seem to be the most advanced at this point in time. (6) MCMC techniques can be used to construct distributions of derivatives prices. The quantiles of these distributions define new measures of model risk.

page 114

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Chapter 6

Portfolio Selection under Uncertainty

6.1

Introduction to Model Risk for Portfolio Analysis

Estimation risk for portfolio analysis is almost a separately area. For portfolio management and optimisation everything is multivariate and problems such as parameter estimation risk are vital. One of the first references in this area is the wonderful book [Bawa et al. (1979)] and the survey in [Brown et al. (1979)]. Other early references include [Brown (1979)] who discusses estimation risk on the capital market equilibrium, [Brown and Chen (1983)] and [Brown and Klein (1984)] where portfolio selection under uncertainty is discussed. Model risk for portfolio analysis deserves a separate line of inquiry and sets of results. Due to space limitations I do not pursue this idea in this book. However, I would like to mention examples that are based on model averaging, arguably the best statistical technique to account for model risk for portfolio selection, and some of the results that are at the cutting edge between continuous-time finance and portfolio analysis. MCMC techniques have been applied to financial modelling problems in the areas of volatility estimation and portfolio selection. Some important ideas in this field have been put forward by [Young and Lenk (1998)], [Jacquier et al. (2004)], [Elerian et al. (2001a)], and [Kim et al. (1998)]. Portfolio analysis can be greatly influenced by model risk as demonstrated by [Barry et al. (1991)]. Another important reference in a portfolio context is [Kerkhof et al. (2010)]. The impact of uncertainty in probability default parameters on the VaR determined by an investor in a credit portfolio has been investigated recently by [Tarashev (2010)], who concluded that the impact can be significant for a wide range of portfolio characteristics and for a wide range of

115

page 115

April 28, 2015

12:28

116

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

dataset sizes. [Vrontos et al. (2008)] used model averaging for improving the analysis of model selection for hedge fund performance. They accounted for model uncertainty in hedge fund pricing regressions and thus produced superior inference, Model risk may impact portfolio analysis and ultimately investment decisions in different forms. Niederhoffer investment fund designed their strategy on the probability model assumption that the market would not fall by more than 5% in a single day. Hence, the fund managers were selling out-of-the-money puts on stock index futures. The strategy worked until the stock market fell by 7% on October 27, 1987 in the aftermath of the Black Monday market crash of 19 October when the Dow Jones Industrial Average (DJIA) dropped by 508 points to 1738.74 equating to a 22.61% loss, the largest one-day loss in history. This was mirrored by a crash in the UK and also a small crash in Japan, as can be seen from Fig. 6.1. Portfolio analysis and indeed derivatives pricing has never been the same after Black Monday, equity investors requiring a crash-premium that resulted in fatter tails for the probability distributions describing the movements of stock prices and a departure from the traditional Black-Scholes model that had seemed to work fine until that time. What caused this spectacular market crash has been extensively researched, see [Shiller (1989)] for an excellent discussion. What is less discussed is the preamble to the crash and the possible political causes. In August 1987 the Dow Jones index was at 2722 points, or 44% over the previous year’s closing of 1895 points. On October 14, the Dow Jones index dropped 95.46 points to 2412.70, and another 58 points the next day, down over 12% from the August 25 all-time high. At the same time the US was engaged in a war with Iran and on Thursday, October 15, 1987, Iran hit an American-owned supertanker with a missile. The next morning, Iran hit another American ship with another missile.

page 116

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Portfolio Selection under Uncertainty

117

29000 27000 25000 23000

21000 19000 17000 15000

(a) Nikkei225

2600 2400 2200 2000 1800 1600 1400

1200 1000 19/06/1987

19/07/1987

19/08/1987

19/09/1987

19/10/1987

19/11/1987

19/12/1987

19/01/1

(b) FTSE100 2900 2700 2500 2300 2100 1900 1700

1500 19/06/1987

19/07/1987

19/08/1987

19/09/1987

19/10/1987

19/11/1987

19/12/1987

(c) Dow Jones

Fig. 6.1: The stock market crash in October 1987.

19/01/1

page 117

April 28, 2015

12:28

118

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

Moreover, on Friday 16 October, all the markets in London were closed due to the Great Storm of 1987. The Black-Monday crash began like a tsunami from the Far Eastern markets on the morning of October 19, reaching London and European countries first on its way to the US. At the same time, two U.S. missiles bombarded an Iranian oil platform in retaliation to Iran’s previous missile attacks. This is a perfect example where stock market risk (programme trading) was bundled with political risk (the US-Iran war) and catastrophic risk (insurance losses caused by the Great Storm were estimated at £2 billion). Identifying the correct portfolio is a model selection problem. Furthermore, computational problems may have a stronger impact due to the size of investments. In this chapter I highlight the benefit of employing Bayesian model averaging, an elegant technique of updating results in the light of the ebbs and flows of unfolding uncertainty. Moreover, some of the less known examples showing the difficulties posed by stochastic interest rates and modelling the market price of risk, in a portfolio context, are revisited here. 6.2

Bayesian Averaging for Portfolio Analysis

The subject of portfolio selection is directly related to model uncertainty. Many portfolio selection methodologies use regression type models for empirical and statistical analysis. Hence, it is not surprising that the entire technology developed under a Bayesian paradigm to deal with model selection or model uncertainty, can be applied to financial portfolio selection applications. Consider an investor looking at a suite of N portfolios such that rt is the N -dimensional vector of continuously compounded returns on these portfolios in excess of the risk-free rate. Suppose that the investor takes a general to specific approach in model selection and therefore, with Υ available explanatory variables that are economically significant but not necessarilystatistically significant, the investor will have to compare 2Υ linear regression models of the form rt = xj,t−1 Bj + εj,t

(6.1)

 ) encapsulates the covariate information for the where xj,t−1 = (1, zj,t−1 j-th model which has Bj as the matrix of regression coefficients, including an intercept. The error specification is the standard one in this context εj,t ∼ N (0, Σj ) where Σj is the variance-covariance matrix for model j.

page 118

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Portfolio Selection under Uncertainty

119

Clearly the number of models increases very fast with the number of potential explanatory variables. Bayesian model averaging is an elegant methodology that allows the investor to combine a prior view on the suite of models with evidence based on actual data, and determine the model that is most useful. If the j-th model is denoted by Mj and D denotes generically the data, the posterior probability of the j-th model is given by the formula Pr(D|Mj ) Pr(Mj ) Pr(Mj |D) = 2Υ k=1 Pr(D|Mk ) Pr(Mk )

(6.2)

These posterior probabilities can be used to identify one or a few models that outperform other candidate models. From the above formula it can be seen that in order to calculate the posterior probabilities one must determine the prior probability Pr(Mj ) and the marginal likelihood contribution of the jth model Pr(D|Mj ). The latter is given by Pr(D|Mj ) =

L(Σj , Bj , D, Mj ) Pr(Σj , Bj |Mj ) Pr(Σj , Bj |D, Mj )

(6.3)

where L(Σj , Bj , D, Mj ) represents the likelihood function implied by the j-th model specification. The other probabilities appearing in (6.3) are joint prior and posterior distributions respectively, of parameters describing model j. [Avramov (2002)] considered the above framework from an investment portfolio analysis point of view and applied the Bayesian averaging approach to solve the problem of model uncertainty. In the following we shall review the elicitation of prior distributions and the calculation of marginal likelihood. The next notations are standard in this context for a sample of size T . r=

T 1 rt , T t=1

zj =

T −1 1  zj,t T t=0

T 1 Vk = (rt − r)(rt − r) , T t=1

6.2.1

T −1 1  Vj,z = (zj,t − z j )(zj,t − z j ) T t=1

Empirical Bayes priors

[Avramov (2002)] followed [Kandel and Stambaugh (1996)] in deciding on the prior distribution of parameters. This approach is called the empirical Bayes approach in statistics and in a nutshell uses a hypothetical sample

page 119

April 28, 2015

12:28

120

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

matching the observed statistical characteristics of the current sample. If − → Bj is the vector formed from stacking the rows of matrix Bj and Bj,0 = [r, 0j ] then the prior distributions used in [Avramov (2002)] are      −1  −1 −−→ 1 − → 1 + z j Vj,z z j −z j Vj,z Bj |Σj ∼ N Bj,0 , Σj ⊗ (6.4) T0 V −1 −V −1 z j j,z

j,z

for the regression coefficients, and the inverted Wishart distribution Σj ∼ IW (T0 , Vk , T0 − mj − 1)

(6.5)

for the variance-covariance matrix. Note that T0 is the size of the hypothetical sample used for the prior distribution. 6.2.2

Marginal likelihood calculations

For clarity, all quantities associated with the prior hypothetical sample are marked with the subscript 0. Under the framework outlined in the previous section, [Avramov (2002)] showed that the log of the marginal likelihood can be calculated in closed form as Tj,0 − mj − 1 TN log(π) + ln(det(Tj,0 Vr )) ln[P r(D|Mj )] = − 2 2



 N  Tj∗ − mj − 1 Tj,0 − mj − i ln(det(Sj )) − ln Γ − 2 2 i=1

 N  Tj∗ − mj − i ln Γ + 2 i=1

∗ Tj N (mj + 1) ln (6.6) − 2 Tj,0 where T Sj = Tj∗ (Vr + rr ) − ∗ (Tj,0 [r, rz  j ] Tj   + R Xj )(Xj Xj )−1 (Tj , 0)[rz  j ] + Xj R)

Xj = [xj,0 , xj,1 , . . . , xj,T −1 ] R = [r1 , r2 , . . . , rT ] Tj∗ = T + Tj,0 Then replacing on the right hand side in (6.2) one would get the posterior weights associated with each model. The posterior probabilities for the model indicate by their size which models are good data generators and which are not.

page 120

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Portfolio Selection under Uncertainty

6.3

Carte˙main˙WS

121

Portfolio Optimization

One important problem in finance is that calculating the continuous-time portfolio problem requires the solution of maximizing expected utility from final wealth from an investing strategy spanned by different securities (e.g. stocks, bonds, and the money market account). While a lot of research has been done in this area, there are some surprising results contradicting conventional wisdom. For the following discussion in this section we follow closely [Korn and Kraft (2004)]. The investor can trade in some stock S, some zero coupon bond p with maturity T1 > 0, and the money market account M . The following generic dynamics are specified: ⎧ ⎨ dMt = Mt rt dt (6.7) dS = St [μSt St dt + σtS St dWtS ⎩ t B B dpt = pt [μB dt + σ dW ] t t t where W S , W B are correlated Brownian motions with correlation coefficient ρ and r = {rt }t≥0 is the risk-free rate process. Additionally, we define the processes λS = μS − r, λB = μB − r, λ = S (λ , λB ), σ = (σ S , σ B ) and W = (W S , W B ). We assume that the parameter processes μS , σ S , μB , σ B and the short rate process r are progressively measurable with respect to the Brownian filtration. Moreover, it will also be required that for some T > 0, with respect to time t ∈ [0, T ], the processes r, μS , μB are almost surely integrable and σ S , σ B are almost surely square-integrable. One classical problem in portfolio optimisation considers an investor who maximizes utility from final wealth X(T) with respect to a power utility function U (x) = γ1 xγ , x ≥ 0, γ ∈ (−∞, 0) ∪ (0, 1). If π = (π S , π B ) represents the proportions invested in the stock and bond, respectively, the process π is a portfolio process if it has real components and it is progressively measurable with respect to the Brownian filtration satisfying  T  T  |rt + πt λt |dt < ∞, |πt σt |2 dt < ∞ 0

0

These conditions ensure that the corresponding wealth process dXtπ = Xtπ [rt + πt λt dt + πt σt dWt ] π

(6.8)

with X (0) = x0 is well defined. Such a portfolio process is admissible if Xtπ ≥ 0 for all t ∈ [0, T ]. If A(0, x0 ) denotes the set of admissible portfolio processes, the subset of strategies with well-defined expected utility is A∗ (0, x0 ) = {π ∈ A(0, x0 ) : E[U (XT )] < ∞}

page 121

April 28, 2015

12:28

122

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

The problem that the investor has to solve is

γ 1 π X max E (6.9) γ T π∈A∗ (0,x0 ) The martingale method of portfolio optimization, see [Karatzas et al. (1987)] and [Cox and Huang (1989, 1991)], solves the dynamic problem (6.9) by first solving the static optimization problem maxC E(U (C)) = E[U (C ∗ )]

(6.10)

where the optimization is done over all contingent claims C with a given price of x. In the second stage it solves the corresponding representation ∗ problem of finding a portfolio process X π such that its associated wealth ∗ process XTπ = C ∗ . The main problem with the martingale method is that quite restrictive assumptions, such as a bounded market price of risk, are needed. One condition ensuring the applicability of the martingale approach even in some cases when the investor’s opportunity set is stochastic has been provided by [Dybvig et al. (1999)]. If ξ = (ξ S , ξ B ) = (λS /σ S , λB /σ B ), the condition requires that all moments of the state price density

 t   t 1 t 2 rs ds − ξs  ds − ξs dWs (6.11) φ(t) = exp − 2 0 0 0 and its inverse φ−1 are finite. This condition is called DRB henceforth. [Korn and Kraft (2004)] construct explicit counterexamples related to unstable portfolio optimisation problems, assuming a stochastic investors opportunity set, that is portfolio problems with stochastic interest rates, stochastic volatility, or a stochastic market price of risk. Definition 6.1. The portfolio optimisation problem (6.9) is M-stable if the utility associated with putting all the money into the money market account, equivalent to π ≡ 0, is finite for all parametrization of the parameter processes as well as all risk preferences γ. M-stability is important due to the following counterintuitive result. Proposition 6.1 (Korn-Kraft). If the portfolio optimisation problem is M-unstable, independently of other traded assets, it is optimal for the investor to put all her wealth into the money market account. Proof. This strategy is evidently admissible. Then a representative investor will obviously select this strategy. The problem is that this state cannot be an equilibrium state of an economy if stocks and bonds are also traded.

page 122

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Portfolio Selection under Uncertainty

Carte˙main˙WS

123

Definition 6.2. The portfolio optimisation problem is I-unstable if there exists a parametrization of the model and a convex set I with a nonempty interior of portfolio strategies π ∈ A∗ (0, x0 ) such that E[U (XTπ )] = +∞ for all π ∈ I. Remark that the Merton setting with constant coefficients is stable in both senses as it can be verified that the constant portfolio process πt∗ = 1 λ 1−γ σ 2 is always the unique optimal portfolio process and leads to a finite expected utility. 6.3.1

Portfolio optimisation with stochastic interest rates

Here we discuss the impact on portfolio optimisation of considering interest rates to be stochastic. 6.3.1.1

Dothan model

The Dothan model is a short rate model falling under the HJM framework1 . It is specified by

drt = rt [a(t)dt + b(t)dWt ]

(6.12)

where a and b are measurable deterministic functions of time such that a is integrable and b = 0 is square-integrable. The next example shows that for all risk preferences γ > 0 it is optimal for the investor to put all of her money into the money market account, irrespective of the specification of parameters a and b. Proposition 6.2 (Exploding expectations with Dothan short rates). When the interest rates are characterised by the Dothan model, with respect to the portfolio optimisation problem, for any γ > 0

 t rs ds = +∞ E exp γ 0

Thus, the portfolio optimisation problem with stochastic interest rates driven by the Dothan model is M-unstable. The proof of this result goes back to [Sandmann and Sondermann (1997)] and it contradicts [Lioui and 1A

discussion on Baxter’s condition ensuring this is provided in [Korn and Kraft (2004)].

page 123

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

124

Carte˙main˙WS

Model Risk in Financial Markets

Poncet (2001)]. Moreover, given a Dothan term structure, the DRB condition is violated if the market price of risk is assumed to be constant because all moments of the state price density are infinite. However, the Dothan model has not been extensively used in the industry. Can the same problem occur for interest models more widely used? 6.3.1.2

CIR model

Let us assume now that the interest rates follow the well-known CIR model, with short rate dynamics specified by the SDE √ (6.13) drt = k(θ − rt )dt + σ rt dWt where t ∈ [0, T ∗ ], k, θ, σ > 0, T ∗ > T with initial value r0 given. It is well known that by assuming the condition 2kθ ≥ σ 2 all rates rt are positive almost surely. The following result2 shows that there could be exploding expectations in the CIR model too. Proposition 6.3 (Exploding expectations with CIR short rates). When the interest rates are characterised by the CIR model, with respect to the portfolio optimisation problem, the following results hold: (1) For all γ, k, t ∈ R+ , there exists σ, θ ∈ R+ such that 2kθ ≥ σ 2 and

 t E exp γ rs ds = +∞ 0

More precisely, for γ, k, t ∈ R+ it is sufficient to choose σ according to σ2 ≥

2k 3 tekt γ[2ekt − (1 + kt)2 − 1]

and then to choose θ such that θ ≥ σ 2 /2k. (2) For all t, k, σ, θ ∈ R+ with 2kθ ≥ σ 2 there exists some K > 0 such that

 t E exp γ rs ds = +∞ 0

holds for any γ ≥ K. It is sufficient to choose K satisfying K≥ 2 For

2k 3 tekt σ 2 [2ekt − (1 + kt)2 − 1]

a technical proof see [Korn and Kraft (2004)].

page 124

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Portfolio Selection under Uncertainty

Carte˙main˙WS

125

One can remark that, in contrast to the lognormal short rate models, the expectations do not explode for all parametrizations of the model if γ > 0. Nevertheless, the above result indicates that for each γ > 0 there are parametrizations such that the expected utility of an investment in the money market account is infinite and therefore the portfolio problem is M-unstable. Moreover, once again, the DRB condition is violated when the market price of risk is assumed to be constant because the above result implies that there exists some positive integer n0 such that the moments of the state price density with an order n ≥ n0 are infinite. 6.3.2

Stochastic market price of risk

Reconsider now the portfolio optimisation problem where the parameters of the stock dynamics are assumed to equal μSt = rt + ξtS · σ S and σtS is constant for any t. Furthermore, let the market price of risk of the stock be √ given by ξtS = k yt where k > 0 is a scalar and ξ B is a positive constant. The process y defining the market price of risk has the dynamics √ (6.14) dyt = κ(θ − yt )dt + σy yt dWtS where κ, θ, σy are positive constants and 2κθ ≥ σy2 . The short rate is assumed here to follow the extended Vasicek model drt = (ν(t) − αrt )dt + bdWtB

(6.15)

for some continuous function ν and some constants α, b > 0. The two noise processes W B and W S are assumed to be independent for simplicity. √ Assume that πtS = lξtS = lk yt , with l > 0 and π B ≡ 0. The next result highlights a contradiction with [Lioui and Poncet (2001)]. Proposition 6.4. In the context outlined above and given that the investor can invest at least in the stock and in a money market account, the portfolio optimisation problem (6.9) is I-unstable. Proof.

 dXt = Xt [ rt + πtS ξtS σ S dt + πtS σ S dWtS ]  √ = Xt [ rt + lk 2 yt σ S dt + lk yt σ S dWtS ]

Denoting the function h(l, σ S ) = lσ S (1 − 0.5lσ S ) the solution of the above SDE is

 t    t lkσ S 2 S rs + k h(l, σ )ys ds + κ(θ − ys )ds yt − y 0 − Xt = x0 exp σy 0 0

page 125

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

126

Carte˙main˙WS

Model Risk in Financial Markets

Since y ≥ 0 it follows easily that 

E(XTγ )

  ≥ [· · · ]E exp γ



T 0

rs ds



 2



E exp γk h(l, σ )



T

S

0

ys ds

(6.16) where [· · · ] are deterministic factors that do not influence the logical sense of the results. [Korn and Kraft (2004)] selected γ ∈ (0, 1) and κ, θ, σy with 2κθ ≥ σy and σ S = 1 and remarked that h(l, σ S ) ≥ h(1.5, 1) for all l ∈ [1, 1.5]. One can now define the convex set I = [1, 1.5]. Using the second part of Proposition 6.3 there is some K > 0 such that the quantity in (6.16) is infinite for all k ≥ K and all l ∈ [1, 1.5].

6.3.3

Stochastic volatility

Similar problems regarding the instability of portfolio optimisation problems may appear when working with a stochastic volatility model for the underlying stock in the market. Consider the Heston model that specifies √ the volatility of the stock as σtS = yt with √ dyt = κ(θ − yt )dt + σy yt dWty

(6.17)

with κ, θ, σy > 0 and 2κθ ≥ σy2 . For simplicity we shall assume that W S and W y are independent, the short rate is constant rt ≡ r and that π B ≡ 0. Considering γ ∈ (0, 1) and selecting the functional form for the market price of risk ξtS = kσtS the SDE of the wealth is dXt = Xt [(r + πtS kσtS2 dt + πtS σtS dWtS ]

(6.18)

Proposition 6.5. Assume that the stock price dynamics are governed by the Heston model with uncorrelated Brownian motions for the stock and the volatility, the short rate is constant and the market price of risk is ξ S = kσ S with k > 0. When the investor can invest at least in the stock and in the money market account, the corresponding portfolio problem (6.9) is I-unstable. A technical proof is detailed in [Korn and Kraft (2004)] and it again contradicts the results in [Lioui and Poncet (2001)] and [Liu et al. (2003)].

page 126

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Portfolio Selection under Uncertainty

6.4

Carte˙main˙WS

127

Notes and Summary

Portfolio selection and optimisation is naturally subjected to model risk. Practitioners have seen over the years that some models work well in normal market conditions while other models work well during turbulent time. For portfolio analysis model risk manifesting itself in a dual manner, through parameter estimation risk, see [Vrontos et al. (2008)] and through model identification risk, see [Bunnin et al. (2002)]. Furthermore, [Alexander (2003)] pointed out that model risk in risk capital calculations requires appropriate methods of aggregation that combine market, credit risk and model risk calculations using Bayesian methods. This area clearly needs more research. Here are some useful tips following the material in this chapter. (1) In general it is difficult to identify one model that works well all the time. (2) It is very useful not to focus on one single model but to work with a framework that is able to look simultaneously at several models and adjust the weights given to different models with the ebbs and flows of financial information. (3) Portfolio optimisation can be greatly affected by stochastic interest rates and stochastic volatility assumptions. (4) Lognormal models of interest rates can have explosive expectations. (5) For the CIR model the expectations do not explode if γ > 0. Nevertheless, for each γ > 0 there are parametrizations such that the expected utility of an investment in the money market account is infinite and therefore the portfolio optimization problem is M-unstable. (6) Assuming the stochastic volatility model for the market index may also lead to instability of portfolio optimisation problems.

page 127

May 2, 2013

14:6

BC: 8831 - Probability and Statistical Theory

This page intentionally left blank

PST˙ws

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Chapter 7

Probability Pitfalls of Financial Calculus

7.1

Introduction

The development of modern finance is intrinsically related to the application of a wide range of probability tools. This chapter is a short reminder of the pitfalls presented by some of the main probability concepts when they are applied to finance. At the same time the chapter makes the link between the two parts of the book, between financial mathematics and asset pricing and financial statistics and risk management. One of the most contentious concepts in the quantitative finance of financial markets is correlation . A lot of financial modelling criticism regarding the subprime-liquidity crisis that started in 2007 was attributed to the “wrong” way the correlation of defaults was captured. The Gaussian copula has been heavily criticised and many new solutions have been proposed afterwards. For an interesting discussion see [Morini (2011)]. The examples presented in the literature criticising how correlation was captured by models used in investment banks and rating agencies seem to have missed the big picture. When calculating estimates for the credit default family of tools (default probabilities, correlations, recoveries etc) rating agencies use historical data as the main pillar. For a long time they were the only ones having long dated series of historical data for a very large pool of companies. Their estimation, driven by historical data, was through the cycle. Thus, when they wanted to estimate say the probability of default for a company from a given category (e.g. A rated, Banking, Europe domicile) they used their large sample of data and calculated the probability of default over one or two business cycles. In contrast, when an investment bank was (still is?!) calculating the probability of default, they did it to the point, that is they were interested in the probability of default, for that company alone,

129

page 129

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

130

Carte˙main˙WS

Model Risk in Financial Markets

up to the required horizon, which is usually much shorter than the length of the business cycle. Naturally, the two estimates will differ because of the different horizons but they will also differ because they are calculated under different probability measures. Rating agencies worked under the physical measure whereas the investment bankers employed the forward looking market pricing measure. These differences will manifest themselves irrespective of the way correlation, or any other concept of dependency, is modelled. Furthermore, rating agencies use a large sample approach, so they try to control the effects of uncertainty by having large databases. The estimation is based on information on many obligors. On the other hand, traders try to control uncertainty in relative terms, vis-a-vis the market, so their estimation is based on information from many investors. The two will rarely be the same unless the price formation for the majority of investors is realised through models taking into account the historical databases. Next, I would like to point out some examples that will clarify to some extent a series of concepts from probability and statistics, frequently utilised in financial modelling. 7.2

Probability Distribution Functions and Density Functions

By definition a function f is a density function for a real random variable if it satisfies two conditions: (1) f ≥ 0 ∞ (2) −∞ f (u)du = 1 When the density function f associated with a random variable  x is given, the distribution function for that variable is defined by F (x) = −∞ f (u)du, or equivalently F (x) = f (x), for any x ∈ R. However, it is not true that each distribution function corresponds to a density function. The following counterexample is based on the Dirac probability measure. If a ∈ R then the probability measure defined by  1, if a ∈ A; P(A) = 0, otherwise. is called the Dirac probability measure, or a point mass measure, at point a.

page 130

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Probability Pitfalls of Financial Calculus

Carte˙main˙WS

131

Proposition 7.1. Any Dirac probability measure does not have a probability density. Proof. Let a ∈ R be any real number momentarily fixed. Then it is evident that the probability distribution function corresponding to the Dirac measure at point a is  0, if x < a; F (x) = 1, if x ≥ a. This shows that it is not possible to have a probability density function f corresponding to this F . 7.3

Gaussian Distribution

Here we discuss a few properties of the normal or Gaussian distribution, that plays a very important role in finance. The first result describes the situation of a random vector that fails to be joint-Gaussian although each of the components is marginally Gaussian1 . Proposition 7.2. Let X = (X1 , X2 ) be a bivariate random vector such that X1 and Z are independent and identically distributed N (0, 1) and also X2 = sgn(X1 )|Z|. Then X1 and X2 are Gaussian but X is not. Proof. It is clear that X1 and X2 are Gaussian. A random vector X is joint Gaussian if and only if every linear polynomial Y of X is also Gaussian. But the combination Y = X1 + X2 is not Gaussian distributed. Thus, a portfolio of assets may have a Gaussian distribution for each asset but the joint distribution of all assets in the portfolio may not be multivariate Gaussian. [Embrechts et al. (2002)] discuss several fallacies related to probability distributions that can have important impacts on in portfolio risk management for example. For instances, given that a random vector is multivariate Gaussian distributed means that the first two moments, i.e. the expectations and the covariance matrix uniquely determine the joint distribution. Nonetheless, given that only each component of the random vector is Gaussian there could be many multivariate distributions having the same two moments. 1 See

[Stoyanov (2014)] for more counterexamples related to joint-Gaussian distribution.

page 131

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

132

Carte˙main˙WS

Model Risk in Financial Markets

Proposition 7.3. Marginal distributions and correlation cannot always fully determine the joint distribution. Proof. Let X, Y ∼ N (0, 1) such that the correlation coefficient between X and Y is ρ. If (X, Y ) is jointly Gaussian then the distribution function is F (x, y) = Fρ (x, y) where

  Φ−1 (x)  Φ−1 (y) 1 (s2 − 2ρst + t2  exp − Fρ (x, y) = dsdt 2(1 − ρ2 ) 2π 1 − ρ2 −∞ −∞ However, for any other copula C = Fρ while the marginal distributions will be Gaussian, the bivariate distribution will exhibit the same correlation coefficient ρ but will not be jointly Gaussian. One such copula has been constructed in [Embrechts et al. (2002)] by taking 2a − 1 [1 − 1(a,1−a) (x)] 2a 2a − 1 [1 − 1(a,1−a) (x)] f2 (x) = −1(a,1−a) (x) − 2a and then the copula is

 x  y C(x, y) = xy + f1 (u)du f2 (u)du f1 (x) = 1(a,1−a) (x) +

0

(7.1) (7.2)

(7.3)

0

The next result concerns the mapping between the set of joint distribution functions with given marginal distributions and the possible values of the correlation coefficient . Proposition 7.4. Consider the random variables X and Y having support [0, ∞). Suppose that sup{x|F1 (x) < 1} = sup{F2 (y) < 1} = ∞. x

y

Then there is no joint distribution for (X, Y ) such that the correlation coefficient between X and Y is ρ = −1. Proof. ρ = −1 means that Y = α + βX with β < 0. Then, for all y < 0 y−α F2 (y) = P[Y ≤ y] = P[X ≥ ] (7.4) β y−α y−α ≥ P[X > ] = 1 − F1 ( )>0 (7.5) β β which contradicts the fact that the support of Y is the positive axis.

page 132

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Probability Pitfalls of Financial Calculus

7.4

Carte˙main˙WS

133

Moments

Empirical (sample) moments and theoretical moments are widely used in finance for estimation of parameters, risk measures calculation and optimisation, and for designing efficient computational algorithms for pricing. Due to the omnipresence of the martingale theory in continuous-time finance, the mean is the most utilised location measure in spite of being heavily influenced by outliers and in spite of not indicating the most likely value. 7.4.1

Mean-median-mode inequality

For a unimodal distribution we shall denote by M its mode, by m its median and by μ its mean. It is accepted in the literature that most of the time the mean, median and mode occur in either the alphabetical or reverse alphabetical order M ≤m≤μ

or M ≥ m ≥ μ

so when M = μ we automatically would get that M = m = μ, with the evident implication to the symmetry of the distribution. However, [Basu and DasGupta (1992)] provided the following result showing that it is possible to break away from the double inequality. Proposition 7.5 (Basu and Das Gupta). Consider three distributions with the cumulative distribution functions F1 , F2 and F3 of U nif orm[−3 + δ, 0], U nif orm[0, 1 − δ] and the degenerate distribution respectively. Consider then the distribution obtained by the mixture of the three distributions F (x) =

3−δ δ 1−δ F1 (x) + F2 (x) + F3 (x) 4 4 2

where 0 < δ < 1. Then for F , we have that M =μ 0. Other counterexamples for the same problem were offered by [Dharmadakiri and Joag-dev (1988)] and in a more convoluted way by [Abadir (2005)].

page 133

April 28, 2015

12:28

134

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

7.4.2

Distributions are not defined by moments

In quantitative finance quite often parameters are estimated by matching moments up to some order. It is wrong to think that all distributions applied in finance are uniquely determined by their moments. The widely used lognormal distribution may share the same moments with a different distribution. The following example is from [Heyde (1963)]. Proposition 7.6 (Heyde). Consider the lognormal distribution with the probability density function

(ln x)2 1 1 √ exp − f (x) = (7.6) 2 2π x for x ≥ 0 and zero otherwise. Then, for any real number a such that −1 ≤ a ≤ 1 the distribution with the probability density function ga (x) = f (x)[1 + a sin(2π ln x)]

(7.7)

has the same moments as f . Proof. It is sufficient to show that  ∞ xj f (x) sin[2π ln x]dx = 0

(7.8)

0

for any positive integer j. After the change of variable x = exp(s + j) the integral can be rewritten  ∞  1 r2 /2 ∞ − s2 1 js+j 2 −( s+j )2 2 √ e e sin[2π(s + j)]ds = √ e e 2 sin(2πs)ds 2π −∞ 2π −∞ (7.9) The last integral is obviously zero. If Z ∼ N (0, 1) and X = exp (Z) then the moments of X are given by E(X n ) = E(exp(nZ)) = . . . = e

n2 2

(7.10)

Based on this, another more elaborate example of a distribution with the same moments as a lognormal distribution is the one proposed by Durrett (1995). Proposition 7.7 (Durrett). Consider the parameters λ ∈ (0, 1), −1 ≤ a ≤ 1 and the notations

 ∞  λπ 1 = exp −|x|λ dx. , β = tan 2 cλ −∞ Then the distribution with the density     (7.11) fa,λ (x) = cλ 1 + a sin β|x|λ sgn(x) exp −|x|λ has the same moments as the lognormal distribution in (7.6).

page 134

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Probability Pitfalls of Financial Calculus

135

It may come as a surprise but there is also an entire family of discrete random variables, all having these same moments as the continuous lognormal variable. The next example is from [Leipnik (1981)]. Proposition 7.8 (Leipnik). Let a > 0. Then the discrete random variables Ya with the associated probabilities P(Ya = aek ) =

1 − k2 1 e 2 ak ca

(7.12)

for any integer k, and where ca is a normalising constant, has the same moments given in (7.10). Proof. e

2

− n2

E(Yan )

=e

2

− n2



k n −k

(ae ) a

k∈Z



k2 exp − 2



1 ca

=1

7.4.3

Conditional expectation

Given a probability space (Ω, F, P), an integrable random variable X defined on this space and G ⊂ F a sub-sigma algebra, the conditional expectation E(X|G) is a G-measurable variable satisfying the well-known identity   E(X|G)(ω)dP(ω) = X(ω)dP(ω) A

A

for any A ∈ G. The conditional expectation is a well-defined object since the Radon-Nikodym theorem implies the existence and uniqueness of the random variable E(X|G) almost everywhere. Notice that E(X|G) is a random variable defined on the induced probability space (Ω, G, P). [Platen and Heath (2006)] pointed out that the random variable X may not be a random variable on this latter probability space. Proposition 7.9. It is possible that E(E(X|G)) = E(X).

(7.13)

Proof. Consider the probability space (Ω, F, P) where Ω = [0, 1] and let X be a random variable defined on this space such that X(ω) = ω for ω ∈ [0, 1], with probability density f (x) = 2x for x ∈ [0, 1]. Let G be the sub-sigma algebra defined by the event A = {ω ∈ [0, 0.5]}.

page 135

April 28, 2015

12:28

136

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

It follows then that P(A) = 0.25 and E(X) = 23 . However,  E(X|A) = 13 , for ω ∈ [0, 0.5]; E(X|G)(ω) = E(X|Ac ) = 79 , otherwise. This means that E(E(X|G)) =

5 9

= 23 .

This result may have more profound implications in some new areas of finance where subfiltrations of financial information are used in the context of risk management. 7.5

Stochastic Processes

In finance it is less known that specifying the marginal distributions does not determine fully a stochastic process. Proposition 7.10. There exist stochastic processes that have N (0, t) as the marginal distribution, t ≥ 0, but they are not Brownian motions. √ Proof. If Z ∼ N (0, 1) then the process defined by Xt = tZ is continuous and marginally distributed N (0, t). However, since  Xt+s − Xs ∼ N (0, t − 2 s(t + s) + 2s) showing then the increments are dependent on Xs . This result shows that when doing estimation of parameters for stochastic processes used in finance the pathwise properties are very important. For a more detailed discussion of this point in relation to using bridge processes to enhance parameter estimation see chapter 12. 7.5.1

Infinite returns from finite variance processes

The next example shows that it is possible to have infinite returns for a price process described by a finite-variance steady-state process distribution. For example, a very common short rate process used in interest rate literature is the CIR process √ (7.14) drt = θ(μ − rt )dt + σ rt dz(t) The returns process is given by

   T 1 = E exp rs ds p(t, T ) t

(7.15)

page 136

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Probability Pitfalls of Financial Calculus

However, since  



T

E exp

0

rs ds

 =

Carte˙main˙WS

137

−2θμ/σ2 θ sin(aT ) + 2a cos(aT ) 2a 

2 θ μT 2 r × exp + σ2 2a cot(aT ) + θ

√ with a = 2σ 2 − θ2 , for 2σ 2 > θ2 , this expectation exists only when the following condition is satisfied T
1, it can be shown that E[Xt ] < x0 eμt Taking the particular case of μ = 0 leads to E[Xt ] < x0 , and hence the process X is not a martingale. 7.6

Spurious Testing

Quite often in finance the investigator may draw the wrong conclusion from a data analysis because of what is generally called spurious regression analysis. 7.6.1

Spurious mean reversion

Here I shall follow [Christoffersen (2012)] in emphasizing how one can draw the wrong conclusion on whether a time series is generated by a meanreverting process or not. This is very important in financial markets where traditionally there has been a lot of evidence on the mean-reversion of asset prices. To this end suppose that the time series of an asset price follows is thought to follow an AR(1) model St = φSt−1 + εt

page 138

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Probability Pitfalls of Financial Calculus

Carte˙main˙WS

139

This can be rearranged as St − St−1 = (φ − 1)St−1 + εt and when φ = 1 we have a unit root indicating that in fact the process is a random walk. Vasicek model when discretized takes this kind of form after a re-notation of parameters. The finite sample OLS estimator is biased downward, the so called Dickey-Fuller bias. This means that even if the estimate value of φ comes out as 0.9 it may be the case that the real value is 1. Empirical evidence suggests in general that there are many asset prices with φ close but not equal to 1. Notice that φ = 0.99 implies a degree of predictability for the asset prices while φ = 1 does not at all. Estimating φ by the standard OLS method and testing the null hypothesis that H0 : φ = 1 versus the alternative hypothesis that Ha : φ < 1 leadsd to the usual t-test with critical values calculated from the Gaussian distribution. Therefore, the analysts will reject the null hypothesis more frequently than she should, finding spurious evidence of predictability. This problems appears because the modelling is done in levels rather than in first differences or logarithmic returns.

7.6.2

Spurious regression

If Xt and Yt are two asset prices that are independent random walks then the time regression model Yt = α + βXt + ut where ut is the usual error term, should have β = 0. When using t-tests in practice based on the OLS estimation of β, may give β as significantly different from zero. This phenomenon is called spurious regression effect. One way to safeguard against that is to use the autocorrelation functions. When there is spurious regression the error terms ut will show a highly persistent ACF. Thus, by taking the first differences Yt − Yt−1 = β(Xt − Xt−1 ) + ut − ut−1 should correct the conclusion and show a non-significant β.

page 139

April 28, 2015

12:28

140

7.7 7.7.1

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

Dependence Measures Problems with the Pearson linear correlation coefficient

One of the cornerstones of financial modelling is capturing the dependence between the dynamics of various assets. Important steps forward have been made with the introduction of copula functions but more needs to be done in this area. In particular the introduction of time consistent copulae is a challenge. At the same time traditional measures of association such as the Pearson linear correlation coefficient are still widely used in the finance industry. In this chapter we highlight several pitfalls and less known facts about the linear correlation coefficient and about copulae. Consider two real random variables X, Y , with finite variances. These could be asset prices but more likely they will be asset price changes or asset price returns, arithmetic or logarithmic. Then the linear correlation coefficient is by definition Cov[X, Y ] ρ(X, Y ) =  2 2 σX σY

(7.18)

This is a measure of linear dependence with connections to the simple linear regression model. It is well known that ρ(X, Y ) = 0 when X and Y are uncorrelated and ρ(X, Y ) = ±1 if and only if Y = αX + β almost surely. Otherwise, the well-known representation ρ(X, Y )2 =

2

σY2 − minα,β E[(Y − (αX + β)) ] σY2

(7.19)

shows that outside linear dependence, −1 < ρ(X, Y ) < 1. Some shortcomings of linear correlation as a measure of dependence are discussed next. Proposition 7.13. The covariance, and hence the linear correlation, of the components of a bivariate Student distribution tn is not defined for n ≤ 2. This simple example suggests that other models employing infinite variance cannot rely on the correlation coefficient. Another well-known fallacy is the inverse causal implication between independence and zero correlation. Proposition 7.14. Zero correlation does not imply independence.

page 140

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Probability Pitfalls of Financial Calculus

141

Proof. If X ∼ N (0, 1) and Y = X 2 , since the Gaussian distribution is symmetric, Cov(X, Y ) = E(X 3 ) − E(X 2 )E(X) = 0 but obviously the two variables are highly dependent. Ideally, a measure of dependence should be invariant to strictly increasing transformation, that is if ψ : R → R is a strictly increasing transformation then ρ(ψ(X), ψ(Y )) = ρ(X, Y ). Proposition 7.15. Let (X, Y ) be a bivariate Gaussian vector with correlation coefficient ρ. Then, for the transformation ψ(x) = Φ(x) we have %ρ& 6 (7.20) ρ(ψ(X), ψ(Y )) = arcsin π 2 A technical proof is given in [Joag-dev (1984)]. Therefore, even for the simple and most utilised Gaussian case the correlation coefficient is not invariant to increasing transformation. 7.7.2

Pitfalls in detecting breakdown of linear correlation

Risk managers are particularly concerned with potential breakdowns of correlation between assets or sectors. The majority of financial calculations are performed in good times under some numerical assumptions on the correlation spectrum across assets, sectors and business units. However, in turbulent times, the visual inspection of groups of assets seem to suggest a new reality regarding correlation numbers. Since changing the correlation parameter inputs may impact adversely on the book of a trader, the discussion in this section following [Boyer et al. (1977)] may alarm both sides, risk managers and traders. In a nutshell, correlation breakdown may be 2 an artefact of a sample selection conditioning argument as the following result proved by [Boyer et al. (1977)] shows. Proposition 7.16. Let (X, Y ) be a bivariate Gaussian vector with variσ ances σx , σy ) and the unconditional correlation coefficient ρ = σxxy σy . Let 3 A ∈ B(R) be such that 0 < Pr(A) < 1. The conditional correlation coefficient between X and Y , given the realisation of the event X(ω) ∈ A, is 2 The ideas presented here are not proof that the correlations never break down. It only shows that proper testing of a breakdown should be performed before conclusions are reached, since it may be possible that a particular correlation between changes or returns in two assets is only different from a historical benchmark because of the occurrence of a particular event. 3 B(R) is the sigma-algebra of all Borel subsets of the set of real numbers.

page 141

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

142

Carte˙main˙WS

Model Risk in Financial Markets

given by

σx2 ρA = ρ ρ + (1 − ρ ) var(X|X(ω) ∈ A) 2

− 12 (7.21)

Proof. The proof is based on a simple trick that is used by quants and risk managers to simulate correlated pairs of asset returns. Let U and V be two standard Gaussian independent random variables. If mx and my are the known means of (X, Y ) then one can write the following representation X = mx + σ x U

 Y = my + ρσy U + 1 − ρ2 σy V  σy = my + (ρ (X − mx ) + 1 − ρ2 σy V ) σx

(7.22) (7.23) (7.24)

Let A be an event such that 0 < Pr(A) < 1. The conditional correlation coefficient between X and Y , given the occurrence of X(ω) ∈ A is by definition Cov(X, Y |X ∈ A) ρA =  (7.25) var(X|X ∈ A)var(Y |X ∈ A) Applying the fact that X and V are independent and replacing Y as represented above leads to

 σy 2 Cov(X, Y |A) = Cov(X, my + ρ (X − mx ) + 1 − ρ σy V |X ∈ A) σx σy Cov(X, Y |X ∈ A) = ρ var(X|X ∈ A) σx Similarly  σy var(Y |X ∈ A) = var(ρ X + 1 − ρ2 σy V |X ∈ A) σx = (ρ2 σy2 /σx2 )var(X|X ∈ A) + (1 − ρ2 )σy2 Combining these two concludes the proof. There are several observations that are implied by the result above. First, the same proof can be adapted for any nontrivial measurable event A, not necessarily linked to X. Hence, the same proof works for

− 12 σx2 2 2 (7.26) ρA = ρ ρ + (1 − ρ ) var(X|A) Secondly the conditioned correlation and the initial correlation have the same sign, so the conditioning will not change the direction of the correlation between X and Y . The values that are interpretable for the Pearson

page 142

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Probability Pitfalls of Financial Calculus

143

correlation coefficient, that is 0, ±1, are preserved under conditioning. In other words, ρA ≡ ρ if ρ = 0 or ρ = ±1. Thirdly, one can remark that the formula for the conditional correlation coefficient does not depend on the variance of Y itself. Last but not least, it can be implied from the above theorem that |ρA | ≶ |ρ| if var(X|X ∈ A) ≶ var(X). The graph in Fig. 7.1 illustrates the relationship between the conditional correlation coefficient ρA on one side and the marginal correlation coefficient ρ and the ratio between the marginal variance var(X) and the conditional variance var(X|X ∈ A) on the other side. One can clearly see that the conditional correlation coefficient may take very different values from the marginal correlation coefficient and conditioning introduces a nonlinear linkage between the two types of correlation coefficient.

Conditional correlation coefficient

April 28, 2015

1

0.5

0

−0.5

−1 5 4

1 3

0.5 2

0 1

ratio of variances

−0.5 0

−1

rho

Fig. 7.1: Calculating the conditional correlation coefficient as a function of the marginal correlation coefficient and the ratio between the marginal variance and the conditional variance.

[Boyer et al. (1977)] showed that the result in Proposition 7.16 still holds for the multivariate i.i.d case. First, let us consider the multivariate non i.i.d. more general case. The setup is given by the random vectors

page 143

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

144

Carte˙main˙WS

Model Risk in Financial Markets

X = (X1 , . . . , Xn ) and Y = (Y1 , . . . , Yn ) satisfying the distributional assumptions (7.27) X ∼ N (mx , Σxx ) (7.28) Y ∼ N (my , Σyy )



Σxx Σxy X mx , (7.29) ∼N my Σxy Σyy Y where Σxy = Cov(X, Y ). By convention we also assume that X, Y are stationary vectors with constant means and regular matrices Σxx , Σyy , Σxy . Since we are in a multidimensional case it is not straightforward to define the correlation between X and Y . However, following [Boyer et al. (1977)], the correspondent to the Pearson linear correlation coefficient would be the average correlation operator between X and Y , defined as tr(Σxy ) ρ=  (7.30) tr(Σxx )tr(Σyy ) with tr(c) denoting the well-known matrix trace operator. Taking advantage of the Gaussian feature of X and Y , the above constructive proof for the unidimensional Gaussian pair of variables can be replicated. Hence, if U and V are independent standard Gaussian vectors then X = mx + Σ1/2 xx U

−1/2 1/2 U + (Σyy − Σxy Σ−1 V Y = my + Σxy Σxx xx Σxy ) For conditioning purposes we restrict the sample space to all events for which (X(ω) ∈ A, Y (ω) ∈ Rn ) where the measurable event A ∈ B(Rn ) is such that 0 < Pr(A) < 1. Under the restricted probability measure induced by conditioning with the event X ∈ A, the random vectors X and Y are not multivariate Gaussian. For clarity denote by mx|A , my|A , Σxx|A and Σyy|A the corresponding first and second moments. The conditional covariance matrix is defined using the conditional expectation operator   Cov(X, Y |A) = E (X − mx|A )(Y − mY |A ) |A = Σxy|A The conditional correlation coefficient is then tr(Σxy|A ) (7.31) ρA =  tr(Σxx|A )tr(Σyy|A ) Linear algebraic considerations help to advance further the calculations Σyy|A = var(Y |A) −1  −1  = Σxy Σ−1 xx Σxx|A Σxx Σxy + Σyy − Σxy Σxx Σxy

Σxy|A = Cov(X, Y |A) −1/2  1/2 U + (Σyy − Σxy Σ−1 V |A) = Cov(X, my + Σxy Σxx xx Σxy )

= Σxy Σ−1 xx Σxx|A

page 144

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Probability Pitfalls of Financial Calculus

145

Therefore, it does not seem possible to establish an algebraic relationship between ρA and ρ in the general case. However in the i.i.d. case, further simplifications occur. For example, Σxx = σx2 In , Σyy = σy2 In , Σxy = σxy In . For the i.i.d. case ρ = Σxx|A =

2 σx|A

σxy σx σy .

Moreover,

· In

Σyy|A = ρσx σy (σx2 )−1 Σxx|A (σx2 )−1 ρσx σy + σy2 In − = σy2 (ρ2 Σxy|A =

2 σx|A

σx2

ρ2 σx2 σy2 In σx2

+ 1 − ρ2 )In

ρσy 2 σ In σx x|A

Consequently ρσy 2 σx σx|A

,

ρA = σx|A σy = , ρ2 +

ρ2

2 σx|A 2 σx

+ 1 − ρ2

ρ 2 σx (1 2 σx|A

− ρ2

showing the same relationship between the conditional average correlation coefficient and the average correlation coefficient as in the unidimensional case. 7.7.3

Copulas

Consider a random vector X = (X1 , . . . , Xn ) with the multivariate distribution F such that the individual components X1 , . . . , Xn have the continuous marginal distribution F1 , . . . , Fn . A copula of the random vector X is usually defined as the joint distribution function C of the random vector (F1 (X1 ); . . . ; Fn (Xn )) . The functional relationship is F (x1 ; . . . ; xn ) = P[F1 (X1 ) ≤ F1 (x1 ); . . . ; Fn (Xn ) ≤ Fn (xn )] = C(F1 (x1 ); . . . ; Fn (xn ))

(7.32)

This means that a copula is the distribution function of a finite real random vector with uniform (0,1) marginals. [Embrechts et al. (2002)] pointed out to an equivalent definition. Definition 7.1. A copula is any function C : [0, 1]n → [0, 1] which has the following three properties:

page 145

April 30, 2015

14:13

BC: 9524 - Model Risk in Financial Markets

146

Carte˙main˙WS

Model Risk in Financial Markets

(1) C(x1 ; . . . ; xn ) is increasing in each component xi . (2) C(1; . . . ; 1; xi ; 1; . . . ; 1) = xi for all i ∈ {1, . . . , n}, xi ∈ [0, 1]. (3) For all (a1 , . . . , an ), (b1 , . . . , bn ) ∈ [0, 1]n with ai ≤ bi we have: 2  i1 =1

···

2 

(−1)i1 +...in C(x1i1 . . . , xnin ) ≥ 0

(7.33)

in =1

where xj1 = aj and xj2 = bj for all j ∈ {1, . . . , n}. With many copulae functions available it is good to have an idea about the boundaries of an unknown copula before going for estimation or even for other risk management applications such as stress testing. A multivariate cumulative distribution function F (x1 , . . . , xn ) with univariate marginal cumulative distribution function Fi (x), i = 1, . . . , n is called comonotonic if F (x1 , . . . , xn ) =

min i∈{1,...,n}

Fi (xi ),

(x1 , . . . , xn ) ∈ Rn

It can be shown that a random vector (X1 , . . . , Xn ) is comonotonic if and only if it agrees in distribution with a random vector where all components are non-decreasing functions (or all are non-increasing functions) of the same random variable. It is not difficult to see that for any random vector (X1 , . . . , Xn ) P(X1 ≤ x1 , . . . , Xn ≤ xn ) ≤

min i∈{1,...,n}

P(Xi ≤ xi ))

so the upper bound is given for the comonotonic case. The concept of comonotonicity has a direct application to the bivariate case. If (X, Y ) is a bivariate random vector such that the expected values of X, Y and XY exist, consider (X ∗ , Y ∗ ) be a comonotonic bivariate random vector that has the same one-dimensional marginal distributions as (X, Y ). Then for covariance and mean, the upper Frechet-Hoeffding bound is given by Cov(X, Y ) ≤ Cov(X ∗ , Y ∗ ) and E[XY ] ≤ E[X ∗ Y ∗ ]. The equality is achieved if and only if (X, Y ) is comonotonic. Consider now two assets X, Y each with a lognormal distribution, that is ln(X) ∼ N (μ1 , σ12 ) and ln(Y ) ∼ N (μ2 , σ22 ). Then it is known that 1

2

1

2

E(X) = eμ1 + 2 σ1 , E(Y ) = eμ2 + 2 σ2

page 146

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Probability Pitfalls of Financial Calculus 2

2

147 2

2

var(X) = e2μ1 +σ1 (eσ1 − 1), var(Y ) = e2μ2 +σ2 (eσ2 − 1). If the vector is also comonotonic then the correlation coefficient between the two is equal to  −1 eσ1 σ2 − 1 (7.34) r FX (U ), FY−1 (U ) =  2 2 (eσ1 − 1)(eσ2 − 1) To follow an example from [Embrechts et al. (2002)] we take now μ1 = 0, σ1 = 1 and μ2 = 0, σ2 = σ. Then the correlation coefficient between the two variables that are comonotonic is given by 2

 −1 eσ − 1 (U ), FY−1 (U ) =  r FX (eσ − 1)(e − 1)

(7.35)

1.2

1

0.8

0.6

0.4

0.2

0 0

0.5

1

1.5

2 sigma

2.5

3

3.5

4

Fig. 7.2: The correlation between comonotonic lognormal variables ln(X) ∼ N (0, 1) and ln(Y ) ∼ N (0, σ).

page 147

April 28, 2015

12:28

148

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

Clearly the correlation coefficient goes to zero for very large values of σ. At the same time the comonotonicity relationship is the highest correlation possible for a given pair of univariate distributions. Moreover, the graph in Fig. 7.2 shows that a) the correlation coefficient could be the same for two different values of σ, posing questions on the estimation of the correlation vis-a-vis the values of σ, and b) the correlation is very high when σ is close to zero, hence less uncertainty may lead to higher correlation. The concept closest in meaning to independence is exchangeability. Definition 7.2. A finite set of n random variables is called exchangeable if the random variables are identically distributed, and every subset of order k < n has identical distribution for any k ∈ {2, . . . , n − 1}. Evidently a family of i.i.d. random variables is exchangeable whereas the converse is not true. For the next set of counterexamples it suffices to restrict our attention to the bidimensional case of continuous random variables as in [Nelsen (1995)]. In this case X, Y are exchangeable if and only if FX = FY and their copula function is symmetric, that is C(u, v) = C(v, u). Consider first the copula obtained from averaging the Fr´echet bounds 1 [max(x + y − 1, 0) + min(x, y)] (7.36) 2 and let U and V be random variables with the joint distribution function given by C1 . This construction allows easy proofs for the following counterexamples. C1 (x, y) =

Proposition 7.17. There exist exchangeable random variables that are not independent. Proof. The random variables U and V constructed above have the same marginal distribution and evidently C1 is symmetric. Moreover, the same example shows that U and V are uncorrelated because Cov(U, V ) = 1/4 − (1/2)2 = 0 but they are not independent. The next result is another way to prove that a random vector with Gaussian marginals may not be jointly Gaussian. Proposition 7.18. There exist bivariate distributions with normal marginals that are not bivariate normal.

page 148

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Probability Pitfalls of Financial Calculus

149

Proof. Consider X, Y to be standard normal random variables and Φ to be the standard cumulative distribution function. If N2 (x, y; ρ) denotes a standard bivariate normal distribution function with correlation coefficient ρ ∈ (−1, 1) then any copula different from C(x, y) = N2 (Φ−1 (x), Φ−1 (y); ρ) that is constructed from normal marginals will not necessarily have a joint bivariate normal distribution. It is widely known that zero correlation is not equivalent to independence. The next example offers a proof of that. Proposition 7.19. There exist uncorrelated normal random variables that are not independent. Proof. Again, consider X, Y to be standard normal random variables. In the bivariate set-up for continuous random variables the independence is equivalent to the copula function being C(x, y) = xy. However, the copula C1 given above is different even for the cases when E(XY ) = 0. [Nelsen (1995)] points out that the copula C1 concentrates the probability mass uniformly on the two diagonals of the square [0, 1] × [0, 1]. This implies that the bivariate distribution (and its associated copula) are singular. The following two counterexamples are very important to understand that marginal properties cannot be transferred directly to a multi-dimensional case. Proposition 7.20. There exist bivariate distributions without a probability density such that the marginal variables do have probability densities. Proof. It can be easily seen that

∂ 2 C1 (u,v) ∂u∂v

= 0 almost everywhere.

Now we revise a series of results that may change investors and risk managers perspectives on portfolio modelling. Proposition 7.21. There are Gaussian random variables with their sum that is not Gaussian. Proof. Consider X, Y two standard normal variables such that C1 is their joint copula function. Then it can be seen that P(X + Y = 0) = 0.5 so the sum X + Y cannot be Gaussian. This example indicates that it is necessary that X, Y are bivariate Gaussian to have the sum or any other linear combinations of the marginals also normal.

page 149

April 28, 2015

12:28

150

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

This result has an important application to finance. It says that even when all assets in a portfolio have Gaussian returns it is possible that the portfolio return may not be Gaussian. For the following examples, consider U and V be random variables uniformly distributed on [0, 1] such that V = |2U − 1|. These random variables are continuous and let C2 be their copula, that can be derived in closed-form as  max[u + 0.5(v − 1), 0], u ∈ [0, 1/2]; C2 (u, v) = min[u + 0.5(v − 1), v], u ∈ (1/2, 1]. Remark that C2 is again a singular copula. The next proposition is one of the most surprising results, particularly with finance in view, being really counterintuitive. Proposition 7.22. There exist uncorrelated random variables such that one can be derived perfectly from the other. Proof. For the two random variable U, V above one can calculate that Cov(U, V ) = 0 while we also know that V = |2U − 1|. Proposition 7.23. There exists random variables that are identically distributed and uncorrelated but not exchangeable. Proof. The random variables U and V are identically distributed and they are uncorrelated but their copula C2 is not symmetric so the two variables are not exchangeable. Proposition 7.24. There exists identically distributed random variables whose difference is not symmetric about 0. Proof. P(U − V > 0) = P(U > 1/3) = 2/3

One implication of this result is that if an investor looks at two assets with identical distributions then going one long and one short will not give an evenly balanced position. Proposition 7.25. There exist pairs of random variables each symmetric about 0 but with their sum not symmetric about 0.

page 150

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Probability Pitfalls of Financial Calculus

Carte˙main˙WS

151

Proof. Let X = 2U − 1 and Y = 2V − 1. Since U and V are uniformly distributed on [0, 1] it follows that X and Y are uniformly distributed on [−1, 1]. Moreover P(X + Y > 0) = P(U + V > 1) = P(U > 2/3) = 1/3.

A practical problem is to decide whether a multivariate copula is Gaussian or not. [Malevergne and Sornette (2003)] suggest that one should simply test for pairwise normality, that is if (X1 , X2 ), (X2 , X3 ) and (X3 , X1 ) have a Gaussian copula, then the entire triplet (X1 , X2 , X3 ) also has a Gaussian copula. However, [Loisel (2009)] found the following example indicating that pairwise testing is not sufficient to conclude that a copula is Gaussian. Proposition 7.26. There exists random vectors (Z1 , Z2 , Z3 ) such that (Z1 , Z2 ), (Z1 , Z3 ) and (Z2 , Z3 ) have a Gaussian copula but the 3dimensional copula of (Z1 , Z2 , Z3 ) is not Gaussian. Proof. Consider ρ ∈ (−1, 1) a correlation coefficient and let ε be such that 0 < ε < 1 − ρ. Let (X1 , X2 , X3 ) be a Gaussian vector with mean vector ⎛ ⎞ 1 ρ+ε ρ+ε (0, 0, 0) and covariance matrix ⎝ ρ + ε 1 ρ + ε ⎠ . Now take (Y1 , Y2 , Y3 ) ρ+ε ρ+ε 1 to be a random vector with univariate standard normal marginals, that has the 3-dimensional copula Cθ (u, v, w) = uvw[1 + θ(1 − u)(1 − v)(1 − w)] and that is independent of the vector (X1 , X2 , X3 ). Let us define the new random vector , , ρ ε (X1 , X2 , X3 ) + (Y1 , Y2 , Y3 ) (Z1 , Z2 , Z3 ) = ρ+ε ρ+ε with 0 < ε < 1 − ρ. It is clear that all three pairs (Z1 , Z2 ), (Z1 , Z3 ) and (Z2 , Z3 ) have a Gaussian copula with the correlation parameter ρ. Without loss of generality we can focus on (Z1 , Z2 ). The copula of (Y1 , Y2 ) is Cθ (u, v, 1) = uv, indicating that Y1 and Y2 are independent. Standard calculations

imply that (Z1 , Z2 ) 1ρ is a Gaussian vector with covariance matrix . The 3-dimensional ρ1 copula of (Z1 , Z2 , Z3 ) is not Gaussian because, if it were, then (Z1 , Z2 , Z3 )

page 151

April 28, 2015

12:28

152

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

would be a Gaussian vector, and using characteristic functions so would be (Y1 , Y2 , Y3 ), which has the Cθ copula that is different from any Gaussian copula for θ ∈ (−1, 1) − {0}. The copula Cθ used above is called the Eyraud-Farlie-GumbelMorgenstern copula by [Nelsen (2006)] and also the Eyraud-GumbelMorgenstern copula by [Cambanis (1977)]. 7.7.4

More general issues

Consider η(·, ·) a dependence measure mapping two real-valued random variables X and Y to a real number. The following example proposed by [Embrechts et al. (2002)] indicates that, while there may be many properties that are desirable for a dependence measure to satisfy, some of these properties may be contradictory. Proposition 7.27. There is no dependence measure η that satisfies simultaneously the conditions: (1) for any application U : R → R strictly monotonic and any real-valued random variables X, Y  η(X, Y ), U increasing; η(U (X), Y ) = (7.37) −η(X, Y ), U decreasing. (2) η(X, Y ) = 0 ⇐⇒ X, Y are independent Proof. Consider X, Y to be jointly uniformly distributed on the unit circle. Then the following representation is true (X, Y ) = (cos α, sin α),

α ∼ U (0, 2π)

It is then evident that (−X, Y ) has the same distribution as (X, Y ) and therefore η(−X, Y ) = η(X, Y ) = −η(X, Y ) and so η(X, Y ) = 0 in spite of them being evidently dependent. The same line of proof can be tracked for any spherical distribution in the real plane.

The following theorem is very important on its own and shows that even simple linear correlation coefficients cannot attain freely any value in the [−1, 1] range.

page 152

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Probability Pitfalls of Financial Calculus

153

Theorem 7.1 (H¨ offding-Fr´ echet). Consider (X, Y ) to be a random vector with marginal distribution functions FX and FY , respectively, and some unspecified dependence structure. Assuming that each marginal variance is positive and finite the following statements are true (1) The range of all possible correlations between X and Y is a closed interval [ρmin , ρmax ] and ρmin < 0 < ρmax (2) ρ = ρmin if and only if X and Y are countermonotonic; ρ = ρmax if and only if X and Y are comonotonic. (3) ρmin = −1 if and only if X and −Y are of the same type; ρmax = 1 if and only if X and Y are of the same type Proof. [H¨ offding (1940)] proved the identity  ∞ ∞ [F (x, y) − FX (x)FY (y)]dxdy Cov(X, Y ) = −∞

(7.38)

−∞

Applying the Fr´echet’s bounds (see[Fr´echet (1957)]) max(x1 + x2 − 1, 0) ≤ C(x1 , x2 ) ≤ min(x1 , x2 ) for x1 = FX (x), x2 = FY (y) we get max[FX (x) + FY (y) − 1, 0] ≤ F (x, y) ≤ min(FX (x), FY (y)) Obviously now the integrand in (7.38) is minimised pointwise if X and Y are countermonotonic and maximized if X and Y are comonotonic. Evidently ρmax ≥ 0. The equality cannot be true because it implies that min[FX (x), FY (y)] = FX (x)FY (y) for all x, y. However, that would mean at least one of the marginal distributions is degenerated so that either σ 2 (X) = 0 or σ 2 (Y ) = 0 which is a contradiction. Similarly for ρmin < 0. 7.7.5

Dependence and Levy processes

Levy processes have opened the door to increased sophistication in modelling financial markets. One of the earliest applications in finance has been the time change of Brownian motions with subordinated processes. In a nutshell the physical time t is replaced by a process {Zt }t≥0 representing the market trading clock that obviously is not equidistant. The following example points to an important observation that may prove useful for model validation and product control. Proposition 7.28. Independent processes cannot be modelled via independent Brownian motions time changes with the same subordinator.

page 153

April 28, 2015

12:28

154

7.8

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

Notes and Summary

Model risk can appear not only in the form of wrong identification but also as the wrong application of concepts from probability and statistics to finance. The examples and discussion in this chapter highlight the need to be careful when using known mathematical results in an applied context. In addition, it is better to be aware of the dangers and limitations of some of these complex concepts and try to keep an open mind about the modelling shortcuts that may lead sooner or later to shortcomings. The probability theory behind stochastic processes is a lot deeper than that behind statistics and univariate random variables. Not everything that works for a random variable may work for a stochastic process and if the stochastic process is infinite then results become even more complex. An example discussed by [Wise and Hall (1993)] indicates that sampling a strictly stationary random process at a random time may not give a random variable. Moreover they provide further important results that I cite here because of their importance to model estimation risk. Proposition 7.29. A minimum variance unbiased estimator need not exist. Proposition 7.30. A unique minimum variance unbiased estimator need not be a reasonable estimator. Dependence is one of the most important concepts not only in financial modelling but also in statistical modelling. One step forward is associated with the introduction of copulae. However, we are still a long way from finding a suitable mechanism to capture dependence in finance. The most advanced way to deal with dependence seems to be based on copula functions. [Patton (2009)] provides a very good introduction to this topic including applications in finance. A very good survey of the literature involving copula models, including estimation and goodness-of-fit tests can be found in [Patton (2012)]. A great overview of the concept and applications of comonotonicity is presented in [Dhaene et al. (2002b)] and [Dhaene et al. (2002a)]. Contrary to conventional wisdom, [Durrleman et al. (2000)] proved that the upper Fr´echet bound does not always give the more risky dependence structure. In other words it is not always true that maximal risk corresponds to the case where the random variables are comonotonic. Quite surprisingly, there is very little research done on model risk for

page 154

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Probability Pitfalls of Financial Calculus

Carte˙main˙WS

155

copula functions. This looks like a very promising area of future research. Some important lessons for model validation learnt in this chapter are (1) A portfolio of assets may have a Gaussian distribution for each asset but the joint distribution of all assets in the portfolio may not be multivariate Gaussian. (2) Marginal distributions and correlation cannot always fully determine the joint distribution. (3) Given some marginal distributions for scalar random variables it is not always true that there is a joint distribution with a given correlation coefficient. (4) There could be many probability distributions having the same set of moments. (5) Estimators with nice theoretical properties may not be unique or may not even exist. (6) Even for the simple and most utilised Gaussian case the correlation coefficient is not invariant to increasing transformation. (7) Conditioning can change the correlation dramatically . (8) There exist uncorrelated Gaussian random variables that are not independent. (9) There are Gaussian random variables with their sum not Gaussian. (10) There exist uncorrelated random variables such that one can be derived perfectly from the other. (11) There exist random variables that are identically distributed and uncorrelated but not exchangeable. (12) There exist identically distributed random variables whose difference is not symmetric about 0. (13) There exist pairs of random variables each symmetric about 0 but with their sum not symmetric about 0. (14) OLS methods and t-tests can be deceiving when modelling is done in levels.

page 155

May 2, 2013

14:6

BC: 8831 - Probability and Statistical Theory

This page intentionally left blank

PST˙ws

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Chapter 8

Model Risk in Risk Measures Calculations

8.1

Introduction

The literature on risk management is not clear and definitive on what is the best measure to use for risk calculations. There are many risk measures that have been proposed over the years, starting from simple measures such as the standard deviation or the quantile measure giving the value-at-risk (VaR) to more complex measures such as expectiles or expected shortfall. Risk management is slightly different when you compare the insurance industry and the investment banking industry. For the former, the central limit theorem is key and large homogeneous portfolios are needed in order to be able to perform risk management calculations. In this case historical performance information is essential. Moreover, the insurance risk for one unit of analysis cannot reappear in another portfolio. In other words you cannot insure twice the same risky entity. By contrast, in investment banking heterogeneity is prevalent, nonlinear positions appear often and therefore the central limit theorem cannot be applied at portfolio level. In this case, historical performance information is not essential for gauging risk for the future, and risk management calculation is performed on a position by position basis as well as in the aggregate. Furthermore, the same entity may be subject to the same risk in several portfolios at the same time. In this chapter I would like to highlight some of the pitfalls related to some of the most common risk measures and procedures encountered in the insurance and finance industries. Value at risk is a focal point naturally and my aim is to highlight the risk of measures of risk in general. Consequently, I also include some discussion on backtesting that I hope to be useful to risk management, model validation and product control departments. 157

page 157

April 28, 2015

12:28

158

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

In portfolio theory and risk management one ought to analyse sums of random variables with known marginal distributions but unknown dependency structure. For ([Kaas et al. (2002)]) that the riskiest choice is to consider that the random variables are as dependent as possible, that is the comonotonic case. Restricting our discussion without great loss of generality to the two-dimensional case, let X and Y be random variables with corresponding cdf’s FX and FY respectively, and define the new random −1 (U ) and Y c = FY−1 (U ), for some uniformly distributed variables X c = FX random variable U over the domain [0, 1]. Assume that the random variables X c and Y c have the same marginal distributions as X and Y but they are also comonotonic. Taking the real numbers (d1 , d2 ) in the support of (X c , Y c ) it follows easily that (X c − d1 )+ + (Y c − d2 )+ ≡ (X c + Y c − d1 − d2 )+ . Moreover, (X + Y − d1 − d2 )+ ≤ (X − d1 )+ + (Y − d2 )+ almost everywhere, so when the variables X and Y have finite mean it is true that E[(X + Y − d1 − d2 )+ ] ≤ E[(X c + Y c − d1 − d2 )+ ] for their stop-loss premiums. If  denotes the convex order given by lower stop-loss premiums and the same mean, it is clear that X + Y  X c + Y c . Convex smaller risks should be more attractive for risk-averse decision makers. When X and Y are i.i.d. X + Y  2X. 8.2 8.2.1

Controlling Risk in Insurance Diversification

One of the fundamental ideas in insurance is that the variation coefficient of the total portfolio risk decreases when more independent risks are added to the portfolio. This diversification principle may not always work. [Kaas et al. (2004)] provided a simple example based on the Cauchy distribution, with density f (x) =

1 , π(1 + x2 )

−∞ < x < ∞.

This distribution has the peculiar property that the sample mean has the same distribution as each element in the sample element, so the sample n has the same distribution1 as X1 . Translated into mean X n = X1 +...+X n financial modelling parlance, quite remarkably, n companies may take i.i.d. 1 For

a proof see [Feller (1971)], page 51.

page 158

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Model Risk in Risk Measures Calculations

Carte˙main˙WS

159

risks, and overall, in the aggregate (with everyone carrying an equal share of the total risk), the risk stays at the same individual level without any benefit of diversification. 8.2.2

Variance

Here we consider the well-known single-period investment problem and take variance as the measure of risk. The following two examples identified in [Dempster et al. (2011)] concern the very important question whether the fixed-mix strategy will decrease the variance. Suppose that X and Y are two financial assets with equal logarithmic return variance over the period, that is var[ln(X)] = var[ln(Y )]. A fixedmix strategy is any convex combination αX + (1 − α)Y with α ∈ (0, 1). Proposition 8.1. There exist fixed-mix strategies that have greater logarithmic return variance than the corresponding variance for each of the two component assets. In other words, it is not true that var[ln(αX + (1 − α)Y )] ≤ var[ln(X)] for any α ∈ (0, 1). Moreover, mixing assets may even increase variance overall. Proposition 8.2. There exist fixed-mix strategies that have greater logarithmic return variance than the maximum of the corresponding variances of the two component assets. This is a stronger proposition result saying that it is not always true that var[ln(αX + (1 − α)Y )] ≤ max (var[ln(X)], var[ln(Y )]) for any α ∈ (0, 1). Proof. The proof here will cover both propositions. Let X and Y be two i.i.d variables taking two possible values 1 and K > 0, with equal probabilities. Define the auxiliary function h(α) = var[ln(αX + (1 − α)Y )] where h : [0, 1] → R. By studying the function h one can show that this function has α = 0.5 as its minimum when K ∈ [K1 , K2 ], where K1 ∈ (0, 1) and K2 ∈ (1, +∞). Moreover, when K ∈ [K1 , K2 ] the function h has a local

page 159

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

160

Carte˙main˙WS

Model Risk in Financial Markets

maximum at α = 0.5.  [Dempster et al. (2007)] showed that K1 and K2 are given by 2e4 − 1 ± (2e4 − 1)2 − 1. Then selecting α0 < 0.5 such that this number is larger than the smallest local minimum of h one can construct the random variables ψ = α0 X + (1 − α0 )Y and φ = α0 Y + (1 − α0 )X. It is evident now that var[ln(ψ)] = var[ln(φ)] while at the same time 



ψ+φ > var[ln(ψ)]. var ln 2

8.3

Coherent Distortion Risk Measures

[Wang (1996)] introduced distortion risk measures starting from the concept of a distortion function g, defined as any non-decreasing function g : [0, 1] → [0, 1] such that g(0) = 0 and g(1) = 1. Then for any nonnegative random variable X the distorted expectation operator is defined by  ∞  1 −1 g(1 − FX (u))du = FX (1 − u)dg(u) (8.1) Hg (X) = 0

0

where evidently FX is the cdf of X. It is well-known that the distortion risk measures Hg have useful properties in a finance context such as positive homogeneity, translation invariance, monotonicity and additivity for comonotonic risks. If g is a concave function then Hg is subadditive, implying that such a risk measure is coherent. From a portfolio point of view, without great loss of generality, we can restrict our discussion to portfolios of two assets only. Now, for two different portfolios that have identical marginal distributions but different dependency structure, any risk measure for the sum of the value of the two assets should take a lower value for the portfolio with lower covariance. Nevertheless, this a priori view is not always true for concave distortion risk measures. First we need to introduce the ordering induced by the dependency measures such as covariance and Pearson linear correlation. Definition 8.1. The random vectors (X1 , Y1 ) and (X2 , Y2 ) with identical marginal distributions (FX , FY ) are ordered in the covariance sense if any of the following equivalent conditions holds: (1) for any nondecreasing functions f, g Cov(f (X1 ), g(Y1 )) ≤ Cov(f (X2 ), g(Y2 )) whenever the covariance is well defined.

page 160

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Model Risk in Risk Measures Calculations

Carte˙main˙WS

161

Table 8.1: Example of a bivariate random vector with correlation coefficient equal to 1/24. Y2 0 1

0 5/24 1/24

X2 1 0 6/24

2 7/24 5/24

(2) for any nonnegative real numbers x, y F(X1 ,Y1 ) (x, y) ≤ F(X2 ,Y2 ) (x, y) The order relation is denoted by (X1 , Y1 ) ≤Cov (X2 , Y2 ). [Wang and Dhaene (1998)] proved that for any concave distortion function g if it is true that (X1 , Y1 ) ≤Cov (X2 , Y2 ) then Hg (X1 + Y1 ) ≤ Hg (X2 + Y2 ). However, the order relationship induced by the covariance operator is only a partial order. A more insightful order relationship will take into consideration the Pearson linear correlation since it will involve the variances of the portfolios as well as the covariances. [Darkiewicz et al. (2003)] provided the following example of a concave distorted risk measure that does not preserve the order induced by Pearson correlation. Proposition 8.3. Let g(x) = min(4x, 1) be a concave distortion risk measure. Then there are random vectors (X1 , Y1 ) and (X2 , Y2 ) with identical marginal distributions (FX , FY ), such that Corr(X1 , Y1 ) ≤ Corr(X2 , Y2 ) but Hg (X1 + Y1 ) > Hg (X2 + Y2 ). Proof. Consider that (X2 , Y2 ) has the distribution in the Table 8.1. The random vector (X1 , Y1 ) is taken with independent components and the same marginal distribution as (X2 , Y2 ). It is straightforward to see that Cov(X2 , Y2 ) = 1/24 so we have the inequality 0 = Corr(X1 , Y1 ) < Corr(X2 , Y2 ) = 1/24 Moreover, Hg (X1 + Y1 ) = 3 and Hg (X2 + Y2 ) = 17/6 so Hg (X1 + Y1 ) > Hg (X2 + Y2 ).

page 161

April 28, 2015

12:28

162

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

[Darkiewicz et al. (2003)] proved the following more general result. Proposition 8.4. Let g be a distortion function that is piecewise continuously differentiable with g nonincreasing and such that g is different from the identical function. Then there exist univariate distributions FX g , FY g and random vectors (X1g , Y1g ) and (X2g , Y2g ) with those marginal distributions such that Corr(X1g , Y1g ) < Corr(X2g , Y2g )

(8.2)

Hg (X1g

(8.3)

+

Y1g )

>

Hg (X2g

+

Y2g )

A lengthy technical proof can be found in [Darkiewicz et al. (2003)]. This very general counterexample seems to have its source in the concavity of the distortion function. However, the same conclusion may be drawn even for non-concave distortion functions. Proposition 8.5. Let g be a non-concave distortion function that is piecewise continuously differentiable. Then there exist univariate distributions FX g , FY g and random vectors (X1g , Y1g ) and (X2g , Y2g ) with those marginal distributions such that Corr(X1g , Y1g ) < Corr(X2g , Y2g )

(8.4)

Hg (X1g

(8.5)

+

Y1g )

>

Hg (X2g

+

Y2g )

Proof. The following result based on the representation theorem proved by [Schmeidler (1986)] is essential for this proof. Theorem 8.1. If B denotes the set of bounded random variables and the operator H : B → [0, ∞) satisfies the conditions: (1) H is additive for comonotonic risks (2) H preserves first stochastic dominance (3) H(1) = 1 then there exists a distortion function h such that H(X) = Hh (X) for all X ∈ B. In addition, H(X + Y ) ≤ H(X) + H(Y ) for any X, Y ∈ B if and only if h is concave. This important result implies that for Hg there exists a bivariate random vector (X, Y ) with bounded marginals such that Hg (X + Y ) > Hg (X) + Hg (Y ). Consider now another random vector (X ∗ , Y ∗ ) such that it has the same marginals as (X, Y ) but it has a comonotonic dependency structure, that is Hg (X ∗ + Y ∗ ) = Hg (X ∗ ) + Hg (Y ∗ ) = Hg (X) + Hg (Y )

page 162

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Model Risk in Risk Measures Calculations

Carte˙main˙WS

163

Hence Hg (X + Y ) > Hg (X ∗ + Y ∗ ) and since it is known2 that var(X + Y ) < var(X ∗ + Y ∗ ) it follows that Corr(X, Y ) < Corr(X ∗ , Y ∗ ). [Darkiewicz et al. (2004)] extend the discussion to Spearman’s rank correlation coefficient and Kendall’s tau rank correlation coefficient in place of the Pearson correlation coefficient. 8.4

Value-at-Risk

8.4.1

General observations

Almost a decade after the Black Monday crash of October 1987 the RiskMetrics Group of J.P. Morgan put forward in 1996 a new concept for measuring market risk, the so-called Value-at-Risk (VaR), which has become a standard market risk metric. In simple terms VaR refers to the possible maximum losses and profits generated by a portfolio relative to a time horizon h. Slightly more precisely, the V aRα,h of that portfolio is the threshold monetary value such that a loss larger than this threshold may occur only with a probability smaller than α. In other words, the risk manager is 100(1 − α)% confident that losses larger than V aRα,h will have a very small chance to surface over the horizon h. Before the subprime-liquidity crisis standard rules on calculating market risk capital requirements relied on accurate calculation of the estimated level of portfolio risk as well as passing a validation backtesting of the VaR model’s performance. The risk based capital requirement was set as the maximum between the bank’s current assessment of the 1% VaR over the horizon of 10 trading days and the multiple of the bank’s average reported 1% VaR over the previous 60 trading days, plus an additional amount accounting for the aggregated credit risk of the bank’s portfolio. Hence, the market risk capital was calculated using the formula   59 1  10 10 ), wt ) + T Ct V aRt−i (1%, T otalM Rt = max V aRt (1%, 250 60 i=0 250 (8.6) where T Ct is the credit risk charge and wt is the weight that is determined by the backtesting results by classifying the number N of VaR breaches 2 See

[Dhaene et al. (2002a,b)].

page 163

April 28, 2015

12:28

164

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

over the previous 250 trading days into three categories given by ⎧ if N ≤ 4 ⎨ 3.0, wt = 3 + 0.2(N − 4), if 5 ≤ N ≤ 9 ⎩ 4.0, if 10 < N With this approach if more than ten VaR violations occured over a period of one year the VaR model was practically invalidated and the risk management department needed to revisit the model. As an example, for a given portfolio and h = 1 year horizon suppose that the risk manager calculates that V aR5%,1yr is equal to £2.25 million. This means that, in his/her view or according to their model calculations, there is only a 5% chance of losing £2.25 million or more at the end of the one year period. It is important to realise that this is not a law of physics but an inferential statistical conclusion implied from a model or methodology of calculations, based on various assumptions. Please also note that there could be losses larger than £2.25 million, in a relatively large proportion, that may occur before the end of the period, say halfway through the period, but if these losses are followed by large gains in the subsequent half period then, overall, at the end of the period with horizon h, the losses may still be below the forecasted V aRα,h . Another observation is that VaR is a market risk measure defined under normal market conditions so it is not designed to measure losses triggered by external risk shocks such as fraud, employee incompetence, wars, political crises, mass revolt and unrest and so on. In addition, it is welldocumented and I will also exemplify below that VaR has some theoretical and practical deficiencies. One major criticism is that it is not informative about the magnitude of losses exceeding VaR. In essence the calculation of VaR is a forecasting exercise. An important question arising then is how do we know that our VaR measure has a good performance. [Brooks and Persand (2000)] pointed out early on the pitfalls of VaR methodologies in times of crisis and suggested ways to correct the implied biases. [Jorion (1996)] was the first to argue that since the model parameters ought to be estimated from financial data, estimation error is present with any VaR parametric calculation. This is a very important area. [Beder (1995)] compared eight VaR calculation methods used by a sample of commercial institutions and over three hypothetical portfolios. Quite worryingly, she found that parallel VaR estimates could differ by as much as a factor of 14. A similar conclusion in a repeat of this type of analysis was obtained by [Berkowitz and O‘Brien (2002)]. [Pritsker (1997)] compared

page 164

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Model Risk in Risk Measures Calculations

Carte˙main˙WS

165

various VaR methodologies from the point of view of their accuracy and also computational time and showed that risk measure calculations can vary significantly across models. [Christoffersen and Goncalves (2005)] reviewed the estimation risk of financial risk measures. [Danielsson (2002)] reconsidered VaR from both internal and external (regulatory) points of view. He pointed out that statistical estimation performed using data from stable market periods will underestimate the risk that may surface in times of crisis. Based on an extensive survey across data classes and risk models, the VaR risk forecasting models are not robust and there is significant estimation risk. This manifestation of model estimation risk may lead to an increase both in idiosyncratic and in systemic risk. In practice, using the same model for VaR forecasting over a long series of successive periods of the same length h will generate a series of forecasting errors. These errors should be as close as possible to zero meaning that a VaR should not overshoot or undershoot its predictions. A too conservative VaR estimate will damage the investor through the back door because of the inflated capital reserve calculations. In general the proportion of breaches of VaR should be as close as possible to α. This is the main idea underpinning the backtesting theory and the application discussed later. Some authors such as [Dowd (2013)] view model risk essentially as the risk of error in pricing or risk-forecasting as is the case with VaR models. He also points out that risk measurement models are exposed to more potential sources of model risk than pricing models are. I believe that the truth is somewhere in the middle since more pricing models are developed in-house. A myriad of VaR models have been put forward, recent reviews are provided in [Angelidis and Degiannakis (2009)] and a vast amount of material about them can be found on the website gloria-mundi.com. For a more formal definition of VaR we denote by X the return of a portfolio with marked-to-market value Π, over the horizon h. If qα denotes the left-tail α quantile of the probability distribution of X, in other words P[X ≤ qα ] = α, then VaRα,h = −qα × Π

(8.7)

Quite often the portfolio value is normalised to one monetary value, that is Π = 1. The mathematical concept of VaR at a critical level α, e.g. α = 5% or α = 1%, is defined in conjunction with the cumulative distribution function (cdf) FX of portfolio return. Definition 8.2. For a risk represented by a random variable X the Value-

page 165

April 28, 2015

12:28

166

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

at-Risk (VaR) at a given critical level α can be defined as the upper αquantile VaRα (X) = −q α = − sup{x : P(X ≤ x) ≤ α}

(8.8)

but also as the lower α-quantile −1 (α) = inf{x : α ≤ FX (x)} VaRα (X) = qα = FX

(8.9)

The difference between the two definitions is mainly academic and it could become significant for discrete probability distributions. However, the majority of modelling in the industry is employing continuous probability distributions. In practice α = 1%, 5%, 10% are utilised but other values such as 2.5% and 0.01% have been used by auditors and regulators. For h, the most used values are one day, two weeks (10 trading days), one month (21 trading days), one quarter and one year (252 trading days). Under the assumption that the returns are i.i.d. and are logarithmic returns, the √ standard deviation of n-day returns is n× standard deviation of one-day returns. Hence, the time scale of the VaR calculations can be adjusted as √ V aRα,2−week = V aRα,1−day × 10 Selecting the appropriate horizon is more difficult than it appears at a first glance because results are not always time-scalable. The Turner review (see [FSA (2009)]) particularly highlighted the sensitivity of the horizon chosen for VaR calculations in relation to bank capital: “Measures of VaR were often estimated using relatively short periods of observation e.g.12 months. As a result they introduced significant procyclicality, with periods of low observed risk driving down measures of future prospective risk, and thus influencing capital commitment decisions which were for a time self-fulfilling. At very least much longer time periods of observations need to be used.” Lord Turner in The Turner review, a regulatory response to the global banking crisis, www.fsa.gov.uk/pubs/other.

An important remark is that methodologies relying on the central limit theorem and hence requiring large samples of data may not comply with Lord Turner’s request. There is a lot of debate about Solvency II, which is based on VaR at the 99.5% confidence level, that is α = 0.5%, over a one year horizon, versus the Swiss Solvency Test (SST) which is using instead as a benchmark calculation the expected shortfall3 ES at the 99% confidence level over the 3 This

concept is discussed in greater detail later in this Chapter.

page 166

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Risk Measures Calculations

Student(5,0,Sigma)

5

10

VaR ES

4 3 2 1 0 0

30 25

0.5

1

Sigma

1.5

15 10 5 0.5

1

Sigma

1.5

2

167

VaR ES

6 4 2

10

VaR ES

20

0 0

8

0 0

2

ES(99%)−VaR(99.5%)

Gaussian(0,Sigma)

6

Student(2,0,Sigma)

April 28, 2015

8

0.5

1

1.5

2

1

1.5

2

Sigma

Gaussian Student(5) Student(2)

6 4 2 0 0

0.5

Sigma

Fig. 8.1: Comparison of VaR and ES for Gaussian and Student’s distributions under Solvency II and the Swiss Solvency Test.

same horizon. Figure 8.1 describes the VaR at 99.5% and the ES at 99% for the Gaussian case and for the Student’s distributions with 5 and 2 degrees of freedom respectively. Several important conclusions can be drawn based on these graphs. While for small values of the standard deviation parameter σ there is no significant difference between the two measures of solvency for the Gaussian distribution, the difference increases with σ, for all three models. In addition, it is clear that different models will lead to different risk level calculations. Therefore, while at a superficial level it looks like there is not much discrepancy between Solvency II and the SST, this difference may actually be quite significant.

page 167

April 28, 2015

12:28

168

8.4.2

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

Expected shortfall and expected tail loss

Depending on the chosen quantile, lower or upper, and assuming that returns X satisfy max(−X, 0) < ∞, the expected tail loss (ETL) at α critical level is defined by ET Lα (X) = −E[X|X ≤ q α ] for the lower quantile version, and ET Lα (X) = −E[X|X ≤ qα ] for the upper quantile version. A careful analysis reveals that ET Lα (X) ≥ ET Lα (X). One major problem though with this risk measure is that it is not always sub-additive. By definition the expected shortfall (ES) is the average of the worst losses greater than the corresponding VaR. Formally  1 α −1 F (u)du (8.10) ESα (X) = − α 0 X [Acerbi and Tasche (2002)] proved that the ES depends only on the distribution of X and the level α, but it is the same irrespective of which definition for the quantile is chosen. Moreover, it was also proved that this measure is coherent. For practical purposes, it is good to remember that if X is a continuous random variable then ET L = ES for any critical level α. 8.4.3

Violations ratio

If a VaR forecast at the α% confidence level is correct, then approximately α% of observations should have exceeded the VaR, so the VaR would have cover of α%. It is well known that it is optimal to have some exceedences of the VaR threshold; having zero exceedences is not a property of an optimal VaR forecast. Looking at this risk measure from a dynamic or multi-period point of view it is not difficult to prove that any VaR forecast that has correct conditional coverage also has correct unconditional coverage. However, the converse is not true. For each period at a fixed horizon we define a Bernoulli variable It as the indicator random variable taking the value 1 when the risk value, such as return or loss, over that period Xt is less or equal to −V aRt and zero otherwise. More formally  1, if Xt ≤ −V aRt ; (8.11) It = 0, if Xt > −V aRt .

page 168

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Model Risk in Risk Measures Calculations

Carte˙main˙WS

169

Denoting by N1 the number of times It = 1 over a testing time with a total of T P periods (days, weeks) we can compare the observed number of violations N1 with the expected number of violations α × T P . Ideally, for a N1 method of calculating VaR we would like V R = α×T P to be equal to one. If it is greater than one then the risk calculation procedure underestimates risk whereas if V R is smaller than one the risk calculation method overestimates risk. VaR violations are meant to appear quite rarely. A 5% VaR should show only about 5 violations over a 100 days period, with VaR forecasted daily. In practice V R is rarely close to one. The rule of thumb in financial markets is that when V R ∈ [0.8, 1.2] the associated VaR methodology is considered fine. When V R < 0.5 or V R > 1.5, the VaR procedure is considered imprecise. Proposition 8.6. There are sequences of hit variables for an α% VaR forecast model that have the correct unconditional coverage but have the incorrect conditional coverage. Proof. For simplicity we consider α = 5%. One easy solution is the series of hits where the first 5% of observations equal one and the remaining 95% of observations equal zero. The series has the right proportion of hits (and so it will have the correct unconditional coverage) but the hits will be serially correlated and so they will not have the correct conditional coverage. Another easy example would be a series of hits which equals 1 every 20th observation, and is zero otherwise. Again, there will again be the right proportion of hits, but there will be serial correlation and one can predict perfectly when the next hit will occur in this series, so the conditional coverage will not be correct. Here we shall take an example. Consider the FTSE100 index between 4 January 2010 and 1 November 2013, with adjusted closed prices. We shall apply four models to calculate the one-day VaR forecast and then calculate the violations ratio and perform backtesting. The four models are the moving average (MA), the exponentially weighted moving average (EWMA), the historical simulation (HS) and the GARCH(1,1) model. The first 600 days are used to estimate the parameters for the models and forecast VaR at the 99% confidence level, or equivalently at the 1% critical level, for one day ahead. Subsequently, the estimation window is moved to the right and the exercise is repeated. This will give us a sample of 368 days on which to compare VaR forecasts with realized returns. We assume that the initial

page 169

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

170

Carte˙main˙WS

Model Risk in Financial Markets 0.04

FTSE returns EWMA MA HS GARCH

0.03 0.02 0.01 0 −0.01 −0.02 −0.03 −0.04 May2012

Aug2012

Oct2012

Dec2012

Mar2013

May2013

July2013

Oct2013

Dec20

Fig. 8.2: Daily FTSE100 returns and 1% Value-at-Risk calculations for the period 22 May 2012 to 1 November 2013 using the adjusted closed price series.

value of the portfolio is 1 million pounds. From Fig. 8.2 we can draw some conclusions immediately. The MA and HS methods perform very poorly, being too conservative. The GARCH(1,1) method seems to be more conservative than the EWMA. The violations ratio and the corresponding VaR volatility over the forecasting period are presented in Table 8.2. From this table it appears that there were no violations at all for the HS method. This is a clear example of a “good” method that is too conservative. On the other hand the EWMA has a VR ratio that is slightly high. When doing this analysis the data is selected by the analyst. I have selected the adjusted closed price series mainly due to convenience. How sensitive are these results to a different data selection for FTSE100 over the same period, say the open price series or low price series? For the open price series of the FTSE100 over the same period the results are almost identical for the VR ratio. However, for the low price series we get very

page 170

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Model Risk in Risk Measures Calculations

Carte˙main˙WS

171

Table 8.2: Comparative Violations Ratio Calculations for the FTSE100 for the period 22 May 2012 to 1 November 2013 using adjusted closed price series. Method EWMA MA HS GARCH(1,1)

VR estimate 1.6304 0.2714 0 0.5435

VaR volatility 0.0042 0.0011 0.0004 0.0035

Table 8.3: Comparative Violations Ratio Calculations for the FTSE100 for the period 22 May 2012 to 1 November 2013 using low price series. Method EWMA MA HS GARCH(1,1)

VR estimate 2.1739 0.2717 0 1.3587

VaR volatility 0.0032 0.0011 0.0020 0.0031

different results. The VR ratio results for the four methods is reported now in Table 8.3. The evidence that EWMA is quite bad is given by a VR ratio of 2.1739, in other words more than twice as many violations as expected. In Fig. 8.3 is is clear that when the low price series is used the series of returns and VaRs generated is much closer than when the adjusted price series is used. This is expected somehow since the lowest prices in the day are used, although there is no theoretical reason that this should be the case since the VaR models are re-estimated from scratch when the new data is used. This observation raises a very important question for risk management related to model risk when measuring risk, and that is “Which data series should be used for risk measures calculations”? 8.4.4

Correct representation

Value-at-Risk is a risk measure that from a computational statistical perspective is equivalent to a quantile calculation. Unfortunately the literature is dichotomised when it comes to which tail of the statistical distribution to use for calculating VaR. Generally speaking in this book we have used the left tail of the distribution of profits and losses, with losses occurring on the left side of the distribution and profits on the right side. However, actuaries

page 171

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

172

Carte˙main˙WS

Model Risk in Financial Markets

0.04

FTSE returns EWMA MA HS GARCH

0.03

0.02

0.01

0

−0.01

−0.02

−0.03

−0.04 May2012

Aug2012

Oct2012

Dec2012

Mar2013

May2013

July2013

Oct2013

Dec2013

Fig. 8.3: Daily FTSE100 returns and 1% Value at Risk calculations for the period 22 May 2012 to 1 November 2013 using the low price series. are concerned with losses per se and therefore they focus on losses. Thus, a loss of 5 million appears as a positive number in the distribution of losses and then VaR is a risk measure associated with the quantile in the right tail. For the latter type of calculations, α usually represents the confidence level, such as 90%, 95%, 99% and not the critical level of 10%, 5% or 1%. Then formally, see [McNeil et al. (2005)], VaR for losses associated with a position X at a given horizon h is defined as VaRα = inf{x ∈ R : P(X ≥ x) ≤ 1 − α} = inf{x ∈ R : FX (x) ≥ α} (8.12) It is very easy to switch calculations from the left tail to the right tail for symmetric distributions. However, for skewed distributions it is very important to make sure that the formulae are applied in the right context. In order to differentiate between this VaR measure calculated in the right tail of losses taken as positive quantities and the VaR measure calculated in the left tail we are going to denote VaRr for the former. Another common fallacy is that risk measures are uniquely determined

page 172

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Model Risk in Risk Measures Calculations

Carte˙main˙WS

173

when one knows the marginal distributions and the correlation of the constituent risks. This is not correct as the next example shows. Proposition 8.7. The Value-at-risk of linear portfolios is certainly not uniquely determined by the marginal distributions and the correlation of the risky components. Proof. Consider the bivariate Gaussian vector (X, Y ) with standard Gaussian marginals and correlation ρ. If Fρ denotes the underlying bivariate distribution function corresponding to the correlation coefficient ρ, then the convex combination F = aFρ1 + (1 − a)Fρ2 of any bivariate Gaussian distribution functions Fρ1 and Fρ2 , with a ∈ (0, 1), also has standard Gaussian marginals and the corresponding correlation coefficient equal to aρ1 + (1 − a)ρ2 . For given −1 < ρ < 1 and 0 < a < 1 and ρ1 < ρ < ρ2 such that ρ = aρ1 + (1 − a)ρ2 it follows that the variable X + Y has a longer tail under the mixed distribution F than under Fρ .



z z PF [X + Y > z] = aΦ − + (1 − a)Φ − (8.13) 2(1 + ρ1 ) 2(1 + ρ2 )

z (8.14) PFρ [X + Y > z] = Φ − 2(1 + ρ) Since



1 1 +O Φ(−x) = ϕ(x) x x2 it follows easily that PF [X + Y > z] =∞ lim z→∞ PFρ [X + Y > z] Using example 3.3.29 from [Embrechts et al. (1997)] one can make an even more precise calculation. When the confidence level α  100%,  (8.15) VaRrα,F (X + Y ) ∼ 2(1 + ρ2 ) −2 ln(1 − α)  r VaRα,Fρ (X + Y ) ∼ 2(1 + ρ) −2 ln(1 − α) (8.16) and therefore VaRrα,F (X + Y ) 1 + ρ2 = >1 lim r 1+ρ α 100% VaRα,Fρ (X + Y ) for any value of a. [Makarov (1981)] and [Frank et al. (1987)] proved a result that provides the best upper bound for linear portfolios.

page 173

April 28, 2015

12:28

174

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

Theorem 8.2 (MFNS). Let X, Y be two random variables with distribution functions F1 and F2 , respectively. The following statements are true: (1) ψ(z) ≡ supx+y=z Cl (F1 (x), F2 (y)) ≤ P[X + Y ≤ z] (2) Defining ψ −1 (α) ≡ inf{z|ψ(z) ≥ α} with α ∈ (0, 1) as the generalized inverse of application ψ, we have that ψ −1 (α) =

inf

Cl (u,v)=α

{F1−1 (u) + F2−1 (v)}.

(3) The following upper bound is the best possible: r (X + Y ) ≤ ψ −1 (α). V aRα

The MFNS theorem is important because it shows that the ratio between the VaR of the portfolio and the sum of VaR measures for the components of the portfolio can be made arbitrarily large. Proposition 8.8. The ratio large.

VaRrα (X+Y ) VaRrα (X)+VaRrα (Y )

can be made arbitrarily

Proof. The proof follows [Embrechts et al. (2002)]. Consider again the Pareto marginals F1 (x) = F2 (x) = 1 − x−β , wherex > 1, and β > 0. We have to determine inf u+v−1=α {F1−1 (u) + F2−1 (v)}. Because F1 = F2 , the function g : (α; 1) → R+ , g(u) = F1−1 (u) + F2−1 (α + 1 − u) is symmetrical with respect to (α+1)/2. The Pareto density is decreasing so the function g is decreasing on (α; (α+1)/2] and increasing on the remaining part of the domain it follows that g((α + 1)/2) = 2F1−1 ((α + 1)/2) is the minimum of g. Moreover, ψ −1 (α) = 2F1−1 ((α + 1)/2). Then ψ −1 (α) VaRrα (X + Y ) ≤ r r r VaRα (X) + VaRα (Y ) VaRα (X) + VaRrα (Y ) =

F1−1 ((α + 1)/2) F1−1 (α)

= 21/β The upper bound 21/β is the same irrespective of the confidence level α, and can be reached.

page 174

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Risk Measures Calculations

8.4.5

175

VaR may not be subadditive

Here we highlight an example where VaRr is superadditive at any critical level α. Proposition 8.9 (VaR superadditive). Consider the risks of two assets X and Y that are independent and identically distributed with a Pareto(1,1) distribution. Then, for any α ∈ (0, 1) VaRrα (X) + VaRrα (Y ) < VaRrα (X + Y ) Proof. For the Pareto distribution FX (x) = 1 − 1 . One can easily show that VaRrα (X) = 1−α P(X + Y ≤ u) = 1 −

(8.17) 1 x

and therefore

log(u − 1) 2 −2 , u>2 u u2

Therefore, for any α P(X + Y ≤ 2VaRrα (X)) = α −

(1 − α)2 log 2



1+α 1−α

0, β > 0. Then, the independence tests are given by: H0 (ind) : β = 1,

Ha (ind) : β = 1

and under the null hypothesis, the likelihood ratio will be asymptotically χ2 (1). The conditional coverage tests are given by: H0 (CC) : a = α Ha (CC) : a = α

and or

β=1 β = 1

page 185

April 28, 2015

12:28

186

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

and under the null hypothesis the likelihood ratio is asymptotically χ2 (2) distributed. [Roynstrand et al. (2012)] pointed out that the null hypothesis associated with (8.33) is misspecified for discrete time durations and cannot be equal to the geometric distribution given by (8.31). The problem here is that when the durations are assumed to be i.i.d. with a geometric distribution, the likelihood ratio does not converge asymptotically to any distribution and therefore for testing the null hypothesis that durations follow a geometric distribution, the analyst needs to construct an empirical distribution accounting for the bias. The solution to the mismatch discussed above is to use a discrete variant of the Weibull distribution such as the one defined by [Nakagawa and Osaki (1975)] β

f (d; δ, β) = δ (d−1) − δ d

β

(8.35)

for all positive integers d and δ ∈ (0, 1), β > 0. For this distribution δ can be interpreted as the probability of at least one non-hit observation before a hit occurs, and β drives the memory of the process such that β = 1 leads to the geometric distribution. Then the independence tests are constructed as follows. For the independence tests H0 (ind) : β = 1,

Ha (ind) : β = 1

and under the null hypothesis, the likelihood ratio will be asymptotically χ2 (1). The conditional coverage tests are given by: H0 (CC) : δ = 1 − α Ha (CC) : δ = 1 − α

and or

β=1 β = 1

and under the null hypothesis the likelihood ratio is asymptotically χ2 (2) distributed. One great disadvantage of the Weibull, discrete and continuous, models for durations presented above is that once the model is specified the hazard rate is constant in time. An ingenious method developed by [Berkowitz et al. (2011)] allows for a time varying hazard rate by modelling directly the hazard rate for durations of VaR violations. More formally for any positive integer d λ(d; a, β) = adβ−1 , f (d; a, β) = λ(d; a, β)

(8.36) d−1 ' i=1

(1 − λ(j; a, β))

(8.37)

page 186

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Model Risk in Risk Measures Calculations

Carte˙main˙WS

187

where a ∈ (0, 1) gives the probability of a hit and b ≤ 1 shows the departure from memoryless. It is reassuring now to see that under the null hypothesis the hazard function is constant and the probability density is exactly the geometric distribution. The independence and conditional coverage tests are constructed as with other duration models. For the independence tests: H0 (ind) : β = 1,

Ha (ind) : β = 1

and under the null hypothesis, the likelihood ratio is converging asymptotically to an equally weighted mixture of χ2 (0) and χ2 (1) distributions5 . The conditional coverage tests are given by: H0 (CC) : a = α and β = 1 Ha (CC) : a = α

or β = 1

and under the null hypothesis the likelihood ratio is asymptotically distributed as an equally weighted mixture of χ2 (1) and χ2 (2) distribution. 8.6

Asymptotic Risk of VaR

[Jorion (1996)] analysed the estimation error of VaR estimates obtained from a normal distribution with a zero mean parameter. Taking a step further [Chappell and Dowd (1999)] assumed that the mean is given and derived an exact confidence interval for the VaR estimate under a Gaussian model. 8.6.1

Normal VaR

The VaR under the Gaussian distribution assumption is easy to calculate. Suppose that the h-period returns X on a given portfolio follow a Gaussian distribution X ∼ N (μ, σ 2 ). Then V aRα,h = −qa σ − μ

(8.38)

where qa is the cutoff point6 at critical level α. Since the actual parameter values μ and σ for the distribution of returns are not known, the sample values m and s are used instead to give an estimated empirical value for VaR. 5 See

[Self and Liang (1987)] for a technical derivation. VaR is calculated at the 5% critical level then qa = −1.645 under the Gaussian distributional assumption. 6 If

page 187

April 28, 2015

12:28

188

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

Table 8.5: Example of a confidence interval for a Gaussian VaR using the simulation of asymptotic distribution method from [Dowd (2000b)]. n 50 100 500 1000

lower end CI 0.765 0.829 0.917 0.943

upper end CI 1.329 1.218 1.090 1.061

V aRα,h = −qa × s − m

(8.39)

Therefore, if one could find the asymptotic distribution for the joint variables (m, s) then a confidence interval could be obtained for the VaR estimate. Dowd(2000) remarked that, for a sample of size n, we know that (n − 1)

s2 ∼ χ2n−1 σ2

which can be rewritten as σ2 ∼ and also

(n − 1)s2 , χ2n−1

μ∼N

m,

(n − 1)s2 nχ2n−1



The confidence interval for the VaR estimate can be calculated by simulation from the distribution of .

s2 (n − 1)s2 − N m, −qa (n − 1) 2 (8.40) χn−1 χ2n−1 [Dowd (2000b)] showed that when m = 0.2, s = 0.2 the confidence interval for the VaR at the 5% critical level and one-day holding period obviously improves with the sample size n, as illustrated in Table 8.5. [Moraux (2011)] was able to derive analytical results for the Gaussian VaR employing the “delta method” from mathematical statistics. The idea is to rewrite the formula for VaR under the Gaussian model as V aRα,h = −qa σ − μ = h(μ, σ)

(8.41)

This formula gives the population VaR if we know in full the parameters μ and σ. However, in practice only sample values m and s are available

page 188

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Model Risk in Risk Measures Calculations

Carte˙main˙WS

189

for μ and σ respectively, from historical data. That means there will be a sampling error that needs to be taken into account when replacing the sample estimates V aRα,h = −qa × s − m = h(m, s)

(8.42)

For simplicity we shall drop the horizon h from the subscript notation of V aR in this section. Assuming that we have a sample of size n of i.i.d Gaussian returns then it is known from standard statistical theory that m ∼ N (μ,

σ2 ) n

(8.43)

s2 ∼ χ2n−1 (8.44) σ2 If μ were known exactly then these two results would be enough to calculate the confidence interval for V aR. When μ and σ are both unknown the calculation of the confidence interval for V aR needs to be done with more care, as discussed in [Moraux (2011)] whom we follow below. To start with, combining (8.42) with (8.43) gives: (n − 1)

E(V aRα ) = −E(m) − qα E(s)

(8.45)

= −μ − qα E(s)  var(V aRα ) = var(m) + qα2 var(s) σ2 + qα2 var(s) = n

(8.46) (8.47) (8.48)

and 2σ 2 n−1 2 The problem is that while s is an unbiased estimator of σ 2 , by the Jensen inequality √ √  E(s) = E( s2 ) ≤ E(s2 ) = σ 2 = σ E(s2 ) = σ 2 ,

var(s2 ) =

so s is downward biased from σ. Fortunately the bias can be calculated analytically following standard results in mathematical statistics. Thus √ 1 E( s2 ) = σ (8.49) ψ(n) √ (8.50) var( s2 ) = σ 2 (1 − ψ 2 (n))  n Γ( 2 ) 2 . The delta method will help us now calculate where ψ(n) = n−1 Γ( n−1 2 ) the standard errors of VaR estimates.

page 189

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

190

Model Risk in Financial Markets

Proposition 8.10. When the portfolio returns are i.i.d N (μ, σ 2 ) & √ % law aRα ) ∼ N (0, δ 2 ) lim n V aRα − V where δ 2 = σ 2 (1

Carte˙main˙WS

n→∞ + 12 qα2 )

(8.51)

Proof. It is well-known under Gaussian statistics theory that lim



n→∞

lim

n→∞



n(m − μ) ∼ N (0, σ 2 ) law

n(s2 − σ 2 ) ∼ N (0, 2σ 4 ) law

The delta method can be applied if we rethink formula (8.41) as √ V aRα = −μ − qa σ 2 = g(μ, σ 2 )

(8.52) 2  and therefore V aRα = g(m, s ), working on the vector of parameters θ = (μ, σ 2 ) rather than on (μ, σ) . The delta method gives directly the asymptotic distribution √ law aRα ) ∼ N (0, δ 2 ) lim n(V aRα − V n→∞ % & % & 1 2 2 2 Σ ∂g where δ 2 = ∂g ∂θ ∂θ . It is easy to calculate that δ = σ (1+ 2 qα ). The following results are implied directly from the above asymptotic derivation. Proposition 8.11. When the portfolio returns are i.i.d N (μ, σ 2 ) with μ and σ known, the standard error for the VaR estimate based on a sample of size n of returns is given by , σ 1 1 + qα2 (8.53) S.E.(V aRα ) = √ 2 n and a confidence interval for the VaR estimate at the critical level β is given by aRα )q1−β/2 < V aRα < V aRα − S.E.(V aRα )qβ/2 (8.54) V aRα − S.E.(V Proposition 8.12. When the portfolio returns are i.i.d N (μ, σ 2 ) with μ and σ unknown, the standard error for the VaR estimate based on a sample of size n of returns is given by √ 2 + q2 (8.55) S.E.(V aRα ) ≈ s2 √ α 2n and a confidence interval for the VaR estimate at the critical level β is given by V aRα − S.E.(V aRα )q1−β/2 < V aRα < V aRα − S.E.(V aRα )qβ/2 (8.56)

page 190

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Risk Measures Calculations

8.6.2

191

More general asymptotic standard errors for VaR

The definition of VaR is directly related to the definition of the quantile of the distribution of returns. Since the current practice is to report VaR as a positive number, by definition, VaR is the negative of the quantile. Thus, if all returns are non-negative (for example a long position in a call option), VaR would be negative, reflecting the fact that the holder of a call option has no downside risk. The second point is that VaR as defined above is a concept defined at the “population” level, to borrow terminology from statistics. Moreover, for computational purposes based on parametric models, VaR calculations depend on F . Nevertheless in practice, we do not know the population distribution F and we have to work with an estimate F calibrated from a sample of data X1 , . . . , Xn . The remaining part of this chapter is based entirely on [Stanescu and Tunaru (2012)] and [Tunaru (2013a)]. Stuart and Ord (1987) provide a formula for the asymptotic variance of a quantile. Adapting expression (10.29) from Stuart and Ord (1987) to our notation, we obtain an expression for the asymptotic variance of a VaR estimator: α(1 − α) α(1 − α) = (8.57) var(V aRα (F )) = nf (−V aRα (F ))2 nf (F −1 (α)2 ) where n is the sample size and f is the density function of the P&L (or returns) associated with F . Therefore, asymptotically (when the sample size n is large enough), V aRF,α follows a Gaussian distribution with mean equal to −F −1 (α), where F −1 is the generalized inverse of the cumulative distribution function F , and variance is as given above in (8.57). Hence:   α(1 − α) −1 qα ∼a N F (α), (8.58) nf (F −1 (α))2   α(1 − α) −1  V aRα (F ) ∼a N −F (α), (8.59) nf (F −1 (α))2 from which asymptotic confidence intervals for VaR can be easily derived. If c is the chosen confidence level of the asymptotic confidence interval for the selected risk measure,  Φ is the standard Gaussian cumulative distribution function, and Ψ = nf (α(1−α) −1 (α))2 it follows that: F Φ−1



1−c 2



Ψ + F −1 (α) < qα < Φ−1



1+c 2



Ψ + F −1 (α)

page 191

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

192

Carte˙main˙WS

Model Risk in Financial Markets

Φ−1



1−c 2



Ψ − F −1 (α) < V aRα (F ) < Φ−1



1+c 2



Ψ − F −1 (α)

Note that this asymptotic result can be applied also to discrete distribution functions for which f is the mass density function. 8.6.3

Exact confidence intervals for VaR

Although the asymptotic approach highlighted above is a quick starting point for producing confidence intervals for VaR estimates in closed form, it is only validly applied when the sample sizes are relatively large. Furthermore, the confidence intervals it produces may be large. An alternative measure of accuracy for VaR can be obtained using the theory of order statistics, see [David (1981)] and [Stuart and Ord (1987)]. Given an i.i.d. sample7 X1 , X2 , ..., Xn , we can order the observations from smallest to largest to obtain the ordered sample: X(1) ≤ X(2) ... ≤ X(n) X(r) is called the r-th order statistic, with r = 1, 2, . . . , n. Thus X(1) is the sample minimum and X(n) is the sample maximum. If we set r = [αn], where [a] is the largest integer smaller or equal to a, then V aRF,α can be interpreted as the negative of the r-th order statistic, with the underlying assumption that the observed sample is randomly drawn from a distribution with the cdf F . We can therefore employ a well known result from the order statistics literature, namely the distribution function of an order statistic, to derive confidence intervals for V aR. As [David and Mishriky (1968)] pointed out, the following results hold regardless of whether F is continuous or discrete:

n j n−j P (j out of the n observations do not exceed x) = [F (x)] [1 − F (x)] j (8.60) and  P (at least r out of the n observations do not exceed x) = P X(r) ≤ x n  n j n−j = FX(r) (x) = j=r j [F (x)] [1 − F (x)] (8.61) 7 The notation is slightly changed here because X is usually the profit/loss variable and not exactly the returns.

page 192

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Model Risk in Risk Measures Calculations

Carte˙main˙WS

193

 n! where nj = n(n−1)...(n−j+1) = j!(n−j!) is the binomial coefficient. The j(j−1)...1 expression in (8.61) is the cumulative distribution function of X(r) . Confidence intervals can no longer be obtained in closed form (as in the asymptotic case above), but one can go through the following computational steps: Step 1: Decide what level of confidence and what VaR estimate are considered. Suppose we would like to obtain a 90% confidence interval for V aR1% , based on an empirical sample of 1000 observations. We therefore have: n = 1000, α = 1%, which means that r = [nα] = 10; hence we are interested in the distribution of the 10-th order statistic, and more specifically in finding the pair of its 5-th and 95-th percentiles, which will then be the lower and upper bounds of the 90% confidence interval sought after Step 2: Get the percentile points p1 and p2 , such that: 1000  1000 j 1000−j FX(10) (x1 ) = = 0.05 [p1 ] [1 − p1 ] j j=10 and FX(10) (x2 ) =

1000  j=10



1000 j 1000−j = 0.95 [p2 ] [1 − p2 ] j

Note that p1 and p2 must be determined via a numerical algorithm. For example, using the bisection algorithm suggested by [Dowd (2010)], we get that p1 = 0.0054 and p2 = 0.0157. Step 3: The 90% confidence interval for VaR is given by: [F −1 (p1 ), F −1 (p2 )]. It is important to note that although the confidence interval for VaR depends on the choice of F , p1 and p2 do not depend on F ; given a numerical algorithm for finding these probabilities, they depend solely on n and α. 8.6.4

Examples

For the results8 in Tables 8.6 and 8.7, we consider a sample of FTSE 100 daily returns (from 2000 to 2011) for which we compute the 1% and 5% VaR using the Gaussian, Student’s t and Johnson SU distributions9 . We also report, for each estimate, the respective upper and lower bounds of 90% confidence intervals computed using both the asymptotic and order 8 The results presented in this section follow [Stanescu and Tunaru (2012)]. I am grateful to Dr. Silvia Stanescu for giving permission to use the results here. 9 For technical details on this distribution and how it is applied to VaR calculations please see [Stanescu and Tunaru (2012)].

page 193

April 28, 2015

12:28

194

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

statistics approaches. To capture any changes within the 11 year sample period, we repeat the analysis for two subperiods, namely 2004-2008, a low volatility subperiod, and 2008-2010, a high volatility subperiod.

Table 8.6: FTSE 100 Daily Returns: Summary statistics for the period 2000 to 2010, and for the subsamples for the periods 2004-2006 and 2008 to 2010. Sample period Mean Standard deviation Volatility Skewness Excess kurtosis Sample size (n)

2000-2011 -5.9E-05 0.0132 0.2091 -0.1127 5.5960 3029

2004-2006 0.0004 0.0067 0.1061 -0.3798 1.6486 758

2008-2010 -0.0001 0.0173 0.2735 -0.0275 5.4092 759

Table 8.6 summarizes the characteristics of the data. As expected, the mean return is very close to zero for the entire sample period considered (2000-2011), as well as for the two subperiods, 2004-2006 and 2008-2010, respectively. The overall volatility10 is approximately 21%. The 2004-2006 subperiod appears remarkably calm (volatility less than 11%), while 20082010 is, as expected, more turbulent, with a volatility higher than 27%. All three time-periods exhibit non-Gaussian features (i.e. non-zero skewness and excess kurtosis); as expected for stock index returns, the skewness coefficient takes negative values, while the excess kurtosis is positive, for all three time periods considered. Table 8.7 presents the VaR estimation results. For the entire sample period and the two subperiods mentioned above, VaR is estimated using three alternative distributional assumptions – namely, F is, in turn, the Gaussian, Student’s t, or Johnson SU cdf – and two critical significance levels: 1% and 5%. The degrees of freedom parameters of the Student t distribution were determined using maximum likelihood for each of the samples, while the four parameters of the Johnson SU distribution were fitted using Tuenter’s (2001) algorithm. Furthermore, 90% confidence intervals were also computed for each of these VaR estimates; the lower and upper bounds of the confidence intervals are given in Table 8.7, for both the asymptotic approach and the order statistics approach. While for the Normal and Johnson SU VaRs, the asymptotic and order statistics approaches produce 10 We

used 252 risk days per year to annualize the standard deviation into volatility.

page 194

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Risk Measures Calculations

195

Table 8.7: VaR estimates and 90% confidence interval bounds for FTSE 100 daily returns. Results are presented for the entire sample 2000-2011 and also for the subperiods 2004-2006 and 2008-2010. Sample period VaR alpha VaR estimate asy lower bound asy upper bound lower bound OS upper bound OS VaR estimate asy lower bound asy upper bound lower bound OS upper bound OS VaR estimate asy lower bound asy upper bound lower bound OS upper bound OS

2000-2011 2004-2006 1% 5% 1% 5% Normal 0.0308 0.0218 0.0152 0.0106 0.0293 0.0210 0.0137 0.0098 0.0323 0.0226 0.0167 0.0115 0.0295 0.0210 0.0140 0.0099 0.0325 0.0227 0.0172 0.0116 Student’s t 0.0396 0.0206 0.0170 0.0103 0.0342 0.0193 0.0143 0.0093 0.0451 0.0219 0.0197 0.0114 0.0322 0.0183 0.0149 0.0094 0.0393 0.0206 0.0204 0.0115 Johnson SU 0.0364 0.0207 0.0180 0.0109 0.0332 0.0196 0.0153 0.0098 0.0396 0.0219 0.0206 0.0120 0.0336 0.0197 0.0160 0.0099 0.0401 0.0220 0.0217 0.0123

2008-2010 1% 5% 0.0404 0.0365 0.0442 0.0374 0.0455

0.0286 0.0264 0.0308 0.0267 0.0311

0.0499 0.0372 0.0626 0.0394 0.0608

0.0259 0.0227 0.0291 0.0225 0.0284

0.0471 0.0389 0.0552 0.0412 0.0591

0.0270 0.0241 0.0300 0.0246 0.0306

confidence intervals of comparable width, the order statistics approach appears more accurate when used together with the Student t distribution for VaR estimation. 8.6.5

VaR at different significance levels

Financial regulators usually require that banks measure their VaR for specific levels of significance: for example, Basel II regulations stipulate using 1% as the level of significance for VaR calculations that subsequently form the basis for market risk capital requirements. However, for internal risk management purposes, banks may need to compute VaR for levels of confidence which are different from those imposed by the regulators. Hence, they may be interested in evaluating the precision of VaR measures at different levels of significance, simultaneously. Established backtesting methodologies based on coverage tests only consider the VaR estimates for one confidence level at a time. Thus, these backtesting approaches can deem one VaR method appropriate for one confidence level, but inappropriate for another confidence level. This section describes ways of assessing

page 195

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

196

Carte˙main˙WS

Model Risk in Financial Markets

VaR estimates at different confidence levels jointly, via confidence intervals constructed based on either asymptotic or exact results for the distribution of quantiles. Asymptotic results are also available for the joint distribution of two or more quantiles; consequently, based on these multivariate distributions, we show below how multi-dimensional confidence domains can be constructed for VaR measures computed for different confidence levels. Following [Stuart and Ord (1987)] the asymptotic joint distribution for V aRα1 ,F and V aRα2 ,F , with α1 < α2 , is given by the bivariate Gaussian distribution N2 (μ, Σ), where:   −F −1 (α1 ) μ= (8.62) −F −1 (α2 ) ⎞ ⎛ Σ=⎝

α1 (1−α1 ) α1 (1−α2 ) −1 (α1 ))2 −1 (α1 ))f (F −1 (α2 )) nf (F nf (F α2 (1−α2 ) α1 (1−α2 ) −1 (α1 ))f (F −1 (α2 )) −1 (α2 ))2 nf (F nf (F



(8.63)

We note that the covariance between the two VaR measures is non-zero; thus, even if the assumed sample is i.i.d., the resulting quantiles are not i.i.d., which further motivates the need for analysing the precision of quantiles jointly rather than separately. 8.6.6

Exact confidence intervals

Established results from the order statistics literature were used above to compute confidence intervals for VaR measures, in a univariate framework. Similar results can also be derived for the joint distribution of two or more VaR measures. Starting from a random sample, X1 , X2 , ..., Xn , and for r < s and x < y, following [David and Mishriky (1968)] we can write that: FX(r) ,X(s) (x, y) = P (X(r) ≤ x, X(s) ≤ y)

at least r out of n observations are not greater than x =P and at least s out of n observations are not greater than y ⎛

⎞ i out of n observations are not greater than x P ⎝ and j out of n observations are not greater than y, ⎠ = i=r j=max(0,s−i) but greater than x n 

=

n 

n−i 

n−i 

i=r j=max(0,s−i)

n! F (x)i [F (y) − F (x)]j [1 − F (y)]n−i−j i!j!(n − i − j)! (8.64)

page 196

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Risk Measures Calculations

8.6.6.1

197

Empirical implementation

We are interested in constructing bi-dimensional confidence regions for VaR computed for two significance levels α1 and α2 . For example, in order to construct a 90% confidence region for the 1% and 5% VaRs, the 5-th and 95-th quantiles of the joint distribution of 1% and 5% VaRs are of interest: • in the asymptotic case, this joint distribution is the bivariate Gaussian, with mean vector and covariance matrix as given in (8.62) and (8.63); • for the exact case, the joint cumulative distribution function of the two quantiles is as given in formula (8.64). In either case, arriving at the confidence region is not a trivial exercise. In the asymptotic case we are interested in the pairs (xL , yL ) and (xU , yU ) such that: N2 (xL , yL ) = 0.05,

and

N2 (xU , yU ) = 0.95

where N2 now stands for the bivariate Gaussian distribution with mean μ and variance-covariance matrix Σ. This step would imply inverting a bivariate Gaussian distribution, which is computationally cumbersome11 and hence not developed further here. In the exact case, taking r = [nα1 ], s = [nα2 ], where α1 = 1% and α2 = 5% in our example, we would first need to find the pairs (pxL , pyL ) and (pxU , pyU ) such that: FX(r) ,X(s) (xL , yL ) = 0.05

and

FX(r) ,X(s) (xU , yU ) = 0.95

where xL = F −1 (pxL ); yL = F −1 (pyL ); xU = F −1 (pxU ); yU = F −1 (pyU ). Again, the step of finding the pairs, (pxL pyL ) and (pxU pyU ), is non-trivial. Like in the univariate case, these probabilities can be recovered via a numerical technique, but this time we would operate in a bivariate rather than univariate framework and the solutions may not be unique. Assuming that we can solve this numerical step and find the two pairs of probabilities, (pxL , pyL ) and (pxU , pyU ), the limits of the confidence region, will be given by the pairs (xL , yL ) and (xU , yU ). 8.6.7

Extreme losses estimation and uncertainty

One major criticism of VaR is that it does not reflect the size of extreme losses that a portfolio may experience. The expected tail loss and the expected shortfall are risk measures that are used in practice precisely to 11 For

example, the (x, y) pairs are not necessarily unique.

page 197

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

198

Carte˙main˙WS

Model Risk in Financial Markets

circumvent this drawback. While there are differences between the two from a theoretical point of view, the two measures coincide when the cdf F is a continuous function. Hence, when parametric models are used and X is a continuous random variable with cdf F then V aRF,α = −q α = −qα

(8.65)

and ET Lα (X) = ET Lα (X) = ESα (X) = E[X|X ≤ −V aRα (X)]

(8.66)

One source of uncertainty in estimating the risk measures of extreme losses is due to parameter estimation for F . Thus, if ϑ is the vector of parameters describing F , that is F ≡ F (·; ϑ) then one has to work with  Clearly, changing the method of parameter estimation may F = F (·; ϑ). result in different values for ϑ which will lead to different values for V aRF,α and this in turn may give different estimates for the ES. Despite its desirable theoretical properties, the ES as a measure of risk has been less covered in the literature. Regarding its estimation, ordered statistics can be used to construct a point estimator of the ES from a historical data sample. [Acerbi and Tasche (2002)] proved that: [nα]

ESα = lim − n→∞

1  X(i) . [nα] i=1

(8.67)

[nα] 1 Therefore, a direct estimator of the ES is given by − [nα] i=1 X(i) . Notice that this estimator is model free, which is a great advantage. Furthermore, [Inui and Kijima (2005)] showed that, if k = [nα] and m = nα − k, then:  −X (k) , if nα is a positive integer; / (8.68) ES α = −(1 − m)X (k) − mX (k+1) , otherwise. X

+...X

where X (k) = (1) k (k) . [Dowd (2010)] presents an easily implementable approach for quantifying the uncertainty in ES estimators. He suggests a modification of the univariate order statistics approach (detailed in Section 5) for the case of the expected shortfall risk measure, and obtains the upper and lower bounds of confidence intervals for ESα by simply averaging the respective upper and lower bounds of the corresponding V aRa estimates, where 0 < a ≤ α. Nevertheless, it is difficult to derive the exact distribution of the ES suggested in (8.67) and in (8.68) and subsequently a confidence interval. Given that these two estimators are linear combinations of ordered statistics, and

page 198

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Model Risk in Risk Measures Calculations

Carte˙main˙WS

199

that the joint distribution of these ordered statistics is known, one may envisage a way of calculating a confidence internal with simulation methods. However, due to the large values of the multinomial combinatorial coefficient, this is known to be a difficult problem computationally, but we hope to report on some results in this area in the near future. 8.6.8

Backtesting expected shortfall

Backtesting the expected shortfall is not straightforward. One intuitive way is to work only with the subsample consisting of those periods when VaR is breached and calculate the ratio ESRt =

Xt ESα (t)

Then, since we know that E[Xt |Xt < −V aRα (t)] =1 ESα (t) we can say that the average ESR will be equal to one when the model forecasts the ES exactly as observed ex post on the financial market. As an example we shall calculate here the ES for EWMA and HS methods and perform empirical backtesting12 . To this end we shall continue with the numerical example described in Sec. 8.4.3 referring to VaR calculations for FTSE100 time series. In Fig. 8.4 I illustrate the ES return threshold calculation using the EWMA and the Historical Simulation approaches. Clearly the latter is too conservative and moreover, since there is no observed return lower than the return threshold for this method, the measure ESR cannot be calculated. For the EWMA method, in this particular case ESR = 1.3372, which is slightly higher than 1. Ideally, in an out of sample of 368 values, at 1%. 8.7

Notes and Summary

The selection of a particular risk measure for day to day risk management can have profound implications on the profit and loss profile of a bank. An excellent review of the methods used generally for calculating VaR, including those recommended by the Basel Committee, is presented 12 Please note that as in the case for VaR, simply comparing ESR with one is not based on a statistical test.

page 199

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

200

Carte˙main˙WS

Model Risk in Financial Markets

0.04

FTSE returns EWMA HS

0.03

0.02

0.01

0

−0.01

−0.02

−0.03

−0.04 0

50

100

150

200

250

300

350

400

Fig. 8.4: Daily FTSE100 returns and 1% Expected Shortfall calculations for the period October 2012 to November 2013 using the adjusted closed price series.

in [Gourieroux and Jasiak (2010)]. The introduction of Value-at-Risk is associated with the nascency of the quantitative risk management. The criticism of this measure, mainly on theoretical grounds, has generated a lot of research in this area and other more sophisticated measures of risk such as the expected shortfall have been introduced. The literature and empirical evidence is divided between the two measures of risk, VaR and ES. The former is much easier to define, understand and apply, while the latter has clear theoretical superiority. I am not going to go against either of these two measures. Instead, I am going to advocate using both, and working with them as a pair. There is model risk inherently built into the calculation of any risk measure. This has sometimes been called the Risk-squared or the risk of risk. Models used for risk management purposes should also be validated and periodically backtested and stress-tested. [Embrechts et al. (2013)] derived some analytical bounds that can be used for VaR calculations of portfo-

page 200

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Risk Measures Calculations

201

lios as a function of different dependence scenarios on the factors of the portfolio. [Embrechts et al. (2015)] propose a new notion of qualitative robustness for risk measures, from the point of view of the sensitivity of a risk measure to the uncertainty of dependence in risk aggregation. They show that ES is more robust than VaR based on the new concept of robustness. Furthermore they found that, for a portfolio of a large number of risks, VaR is likely to carry a larger uncertainty spread than ES, motivating the use of ES in risk aggregation. [Embrechts et al. (2014)] provide another comparative discussion about ES versus VaR, in the light of the new Basel accord to be implemented in the banking system, and they conclude that for the finite mean case ES is a better risk measure. From the same paper I like very much the following excerpt regarding ES and VaR measures: Both however remain statistical quantities, the estimation of which is marred by model risk and data scarcity. The frequency rather than severity thinking is also very much embedded in other fields of finance; think for instance of the calibrations used by rating agencies for securitization products (recall the (in)famous CDO senior equity tranches) or companies (transition and default probabilities). Backtesting models to data remains a crucial aspect throughout finance; elicitability and prequentist forecasting add new aspects to this discussion.

While there is empirical evidence that VaR can be biased there is not much work on calculating the actual bias. [Bao and Ullah (2004)] calculate the analytical second-order bias of a VaR estimator when the conditional volatility is modelled with an ARCH(1) model and the parameters are estimated by quasi-maximum likelihood estimation. The issue of bias is very interesting because it brings into focus another important facet of model risk related to market risk. [Fabozzi and Tunaru (2006)] pointed out that even if a risk measure satisfies all properties defining coherence, it is possible that the estimates of the risk measure fail some of these conditions. For example, even if VaR calculated under some model is subadditive13 it is still possible that the estimated VaR values invalidate the subadditivity inequality condition. In contrast, it is easy to see that even if the risk measure is violating the subadditivity condition theoretically, it is still possible that, due to the in sample estimation error, the estimated risk values do actually obey the subadditivity condition. [Campbell (2007)] proposed backtests based on pre-specified loss functions. In addition, he suggested also measuring several quantiles of the 13 Recall that VaR is a subadditive measure for elliptically distributed portfolios, see [McNeil et al. (2005)].

page 201

April 28, 2015

12:28

202

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

P&L distribution and he showed that tests that examine several quantiles are very useful in identifying inaccurate VaR models in the presence of systematic under reporting of risk. The precision of risk measures is influenced by their distribution and it is known from mathematical statistics that for large samples this distribution is Gaussian. However, [Dowd and Cotter (2007)] presented results revealing that when the horizon is 1-day practitioners will often not have samples long enough to apply asymptotic normality. They also showed that the loss distribution can have an impact on the reliability of risk measures and that excess kurtosis rather than skewness may reduce the precision of VaR, the ES and spectral risk measures. [Escanciano and Olmo (2010, 2011)] pointed out that the standard unconditional and independence backtesting tests for parametric VaR may give misleading conclusions because the cut-off point determining the validity of the risk management model is wrong. They showed how to identify the appropriate cut-off point by correcting the variance in the relevant test statistics. Furthermore, via simulation they provided some examples where there are significant distortions for the Kupiec uncorrected test. [Boucher et al. (2014)] analysed the risk of risk estimation using a controlled experiment whereby the data generating process was known. They discovered that the model bias can vary significantly across models and it can have the same order of magnitude as the estimated risk measure (VaR). This important finding invalidates a commonly used approach that estimates a model with a selected probability in order to improve the VaR estimate at a less extreme critical point. Furthermore, they provide evidence that the regulatory bodies’ call for using more extreme risk measures such as VaR at the 0.5% or 0.1% critical levels, may be incorrect. Model risk seems to appear more prevalent at the worst times, during crises. This point has been made by [Brooks and Persand (2002)] before the subprime crisis and more recently by [Danielsson et al. (2014)], after the subprime crisis. Their conclusions point out an interesting question. Is there any VaR model that performs well during normal times and also during market crashes? Or is it the case that risk management requires a tool for normal times and a tool to detect potential crashes? Backtesting is not only useful, it is absolutely necessary for risk management models. [Kerkhof and Melenberg (2004)] proposed a framework for backtesting VaR and ES models using the functional delta method. Although their general framework is based on asymptotic results, they prove that it works also for realistic finite sample sizes. They argue that ES

page 202

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Model Risk in Risk Measures Calculations

Carte˙main˙WS

203

may not be harder to backtest than VaR, the power of the test for ES being significantly higher than the corresponding one for VaR. [Angelidis and Degiannakis (2007)] proposed a two-stage procedure to backtest VaR models forecast and also losses beyond VaR. Their methodology aims to empower a risk manager with a tool that will select from a myriad of competing models models that are precise for both VaR and Expected Shortfall calculations. The Basel Committee, see [Basel (2013)], aims to replace VaR with ES as a risk measure for regulatory capital adequacy calculations in the trading book. Backtesting expected shortfall suffered a set-back when [Gneiting (2011)] showed that ES is not an elicitable measure. This has led Paul Embrechts to withdraw his support for ES on the basis that it cannot be backtested. However, [Acerbi and Szekely (2014)] found a methodology to allow for ES backtesting. This shows that ES is still a very good measure to use in risk management and the fact that it is not elicitable simply means that a backtesting method for ES cannot be found using a scoring function, not that it is never possible to backtest ES. Furthermore, [Emmer et al. (2014)] proved that ES is conditionally elicitable for models based on continuous distributions. In addition, they provide an approximative formula for ES formed of quantiles at various confidence levels that can be used for backtesting models employed for ES calculations. Here are some useful results vis-a-vis the risk of risk. (1) VaR may not always be subadditive. (2) The expected shortfall is always subadditive. (3) More data is needed to correctly estimate the ES than you need for VaR. (4) When the asset returns are not i.i.d. it is incorrect to determine VaR at longer horizons by scaling the 1-day forecast volatilities by the square root multiplication rule. (5) For a GARCH(1,1) model temporal aggregation leads to a gradual phasing down of the volatility fluctuations, whereas square root scaling increases volatility fluctuations. (6) There are several backtesting tests, unconditional, conditional, independence, duration and so on. All these tests are useful in practice to validate models for risk management. (7) The uncertainty associated with the estimation of risk measures can be gauged by calculating confidence intervals for these measures. (8) Order statistics can provide a relevant framework to understand model risk associated with risk measure estimates.

page 203

April 28, 2015

12:28

204

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

(9) It is possible to backtest ES but more research is needed to identify a general framework.

page 204

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Chapter 9

Parameter Estimation Risk

9.1

Introduction

Estimating parameters is fundamental to modelling in financial markets. This activity is performed daily for pricing models, risk management models, stress-testing models and so on. Parameter estimation risk alone could be the subject of a book itself. The purpose of this chapter is to raise awareness first and foremost. It is not to describe all possible situations that one may encounter in practice. By far the largest class of models used in financial markets is represented by diffusion processes and here I also include jump-diffusions. Hence, a significant part of this chapter is dedicated to problems arising from models in this class. However, I would also like to point out some more generic problems related to the principle of maximum likelihood estimation and to the bootstrapping technique as a general panacea for extracting inference. Overall this chapter is a plea against maximum likelihood. While this technique is relatively easy to implement, it carries many hidden dangers. In my opinion, maximum likelihood is representative for the 1970s and 1980s, before the great advancement of computer science. Since MCMC techniques encapsulate maximum likelihood but also offer so many other advantages, I believe that the way forward is using MCMC techniques more widely for analytics in the financial markets as these are clearly the most advanced inferential mechanisms currently available.

205

page 205

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

206

9.2 9.2.1

Carte˙main˙WS

Model Risk in Financial Markets

Problems with Estimating Diffusions A brief review

Estimating parameters of diffusion models, and other models in discrete or continuous time, based on discretely observed financial market time series data, is not a straightforward task. In Sec. 5.1.1 we have described briefly parameter estimation risk, taking estimation of diffusion parameters as the main example. There is a large body of research on general inference for diffusion processes going back to [Kutoyants (1984)] and the references therein. The majority of estimation methods proposed initially were coagulated around the maximum-likelihood principle but other improved techniques were published in the last three decades. Notable contributions in this direction include [Dacunha-Castelle and Florens-Zmirou (1986); Dohnal (1987); Florens-Zmirou (1993); Lo (1988); Ait-Sahalia (2002)]. [Pedersen (1995)] developed an approximate maximum likelihood method. [Bibby and Sorensen (1995)] applied estimation functions and developed martingale estimators applied to diffusions. Bayesian analysis was described early on by [Elerian et al. (2001b)] and [Roberts and Stramer (2001)]. Parametric methods for fixed and not necessarily small Δ have been developed by [Hansen et al. (1998)] who characterised scalar diffusions via their spectral properties. Another important seminal paper looking at the efficiency of spectral methods in the same context is [Kessler and Sorensen (1999)]. Nonparametric methods have also been proposed for estimating diffusions. One estimator based on discrete sampling observations is described in [Jiang and Knight (1997)] for general Itˆ o diffusion processes. They showed that under certain regularity conditions this estimator is pointwise consistent and follows asymptotically a Gaussian mixture distribution. [Gobet et al. (2004)] investigated the nonparametric estimation of diffusions based on discrete data when the sampling tenor is fixed and cannot be made arbitrarily smaller. They discovered that the problem of estimating both the diffusion coefficient (the volatility) and the drift in a nonparametric setting is ill-posed. It is not feasible to review here all the contributions to estimation of diffusions. Some excellent reviews are provided by [Bibby and Sørensen (1996)] and [Sørensen (2002)]. Thus, in this chapter we shall consider only parametric models and discuss some of the pitfalls in estimating the parameters underlying the models.

page 206

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Parameter Estimation Risk

Carte˙main˙WS

207

Adding jumps to a diffusion helps in producing more plausible marginal distributions that can be tuned to the empirical characteristics of financial data. However, as there is no free lunch in finance, additional problems are hiding behind jump-diffusion modelling, as will be shown in Sec. 9.3. The majority of the various methodologies gravitate around the maximum likelihood principle. Because of this predilection towards using variations or approximations of this principle, in Sec. 9.4 we highlight how this widely used principle can lead us on the wrong path of inference. When maximum likelihood estimation does not work because of theoretical implications, bootstrapping is another method of choice, more computational in nature. Perhaps the popularity of the latter stems out of its black-box character, little being known about theoretical properties of this estimator under various set-ups usually encountered in quantitative finance. In Sec. 9.5 we highlight that there could be problems even with bootstrapping. For the sake of clarity we focus here on one-dimensional diffusion processes, but the interested reader can find similar results for multidimensional diffusions which are left out here merely due to notational burden. A very general family of diffusion models is represented by the stochastic differential equation dXt = κ(θ − Xt )dt + σ(Xt , ψ)dWt

(9.1)

where the vector of parameters ϑ = (κ, θ, ψ) ∈ R×R×R. Many well known models can be recovered as particular cases of this general model, adjusting also the domain of the parameters, which is quite important. For example the geometric Brownian motion model corresponds to θ = 0 and σ(Xt , ψ) = ψXt . The mean-reverting Vasicek model can be obtained from σ(Xt , ψ) = ψ given that κ > 0. The CIR model is an extension of the √ Vasicek model on the diffusion part σ(Xt , ψ) = ψ Xt . Over the years it has been discovered empirically that the estimators of the drift parameters, κ– the speed reversion, and θ– the long run mean, may be biased and they may change behavior close to the boundaries of the parameter domain. Such evidence has been presented for example in [Ball and Torous (1996)] and [Yu and Phillips (2001)]. Even for a simple model such as the Black-Scholes model1 , when considering the estimation of the drift and diffusion functions as constant functions, [Merton (1980)] found out that it is difficult to precisely estimate the drift parameter. The bias of the estimators of the drift parameters is not only of academic interest. [Phillips and Yu (2005)] and [Tang and Chen (2009)] found that the 1 Notice

that θ = 0 in this case by model specification.

page 207

April 28, 2015

12:28

208

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

maximum likelihood estimator for κ can exhibit relative bias of more than 200%, despite the processes being observed monthly for more than 10 years. Therefore, the mis-estimation of this parameter can lead to severely biased option prices. [Wang et al. (2011)] confirm the existence of these problems for one-dimensional diffusion models but also for multivariate diffusions. 9.2.2

Parameter estimation for the Vasicek model

The so called Vasicek process proposed in [Vasicek (1977)] for interest rate modelling is described by the following SDE: dXt = κ(θ − Xt )dt + σdWt .

(9.2)

This is the most known example of a mean-reverting continuous-time specified diffusion model. This model has been intensively researched for pricing contingent claims in interest rate markets and commodity markets and it has been for many years one of the main competing models against the geometric Brownian motion model. The debate regarding mean-reversion models versus random walk models has a long history. The mean-reversion models are generalizations of the Ornstein-Uhlenbeck model proposed first by [Uhlenbeck and Ornstein (1930)]. The OU process was the next step forward away from Brownian motion. [Stroock (1993)] remarked that the authors “introduced this process in an attempt to reconcile some of the more disturbing properties of Wiener paths with physical reality”. For s < t, under the Vasicek model, it is easy to derive the conditional distribution of Xt |Xs

 2 −κ(t−s) −κ(t−s) σ −2κ(t−s) Xt |Xs ∼ N Xs e + θ(1 − e ), (1 − e ) (9.3) 2κ 2

The process has a stationary asymptotic distribution given by N (θ, σ2κ ). From the above conditional distribution one can calculate the conditional mean and conditional variance (9.4) E(Xt |Xs ) = Xs e−κ(t−s) + θ(1 − e−κ(t−s) ) 2 σ (1 − e−2κ(t−s) ) var(Xt |Xs ) = (9.5) 2κ Suppose that a sample of data X0 , X1Δ , . . . , XnΔ is available at equal frequency of tenor-Δ, splitting the time period [0, T ] into equal partitions such that T = nΔ. Since the Vasicek process is a diffusion process it is Markovian. Hence, it seems natural that maximum likelihood estimation is the method of choice

page 208

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Parameter Estimation Risk

209

for parameter estimation since its transitional density is known. Since ϑ = (κ, θ, σ) is the vector of parameters of interest, the likelihood function under the Vasicek process is √

2κ (X0 − θ) σ

L(ϑ) = ϕ

 ×

n ' i=1

 ϕ

[XiΔ − E(XiΔ |X(i−1)Δ )]  var(XiΔ |X(i−1)Δ )

 (9.6)

where ϕ(·) is the probability density function of the standard Gaussian distribution. Under this set-up [Tang and Chen (2009)] derived closed form solutions to the maximum likelihood estimators (MLE). These are  = β2 , σ 2 = κ  = −Δ log(β1 ), α

2 κβ3 (1 − β2 )

(9.7)

1

where

n β1 = β2 =

n

n n XiΔ X(i−1)Δ − i=1 XiΔ i=1 X(i−1)Δ 2  n n 2 n i=1 X(i−1)Δ − i=1 X(i−1)Δ

i=1

n

− β1 X(i−1)Δ ) n(1 − β1 )

i=1 (XiΔ

2 1  β3 = XiΔ − β1 X(i−1)Δ − β2 (1 − β1 ) n i=1

(9.8)

(9.9)

n

(9.10)

Taking advantage of knowing the transitional densities and also the moments of the Vasicek process, [Tang and Chen (2009)] showed that E(β1 ) = β1 −

3κΔ 7 1 4 Δ + + 2 + o( + ) n n n κΔ Δn2 n

(9.11)

Therefore, the bias of the MLE of β1 seems to be driven by two components, the size of Δ and the sample size n. One should note that in order to apply time-series results for the AR(1) model, which is commonly used for parameter estimation of the Vasicek model, Δ should not be neglectable. There is a subtle and important difference between the cases when Δ is small but fixed and when Δ → 0. The following results reported in [Tang and Chen (2009)] highlight the differences in interpretation.

page 209

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

210

9.2.2.1

Carte˙main˙WS

Model Risk in Financial Markets

Properties of MLE when Δ is fixed

We shall use the same notation as in [Tang and Chen (2009)] and take 5 + 2eκΔ + e2κΔ 2 

σ 2 −1  e−2κΔ 2κΔ −κΔ 2 − κΔ − 0.5e (1 − e ) − 4Δ B2 (ϑ, Δ) = − κ Δ 1 − e−2κΔ B1 (ϑ, Δ) =

e2κΔ − 1 Δ σ 2 Δ eκΔ + 1 V2 (ϑ, Δ) = 2κ eκΔ − 1

 σ4 2κΔe−2κΔ 2 κΔ −κΔ + (e − e ) 1 − V3 (ϑ, Δ) = 2(κΔ) (κΔ)2 1 − e−2κΔ

V1 (ϑ, Δ) =

The following result proved in [Tang and Chen (2009)] allows the investigation of bias and consistency. Theorem 9.1. For the Vasicek model B1 (ϑ, Δ) + O(1/n2 ) nΔ V1 (ϑ, Δ) + O(1/n2 ) nΔ  = V2 (ϑ, Δ) + O(1/n2 ) θ + O(1/n2 ), var(θ) nΔ B2 (ϑ, Δ) 2 2 + O(1/n ) σ + n V3 (ϑ, Δ) + O(1/n2 ) n

E( κ) = κ + var( κ) =  = E(θ) 02 ) = E(σ var( σ2 ) =

(9.12) (9.13) (9.14) (9.15) (9.16)

The above theorem implies that the MLE estimators for the parameters of the Vasicek model have both their biases and variances at the order of 1/n. Nevertheless, T = nΔ implies that the bias and variance of κ  and the variance of θ are effectively at the order of 1/T . This is a very important conclusion implying that estimating by maximum likelihood the long-run mean and the speed reversion, it is the length of the observation period that matters and not the frequency! Moreover, it can be also remarked that the bias and variance of σ 2 is O(1/n). The bias of θ converges to zero much faster than the bias of κ  when n → ∞. Another important observation is that, since V1 and V2 are monotonically decreasing in Δ, the variance of the MLE of Vasicek drift parameters increases as Δ gets smaller!

page 210

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Parameter Estimation Risk

9.2.2.2

Carte˙main˙WS

211

Properties of MLE when Δ is neglectable

We assume now that n → ∞ and Δ → 0 in such a way that T = nΔ → ∞ and ∃n0 > 2, T Δ1/n0 → ∞

(9.17)

This assumption is more realistic in financial markets with high frequency data. [Tang and Chen (2009)] proved the following important theorems. Theorem 9.2. For the Vasicek model: 4κ 7 1 1 4 − ] + o( + 2 ) E( κ) = κ + − [ 2 T n κT n T 1 2κ + o( ) var( κ) = T T 2 1  = θ + o( ), var(θ)  = σ + o(1/T ) E(θ) 2 2 T κ T 2σ 4 E( σ 2 ) = σ 2 + O(1/n), var( + o(1/n) σ2 ) = n

(9.18) (9.19) (9.20) (9.21)

This theorem shows that the bias of κ  is mainly given by 4/T . Moreover, 4 the leading order relative bias is κT and this quantity increases when κ decreases. This implies that when the Vasicek process is weakly meanreverting the bias of the MLE estimator increases. In addition, the main order variances of κ  and θ are in both cases of order 1/T , which dominate the order of var( σ 2 ). This technical observation has a very important meaning. The ML estimators for the drift parameters are much more variable than the ML estimator of σ 2 . Furthermore, θ is more variable but at the same time it is almost unbiased. Last but not least, σ 2 has less bias and less variability than estimators of drift parameters, a fact which has been confirmed by empirical research. σ Theorem 9.3. If ϑ is the vector ( κ, θ, 2 ) for the Vasicek model then, as n → ∞, √ √ √ law (ϑ − ϑ)diag( T , T , n) −→ N (0, Ω) 2

where Ω = diag(2κ, σκ2 , 2σ 4 ). This result shows that the MLEs of drift parameters converge asymptotically with a rate of √1T whereas the MLEs of σ 2 converge asymptotically with a rate of √1n . Recall that T = nΔ. Thus, increasing the sample size by observing data more frequently helps only with the estimation of the volatility. So when Δ is neglectable, for the drift parameters κ and θ of the Vasicek process, once again it is the length T of the observation period that matters.

page 211

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

212

9.2.3

Carte˙main˙WS

Model Risk in Financial Markets

Parameter estimation for the CIR model

The CIR process has been proposed by [Cox et al. (1985)] and it is associated with the following SDE:  (9.22) dXt = κ(θ − Xt )dt + σ Xt dWt This CIR process is defined under the constraint that 2κθ > σ 2 . The conditional distribution of XiΔ |X(i−1)Δ is slightly more complicated than 4κ in the Vasicek case. Denoting by c = σ2 (1−e −kΔ ) it is known that the density

of cXiΔ |X(i−1)Δ is non-central χ2m (λ) with degrees of freedom m = 4κθ σ 2 and −κΔ . with non-centrality given by λ = cX(i−1)Δ e Interestingly, the conditional mean for the CIR process is the same as the conditional mean of the Vasicek process in (9.4). At the same time the conditional variance for the CIR process is expected to be different from the conditional variance of the Vasicek process, due to the heteroscedasticity of the diffusion part. Fortunately this can also be calculated in closed form

e−κΔ − e−2κΔ θσ 2 (1 − e−κΔ )2 + X(i−1)Δ σ 2 (9.23) 2κ κ Although the conditional transition density is known, this is given by the non-central χ2 density function which is an infinite series involving central χ2 densities. Therefore, it is not possible to determine an exact formula for the MLE of ϑ = (κ, θ, σ 2 ) under the CIR model. While maximum-likelihood estimation is not feasible in closed-form for the CIR process, it is helpful to work with the pseudo-likelihood estimators suggested by [Nowman (1997)]. The idea is to use an approximation of the continuous time model (9.22) as proposed by [Bergstrom (1984)] var(XiΔ |X(i−1)Δ ) =

dXt = κ(θ − Xt )dt + σ

 XiΔ dWt

(9.24)

for all t ∈ [iΔ, (i + 1)Δ). This approximation process is then discretized as follows: XiΔ = e−κΔ X(i−1)Δ + θ(1 − e−κΔ ) + ui 2

(9.25) −2κΔ

) where E(ui ) = 0, E(ui uj ) = 0, i = j and E(u2i ) = σ (1−e X(i−1)Δ . 2κ In essence, Nowman’s approximation means that the process becomes now piecewise Vasicek, that is over each period between observations, the diffusion coefficient becomes constant. The pseudo-likelihood method assumes that the error ut is Gaussian distributed. Hence the pseudo log-likelihood relative to a sample

page 212

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Parameter Estimation Risk

213

of data X0 , X1Δ , . . . , XnΔ can be calculated in closed form. Letting νi = var(XiΔ |X(i−1)Δ ) direct calculations give  n  1 1 −κΔ −κΔ 2 log(νi ) + log L(ϑ) = − [XiΔ − e X(i−1)Δ − θ(1 − e )] . 2 2νi i=1 (9.26) The pseudo-MLEs for the CIR process have been derived in [Tang and Chen (2009)] as: 2 κβ3 log(β1 )   , θ + β2 , σ 2 = (9.27) κ =− Δ 1 − β2 1

where

n β1 =

i=1

n β2 = β3 = 9.2.3.1

n XiΔ 1 i=1 X(i−1)Δ − n i=1 X(i−1)Δ n 1 2 i=1 XiΔ i=1 X(i−1)Δ − n

XiΔ n

n

XiΔ  i=1 X(i−1)Δ − nβ1 n 1 − β1 ) i=1 X(i−1)Δ

(1 n

i=1 [XiΔ

− X(i−1)Δ β1 − β2 (1 − β1 )]2 nX(i−1)Δ

(9.28)

(9.29)

(9.30)

Properties of MLE when Δ is fixed

The analysis for the CIR process follows a similar outline as for the Vasicek process. However, unfortunately we need a long cumbersome notational list in order to be able to summarize the results in a similar format to the Vasicek case. The notations are provided in full in Appendix 15. The main result here is the following theorem2 . Theorem 9.4. For the CIR model with ϑθ ≥ 2, when n → ∞ B3 (ϑ, Δ) + O(1/n2 ) E( κ) = κ + nΔ V4 (ϑ, Δ) + O(1/n2 ) var( κ) = nΔ  = θ + O(1/n2 ) E(θ)  = V5 (ϑ, Δ) + O(1/n2 ) var(θ) n 2 0 2 E(σ ) = σ + B4 (ϑ, Δ) + O(1/n2 ) 02 ) = V6 (ϑ, Δ) + O(1/n2 ) var(σ n 2 See

[Tang and Chen (2009)] for a proof.

(9.31) (9.32) (9.33) (9.34) (9.35) (9.36)

page 213

April 28, 2015

12:28

214

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

In general the pseudo-ML estimators for ϑ in the case of the CIR process have similar properties to the ML estimators for the Vasicek process. One important difference is that the bias of σ 2 does not converge to zero and 2 hence, the pseudo-ML for σ is not a consistent estimator of σ 2 . The pseudo-ML estimators for the drift parameters are asymptotically unbiased and consistent. [Lo (1988)] argued that any estimation using discretization with fixed and non-neglectable Δ will lead to the type of systemic bias highlighted above and suggested making Δ → 0, i.e. neglectable, in order to achieve consistency for the estimator. 9.2.3.2

Properties of MLE when Δ is neglectable

The asymptotic properties of the pseudo-ML estimators are different under the assumption that Δ → 0. As with the Vasicek process it is assumed here that n → ∞ and Δ → 0 in such a way that T = nΔ → ∞ and ∃n0 > 2, T Δ1/n0 → ∞

(9.37)

The following two theorems are proved in [Tang and Chen (2009)]. Theorem 9.5. For a stationary CIR process such that

κθ σ2

≥1

4 2κ + o(1/T ), var( κ) = + o(1/T ) T T 2  = θ + o(1/T ), varθ = θσ + o(1/T ) E(θ) κ2 T 2 σ κΔ E( σ 2 ) = σ 2 − 2κθ + O(1/n) 2( σ2 − 1)

σ4 1 2 var( σ )= 2− + o(1/n) n ϑθ − 1 E( κ) = κ +

(9.38) (9.39) (9.40) (9.41)

The major order bias of κ  is 1/T as in the Vasicek case. The pseudo-ML estimation of the drift parameters κ and θ exhibits a larger order vari2 becomes ance than the estimation of σ 2 . As expected, the pseudo-ML σ consistent. σ Theorem 9.6. Denoting by ϑ the vector ( κ, θ, 2 ) then as n → ∞ √ √ √ (ϑ − ϑ)diag( T , T , n) → N (0, Ω) ⎞ ⎛ 2κ 2 0 σ2 θ ⎟ ⎜ where Ω = ⎝ 2 κ2 % 0 &⎠. 1 4 0 0 σ 2 − ϑα −1

page 214

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Parameter Estimation Risk

215

As opposed to the Vasicek case the pseudo-ML estimators of the drift parameters are not asymptotically uncorrelated. Furthermore, the asymptotic correlation for κ  and θ seems to depend on σ 2 , which is a surprising finding. 9.3

Problems with Estimation of Jump-Diffusion Models

[Honor´e (1998)] pointed out that it is invalid to employ maximum likelihood estimation for jump-diffusion models. His counterexamples stem from the fact that for standard Gaussian-Poisson jump-diffusion models the probability distribution of log-returns is equivalent to an infinite discrete mixture of Gaussian distributed variables. It is well-known that the mixture of distributions may imply an unbounded likelihood function making it impossible to find an MLE. 9.3.1

The Gaussian-Poisson jump-diffusion model

A general jump-diffusion model is described by the following SDE dSt = αdt + σdWt + dIt St −

(9.42)

where α is the drift parameter, σ is the volatility parameter of the diffusion part, {Wt }t≥0 is a Wiener process and {It }t≥0 is the jump process, with t− the nearest point before t. For reasons that have to do more with convenience rather than anything else, the jump process of choice in academia and the quantitative finance industry has been the Poisson process. Hence dIt = Jt dNt

(9.43)

where {Nt }t≥0 is a Poisson process with a constant arrival rate λ, and {Jt }t≥0 corresponds to the jump amplitude process, where Jt > −1 for any t ≥ 0 in order to ensure non-negative stock prices. The Poisson process is particularly convenient since it is known that P(dNt > 1) = O((dt)2 ). Assuming that all three processes involved– S, N and J – are mutually independent and applying Itˆo formula for semimartingales, the solution of the SDE in (9.42) is given by ⎧ ⎨

⎫ Nt ⎬  1 2 log(1 + Jtj ) St = S0 exp (α − σ )t + σWt + ⎩ ⎭ 2 j=1

(9.44)

page 215

April 28, 2015

12:28

216

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

where tj is the time of the j-th jump. If Nt ≡ 0 then, by convention, the sum for the jump part would be equal to zero. In general, the attention in financial modelling using jump-diffusion processes is on the fact that such models lead to an incomplete market setup and therefore some external mechanism or principle must be invoked in order to select the pricing measure. This problem is created mainly by the jump amplitude process. The other less known problem is the estimation of the jump-diffusion model. The “common” route would be to impose some parametric distributional assumption on the jump amplitude process Jt and then proceed with maximum likelihood estimation for the entire jump-diffusion process. 9.3.2

ML Estimation under the Merton Model

Under Merton’s jump-diffusion model the jump amplitude is lognormal distributed such that log(1 + Jt ) ∼ N (μ, σJ2 ). Consequently the solution in (9.44) can be written in additive format as follows

Nt  1 log(1 + Jtj ) log(St ) = log(S0 ) + α − σ 2 t + σWt + 2 j=1

(9.45)

From an econometric point of view the vector of parameters that need to be estimated is ϑ = (α, σ, λ, μ, σY ). For this purpose we consider that a historical series of data points for our stock price process S is readily observable at discrete time points ti = iΔ, where i = 0, 1, . . . , n, with Δ the sampling frequency and where t ≡ t0 and tn ≡ T . For simplicity, % S let us & t denote the log-return over the discrete time periods by Yi+1 = log Si+1 . ti The probability density function of one such log-return over a Δ period is given by the formula

∞  (λΔ)j e−λΔ 1 2 2 2 (9.46) ϕ Y ; (α − σ )Δ + jμ, σ Δ + jσJ p(Y ; ϑ) = 2 j! j=0 which is the result of a standard argument conditional on the number of jumps occurring. Thus, the density of the log-return under Merton’s model is given by an infinite mixture of Gaussian distributions in (9.46). As pointed out by [Beckers (1981a)] and [Honor´e (1998)] this infinite mixture representation makes estimation by maximum likelihood unfeasible. An alternative approach enabling estimation by maximum likelihood has been developed in finance by [Ball and Torous (1983, 1985)]. Essentially

page 216

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Parameter Estimation Risk

equation (9.45) is discretized into

1 log(Sti ) = log(Sti−1 ) + α − σ 2 Δ + σi + log(1 + Ji bi ) 2

Carte˙main˙WS

217

(9.47)

where i ∼ N (0, Δ), Ji ≡ Jti , bi ∼ Bernoulli(1, λΔ) with λΔ < 1. One should also note that P(N(i+1)Δ − NiΔ > 1) ∼ = 0 is equivalent to 1 + λΔ ∼ = 0. eλΔ which can be true only when λΔ ∼ = The approximation of the Poisson process increment with a Bernoulli variable makes sense in this case and it simplifies the framework. For any period of length Δ the probability density function of log-returns Ri is approximated by   (9.48) p(Y ; ϑ) = λϕ Y ; ξ + μ, σ 2 Δ + σJ2 + (1 − λΔ)ϕ Y ; ξ, σ 2 Δ wher ξ = α − 12 σ 2 Δ. Then the approximated log-likelihood function can be be written-up in the usual way log L(ϑ|Y1 , . . . , Yn ) =

n 

log p(Yi ; ϑ).

(9.49)

i=1 1 The domain of ϑ = (α, σ 2 , λ, μ, σJ2 ) is R × R+ × (0, Δ ) × R × R+ . The MLE of ϑ should be obtained by maximizing (9.49). However, [Kiefer (1978)] proves that a unique optimum solution θM LE for this problem does not exist. This estimation problem is related to the fact that the distribution of a Δ period log-return is a mixture of two Gaussian distributions with unequal and unknown means and variances, and also an unknown mixture parameter. Hence, it is impossible to precisely say to which of the two Gaussian distributions each observation belongs3 . This important model estimation problem has been overlooked in the empirical finance literature on jump-diffusion modelling. Some authors, see [Ball and Torous (1983, 1985)], [Beckers (1981a)] or [Jorion (1988)], obtained negative variance estimates or other estimates situated outside the feasible parameter region. A possible solution to this problem is to restrict the variance parameters σ and σY to belong to a compact interval [s∗ , s∗ ]. Then, the restricted MLE are consistent and asymptotically Gaussian distributed, as proved by [Kiefer (1978)]. Needless to say, one should also explain how to select s∗ and s∗ . 3 That would be possible only when the variances of the two Gaussian distributions are either known or equal.

page 217

April 28, 2015

12:28

218

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

9.3.3

Inexistence of an unbiased estimator

In general, practitioners are happy to use an estimator based on the understanding that it is unbiased. However, such an estimator does not always exist! The next example is related to the Poisson distribution and to the question how do we estimate the parameter of the Poisson distribution. The example is presented in [Wise and Hall (1993)]. Proposition 9.1. An unbiased estimator may not exist. Proof. Consider a random variable X that has a Poisson distribution P ois(λ). This can be interpreted as the number of market crashes or the jumps in an asset price. Let φ(X) be a statistic that is an unbiased estimator for the value 1/λ. Hence ∞  1 λj e−λ = φ(j) E(φ(X)) = j! λ j=0 which can be rewritten as ∞ 

φ(j)

j=0

eλ λj = j! λ

x

However, the function f (x) = ex is not expandable in a Maclaurin series. Therefore, the last equality is impossible. 9.4

A Critique of Maximum Likelihood Estimation

The inferential framework used to extract results from data is often overlooked. We all know the quote from the great statistician George E. Box “All models are wrong, but some are useful.”

He went on clarifying what he meant, “Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful?”

Unfortunately in social sciences in general and also in finance in particular, even today the focus is on hypothesis testing as a device to extract valid statements from observed data. Parameter estimation is carried out before hypothesis testing. Maximum likelihood estimation has been established as one of the most “rigorous” estimation methods and in finance this method is widely used.

page 218

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Parameter Estimation Risk

219

So in a nutshell, data is collected and/or observed, a model is formulated, most of the time a parametric model, and then the two are put together and parameters are estimated. Hypothesis testing can be carried out only after parameter estimation and its conclusions depend intrinsically on the reliability of the parameter estimation procedure. People involved with analytics in financial markets take comfort from using the maximum likelihood estimation (MLE) principle. The idea is that the parameters are selected from a feasible domain such that the probability of observing the sample data occurring under the data generating process is maximised. However, the following result presented in [Wise and Hall (1993)] points out that MLE may not be that problem free. Proposition 9.2. A maximum likelihood estimator may not exist. Proof. Let {X1 , . . . , Xn } be a sample of mutually independent and identically distributed random variables with the probability density

x−μ 1−ε φ + εφ(x − μ) fθ (x) = σ σ where φ is the probability density of an N (0, 1) Gaussian random variable and ε ∈ (0, 1) is a fixed real number. The parameters are ϑ = (μ, σ 2 ) ∈ R × (0, ∞) = Θ. Then, the likelihood function is

 n ' xi − μ 1−ε φ L(ϑ; x1 , . . . , xn ) = + εφ(xi − μ) σ σ i=1 One can prove the following inequality

n x1 − μ ' 1−ε φ L(ϑ; x1 , . . . , xn ) > [εφ(xi − μ)] σ σ i=2 Taking σ → 0 and fixing μ = x1 , the limit of the lower bound is equal to infinity. Thus, the MLE of ϑ does not exist because the supremum of the likelihood function over all ϑ ∈ Θ is not finite. The following results related to maximum likelihood estimation are discussed in [Wise and Hall (1993)] and are reproduced here for convenience. Proposition 9.3. A maximum likelihood estimator is not always unique and there could be an infinity of values maximizing the likelihood function. Proposition 9.4. A maximum likelihood estimator may not be consistent.

page 219

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

220

Carte˙main˙WS

Model Risk in Financial Markets

Proposition 9.5. A maximum likelihood estimator may not be unbiased or even admissible. Proposition 9.6. A maximum likelihood estimator of a real valued parameter may be with probability one strictly less than the actual value of the parameter. The following example shows that maximum likelihood is not always a reliable estimation method and that the final hypothesis testing conclusions may change dramatically depending on the model assumed. The example is based on the experiment suggested by [Lindley and Philips (1976)] regarding 12 independent tosses of a coin that produce 9 heads and 3 tails. One can think of the experiment as being equivalent to taking a decision on a loan application, with a head being acceptance and a tail being rejection. Suppose that the true probability of getting a head in one toss of the coin is θ, and that the null hypothesis of interest here is H0 : θ = 0.5 versus the alternative Ha : θ > 0.5. Let us consider the inferential mechanism under two different models. MLE under the binomial model If X is the random variable denoting the number of heads in n tosses, then in this case n = 12 is given. So X ∼ Binomial(n = 12, θ). The likelihood function under the binomial model is



n x 12 9 θ (1 − θ)n−x = θ (1 − θ)3 (9.50) L1 (θ) = x 9 MLE under the negative binomial model Now we consider that the experiment was carried out tossing the coin until the third tail toss occurred. In this case, X is the number of heads required to complete the experiment. In this example X = 9 for a target of r = 3 tails. The distribution that is associated with X is the negative binomial distribution X ∼ N egBinomial(r = 3; θ) which gives the likelihood function



r+x−1 x 11 9 θ (1 − θ)r = θ (1 − θ)3 (9.51) L2 (θ) = x 9 One way we can compare the two models is to calculate the p-values corresponding to the rejection region of the null hypothesis.

page 220

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Parameter Estimation Risk

221

Under the binomial likelihood one obtains 12  12 j p1 = Pr(X ≥ 9; θ = 0.5) = θ (1 − θ)12−j = 0.075. j j=9 Under the negative binomial likelihood, however, ∞

 2+j j θ (1 − θ)j = 0.0325 p2 = Pr(X ≥ 9; θ = 0.5) = j j=9 Hence, at the 95% level of confidence, that is at the 5% critical level, under the first model we fail to reject the null hypothesis while under the second model we reject the null hypothesis. This example also epitomises model identification risk. Furthermore, the example is even more surprising since both models have exactly the same likelihood kernels. The likelihood for the binomial and negative binomial models differ only by a constant that does not depend on parameter θ. As emphasized by [Carlin and Louis (1996)], this example is in stark contradiction with the Likelihood Principle which requires that upon observation of the value x of random variable X, the likelihood function L(θ|x) encapsulates all relevant data information about the parameter θ. In the example above L1 (θ) and L2 (θ) differ only by a proportionality constant so they will give the same MLE solution θM LE . However, the null hypothesis is rejected under one model and not the other. A solution to the above problem, also favored by the author of this book, is applying a Bayesian inferential point of view. 9.5

Bootstrapping Can Be Unreliable Too

The bootstrapping technique4 has been advocated as useful in a wide range of situations where it is difficult to construct reliable estimators. Nevertheless, there are many examples in the literature showing that the bootstrap estimator does not consistently estimate the true distribution of a statistic correctly to the first order. [Bickel and Freedman (1981)] provided, as far as I know, the first counterexample to the bootstrapping. They took the U -statistic of degree two in which the kernel κ(x, x) does not satisfy the condition  κ2 (x, x)dF (x) < ∞ 4 See

[Efron (1979)] and [Efron and Tibshirani (1994)] for an introduction to this subject.

page 221

April 28, 2015

12:28

222

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

where F is the true, or population, cumulative distribution function of the data. Another example constructed by the same authors is the largest order statistic from an i.i.d. sample of uniform (0, θ) random variables. Since then, many other counterexamples to bootstrapping have been suggested: degenerate U and V statistics described in [Bretagnolle (1983)], extrema for unbounded distributions (see [Athreya and Fukuchi (1994)] and [Deheuvels et al. (1993)]), and the sample mean for infinite variance random variables as discussed by [Babu (1984)] and [Athreya (1987)]. See also the examples related to the non-differentiable functions of the empirical distribution function ([Beran and Srivastava (1985); D¨ umbgen (1993)]) and the nonparametric kernel estimator of the mode of a smooth unimodal density, when the smoothing parameter for the estimator and the bootstrap is chosen to be optimal for the estimation problem, as described in [Romano (1988)]. The majority of these counterexamples are important in their message but their construction is often cumbersome and somehow unrealistic for quantitative finance. Nevertheless, in the i.i.d. framework, there are three conditions for the bootstrapped distribution of a statistic to be consistent (see [Bickel and Freedman (1981)] for details): (1) weak convergence of the statistic when Xi ∼ G for all distributions G in a neighbourhood of the true distribution F , (2) uniform weak convergence over distributions G in a neighbourhood of the true distribution F , (3) continuity of the mapping from the underlying distribution G to the asymptotic distribution of the statistic. [Andrews (2000)] proposed a counterexample that violates the third condition and that is relevant to applications. The idea is to have a parameter on the boundary of the parameter space. He manages to build an example where the bootstrap of the maximum likelihood estimator of the mean of a sample of i.i.d. Gaussian random variables with mean μ and variance 1 is not asymptotically correct to the first order when the mean is restricted to be nonnegative. Regarding the MLE estimator, for the example set-up just described, this is equal to the maximum of the sample mean and zero. This implies that when the true mean is zero the bootstrap is not asymptotically correct to the first order. More formally, [Andrews (2000)] proves the following interesting example. Proposition 9.7. Let {Xi : i ≥ 1} be a sequence of independent identi-

page 222

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Parameter Estimation Risk

Carte˙main˙WS

223

cally distributed N (μ, 1) random variables, where μ ∈ R+ . The maximum likelihood estimator of μ in this case is μ n = max[X n , 0] where as usual we n let X n = n1 i=1 Xi . Then, if Z ∼ N (0, 1), the following convergence in law is true when n → ∞  √ Z, if μ > 0 law n ( μn − μ) −→ (9.52) max(Z, 0), if μ = 0 Proof. Consider {Xi∗ : i ≤ n} to be i.i.d. with Xi∗ ∼ Fn where n 1  Fn (x) = 1{Xi ≤x} . n i=1 The bootstrap maximum likelihood estimator μ ∗n is defined by ∗

μ ∗n = max(X n , 0). √ Suppose that μ = 0. Let Ac = {lim inf n→∞ n X n < −c} for 0 < c < ∞. Apply the law of the iterated logarithm P(Ac ) = 1. For any ω ∈ Ac consider a subsequence {nk }k≥1 of positive integer numbers such that √ nk X nk (ω) ≤ −c for all k. A triangular array central limit theorem gives √  √ ∗ nk [ μ∗nk − μ nk (ω)] ≤ max nk (X nk − X nk (ω)) − c, 0 → max(Z − c, 0) as k → ∞ conditional on {Fn }n≥1 ≤ max(Z, 0) where the last inequality has a strict with positive probability. This law √  ∗ means that for the subsequence {nk }k≥1 we have nk μ nk − μ nk (ω) −→ max(Z, 0) as k → ∞, conditional on {Fnk }k≥1 . Therefore √ law n ( μ∗n − μ n (ω)) −→ max(Z, 0) as n → ∞, conditional on {Fn }n≥1 and this is true for all ω ∈ Ac . The bootstrapping estimator is not consistent when μ = 0 for sample √ paths ω ∈ Bc = {lim supn→∞ nX n > c} for any 0 < c < ∞ and sample √ sizes {nm }m≥1 for which nm X nm (ω) ≥ c for all m. Similarly to the point above √   ∗ √ ∗ nm − μ nm μ nm (ω) ≤ max nm (X nm − X nm (ω)), −c → max(Z − c, 0) as m → ∞ conditional on {Fn }n≥1 ≤ max(Z, 0) where again the last inequality is with positive probability. Since P(Bc ) = 1 √ for all 0 < c < ∞ the bootstrap is incorrect both when nX n (ω) is negative and when it is positive, for large n.

page 223

April 28, 2015

12:28

224

9.6

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

Notes and Summary

Estimating parameters for continuous-time diffusion models proposed in financial markets is far from straightforward. Traditional methods such as maximum likelihood, while still fashionable and widely implemented in many specialized computer packages, come with hidden problems. [Jacod (2010)] gives an excellent review of some of the methods employed in estimating unknown parameters of a diffusion process under various data observation schemes. A recent review of estimating functions methods applied to discretely sampled diffusions is presented in [Bibby et al. (2010)]. Interesting discussions on the subject of inference based on likelihood can be found in [Birnbaum (1962)] and [Berger and Wolpert (1984)]. The p-values should not be used as hard evidence for inference. [Berger and Sellke (1987)] showed that the real evidence against a null hypothesis can differ by an order of magnitude from the calculated p-value, so data that may give a p-value of 0.05, when testing a Gaussian mean, will result in a posterior probability of the null hypotheses of at least 0.30 for a wide range of realistic prior distributions. Hence, p-values can be highly misleading measures of the evidence provided by the data against the null hypothesis. [Basawa et al. (1991)] proved that for a first-order autoregressive process Xt = βXt−1 + εt with i.i.d. errors with mean zero and variance 1, when β = 1 the least square estimator derived by bootstrapping is asymptotically invalid and this is true also when errors are Gaussian. Moreover, they also proved that the conditional limit distribution for the estimator calculated by bootstrapping when β = 1 converges to a random distribution. [Wise and Hall (1993)] is a source of some amazing results in probability that are very much counterintuitive. Here I reiterate some of them that have the most important impact on financial engineering and calibration. Proposition 9.8. There exists a random process {St }t∈R that is nonstationary such that for any Δ ∈ R, Δ > 0 the process {SnΔ }n∈Z is strictly stationary. This result suggests that it is impossible to test whether a continuous-time process is stationary based only on sample paths. This is somehow also echoed by the next result. Proposition 9.9. Estimating an autocorrelation function via a single sample path can be futile.

page 224

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Parameter Estimation Risk

Carte˙main˙WS

225

In this chapter we have highlighted some of these problems but this seems to be only the tip of the iceberg. Here are some useful points. (1) For the Vasicek model the MLE estimators for the drift and diffusion parameters are biased. Asymptotically, when the size of the discretization interval Δ is not neglectable, the bias and variance of the MLE estimators are effectively at the order of O(1/T ) where T is the length of the observation period. Hence, the sample size of data n does not matter in this instance. The variance of the drift parameter MLEs increase as Δ gets smaller, decreasing the accuracy of the estimation, rather than increasing it. (2) For the case when Δ is neglectable, and under a technical condition stated above such that T → ∞ dominates Δ → 0, when the Vasicek process is weakly mean-reverting the bias of the MLE estimator increases. In addition, maximum likelihood estimators of the drift parameters exhibit more variability than the maximum likelihood estimator of the diffusion parameter σ 2 . Furthermore, the latter also has less bias than maximum likelihood estimators of the drift parameters. Hence, it seems much easier to estimate the volatility of the Vasicek process rather than its drift parameters. (3) The CIR process is more difficult to estimate due to the heteroscedasticity of the diffusion part. Thus, standard maximum-likelihood estimation is difficult to conduct and a pseudo-ML method is usually implemented. In the case when Δ is small but not neglectable, one crucial difference between the Vasicek and CIR processes is that, for the latter, the pseudo-ML for σ 2 is not a consistent estimator. This problem disappears when Δ is neglectable. (4) The maximum-likelihood estimator is also invalid to utilise for jumpdiffusion models. The problem is caused by the infinite mixture of Gaussian densities and some identification problems. One practical way to circumvent these problems is to to restrict the variance parameters for the diffusion σ and for the jumps σJ to be in a compact interval [s∗ ; s∗ ]. (5) The maximum likelihood principle can be put under serious doubt as demonstrated in this chapter. The example provided uses two different models that have the same likelihood but lead to opposite inferential conclusions! (6) There are many alternatives to maximum-likelihood estimation as a technique. Bootstrapping has been widely applied in financial econo-

page 225

April 28, 2015

12:28

226

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

metrics. While this procedure can be very useful it is important to know that the bootstrap estimator may not consistently estimate the true distribution of a target statistic.

page 226

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Chapter 10

Computational Problems

10.1

Introduction

After deciding which model to use for a problem in finance, the actual calculations may not be straightforward. Borrowing techniques from applied mathematics, the calculations can be performed using various shortcuts such as approximations of probability densities, or transformations such as Edgeworth or Johnson, or Fourier or Laplace. In this chapter I highlight the danger that these “helping” methods can pose to the final results that will be used in the decision making process. This chapter is by no means exhaustive, no chapter in this book is, and there are probably a lot more problems hidden in the computational part of quantitative finance. Since the main scope of this chapter is to draw attention to what may go wrong, I focus on some of the most used techniques or problems in quantitative finance: Monte Carlo simulation, calculation of greeks and calculation of implied volatility. The range of problems that can be classified as computational is very likely to be much wider and not restricted to option pricing. Credit risk for example offers another area where computational problems appear frequently. For example, using on a monthly basis a rating transition matrix that is reflecting annual default and rating transition probabilities requires constructing matrices that are the equivalent to the 12th root for scalar numbers. Computationally it is known that there are many pitfalls that may appear. Portfolio optimisation as well is also an area that is highly susceptible to risk stemming out of the computational algorithms that are employed.

227

page 227

April 28, 2015

12:28

228

10.2

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

Problems with Monte Carlo Variance Reduction Techniques

Monte Carlo (MC) simulation methods are either the first or the last resort solution for computational problems in finance. They are the first when a quick numerical solution should be produced. They are the last when everything else, from an analytical perspective, fails. The main reason for these features is that Monte Carlo techniques are relatively easy to understand and implement in practice. The downside is that the models underpinning Monte Carlo simulation become opaque, being called black boxes. The cost to pay for this facile applicability is the accuracy of results. Simulate another batch of scenarios or trials and the option price or delta parameter may change quite a bit. Hence, researchers have designed ways to improve the accuracy, and also the computational speed1 . In essence Monte Carlo methods help calculating integrals of the type  b ψ(u)du a

where a, b can be also −∞ or ∞. This calculation can be recast as the estimation of the mean of a random variable under a given probability b density. Then, if E(X) = a ψ(u)du any random sample X1 , . . . , XN from the probability distribution of X allows approximating the target integral with the estimator N 1  (N ) X = Xi . N i=1 2

If var(Xi ) = σ 2 , ∀i the variance of this estimator is equal to σN . Thus, accuracy can be increased simply by increasing the sample size of MC simulations. Another way that is used by professionals to improve accuracy is antithetic sampling. Consider two samples of the same size X1 , . . . , XN and Y1 , . . . , YN from the same probability distribution such that Xi and Yi may be dependent so that var(Xi ) = var(Yi ) = σ 2 ,

Cov(Xi , Yi ) = ρσ 2 , ∀i.

Then, by pooling the two samples of size N into a sample of size 2N by i , we get that the variance of the Z-sample estimator is taking Zi = Xi +Y 2 1 With the advance of parallel computing and ever increasing computer memory and processor speed the computational speed is slowly becoming a non-issue.

page 228

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Computational Problems

Carte˙main˙WS

229

given by the formula σ2 (1 + ρ). (10.1) 2N Hence, if the correlation between the two samples is negative then the variance of the pooled sample estimator is reduced and we can even calculate by how much. This is achieved by pairing each simulated random number ε (uniformly drawn) with its antithetic pair 1 − ε. Consider a symmetric butterfly option which is defined as a portfolio of a long European call option with strike K1 , another long one with strike 3 K3 and short two short European call options with strike K2 = K1 +K . If 2 ST is the value of the underlying index at maturity then the payoff of the option is var(Z

(N )

)=

ψ(ST ) = [ST − K1 ]+ − 2[ST − K2 ]+ + [ST − K2 ]+ which in more detailed form looks like this ⎧ if ST ≤ K1 or ST ≥ K3 ; ⎨ 0, ψ(ST ) = ST − K1 , if K1 ≤ ST ≤ K2 ; ⎩ K3 − ST , if K2 ≤ ST ≤ K3 .

(10.2)

This payoff is shown in Fig. 10.1. Suppose now that a trader would like to calculate the value of the butterfly spread but she does not have any information about the distribution of the underlying variable. Therefore, she assumes that the riskneutral probability distribution under an appropriate risk-neutral measure Q is the uniform distribution over the interval [K1 , K3 ], represented as ST ∼ U nif orm[K1 , K3 ]. Thus the trader needs to calculate  K3 1 ψ(ST ) dST (10.3) EQ [ψ(ST )] = K3 − K1 K1 Ignoring the possibility of direct closed-formula calculation, the trader turns to the MC application. If two samples from the target uniform distribution are available X1 , . . . , XN and Y1 ,. . . , YN then the contribution of each pair K (Xi , Yi ) to the MC estimator of K13 ψ(ST ) is 1 [ψ(Xi ) + ψ(Yi )]. 2 However, if antithetic MC simulation is used, by taking Xi ∼ U nif orm[K1 , K3 ]

(10.4)

Yi = 2K2 − Xi ∼ U nif orm[K1 , K3 ]

(10.5)

page 229

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

230

Carte˙main˙WS

Model Risk in Financial Markets

K 3  K1 2

O

K1

K2

K3

Fig. 10.1: A symmetric butterfly payoff constructed from trading two long European call options with exercise prices K1 and K3 and short two Euro3 pean call options with strike price K2 = K1 +K . 2

then the contribution of the pair (Xi , Yi ) to the MC estimator is 1 [ψ(Xi ) + ψ(2K2 − Yi )]. 2 The difference between the variance of the i-th antithetic MC component and the i-th standard MC component is equal to 1 [var(ψ(Xi )) + var(ψ(2K2 − Xi ) + 2Cov(ψ(Xi ), ψ(2K2 − Xi ))] 4 1 − [var(ψ(Xi )) − var(ψ(Yi ))] 4 Since var(ψ(Xi )) = var(ψ(2K2 − Xi )) = var(ψ(Yi ))

(10.6)

page 230

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Computational Problems

231

the difference is then equal to the following 1 diff = Cov[ψ(Xi ), ψ(2K2 − Xi )] 2  1 Q E (ψ(Xi )ψ(2K2 − Xi )) − EQ (ψ(Xi ))EQ (ψ(2K2 − Xi )) = 2 The total difference is calculated simply by summation, given the independence of the sample components. Hence, replacing each Xi with the generic U uniform variate for simplicity, the difference in the two MC variances is given by  1 Q E (ψ(U )ψ(2K2 − U )) − EQ (ψ(U ))EQ (ψ(2K2 − U )) (10.7) diff = 2 If the antithetic MC simulation is going to be an improvement in terms of accuracy then this difference should be negative. However, standard calculations show that Q



K3

1 ψ(U ) dU K − K1 3 K1    K3 K2 1 = (U − K1 )dU + (K3 − U )dU K3 − K1 K 1 K2

E (ψ(U )) =

K3 − K 1 . 4 Moreover, notice that ψ(2K2 − U ) = ψ(U ) so EQ (ψ(2K2 − U )) = Furthermore, =

K3 −K1 . 4

EQ (ψ(U )ψ(2K2 − U )) = EQ (ψ 2 (U ))    K3 K2 1 2 2 (U − K1 ) dU + (K3 − U ) dU = K3 − K1 K 1 K2 =

(K3 − K1 )2 12

Thus (K3 − K1 )2 > 0. (10.8) 48 To recap, if U ∼ U nif orm[K1 , K3 ] then 2K2 − U = K1 + K3 − U is distributed U nif orm[K1 , K3 ]. Because diff =

Cov(U, 2K2 − U ) = −var(U ) < 0 it follows that the two variables are inversely correlated. However, as we demonstrated above, the antithetic MC simulation based on 2K2 − U has a

page 231

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

232

Carte˙main˙WS

Model Risk in Financial Markets

larger variance than the plain MC, so using this antithetic MC simulation increases rather than decreases the accuracy of the estimator. Now remark that U and 2K2 − U will be inversely correlated irrespective of the probability distribution of U . What happens if the risk-neutral distribution is a different one, most commonly used in the financial markets such as the Gaussian or lognormal? Here I am going to show that the conclusion reached above for the Uniform distribution is still valid irrespective of the probability density. Consider that U ∼ ρ(U ), where  is the risk neutral probability density function. Clearly, no matter what  is, it is true that ψ(U ) = ψ(2K2 − U ). Then it follows that EQ (ψ(U ))EQ (ψ(2K2 − U )) = [EQ (ψ(U ))]2 and also EQ [ψ(U )ψ(2K2 − U )] = EQ (ψ 2 (U )). Since ψ is a continuous function, ψ(U ) is also a random variable. As EQ (ψ 2 (U )) − [EQ (ψ(U ))]2 = var(ψ(U )) so 1 [var(ψ(U ))] ≥ 0. 2 This shows that no matter what the probability distribution  is, taking 2K2 − U as the antithetic Monte Carlo pair of U ∼ , will increase rather than decrease variance. diff =

10.3

Pitfalls in Estimating Greeks with Pathwise Monte Carlo Simulation

When using models to calculate asset prices implicitly we define a mapping between the domain of the parameters and the range of values for the asset price that the model implies. Suppose that the payoff Π of an asset or portfolio of assets depends on a single parameter θ, so that Π = Π(θ). This parameter can be the current value of stock S0 , or the volatility σ or any other parameter specified by the model. Greek parameters are sensitivities of values of payoffs calculated with respect to model parameters. This section is inspired by [Glasserman (2004)], section 7.2. Before proceeding, we shall warn the reader that the meaning of the word “derivative” here is the mathematical one and not the financial markets one.

page 232

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Computational Problems

233

Suppose that there is a valuation operator given by ψ(θ) = EQ (Π(θ))

(10.9)

where the expectation is taken under a risk-neutral measure and Π(θ) is the payoff in the financial contract that can be calculated directly based on the set of parameter values θ. The Greek parameter for θ is ψ (θ) = dψ(θ) dθ . One way to estimate ψ (θ) is to use the derivative of the payoff Π(θ + h) − Π(θ) (10.10) h Remark that this derivative is defined pathwise, that is Π (θ) = Π (θ, ω) where the path ω is fixed. This derivative should exist with probability 1 and it is called the pathwise derivative of Π at θ. It is important to realize that it may be possible for Π (θ) to exist with probability 1 at each θ ∈ Θ but this is not equivalent to saying that the application θ → Π(θ) is differentiable with probability one on the domain Θ. The explanation for this resides in the fact that the infinite union of the zero-probability sets associated with each θ ∈ Θ may have positive probability. The estimator given by (10.10) would be unbiased if and only if

 d  Q Q dΠ(θ) E [Π(θ)] (10.11) = E dθ dθ that is the expectation and differentiation operators are interchangeable. In order to understand the problems caused by estimating the Greeks pathwise we shall consider in this section the delta, gamma and vega Greek sensitivity parameters for a European call, a digital call and an Asian option, under the well-known Black-Scholes dynamics. The discounted payoff of the call is given by Π (θ) = lim

h→0

Π = e−rT [ST − K]+ with

√ 1 ST = S0 exp (r − σ 2 )T + σ T Z 2

(10.12)  (10.13)

where Z ∼ N (0, 1). The parameter θ can be in this case any of S0 , r, σ, T, K, but the most commonly calculated ones are delta, corresponding to θ = S0 and vega, corresponding to θ = σ. The gamma Greek parameter is the second derivative with respect to θ = S0 . The key relationship for pathwise Greek parameter estimation is the following: dΠ dST dΠ = (10.14) dθ dST dθ

page 233

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

234

Carte˙main˙WS

Model Risk in Financial Markets

The first factor on the right side gives the calculation of the payoff to the path while the second factor gives the calculation of the path to the parameter. For delta, this is evidently rewritten as dΠ dST dΠ = dS0 dST dS0 Clearly d (x − K)+ = dx



(10.15)

1, x < K 0, x > K

(10.16)

and therefore the derivative is not defined for x = K. However, since ST is by assumption a continuous random variable, the event {ST = K} has probability zero2 . Hence, we can say that Π is almost surely differentiable with respect to ST and it has the derivative dΠ = e−rT 1{ST >K} dST

(10.17)

Using the closed-form solution of the GBM for ST it is evident that ST dST = dS0 S0

(10.18)

Combining the last two identities allows us to determine the pathwise estimator of the delta parameter

 ST Q dΠ 1{ST >K} (10.19) E = e−rT dS0 S0 Then EQ



dΠ dS0



 1 = e−rT EQ ST 1{ST >K} S0 

ln(K/S0 ) − (r − √ |Z > σ T

σ2 2 )T

ln(K/S0 ) − (r − )T Q √ = S0 e E eσ T Z |Z > σ T    2 ln(K/S0 ) − (r + σ2 )T √ = 1−Φ σ T   2 ln(S0 /K) + (r + σ2 )T √ =Φ σ T

σ2 2 )T

=E

Q

S0 e

√ 2 (r− σ2 )T +σ T Z



2 (r− σ2

2 This



is not true if ST is assumed to have a discrete probability distribution!

 

page 234

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Computational Problems

Therefore EQ

235

  ln(S0 /K) + (r + dΠ √ =Φ dS0 σ T

σ2 2 )T

 (10.20)

Since the term on the right side is easily recognisable as the Delta under the Black-Scholes model, the pathwise estimator of delta is unbiased in this case. [Glasserman (2004)] makes another very important observation here. For any given S0 , the payoff Π is differentiable at the point S0 with probability one because the event {ST = K} has probability one. Having said that, Π is almost surely non-differentiable on (0, ∞) because for any value taken by Z ∼ N (0, 1) there is exactly one S0 such that ST = K. This solution is √ 1

S0 = Ke−(r− 2 σ

2

)T −σ T Z

.

The above methodology for estimating the Greek parameters of various contingent claims using pathwise estimation may prove useful for contracts and models that do now allow closed form calculations. After all, we knew exactly the delta for a European call option under the Black-Scholes model. Let us consider now a digital option with the following discounted payoff Π = e−rT 1{ST >K} . Since this payoff is piecewise constant in ST it follows that dΠ =0 dST and consequently dΠ = 0. dS0 However, in this case Π is differentiable except at the point ST = K, so Π is almost surely differentiable. In conclusion, for a digital option under the Black-Scholes model

 dΠ d Q E (Π) = 0 = EQ dS0 dS0 To see that this is true we calculate EQ (Π). EQ (Π) = EQ (e−rT 1{ST >K} ) = e−rT Q(ST > K)   2 ln(K/S0 ) − (r − σ2 )T −rT √ Q Z> =e σ T   2 ln(S0 /K) + (r − σ2 )T −rT √ Φ =e σ T

page 235

April 28, 2015

12:28

236

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

This example has profound implications. The pathwise derivative exists with probability one but it is useless in estimating the Greek parameter, delta in this case. The impact on EQ (Π) of a small change in S0 is directly driven by the possibility that this small change in S0 will make the value at the end of the path ST to be above the strike K. However, this possibility at the end of the path is never materialised by the pathwise derivative because the derivative calculated pathwise takes into account only the local sensitivity of Π to S0 . [Glasserman (2004)] also pointed out that a similar reasoning applies to barrier options. The pathwise method of calculating a derivative does not work for barrier options because on any given path a change in the underlying value of the asset that is small enough will not touch the barrier so the barrier condition is totally missed by the pathwise derivative. Secondly, since many financial derivatives have payoffs that are 2 piecewise linear, the gamma parameter, which is ddθΠ2 , cannot be estimated pathwise since the first derivative looks like a digital option. For example, the delta of a European call option (and the put option as well) will have a discontinuity at ST = K. Therefore, it would be totally wrong to estimate the gamma parameter pathwise for these options. The big question here is when can we safely apply pathwise estimation? A necessary and sufficient condition for the expectation operator to commute with the differential operator, that is

  Π(θ + h) − Π(θ) Q Q Π(θ + h) − Π(θ) = lim E E lim h→0 h→0 h h }{h∈[−1,1]} is uniformly is that the family of random variables { Π(θ+h)−Π(θ) h integrable. However, this is a theoretical condition that is difficult to verify in practice from payoff to payoff. A more useable, practically speaking, set of sufficient conditions is described in [Glasserman (2004)], and restated here for convenience. If Π(θ) is the pricing value of the payoff at maturity given by X(θ) = (X1 (θ), . . . , Xm (θ)) we can say that Π(θ) = f (X(θ)) with f : Rm → R. This layout will cover multi-assets as well as path dependent models. The sufficient conditions are the following: (1) ∀θ ∈ Θ, Xi (θ) exists with probability one, ∀i ∈ {1, . . . , m} (2) Q(X(θ) ∈ Df ) = 1, ∀θ ∈ Θ, where Df ⊆ Rm is the set of points at which f is differentiable.

page 236

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Computational Problems

237

(3) ∃ constant kf such that ∀x, y ∈ Rm |f (x) − f (y)| ≤ kf x − y which is the well-known Lipschitz condition. (4) ∃Ki , i ∈ {1, . . . , m} random variables such that ∀θ1 , θ2 ∈ Θ |Xi (θ2 ) − Xi (θ1 )| ≤ Ki |θ2 − θ1 | and EQ (Ki ) < ∞, i ∈ {1, . . . , m}. Notice that the digital option does not satisfy the third condition because of the break in continuity. To see these conditions in action it is instructive to calculate now the delta and vega for an Asian option, under the Black-Scholes model. The price of the payoff is 1  [S − K], S = St m i=1 i m

Π=e

−rT

for fixed 0 < t1 < . . . < tm ≤ T . The pathwise calculation starts with the same derivation by chain rule dΠ dΠ dS = dS0 dS dS0 = e−rT 1{S>K}

dS dS0

dS S 1  dSti 1  St i = = = dS0 m i=1 dS0 m i=1 S0 S0 m

m

The pathwise estimator of the Asian call delta is given by dΠ S = e−rT 1{S>K} dS0 S0

(10.21)

Let us verify now that the conditions (1-4) stated above are true. The underlying asset S follows a GBM 1

Sti = S0 e(r− 2 σ

2

√ )ti +σ ti Zi dS

So for the first condition, similar to previous calculations dSt0i = with probability one. The second condition is true because 1  St m i=1 i m

f (St1 , . . . , Stm ) =

Sti S0

exists

page 237

April 28, 2015

12:28

238

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

is differentiable in all m arguments. To prove the third condition consider x, y, z ∈ Rm . Then 1  (xi − yi ) m i=1 1/2 √  m m 2 (xi − yi ) ≤ m i=1 m

|f (x) − f (y)| =

1 = √ x − y m where the majorization is due to the Cauchy-Bunyakovsky-Schwarz inequality. For the last condition, consider S01 and S02 two possible initial stock values in (0, ∞), and a future time ti . Then 1

Sti (S0j ) = S0j e(r− 2 σ

2

√ )ti +σ ti Zj

with Zj ∼ N (0, 1) and j = 1, 2. Taking Z = max{Z1 , Z2 } we can write 1

|Sti (S01 ) − Sti (S02 )| ≤ |S01 − S02 |e(r− 2 σ

2

√ )ti +σ ti Z

. 1

2



Recognizing that we can take the random variables Ki = e(r− 2 σ )ti +σ ti Z and that each Ki is a lognormal variable, it implies a finite mean, and therefore all four conditions are satisfied in this case. Consequently, the delta for the Asian option can be estimated with the estimator given in (10.21), which is unbiased. Since the Asian option is path-dependent, it would be difficult otherwise to calculate the delta with a closed-form solution. As a bonus, one can also calculate the vega for the Asian option, under the same Black-Scholes model. Here a shortcut can be used as follows. √ (r− 12 σ 2 )(ti −ti−1 )+σ ti −ti−1 Zi then it is easy to derive Since Sti = Sti−1 e    dSti−1 Sti dSti = + Sti −σ(ti − ti−1 ) + ti − ti−1 Zi dσ dσ Sti−1 and using the initial condition

dS0 dσ

= 0 we obtain

i   dSti = Sti [−σti + tj − tj−1 Zj ] dσ j=1

which can be rewritten as

 1 2 dSti = Sti ln(Sti /S0 ) − (r + σ )ti dσ 2

(10.22)

page 238

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Computational Problems

Carte˙main˙WS

239

This formula can be implemented directly in the pathwise estimator of the vega of the Asian option, which is the same as the delta, where S0 is replaced by σ 1  dSti m i=1 dσ m

vega = e−rT 1{S>K}

(10.23)

or 1  1 St [ln(Sti /S0 ) − (r + σ 2 )ti m i=1 i 2 m

vega = e−rT 1{S>K}

10.4

(10.24)

Pitfall in Options Portfolio Calculation by Approximation Methods

The delta approximation method and the delta-gamma method are approximation techniques that allow the trader easy calculations of future values of a portfolio of options. Following the example described in [Britten-Jones and Schaefer (1999)] and re-discussed in [Christoffersen (2012)], consider a portfolio of three options all contingent on the same underlying stock with current value St = 100, all with time to maturity T equal to 28 calendar days, when risk-free rate is 1% per annum. The volatility of the underlying stock is given as 1.60% per calendar day. The three options are one European call and one European put with strike price 95 and one European call with strike price 105. The Black-Scholes formula will allow a rapid calculation of option prices but also of the Greek parameters delta (δ) and gamma (γ). Denoting by V Ot the value of the portfolio of options at time t the trader can approximate the future value of the portfolio of options at some future time. For example, she may like to know what would be the value of the options portfolio at time t + 5. Consider first a portfolio with two short calls with strike 95, six long calls with strike 105 and four short puts with strike 95. The delta method uses the following approximation V Ot+5 ≈ V Ot + δ(St+5 − St )

(10.25)

where δ is the delta of the portfolio, which is easy to calculate by linearity from the deltas of all options involved. In essence this is a first order approximation approach. It is well-known that this method works well only for small value variations in the underlying stock asset.

page 239

12:28

BC: 9524 - Model Risk in Financial Markets

240

Carte˙main˙WS

Model Risk in Financial Markets

The delta-gamma method adds a second order term of a quadratic nature. The formula is 1 (10.26) V Ot+5 ≈ V Ot + δ(St+5 − St ) + γ(St+5 − St )2 2 where γ is the Greek parameter for the portfolio of obtains, which again is easily computable. This method is supposed to improve on the delta method and work well also for larger variations in the value of the underlying stock.

30 Full Valuation

Delta-based

Gamma-based

20 Portfolio Value.

April 28, 2015

10 0 -10 -20 -30 -40 -50

85

90

95

100

105

110

115

Underlying Asset Price

Fig. 10.2: A comparison of the options portfolio valuation using the delta and the delta-gamma approximation methods. The portfolio of options has six long European call options with exercise prices 105, two short European call options and four short European put options with exercise price 95. The graph in Fig. 10.2 shows the comparison of the options portfolio valuation using the delta and the delta-gamma approximation methods, when the portfolio of options has six long European call options with exercise prices 105, two short European call options and four short European put options with exercise price 95. Clearly the approximation methods work well for a range of underlying values around 100.

page 240

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Computational Problems

241

10 Full Valuation

Delta-based

Gamma-based

0 Portfolio Value.

April 28, 2015

-10

-20

-30

-40

-50

-60

85

90

95

100

105

110

115

Underlying Asset Price

Fig. 10.3: A comparison of the options portfolio valuation using the delta and the delta-gamma approximation methods. The portfolio of options has six long European call options with exercise prices 105, five short European call options and one short European put options with exercise price 95.

It is very much a similar story for a portfolio options with six long European call options with exercise prices 105, five short European call options and one short European put options with exercise price 95. This is illustrated in Fig. 10.3. However, if only a slight change is made to the portfolio such that there are now six long European call options with exercise prices 105, three short European call options and three short European put options with exercise price 95, one will get the situation described in Fig. 10.4. Both approximation methods are now quite poor compared with the “proper” analytical valuation offered by Black-Scholes, that looks more like a third-order polynomial. Approximation methods can be very deceptive and dangerous because of the way they circumvent some problems but they can introduce some hidden ones as well. While they may work fine in low volatility time periods

page 241

12:28

BC: 9524 - Model Risk in Financial Markets

242

Carte˙main˙WS

Model Risk in Financial Markets

5 Full Valuation

Delta-based

Gamma-based

0 Portfolio Value.

April 28, 2015

-5 -10 -15 -20 -25 -30 -35

85

90

95

100

105

110

115

Underlying Asset Price

Fig. 10.4: A comparison of the options portfolio valuation using the delta and the delta-gamma approximation methods. The portfolio of options has six long European call options with exercise prices 105, two short European call options and two short European put options with exercise price 95.

they may cause miscalculations exactly in the high volatility time periods when calculations need to be more precise. 10.5 10.5.1

Transformations and Expansions Edgeworth expansion

The Gram-Charlier expansion has been used for computational solutions in quantitative finance. The idea of this expansion is to develop the characteristic function of the target distribution whose probability density function is f as a series depending on a characteristic function of a known, more friendly, distribution, and to recover f through the inverse Fourier transform. Most of the time the known distribution selected is the Gaussian distribution. For computational gain only the first few terms of the expan-

page 242

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Computational Problems

Carte˙main˙WS

243

sion are retained. However, the approximated f obtained in this manner is not guaranteed to be positive, and therefore it cannot be a valid probability distribution. The Gram-Charlier series diverges in many cases of interest. [Cramer (1957)] proved that the Gram-Charlier % 2 & series converges only if the target density f decays faster than exp − x4 at infinity. When the series diverges it is also not a true asymptotic expansion, because it is not possible to estimate the error of the expansion. The Edgeworth series is generally preferred over the Gram-Charlier series and it has been extensively used in finance. The main advantage of the Edgeworth series is the error control implying that this is a true asymptotic expansion. However, the next example from Ju (2002) shows that the Edgeworth expansion should not be employed when it is used to approximate the density of the arithmetic average of a lognormal process. This has an implication for pricing Asian options through this route. [Jarrow and Rudd (1982)] cast the Edgeworth expansion technique as an approximation of the ratio of the characteristic function of the target random variable X to that of the approximating one Z ∞ E[eitX ]  (it)j φX (t) = = (10.27) wj φZ (t) E[eitZ ] j! j=0 [Jarrow and Rudd (1982)] give the first few coefficients wj in terms of the cumulants which in turn can be calculated in terms of moments as shown in [Kendall and Stuart (1977)]. Proposition 10.1. The Edgeworth expansion diverges when the approximating random variable is lognormal. Proof. If Z is a lognormal random variable with mean m and variance s2 , the series ∞  (it)j (10.28) E(Z j ) j! j=0 diverges. It is evident to see from the properties of the lognormal variable j 2 s2

that E(Z j ) = ejm+ 2 . Applying the ratio test for the series in (10.28) % & (it)j+1 (j+1)2 s2 exp (j + 1)m +  (j+1)! 2 t % & exp m + (j + 0.5)s2 = (it)j j 2 s2 j + 1 (j)! exp jm + 2 Since for any s2 > 0, t = 0 it follows that  t lim exp m + (j + 0.5)s2 = ∞ j→∞ j + 1

page 243

April 30, 2015

14:13

244

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

we conclude that E[eitZ ] cannot be approximated by its corresponding Taylor expansion. 10.5.2

Computational issues for MLE

The maximum likelihood estimation procedure may cause problems not only from a theoretical perspective but also from a computational perspective. The following two results selected from [Wise and Hall (1993)] indicate some surprising aspects when calculating MLE. Proposition 10.2. The computational complexity of a maximum likelihood estimator may increase with the number of observations. Proof. Assume that {X1 , . . . , Xn } is a sample of mutually independent and identically distributed random variables with the Cauchy density function 1 fθ (x) = π[1 + (x − θ)2 ] where θ ∈ R. The likelihood function is ln L(θ; x1 , . . . , xn ) = −n ln(π) −

n 

ln[1 + (xi − θ)2 ]

i=1

so the MLE is determined by calculating the roots of the polynomial   n  ∂ 2(xi − θ) ln L(θ; x1 , . . . , xn ) = ∂θ 1 + (xi − θ)2 ) i=1 Because the expression on the right side can be expanded as a polynomial in θ of degree 2n − 1, as the sampling size n increases, so does the complexity of the problem to find out the roots of the polynomial. Furthermore, there are 2n − 1 roots considered in general and they all must be checked to see whether they reach a global or only a local optimum point. The second example is simply fascinating. I do not think that many practitioners working in model validating teams know it. Proposition 10.3. A maximum likelihood estimator may be unique when the sample size is an odd number but it may not be unique if the sample size is an even number. Proof. Consider the probability density function given by

|x − θ| fθ (x) = exp − 2

page 244

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Computational Problems

245

where θ ∈ R and the range for the random variable is again R. The log-likelihood equation relative to an observed sample of data {X1 , . . . , Xn } is   n  |xi − θ| ln L(θ; x1 , . . . , xn ) = −n ln 2 − i=1

Clearly the MLE for the parameter θ is the median of the sample {X1 , . . . , Xn }. If we assume that the variables in the sample are almost surely distinct then when n is odd there is a unique median which is the MLE estimator and when n is even there are almost surely an uncountable infinity of values that are the MLE of θ. 10.6

Calculating the Implied Volatility

The concept of implied volatility has become over the last few decades a very important one in quantitative finance. It is extremely important for options pricing but it is also used for risk management purposes and financial stability analysis. Volatility is traded nowadays as an individual asset class. 10.6.1

Existence and uniqueness of implied volatility under Black-Scholes

Under the Black-Scholes option pricing model all model parameters are directly observable from market data, except volatility3 . If volatility is the only unknown (and constant) then its value could be reverse-engineered from European options data. This idea was first pointed out by [Latane and Rendleman (1976)] but nowadays implied volatilities are widely used in financial markets. In plain language the implied volatility is the value of the volatility used in the Black-Scholes formula that will match the model price to the market price of the European option. More formalized, if C(S0 , r, T, X, σ) is the European call price on un underlying stock S at time zero and C mkt is the market price of the same option then the implied volatility σ  is determined implicitly by matching C(S0 , r, T, X, σ) ≡ C(σ) = C mkt and solving as an equation in σ in the domain (0, ∞). The case when σ = 0 should be discarded on modelling grounds and there should be preliminary explorations of data. 3 Strictly speaking this is not entirely correct. Dividends are also unknown ex ante and they are stochastic in nature.

page 245

April 28, 2015

12:28

246

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

Before discussing numerical issues related to the calculation of implied volatility it is important to make sure that there is a unique solution to this equation. A priori there is no guarantee that there is a solution or that, in case there is a solution, this is unique. Proposition 10.4. The Black-Scholes formulae for pricing a European call or put option have a unique solution for the implied volatility. Proof. We shall prove the proposition for European calls, the demonstration for puts being similar. The Black-Scholes formula is given by C(σ) = S0 Φ (d1 ) − Ke−rT Φ(d2 ) % & 2 ln(S0 /K) + r + σ2 T √ d1 = , σ T

(10.29) √ d2 = d 1 − σ T

(10.30)

From the properties of Φ(·) it follows directly that lim C(σ) = S0 +  lim C(σ) = S0 − Ke−rT +

σ→∞

(10.31) (10.32)

σ→0

√ ∂C(σ) = S0 T φ(d1 ) ∂σ so clearly ∂C(σ) ∂σ > 0 for any S0 , r, T, K. Therefore, C is a continuous function over the domain (0, ∞) and it is also increasing. Clearly then the implied volatility equation C(σ) = C mkt has a solution if and only if the following double inequality condition is satisfied +  ≤ C mkt ≤ S0 (10.33) S0 − Xe−rT The same conditions ensure uniqueness because C is increasing. C mkt ≤ S0 otherwise there will be a straightforward arbitrage by shortselling stock +  then this and buying the call option4 Now, if C mkt < S0 − Ke−rT would imply that S0 > Ke−rT and again an arbitrage can be organised by going long on the option, short on the stock and depositing Ke−rT of the proceedings in cash. At maturity, either ST > K and then the option will 4 What if shortselling is not allowed? Is it still true that there is a unique solution for the implied volatility under Black-Scholes? Recall that the Black-Scholes model does require shortselling to be allowed.

page 246

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Computational Problems

Carte˙main˙WS

247

be exercised and the short stock position cancelled, or ST ≤ K and then the option will not be exercised but the cash available K will be enough to cancel the short stock position. From a numerical perspective, in order to calculate the implied volatility it is useful also to understand the convexity of the pricing function. Since 1√ 1 ln(S0 /K) + rT ∂d1 (σ) √ = T− 2 ∂σ 2 σ T it follows that √ ∂φ(d1 (σ)) ∂ 2 C(σ) = S0 T 2 ∂σ ∂σ  √  √ σ T 1 ln(S0 /K) + rT √ − = S0 T d1 (σ)φ(σ) σ 2 σ T √ d1 d2 √ = S0 T S0 T φ(d1 ) σ There is an inflection point σ  that can be determined from either d1 (σ) = 0 or d2 (σ) = 0. Thus, when K > S0 erT the inflection point is .

K 2 ln σ = − 2r (10.34) T S0 and when K < S0 erT the inflection point is .

S0 2 ln σ = + 2r T K

(10.35)

 = 0, which Note that when K = S0 erT the inflection point would be at σ is not allowed. It is easy to show that d1 (σ)d2 (σ) =

T ( σ4 − σ4 ) 4σ 2

and this shows that for σ < σ  the call pricing function as a function of volatility is convex and for σ > σ  the call pricing function as a function of volatility is concave. Hence the only inflection point is given by . ( ( ( ln(S0 /K) + rT ( ( (10.36) σ  = 2(( ( T

page 247

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

248

Carte˙main˙WS

Model Risk in Financial Markets

While we know that there is a unique implied volatility value, determining the exact value for implied volatility is facile but not without surprises. There is no analytical formula, although there are some analytical approximation formulae reviewed below, so the usual solution is to use a root-finding algorithm to calculate the implied volatility. The most known method for finding a solution for the equation H(σ) ≡ C(σ) − C mkt = 0 is the Newton-Raphson algorithm, see [Manaster and Koehler (1982)], that calculates iteratively sn+1 = sn −

H(sn ) H (sn )

and then take the solution σ  = limn→∞ sn . It is obvious that the algorithm requires that the derivative of the difference H(σ) should not be zero. Standard considerations related to the Newton-Raphson algorithm lead to the conclusion that 0
0. The three processes {Bt }t∈[0,T ] , {Nt }t∈[0,T ] and {Yt }t∈[0,T ] are mutually independent. 1 2 Letting k = E(Yt − 1) = eμ+ 2 δ it follows that E[(Yt − 1)dNt ] = kλdt. Therefore in order to eliminate the predictability of asset price movements only coming from jumps in the SDE (10.41) the term kλdt is taken out in the drift. Applying Itˆ o’s lemma we get that d ln St = (α − λk −

σ2 )dt + σdBt + ln Yt dNt 2

6 http://www.financialwisdomforum.org/gummy-stuff/implied-volatility.htm

page 251

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

252

Carte˙main˙WS

Model Risk in Financial Markets

and therefore

ln St − ln S0 = (α − λk −

Nt  σ2 )(t − 0) + σ(Wt − W0 ) + ln Yj 2 j=1



' Nt σ2 )t + σWt St = S0 exp (α − λk − Yj 2 j=1 ⎧ ⎫ Nt ⎨ ⎬ 2  σ )t + σWt + St = S0 exp (α − λk − ln(Yj ) (10.42) ⎩ ⎭ 2 j=1 % & 6N t Yj = 1 if Nt = 0. If Rt = ln SS0t then its It is tacitly assumed that j=1 probability density is given by

P (Rt ∈ A) =

∞ 

P (Nt = j)P (Rt ∈ A|Nt = j)

j=0

=

∞  e−λt (λt)j

j!

j=0



σ2 ϕ Rt ; (α − λk − )t + jμ, σ 2 t + jδ 2 2



Here ϕ(x; a, b) denotes the probability density of a Gaussian distribution with mean a and variance b as a function of x. It is not difficult to calculate 2 +μ2 )) . Therefore sign(μ) the skewness of Rt and this is equal to (σ2λμ(3δ +λσ 2 +λμ2 )3/2 determines sign(skewness(Rt )). Following a similar route the probability density function of ln(St ) can be obtained as

P (ln St ∈ A) =

∞ 

P (Nt = j)P (ln St ∈ A|Nt = j)

j=0

=

∞  e−λt (λt)j j=0

2

j!

 ϕ ln St ; ln S0 + ξt + jμ, σ 2 t + jδ 2

where ξ = (α − λk − σ2 ). In order to calculate and compare the volatility under the Black-Scholes model and Merton’s lognormal jump diffusion

page 252

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Computational Problems

253

model we need to calculate the variance of log-returns over a given t-period. var(Rt ) = var(σBt +

Nt 

ln Yj )

j=1

= σ 2 t + var( ⎡

Nt 

ln Yj )

j=1



= σ 2 t + E ⎣var ⎝

Nt 

⎡ ⎛ ⎞⎤ Nt  ln Yj |Nt ⎠⎦ + var ⎣E ⎝ ln Yj |Nt ⎠⎦

j=1 2

⎞⎤

j=1

2

= σ t + E(Nt δ ) + var(μNt ) = σ 2 t + λtδ 2 + λtμ2 The Black-Scholes model is based on a diffusion model like the one in (10.41) where the jump term is removed. It is known that for Black-Scholes √ StdBS (Rt ) = σBS t. Furthermore, under Merton’s lognormal jump diffusion model  2 2 2 StdM erton (Rt ) = (σM JD + λδ + λμ )t.

(10.43)

This formula has also been obtained earlier by [Press (1967)] and more recently by [Navas (2003)]. However, [Navas (2003)] pointed out that [Merton (1976a)], [Merton (1976b)], [Ball and Torous (1985)], [Jorion (1988)] and 2 2 [Amin (1993)] use only (σM JD + λδ )t which is incorrect. It is easy to see that the reason they have got the wrong formula is that the marginal variance was not calculated as the mean of the conditional variance plus the variance of the conditional mean, where conditioning is done with respect to Nt . By accident the two formulae will be the same only when μ = 0 2 but they assumed that E(Y − 1) = 0, which means μ = − δ2 . Clearly if σBS = σM JD then Merton’s prices will always be higher than the BlackScholes prices because of extra volatility. What happens if we require that the volatilities under both models are the same?  √ 2 2 2 σBS t = (σM JD + λδ + λμ )t 10.8

Notes and Summary

Computational finance may carry model risk in the form of using the wrong formula or applying a technique without verifying that some assumptions

page 253

April 28, 2015

12:28

254

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

are true. There are probably more examples along this line in the financial modelling world but here are some hints from this chapter. It is not true that methods designed to improve computational precision will always work. [Wise and Hall (1993)] show that there are cases of probability density functions for which the standard importance sampling is not a useful variance reduction technique. In general, the computational side of financial modelling differs from bank to bank and from team to team. A great book on financial modelling where a lot of attention is being paid to model risk is [Kienitz and Wetterau (2012)]. [Doran and Ronn (2005)] advocates that Black-Scholes Implied Volatility or Black Implied Volatility is an efficient but biased predictor of future realized volatility and that the bias is a result of a negative market price of volatility risk. Moreover, they argue that this conclusion is true for both equity markets, where there is a well documented negative correlation between equity returns and realized and implied volatilities, and also for energy markets, where there is a positive correlation between price-returns and volatility. Here are some useful points from this chapter. (1) Antithetic sampling does not always lead to an improvement in accuracy. (2) The pathwise derivative is not always useful in estimating the Greek parameters. The pathwise method of calculating a derivative does not work for barrier options because on any given path a change in the underlying value of the asset that is small enough will not touch the barrier so the barrier condition is totally missed by the pathwise derivative. (3) Since many financial derivatives have payoffs that are piecewise linear, 2 the gamma parameter, which is ddθΠ2 , cannot be estimated pathwise since the first derivative looks like a digital option. For example, the delta of a European call option (and of the put as well) will have a discontinuity at ST = K. Therefore, it would be totally wrong to estimate the gamma parameter pathwise. (4) The Newton-Raphson algorithm of calculating the implied volatility under the Black-Scholes model is sensitive to the starting point, and it may not give a solution in some cases. (5) There exist some analytical approximation formulae for implied volatility.

page 254

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Computational Problems

Carte˙main˙WS

255

(6) When calculating the implied volatility in a jump-diffusion model the marginal variance must be calculated as the mean of the conditional variance plus the variance of the conditional mean, where conditioning is done with respect to the jump process Nt .

page 255

May 2, 2013

14:6

BC: 8831 - Probability and Statistical Theory

This page intentionally left blank

PST˙ws

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Chapter 11

Portfolio Selection Using the Sharpe Ratio

In this chapter real data is used for a portfolio analysis problem with the declared scope of highlighting the pitfalls that may appear when using the Sharpe ratio. There is a growing literature on improving or adjusting the classical Sharpe ratio for investments analysis. Important and useful reading can be found in [Sharpe (1994)], [Dowd (2000a)], [Lo (2002)], [Goetzmann et al. (2004)] and [Cerny (2009)]. [Stanescu and Tunaru (2014)] investigated the portfolio selection problem for an equity investor, for the US using daily data on VIX futures, the S&P500 index and the Barclays US Aggregated total return bond index between March 2004 and February 2012, and for the EU using daily data on VSTOXX futures, the STOXX50 index and the Barclays EUROPE Aggregated total return bond index between May 2009 and February 2012. As a proxy for the risk-free rate the 3-month Treasury Bill rates (secondary markets) are used for the US, and, for Europe, the 3-month EURO LIBOR. Following Szado (2009), the portfolio weights for the volatility futures are pre-set to 2.5% and then 10%. Tables 11.1–11.4 summarize the performance of the volatility-diversified portfolios. We assume that the portfolios are rebalanced weekly. The Sharpe ratios are commonly used in portfolio analysis to differentiate between competing portfolios. Since this performance measure is not applicable for negative excess returns, the adjusted Sharpe ratios are also reported. The latter are calculated using excess returns over the mean return obtained for the plain equity index portfolio as a benchmark, for the turbulent periods. In addition, in order to gauge the protective cover obtained from adding volatility futures, the historical value-at-risk measures at the 95% and 99% confidence levels are also calculated for each portfolio. The results in Tables 11.1–11.2 demonstrate that adding VIX futures

257

page 257

April 28, 2015

12:28

258

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

Table 11.1: Performance of volatility-diversified US portfolios The performance statistics are of the daily relative returns on the different portfolios based on equity and VIX futures positions. The portfolios are weekly rebalanced, and the notional of the futures contracts is assumed to be held in cash (no collateralization of the futures). SPX

97.5% SPX 2.5% VIX Futures

90% SPX 10% VIX Futures

Period (2004- 2012) Mean return Volatility Sharpe ratio Adj Sharpe ratio VaR 1% VaR 5%

5.11% 22.25% 17.05% NA 4.43% 2.13%

5.50% 20.35% 20.53% 1.92% 4.04% 1.94%

6.76% 15.80% % 34.44% 10.44% 2.91% 1.37%

subperiod 2004 - 2007 Mean return Volatility Sharpe ratio VaR 1% VaR 5%

8.30% 12.09% 48.37% 2.22% 1.27%

8.70% 10.84% 57.59% 2.00% 1.10%

9.91% 8.94% 83.47% 1.37% 0.78%

subperiod 2008-2012 Mean return Volatility Sharpe ratio Adj Sharpe ratio VaR 1% VaR 5%

2.21% 28.50% 6.75% NA 5.24% 2.90%

2.59% 26.15% 8.81% 1.45% 4.79% 2.62%

3.90% 20.10% 17.95% 8.41% 3.60% 1.88%

-42.23% 41.37% -104.37% NA 8.24% 4.52%

-39.74% 38.43% -105.89% 6.48% 7.66% 4.11%

-31.95% 30.36% -108.39% 33.86% 6.22% 2.90%

crisis year 2008 Mean return Volatility Sharpe ratio Adj Sharpe ratio VaR 1% VaR 5%

has a beneficial effect on portfolio performance, improving the mean return but most importantly reducing the volatility. Comparing the performance of the six portfolios under investigation it is also clear that, in normal times such as the period 2004-2007, adding a VIX futures contract improves the mean return and produces excellent Sharpe ratios and of course improves

page 258

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Portfolio Selection Using the Sharpe Ratio

259

Table 11.2: Performance of volatility-diversified US portfolios The performance statistics are of the daily relative returns on the different portfolios based on equity, bonds and VIX futures positions. The portfolios are weekly rebalanced, and the notional of the futures contracts is assumed to be held in cash (no collateralization of the futures). 60% SPX 40% Bonds

58.5% SPX 39% Bonds 2.5% Futures

54 % SPX 36% Bonds 10% Futures

Period (2004- 2012) Mean return Volatility Sharpe ratio Adj Sharpe ratio VaR 1% VaR 5%

4.91% 12.92% 27.77% -1.55% 2.50% 1.22%

5.36% 11.34% 35.58% 2.20% 2.19% 1.03%

6.78% 8.90% 61.32% 18.76% 1.55% 0.64%

subperiod 2004 - 2007 Mean return Volatility Sharpe ratio VaR 1% VaR 5%

6.57% 7.22% 57.04% 1.22% 0.76%

7.02% 6.22% 73.40% 1.00% 0.65%

8.38% 6.52% 90.84% 0.77% 0.49%

subperiod 2008-2012 Mean return Volatility Sharpe ratio Adj Sharpe ratio VaR 1% VaR 5%

3.39% 16.47% 18.85% 7.16% 2.98% 1.61%

3.85% 14.51% 24.52% 11.30% 2.68% 1.35%

5.32% 10.61% 47.45% 29.31% 1.77% 0.94%

-24.35% 24.03% -105.29% 74.41% 4.97% 2.59%

-22.07% 21.61% -106.49% 93.29% 4.47% 2.23%

-14.99% 15.75% -101.22% 172.95% 3.20% 1.45%

crisis year 2008 Mean return Volatility Sharpe ratio Adj Sharpe ratio VaR 1% VaR 5%

VaR risk measures. Moreover, during turbulent times such as 2008-2012, there is a great benefit in having VIX futures in the investment portfolio, the mean return staying positive and the Sharpe ratios being the best for the portfolios containing VIX futures positions. Looking at the event risk of 2008 it can also be remarked that extreme

page 259

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

260

Carte˙main˙WS

Model Risk in Financial Markets

losses can be avoided if VIX futures positions are added. The comparison of portfolio performance for the eventful year of 2008 may lead to wrong conclusions if the standard Sharpe ratio is used as a performance yardstick. In Tables 11.1–11.2, when comparing the portfolio comprising 60% equity and 40% bonds with the portfolio comprising 58.5% equity, 39% bonds and 2.5% VIX futures, for the year 2008 only, the Sharpe ratio is better for the former portfolio. However, the latter portfolio has relatively better mean return and less volatility. Thus, something is not quite right. The adjusted Sharpe ratio corrects for this type of anomaly. Table 11.3: This table summarizes the performance of volatility-diversified European portfolios. The performance statistics are of the daily relative returns on the different portfolios based on equity and VSTOXX futures positions. The portfolios are weekly rebalanced, and the notional of the futures contracts is assumed to be held in cash (no collateralization of the futures). STOXX

Mean return Volatility Sharpe ratio Adj. Sharpe ratio VaR 1% VaR 5%

5.42% 25.51% 10.66% NA 4.28% 2.55%

97.5% STOXX 2.5% VSTOXX Futures 5.97% 23.42% 13.98% 2.35% 3.83% 2.32%

90% STOXX 10% VSTOXX Futures 7.68% 17.92% 27.82% 12.61% 2.69% 1.78%

A similar story follows from the results of Tables 11.3–11.4, although this analysis covers only the most recent period due to the availability of VSTOXX futures contracts introduced by EUREX. For the European case, the results show that, for the period under analysis (i.e. May 2009 to February 2012), adding volatility exposure to an equity portfolio that tracks the EURO STOXX 50 index indeed provides risk diversification benefits: the volatility decreases from over 25% to under 18% (i.e. a reduction of around 30%) for a 10% exposure to VSTOXX futures (nearest maturity).

page 260

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Portfolio Selection Using the Sharpe Ratio

261

Table 11.4: This table summarizes the performance of volatility-diversified European portfolios. The performance statistics are of the daily relative returns on the different portfolios based on equity, bonds and VSTOXX futures positions. The portfolios are weekly rebalanced, and the notional of the futures contracts is assumed to be held in cash (no collateralization of the futures). 60% STOXX 40% Bonds Mean return Volatility Sharpe ratio Adj. Sharpe ratio VaR 1% VaR 5%

5.02% 15.05% 15.45% -2.66% 2.65% 1.57%

58.5% STOXX 39% Bonds 2.5% Futures 5.58% 13.28% 21.73% 1.20% 2.15% 1.36%

54 % STOXX 36% Bonds 10% Futures 7.31% 9.51% 48.45% 19.87% 1.42% 0.88%

page 261

May 2, 2013

14:6

BC: 8831 - Probability and Statistical Theory

This page intentionally left blank

PST˙ws

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Chapter 12

Bayesian Calibration for Low Frequency Data

12.1

Introduction

This chapter is devoted to a presentation of parameter estimation risk in a situation where there is a need for data augmentation due to the fact that the observable asset prices are available at very low frequency (annually or quarterly) and hence the analyst also need to produce intermediary values for the asset price for internal calculations such as profit and loss. The model parameters generate model risk while the auxiliary data that ought to be generated are of model uncertainty nature. Here I show how to solve this problem in a real-world example using Bayesian techniques coupled with MCMC inferential techniques. The aim is to describe a novel technique for calibration of contingent claim models used for low frequency data while accounting for parameter uncertainty at the same time. Therefore, this research is similar in spirit to the research of [Bongaerts and Charlier (2009)] about the importance of parameter variations on financial decision making, where the parameters are computed in a particular parametric framework. Having said that, the material presented here is new, being based on a theoretically correct bridge process that improves the calibration process for models applied to low frequency data such as time series arising from real-estate. Furthermore, our measure of parameter uncertainty model risk averages over the entire posterior distribution of parameters of interest and produces an entire distribution for the pricing functions of various contingent claims, as opposed to the approach described in [Kerkhof et al. (2010)] where a confidence interval for a risk measure such as value-at-risk or expected shortfall is employed. Section 12.3 contains a superior data augmentation procedure for

263

page 263

April 28, 2015

12:28

264

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

partially observed diffusions using an appropriate bridge process. This new technique is applied to a more concrete situation, the pricing of property derivatives using a mean-reverting model, in Sec. 12.4. The numerical analysis for the commercial property index in the UK is detailed in Sec. 12.5 where the importance of considering a range of parameter values for ascertaining price differentiations is shown clearly. In addition, a new measure is proposed to quantify the parameter uncertainty contributing to model risk. 12.2

Problems in Pricing Derivatives for Assets with a Slow Business Time

Consider a financial market associated with a stochastic basis (Ω, F, F, P) where F = {Ft }t≥0 , P is the objective probability measure and Ft is the σ-algebra at time t containing all the information available to agents in the economy at time t. The market is spanned by the primary underlying asset {Yt }t≥0 and a money market account B = {Bt }t≥0 is available such that dBt = rt Bt dt, where {rt }t≥0 is an adapted interest rate process. If Q is a risk-neutral pricing equivalent martingale measure, since Q ∼ P, for any contingent claim with payoff H(YT ) at maturity T , the price at time t < T is given by      T

Πt (H; ϑ) = EQ exp −

rs ds H(YT )Ft , ϑ

(12.1)

t

where ϑ is the vector of parameters representing the model for Y . The Radon-Nikodym process ηt = dQ dP Ft means that the change of measure from the objective measure to the risk-neutral measure gives the likelihood process. Moreover, taking into account that {ηt }t≥0 is a P-martingale it follows that %  &   T EP ηT exp − t rs ds H(YT )Ft , ϑ . (12.2) Πt (H; ϑ) = ηt Within this very general set-up we can consider already parameter uncertainty which is related to calibration and model building. It is natural to assume that a historical path Yt1 , . . . , Ytn has been observed, with t1 < . . . < tn < t and t denoting the day when the valuation takes place, usually today. When the aim is to calculate the price of a contingent claim with some maturity T into the future, the pricing mechanism encapsulated by Πt (H; ϑ) in (12.2) carries parameter uncertainty.

page 264

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Bayesian Calibration for Low Frequency Data

265

Without developed liquid futures, swaps and options markets for property, it is difficult to infer the implied market view on the value of the parameters for a given horizon T. Most of the real-estate derivatives currently trade over-the-counter and the analyst should take parameter uncertainty into consideration. If Θ is the domain of the parameter ϑ and a prior distribution p(ϑ) is assumed, then this parameter uncertainty can be integrated out in a Bayesian set-up

    Π_t(H) = ∫_Θ Π_t(H; ϑ) p(dϑ).                                                  (12.3)

Hence

    Π_t(H) = ∫_Θ (1/η_t) E^P[ η_T exp( −∫_t^T r_s ds ) H(Y_T) | F_t, ϑ ] p(dϑ)
           = ∫_Θ E^P[ (dP/dQ)|_{F_t} η_T exp( −∫_t^T r_s ds ) H(Y_T) | F_t, ϑ ] p(dϑ).   (12.4)

Notice that the Radon-Nikodym derivative is determined by a set of transformations of the parameter vector ϑ. If p({Y_{t_1}, …, Y_{t_n}}) is the marginal probability of the observed data and p(dϑ|F_t) is the posterior density distribution of the parameter vector ϑ then, applying Bayes' formula,

    p(dϑ|F_t) = (dP/dQ)|_{F_t} p(dϑ) / p({Y_{t_1}, …, Y_{t_n}})                    (12.5)

and replacing in (12.4) we get

    Π_t(H) = p({Y_{t_1}, …, Y_{t_n}}) ∫_Θ E^P[ η_T exp( −∫_t^T r_s ds ) H(Y_T) | F_t, ϑ ] p(dϑ|F_t).   (12.6)

This shows that, after observing some past values of the process, the posterior distribution of the vector of parameters ϑ can be updated, mainly following changes of the likelihood represented by the Radon-Nikodym derivative.
A very large class of models in the financial modelling literature is described by the general diffusion equation

    dY_t = μ(t, Y_t; ϑ)dt + σ(t, Y_t; ϑ)dW_t                                       (12.7)

with W = {W_t}_{t≥0} a standard Wiener process. Suppose that the analysis is done today at time t. The aim is to value some contingent claims at some


future horizon T while calibrating the model to the historical information observed at previous times t_1 < t_2 < … < t_n ≤ t, that is Y_{t_1}, …, Y_{t_n}. Without loss of generality the time step δ_t = t_{j+1} − t_j is assumed to be equal for all j ∈ {0, …, n−1}. In property markets this is usually one year, one quarter or one month.
For parameter estimation the standard analysis would proceed by discretizing equation (12.7) via the Euler-Maruyama scheme

    Y_{t_{j+1}} = Y_{t_j} + μ(t_j, Y_{t_j}; ϑ)δ_t + σ(t_j, Y_{t_j}; ϑ)ε_j          (12.8)

with ε_j ∼ N(0, δ_t). Standard techniques for estimating the parameters of these diffusions from partially discrete observations have been developed by Durham and Gallant (2001); Eraker (2001); Roberts and Stramer (2001). However, as the time interval δ_t is actually quite large, this approximation scheme does not work in this context. The reason is that the transition density implied by the discrete time process in (12.8) converges to the actual transition probability density of the continuous time process in (12.7) only when δ_t is small. This has also been discussed in Sec. 9.2.
One neglected aspect related to low frequency data is the finite sample characteristic. Since δ_t is fixed and possibly quite large, the usual results ensuring that discrete time solutions converge to the continuous time model specification do not apply. This problem manifests itself in some important financial markets such as real-estate. In the next section we show how to circumvent this problem by data augmentation using a bridge process having the correct marginal and transition density. For path dependent products, and for risk management as well, it is extremely important to have the correct transition density when producing auxiliary data in order to avoid introducing model risk.
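To see concretely how severe the discretization bias can be at low frequency, the following minimal Python sketch compares the one-step Euler-Maruyama transition moments with the exact transition moments of a simple Ornstein-Uhlenbeck specification dY_t = −θY_t dt + σdW_t over one annual step; the parameter values are purely illustrative and are not estimates from this chapter.

    import numpy as np

    # Illustrative (hypothetical) parameter values, not taken from the chapter
    theta, sigma = 0.5, 0.10   # mean-reversion speed and volatility per annum
    y0, dt = 0.3, 1.0          # current state and a coarse (annual) time step

    # One-step Euler-Maruyama transition: Y1 | Y0 ~ N(y0 - theta*y0*dt, sigma^2*dt)
    euler_mean = y0 * (1.0 - theta * dt)
    euler_var = sigma**2 * dt

    # Exact OU transition: Y1 | Y0 ~ N(y0*exp(-theta*dt), sigma^2*(1-exp(-2*theta*dt))/(2*theta))
    exact_mean = y0 * np.exp(-theta * dt)
    exact_var = sigma**2 * (1.0 - np.exp(-2.0 * theta * dt)) / (2.0 * theta)

    print(f"Euler mean {euler_mean:.4f} vs exact mean {exact_mean:.4f}")
    print(f"Euler var  {euler_var:.6f} vs exact var  {exact_var:.6f}")

Even for moderate mean reversion the annual-step Euler moments are visibly off, which is exactly why this chapter resorts to data augmentation rather than direct discretization.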

12.3 Choosing the Correct Auxiliary Values

The solution to the problem of partial information is to augment the information between two observed consecutive observations Ytj and Ytj+1 with sufficient auxiliary variables satisfying the properties of the process given by the continuous time model specification. Only then the discrete scheme in (12.8) is applicable. The data augmentation technique has been pioneered by [Tanner and Wong (1987)] and the idea is to ascribe N latent


values between each pair of consecutive historical observations. The time between a generic pair Y_{t_j} and Y_{t_{j+1}} is divided into N+1 equal time steps of length δ_t/(N+1) ≡ h. The augmented vector looks like this

    Y^c = ( Y_{t_1}, Y_1^1, …, Y_1^N, Y_{t_2}, Y_2^1, …, Y_2^N, …, Y_{t_n} )        (12.9)

where the vector of the observed data is Y^obs = (Y_{t_1}, Y_{t_2}, …, Y_{t_n}) and the vector of auxiliary values is Y^a = (Y_1^1, …, Y_1^N, …, Y_{n−1}^1, …, Y_{n−1}^N). Applying Bayes' formula provides the posterior distribution of all parameters and augmented data

    p(ϑ, Y^a | Y^obs) ∝ [ ∏_{j=1}^{M−1} p(Y^c_{j+1} | Y^c_j, ϑ) ] p(ϑ)              (12.10)

where M = n + (n−1)N. At first glance, looking at (12.8) may suggest a straightforward Gaussian set-up, based on the likelihood factor

    p(Y^c_{j+1} | Y^c_j, ϑ) = 1/( σ(t_j, Y^c_j)√(2πh) ) exp{ −[Y^c_{j+1} − Y^c_j − μ(t_j, Y^c_j)h]² / (2σ²(t_j, Y^c_j)h) }.   (12.11)

This rationale would be correct only if all components of the vector Y^c were observable. However, this vector of data is only partially observable. The auxiliary values between two consecutive observed values form a path starting at the initial observed value and finishing at the next observed value. The paths should be filled in according to a bridge process that satisfies the properties of the initial model (12.7). Early contributions to the bridge sampling technique include Eraker (2001), who proposed a Gibbs sampling intermediary step, and Durham and Gallant (2001), who suggested a Brownian bridge. Here we advocate an improved method related to the seminal result in Lyons and Zheng (1990) allowing the construction of exact bridges from the appropriate distribution. This technique is feasible when the transition probability densities of the underlying continuous time process are known analytically.
Following Lyons and Zheng (1990), paths from the exact bridge processes can be simulated with a separate diffusion process given by

    du_t = f(u_t, ϑ)dt + σ(u_t, ϑ)dW_t                                             (12.12)

where the drift is determined by

    f(u, ϑ) = μ(t, u; ϑ) + σ²(u; ϑ) ∇_x( log p(t, x; τ, y) )|_{x=u}.               (12.13)

Here p(t, x; τ, y) is the transition probability density function of the initial continuous time model (12.7). Between the observed values Y_{t_j} and Y_{t_{j+1}}


we shall insert N values {Y_j^k}_{k=1,…,N} on a path from the bridge process with the drift determined by f(u_t, ϑ). Taking advantage of the Gaussian structure, the intermediary values are drawn from

    p(Y_j^{k+1} | Y_{t_j}, Y_{t_{j+1}}) = N( Y_j^k + f(Y_j^k)h, σ²(t, Y_j^k)h )    (12.14)

for any k ∈ {0, …, N−1}, with Y_j^0 ≡ Y_{t_j}.
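The following minimal Python sketch shows how (12.12)-(12.14) can be turned into a path-filling routine. The functions mu, sigma and grad_log_p stand for the model drift, the diffusion coefficient and ∇_x log p(t, x; t_end, y_end) respectively; they are placeholders the user must supply for the chosen model, so this is a schematic template rather than the chapter's exact implementation.

    import numpy as np

    def fill_bridge(y_start, y_end, t_start, t_end, N, mu, sigma, grad_log_p, rng):
        """Insert N auxiliary values between two observations, using the
        exact-bridge drift f(u) = mu(t,u) + sigma(u)^2 * grad_log_p(t,u,t_end,y_end)
        discretized with an Euler step of length h, cf. (12.12)-(12.14)."""
        h = (t_end - t_start) / (N + 1)
        path, u, t = [], y_start, t_start
        for _ in range(N):
            drift = mu(t, u) + sigma(u) ** 2 * grad_log_p(t, u, t_end, y_end)
            u = u + drift * h + sigma(u) * np.sqrt(h) * rng.standard_normal()
            t += h
            path.append(u)
        return np.array(path)   # the right endpoint y_end is already observed

Any model with a known transition density can be plugged in through grad_log_p; Sec. 12.4.2 works this out explicitly for the trended mean-reverting model.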

12.4 Empirical Exemplifications

12.4.1 A mean-reversion model with predictability in the drift

In this book a mean-reverting model is preferred as a suitable vehicle for analyzing the dynamics of real-estate indices. The debate regarding mean-reversion models versus random walk models has a long history in the financial literature. Since real-estate markets display the characteristics of commodity markets, we prefer using a mean-reverting type of model. The next important question from a modelling point of view is whether to use jumps or not. It is difficult, if not impossible, to disentangle possible jumps in the underlying process from mean-reversion effects. Assuming that the current level of the process is above the fundamental level, a downward jump may be just a manifestation of the reversion to the mean effect and not an actual jump. This type of confounding is particularly relevant for real-estate markets, where the information is usually revealed at yearly frequency. Hence, jumps are not included in the current modelling.
The model specification starts with the log of the real-estate price index, that is Y_t = log(X_t), making the assumption that there is E = {E_t}_{t≥0}, a fundamental level for the process Y = {Y_t}_{t≥0}, such that E_t is F_t-predictable and

    d(Y_t − E_t) = −θ(Y_t − E_t)dt + σdW_t                                         (12.15)

where θ, σ > 0 and W is a Wiener process. The equation can be rewritten as

    dY_t = [ dE_t/dt + θ(E_t − Y_t) ] dt + σdW_t.                                  (12.16)

This type of process exhibiting predictability in the drift has also been investigated by [Lo and Wang (1995)]. The predictable process {E_t}_{t≥0} can be a simple linear function of the type α + βt or a linear or nonlinear function of covariate information. One


can envisage that real-estate prices at a given point in time depend on the levels of interest rates, macroeconomic factors such as levels of unemployment, fiscal policies, oil prices and inflation, to mention only a few important determinants.
It is straightforward to calculate the solution of (12.15); for s < t

    Y_t = E_t + (Y_s − E_s) exp{θ(s − t)} + σ ∫_s^t exp{θ(u − t)} dW_u.            (12.17)

The conditional density of Y_t | Y_s, ϑ, with s < t, is Gaussian with mean and variance given by

    E_s(Y_t) = E_t + (Y_s − E_s) exp{θ(s − t)},    var_s(Y_t) = σ²[1 − exp{−2θ(t − s)}]/(2θ).   (12.18)

This result also means that X_t | F_s has a log-normal distribution, which will prove useful in deriving exact pricing formulae for the main contingent claims currently traded in real-estate markets.
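Because (12.18) gives the transition law in closed form, the process can be simulated exactly at any set of dates, a fact used repeatedly below. A minimal Python sketch, assuming the linear fundamental level E_t = α + βt; the parameter values are hypothetical illustrations.

    import numpy as np

    def simulate_exact(y0, times, alpha, beta, theta, sigma, rng):
        """Exact simulation of the trended mean-reverting model at the given dates,
        using the Gaussian transition moments in (12.18) with E_t = alpha + beta*t."""
        E = lambda t: alpha + beta * t
        y, t_prev, out = y0, times[0], []
        for t in times[1:]:
            mean = E(t) + (y - E(t_prev)) * np.exp(-theta * (t - t_prev))
            var = sigma**2 * (1.0 - np.exp(-2.0 * theta * (t - t_prev))) / (2.0 * theta)
            y = mean + np.sqrt(var) * rng.standard_normal()
            t_prev = t
            out.append(y)
        return np.array(out)

    rng = np.random.default_rng(42)
    path = simulate_exact(y0=4.6, times=np.arange(0.0, 30.0, 1.0),
                          alpha=4.6, beta=0.1, theta=0.05, sigma=0.10, rng=rng)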

12.4.2 Data augmentation

For the trended mean-reverting process

    dY_t = [ dE_t/dt + θ(E_t − Y_t) ] dt + σdW_t                                   (12.19)

the suggested solution to the problem of partial information is to augment the information between two consecutive observations with some latent variables. Only then would the Euler-Maruyama discretization scheme in (12.8) work. Applying the procedure described above, formula (12.20) gives the likelihood factor using directly the Gaussian distribution implied by the discrete process in (12.8)

    p(Y^c_{j+1} | Y^c_j, ϑ) = 1/( σ√(2πh) ) exp{ −[ψ_{t_j} − ψ_{t_{j+1}} − θψ_{t_j}h]² / (2σ²h) }      (12.20)

where ψ_{t_j} = E_{t_j} − Y^c_j. The transition density is known exactly for this class of models, a Gaussian distribution with moments given by (12.18), so

    p(t, x; τ, y) = √(2θ) / ( σ√(2π(1 − e^{−2θ(τ−t)})) ) exp{ −θ[y − E_τ − (x − E_t)e^{−θ(τ−t)}]² / ( σ²(1 − e^{−2θ(τ−t)}) ) }   (12.21)

which implies that

    ∇_x log p(t, x; τ, y) = 2θ e^{−θ(τ−t)} [y − E_τ − (x − E_t)e^{−θ(τ−t)}] / ( σ²(1 − e^{−2θ(τ−t)}) ).   (12.22)


Between the observed values Y_{t_j} and Y_{t_{j+1}}, one will insert N values {Y_j^k}_{k=1,…,N} on a path from the bridge process with the drift determined by

    f(u_t, ϑ) = A_t + B_t u_t                                                      (12.23)

where

    B_t = θ ( exp[−2θ(t_{j+1} − t)] + 1 ) / ( exp[−2θ(t_{j+1} − t)] − 1 )

and

    A_t = [ 2θ exp[−θ(t_{j+1} − t)] / (1 − exp[−2θ(t_{j+1} − t)]) ] ( Y_{t_{j+1}} − E_{t_{j+1}} + E_t exp[−θ(t_{j+1} − t)] ) + dE_t/dt + θE_t.

A detailed calculation is given in the Appendix. The intermediary draws will then be made from

    p(Y_j^{k+1} | Y_{t_j}, Y_{t_{j+1}}) = N( ηY_j^k + ξ, σ²h )                     (12.24)

where

    η = 1 + θh ( e^{2θ(kh−1)} + 1 ) / ( e^{2θ(kh−1)} − 1 )

and

    ξ = E_{t_j+(k+1)h} + (θ − 1)E_{t_j+kh} + [ 2θe^{θ(kh−1)} / (1 − e^{2θ(kh−1)}) ] ( Y_{t_{j+1}} − E_{t_{j+1}} + e^{θ(kh−1)} E_{t_j+kh} )      (12.25)

for any k ∈ {0, …, N−1}, with Y_j^0 ≡ Y_{t_j}. Remark that we have implicitly assumed, for simplicity, that Y_{t_1} is a constant starting point. Alternatively one can assume that Y_{t_1} is a random variable with a Gaussian distribution. For example, assuming that the process Y starts from Y_0 = E_0, we get that the marginal likelihood for this very first value is p(Y_{t_1} | ϑ) = N( E_{t_1}, σ²[1 − exp(−2θt_1)]/(2θ) ).
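A sketch of how the intermediary draws can be produced in practice. Rather than transcribing (12.24)-(12.25) verbatim, the snippet below uses the equivalent exact Gaussian bridge conditionals for the de-trended process Z_t = Y_t − E_t, a standard Ornstein-Uhlenbeck bridge result; the parameter values are again purely illustrative.

    import numpy as np

    def ou_bridge_fill(z_start, z_end, dt_total, N, theta, sigma, rng):
        """Sequentially draw N intermediate values of the OU bridge for
        dZ = -theta*Z dt + sigma dW, conditioning each draw on the previous
        value and on the fixed right endpoint (exact Gaussian conditionals)."""
        q = lambda x: 1.0 - np.exp(-2.0 * theta * x)
        h = dt_total / (N + 1)
        z, remaining, draws = z_start, dt_total, []
        for _ in range(N):
            s, t = remaining, remaining - h      # gap to endpoint before/after this draw
            mean = (z * np.exp(-theta * h) * q(t) +
                    z_end * np.exp(-theta * t) * q(h)) / q(s)
            var = sigma**2 / (2.0 * theta) * q(h) * q(t) / q(s)
            z = mean + np.sqrt(var) * rng.standard_normal()
            draws.append(z)
            remaining = t
        return np.array(draws)

    # Example: quarterly points between two annual observations of Z = Y - E
    rng = np.random.default_rng(0)
    quarters = ou_bridge_fill(z_start=0.05, z_end=-0.02, dt_total=1.0, N=3,
                              theta=0.05, sigma=0.10, rng=rng)

Adding E_t back to each draw yields the auxiliary values for Y^c; these exact conditionals are what the Euler recursion in (12.24)-(12.25) approximates.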

12.5 MCMC Inference for the IPD model

The augmentation process of the data leads to a full vector Y^c on which Bayesian inference can now be performed and calibration pursued. Recent research by MacKinnon and Zaman (2009) points to a general fundamental type of level around which property prices evolve. For application purposes the investor may calibrate the current view on the trend from historical data and embed this in their model for the index. As an example we shall consider here the case of the simple fundamental level

    E_t = α + βt                                                                   (12.26)

although more advanced research in the real-estate area shows how the fundamental level can be linked to macroeconomic variables and interest rates, see [Tunaru (2013b)].


Table 12.1: Posterior analysis summary for the MCMC analysis of the IPD index on the log-scale, for the model with non-constant fundamental level. θ is the mean-reversion parameter, α and β are the intercept and slope parameters of the linear long-run mean on the log scale and σ is the volatility per annum. Posterior estimates are the mean, standard deviation, and quantiles including the median. All estimates are calculated from a sample of 50,000 MCMC iterations.

        mean    sd      MC error   2.5%     5%       median   95%     97.5%
    α   10.83   18.84   0.11       -31.48   -19.24   7.40     44.06   53.52
    β   0.018   0.13    8.35E-4    -0.27    -0.22    0.04     0.20    0.25
    σ   0.10    0.01    7.67E-5    0.08     0.08     0.10     0.13    0.13
    θ   0.04    0.07    9.56E-4    0.001    0.002    0.01     0.19    0.26

The vector of parameters describing our trended mean-reverting model is ϑ = (θ, σ, α, β). Posterior inference can be performed in closed form for some parameters. However, for the purposes of our analysis MCMC techniques are applied via the inferential engine WinBUGS. For a thorough description of these techniques applied to diffusions in finance see Eraker (2001) and also Jacquier et al. (1994b). The data has been provided by Investment Property Databank (IPD) and is available from the author upon request.
The convergence of the MCMC is very fast, 10,000 iterations running in seconds. The inference results shown below are calculated from a sample of 50,000 MCMC iterations following a burn-in period of 150,000 iterations. The Monte Carlo error values are all less than 10% of the posterior standard deviation values, indicating a good fit. The usual MCMC convergence checks are all passed but they are not presented here due to lack of space; the inference, calculated either from a very long single chain or from several chains starting from different points, produces very similar results. The priors used were the Gamma(0.01, 0.001) distribution for the precision parameter 1/σ², the uniform distribution on (0,2) for the mean-reversion parameter θ, and the Gaussian distribution with mean 0 and precision 0.001 for the parameters α and β.
In Table 12.1 only the results from the long chain on the main parameters of interest are presented, namely the slope β and intercept α of the linear trend, the speed of reversion θ and the volatility parameter σ. It is well-known that the volatility in property markets is not very large and this is confirmed in this example, where the posterior mean and posterior median of σ are both equal to 10% per annum. For θ, however, the posterior mean is only 0.04 while its


posterior median is 0.01, suggesting a skewed posterior distribution for this important parameter.
Before diving into the rich provision of the MCMC output for various quantities of interest, we recall that the model used here employed a non-constant time trend. If the estimate of the slope coefficient β is positive, that simply implies that the fundamental price of property ought to increase with time, while if β is negative then the property price ought to decrease. Either of the two conclusions contradicts financial economics theory, so the first important question to answer is the following: Is there a significant time trend representative of the fundamental price? Visually it seems from Fig. 12.1(b) that a positive linear time trend is "definitely" significant. The R² from the ordinary least squares fit is 97.76%, quite "remarkable". However, events like the property crash experienced in 2008 remind us that the property bubbles leading to positive exponential growth are bound to burst.
One of the advantages of MCMC techniques is the Bayesian updating. The inference on the slope parameter β indicates that this parameter is not significant. Considering the credibility intervals at the 95% and the 90% level of confidence from Table 12.1 (a 95% credibility interval is easily constructed from the 2.5% and 97.5% posterior quantiles, and similarly for the 90% level), it becomes clear that β is not significant. The conclusion is that there is no significant time trend and our mean-reverting model should use a constant fundamental level. The revised analysis, cancelling out β, produces the estimates in Table 12.2 and all parameters are significant. Now we can proceed to identify the bridge intermediary values and other statistics of interest such as forward prices and option prices.
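For completeness, the credibility intervals just mentioned are nothing more than posterior quantiles, which one can read directly off the MCMC sample. A minimal Python sketch; the array beta_draws is a stand-in sample mimicking the posterior mean and standard deviation of β from Table 12.1, not actual WinBUGS output.

    import numpy as np

    beta_draws = np.random.default_rng(1).normal(0.018, 0.13, size=50_000)  # stand-in sample

    ci_95 = np.percentile(beta_draws, [2.5, 97.5])   # 95% credibility interval
    ci_90 = np.percentile(beta_draws, [5.0, 95.0])   # 90% credibility interval
    print("95% CI:", ci_95, " contains 0:", ci_95[0] < 0 < ci_95[1])

Since both intervals straddle zero for β, the time trend is dropped.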

[Fig. 12.1: Historical trend of the IPD Annual UK commercial property index for the period 1980-2009. Panel (a): actual index levels. Panel (b): log scale, with a linear trend fitted by OLS (y = 0.0944x + 4.5681, R² = 0.9776), suggesting a "very good" fit.]


Table 12.2: Posterior analysis summary for the MCMC analysis of the IPD index for the model with a constant fundamental level. θ is the mean-reversion parameter, α is the long-run mean on the log scale and σ is the volatility per annum. Posterior estimates are the mean, standard deviation, and quantiles including the median. All estimates are calculated from a sample of 50,000 MCMC iterations.

        mean    sd      MC error   2.5%    5%      median   95%     97.5%
    α   14.51   10.03   0.24       7.16    7.37    10.55    36.21   44.68
    σ   0.10    0.01    7.33E-5    0.08    0.08    0.10     0.13    0.14
    θ   0.03    0.02    4.08E-4    0.002   0.003   0.02     0.06    0.06

[Fig. 12.2: Posterior densities of the main parameters of interest of the mean-reverting model for the IPD index: (a) θ, (b) α, (c) σ. θ is the mean-reversion parameter, α is the long-run mean on the log scale and σ is the volatility per annum. All densities are constructed from a sample of observations collected from 50,000 MCMC iterations.]


Table 12.3: Posterior statistics of the augmented data representing the proper bridge between data points. Posterior estimates are the mean, standard deviation, and quantiles including the median. All estimates are calculated from a sample of 50,000 MCMC iterations following a burn-in period of 150,000 iterations.

    node     mean   sd      MC error   2.5%   5%     median   95%    97.5%
    Y_1^1    4.65   0.05    2.22E-4    4.55   4.56   4.65     4.73   4.75
    Y_1^2    4.69   0.05    2.55E-4    4.58   4.60   4.69     4.79   4.81
    Y_1^3    4.74   0.051   2.28E-4    4.64   4.66   4.74     4.82   4.84
    Y_2^1    4.76   0.051   2.45E-4    4.66   4.68   4.76     4.85   4.87
    Y_2^2    4.79   0.057   2.75E-4    4.68   4.70   4.79     4.88   4.90
    Y_2^3    4.81   0.051   2.22E-4    4.71   4.73   4.81     4.90   4.91
    Y_3^1    4.84   0.051   2.37E-4    4.74   4.75   4.84     4.92   4.94
    Y_3^2    4.86   0.057   2.70E-4    4.75   4.77   4.86     4.96   4.97
    Y_3^3    4.89   0.051   2.30E-4    4.78   4.80   4.89     4.97   4.99
    ...      ...    ...     ...        ...    ...    ...      ...    ...
    Y_27^1   7.34   0.051   2.37E-4    7.24   7.26   7.34     7.43   7.45
    Y_27^2   7.33   0.057   2.51E-4    7.22   7.24   7.33     7.43   7.45
    Y_27^3   7.32   0.051   2.30E-4    7.22   7.24   7.32     7.41   7.42
    Y_28^1   7.24   0.051   2.24E-4    7.14   7.15   7.23     7.32   7.34
    Y_28^2   7.15   0.057   2.61E-4    7.04   7.06   7.15     7.25   7.27
    Y_28^3   7.07   0.051   2.39E-4    6.97   6.98   7.07     7.16   7.17
    Y_29^1   7.08   0.051   2.48E-4    6.98   6.99   7.08     7.17   7.18
    Y_29^2   7.09   0.057   2.67E-4    6.98   7.00   7.09     7.19   7.21
    Y_29^3   7.11   0.051   2.34E-4    7.00   7.02   7.11     7.19   7.21

An additional advantage of MCMC methods is the ability to produce the entire posterior density of any quantity of interest. The three densities shown in Fig. 12.2 suggest unimodality, which is reassuring from an application point of view, but there is also some degree of skewness that would be difficult to detect otherwise. The fitting of the model already produces the estimates for the proper bridge. In this example we consider augmenting the annual data at quarterly points, so three intermediary values are produced for the path associated with each pair of observed annual points. A snapshot of the posterior estimates of the auxiliary values is provided in Table 12.3. Taking advantage of the MCMC approach, several point estimates and interval estimates are readily available. Following the data augmentation procedure outlined above, it is straightforward to build bridges of intermediary values of the right quality between the observed data points.

12.6 Derivatives Pricing

For the empirical application detailed in this section we are mainly interested in pricing contingent claims on the IPD All Property Annual UK index. For simplicity of exposition we shall assume that the risk-free interest rate r is constant and equal to 2% per annum.
The IPD index X_{t+T} = exp(Y_{t+T}) has a log-normal distribution since it is easy to see that Y_{t+T} | Y_t ∼ N(m_y(t, T), σ_y²(t, T)) with

    m_y(t, T) = α − λ_T σ/θ + βT + ( Y_t − α + λ_T σ/θ ) exp(−θT)                  (12.27)
    σ_y²(t, T) = σ²[1 − exp(−2θT)]/(2θ).                                           (12.28)

The parameter λ_T changes the drift upon the change of measure from P to Q. The forward price at time t for maturity horizon T in the future is then derived as

    F_t(T; ϑ) = exp( m_y(t, T) + σ_y²(t, T)/2 ).                                   (12.29)

Since real-estate markets are inherently incomplete, the forward pricing formula is mainly used for fixing a pricing measure to be used for the other derivatives that may be issued on the same underlying index. This is done via λ_T, the market price of risk, which is calibrated for each of the five annual futures maturities. This quantity is derived exogenously here since the focus is not on identifying the market price of risk in real-estate markets but rather on dealing with sparsity of data and parameter estimation uncertainty.
It is straightforward in this case to price European vanilla call and put options. For a given set of parameters the formulae can be derived in closed form

    c_t(K, T; ϑ) = exp(−rT) [ F_t(T; ϑ)Φ(d_1) − KΦ(d_2) ]                          (12.30)
    d_1 = [ ln( F_t(T; ϑ)/K ) + σ_y²/2 ] / σ_y                                     (12.31)
    d_2 = d_1 − σ_y(t, T)                                                          (12.32)
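Propagating each posterior draw of (α, θ, σ) through (12.27)-(12.32) yields a full posterior distribution of forward and call prices. A minimal Python sketch for the constant-level case (β = 0); the posterior arrays below are hypothetical stand-ins for the actual MCMC output, and Y_t is set to the at-the-money level used in Table 12.5.

    import numpy as np
    from scipy.stats import norm

    def forward_and_call(alpha, theta, sigma, y_t, lam, T, K, r=0.02):
        """Map posterior draws of (alpha, theta, sigma) into draws of the
        forward price (12.29) and the call price (12.30)-(12.32)."""
        m_y = alpha - lam * sigma / theta + (y_t - alpha + lam * sigma / theta) * np.exp(-theta * T)
        var_y = sigma**2 * (1.0 - np.exp(-2.0 * theta * T)) / (2.0 * theta)
        sd_y = np.sqrt(var_y)
        fwd = np.exp(m_y + 0.5 * var_y)
        d1 = (np.log(fwd / K) + 0.5 * var_y) / sd_y
        d2 = d1 - sd_y
        call = np.exp(-r * T) * (fwd * norm.cdf(d1) - K * norm.cdf(d2))
        return fwd, call

    # Hypothetical stand-ins for the 50,000 posterior draws
    rng = np.random.default_rng(7)
    alpha_d = rng.normal(7.1, 0.3, 50_000)
    theta_d = rng.gamma(2.0, 0.015, 50_000)
    sigma_d = rng.normal(0.10, 0.01, 50_000)

    fwd, call = forward_and_call(alpha_d, theta_d, sigma_d,
                                 y_t=np.log(1219.0), lam=2.58, T=1.0, K=1219.0)
    print(np.percentile(call, [2.5, 50, 97.5]))   # posterior quantiles of the fair price

Because every posterior draw produces its own price, the output is a distribution of fair prices rather than a single number, which is precisely the point of the exercise.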

Based on the MCMC predictive sample for p(F2009 (T )|Y0 , Yt , t, ϑ), point estimates can be obtained such as the posterior mean, median or mode, the posterior standard deviation, other quantiles and so on. The focus here has been on building the appropriate bridge with intermediary values that


is helpful for marking-to-market any contingent claims dependent on the property index. Regarding the pricing of these contingent claims, one of the main contributions of this book is to show how to produce a wide range of inferential measures of the fair prices of property derivatives. In this way the problem of parameter uncertainty can be elegantly circumvented, giving a robust solution to this type of model risk. For the IPD example, we shall use the following annual term structure of the market price of risk: λ_1 = 2.58, λ_2 = 0.73, λ_3 = 0.70, λ_4 = 0.82, λ_5 = 1.00, which has been estimated in a separate, independent financial engineering exercise on the same IPD index. The market price of risk can be reverse-engineered from an equilibrium model such as that recently proposed in the literature by Geltner and Fisher (2007), or it can be calibrated directly within an equilibrium model as illustrated by Cao and Wei (2010).

Table 12.4: Posterior analysis summary for the MCMC analysis of the forward prices on the IPD index, calculated for 2009 and five annual future maturities. Calculations are performed under a given term structure of the market price of risk λ_1 = 2.58, λ_2 = 0.73, λ_3 = 0.70, λ_4 = 0.82, λ_5 = 1.00. Estimates are for the mean and quantiles including the median. All estimates are calculated from a sample of 50,000 MCMC iterations following a burn-in period of 150,000 iterations.

                 mean     2.5%     5%       median   95%      97.5%
    F_2009(1)    1004.0   910.3    927.6    1007.0   1068.0   1078.0
    F_2009(2)    1200.0   1065.0   1088.0   1203.0   1300.0   1317.0
    F_2009(3)    1202.0   1009.0   1042.0   1206.0   1353.0   1379.0
    F_2009(4)    1144.0   905.9    944.0    1146.0   1339.0   1372.0
    F_2009(5)    1036.0   771.2    811.2    1036.0   1261.0   1302.0

The posterior estimates revealed in Table 12.4 indicate that parameter uncertainty alone may lead to different theoretical forward curves. For example, one can compare the vector of posterior means with the vector of posterior medians. The difference becomes even larger if we compare the extreme quantiles. A more informed view is given in Fig. 12.3, where the posterior densities of the fair forward prices are illustrated.


[Fig. 12.3: Posterior densities of fair forward prices on the IPD index. Calculations are representative for the year 2009 and all five future annual maturities: (a) T = 1 year, λ_1 = 2.58; (b) T = 2 years, λ_2 = 0.73; (c) T = 3 years, λ_3 = 0.70; (d) T = 4 years, λ_4 = 0.82; (e) T = 5 years, λ_5 = 1.00. All densities are constructed from a sample of vector observations collected from 50,000 MCMC iterations following a burn-in period of 150,000 iterations.]


A similar analysis can now be performed for the European vanilla call options. In Table 12.5 the results for the at-the-money options are presented, for each of the five future annual maturities, calculated as of 2009. Looking at the various point estimates and quantiles, the skewness of the posterior densities of the call prices is more pronounced. Indeed, the posterior densities depicted in Fig. 12.4 confirm this assertion and it appears that the pricing models are subject to asymmetric parameter uncertainty. The dispersion and skewness of the posterior density indicate how wide the parameter uncertainty is. The posterior densities comprise fair prices of call options for different combinations of parameter values.

Table 12.5: Posterior analysis summary for the MCMC analysis of the European vanilla call prices on the IPD index, with an at-the-money strike K ≈ 1219, calculated for 2009 and five annual future maturities. Calculations are performed under the given term structure of the market price of risk λ_1 = 2.58, λ_2 = 0.73, λ_3 = 0.70, λ_4 = 0.82, λ_5 = 1.00. Estimates are for the mean and quantiles including the median. All estimates are calculated from a sample of 50,000 MCMC iterations following a burn-in period of 150,000 iterations.

               mean    2.5%    5%      median   95%      97.5%
    c(K, 1)    1.34    0.22    0.29    1.16     3.04     3.54
    c(K, 2)    60.66   14.83   19.23   57.87    112.40   124.10
    c(K, 3)    77.58   12.64   17.70   71.69    158.90   178.40
    c(K, 4)    65.00   5.64    8.85    55.69    154.90   178.20
    c(K, 5)    39.00   1.33    2.44    28.54    112.60   135.40

The main advantage of using MCMC techniques in this context is the availability of a wide range of inferential statistics, all calculated from the same output. Estimating fair prices for derivatives can now be done on a set basis and is not constrained to pointwise procedures. Moreover, the posterior densities of the derivatives prices reveal the extent of the model risk involved.


[Fig. 12.4: Posterior densities of fair prices of the European vanilla call option on the IPD UK All Property index. Calculations are for 2009 and all five future annual maturities: (a) T = 1 year, λ_1 = 2.58; (b) T = 2 years, λ_2 = 0.73; (c) T = 3 years, λ_3 = 0.70; (d) T = 4 years, λ_4 = 0.82; (e) T = 5 years, λ_5 = 1.00. All densities are constructed from a sample of vector observations collected from 50,000 MCMC iterations following a burn-in period of 150,000 iterations.]

12.7 Notes and Summary

Here I exemplified how to use Bayesian inference via MCMC on low frequency data problems, with a real-estate index and contingent claims as the example. This type of data is interesting because it presents two important problems: lack of sufficient data to estimate the diffusion process at the required tenors, and parameter uncertainty, which is generally ignored in derivatives pricing. This asset class is very important in financial markets and the application presented here is closely related to the research I have done after 2008 and published in a series of papers [Fabozzi et al. (2009)], [Fabozzi et al. (2010)] and [Fabozzi et al. (2012b)].
While the numerical data in the example discussed here comes from a less known modelling area (real-estate derivatives), the approach presented is general and applicable to other asset classes where market information is not available daily. Calculating P&L daily may require marking-to-model, and then any parameter mis-estimation may translate into P&L miscalculation. The results presented here are relatively new, although I have presented them at conferences between 2009 and 2013.
Here are some important lessons coming out of this chapter.
(1) Low frequency data requires some technical innovation in order to enable the analyst to revalue asset positions at intermediary times when there is no market data available for benchmarking.
(2) The auxiliary values ought to be chosen such that the probability structure, marginal and conditional, corresponds to the specified model. The bridge process we utilize in this book conforms to all these requirements and its calibration is easy with MCMC techniques.
(3) From a practical perspective I advocate that one can measure the parameter uncertainty impact on derivatives valuation by looking at the posterior quantiles of the asset price under consideration.
(4) Taking advantage of MCMC technology it is straightforward to evaluate numerically the impact of parameter estimation uncertainty. It is very important to acknowledge that this risk is not shared equally or symmetrically by the buyer and seller of financial instruments.


Chapter 13

MCMC Estimation of Credit Risk Measures

13.1 Introduction

In 2004 I was doing some research on the linkage between credit ratings and default probabilities. The fundamental relationship between the frequency of corporate defaults and credit ratings has significant implications for large funds with vast numbers of corporate bonds. Illiquid bonds or loans may be difficult to mark-to-market over any given period of time. Analysts then have no choice but to rely on credit ratings in order to assess the value of a loan or bond and to infer the probability of default.
Over time three classes of credit risk models have emerged in the finance industry. The first class contains structural models which involve directly the asset and liability structure of the company, such as KMV's model and CreditMetrics™, based on Merton's seminal work (Merton, 1974). The second class is described as the reduced-form models that specify exogenously the dynamics of the spread between default-free and credit-risky bonds in an arbitrage-free world ([Jarrow and Turnbull (1995); Duffie and Singleton (1999)]). A set of models nested into this class employs credit ratings, so that default is conceptualized through a gradual change in ratings driven by a Markov transition matrix ([Das and Tufano (1996); Jarrow et al. (1997)]). Last but not least, the third class of models is characterised by the actuarial approach, where calibration to historical data is key. A well-known example in this category is CreditRisk+, a tool developed by Credit Suisse Financial Products.
The main ingredients for all portfolio credit risk models are the estimates of average default probabilities for borrowers corresponding to each credit risk rating grade of a financial institution. The link between default frequencies and ratings has a direct impact on financial institutions'


capital structure, on the design of asset-backed securities, and on setting regulatory capital requirements for banks. It is therefore important to have a flexible and yet powerful methodology capable of producing reliable default probability estimates from observed ratings.
The relationship between default frequencies and rating categories has been explored in the literature (see, for example, [Blume et al. (1998)] and [Zhou (2001)]) and it has been put again under scrutiny in the aftermath of the subprime crisis, with misleading ratings being blamed for inducing false investor expectations of probabilities of default. [Carey and Hrycay (2001)] discussed an empirical examination of the major mapping methods used to estimate average default probabilities by grade. They found evidence of potential bias, instability and gaming. The need for an improved calibration methodology is clear in this context. In particular, any modelling approach should be able to take into account default data that is usually sparse. For some higher rating categories there have been almost no defaults for many years, and even at the other end of the rating spectrum only a small proportion of borrowers default. In addition, it is desirable that the model allows analysts to sometimes have a subjective input for taking into account macroeconomic policies or sudden changes in the international business environment; this cannot be accommodated by standard existing techniques and here a Bayesian approach such as that presented in [Stefanescu et al. (2009)] solves this problem in an elegant way. Lastly, the model should inherently take into account ordinal explanatory variables such as ratings; while there exists a vast literature on models where the dependent variable is qualitative ordinal, the case when the explanatory variables are ordinal has been addressed only in a few papers and always in a maximum likelihood framework ([Terza (1987)], [Ronning and Kukuk (1996)], and [Kukuk (2002)]).
[Stefanescu et al. (2009)] developed a conceptual statistical calibration methodology for credit transition matrices, including probabilities of default, using a hierarchical Bayesian approach that takes into account ordinal explanatory variables. Thus, the models discussed below are based on the assumption that a company's rating reflects the fact that its bond default risk, as characterized by an ordinal latent index, lies within a given interval with unknown endpoints. Markov Chain Monte Carlo (MCMC) techniques are employed for the estimation of model parameters. The MCMC approach produces the inferred distribution for all parameters of interest, and easily overcomes potential problems due to sparse data encountered by other estimation procedures such as maximum likelihood. The same methodology


can also be used to incorporate further covariate data or expert opinion in a straightforward manner.

13.2 A Short Example

In the search for parsimony, statistical modelling of credit ratings data has gradually been adopted in the industry. For example, [Bluhm et al. (2003)] used Moody's ratings data to show how corporate default probabilities may be calibrated to external ratings. Their analysis is based on a log-linear model linking the observed mean default frequency (MD) over the period 1993 to 2000 to the credit ratings category (Rating), modelled as an ordinal variable. Default probabilities are then inferred for all credit rating categories. As in their study, I use here the observed mean default frequencies (MD) over the period 1993 to 2000. Some categories had no observed defaults and therefore there is no data available. The same type of calculations are done for the standard deviations (s.d.). The model discussed in [Bluhm et al. (2003)] for the observed mean default frequencies (MD) of corporate companies rated by Moody's over the period 1993 to 2000 is given by the log-linear relationship

    ln(MD) = −5 ln(30) + 0.5075 Rating.                                            (13.1)

The relationship under scrutiny is between MD as an endogenous variable and the credit ratings categories (Rating) as an exogenous variable. One obvious fact is that MD takes values in the range [0,1] while Rating is an ordinal variable. With this log-linear model, default probabilities are inferred for all credit rating categories and they are presented for convenience in Table 13.1.
There are several observations that are worth mentioning at this point. First, the default probability by definition should increase as one moves from Aaa to B3. This is not the case with the data presented in Table 13.1: Ba2 is associated with a lower default probability of 0.63% than the default probability of 0.69% associated with Ba1. While the difference seems small, this is an issue to consider for the calibration. The second point is that rating agencies use a more refined system. While the methodology presented here can in principle be used for enlarged rating systems, one should realise that with more rating categories the data is even sparser and this may have an undesired impact on the models simply because of lack of data. Another point regards the lack of data for superior ratings such as Aaa. This should not impose a zero default probability for these ratings.


Recent events in the corporate world strongly suggest that even Aaa rated companies may be touched by sudden death. Thus, we do not consider Aaa as a riskless rating class. The issue may be a bit more controversial for countries. Although one may argue that G7 countries are risk-free, at least for calibration purposes the analyst benefits more by not imposing a modelling constraint requiring the fitted default probabilities for superior ratings to be zero.

Table 13.1: Corporate default probabilities implied by the log-linear regression model for corporates rated by Moody's over the period 1993 to 2000.

    Rating   MD       s.d.    Estimated default probability
    Aaa      NA       NA      0.005%
    Aa1      NA       NA      0.008%
    Aa2      NA       NA      0.014%
    Aa3      0.08%    0.33%   0.023%
    A1       NA       NA      0.038%
    A2       NA       NA      0.063%
    A3       NA       NA      0.105%
    Baa1     0.06%    0.19%   0.174%
    Baa2     0.06%    0.20%   0.289%
    Baa3     0.46%    1.16%   0.480%
    Ba1      0.69%    1.03%   0.797%
    Ba2      0.63%    0.86%   1.324%
    Ba3      2.39%    2.35%   2.200%
    B1       3.79%    2.49%   3.654%
    B2       7.96%    6.08%   6.070%
    B3       12.89%   8.14%   10.083%

From an econometric point of view the main problems that the analyst is facing are:
• Limited sample size; the number of observations used by the regression models is in one-to-one correspondence with the rating categories.
• The data is incomplete in the sense that the data sample used for calibration may not contain any observed defaults for obligors with some given ratings such as Aaa.
• The response variable has support in the interval [0,1]. The model described above does not satisfy this requirement and may lead to default probabilities greater than 100%.
• Last but not least, the explanatory variable employed for calibration is


ordinal with 16 categories. This issue may prove quite thorny to deal with.
Here I propose a corrected model for calibration, obtained by transforming the response variable MD onto a different scale:

    logit(MD) ≡ ln( MD/(1 − MD) ) = α + β × Rating.                                (13.2)

This is a logistic regression model that can be fitted easily to data. The goodness-of-fit of this model looks very good, with an adjusted R² of 82.3%. The regression coefficients are highly significant, so for prediction purposes one may use their estimates α̂ = −15.1673 and β̂ = 0.5058, respectively. The probabilities of default estimated in this manner are presented in Table 13.2.

Table 13.2: Corporate default probabilities implied by the logistic linear regression model for corporates rated by Moody's over the period 1993 to 2000.

    Rating   MD       s.d.    Estimated default probability
    Aaa      NA       NA      0.004%
    Aa1      NA       NA      0.007%
    Aa2      NA       NA      0.012%
    Aa3      0.08%    0.33%   0.020%
    A1       NA       NA      0.032%
    A2       NA       NA      0.054%
    A3       NA       NA      0.089%
    Baa1     0.06%    0.19%   0.148%
    Baa2     0.06%    0.20%   0.245%
    Baa3     0.46%    1.16%   0.407%
    Ba1      0.69%    1.03%   0.675%
    Ba2      0.63%    0.86%   1.119%
    Ba3      2.39%    2.35%   1.855%
    B1       3.79%    2.49%   3.075%
    B2       7.96%    6.08%   5.098%
    B3       12.89%   8.14%   8.451%
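The mechanics of the transform-then-regress step are easy to reproduce. The Python sketch below fits the logit model by ordinary least squares on the ten rating categories with observed defaults; note that the resulting coefficients depend on exactly which categories enter the fit and on how the MD figures are preprocessed, so this illustrative code is not guaranteed to reproduce the estimates reported above.

    import numpy as np

    # Observed mean default frequencies (decimal) and their ordinal rating codes
    # (Aaa=1, ..., B3=16); categories with no observed defaults are left out.
    rating = np.array([4, 8, 9, 10, 11, 12, 13, 14, 15, 16])
    md = np.array([0.0008, 0.0006, 0.0006, 0.0046, 0.0069, 0.0063,
                   0.0239, 0.0379, 0.0796, 0.1289])

    y = np.log(md / (1.0 - md))                 # logit transform of MD
    beta, alpha = np.polyfit(rating, y, 1)      # OLS line: logit(MD) = alpha + beta*Rating

    all_ratings = np.arange(1, 17)
    fitted_pd = 1.0 / (1.0 + np.exp(-(alpha + beta * all_ratings)))  # inverse logit
    print(alpha, beta)
    print(np.round(fitted_pd * 100, 4))         # estimated PDs in percent, Aaa..B3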

It is obvious that the results are very good for the higher ratings and consistently undervalued for the lower grades. This may be due to the lack of sufficient data, as after all the regression model is fitted on only a few data points. However, another explanation may reside in the fact that we assumed the rating grades to be ordinal and equally spaced, that is, the difference between Aaa and Aa1 is the same as the difference between Ba2 and Ba3, for example.


Comparing the results presented in Tables 13.1 and 13.2, one can remark that all default probabilities implied by the log-linear model are larger than the corresponding ones produced with the logistic model. The two models, logistic and log-linear, can be compared vis-a-vis the observed default rates and this is shown clearly in Fig. 13.1.

[Fig. 13.1: Comparison of mean default probabilities: observed versus log-linear and logistic models, for corporates rated by Moody's over the period 1993 to 2000.]

Somewhat paradoxically, the log-linear model, which breaks the fundamental requirement that the default probability lies between 0 and 1, seems to


produce results closer to the observed data. This superiority may, however, be illusory. For the corporate bond data analysed above, the question still remains: why does the log-linear model perform better than the properly specified logistic model? My intuition is that the cause is directly linked to the fact that rating grades are not equally spaced. This important observation deserves further clarification from the credit rating agencies.

[Fig. 13.2: Comparison of calibration results for default probabilities: log-linear and logistic models versus observed. All credit ratings are used, for corporates rated by Moody's over the period 1993 to 2000.]

An analyst may argue that although there were no defaults between 1983 and 2000 for some rating categories, the data for these categories should still be included. Hence, in Table 13.2 the observed data cells with NA (not available) should also be taken into consideration. Taking a value of 0% for some observations leads to problems for transformation applications


such as the logistic. To overcome this problem one can replace, by convention, 0% with 0.000001%, a very small but positive value. This can have an effect on the calibration models used. The logistic regression model with the enlarged data provides a goodness-of-fit measure of R² = 72%. The coefficients are significant and they are estimated as α̂ = −20.613 and β̂ = 1.2545. Figure 13.2 depicts a similar picture to before. Once again the logistic curve seems to be too far from the curve of observed values for the lower grades, only this time it is above it! Furthermore, as is well-known from the statistical modelling literature (see [Agresti (2002)]), an ad-hoc adjustment adding a small positive constant to MD and 1 − MD in the logit link will introduce bias in the estimated default probabilities.
This simple example illustrates that the model selected, and the way it is applied in practice, can impact the final conclusions even if only a small change in the data analysis is made. In my opinion a Bayesian analysis is best suited for these types of problem. Not only are Bayesian methods coupled with MCMC techniques capable of dealing with a wide range of model specifications and probability distribution assumptions, but they can also easily overcome problems caused by missing data. Clearly for the top credit rating categories there is insufficient data on defaults and therefore I advocate using Bayesian methods for model calibration in credit markets.

13.3 Further Analysis

What can an analyst do when the data for calibration is sparse and the number of observations cannot be increased at will? Recall that the majority of econometric techniques rely on a sample that is large enough, small samples being notoriously difficult to analyse. In addition, the analyst may wish to consider some subjective information, perhaps the all-important opinion of a senior manager or of the senior economist of the bank, and this is very difficult, if not impossible, within maximum likelihood or generalised least squares estimation and testing frameworks. An answer to both problems is to develop models under a Bayesian framework. The models can be structured on several layers of stochasticity if necessary, to account for the complexity of the data used for calibration, and subjective priors can be used to implement expert opinions.

13.3.1 Bayesian inference with Gibbs sampling

For example, consider the logistic regression model (13.2) analysed above. The notation is slightly changed to follow the Bayesian modelling described here. Now Y denotes the mean default frequency between 1983 and 2000, while X represents the credit ratings, taking values from 1 to 16 in a one-to-one correspondence with the credit ratings Aaa to B3 as used by Moody's. The Bayesian logistic model is specified here hierarchically:

    ln( Y_i/(1 − Y_i) ) | α, β, τ ∼ N(μ_i, τ),   i = 1, 2, …, 16
    μ_i = α + βX_i
    α ∼ N(0, 0.001),   β ∼ N(0, 0.001)
    τ ∼ Gamma(3, 1)
    σ = 1/√τ.

The second parameter of the Gaussian distribution, such as τ above, is the precision parameter, that is, the inverse of the variance, σ² here. Hence a small precision means a very large standard deviation, and therefore the prior distributions that we assumed for the regression parameters α and β reflect our ignorance about their values. The same holds for the gamma prior distribution of the precision parameter τ. Expert opinion can be incorporated into this type of modelling by imposing more concentrated priors. For example, a downturn in the economy that may lead to a general increase in defaults for all rating categories is equivalent to an upward shift of the intercept and maybe also of the slope of the logistic curve. These changes can be inserted into the model by changing the priors for α and β to plausible ranges.
The inference is extracted here based on a sample of 10,000 iterations after the convergence criteria are passed. The whole exercise can be executed in less than one minute on a standard PC. A useful rule of thumb that can be used to assess the accuracy of the posterior estimates is to calculate the Monte Carlo error for each parameter; this should be less than 5% of the sample standard deviation. The inference is reported in Table 13.3. The analyst can use −16.48 as an estimate for α, 0.942 for β and 2.1 for σ. In a Bayesian inferential set-up there is a concept similar to confidence intervals, called credibility intervals [Carlin and Louis (1996)], given by the posterior estimates of the 2.5% and 97.5% quantiles. The credibility intervals provided in Table 13.3 show that all parameters are significant.
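WinBUGS fits this model by Gibbs sampling; the same posterior can also be explored with a short random-walk Metropolis sampler, which makes the hierarchy explicit. A minimal self-contained Python sketch: the data vectors are the observed categories from Table 13.1, while the proposal scales and chain lengths are ad-hoc choices, so the output will match Table 13.3 only approximately.

    import numpy as np

    rating = np.array([4, 8, 9, 10, 11, 12, 13, 14, 15, 16], dtype=float)
    md = np.array([0.0008, 0.0006, 0.0006, 0.0046, 0.0069, 0.0063,
                   0.0239, 0.0379, 0.0796, 0.1289])
    y = np.log(md / (1.0 - md))                       # logit of mean default frequency

    def log_post(a, b, log_tau):
        tau = np.exp(log_tau)
        resid = y - (a + b * rating)
        loglik = 0.5 * len(y) * np.log(tau) - 0.5 * tau * np.sum(resid**2)
        logprior = (-0.5 * 0.001 * (a**2 + b**2)      # N(0, precision 0.001) on alpha, beta
                    + 3.0 * log_tau - tau)            # Gamma(3,1) on tau, incl. log-Jacobian
        return loglik + logprior

    rng = np.random.default_rng(0)
    state = np.array([0.0, 0.0, 0.0])                 # (alpha, beta, log tau)
    scales = np.array([0.8, 0.08, 0.3])               # ad-hoc random-walk proposal scales
    draws, lp = [], log_post(*state)
    for it in range(60_000):
        prop = state + scales * rng.standard_normal(3)
        lp_prop = log_post(*prop)
        if np.log(rng.uniform()) < lp_prop - lp:      # Metropolis accept/reject
            state, lp = prop, lp_prop
        if it >= 10_000:                              # discard burn-in
            draws.append(state.copy())
    draws = np.array(draws)
    sigma_draws = 1.0 / np.sqrt(np.exp(draws[:, 2]))
    print(draws[:, :2].mean(axis=0), np.median(sigma_draws))

The sketch is meant to expose the mechanics rather than replicate the WinBUGS output digit for digit; the posterior summaries will land near those in Table 13.3 only to the extent that the same data treatment as in the text is used.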


Table 13.3: Posterior estimates of the mean, standard deviation, median and quantiles for the Bayesian logistic model. All credit ratings used here are for corporates rated by Moody's over the period 1993 to 2000.

        mean     s.d.     MC error   2.5%     median   97.5%
    α   -16.48   1.138    0.008015   -18.7    -16.49   -14.2
    β   0.9419   0.117    8.29E-4    0.7065   0.9428   1.168
    σ   2.143    0.3604   0.002773   1.576    2.097    2.968

The power of MCMC techniques lies in the ability to visualise the whole posterior distribution of the parameters of interest. This can be helpful to identify distributions with more than one mode, in which case maximum likelihood estimation would be misleading, or to identify problems with skewness, kurtosis or fat tails.
A major problem in calibrating the default frequencies to credit ratings is the transformation onto a suitable scale of the endogenous variable Y = MD, representing the mean default frequency. The logistic transformation is the one usually applied first, but other transformations are available. Hence, the next Bayesian model investigated here is based on a log(−log) transformation. This model is again specified hierarchically as

    log(−log(Y_i)) | α, β, τ ∼ N(μ_i, τ),   i = 1, 2, …, 16                        (13.3)
    μ_i = α + βX_i                                                                 (13.4)
    α ∼ N(0, 0.001),   β ∼ N(0, 0.001)                                             (13.5)
    τ ∼ lognormal(0, 0.001)                                                        (13.6)
    σ = 1/√τ.                                                                      (13.7)

Here a different prior distribution has been used for τ, a parameter with positive support, mainly to show the wide range of distributions that can be combined with MCMC techniques. There is little to choose between a gamma distribution and a lognormal distribution at this second layer of modelling. The convergence checks are not included here for lack of space; only the inference results are presented, in Table 13.4. Nevertheless, the two models can be compared using the Deviance Information Criterion (DIC) developed by [Spiegelhalter et al. (2002)] as a yardstick. This measure takes into consideration the model complexity and is based on the posterior mean of the deviance (that is, of −2 × log-likelihood) plus the effective number of parameters (pD), defined as the posterior mean of the deviance D̄ minus the deviance at the posterior means, D̂.


Table 13.4: Posterior estimates of the mean, standard deviation, median and quantiles for the Bayesian log-log model. All credit ratings used are for corporates rated by Moody's over the period 1993 to 2000.

        mean      s.d.    MC error   2.5%     median    97.5%
    α   3.067     0.156   8.03E-4    2.754    3.068     3.379
    β   -0.1323   0.017   7.69E-5    -0.164   -0.1323   -0.1001
    σ   0.2897    0.059   3.47E-4    0.2013   0.2808    0.4312

Table 13.5: Comparison of the DIC measure for the Bayesian logistic regression model and the Bayesian log(-log) regression model. All credit ratings used are for corporates rated by Moody's over the period 1993 to 2000.

    Model       D̄        D̂        pD      DIC
    logistic    74.835   72.054   2.781   77.617
    log(-log)   5.146    1.988    3.158   8.304

The model with the smallest Deviance Information Criterion is estimated to be the model that would best predict a replicate dataset of the same structure as that currently observed. Considering the results presented in Table 13.5, it seems that the log(−log) regression model provides the better fit. Other transformations such as the probit can be considered in a similar manner, and all three transformations can be varied with different prior continuous distributions having positive support, not only the gamma and the lognormal presented here.
Let us see the comparison between the fit of the two models and the observed values. Figure 13.3 shows that the Bayesian log(−log) model is more conservative while the Bayesian logistic model tends to overestimate the probabilities of default. However, if the last credit rating category B3 is left out of the analysis, then the Bayesian logistic regression fits the data really well. In any case, the models used aggregated default data and important information may be lost in that process.
This analysis reveals that model identification risk, that is, selecting the appropriate model from a set of suitable models, is very important and may have a clear impact on the financial decision process. When the models are used to produce a vector of numerical results it is usually difficult to decide which models perform better. In this small example, the logistic link model seems to be more conservative than the log-log link model for the lower rating categories and riskier for the middle rating categories.
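The DIC bookkeeping described above is a one-liner once the MCMC output is available. A minimal sketch, assuming dev_draws holds the deviance (−2 × log-likelihood) evaluated at each posterior draw and dev_at_post_mean is the deviance evaluated at the posterior means of the parameters; both names are placeholders for actual MCMC output.

    import numpy as np

    def dic(dev_draws, dev_at_post_mean):
        """Deviance Information Criterion: DIC = Dbar + pD with pD = Dbar - Dhat."""
        d_bar = np.mean(dev_draws)          # posterior mean of the deviance
        p_d = d_bar - dev_at_post_mean      # effective number of parameters
        return d_bar + p_d, p_d

    # Reproducing the arithmetic behind Table 13.5 for the logistic model:
    dic_value, p_d = dic(dev_draws=np.array([74.835]), dev_at_post_mean=72.054)
    print(dic_value, p_d)                   # 77.616, 2.781 (up to rounding)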


[Fig. 13.3: Comparison of calibration results for default probabilities: Bayesian log-log and Bayesian logistic models versus observed. All credit ratings used are for corporates rated by Moody's over the period 1993 to 2000.]

13.4 Hierarchical Bayesian Models for Credit Risk

It is clear that one can harness the power of Bayesian modelling and MCMC inference to develop models for more complex problems related to credit risk management. The hierarchical Bayesian models presented below continue the line of the non-Bayesian specifications from [Kao and Wu (1990)], [Terza (1987)] and [Hsiao (1983)]. The ordered probit model has been used, among others, by [Nickell et al. (2000)] to explore the dependence of rating transition probabilities on business cycles and on other characteristics of the borrowers, and by [Cheung (1996)] to explain rating


levels based on indebtedness. [Gossl (2005)] proposed an extension of Merton's model for credit default fitted in a Bayesian framework, able to capture correlations between the default probabilities of obligors from different rating classes. Our Bayesian models, while offering similar benefits by estimating the entire joint posterior distribution of default probabilities, are different in that they model explicitly the impact of rating on default probabilities.

13.4.1 Model specification of probabilities of default

Consider a population of B borrowers indexed by j = 1, …, B. Let Z_j be a binary variable taking the value 0 if borrower j defaulted, and 1 otherwise. Let x_j ∈ R^d be a covariate vector for borrower j. The covariate information is usually borrower specific (for example, ratings), but it could also consist of general economic indicators. In particular, the first component of each x will typically be one, denoting the presence of an intercept term. We shall denote the probability of default by p(x) = Pr(Z = 0; x). A general approach to modelling the probability of default is to consider p(x) = ϕ(β^T x), where ϕ : R → [0, 1] is a link function and β ∈ R^d is a vector of regression parameters. Usually ϕ is chosen to be a cumulative distribution function because its resulting monotonicity ensures that covariate effects are easily interpretable. One example of such a function is the probit link Φ(t) = Pr(N(0, 1) < t), defined by the cumulative distribution function of the standard Gaussian distribution. Examples of other link functions are the logit and the log(-log) links, which were applied in the example at the beginning of this chapter.
In particular, I am interested in modelling the effect of an ordinal covariate representing the rating category. A common approach is to assume the existence of an underlying (or latent) continuous variable for the ordinal indicator; for example, this gives rise to the probit model when the ordinal indicator is the dependent variable. In the case when the ordinal indicator is the explanatory variable, it is often either replaced by a set of dummy variables, or used itself as a regressor. [Kukuk (2002)] shows that both approaches could lead to wrong answers when assessing whether the corresponding continuous latent variable has a significant influence on the dependent variable or not. [Hsiao and Mountain (1985)] proposed a linear model with ordinal covariates based on latent variables with known thresholds, such as is the case, for example, with grouped income data. [Ronning and Kukuk (1996)] relaxed this strong assumption and further discussed a

page 295

April 28, 2015

12:28

296

BC: 9524 - Model Risk in Financial Markets

Carte˙main˙WS

Model Risk in Financial Markets

model where both dependent and explanatory variables are ordinal, where the unknown thresholds are estimated jointly with the structural parameters in a two–stage procedure. This approach relies on a set of distributional assumptions for the variables, and estimates can be biased if these assumptions are not met. Moreover, neither of these models is specified in a Bayesian framework, therefore they cannot easily incorporate subjective expert information. Here latent variables with unknown thresholds are used for modelling the ordinal covariate indicators in a hierarchical Bayesian framework. Without loss of generality it is assumed that the covariate information is solely given by the rating category. Let n be the number of rating categories, and let Cj be the rating category for borrower j, with j = 1, . . . , B. The random variables Cj are ordinal and observable for each borrower, so that the covariate vector for j is given by xj = (1, Cj ). Our goal is to model the probability of default in each rating category i, defined by pi = Pr(Z = 0|C = i), for i = 1, . . . , n. The main assumption is that the category variable C is an indicator of the event that some unobservable continuous variable, say R, lies between certain thresholds. Specifically, let γ1 , γ2 , . . . , γn+1 be a set of unknown thresholds. A corporate bond issue belongs to a certain risk category, say C = i, if the latent variable R falls in the interval (γi , γi+1 ). It is expected that the issuers in a given risk category i will exhibit roughly the same expected default risk. Also, the widths of the risk category intervals need not be equal, and in practice the interval for Aaa bonds may have a different length than the interval for Bb bonds. For i = 1, . . . , n, let mi be the number of issuers and Yi the number of defaults in rating category i. We shall consider the following model: Yi | mi , pi ∼ Binomial(mi , pi ),

i = 1, . . . , n

(13.8)

logit(pi ) = β0 + β1 Ri + bi Ri | γi , γi+1 ∼ U (γi , γi+1 ) bi | σ 2 ∼ N (0, σ 2 ) γi+1 = γi + zi The random variables Ri have a uniform distribution on (γi , γi+1 ) and represent the latent effect of category ratings. The uniform is a special case of the generalized beta distribution; for comparison, in the application described in Section 3 we also fitted the model (13.8) with the assumption

page 296

April 30, 2015

14:13

BC: 9524 - Model Risk in Financial Markets

MCMC Estimation of Credit Risk Measures

Carte˙main˙WS

297

The uniform is a special case of the generalized beta distribution; for comparison, in the application described in Sec. 13.5 we also fitted the model (13.8) with the assumption of a generalized beta distribution for R_i, and this led to very similar results. The random effects b_i are assumed to have a Gaussian distribution with mean zero and standard deviation σ. This choice of the Gaussian distribution is motivated by mathematical convenience, but other choices are also possible. The increments z_i between the unknown thresholds must be positive, and in practice they will be given a gamma prior distribution as described in the following section.

Note that different link functions may be used in model (13.8). Alternatives to the logit link are the probit and log(-log) functions. Specifically, for the probit link we replace logit(p_i) = β_0 + β_1 R_i + b_i in (13.8) with

    p_i = Φ(β_0 + β_1 R_i + b_i),        (13.9)

where Φ(x) is the cumulative distribution function of the standard Gaussian distribution. Similarly, for the log(-log) link, we take

    log(−log(p_i)) = β_0 + β_1 R_i + b_i.        (13.10)
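To make the three link specifications concrete, here is a minimal numerical sketch of the maps from the linear predictor η = β_0 + β_1 R_i + b_i to a default probability; the function names are mine, chosen for illustration only.

```python
import numpy as np
from scipy.stats import norm

def p_logit(eta):
    # logistic link: logit(p) = eta  =>  p = 1 / (1 + exp(-eta))
    return 1.0 / (1.0 + np.exp(-eta))

def p_probit(eta):
    # probit link (13.9): p = Phi(eta)
    return norm.cdf(eta)

def p_loglog(eta):
    # log(-log) link (13.10): log(-log(p)) = eta  =>  p = exp(-exp(eta))
    return np.exp(-np.exp(eta))

# All three maps are monotone in eta (the log(-log) link is decreasing
# while the other two are increasing); they differ mostly in the tails,
# which is what drives the small differences across the fitted models.
eta = np.linspace(-9.0, -1.0, 5)
for f in (p_logit, p_probit, p_loglog):
    print(f.__name__, np.round(f(eta), 6))
```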

We shall exemplify the use of the three link functions in the analysis of the Standard&Poor's data in the next section. The model is flexible enough to also allow for more complex dependence patterns; for example, we show in Sec. 13.5.3 how an autoregressive structure can be incorporated in the model, making it suitable for the analysis of time series data.

13.4.2 Model estimation

The hierarchical model (13.8) specified in the previous subsection may be estimated from sample data in a Markov Chain Monte Carlo (MCMC) framework. Let us denote by θ the vector of parameters and hyperparameters of the model, given by θ = (γ_1, z_1, …, z_n, β_0, β_1, σ²). A Bayesian specification requires prior distributions to be chosen for all parameters and hyperparameters in the hierarchy. Let p(θ) be the probability density of the prior distribution of θ, and let Y be a shorthand notation for the sample data. The joint posterior density p(θ | Y) of all parameters given the observed data is proportional to the product of the likelihood function and the prior density:

    p(θ | Y) ∝ p(Y | θ) · p(θ)        (13.11)

I assume that the model parameters are a priori independent, so that the prior density p(θ) is a product of prior densities for each parameter.
With little external information available, in general I would like to specify non-informative priors p(·) for the components of θ. For instance, in the application described below I specify Gaussian priors with large variance, N(0, 10³), for the regression parameters β_0 and β_1, gamma priors with large variance for the z_i, and a diffuse inverse gamma prior for σ². However, if expert opinion is available, it can be easily incorporated into the analysis by specifying more concentrated priors. This type of external information can be incorporated as subject-specific elicitation by changing the priors for β_0 and β_1 to plausible ranges.

The first model, with a logistic link function, is described by the following general system of equations. For all i = 1, 2, …, n:

    Y_i ∼ Binomial(m_i, p_i)        (13.12)
    logit(p_i) = β_0 + β_1 R_i + b_i
    b_i ∼ N(0, σ²),   σ² = 1/τ
    R_i = γ_i + z_i U_i,   γ_{i+1} = γ_i + z_i
    z_i ∼ Gamma(δ, δ),   U_i ∼ U(0, 1)
    γ_1 ∼ Gamma(α, α),   τ ∼ Gamma(u, v)
    β_0 ∼ N(0, τ_0),   β_1 ∼ N(0, τ_1)

The specification of the gamma distribution with equal parameters for γ_1 and for the increments z_i implies that the mean of those variables is one and the precision is equal to α for the former and δ for the latter. The joint posterior density for the first model can be calculated as follows:

    p(p_1, …, p_n, z_1, …, z_n, β_0, β_1, τ, γ_1 | Y)
      ∝ ∏_{i=1}^n p_i^{Y_i} (1 − p_i)^{m_i − Y_i} · τ^n e^{−(τ/2) Σ_{i=1}^n b_i²} · ∏_{i=1}^n [δ^δ/Γ(δ)] z_i^{δ−1} e^{−δ z_i}
        × e^{−(τ_0/2) β_0²} · e^{−(τ_1/2) β_1²} · [v^u/Γ(u)] e^{−vτ} τ^{u−1} · [α^α/Γ(α)] e^{−α γ_1}        (13.13)

Substituting logit(p_i) = β_0 + β_1 R_i + b_i and R_i = γ_1 + Σ_{j=1}^{i−1} z_j + z_i u_i, this becomes

      ∝ ∏_{i=1}^n { e^{[β_0 + β_1(γ_1 + Σ_{j=1}^{i−1} z_j + z_i u_i) + b_i] Y_i} / [1 + e^{β_0 + β_1(γ_1 + Σ_{j=1}^{i−1} z_j + z_i u_i) + b_i}]^{m_i} } · τ^n e^{−(τ/2) Σ_{i=1}^n b_i²}
        × [δ^{nδ}/Γ(δ)^n] e^{−δ Σ_{i=1}^n z_i} ∏_{i=1}^n z_i^{δ−1} · e^{−(τ_0/2) β_0²} · e^{−(τ_1/2) β_1²} · [v^u/Γ(u)] e^{−vτ} τ^{u−1} · [α^α/Γ(α)] e^{−α γ_1}

where we made use of the evident relationship γ_i = γ_1 + Σ_{j=1}^{i−1} z_j.
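The hierarchy in (13.12) is easy to simulate forwards, which is a useful sanity check of the latent-threshold construction before running the Gibbs sampler. Below is a minimal sketch; the hyperparameter values are illustrative choices of mine, not values from the text (β_0 and β_1 are set close to the posterior means reported later in Table 13.6).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 7                      # number of rating categories
m = np.full(n, 1000)       # issuers per category (illustrative)
alpha, delta = 1.0, 1.0    # gamma hyperparameters (illustrative)
u, v = 1.0, 0.1            # hyperparameters of tau ~ Gamma(u, v)
beta0, beta1 = -11.9, 1.6  # illustrative regression parameters

# Latent thresholds: gamma_1 ~ Gamma(alpha, rate=alpha),
# increments z_i ~ Gamma(delta, rate=delta); NumPy uses scale = 1/rate.
gamma1 = rng.gamma(alpha, 1.0 / alpha)
z = rng.gamma(delta, 1.0 / delta, size=n)
gammas = gamma1 + np.concatenate(([0.0], np.cumsum(z)[:-1]))  # gamma_i

# R_i = gamma_i + z_i * U_i, i.e. R_i ~ U(gamma_i, gamma_{i+1})
R = gammas + z * rng.uniform(size=n)

# Random effects b_i ~ N(0, 1/tau) with tau ~ Gamma(u, rate=v)
tau = rng.gamma(u, 1.0 / v)
b = rng.normal(0.0, np.sqrt(1.0 / tau), size=n)

p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * R + b)))  # logit link
Y = rng.binomial(m, p)                              # defaults per category
print(np.round(p, 5), Y)
```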

For the second model, defined by the probit link function p_i = Φ(β_0 + β_1 R_i + b_i), we can derive in a similar manner the joint posterior density

    p(p_1, …, p_n, z_1, …, z_n, β_0, β_1, τ, γ_1 | Y)
      ∝ ∏_{i=1}^n { Φ[β_0 + β_1(γ_1 + Σ_{j=1}^{i−1} z_j + z_i u_i) + b_i]^{Y_i} × Φ[−β_0 − β_1(γ_1 + Σ_{j=1}^{i−1} z_j + z_i u_i) − b_i]^{m_i − Y_i} } · τ^n e^{−(τ/2) Σ_{i=1}^n b_i²}
        × [δ^{nδ}/Γ(δ)^n] e^{−δ Σ_{i=1}^n z_i} ∏_{i=1}^n z_i^{δ−1} · e^{−(τ_0/2) β_0²} · e^{−(τ_1/2) β_1²} · [v^u/Γ(u)] e^{−vτ} τ^{u−1} · [α^α/Γ(α)] e^{−α γ_1}

For the third model, based on the log(-log) link function p_i = e^{−e^{β_0 + β_1 R_i + b_i}}, the joint posterior density is given by

    p(p_1, …, p_n, z_1, …, z_n, β_0, β_1, τ, γ_1 | Y)
      ∝ ∏_{i=1}^n { e^{−Y_i e^{β_0 + β_1(γ_1 + Σ_{j=1}^{i−1} z_j + z_i u_i) + b_i}} · [1 − e^{−e^{β_0 + β_1(γ_1 + Σ_{j=1}^{i−1} z_j + z_i u_i) + b_i}}]^{m_i − Y_i} } · τ^n e^{−(τ/2) Σ_{i=1}^n b_i²}
        × [δ^{nδ}/Γ(δ)^n] e^{−δ Σ_{i=1}^n z_i} ∏_{i=1}^n z_i^{δ−1} · e^{−(τ_0/2) β_0²} · e^{−(τ_1/2) β_1²} · [v^u/Γ(u)] e^{−vτ} τ^{u−1} · [α^α/Γ(α)] e^{−α γ_1}

The Gibbs sampler requires all the conditional posterior distributions, and these can be derived from the joint posterior distribution. For example, from the general expression (13.13) for the logistic link model we can derive the closed form density of the conditional posterior distribution of τ. Thus

    p(τ | …) ∝ τ^n τ^{u−1} e^{−τ [v + Σ_{i=1}^n b_i²/2]} ∝ Gamma(n + u, v + Σ_{i=1}^n b_i²/2).

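Because this full conditional is a Gamma density, the Gibbs update for τ is a single draw. A minimal sketch (note that NumPy parameterises the gamma by shape and scale, so the rate v + Σ b_i²/2 must be inverted):

```python
import numpy as np

def update_tau(b, u, v, rng):
    """Gibbs update: tau | ... ~ Gamma(n + u, rate = v + sum(b_i^2)/2)."""
    n = len(b)
    rate = v + 0.5 * np.sum(b ** 2)
    return rng.gamma(shape=n + u, scale=1.0 / rate)

rng = np.random.default_rng(0)
b = rng.normal(0.0, 0.5, size=7)   # illustrative random effects
print(update_tau(b, u=1.0, v=0.1, rng=rng))
```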

The conditional posterior densities of the other parameters and hyperparameters are given by the following formulae:

    p(b_i | …) ∝ e^{b_i Y_i} e^{−(τ/2) b_i²} / [1 + e^{β_0 + β_1(γ_1 + Σ_{j=1}^{i−1} z_j + z_i u_i) + b_i}]^{m_i}

    p(γ_1 | …) ∝ e^{β_1 γ_1 Σ_{i=1}^n Y_i − α γ_1} / ∏_{i=1}^n [1 + e^{β_0 + β_1(γ_1 + Σ_{j=1}^{i−1} z_j + z_i u_i) + b_i}]^{m_i}

    p(z_1, …, z_n | …) ∝ [ ∏_{i=1}^n e^{β_1 Y_i (Σ_{j=1}^{i−1} z_j + z_i u_i)} z_i^{δ−1} e^{−δ z_i} ] / ∏_{i=1}^n [1 + e^{β_0 + β_1(γ_1 + Σ_{j=1}^{i−1} z_j + z_i u_i) + b_i}]^{m_i}

    p(β_0 | …) ∝ e^{β_0 Σ_{i=1}^n Y_i} e^{−τ_0 β_0²/2} / ∏_{i=1}^n [1 + e^{β_0 + β_1(γ_1 + Σ_{j=1}^{i−1} z_j + z_i u_i) + b_i}]^{m_i}

    p(β_1 | …) ∝ e^{−τ_1 β_1²/2} e^{β_1 Σ_{i=1}^n Y_i (γ_1 + Σ_{j=1}^{i−1} z_j + z_i u_i)} / ∏_{i=1}^n [1 + e^{β_0 + β_1(γ_1 + Σ_{j=1}^{i−1} z_j + z_i u_i) + b_i}]^{m_i}.

Sampling from the conditional posterior distributions can be realized either using a griddy Gibbs approach [Ritter and Tanner (1992)], or an adaptive sampling algorithm [Gilks and Wild (1992)] and [Wild and Gilks (1993)]. The marginal posterior density of each parameter is obtained by integrating out the other parameters from the joint posterior density given by (13.11). This is difficult to achieve analytically, therefore I propose the use of Gibbs sampling [Geman and Geman (1984)] and its upgraded form, Adaptive Rejection Sampling [Gilks and Wild (1992)], to generate the marginal posterior distributions. For all applications I have used the WinBUGS package.

For all parameters, 95% credible intervals can be computed from the samples of observations generated from the posterior densities, and these can then be used in testing specific hypotheses about the parameters. As outlined above, following [Spiegelhalter et al. (2002)], I shall use the Deviance Information Criterion (DIC) to choose among different models fitted to the same data set. The model with the smallest DIC is estimated to be the model that would best predict a replicate dataset of the same structure as the data actually observed.
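As a sketch of how the DIC comparison can be reproduced from MCMC output: the deviance is minus twice the binomial log-likelihood, evaluated at each posterior draw and at the posterior mean of the parameters. The helpers below are an illustration under these assumptions, not WinBUGS code.

```python
import numpy as np
from scipy.stats import binom

def deviance(y, m, p):
    """Binomial deviance: -2 * log-likelihood at default probabilities p."""
    return -2.0 * np.sum(binom.logpmf(y, m, p))

def dic(y, m, p_draws):
    """p_draws: posterior draws of (p_1, ..., p_n), shape (draws, n)."""
    d = np.array([deviance(y, m, p) for p in p_draws])
    d_bar = d.mean()                                 # posterior mean deviance
    d_hat = deviance(y, m, p_draws.mean(axis=0))     # deviance at posterior mean
    p_d = d_bar - d_hat                              # effective number of parameters
    return d_bar + p_d                               # DIC = Dbar + pD
```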

13.5 Standard&Poor's Rating Data

13.5.1 Data description

I exemplify the methodology developed in the previous section with the analysis of rating data from Standard&Poor's (S&P). Investigating the relationship between default probabilities and credit ratings based on historical data from rating agencies is an insightful endeavour. This is partly due to the fact that rating agencies such as S&P have rating systems differing from the internal systems of most large banks. Many banks estimate default risk over a short horizon of typically one year, while S&P estimates default risk over a long horizon varying by borrower. Moreover, the agencies assign ratings based on a "stress scenario" for the borrowers. As [Carey and Hrycay (2001)] note, that estimate is close to the estimate of the borrower's default probability at the time of rating assignment only if the borrower is already in the stress scenario. The agencies' rating procedures assign high risk grades such as C and B to borrowers that are already in a risky condition. This situation induces an asymmetry between the rating and the associated probability of default that requires careful modelling.

The source of the data set that I analyze here is the CreditPro 7.0 database of long-term local currency issuer credit ratings. The data pooled the information from 11,150 companies that were rated by S&P as of 31 December 1980, or that were first rated between that date and 31 December 2004. Among these rated issuers there were industrials, utilities, financial institutions, and insurance companies around the world. Public information ratings as well as ratings based on the guarantee of another company were not taken into consideration. The data also did not include structured finance vehicles, public-sector issuers, subsidiaries whose debt was fully guaranteed by a parent or whose default risk was considered identical to that of their parents, as well as sovereign issuers.

The data set contains information on seven rating categories: AAA, AA, A, BBB, BB, B, and CCC/C. The number of issuers and the number of defaults in each rating category are available for every year during the 24-year horizon. The classification of an issuer into a credit rating category reflects S&P's opinion of a company's overall capacity to pay its obligations, focusing on the obligor's ability and willingness to meet its financial commitments on a timely basis. It is hoped that the rating generally indicates the likelihood of default regarding all financial obligations of the firm. Note, however, that a company may not have rated debt but a probability of default may still be needed for various financial calculations.

The definition of default varies from one credit rating agency to another. In this data set, a default has been recorded by Standard&Poor's upon the first occurrence of a payment default on any financial obligation, rated or unrated, other than a financial obligation subject to a bona fide commercial dispute (an interest payment missed on the due date but made within the grace period does not count as a default). The analysis presented here is mainly pedagogical, highlighting the problems that appear when calibrating credit risk models.

13.5.2 Hierarchical model for aggregated data

The first analysis of the Standard&Poor’s data considers the aggregate number of defaults over the horizon 1981–2004. The credit risk industry is divided on the issue whether ratings are cross–sectionally independent. While this can be a matter for debate, here the view of Credit Suisse Financial Products (1997), [Wilson (1997)] and [Nickell et al. (2000)] is taken that independence can be assumed, at least in the first instance. To the aggregate data I fit the model (13.8) characterised by the logistic link, as well as two other similar models using the probit and log(-log) link functions specified by (13.9) and (13.10). Using standard Bayesian applied statistical modelling I utilise diffuse but proper priors for all parameters. Hence, N (0, 103 ) priors are taken for the regression parameters β0 and β1 . To investigate the impact of the choice of prior variance, I carried out a sensitivity analysis by running the chains with different prior Gaussian distributions with variances ranging between 103 and 106 . The results were broadly similar, hence here I report only the summary statistics based on the runs with N (0, 103 ) priors. I also specified a gamma prior with large variance Gamma(1, 0.1) for zi , i = 1, . . . , 7, and a diffuse inverse gamma prior Inv − Gamma(1, 0.1) for the random effects variance σ 2 . For each model two parallel chains were started with different sets of initial values, and the Gibbs sampler was run for 50,000 iterations with the first 20,000 iterations discarded as a burn–in period. Gelman and Rubin’s diagnostic ([Gelman et al. (1995)]) indicated satisfactory convergence of the chains. After convergence, inference on the parameters of interest β0 , β1 , σ 2 and p1 , . . . , p7 was based on the pooled sample iterations of both chains. 2 This is not true when an interest payment missed on the due date is made within the grace period.


Fig. 13.4: Posterior kernel density estimates for investment grade default probabilities using the logistic link model and the S&P data for the aggregate number of defaults over the horizon 1981–2004.

Figures 13.4 and 13.5 give the posterior density estimates for the default probabilities p_1, …, p_7 in the model with the logistic link function. These were computed using a kernel density estimator with bandwidth equal to a quarter of the range of each variable. Remark that while the likelihood distribution is the same for all ratings, the posterior distribution looks very different across ratings, ranging from one-tailed distributions for higher ratings, for which very little or no observed default data was available, to more symmetric distributions for lower credit ratings, where more data was available. The posterior distributions seem to separate the ranges of posterior probabilities of default, but clearly there is estimation uncertainty, which can be accounted for easily in a Bayesian set-up.

Fig. 13.5: Posterior kernel density estimates for non-investment grade default probabilities using the logistic link model and the S&P data for the aggregate number of defaults over the horizon 1981–2004.

The posterior means, standard errors, medians, and 95% credible intervals for all model parameters are reported in Table 13.6. None of the 95% credible intervals contain zero and therefore all parameters are highly significant.

Table 13.6: Bayesian MCMC posterior estimates from the Standard&Poor's data for the aggregate number of defaults over the horizon 1981–2004. The hierarchical model (13.8) is estimated with the logit, probit, and log(-log) link functions.

                                Mean      Standard deviation   Median    95% credible interval (2.5%, 97.5%)
Logit link (DIC = 42.607)
  β0                          -11.930         0.640           -11.930    (-13.060, -10.770)
  β1                            1.630         0.128             1.617    (1.391, 1.877)
  σ²                            0.389         0.195             0.3395   (0.164, 0.899)
Probit link (DIC = 42.974)
  β0                           -5.031         0.375            -5.103    (-5.622, -4.261)
  β1                            0.654         0.051             0.647    (0.573, 0.759)
  σ²                            0.275         0.107             0.251    (0.140, 0.544)
Log-log link (DIC = 39.913)
  β0                          -11.560         0.720           -11.610    (-13.140, -10.130)
  β1                            2.118         0.199             2.079    (1.809, 2.499)
  σ²                            0.373         0.177             0.330    (0.162, 0.819)

The models can be compared using the deviance information criterion (DIC). The values for this criterion, also reported in Table 13.6, show that the third model, based on the log(-log) link, has the best fit for predictive purposes. However, the differences in the DIC values are so small that from a practical perspective all three models have comparable goodness-of-fit.

Statistics of the posterior probability distributions have also been computed for the probabilities of default p_1, …, p_7 in each rating category. Table 13.7 reports the means, standard errors, medians, and 95% credible intervals for p_i, i = 1, …, 7, under all three models. The posterior means of p_1, …, p_7 differ little between models with different link functions, especially for default probabilities corresponding to the lower rating categories.

page 305

April 28, 2015

12:28

BC: 9524 - Model Risk in Financial Markets

306

Carte˙main˙WS

Model Risk in Financial Markets

Table 13.7: Bayesian MCMC posterior estimates of default probabilities p1 , . . . , p7 for each rating category, obtained from fitting the Bayesian hierarchical model to the Standard&Poor’s 1981–2004 aggregated data with the logit, probit, and log–log link functions. Default prob

Mean

Standard deviation

Median

Credibility intervals (2.5% − −97.5%)

p1

5.453e-5

5.419e-5

3.974e-5

(4.22e-6, 1.85e-4)

p2

1.295e-4

8.628e-5

1.091e-4

(2.42e-5, 3.42e-4)

Logit link

p3

4.295e-4

1.452e-4

4.113e-4

(1.93e-4, 7.46e-4)

p4

0.002888

4.327e-4

0.002868

(0.0021, 0.0038)

p5

0.01197

0.001034

0.01194

(0.0100, 0.0140)

p6

0.05708

0.002281

0.05704

(0.0528, 0.0616)

p7

0.2879

0.01258

0.2876

(0.2637, 0.3122)

p1

2.118e-5

4.185e-5

6.174e-6

(5.77e-8, 1.34e-4)

p2

1.135e-4

9.24e-5

9.023e-5

(9.76e-6, 3.45e-4)

p3

4.212e-4

1.428e-4

4.029e-4

(1.92e-4, 7.48e-4)

p4

0.002932

4.374e-4

0.002912

(0.0021, 0.0038)

p5

0.01202

0.001055

0.01196

(0.0100, 0.0142)

p6

0.05706

0.002284

0.05702

(0.0526, 0.0617)

p7

0.2876

0.01287

0.2874

(0.2625, 0.313)

p1

4.161e-5

3.393e-5

3.225e-5

(6.46e-6, 1.31e-4)

p2

1.226e-4

7.706e-5

1.041e-4

(2.73e-5, 3.22e-4)

p3

4.67e-4

1.535e-4

4.508e-4

(2.156e-4, 8.13e-4)

p4

0.002871

4.253e-4

0.002854

(0.0021, 0.0037)

p5

0.01197

0.001038

0.01195

(0.0099, 0.0141)

p6

0.05708

0.002223

0.05705

(0.0528, 0.0615)

p7

0.2878

0.01263

0.2877

(0.2634, 0.3126)

Probit link

Log–log link

This is consistent with the fact that all three models have similar predictive power, as indicated by the DIC values.

Note also that all three models estimate positive probabilities of default for superior rating categories such as AAA and AA. Recent events in the corporate world strongly suggest that even AAA companies may be faced with sudden collapse, and thus even the AAA category should not be considered a riskless rating class. For analysis purposes, the lack of observed default data for superior ratings should not impose a zero default probability for these ratings, and the models developed here are flexible enough to take this into account. Furthermore, the model estimation risk associated with the probabilities of default for different credit ratings is captured here elegantly with the 95% credible intervals.

Fig. 13.6: Posterior kernel density estimate for the ratio between the cumulative default probability in the speculative grade categories and the cumulative default probability in the investment grade categories. Aggregated Standard&Poor’s 1981–2004 data, logistic model.

Once the model has been fitted, it is straightforward to estimate quantities of interest other than the default probabilities for each rating class. For example, interest may focus on the ratio between the cumulative default probability in the speculative grade categories and the cumulative default probability in the investment grade categories, given by

    δ = (p_5 + p_6 + p_7) / (p_1 + p_2 + p_3 + p_4).        (13.14)
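One attraction of the MCMC output is that the posterior of any derived quantity such as δ comes for free, by transforming the stored draws; a minimal sketch, assuming the draws of p_1, …, p_7 are stored column-wise:

```python
import numpy as np

def delta_ratio(p):
    """p: posterior draws with shape (draws, 7), columns p1..p7."""
    spec = p[:, 4:7].sum(axis=1)   # p5 + p6 + p7 (speculative grades)
    inv = p[:, 0:4].sum(axis=1)    # p1 + p2 + p3 + p4 (investment grades)
    return spec / inv

# d = delta_ratio(p_draws)
# print(d.mean(), np.percentile(d, [2.5, 97.5]))
```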

For the model with the logistic link function, the posterior mean of δ is 103.6 with a 95% credible interval (78.6, 136.4). The posterior density estimate of δ is given in Fig. 13.6. Hence, the probability of default for an investment grade bond is almost 100 times smaller than the probability of default for a speculative grade bond, and the ratio between the two default probabilities may plausibly vary between roughly 60 and 180 times. This variation is nothing but a manifestation of parameter estimation risk.

13.5.3 Hierarchical time-series model

Since the yearly frequencies of defaults in each rating category are available in the S&P ratings data, it is relevant to attempt an analysis that can take into account any serial correlation over consecutive years. Let T be the length of the time horizon (here T = 24), and let t = 1, …, T be the yearly observation times. I extend the notation to let Y_it, m_it and p_it denote respectively the number of defaults, the number of issuers, and the probability of default in rating category i at time t. Then I suggest the following model:

    Y_it | m_it, p_it ∼ Binomial(m_it, p_it),   i = 1, …, n,   t = 1, …, T
    logit(p_it) = β_0 + β_1 R_i + b_it
    b_it = a b_{i(t−1)} + ε_it        (13.15)
    R_i | γ_i, γ_{i+1} ∼ U(γ_i, γ_{i+1})
    γ_{i+1} = γ_i + z_i
    ε_it | σ² ∼ N(0, σ²)

This model extends model (13.8) by using the parameter a ∈ R to account for a possible autoregressive correlation structure of the random terms b_it in (13.15). The rating variables R_i do not depend on time and, as previously, have a uniform distribution on (γ_i, γ_{i+1}). The model specified by (13.15) was fitted to the yearly data using non-informative prior distributions N(0, 10³) for β_0, β_1, and a, and an inverse Gamma(0.1, 0.1) prior for σ. Table 13.8 reports the posterior estimates of all parameters.


Table 13.8: Bayesian estimates for Standard&Poor's yearly rating data on all corporates between 1981–2004; parameters of the time series model (13.15).

Parameter     Mean      SD       Median     95% credible interval
β0          -13.020    0.781    -13.000     (-14.42, -11.21)
β1            1.834    0.120      1.829     (1.619, 2.085)
a             0.159    0.147      0.160     (-0.111, 0.452)
σ²            0.697    0.077      0.694     (0.557, 0.857)

Because the estimated posterior mean of a is 0.159 and the 95% credible interval (−0.111, 0.452) contains zero, I cannot reject the null hypothesis that a = 0, and hence I conclude that under this model specification there is no significant autoregressive structure in the yearly data. As we have seen in Sec. 7.6, many times in Finance the analyst may draw the wrong conclusion because of a spurious relationship. The Bayesian analysis may help to circumvent this problem by allowing us to look at the entire posterior density of a parameter of interest and therefore form a better informed view on the significance of that parameter. It would also be interesting to see the results of the analysis for each credit rating category; for AAA rated companies the Bayesian model is likely to predict positive default probabilities even if there were no observed defaults.

13.5.4 Hierarchical model for disaggregated data

Since the analysis in the previous subsection has not detected any autoregressive correlation structure, it is reasonable to also fit the model (13.8) to the yearly default data, rather than only to the aggregated data for 1981– 2004. This analysis may reveal trends in the probabilities of default for each rating class that are not apparent from the aggregate results. Thus, I applied the model (13.8) to the default data for the 24 yearly observation periods. The non-informative prior distributions on model parameters were chosen as described in Sec. 13.4.1. Convergence of the Markov chains was assessed again after the first 20,000 iterations, and a further 30,000 iterations were retained for inference. Figures 13.7 and 13.8 plot the posterior means of default probabilities in each rating category for each year, together with the 95% credible limits and with the observed values.


Fig. 13.7: Observed values and posterior means with credible intervals for investment grade default probabilities, Standard&Poor’s yearly rating data on all corporates between 1981–2004.


Fig. 13.8: Observed values and posterior means with credible intervals for non-investment grade default probabilities, Standard&Poor’s yearly rating data on all corporates between 1981–2004.


As expected, the estimated default probabilities are very close to the observed default rates, especially for the lower rating categories. The boundaries of the credible intervals for the probabilities of default offer valuable information about parameter estimation risk over time, and they can be very useful in a stress-testing framework. The yearly plots emphasize the increase in the estimated default probabilities for all rating categories in 1990 and 2001. This is consistent with the observed high default rates for speculative grade bonds in these periods, and also with previous results that show high correlations among default probabilities. The matrix of pairwise correlations between p_1, …, p_7 is given in Table 13.9. As expected, the correlations are all positive; they are very high between pairs of adjacent default probabilities and weaker between investment grade and speculative grade default probabilities.

Table 13.9: Bayesian MCMC posterior estimates of correlations of probabilities of default, based on the logistic link model, Standard&Poor's yearly rating data on all corporates between 1981–2004.

        p1      p2      p3      p4      p5      p6      p7
p1     1.000   0.968   0.939   0.584   0.673   0.316   0.152
p2             1.000   0.942   0.576   0.669   0.398   0.241
p3                     1.000   0.679   0.779   0.488   0.363
p4                             1.000   0.766   0.579   0.583
p5                                     1.000   0.598   0.482
p6                                             1.000   0.647
p7                                                     1.000

Bayesian analysis offers great modelling flexibility. I cannot think of other frameworks that would allow the calculation of correlations among default probabilities of different rating grades with such elegant simplicity. It is also important to notice that even when the likelihood parts for different probabilities of default tacitly imply independent stochastic structures across these quantities, it is possible that the posterior inference on all seven parameters indicates the presence of mutual correlations. This is another great advantage of Bayesian modelling: possible lack of knowledge about the independence structures in the model is rapidly corrected when combined with data. Therefore, rather than starting from zero every time the model is calibrated, Bayesian updating offers a mechanism to improve our subjective knowledge as more data becomes available.

13.6 Further Credit Modelling with MCMC Calibration

In this section the aim is to implement the Bayesian credit portfolio model developed by [Gossl (2005)]. Let X_i be the generic one-year asset return. For a portfolio of N credit risky instruments

    X_i = √ρ Y + √(1 − ρ) Z_i,   i = 1, …, N

where ρ ∈ [0, 1], Y ∼ N(0, 1) and the Z_i are i.i.d. N(0, 1); ρ accounts for the intra-portfolio dependencies. Then it is easy to see that

    P(X_i < k_i) = p_i  ⟹  k_i = Φ^{−1}(p_i)

    P(X_i < k_i | Y = y) = p_{i|y}  ⟹  p_{i|y} = Φ( [Φ^{−1}(p_i) − √ρ y] / √(1 − ρ) )

For K rating classes and T years of data define:

• L_t = (L_{t,1}, …, L_{t,K}), the vector of defaults at the end of year t;
• n_t = (n_{t,1}, …, n_{t,K}), the vector of rated issuers at the beginning of year t;
• p = (p_1, …, p_K), the vector of the probabilities of default for each rating category.

Then

    p(L_t = l_t | n_t, p, ρ, y_t) = ∏_{j=1}^K Binomial(n_{t,j}, p_{j|y_t}; l_{t,j})        (13.16)

    p(p, ρ, y | n, l) ∝ ∏_{t=1}^T p(L_t = l_t | n_t, p, ρ, y_t) p(p) p(ρ) p(y)        (13.17)

with priors p_j ∼ U(0, 1) i.i.d., ρ ∼ U(0, 1) and y_t ∼ N(0, 1) i.i.d.
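The mapping from an unconditional default probability p_i to its conditional version p_{i|y} is the computational core of this model; a short sketch with illustrative inputs:

```python
import numpy as np
from scipy.stats import norm

def conditional_pd(p, rho, y):
    """p_{i|y} = Phi((Phi^{-1}(p_i) - sqrt(rho)*y) / sqrt(1 - rho))."""
    k = norm.ppf(p)                                # default threshold k_i
    return norm.cdf((k - np.sqrt(rho) * y) / np.sqrt(1.0 - rho))

p = np.array([1e-4, 1e-3, 1e-2, 0.2])      # unconditional PDs (illustrative)
print(conditional_pd(p, rho=0.07, y=-2.0))  # a negative factor year raises all PDs
```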

Another model that may be suitable is a Bayesian panel count data model, specified as follows:

    L_{t,j} | n, p_{t,j} ∼ Binomial(n_{t,j}, p_{t,j})        (13.18)

    p_{t,j} = Φ( [Φ^{−1}(p_j) − √ρ y_t] / √(1 − ρ) )        (13.19)

    p_j = Φ(α + β j),   y_t ∼ N(0, 1)        (13.20)

    ρ ∼ U(0, 1),   α ∼ N(0, 0.0001),   β ∼ N(0, 0.0001)        (13.21)

The inference results are gathered from applying MCMC; after a 70,000 iteration burn-in period I retain a sample of 20,000. Various posterior estimates for the probabilities of default are presented in Table 13.10. It is reassuring to see that the posterior means and medians obey the rating monotonicity. The advantage of this model is that information on the probabilities is used jointly across time and cross-sectionally across ratings. Furthermore, Bayesian analysis employing MCMC techniques allows gathering additional information on the probabilities of default, such as cross-correlations and credibility intervals. This type of inference would be really hard to obtain with other statistical methods.

Table 13.10: Bayesian MCMC posterior inference for the Bayesian Panel Count Data model using Standard&Poor's yearly rating data on all corporates between 1981–2004.

node    mean        s.d.        2.50%       median      97.50%
α      -5.727      0.1383      -5.996      -5.73       -5.451
β       0.6989     0.02067      0.6581      0.6989      0.7395
p[1]    2.99E-07   2.00E-07     7.18E-08    2.44E-07    8.40E-07
p[2]    8.30E-06   4.05E-06     3.00E-06    7.37E-06    1.86E-05
p[3]    1.49E-04   5.17E-05     7.38E-05    1.40E-04    2.74E-04
p[4]    0.0017     4.17E-04     0.001072    0.0017      0.0027
p[5]    0.0129     0.002154     0.009276    0.0127      0.0178
p[6]    0.0629     0.00745      0.04949     0.0623      0.0792
p[7]    0.2024     0.01778      0.1687      0.2018      0.2393
ρ       0.07335    0.02403      0.03841     0.06918     0.1328

In Fig. 13.9 I present the evolution of the yearly factor y_t between 1981 and 2004. Negative values of the yearly factor are associated with an overall increase of credit risk, whereas positive values correspond to a less risky credit environment. Notice that the yearly factor evolution indicates a stationary pattern and also a mean reverting behaviour around the level of zero. Hence, as of 2004, the positive local trend between 2002 and 2004 may continue for a few more years, but history suggests that a mean reversion is not far away.

Fig. 13.9: The posterior mean and the 2.5%–97.5% credible interval for the yearly factor y_t of the Bayesian Panel Count Data model and Standard&Poor's data between 1981 and 2004.

In Fig. 13.10 I depict the posterior correlations across the estimated probabilities of default. It is evident that there is strong correlation between adjacent parameters, as indicated by the dark region around the main diagonal. There is also stronger correlation between the probabilities of default p_1, p_2, p_3 associated with superior ratings; one possible explanation is the lack of default data in all these categories. Finally, there is very weak correlation between the estimated probabilities of default across the second diagonal, showing that there is less interaction between the investment grade and the speculative grade categories.


Fig. 13.10: Correlation matrix of probabilities of default p1 , p2 , . . . , p7 corresponding to the Standard&Poor’s seven rating categories: AAA, AA, A, BBB, BB, B, and CCC/C. The data used is the Standard&Poor’s yearly rating data on all corporates between 1981–2004.

13.7 Estimating the Transition Matrix

The ultimate concept in credit risk management is the credit transition matrix, since it conveys not only the probabilities of default but also the probabilities of rating upgrade and rating downgrade. The Bayesian modelling can be expanded to cover rating transition matrices.

13.7.1 MCMC estimation

Estimating the rating transition matrix is important in credit markets and it is notoriously difficult, see [Nickell et al. (2000)], [Berd (2005)] and [Engelman and Ermakov (2011)]. Denoting by

• L_{t,i} = (L_{t,i,1}, …, L_{t,i,K}), the vector of the numbers of assets that moved over year t from rating i into each rating category, and
• p_{t,i} = (p_{t,i,1}, …, p_{t,i,K}), the vector of the transition probabilities over year t out of rating category i,

the model is

    p(L_{t,i} = l_{t,i} | n_{t,i}, p_{t,i}) = Multi(n_{t,i}, p_{t,i})        (13.22)

    p_{t,i,j} = α_{t,i,j} / Σ_{j=1}^K α_{t,i,j},   α_{t,i,j} = e^{a_{t,i,j}}        (13.23)

    a_{t,i,j} = b_{i,j} δ_{i,j} / (1 + |i − j|),   a_{t,i,i} = 0        (13.24)

    b_{i,j} ∼ N(0, 0.001)        (13.25)
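The multinomial-logit construction in (13.23) maps unconstrained scores into a proper probability row; a minimal sketch, taking the scores a_{t,i,·} as given:

```python
import numpy as np

def transition_row(a_row, i):
    """Row i of the transition matrix from scores a_{t,i,.} via (13.23)."""
    a = np.asarray(a_row, dtype=float).copy()
    a[i] = 0.0                  # identifiability constraint (13.24): a_{t,i,i} = 0
    alpha = np.exp(a)           # alpha_{t,i,j} = exp(a_{t,i,j})
    return alpha / alpha.sum()  # probabilities sum to one over j

# Strongly negative off-diagonal scores, as in Table 13.11 below,
# concentrate the row mass on staying in the same rating.
print(np.round(transition_row([-9.9, 0.0, -4.8, -15.1, -29.5, -33.7, -51.0, -63.8], i=1), 4))
```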

Table 13.11: Posterior means of parameters b. The values on the main diagonal are not relevant because of the identifiability constraint.

b_{i,j}    AAA       AA        A         BBB       BB        B         CCC       D
AAA           .     -4.952    -15.82    -28.23    -37.35    -62.25    -68.68    -74.78
AA         -9.979      .       -4.821   -15.06    -29.52    -33.69    -50.97    -63.77
A         -22.69    -7.494       .       -5.523   -15.97    -25.29    -39.82    -45.95
BBB       -34.00   -18.04     -6.171       .       -5.905   -14.08    -24.47    -28.34
BB        -38.40   -27.83    -16.35     -5.332       .       -4.664   -13.17    -16.60
B         -66.94   -35.29    -23.74    -16.72     -5.28        .       -5.694    -7.646
CCC       -45.65   -54.20    -25.56    -19.55    -10.80     -3.145       .       -0.9696

The Bayesian MCMC inference results summarised in Table 13.11 reveal that the posterior mean of the parameter b is negative for all possible pairs of ratings. Moreover, the estimates are larger in absolute value for pairs that are more extreme, such as AAA and CCC. For a given rating category, the estimated values of b are monotone in both directions away from the diagonal.

Table 13.12: Posterior medians of transition probabilities using data from Standard&Poor's between 1981–2004.

p_{i,j}   AAA       AA        A         BBB       BB        B         CCC       Default
AAA      0.9160    0.0771    0.0047    8.3E-04   5.6E-04   4.4E-05   6.9E-05   1.0E-04
AA       0.0062    0.9045    0.0812    0.0059    5.8E-04   0.0011    1.9E-04   1.1E-04
A        4.8E-04   0.0216    0.9133    0.0577    0.0045    0.0016    3.2E-04   4.4E-04
BBB      1.9E-04   0.0022    0.0409    0.8964    0.0468    0.0082    0.0019    0.0031
BB       4.0E-04   8.1E-04   0.0036    0.0579    0.8327    0.0809    0.0103    0.0131
B        1.7E-05   7.2E-04   0.0022    0.0031    0.0587    0.8229    0.0477    0.0644
CCC      9.1E-04   9.8E-05   0.0034    0.0042    0.0148    0.1111    0.5348    0.3292


Table 13.12 provides the posterior medians of the elements of the transition matrix with seven ratings. Overall the estimated values are similar to what has been produced previously in the literature on rating transition estimation. Considering this table in more detail, it can be observed that the posterior probabilities preserve the monotonicity across ratings and that the probabilities of default for investment grade ratings are very small. This is not surprising, since the model and the results reflect the fact that the estimation is done over data aggregated over a long period. This also highlights one major difference between the information on probabilities of default conveyed by credit ratings and by credit default spread rates: the former is a view over the business cycle, whereas the latter gives the view in the short term, that is, at the immediate point in time.

13.7.2 MLE estimation

It would be useful to compare our Bayesian MCMC method described above with a standardised procedure such as maximum likelihood. [Bangia et al. (2002)] and [Hu et al. (2002)] describe the derivation of the MLE for the credit transition matrix, leading to the following formulae:

    p_{i,j} = Σ_{t=1}^T l_{t,i,j} / Σ_{t=1}^T n_{t,i} = Σ_{t=1}^T w_i(t) (l_{t,i,j} / n_{t,i}),   w_i(t) = n_{t,i} / Σ_{t=1}^T n_{t,i}        (13.26)

Alternatively, with equal weights

    w_i(t) = 1/T        (13.27)

one obtains

    p_{i,j} = (1/T) Σ_{t=1}^T l_{t,i,j} / n_{t,i}.        (13.28)
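A sketch of the cohort estimator (13.26) in NumPy; the array layout is an assumption chosen for illustration. Note that cells with no observed moves get exact zeros, which is the point made in the comparison below.

```python
import numpy as np

def mle_transition_matrix(l, n):
    """Cohort MLE (13.26). l has shape (T, K, K+1), with l[t, i, j] the number
    of issuers moving from rating i to state j (including default) in year t;
    n has shape (T, K), with the issuers holding rating i at the start of year t."""
    num = l.sum(axis=0)              # sum_t l_{t,i,j}
    den = n.sum(axis=0)[:, None]     # sum_t n_{t,i}
    return num / den                 # rows sum to one by construction
```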

Table 13.13: Maximum likelihood estimators of transition probabilities using data from Standard&Poor's on all corporates between 1981–2004.

p_{i,j}   AAA       AA        A         BBB       BB        B         CCC       Default
AAA      0.91646   0.07721   0.00484   0.00091   0.00060   0         0         0
AA       0.00621   0.90460   0.08117   0.00600   0.00060   0.00110   0.00020   0.00010
A        0.00049   0.02156   0.91330   0.05775   0.00448   0.00167   0.00034   0.00044
BBB      0.00021   0.00223   0.04101   0.89638   0.04682   0.00824   0.00200   0.00311
BB       0.00041   0.00083   0.00363   0.05794   0.83268   0.08096   0.01036   0.01316
B        0         0.00074   0.00223   0.00317   0.05876   0.82296   0.04775   0.06437
CCC      0.00088   0         0.00353   0.00441   0.01501   0.11132   0.53534   0.32950


Comparing the estimates in Tables 13.12 and 13.13, it can be observed that the maximum likelihood procedure may produce zero estimates while the Bayesian model does not. In terms of the magnitude of the values there is agreement between the two approaches for this dataset. However, the maximum likelihood method does not allow the analyst to capture the estimation risk and other important relationships between estimators, such as correlations.

13.8 Notes and Summary

The example discussed at length here points to some valuable lessons for the investment analytics, model validation and risk management teams at an investment bank. A refined framework within a Bayesian setup is described in [Stefanescu et al. (2009)]. Correlation plays a vital role in credit modelling, particularly portfolio credit modelling. Interesting discussions on correlation and the Gaussian copula are presented in [Morini (2011)]. It is difficult to say which copula should be used in practice and there is intensive research in that area, see www.gloria-mundi.com and www.defaultrisk.com. [Fabozzi and Tunaru (2007)] discuss a potential bias in the standard formula for pricing Nth-to-default credit contracts and they suggest that the bias increases with correlation.

(1) It is preferable to look at a few models that are similar rather than focus all the attention on one specific model.
(2) There are instances where it is difficult to have data available. It does not mean that we should remove those categories from the analysis or draw a hasty conclusion that the probability of observing an occurrence in those categories in the future will always be zero. These problems surface when inappropriate methods of inference are used. Bayesian analysis is perfectly capable of providing a solution to all these problems.
(3) It is useful to try a battery of models and compare them against data.
(4) MCMC techniques are superior to other methods of extracting inference in many ways. The most important advantage is that, based on the same simulation output, several problems can be solved at once.
(5) MCMC methods are also suitable for periodic updating of results. The posterior distributions in the last model validation exercise become the prior distributions, and new data is taken into account in the current model validation procedure.


(6) In Finance there is a necessity for stress testing and ‘what if’ scenario analysis. This involves considering unobserved situations. Bayesian analysis coupled with MCMC inference is at the moment the only framework where subject expert elicitation can be incorporated and ‘what if’ scenarios can be easily created by imposing restrictions on distribution parameters.


Chapter 14

Last But Not Least. Can We Avoid the Next Big Systemic Financial Crisis?

14.1 Yes, We Can

Modelling is seen by many as an art. The Finance industry is driven by models and real financial losses may occur due to wrong modelling. There is a quick list of tasks that anyone involved with models can follow in order to minimize their own wrongdoing.

(1) Evaluate and verify assumptions behind the model.
(2) Test new models against well-known models and against known results.
(3) Apply Occam's razor principle: ceteris paribus, the simpler model is always better.
(4) Backtest and stress test your models against a wide range of scenarios using out-of-sample historical forecasts.
(5) Understand the discrepancies and pitfalls of a given model without sweeping them under the carpet.
(6) Re-calibrate and re-estimate your models periodically.

In my opinion there are far too many models in the financial literature and too little time is spent on fully understanding the intricacies of these models. Why should models on which important decisions are based and billions of dollars traded be validated in-house only? How many of these models would pass an independent review? If we do not dare to do similar things with new medicine, why is it acceptable to do it with finance? Quantitative finance specialists are sometimes called rocket scientists because many Physics researchers migrated into the field of Finance. Would we board a rocket developed on the models an investment bank is using and let some of the traders give guidance from the control tower? A quant once said that a wrong statement cited four times becomes a theorem with a name. With


recent advances in computer power, the same incorrect result is implemented and applied a thousand times faster.

A straightforward way to slow down the proliferation of models in Finance is to ask investment banks, hedge funds and financial boutiques to pay royalties to the scientists who developed the models. One dollar, or even one cent, per transaction where the model has been used. Model creators will then be more diligent, since there will be an incentive to take more time and design models that work better in a variety of circumstances, and are easy to explain, implement, calibrate and so on. Users will be inclined to scrutinize the models that they are using as they will be paying for them. Further empirical comparative research is needed to backtest results across asset classes. Independent research and model validation can play a vital role in screening out models that carry deficiencies.

A very important role is supposed to be played by regulators. Who appoints the regulators and what qualifies a person to be a regulator? Actually the regulators are most of the time non-specialists and I believe that it is right that they are "objective", in the sense that they need to see the forest, but have little knowledge about each tree. Model risk can be controlled only if we convince the market participants to be self-interested in sharing information and to pro-actively manage this type of risk. Regulators can passively manage model risk by facilitating debates and promoting best practice. Academics can work on both sides and can offer independent advice that should not be neglected. At the same time, are the textbooks in line with the latest market practices, latest regulations, and latest product development? Quite often they are not, and this is due to the fact that the innovation in financial markets is not always in the same space and time as academic research.

What if the regulators created a body to approve the models being used in the industry, model creators were paid an infinitesimal sum per transaction, and bankers used the model they felt was most appropriate but documented how parameter estimates were arrived at and given what inputs? Then model risk could be greatly reduced. The internet could facilitate this entire process in an elegant, productive and reliable manner.

14.2 No, We Cannot

Financial systems are complex. There are thousands of different financial instruments, financial regulations in different jurisdictions and millions of market participants world-wide. It is virtually impossible to have an accurate description and understanding of the dynamics of financial securities prices, second by second (algorithmic trading now considers activities taking place at millisecond frequencies). The only representation of this complex reality is via models.

Models in Finance are not like models in Physics, for a simple reason: the latter look at a repeatable reality, the former do not, because of human interactions. Hence, it is difficult to know whether modelling in Finance should proceed from empirical to theoretical, as is done in Physics, or from theoretical to empirical, as is done in the Biosciences. In my opinion, it is a combination of both, since Finance neither has very clear theoretical rules as in the Biosciences, nor can it rely on homogeneous, repetitive data occurring at different points in time as in Physics. Time itself contributes to changes in Finance realities.

One thing is for sure: Finance can operate only through models, and as soon as we talk about models we will get exposure to model risk. Therefore, I am not suggesting and I cannot advocate total elimination of model risk, because it cannot be done. What is important is realising and recognising that this risk does exist. Then, it is important to be critical at the model development stage, considering all aspects of modelling (how many models proposed in finance showed how to estimate their parameters?), and then periodically validating models, looking at new datasets and old datasets, backtesting and stress-testing.

The evident and strong, but still esoteric, connection between regulators and market participants may cause distortions to any prescribed quantitative based risk management policy. In his famous public lecture at the Reserve Bank of Australia, Goodhart stated: "Any statistical relationship will break down when used for policy purposes". [Danielsson (2002)] followed up with a corollary saying that "A risk model breaks down when used for regulatory purposes". The main reason behind these statements is that as long as risk is modelled with behavioral equations that are assumed to be invariant under observation, the next financial crisis will always catch us on the wrong foot, because it will be caused by new sources. [Danielsson (2002)] argued that since market data is endogenous to market behavior, any statistical inference made from data covering periods of stability will not be helpful in times of crisis. Thus, from a financial stability point of view, risk measures such as VaR or ES may potentially give misleading information about risk, possibly even contributing indirectly to both idiosyncratic and systemic risk.


Since the effects of regulation on the risk and value of a company are important, is it possible to design regulations that do not themselves compromise the companies that form the financial markets? [Brennan and Schwartz (1982a)] showed that it is possible to design a valuation model that takes into account the effects of the regulatory policy. However, there is still a long way to go to have a framework for regulation, model valuation and risk management that is integrated in a consistent manner at an industry level. There are educational barriers, incentive divergences and ultimately different risks faced by regulators, who are more interested in financial system stability overall, investors, who are more interested in returns for various levels of risk, and risk managers and auditors, who are interested in avoiding a repeat of previous crises.

Even when this theoretical argument is solved there is still something fundamental that is impossible to overcome: the difference between theoretical and empirical. Consider a set of conditions C_1, …, C_m that must be satisfied by a risk measure in order for that risk measure to be unanimously accepted by academia, industry and regulatory bodies. Suppose that this measure ψ has been identified. For simplicity consider the situation of only two assets with risks X and Y, and let us denote by ψ(X, Y)|C_j the fact that the risk measure ψ satisfies condition C_j for any two financial risks X and Y. Now consider X̃ = (X_1, …, X_n) and Ỹ = (Y_1, …, Y_n), two representative samples for the unknown population risks X and Y. It is obvious to me that we can have ψ(X, Y)|C_j for all j ∈ {1, …, m}, but there may exist i ∈ {1, …, m} such that

    ψ(X̃, Ỹ) does not satisfy C_i.

In other words, a risk measure may satisfy all required properties but its sampling version may fail to satisfy all properties!?! As an example, VaR is subadditive under elliptical distributions, which include the Gaussian or normal distribution, but sampling error alone may invalidate this property for some given data.

14.3 A Non-technical Template for Model Risk Control

Here I attempt to set up a template for testing, identifying, eliminating and accounting for model risk. Having a routine in place is better than acting on an ad-hoc basis. With time the template can be improved and adapted to the internal needs of the institution. In my opinion, the template should require answering a series of questions. It is also useful to adhere to some general principles first. One important idea is that hedging should be applied only if the cost of implementing the hedging model plus the extra cost produced by the model risk of the hedge is less than the initial risk against which the hedge is created.

14.3.1 Identify the type of model risk that may appear

(1) What is the model used for?
    • Pricing and Hedging. What kind of application is it?
      – Derivatives
      – Portfolio optimisation
      – Corporate applications
    • Risk measure calculation or forecasting. What kind of application is it?
      – Value-at-Risk
      – Expected shortfall
      – Credit counterparty risk
      – Cash-flow models
(2) What type of data is used to estimate the parameters of the model?
    • Historical data
    • Current market data
    • Both historical and current market data
(3) Are there any known problems with this type of model?
    • Yes. Could they affect your results in the future?
    • No. When was the last time the literature on this model was reviewed?
(4) What statistical methods are used to estimate the parameters?
    (a) Have you seen other studies applying that method to similar cases?
    (b) Have you studied the sensitivity of results to parameter estimates? It is worth changing the parameters slightly and reporting the impact on pricing or risk calculations in a table.
(5) Periodically backtest and stress-test your model results.
    • Any model that fails a backtest should be revalidated.
    • Any model that fails a stress-test should be put on surveillance.


14.3.2 A guide for senior managers

At institutional level it is important to have processes in place to support a sound system for model validation, review and rejection. Many institutions have already enforced this. The key processes for containing model risk are related to the categories enumerated next.

(1) Personnel.
    • Who are the people involved in the model validation and control?
    • What is the expertise of each person involved?
    • What are the responsibilities of each person involved?
    • Ensure staff are updated periodically on the latest developments in financial modelling and model risk.
(2) Detailed documentation. Each model should have a complete set of documents detailing:
    • assumptions
    • mathematical derivations
    • computer code
    • testing results before going live
(3) Model free checking. In many situations it is possible to verify the model against well-known relationships that are model free. These include put-call parity, forward no-arbitrage relationships, upper and lower boundaries, gradient and convexity properties and so on.
(4) Benchmarking against other models.
    • Comparative results against standard industry models.
    • Discussion of advantages and possible disadvantages of the new model.
(5) Stress testing. There should be evidence of what type of results the model will generate in extreme scenarios. There should be an a priori set of rules that will be followed depending on the outcomes of stress testing.
(6) Outcomes of the model validation.
    • All models perform well. Ideally!
    • There are minor problems with the models. Investigate!
    • There are major problems with one model or models. Create a committee to analyse and cure the problem.
    • Periodic review.

14.4 There is Still Work to Do

Recalling the 1986 presidential speech at the American Finance Association, Fisher Black said: “In the end, a theory is accepted not because it is confirmed by conventional empirical tests, but because researchers persuade one another that the theory is correct and relevant.”

Convenience is often the enemy of progress. Researchers should try to criticise various models and techniques more. Only then can the models that survive the test of time be truly beneficial to us all. There are many facets of model risk that require more intensive research. Here are some of the ideas stemming from the discussions in this book.

[Alexander (2003)] pointed out that firm-wide enterprise model risk will focus on the aggregated portfolio, and therefore model validation that occurs at an individual risk level will not capture crude assumptions, for example with respect to dependencies, made at the aggregated level. Therefore, model risk must be dealt with at individual lines of business but also globally. Hence, one could argue that model validation in a bank should be carried out in a macroscopic as well as a microscopic manner.

What is the model risk measure most valid theoretically and most useful practically? There are new frameworks being proposed but we need more involvement and debate before settling on a particular methodology. The linkage between hedging and model risk has not been investigated very much so far. Is it right to argue that hedging is a form of protection against model risk? Moreover, incomplete markets bring in a particular type of model risk, that is, model identification. Would it be right to select a suboptimal model that has a lower parameter estimation risk?

Risk management has entered a new phase, with direct implications for capital reserve calculations impacting on day to day operations. Should we look also at the "good" risk, that is the profit tail of the distribution? Questioning and debating more, and sharing more results openly, can only improve financial modelling with respect to model risk. If, when you have read this book, you have also started questioning some of my statements, I have achieved my goal.


Chapter 15

Notations for the Study of MLE for CIR process

Here are the notations used by [Tang and Chen (2009)]:

$$F(a, b, c; z) = \frac{\Gamma(c)}{\Gamma(a)\Gamma(b)} \sum_{k=0}^{\infty} \frac{z^k}{k!}\, \frac{\Gamma(a+k)\Gamma(b+k)}{\Gamma(c+k)}$$

$$\vartheta_\theta = 2\kappa\theta/\sigma^2, \qquad \vartheta_\beta = 2\kappa/\sigma^2, \qquad \delta_{ij} = \Delta\,|j - i|$$

$$S_{1,ij} = \sum_{l=1}^{\infty} \frac{\left(e^{-\kappa\delta_{ij}} e^{-\kappa\Delta}\right)^l \Gamma(l+1)\,\Gamma(\vartheta_\beta - 2)}{\Gamma(\vartheta_\theta - 1 + l)}, \qquad S_{2,ij} = \sum_{l=1}^{\infty} \frac{\left(e^{-\kappa\delta_{ij}} e^{-\kappa\Delta}\right)^l \Gamma(l+2)\,\Gamma(\vartheta_\beta - 2)}{\Gamma(\vartheta_\theta + l)}$$

$$C_{1,\vartheta} = (\vartheta_\theta - 1)\left[2e^{-\kappa\Delta}(\vartheta_\theta + 1) - \vartheta_\theta(1 + e^{-\kappa\Delta})\right] - \left[\frac{\vartheta_\theta}{\vartheta_\theta - 1} + \frac{\vartheta_\theta^2 + \vartheta_\theta}{\vartheta_\theta - 1}\right](1 - e^{-\kappa\Delta})$$

$$C_{2,\vartheta} = 2e^{-\kappa\Delta}(\vartheta_\theta - 1) - \vartheta_\theta^2(1 - e^{-\kappa\Delta}) + \frac{(\vartheta_\theta - 1)(\vartheta_\theta^2 + \vartheta_\theta)}{\vartheta_\theta - 2}(1 - e^{-\kappa\Delta})$$

$$\begin{aligned} V_4(\vartheta, \Delta) = \frac{e^{2\kappa\Delta}}{n\Delta} \bigg[ &(1 - e^{-\kappa\Delta})\Big\{ n C_{2,\vartheta} - 2\vartheta_\theta(\vartheta_\theta - 1)(\vartheta_\theta - 2)(1 - e^{-\kappa\Delta}) \sum_{i=1}^{n} \sum_{j=i+1}^{n} \left(e^{\kappa\Delta} S_{1,ij} - S_{2,ij}\right) \Big\} \\ &+ (1 - e^{-\kappa\Delta})^2 \Big[ \sum_{i=1}^{n} \sum_{j=1}^{n} \left\{ \vartheta_\theta^2 \left[F(1, 1, \vartheta_\theta; e^{-\kappa\delta_{ij}}) - 1\right] - \vartheta_\theta e^{-\kappa\delta_{ij}} \right\} \\ &\quad - 2 \sum_{j=1}^{n} \sum_{i=j+1}^{n} \left\{ \vartheta_\theta^2 \left[F(1, 1, \vartheta_\theta; e^{-\kappa\delta_{(i-1)j}}) - 1\right] - \vartheta_\theta e^{\kappa\Delta} e^{-\kappa\delta_{ij}} \right\} \\ &\quad - 2 \sum_{i=1}^{n} \sum_{j=i}^{n} \left[ C_{1,\vartheta} e^{-\kappa\delta_{ij}} - \vartheta_\theta(\vartheta_\theta - 1)(\vartheta_\theta - 2)\left(e^{\kappa\Delta} S_{1,ij} - S_{2,ij}\right) \right] \Big] \bigg] \end{aligned}$$

$$\begin{aligned} B_3(\vartheta, \Delta) = -\frac{e^{\kappa\Delta}(1 - e^{-\kappa\Delta})}{n} \bigg( &\sum_{i=1}^{n} \sum_{j=1}^{n} \left\{ (\vartheta_\theta - 1) e^{-\kappa\delta_{ij}} - \vartheta_\theta^2 \left[F(1, 1, \vartheta_\theta; e^{-\kappa\delta_{ij}}) - 1\right] \right\} \\ &+ \sum_{j=1}^{n} \sum_{i=j+1}^{n} \left\{ -\vartheta_\theta e^{-\kappa\delta_{ij}} e^{-\kappa\Delta} + \vartheta_\theta^2 \left[F(1, 1, \vartheta_\theta; e^{-\kappa\delta_{(i-1)j}}) - 1\right] \right\} \\ &+ \sum_{i=1}^{n} \sum_{j=i}^{n} \left\{ C_{1,\vartheta} e^{-\kappa\delta_{ij}} - \vartheta_\theta(\vartheta_\theta - 1)(\vartheta_\theta - 2)\left(e^{\kappa\Delta} S_{1,ij} - S_{2,ij}\right) \right\} \bigg) + \frac{\Delta}{2} V_4(\vartheta, \Delta) \end{aligned}$$

$$B_4(\vartheta, \Delta) = \frac{1 - e^{-\kappa\Delta}}{2}, \qquad V_5(\vartheta, \Delta) = \frac{2\sigma^2(\vartheta_\theta - 1)}{\vartheta_\beta^2}\left[1 + \frac{\vartheta_\theta}{e^{\kappa\Delta} - 1}\right]^2$$

$$V_6(\vartheta, \Delta) = A_1(\vartheta, \Delta)^2 Z_1(\vartheta, \Delta) + A_2(\vartheta, \Delta)^2 Z_2(\vartheta, \Delta)$$

$$A_1(\vartheta, \Delta) = \frac{2\Delta\sigma^2}{1 - e^{-2\kappa\Delta}} - \frac{\sigma^2}{\Delta\kappa}, \qquad A_2(\vartheta, \Delta) = -\frac{2\kappa\,\sigma^2 e^{\kappa\Delta}}{\Delta(1 - e^{-2\kappa\Delta})}$$

$$Z_1(\vartheta, \Delta) = \frac{\vartheta_\theta - 1}{\vartheta_\theta}\,(1 - e^{-\kappa\Delta})$$

$$Z_2(\vartheta, \Delta) = \frac{1}{2}\, \frac{\vartheta_\beta e^{-\kappa\Delta}}{1 + e^{-\kappa\Delta}} \left[ 1 + \frac{1}{4\vartheta_\beta^3} \left\{ 12 e^{-2\kappa\Delta} + \frac{12\nu + 48}{c(\vartheta, \Delta)(\vartheta_\theta - 1)} + \frac{(3\nu^2 + 12\nu)\,\vartheta_\beta^2}{(\vartheta_\theta - 1)(\vartheta_\theta - 2)\, c(\vartheta, \Delta)} - \frac{2(\vartheta_\theta + \vartheta_\theta e^{-\kappa\Delta} - 2e^{-\kappa\Delta})}{(1 + e^{-\kappa\Delta})(\vartheta_\theta - 1)} \right\} \right]$$

$$c(\vartheta, \Delta) = 2\vartheta_\beta (1 - e^{-\kappa\Delta})^{-1}, \qquad \nu = 2\vartheta_\theta.$$
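The Gauss hypergeometric function F(a, b, c; z) defined at the start of this chapter is the only non-elementary ingredient of these formulae; inside V4 and B3 it is evaluated at z = e^{-κδ_ij}, which lies in (0, 1) whenever κ > 0, so the series converges and can be evaluated by simple truncation. Below is a minimal sketch of such an evaluation; it is not taken from [Tang and Chen (2009)], and the tolerance and term cap are illustrative choices.

    # A minimal sketch (an assumption of this text, not code from Tang and
    # Chen (2009)) of the Gauss hypergeometric series F(a, b, c; z) defined
    # above. Successive terms are built from the ratio
    #     term_{k+1} / term_k = (a + k)(b + k) z / ((c + k)(k + 1)),
    # which is algebraically the same as the Gamma-function form in the text.
    def hyp_F(a, b, c, z, tol=1e-12, max_terms=10_000):
        term = 1.0          # the k = 0 term: the Gamma ratios cancel to 1
        total = term
        for k in range(max_terms):
            term *= (a + k) * (b + k) * z / ((c + k) * (k + 1))
            total += term
            if abs(term) < tol:     # illustrative stopping rule
                break
        return total

    # F(1, 1, theta; z) is the combination appearing in V4 and B3 above;
    # the arguments below are illustrative.
    print(hyp_F(1.0, 1.0, 4.0, 0.5))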


Bibliography

Abadir, K. (2005). The mean-median-mode inequality: Counterexamples, Econometric Theory 21, pp. 477–482.
Acerbi, C. (2004). Risk Measures for the 21st Century, chap. Coherent Representations of Subjective Risk-aversion (Wiley, New York), pp. 147–207.
Acerbi, C. and Szekely, B. (2014). Back-testing expected shortfall, Risk, November.
Acerbi, C. and Tasche, D. (2002). On the coherence of expected shortfall, Journal of Banking and Finance 26, pp. 1487–1503.
Agresti, A. (2002). Categorical Data Analysis (Wiley, New York).
Ait-Sahalia, Y. (1996). Testing continuous-time models of the spot interest rate, Review of Financial Studies 9, pp. 385–426.
Ait-Sahalia, Y. (2002). Maximum likelihood estimation of discretely sampled diffusions: A closed-form approximation approach, Econometrica 70, 1, pp. 223–262.
Alexander, C. (2003). The present and future of financial risk management, Journal of Financial Econometrics 3, pp. 3–25.
Alexander, C. and Sarabia, J. (2012). Quantile uncertainty and value-at-risk model risk, Risk Analysis 32, 8, pp. 1293–1308.
Amin, K. (1993). Jump-diffusion valuation in discrete-time, Journal of Finance 48, 5, pp. 1833–1863.
Andersen, L. and Andreasen, J. (2001). Factor dependence of Bermudan swaptions: fact or fiction? Journal of Financial Economics 62, 1, pp. 3–37.
Andrews, D. (2000). Inconsistency of the bootstrap when a parameter is on the boundary of the parameter space, Econometrica 68, 2, pp. 399–405.
Angelidis, T. and Degiannakis, S. (2007). Backtesting VaR models: A two-stage procedure, Journal of Risk Model Validation 1, 2, pp. 27–48.
Angelidis, T. and Degiannakis, S. (2009). New Econometric Modelling Research, chap. Econometric modelling of value-at-risk (Nova, New York), pp. 9–60.
Artzner, P., Delbaen, F., Eber, J., and Heath, D. (1999). Coherent measures of risk, Mathematical Finance 9, 3, pp. 203–228.
Artzner, P. and Heath, D. (1995). Approximate completeness with multiple martingale measures, Mathematical Finance 5, pp. 1–11.

Aruoba, S. B. and Fernandez-Villaverde, J. (2014). A comparison of programming languages in economics, working paper, SSRN.
Athreya, K. and Fukuchi, J. (1994). Bootstrapping extremes of i.i.d. random variables, in Proceedings of the Conference on Extreme Value Theory (NIST).
Athreya, K. B. (1987). Bootstrap of the mean in the infinite variance case, The Annals of Statistics 15, 2, pp. 724–731.
Avellaneda, M., Levy, A., and Paras, A. (1995). Pricing and hedging derivative securities in markets with uncertain volatilities, Applied Mathematical Finance 2, pp. 73–88.
Avramov, D. (2002). Stock return predictability and model uncertainty, Journal of Financial Economics 64, pp. 423–458.
Babu, J. (1984). Bootstrapping statistics with linear combinations of chi-squares as weak limit, Sankhya 46, pp. 86–93.
Ball, C. and Torous, W. (1996). Unit roots and the estimation of interest rate dynamics, Journal of Empirical Finance 3, pp. 215–238.
Ball, C. A. and Torous, W. N. (1983). A simplified jump process for common stock returns, Journal of Financial and Quantitative Analysis 18, 1, pp. 53–65.
Ball, C. A. and Torous, W. N. (1985). On jumps in common stock prices and their impact on call option pricing, The Journal of Finance 20, 1, pp. 155–173.
Bams, D., Lehnert, T., and Wolff, C. (2005). An evaluation framework for alternative VaR models, Journal of International Money and Finance 24, pp. 944–958.
Bangia, A., Diebold, F., Kronimus, A., Schagen, C., and Schuermann, T. (2002). Rating migration and the business cycle, with application to credit portfolio stress testing, Journal of Banking and Finance 26, pp. 445–474.
Bao, Y. and Ullah, A. (2004). Bias of a value-at-risk estimator, Finance Research Letters 1, pp. 241–249.
Barrieu, P. and Scandolo, G. (2013). Assessing financial model risk, arXiv.org, arXiv:1307.0684.
Barry, C., French, D., and Rao, R. (1991). Estimation risk and adaptive behavior in the pricing of options, Financial Review 26, pp. 15–30.
Basawa, I., Malik, A., McCormick, W., Reeves, J., and Taylor, R. (1991). Bootstrapping unstable first-order autoregressive processes, Annals of Statistics 19, 2, pp. 1098–1101.
Basel (2006). International convergence of capital measurement and capital standards, Tech. rep., Basel Committee on Banking Supervision.
Basel (2010). Basel III: A global regulatory framework for more resilient banks and banking systems, Tech. rep., Basel Committee on Banking Supervision, revised June 2011.
Basel (2011). Revisions to the Basel II market risk framework, Tech. rep., Basel Committee on Banking Supervision.
Basel (2013). Fundamental review of the trading book: A revised market risk framework, Tech. rep., Basel Committee on Banking Supervision.
Basu, S. and DasGupta, A. (1992). The mean, median and mode of unimodal distributions: A characterization, Technical Report 92-40, Department of Statistics, Purdue University.

Battig, R. (1999). Completeness of securities market models - an operator point of view, The Annals of Applied Probability 9, 2, pp. 529–566.
Battig, R. J. and Jarrow, R. A. (1999). The second fundamental theorem of asset pricing: A new approach, The Review of Financial Studies 12, 5, pp. 1219–1235.
Bawa, V., Brown, S., and Klein, R. (eds.) (1979). Estimation Risk and Optimal Portfolio Choice (North-Holland, Amsterdam).
Baz, J. and Chacko, G. (2004). Financial Derivatives (Cambridge University Press, Cambridge).
Beckers, S. (1981a). A note on estimating the parameters of the jump-diffusion model of stock returns, Journal of Financial and Quantitative Analysis 16, pp. 127–140.
Beckers, S. (1981b). Standard deviations implied in option prices as predictors of future stock price variability, Journal of Banking and Finance 5, 3, pp. 363–381.
Beder, T. (1995). VaR: Seductive but dangerous, Financial Analysts Journal 51, 5, pp. 12–24.
Bensaid, B., Lesne, J., Pagès, H., and Scheinkman, J. (1992). Derivative asset pricing with transaction costs, Mathematical Finance 2, pp. 63–86.
Beran, R. and Srivastava, M. (1985). Bootstrap tests and confidence regions for functions of a covariance matrix, Annals of Statistics 13, pp. 95–115.
Berd, A. M. (2005). Dynamic estimation of credit rating transition probabilities, working paper 0912.4621, ArXiv.org.
Berger, J. and Wolpert, R. (1984). The Likelihood Principle (Institute of Mathematical Statistics, Hayward, CA).
Berger, J. O. and Sellke, T. (1987). Testing a point null hypothesis: The irreconcilability of p values and evidence, Journal of the American Statistical Association 82, 397, pp. 112–122.
Bergstrom, A. (1984). Handbook of Econometrics, Vol. 2, chap. Continuous time stochastic models and issues of aggregation over time (North-Holland, Amsterdam), pp. 1145–1212.
Berkowitz, J. (2001). Testing density forecasts with applications to risk management, Journal of Business and Economics Statistics 19, 4, pp. 465–474.
Berkowitz, J., Christoffersen, P. F., and Pelletier, D. (2011). Evaluating value-at-risk models with desk-level data, Management Science 57, 12, pp. 2213–2227.
Berkowitz, J. and O'Brien, J. (2002). How accurate are value-at-risk models at commercial banks? Journal of Finance 57, 3, pp. 1093–1111.
Bernardo, J. and Smith, A. (1994). Bayesian Theory (Wiley, Chichester, UK).
Bharadia, M. A., Christofides, N., and Salkin, G. (1996). Computing the Black-Scholes implied volatility, Advances in Futures and Options Research 8, pp. 15–29.
Bibby, B. M., Jacobsen, M., and Sorensen, M. (2010). Handbook of Financial Econometrics, Vol. 1, chap. Estimating Functions for Discretely Sampled Diffusion-Type Models (North-Holland, New York), pp. 203–268.
Bibby, B. M. and Sorensen, M. (1995). Martingale estimation functions for discretely observed diffusion processes, Bernoulli 1, 1/2, pp. 17–39.

Bibby, B. M. and Sørensen, M. (1996). On estimation for discretely observed diffusions: A review, Theory of Stochastic Processes 2, pp. 49–56.
Bickel, P. and Freedman, D. (1981). Some asymptotic theory for the bootstrap, Annals of Statistics 9, pp. 1196–1217.
Birnbaum, A. (1962). On the foundations of statistical inference, Journal of the American Statistical Association 57, 298, pp. 269–306.
Bjork, T. (2001). Option Pricing, Interest Rates and Risk Management, chap. A Geometric View of Interest Rate Theory (Cambridge University Press), pp. 241–277.
Bjork, T. and Christensen, B. (1999). Interest rate dynamics and consistent forward rate curves, Mathematical Finance 9, 4, pp. 323–348.
Black, F. (1989). How to use the holes in Black-Scholes, Journal of Applied Corporate Finance 1, 4, pp. 67–73.
Black, F. and Karasinski, P. (1991). Bond and option pricing when short rates are log-normal, Financial Analysts Journal 47, pp. 52–59.
Bluhm, C., Overbeck, L., and Wagner, C. (2003). An Introduction to Credit Risk Modeling, Financial Mathematics Series (Chapman & Hall/CRC, London).
Blume, M., Lim, F., and Mackinlay, A. (1998). The declining credit quality of U.S. corporate debt: Myth or reality? The Journal of Finance 53, 4, pp. 1389–1413.
Bongaerts, D. and Charlier, E. (2009). Private equity and regulatory capital, Journal of Banking and Finance 33, pp. 1211–1220.
Bossy, M., Gibson, R., Lhabitant, F., Pistre, N., Talay, D., and Zheng, Z. (2000). Volatility model risk measurement and strategies against worst case volatilities, Journal de la Société Française de Statistique 141, pp. 73–86.
Boucher, C. M., Danielsson, J., Kouontchou, P. S., and Maillet, B. B. (2014). Risk models at risk, Journal of Banking and Finance 44, pp. 72–92.
Boyer, B. H., Gibson, M. S., and Loretan, M. (1999). Pitfalls in tests for changes in correlations, working paper 597, Board of Governors of the Federal Reserve System.
Boyle, P. P., Tan, K. S., and Tian, W. (2001). Calibrating the Black-Derman-Toy model: some theoretical results, Applied Mathematical Finance 8, pp. 27–48.
Branger, N. and Schlag, C. (2004). Model risk: A conceptual framework for risk measurement and hedging, working paper, Goethe University.
Brennan, M. and Schwartz, E. (1982a). Consistent regulatory policy under uncertainty, The Bell Journal of Economics 13, 2, pp. 506–521.
Brennan, M. and Schwartz, E. (1982b). An equilibrium model of bond pricing and a test of market efficiency, Journal of Financial and Quantitative Analysis 17, 3, pp. 301–329.
Brenner, M. and Subrahmanyam, M. (1988). A simple formula to compute the implied standard deviation, Financial Analysts Journal 5, pp. 80–83.
Bretagnolle, J. (1983). Lois limites du bootstrap de certaines fonctionnelles, Annales de l'Institut Henri Poincaré 19, Sec. B, pp. 282–296.
Britten-Jones, M. and Schaefer, S. (1999). Non-linear value at risk, European Finance Review 2, pp. 161–187.

Brooks, C. and Persand, G. (2000). Value at risk and market crashes, Journal of Risk 2, 4, pp. 5–26.
Brooks, C. and Persand, G. (2002). Model choice and Value-at-Risk performance, Financial Analysts Journal 58, 5, pp. 87–97.
Brown, S. (1979). The effect of estimation risk on capital market equilibrium, Journal of Financial and Quantitative Analysis 14, 2, pp. 215–220.
Brown, S., Bawa, V. S., and Klein, R. W. (1979). Estimation risk and optimal portfolio choice: A survey, Proceedings of the American Statistical Association, pp. 53–58.
Brown, S. and Chen, S. N. (1983). Estimation risk and simple rules for portfolio selection, Journal of Finance 38, 4, pp. 1087–1093.
Brown, S. and Klein, W. (1984). Model selection when there is "minimal" prior information, Econometrica 52, 5, pp. 1291–1312.
Bunnin, F., Guo, Y., and Ren, Y. (2002). Option pricing under model and parameter uncertainty using predictive densities, Statistics and Computing 12, pp. 37–44.
Buraschi, A. and Corielli, F. (2005). Risk management implications of time-inconsistency: Model updating and recalibration of no-arbitrage models, Journal of Banking and Finance 29, pp. 2883–2907.
Butler, J. and Schachter, B. (1998). Improving Value-at-Risk estimates by combining kernel estimation with historical simulation, Review of Derivatives Research 1, pp. 371–390.
Cairns, A. J. (2000). A discussion of parameter and model uncertainty in insurance, Insurance: Mathematics and Economics 27, pp. 313–330.
Cairns, A. J. (2004). Interest Rate Models (Princeton University Press, Princeton).
Cambanis, S. (1977). Some properties and generalizations of multivariate Eyraud-Gumbel-Morgenstern distributions, Journal of Multivariate Analysis 7, pp. 551–559.
Campbell, S. D. (2007). A review of backtesting and backtesting procedures, Journal of Risk 9, pp. 1–17.
Campolongo, F., Jönsson, H., and Schoutens, W. (2013). Quantitative Assessment of Securitisation Deals, Springer Briefs in Finance (Springer, Heidelberg).
Cao, M. and Wei, J. (2010). Valuation of housing index derivatives, Journal of Futures Markets 30, pp. 660–688.
Capinski, M. and Zastawniak, T. (2003). Mathematics for Finance (Springer, London).
Carey, M. and Hrycay, M. (2001). Parameterizing credit risk models with rating data, Journal of Banking and Finance 25, pp. 197–270.
Carlin, B. P. and Louis, T. A. (1996). Bayes and Empirical Bayes Methods for Data Analysis (Chapman & Hall, London).
Cerny, A. (2009). Mathematical Techniques in Finance: Tools for Incomplete Markets (Princeton University Press, Princeton and Oxford).
Chan, K. (1993). Asymptotic behaviour of the Gibbs sampler, Journal of the American Statistical Association 88, pp. 320–326.
Chan, K., Karolyi, G., Longstaff, F., and Sanders, A. (1992). An empirical comparison of alternative models of the short-term interest rate, Journal of Finance 47, pp. 1209–1227.

Chapman, D. A. and Pearson, N. D. (2000). Is the short rate drift actually nonlinear? Journal of Finance 55, pp. 355–388.
Chappell, D. and Dowd, K. (1999). Confidence intervals for VaR, Financial Engineering News, pp. 1–2.
Charemza, W. (2002). Guesstimation, Journal of Forecasting 21, pp. 417–433.
Cheung, S. (1996). Provincial credit ratings in Canada: An ordered probit analysis, working paper 96-6, Bank of Canada.
Christoffersen, P. (2012). Elements of Financial Risk Management, 2nd edn. (Academic Press, Oxford).
Christoffersen, P., Diebold, F., and Schuermann, T. (1998). Horizon problems and extreme events in financial risk management, discussion paper 98-16, Financial Institutions Center, The Wharton School, University of Pennsylvania.
Christoffersen, P. and Goncalves, S. (2005). Estimation risk in financial risk management, Journal of Risk 7, 3, pp. 1–28.
Christoffersen, P. F. (1998). Evaluating interval forecasts, International Economic Review 39, 4, pp. 841–862.
Christoffersen, P. F. and Pelletier, D. (2004). Backtesting value-at-risk: A duration-based approach, Journal of Financial Econometrics 2, 1, pp. 84–108.
Cont, R. (2006). Model uncertainty and its impact on the pricing of derivative instruments, Mathematical Finance 13, pp. 519–547.
Contreras, P. and Satchell, S. (2003). A Bayesian confidence interval for VaR, Cambridge Working Papers in Economics 0348, Cambridge.
Corrado, C. and Miller, T. (1996). A note on a simple, accurate formula to compute implied standard deviations, Journal of Banking and Finance 20, pp. 595–603.
Cox, J. (1975). Notes on option pricing I: Constant elasticity of variance diffusions, Unpublished draft.
Cox, J. and Huang, C. F. (1989). Optimal consumption and portfolio policies when asset prices follow a diffusion process, Journal of Economic Theory 49, pp. 33–83.
Cox, J. and Huang, C. F. (1991). A variational problem arising in financial economics, Journal of Mathematical Economics 20, pp. 465–487.
Cox, J., Ingersoll, J., and Ross, S. (1980). An analysis of variable rate loan contracts, Journal of Finance 35, pp. 389–403.
Cox, J., Ingersoll, J., and Ross, S. (1981). A reexamination of traditional hypotheses about the term structure of interest rates, Journal of Finance 36, pp. 769–799.
Cox, J., Ingersoll, J., and Ross, S. (1985). A theory of the term structure of interest rates, Econometrica 53, pp. 385–407.
Cox, J. and Ross, S. (1976). The valuation of options for alternative stochastic processes, Journal of Financial Economics 3, pp. 145–166.
Cox, J., Ross, S., and Rubinstein, M. (1979). Option pricing: A simplified approach, Journal of Financial Economics 7, pp. 229–263.

Cramér, H. (1957). Mathematical Methods of Statistics (Princeton University Press, Princeton).
Crouhy, M., Galai, D., and Mark, R. (1998). Model risk, Journal of Financial Engineering 7, 3/4, pp. 267–288.
Dacunha-Castelle, D. and Florens-Zmirou, D. (1986). Estimation of the coefficient of a diffusion from discrete observations, Stochastics 19, pp. 263–284.
Dana, R.-A. and Jeanblanc, M. (2007). Financial Markets in Continuous Time (Springer, Berlin).
Danielsson, J. (2002). The emperor has no clothes: Limits to risk modelling, Journal of Banking & Finance 26, pp. 1273–1296.
Danielsson, J., James, K., Valenzuela, M., and Zer, I. (2014). Model risk of risk models, Finance and Economics Discussion Series 2014-34, Federal Reserve Board, Washington, D.C.
Danielsson, J., Samorodnitsky, G., Sarma, M., Jorgensen, B. N., and de Vries, C. G. (2005). Subadditivity re-examined: the case for Value-at-Risk, FMG Discussion Papers, London School of Economics.
Darkiewicz, G., Dhaene, J., and Goovaerts, M. (2003). Coherent distortion risk measures - a pitfall, Proceedings of the Seventh International Congress on Insurance: Mathematics and Economics, Lyon.
Darkiewicz, G., Dhaene, J., and Goovaerts, M. (2004). Distortion risk measures for sums of random variables, Blaetter der DGVFM XXVI, 4, pp. 631–641.
Das, S. and Tufano, P. (1996). Pricing credit sensitive debt when interest rates, credit ratings and credit spreads are stochastic, Journal of Financial Engineering 5, pp. 161–198.
David, H. (1981). Order Statistics, Vol. 2 (Wiley, New York).
David, H. and Mishriky, R. (1968). Order statistics for discrete populations and for grouped samples, Journal of the American Statistical Association 63, 324, pp. 1390–1398.
Deheuvels, P., Mason, D., and Shorack, G. (1993). Some results on the influence of extremes on the bootstrap, Annales de l'Institut Henri Poincaré 29, pp. 83–103.
Dempster, M., Evstigneev, I., and Schenk-Hoppé, K. (2007). Volatility-induced financial growth, Quantitative Finance 7, pp. 151–160.
Dempster, M. A., Evstigneev, I., and Schenk-Hoppé, K. (2011). The Kelly Capital Growth Investment Criterion: Theory and Practice, chap. Growing wealth with fixed-mix strategies (World Scientific), pp. 427–455.
Derman, E. (1996). Model risk, Quantitative strategies research notes, Goldman Sachs, New York.
Derman, E. (1997). VaR - Understanding and Applying Value-at-Risk, chap. Model risk (Risk Publications, London), pp. 83–88.
Dermody, J. and Rockafellar, R. (1991). Cash stream valuation in the face of transaction costs and taxes, Mathematical Finance 1, pp. 31–54.
Detering, N. and Packham, N. (2013). Measuring the model risk of contingent claims, SSRN.
Dhaene, J., Denuit, M., Goovaerts, M., Kaas, R., and Vyncke, D. (2002a). The concept of comonotonicity in actuarial science and finance: applications, Insurance: Mathematics & Economics 31, 2, pp. 133–161.

Dhaene, J., Denuit, M., Goovaerts, M., Kaas, R., and Vyncke, D. (2002b). The concept of comonotonicity in actuarial science and finance: theory, Insurance: Mathematics & Economics 31, 1, pp. 3–33.
Dharmadhikari, S. and Joag-dev, K. (1988). Unimodality, Convexity, and Applications (Academic Press, New York).
Dickler, D. T., Jarrow, R. A., and van Deventer, D. R. (2011a). Inside the Kamakura book of yields: An analysis of 50 years of daily U.S. treasury zero coupon bond yields, Kamakura memorandum 26.
Dickler, D. T., Jarrow, R. A., and van Deventer, D. R. (2011b). Inside the Kamakura book of yields, volume III: A pictorial history of 50 years of U.S. treasury par coupon bond yields, Kamakura memorandum 5.
Dohnal, G. (1987). On estimating the diffusion coefficient, Journal of Applied Probability 24, 1, pp. 105–114.
Doran, J. S. and Ronn, E. I. (2005). The bias in Black-Scholes/Black implied volatility: An analysis of equity and energy markets, Review of Derivatives Research 8, pp. 177–198.
Dowd, K. (2000a). Adjusting for risk: An improved Sharpe ratio, International Review of Economics and Finance 9, pp. 209–222.
Dowd, K. (2000b). Assessing VaR accuracy, Derivatives Quarterly 6, 3, pp. 61–63.
Dowd, K. (2001). Estimating VaR with order statistics, Journal of Derivatives 8, 3, pp. 23–30.
Dowd, K. (2002). An Introduction to Market Risk Measurement (Wiley, Chichester).
Dowd, K. (2006). Using order statistics to estimate confidence intervals for probabilistic risk measures, Journal of Derivatives 14, 2, pp. 1–5.
Dowd, K. (2010). Using order statistics to estimate confidence intervals for quantile-based risk measures, Journal of Derivatives 17, 3, pp. 9–14.
Dowd, K. (2013). Encyclopedia of Financial Models, Vol. II, chap. Model Risk (Wiley, Hoboken, New Jersey), pp. 691–698.
Dowd, K. and Cotter, J. (2007). Evaluating the precision of estimators of quantile-based risk measures, working paper WP 08/17, UCD Business Schools.
Draper, D. (1995). Assessment and propagation of model uncertainty, Journal of the Royal Statistical Society Series B 57, 1, pp. 45–97.
Drost, F. and Nijman, T. (1993). Temporal aggregation of GARCH processes, Econometrica 61, pp. 909–927.
Duffie, D., Ma, J., and Yong, J. (1994). Black's consol rate conjecture, working paper, Graduate School of Business, Stanford University, Stanford, CA.
Duffie, D. and Protter, P. (1992). From discrete to continuous-time finance: Weak convergence of the financial gain process, Mathematical Finance 2, 1, pp. 1–15.
Duffie, D. and Singleton, K. (1999). Modeling term structures of defaultable bonds, Review of Financial Studies 12, pp. 687–720.
Dümbgen, L. (1993). On nondifferentiable functions and the bootstrap, Probability Theory and Related Fields 95, pp. 125–140.

Durham, G. and Gallant, A. (2001). Numerical techniques for maximum likelihood estimation of continuous-time diffusion processes, Journal of Business Economics and Statistics 20, pp. 297–316.
Durrett, R. (1995). Probability: Theory and Examples, 2nd edn. (Duxbury Press).
Durrleman, V., Nikeghbali, A., and Roncalli, T. (2000). How to get bounds for distribution convolutions? A simulation study and an application to risk management, working paper, Crédit Lyonnais, Groupe de Recherche Opérationnelle.
Dybvig, P. and Ingersoll, J. (1982). Mean-variance theory in complete markets, Journal of Business 55, 2, pp. 233–251.
Dybvig, P., Rogers, L. C., and Back, K. (1999). Portfolio turnpikes, Review of Financial Studies 12, pp. 165–195.
Efron, B. (1979). Bootstrap methods: Another look at the jackknife, Annals of Statistics 7, 1, pp. 1–26.
Efron, B. and Tibshirani, R. (1994). An Introduction to the Bootstrap (Chapman & Hall/CRC).
El-Karoui, N., Jeanblanc-Picqué, M., and Shreve, S. (1998). Robustness of the Black and Scholes formula, Mathematical Finance 8, 2, pp. 93–126.
Elerian, O., Chib, S., and Shephard, N. (2001a). Likelihood inference for discretely observed nonlinear diffusions, Econometrica 69, 4, pp. 959–993.
Elerian, O., Chib, S., and Shephard, N. (2001b). Likelihood inference for discretely observed nonlinear diffusions, Econometrica 69, pp. 959–993.
Elliott, M. (1997). Controlling model risk, Derivatives Strategy.
Embrechts, P., Klüppelberg, C., and Mikosch, T. (1997). Modelling Extremal Events for Insurance and Finance (Springer, Berlin).
Embrechts, P., McNeil, A., and Straumann, D. (2002). Risk Management: Value at Risk and Beyond, chap. Correlation and Dependence in Risk Management: Properties and Pitfalls (Cambridge University Press), pp. 176–223.
Embrechts, P., Puccetti, G., and Rüschendorf, L. (2013). Model uncertainty and VaR aggregation, Journal of Banking & Finance 37, 8, pp. 2750–2764.
Embrechts, P., Puccetti, G., Rüschendorf, L., Wang, R., and Beleraj, A. (2014). An academic response to Basel 3.5, Risks 2, 1, pp. 25–48.
Embrechts, P., Wang, B., and Wang, R. (2015). Aggregation-robustness and model uncertainty of regulatory risk measures, Finance and Stochastics, Forthcoming.
Emmer, S., Kratz, M., and Tasche, D. (2014). What is the best risk measure in practice? A comparison of standard measures, working paper 1312.1645v3, arXiv.
Engelman, B. and Ermakov, K. (2011). The Basel II Risk Parameters, chap. Transition Matrices: Properties and Estimation Methods (Springer, Berlin), pp. 103–116.
Engle, R. F. (2001). GARCH 101: The use of ARCH/GARCH models in applied econometrics, Journal of Economic Perspectives 15, 4, pp. 157–168.
Eraker, B. (2001). MCMC analysis of diffusion models with application to finance, Journal of Business Economics and Statistics 19, pp. 177–191.
Escanciano, J. C. and Olmo, J. (2010). Backtesting parametric value-at-risk with estimation risk, Journal of Business & Economic Statistics 28, 1, pp. 36–51.

Escanciano, J. C. and Olmo, J. (2011). Robust backtesting tests for value-at-risk models, Journal of Financial Econometrics 9, 1, pp. 132–161.
Escanciano, J. C. and Pei, P. (2012). Pitfalls in backtesting historical simulation models, Journal of Banking and Finance 36, 8, pp. 2233–2244.
Fabozzi, F., Leccadito, A., and Tunaru, R. (2012a). A new method to generate approximation algorithms for financial mathematics applications, Quantitative Finance 12, 10, pp. 1571–1583.
Fabozzi, F. and Tunaru, R. (2006). On risk management problems related to a coherence property, Quantitative Finance 6, 1, pp. 75–81.
Fabozzi, F. and Tunaru, R. (2007). Some inconsistencies in modeling credit portfolio products, International Journal of Theoretical and Applied Finance 10, 8, pp. 1305–1321.
Fabozzi, F. J., Shiller, R. J., and Tunaru, R. S. (2009). Hedging real estate risk, The Journal of Portfolio Management 35, 5, pp. 92–103.
Fabozzi, F. J., Shiller, R. J., and Tunaru, R. S. (2012b). A pricing framework for real estate derivatives, European Financial Management 18, 5, pp. 762–789.
Fabozzi, F. J., Shiller, R. J., and Tunaru, R. S. (2010). Property derivatives for managing European real-estate risk, European Financial Management, pp. 8–26.
Fama, E. (1984). Term premiums in bond returns, Journal of Financial Economics 13, pp. 529–546.
Fan, J. and Zhang, C. (2003). A re-examination of diffusion estimators with applications to financial model validation, Journal of American Statistical Association 98, pp. 118–134.
Feller, W. (1971). An Introduction to Probability Theory and its Applications, Vol. 2 (Wiley, New York).
Figlewski, S. (2004). Estimation error in the assessment of financial risk exposure, working paper, New York University.
Filipovic, D. (1998). A note on the Nelson-Siegel family, Mathematical Finance 9, 4, pp. 349–359.
Filipovic, D. (2009). Term-Structure Models (Springer, Berlin).
Florens-Zmirou, D. (1993). On estimating the diffusion coefficient from discrete observations, Journal of Applied Probability 30, 4, pp. 790–804.
Föllmer, H. and Leukert, P. (1999). Quantile hedging, Finance and Stochastics 3, pp. 251–273.
Föllmer, H. and Schied, A. (2002). Convex measures of risk and trading constraints, Finance and Stochastics 6, 4, pp. 429–447.
Foster, F. and Whiteman, C. (1999). An application of Bayesian option pricing to the soybean market, American Journal of Agricultural Economics 81, 3, pp. 722–727.
Frank, M., Nelsen, R. B., and Schweizer, B. (1987). Best possible bounds for the distribution of a sum - a problem of Kolmogorov, Probability Theory and Related Fields 74, pp. 199–211.
Fréchet, M. (1957). Les tableaux de corrélation dont les marges sont données, Annales de l'Université de Lyon, Sciences Mathématiques et Astronomie 4, pp. 13–31.

FSA (2009). The Turner review: A regulatory response to the global banking crisis, www.fsa.gov.uk/pubs/other.
Gatheral, J. (2006). The Volatility Surface (John Wiley & Sons, Hoboken).
Gelman, A., Carlin, J., Stern, H., and Rubin, D. (1995). Bayesian Data Analysis (Chapman and Hall, New York).
Geltner, D. and Fisher, J. (2007). Pricing and index considerations in commercial real estate derivatives, Journal of Portfolio Management 33, 5, pp. 99–117.
Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images, IEEE Transactions on Pattern Analysis and Machine Intelligence 6, pp. 721–741.
Geweke, J. (2001). A note on some limitations of CRRA utility, Economics Letters 71, pp. 341–346.
Gibson, R. (ed.) (2000). Model Risk - Concepts, Calibration and Pricing (Risk Books).
Gibson, R., Lhabitant, F., Pistre, N., and Talay, D. (1999). Interest rate model risk: An overview, Journal of Risk 3, pp. 37–62.
Gibson, R., Lhabitant, F.-S., and Talay, D. (2010). Modeling the term structure of interest rates: A review of the literature, Foundations and Trends in Finance 5, pp. 1–156.
Gilks, W. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling, Applied Statistics 41, 2, pp. 337–348.
Glasserman, P. (2004). Monte Carlo Methods in Financial Engineering (Springer).
Gneiting, T. (2011). Making and evaluating point forecasts, Journal of the American Statistical Association 106, 494, pp. 746–762.
Gobet, E., Hoffman, M., and Reiß, M. (2004). Nonparametric estimation of scalar diffusions based on low frequency data, The Annals of Statistics 32, 5, pp. 2223–2253.
Goetzmann, W., Ingersoll, J., Spiegel, M., and Welch, I. (2004). Sharpening Sharpe ratios, Yale ICF working paper 02-08, Yale International Center for Finance, Yale School of Management.
Gössl (2005). Predictions based on certain uncertainties - a Bayesian credit portfolio approach, working paper, HypoVereinsbank AG.
Gourieroux, C. and Jasiak, J. (2010). Handbook of Financial Econometrics, Vol. 1, chap. Value at Risk (North-Holland, New York), pp. 553–616.
Green, T. C. and Figlewski, S. (1999). Market risk and model risk for a financial institution writing options, Journal of Finance 54, 4, pp. 1465–1499.
Hansen, L. P., Scheinkman, J. A., and Touzi, N. (1998). Spectral methods for identifying scalar diffusions, Journal of Econometrics 86, pp. 1–32.
Harrison, J. and Pliska, S. (1983). A stochastic calculus model of continuous trading: Complete markets, Stochastic Processes and Their Applications 15, pp. 313–316.
Hartz, C., Mittnik, S., and Paolella, M. (2006). Accurate Value-at-Risk forecasting based on the (good old) normal-GARCH model, working paper 333, National Centre of Competence in Research Financial Valuation and Risk Management.

Harvey, C. R. and Liu, Y. (2014). Backtesting, Tech. rep., SSRN.
Heston, S. L. (1993). A closed-form solution for options with stochastic volatility with applications to bond and currency options, The Review of Financial Studies 6, 2, pp. 327–343.
Heyde, C. (1963). On a property of the lognormal distribution, Journal of the Royal Statistical Society Series B 25, pp. 392–393.
Hoeting, J., Madigan, D., and Raftery, A. (1999). Bayesian model averaging: A tutorial, Statistical Science 14, 4, pp. 382–417.
Höffding, W. (1940). Massstabinvariante Korrelationstheorie, Schriften des Mathematischen Seminars und des Instituts für Angewandte Mathematik der Universität Berlin 5, pp. 181–233.
Hogan, M. (1993). Problems in certain two-factor term structure models, Annals of Applied Probability 3, pp. 573–591.
Hogan, M. and Weintraub, K. (1993). The lognormal interest rate model and eurodollar futures, Tech. rep., Citibank, New York.
Hong, K. J. S. S., J. and Scherer, B. (2010). Using approximate results for validating Value-at-Risk, The Journal of Risk Model Validation 4, pp. 69–81.
Honoré, P. (1998). Pitfalls in estimating jump-diffusion models, working paper 18, University of Aarhus, Aarhus School of Business.
Hsiao, C. (1983). Studies in Econometrics, Time Series, and Multivariate Statistics, chap. Regression Analysis With a Categorized Explanatory Variable (Academic Press, New York), pp. 93–129.
Hsiao, C. and Mountain, D. (1985). Estimating the short run income elasticity for demand of electricity by using cross-sectional categorized data, Journal of American Statistical Association 80, pp. 259–265.
Hu, Y., Kiesel, R., and Perraudin, W. (2002). The estimation of transition matrices for sovereign credit ratings, Journal of Banking and Finance 26, pp. 1383–1406.
Hull, J. C. and Suo, W. (2002). A methodology for assessing model risk and its application to the implied volatility function model, Journal of Financial and Quantitative Analysis 37, 2, pp. 297–318.
Inui, K. and Kijima, M. (2005). On the significance of expected shortfall as a coherent risk measure, Journal of Banking and Finance 29, pp. 853–864.
Jacod, J. (2010). Handbook of Financial Econometrics, Vol. 2, chap. Inference for Stochastic Processes (North-Holland, New York), pp. 197–240.
Jacquier, E. and Jarrow, R. A. (2000). Bayesian analysis of contingent claim model error, Journal of Econometrics 94, pp. 145–180.
Jacquier, E., Polson, N., and Rossi, P. (1994a). Bayesian analysis of stochastic volatility models, Journal of Business & Economic Statistics 12, pp. 371–389.
Jacquier, E., Polson, N., and Rossi, P. (1994b). Bayesian analysis of stochastic volatility models, Journal of Business and Economic Statistics 12, 4, pp. 371–418.
Jacquier, E., Polson, N., and Rossi, P. (2004). Bayesian analysis of stochastic volatility models with fat-tails and correlated errors, Journal of Econometrics 122, pp. 185–212.

Jacquier, E., Polson, N. G., and Rossi, P. E. (1994c). Bayesian analysis of stochastic volatility models (with discussion), Journal of Business and Economic Statistics 12, 4, pp. 371–389.
Jarrow, R. and Turnbull, S. (1995). Pricing derivatives on financial securities subject to credit risk, Journal of Finance 50, 1, pp. 53–86.
Jarrow, R. A. (2009). The term structure of interest rates, Annual Review of Financial Economics 1, pp. 69–96.
Jarrow, R. A., Jin, X., and Madan, D. (1999). The second fundamental theorem of asset pricing, Mathematical Finance 9, 3, pp. 255–273.
Jarrow, R. A., Lando, D., and Turnbull, S. (1997). A Markov model for the term structure of credit risk spreads, Review of Financial Studies 10, 2, pp. 481–523.
Jarrow, R. A. and Rudd, A. (1982). Approximate option valuation for arbitrary stochastic processes, Journal of Financial Economics 10, pp. 347–369.
Jiang, G. J. and Knight, J. L. (1997). A nonparametric approach to the estimation of diffusion processes, with an application to a short-term interest rate model, Econometric Theory 13, 5, pp. 615–645.
Joag-dev, K. (1984). Handbook of Statistics, chap. Measures of Dependence (North-Holland/Elsevier, New York), pp. 79–88.
Johannes, M. and Polson, N. (2010). Handbook of Financial Econometrics, Vol. 2, chap. MCMC Methods for Continuous-Time Financial Econometrics (North-Holland, New York), pp. 1–72.
Jorion, P. (1988). On jump processes in the foreign exchange and stock markets, Review of Financial Studies 1, 4, pp. 427–445.
Jorion, P. (1996). Risk²: Measuring the risk in value-at-risk, Financial Analysts Journal 52, pp. 47–56.
Ju, N. (2002). Pricing Asian and basket options via Taylor expansion, Journal of Computational Finance 5, 3, pp. 79–103.
Kaas, R., Dhaene, J., Vyncke, D., Goovaerts, M., and Denuit, M. (2002). A simple geometric proof that comonotonic risks have the convex-largest sum, ASTIN Bulletin 32, 1, pp. 71–80.
Kaas, R., Goovaerts, M., and Tang, Q. (2004). Some useful counterexamples regarding comonotonicity, Belgian Actuarial Bulletin 4, 1, pp. 1–4.
Kandel, S. and Stambaugh, R. (1996). On the predictability of stock returns: an asset allocation perspective, Journal of Finance 51, pp. 385–424.
Kao, C. and Wu, C. (1990). Two-step estimation of linear models with ordinal unobserved variables: The case of corporate bonds, Journal of Business & Economic Statistics 8, 3, pp. 317–325.
Karatzas, I., Lehoczky, J. P., and Shreve, S. E. (1987). Optimal portfolio and consumption decisions for a small investor on a finite horizon, SIAM Journal of Control Optimisation 27, pp. 1157–1186.
Karolyi, G. (1993). A Bayesian approach to modeling stock return volatility for option valuation, Journal of Financial and Quantitative Analysis 28, 4, pp. 579–594.

Kendall, M. and Stuart, A. (1977). The Advanced Theory of Statistics, Vol. 1, 4th edn. (Macmillan, New York).
Kerkhof, J. and Melenberg, B. (2004). Backtesting for risk-based regulatory capital, Journal of Banking and Finance 28, pp. 1845–1865.
Kerkhof, J., Melenberg, B., and Schumacher, H. (2010). Model risk and capital reserves, Journal of Banking and Finance 34, pp. 267–279.
Kessler, M. and Sorensen, M. (1999). Estimating equations based on eigenfunctions for a discretely observed diffusion, Bernoulli 5, pp. 299–314.
Kiefer, N. (1978). Discrete parameter variation: Efficient estimation of a switching regression model, Econometrica 46, 2, pp. 427–434.
Kienitz, J. and Wetterau, D. (2012). Financial Modelling: Theory, Implementation and Practice with MATLAB Source (Wiley, Chichester).
Kijima, M. (2006). A multivariate extension of equilibrium pricing transforms: the multivariate Esscher and Wang transforms for pricing financial and insurance risks, ASTIN Bulletin 36, 1, pp. 269–283.
Kijima, M. and Muromachi, Y. (2008). An extension of the Wang transform derived from Bühlmann's economic premium principle for insurance risk, Insurance: Mathematics and Economics 42, 3, pp. 887–896.
Kim, S., Shephard, N., and Chib, S. (1998). Stochastic volatility: Likelihood inference and comparison with ARCH models, Review of Economic Studies 65, pp. 361–393.
Kirch, M. (2002). Efficient hedging in incomplete markets under model uncertainty, working paper, TU Wien.
Knight, F. (1921). Risk, Uncertainty, and Profit (Hart, Schaffner & Marx; Houghton Mifflin Company, Boston).
Korn, R. and Kraft, H. (2004). On the stability of continuous-time portfolio problems with stochastic opportunity set, Mathematical Finance 14, 3, pp. 403–414.
Kukuk, M. (2002). Indirect estimation of (latent) linear models with ordinal regressors: A Monte Carlo study and some empirical illustrations, Statistical Papers 43, pp. 379–399.
Kurtz, T. and Protter, P. (1991). Weak limit theorems for stochastic integrals and stochastic differential equations, Annals of Probability 19, pp. 1035–1070.
Kutoyants, A. (1984). Parameter Estimation for Stochastic Processes (Heldermann-Verlag, Berlin).
Lancaster, T. (2004). Introduction to Modern Bayesian Econometrics (Wiley-Blackwell).
Latané, H. and Rendleman, R. (1976). Standard deviation of stock price ratios implied by option premia, Journal of Finance 31, pp. 369–382.
Leipnik, R. (1981). The lognormal law and strong non-uniqueness of the moment problem, Theory of Probability and its Applications 26, pp. 850–852.
Lemke, W. (2006). Term Structure Modeling and Estimation in a State Space Framework, Lecture Notes in Economics and Mathematical Systems, Vol. 565 (Springer, Berlin).
Li, S. (2005). A new formula for computing implied volatility, Applied Mathematics and Computation 170, 1, pp. 611–625.

Lindley, D. and Phillips, L. (1976). Inference for a Bernoulli process (a Bayesian view), American Statistician 30, pp. 112–119.
Lioui, A. and Poncet, P. (2001). On optimal portfolio choice under stochastic interest rates, Journal of Economics Dynamics and Control 25, pp. 1841–1865.
Liu, J., Longstaff, F. A., and Pan, J. (2003). Dynamic asset allocation with event risk, Journal of Finance 58, pp. 231–259.
Liu, J., Wong, W., and Kong, A. (1994). Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and augmentation schemes, Biometrika 81, pp. 27–40.
Liu, J., Wong, W., and Kong, A. (1995). Covariance structure and convergence rate of the Gibbs sampler with various scans, Journal of Royal Statistical Society B 57, pp. 157–169.
Lo, A. (1988). Maximum likelihood estimation of generalized Itô processes with discretely sampled data, Econometric Theory 4, 2, pp. 231–247.
Lo, A. (2002). The statistics of Sharpe ratios, Financial Analysts Journal 58, 4, pp. 36–52.
Lo, A. and Wang, J. (1995). Implementing option pricing models when asset returns are predictable, Journal of Finance 50, pp. 87–129.
Loisel, S. (2009). A trivariate non-Gaussian copula having 2-dimensional Gaussian copulas as margins, Cahiers de Recherche de l'ISFA, WP2106.
Longstaff, F., Santa-Clara, P., and Schwartz, E. (2001). Throwing away a billion dollars: the cost of suboptimal exercise strategies in the swaptions market, Journal of Financial Economics 62, pp. 39–66.
Longstaff, F. A. (2000). Arbitrage and the expectations hypothesis, Journal of Finance 55, pp. 989–994.
Lutz, F. (1940). The structure of interest rates, Quarterly Journal of Economics 55, 1, pp. 36–63.
Lyons, T. (1995). Uncertain volatility and the risk free synthesis of derivatives, Applied Mathematical Finance 2, pp. 117–133.
Lyons, T. J. and Zheng, W. (1990). On conditional diffusion processes, Proceedings of Royal Society of Edinburgh 115, pp. 243–255.
MacKinnon, G. and Zaman, A. A. (2009). Real estate for the long term: The effect of return predictability on long-horizon allocations, Real Estate Economics 37, 1, pp. 117–153.
Makarov, G. (1981). Estimates for the distribution function of a sum of two random variables when the marginal distributions are fixed, Theory of Probability and its Applications 26, pp. 803–806.
Malevergne, Y. and Sornette, D. (2003). Testing the Gaussian copula hypothesis for financial assets dependences, Quantitative Finance 3, pp. 231–250.
Malkiel, B. (1966). The Term Structure of Interest Rates: Expectations and Behavior Patterns (Princeton University Press, Princeton).
Manaster, S. and Koehler, G. (1982). The calculation of implied variances from the Black-Scholes model: A note, Journal of Finance 37, pp. 227–230.
Matthies, A. (2014). Validation of term structure forecasts with factor models, The Journal of Risk Model Validation 8, 3, pp. 65–95.
McCulloch, J. (1975). An estimate of the liquidity premium, Journal of Political Economy 83, pp. 95–119.

McCulloch, J. (1993). A re-examination of traditional hypotheses about the term structure: A comment, The Journal of Finance 48, pp. 779–789.
McDonald, R. (2006). Derivatives Markets, 2nd edn. (Pearson, Boston).
McNeil, A. J., Frey, R., and Embrechts, P. (2005). Quantitative Risk Management, Princeton Series in Finance (Princeton University Press, Princeton and Oxford).
Melnikov, A. V. and Petrachenko, Y. G. (2005). On option pricing in binomial market with transaction costs, Finance and Stochastics 9, pp. 141–149.
Merton, R. (1969). Lifetime portfolio selection under uncertainty, Review of Economics and Statistics 51, pp. 247–257.
Merton, R. (1974). On the pricing of corporate debt: The risk structure of interest rates, Journal of Finance 29, pp. 449–470.
Merton, R. (1980). On estimating the expected return on the market: An exploratory investigation, Journal of Financial Economics 8, pp. 323–361.
Merton, R. C. (1976a). The impact on option pricing of specification error in the underlying stock price returns, Journal of Finance 31, pp. 333–350.
Merton, R. C. (1976b). Option pricing when underlying stock returns are discontinuous, Journal of Financial Economics 3, pp. 125–144.
Moraux, F. (2011). Large sample confidence intervals for normal VaR, Journal of Risk Management in Financial Institutions 4, 2, pp. 189–200.
Morini, M. (2011). Understanding and Managing Model Risk: A Practical Guide for Quants, Traders and Validators (Wiley, Chichester).
Mykland, P. A. (2010). Handbook of Financial Econometrics, Vol. 2, chap. Option Pricing Bounds and Statistical Uncertainty: Using Econometrics to Find an Exit Strategy in Derivatives Trading (North-Holland, New York), pp. 135–196.
Nakagawa, T. and Osaki, S. (1975). The discrete Weibull distribution, IEEE Transactions on Reliability R-24, 5, pp. 300–301.
Navas, J. F. (2003). Correct calculation of volatility in a jump-diffusion model, Journal of Derivatives 11, 2, pp. 66–72.
Nelsen, R. B. (1995). Copulas, characterization, correlation, and counterexamples, Mathematics Magazine 68, 3, pp. 193–198.
Nelsen, R. B. (2006). An Introduction to Copulas, 2nd edn., Lecture Notes in Statistics (Springer, Berlin).
Nguyen, H. T., Kreinovich, V., and Sriboonchitta, S. (2009). A new justification of Wang transform operator in financial risk analysis, International Journal of Intelligent Technologies and Applied Statistics 2, 1, pp. 45–57.
Nickell, P., Perraudin, W., and Varotto, S. (2000). Stability of rating transitions, Journal of Banking and Finance 24, pp. 203–227.
Nowman, K. (1997). Gaussian estimation of single-factor continuous time models of the term structure of interest rates, Journal of Finance 52, pp. 1695–1706.
Osborne, M. J. (2014). Multiple Interest Rate Analysis (Palgrave Macmillan, Basingstoke).
Patton, A. (2009). Handbook of Financial Time Series, chap. Copula-based models for financial time series (Springer, Berlin), pp. 767–785.

Patton, A. J. (2012). A review of copula models for economic time series, Journal of Multivariate Analysis 110, pp. 4–18.
Pedersen, A. R. (1995). A new approach to maximum likelihood estimation for stochastic differential equations based on discrete observations, Scandinavian Journal of Statistics 22, pp. 55–71.
Pelsser, A. (2008). On the applicability of the Wang transform for pricing financial risks, ASTIN Bulletin 38, pp. 171–181.
Pesaran, H., Pettenuzzo, D., and Timmermann, A. (2007). Learning, structural instability and present value calculations, Econometric Reviews 26, 2-4, pp. 253–288.
Phillips, P. and Yu, J. (2005). Jackknifing bond option prices, Review of Financial Studies 18, pp. 707–742.
Piazzesi, M. (2010). Handbook of Financial Econometrics, Vol. 1, chap. Affine Term Structure Models (North-Holland, New York), pp. 691–766.
Platen, E. and Heath, D. (2006). A Benchmark Approach to Quantitative Finance (Springer, Berlin).
Polson, N. and Stroud, J. (2003). Bayesian Statistics, Vol. 7, chap. Bayesian Inference for Derivative Prices (Oxford University Press, Oxford), pp. 641–650.
Polson, N. G. and Roberts, G. O. (1994). Bayes factors for discrete observations from diffusion processes, Biometrika 81, 1, pp. 11–26.
Press, J. (1967). A compound events model for security prices, Journal of Business 40, pp. 317–335.
Pritsker, M. (1997). Evaluating value at risk methodologies: Accuracy versus computational time, Journal of Financial Services Research 12, pp. 201–241.
Ritter, C. and Tanner, M. (1992). Facilitating the Gibbs sampler: The Gibbs stopper and the Griddy Gibbs sampler, Journal of the American Statistical Association 87, pp. 861–868.
Roberts, G. and Polson, N. (1994). On the geometric convergence of the Gibbs sampler, Journal of Royal Statistical Society B 56, pp. 377–384.
Roberts, G. and Stramer, O. (2001). On inference for partially observed non-linear diffusion models using the Metropolis-Hastings algorithm, Biometrika 88, pp. 603–621.
Roll, R. (1970). The Behaviour of Interest Rates (Basic Books, New York).
Romano, J. (1988). Bootstrapping the mode, Annals of the Institute of Statistical Mathematics 40, pp. 565–586.
Ronning, G. and Kukuk, M. (1996). Efficient estimation of ordered probit models, Journal of American Statistical Association 91, 435, pp. 1120–1129.
Rösch, D. and Scheule, H. (eds.) (2010). Model Risk (Riskbooks, London).
Rosenberg, J. and Engle, R. (2002). Empirical pricing kernels, Journal of Financial Economics 64, pp. 341–371.
Ross, S. A. (1976). The arbitrage theory of capital asset pricing, Journal of Economic Theory 13, pp. 341–360.
Rosu, I. and Stroock, D. (2003). Séminaire de Probabilités XXXVII, Lecture Notes in Mathematics, Vol. 1832, chap. On the Derivation of the Black-Scholes Formula (Springer), pp. 399–414.

Roux, A. and Zastawniak, T. (2006). A counter-example to an option pricing formula under transaction costs, Finance and Stochastics 10, 4, pp. 575–578.
Roynstrand, T., Nordbo, N., and Strat, V. (2012). Evaluating power of value-at-risk backtests, master thesis, Norwegian University of Science and Technology, Trondheim.
Sandmann, K. and Sondermann, D. (1993). A term structure model and the pricing of interest rate derivatives, The Review of Futures Markets 12, 2, pp. 391–423.
Sandmann, K. and Sondermann, D. (1997). A note on the stability of lognormal interest rate models and the pricing of eurodollar futures, Mathematical Finance 7, pp. 119–125.
Satchell, S. and Christodoulakis, G. (eds.) (2008). The Analytics of Risk Model Validation (Academic Press, London).
Schmeidler, D. (1986). Integral representation without additivity, Proceedings of the American Mathematical Society 97, pp. 225–261.
Schöbel, R. and Zhu, J. (1999). Stochastic volatility with an Ornstein-Uhlenbeck process: An extension, European Finance Review 4, pp. 23–46.
Self, S. and Liang, K. (1987). Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions, Journal of the American Statistical Association 82, 398, pp. 605–610.
Sharpe, W. F. (1994). The Sharpe ratio, Journal of Portfolio Management, Fall, pp. 49–58.
Shiller, R. (1989). Investor behavior in the October 1987 stock market crash: Survey evidence, in R. J. Shiller (ed.), Market Volatility (MIT Press, Boston).
Sørensen, H. (2002). Parametric inference for diffusion processes observed at discrete points in time: a survey, working paper 119, University of Aarhus, University of Copenhagen.
Spiegelhalter, D., Best, N., Carlin, B., and van der Linde, A. (2002). Bayesian measures of model complexity and fit, Journal of Royal Statistical Society, Series B 64, pp. 583–640.
Stanescu, S. and Tunaru, R. (2012). Handbook of Research Methods and Applications in Empirical Finance, chap. Quantifying the Uncertainty in VaR and ES Estimates (Edward Elgar Publishing), pp. 357–372.
Stanescu, S. and Tunaru, R. (2014). Investment strategies with VIX and VSTOXX futures, working paper, SSRN, CeQuFin, University of Kent.
Stanton, R. (1997). A nonparametric model of term structure dynamics and the market price of interest rate risk, Journal of Finance 52, pp. 1973–2002.
Stefanescu, C., Tunaru, R., and Turnbull, S. (2009). The credit rating process and estimation of transition probabilities: A Bayesian approach, Journal of Empirical Finance 16, 2, pp. 216–234.
Stoyanov, J. (2014). Counterexamples in Probability, 3rd edn. (Dover Publications Inc.).
Stroock, D. (1993). Probability Theory - An Analytical View (Cambridge University Press, Cambridge).

sity Press, Cambridge). Stuart, A. and Ord, K. (1987). Kendall’s Advanced Theory of Statistics, Vol. 1. Distribution Theory (Arnold, London). Talay, D. and Zheng, Z. (2002). Worst case model risk management, Finance and Stochastics 6, pp. 517–537. Tang, C. Y. and Chen, S. X. (2009). Parameter estimation and bias correction for diffusion processes, Journal of Econometrics 149, pp. 65–81. Tanner, M. and Wong, W. (1987). The calculation of posterior distributions by data augmentation, Journal of American Statistical Association 82, 398, pp. 528–540. Taqqu, M. S. (2001). Bachelier and his times: A conversation with Bernard Bru, Finance and Stochastics 5, 1, pp. 3–32. Tarashev, N. (2010). Measuring portfolio credit risk correctly: Why parameter uncertainty matters, Journal of Banking and Finance 34, pp. 2065–2076. Terza, J. (1987). Estimating linear models with ordinal qualitative regressions, Journal of Econometrics 34, pp. 275–291. Tierney, L. (1994). Markov chains for exploring posterior distributions, Annals of Statistics 22, pp. 1701–1762. Timmermann, A. (1993). How learning in financial markets generates excess volatility and predictability in stock prices, Quarterly Journal of Economics 108, pp. 1135–1145. Tunaru, R. (2010). Constructing discrete approximations algorithms for financial calculus from weak convergence results, in Progress in Analysis and Its Applications: Proceedings of the 7th International Isaac Congress, pp. editor = M. Ruzhansky and J. Wirth, publisher = World Scientific Publishing, address = Singapore,. Tunaru, R. (2011). Stochastic Analysis 2010, chap. Discrete Algorithms for Multivariate Financial Calculus (Springer, Berlin), pp. 243–266. Tunaru, R. (2013a). Encyclopedia of Financial Models, Vol. III, chap. Applications of Order Statistics to Risk Management Problems (Wiley, Hoboken, New Jersey), pp. 289–295. Tunaru, R. (2013b). The fundamental economic term of commercial real-estate in UK, SUERF STUDIES 4, pp. 27–40, Property Prices and Real Estate Financing in a Turbulent World, editors Morten Balling & Jesper Berg. Uhlenbeck, G. and Ornstein, L. (1930). On the theory of Brownian motion, Physical Review 36, p. 82341, reprinted in N. Wax, eds., 1954, Selected Papers on Noise and Stochastic Processes, Dover Pub., 93-111. van Deventer, D. R. (2011). Pitfalls in asset and liability management: One factor term structure models, Kamakura memorandum 7. Vasicek, O. (1977). An equilibrium characterization of the term structure, Journal of Financial Economics 5, pp. 177–188. Vrontos, S., Vrontos, I., and Giamouridis, D. (2008). Hedge fund pricing and model uncertainty, Journal of Banking and Finance 32, pp. 741–753. Wang, S. (1996). Premium calculation by transforming the layer premium density, ASTIN Bulletin 26, pp. 71–92. Wang, S. (2000). A class of distortion operators for pricing financial and insurance

Wang, S. (2002). A universal framework for pricing financial and insurance risks, ASTIN Bulletin 32, 2, pp. 213–234.
Wang, S. and Dhaene, J. (1998). Comonotonicity, correlation order and premium principles, Insurance: Mathematics and Economics 22, 3, pp. 235–242.
Wang, X., Phillips, P. C., and Yu, J. (2011). Bias in estimating multivariate and univariate diffusions, Journal of Econometrics 161, pp. 228–245.
Wild, P. and Gilks, W. (1993). Adaptive rejection sampling from log-concave density functions, Applied Statistics 42, 4, pp. 701–709.
Williams, D. (1999). Models vs. the market: Survival of the fittest, report FIN514, Meridien Research.
Wilson, T. (1997). Credit risk modelling: A new approach, Tech. rep., McKinsey Inc., New York, unpublished mimeo.
Wise, G. L. and Hall, E. B. (1993). Counterexamples in Probability and Real Analysis (Oxford University Press, Oxford).
Yamai, Y. and Yoshiba, T. (2002). Comparative analyses of expected shortfall and value-at-risk: Their estimation error, decomposition, and optimization, Monetary and Economic Studies 20, 1, pp. 87–121.
Yoshida, N. (1990). Estimation for diffusion processes from discrete observations, Journal of Multivariate Analysis 41, pp. 220–242.
Young, M. and Lenk, P. (1998). Hierarchical Bayes methods for multifactor model estimation and portfolio selection, Management Science 44, 11, pp. S111–S124.
Yu, J. and Phillips, P. (2001). A Gaussian approach for estimating continuous time models of short term interest rates, The Econometrics Journal 4, pp. 211–225.
Zhou, C. (2001). Credit rating and corporate defaults, Journal of Fixed Income 3, 11, pp. 30–40.

Index

antithetic sampling, 228
Asian call delta, 237
Asian call vega, 238
Asian option, 237
autocorrelation function, 224
Avellaneda-Levy-Paras model, 80
backtesting, 181, 199
Bayes' formula, 84
Bayesian analysis, 312
Bayesian Credit Portfolio Model, 313
Bayesian inference, 270
Bayesian learning, 13
Bayesian logistic model, 291
Bayesian model averaging, 86, 119
Bayesian modelling, 291
Bayesian Panel Count Data model, 313
Bayesian updating, 99, 272
binomial market model, 56
Black Monday, 118
Black-Scholes formula, 239, 245
Black-Scholes model, 87, 235
Black-Scholes-Barenblatt equation, 79
bootstrapping, 221
bootstrapping technique, 205
bridge process, 267
bridge sampling technique, 267
Brownian bridge, 267
Brownian motion, 136, 153
butterfly option, 229
calibration, 75
Christoffersen conditional coverage backtest, 183
Christoffersen independence backtest, 183
CIR model, 23, 124, 207
coherent risk measures, 106, 110
comonotonicity, 146, 153
computational implementation risk, 67
conditional covariance matrix, 144
conditional coverage test, 187
conditional expectation, 135
conditioned correlation, 142
confidence intervals for VaR, 192
consol, 42
constant absolute risk aversion, 17
constant elasticity of variance model, 87, 92
Constant Relative Risk Aversion, 12
constant-elasticity of variance, 137
convex risk measures, 110
copula, 132, 145
correlation, 129
correlation breakdown, 141
correlation coefficient, 132
credibility interval, 90, 291
credible interval, 312
credit ratings, 283
credit transition matrix, 316
data augmentation scheme, 269
default probabilities, 285
delta, 233
delta-gamma method, 240
Deviance Information Criterion, 293
digital option, 235
Dirac probability measure, 131
distortion operator, 60
distortion risk measures, 160
Dothan model, 123
double-loss betting strategy, 137
Edgeworth expansion, 243
Euler-Maruyama discretization, 84, 266
exact bridge processes, 267
exchangeability, 148
expectations hypothesis, 31
expected shortfall, 168, 199
exponentially weighted moving average, 169
extended Vasicek model, 47
financial gain process, 58
forward curve, 45
gamma, 233
GARCH model, 169, 178
Gaussian distribution, 131
Gelman-Rubin statistics, 92
Gibbs canonical distribution, 102
Gibbs sampler, 299, 302
Girsanov kernel, 60
Gram-Charlier expansion, 242
Greek parameters, 90, 232
Heston model, 126
historical simulation, 169
HJM, 45
I-unstable, 123
implied volatility, 227, 245, 248, 249
independence coverage test, 187
inference for diffusion processes, 206
intra-portfolio dependencies, 313
investment portfolio analysis, 119
jump-diffusion models, 215
Knightian uncertainty, 67
Kullback-Leibler information divergence measure, 101
Kupiec backtesting test, 182
L-EH hypothesis, 39
leading order relative bias, 211
Likelihood Principle, 221
linear correlation coefficient, 140
Local Expectations Hypothesis, 34
log(-log), 302
log-linear model, 285, 288
log-log link model, 293
logistic regression model, 287
lognormal distribution, 134, 146
M-stability, 122
market incompleteness, 71
Markov Chain Monte Carlo, 87, 92, 271, 292, 297
Markov process, 137, 182
maximum likelihood estimation, 205, 244, 318
maximum likelihood estimator, 219, 222
mean-reverting process, 138
median, 133
Merton's lognormal jump diffusion model, 253
MLE bias, 211
model averaging, 70
model identification risk, 221
model protocol risk, 75
model risk, 65, 322
model selection risk, 66, 70
model uncertainty, 106
model validation, 322
Monte Carlo simulation, 227
Moody's ratings, 285
multivariate Gaussian distribution, 144
negative binomial distribution, 221
Nelson-Siegel, 47
Newton-Raphson algorithm, 248, 249
order statistics, 192
Ornstein-Uhlenbeck model, 208
parameter estimation risk, 66, 68, 98
parameter estimation uncertainty, 90
parameter uncertainty model risk measure, 97
Pareto distribution, 174
pathwise derivative, 233, 236
pathwise estimation, 235
Pearson linear correlation, 161
piecewise Vasicek process, 212
Poisson process, 215
pricing kernel, 73
probabilities of default, 301, 312
probit model, 302
pseudo-likelihood estimators, 212
pseudo-ML estimators, 214
quantile, 165, 179
Return-to-Maturity Expectations Hypothesis, 34
sampling importance resampling, 85
Sharpe ratio, 11, 14, 15, 257
short rate, 22, 76
spurious regression, 139
Standard & Poor's rating data, 301
stochastic volatility, 126
subadditivity, 109
superreplication, 57
Taylor expansion, 244
transition density, 266
transition probability density function, 267
U-statistic, 221
unbounded likelihood function, 215
uncertain volatility, 78
Utility function, 13
Value-at-Risk, 163
VaR estimation uncertainty, 179
VaR horizon, 177
VaR subadditivity, 175
VaR superadditivity, 175
Vasicek model, 22, 125, 207
vega, 233
volatility process, 80
wealth process, 121
Weibull distribution, 185
yield curve, 40
Yield-to-Maturity Expectations Hypothesis, 34