Multistate Models in Earthquake Modeling 9781119579069, 1119579066, 9781786301505


English Pages 184 Year 2019





Earthquake Statistical Analysis through Multi-state Modeling

“We Athenians in our persons take our decisions on policy and submit them to proper discussion. The worst thing is to rush into action before the consequences have been properly debated. And this is another point where we differ from other people. We are capable at the same time of taking risks and estimating them beforehand. Others are brave out of ignorance; and when they stop to think, they begin to fear. But the man who can most truly be accounted brave is he who best knows the meaning of what is sweet in life, and what is terrible, and he then goes out undeterred to meet what is to come.” – Extract from Pericles’ Funeral Oration in Thucydides’ “History of the Peloponnesian War” (started in 431 B.C.)

Statistical Methods for Earthquakes Set coordinated by Nikolaos Limnios, Eleftheria Papadimitriou, George Tsaklidis

Volume 2

Earthquake Statistical Analysis through Multi-state Modeling

Irene Votsi, Nikolaos Limnios, Eleftheria Papadimitriou, George Tsaklidis

First published 2019 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address: ISTE Ltd 27-37 St George’s Road London SW19 4EU UK

John Wiley & Sons, Inc. 111 River Street Hoboken, NJ 07030 USA

www.iste.co.uk

www.wiley.com

© ISTE Ltd 2019 The rights of Irene Votsi, Nikolaos Limnios, Eleftheria Papadimitriou, George Tsaklidis to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988. Library of Congress Control Number: 2018957211 British Library Cataloguing-in-Publication Data A CIP record for this book is available from the British Library ISBN 978-1-78630-150-5

Contents

List of Abbreviations
List of Symbols
Preface
Introduction
Chapter 1. Fundamentals on Stress Changes
  1.1. Introduction
  1.2. Stress interaction
  1.3. Stress changes calculation
  1.4. Modeling of Coulomb stress changes for different faulting types
    1.4.1. ΔCS for strike-slip faulting
    1.4.2. ΔCS for dip-slip faulting
  1.5. Seismicity triggered by stress transfer
    1.5.1. Triggering of strong earthquakes
    1.5.2. Aftershock triggering
    1.5.3. Triggering of mining seismicity
  1.6. Discussion on stress interaction
Chapter 2. Hidden Markov Models
  2.1. Introduction
  2.2. Hidden Markov framework
  2.3. Seismotectonic regime and seismicity data
  2.4. Application to earthquake occurrences
    2.4.1. Two hidden states and three observation types
    2.4.2. Three hidden states and three observation types
    2.4.3. Model selection and simulation
    2.4.4. Steps number for the first earthquake occurrence
  2.5. Conclusion
Chapter 3. Hidden Markov Renewal Models
  3.1. Introduction
  3.2. Semi-Markov framework
  3.3. Hidden Markov renewal framework
  3.4. Modeling earthquakes in Greece
    3.4.1. Hitting times and earthquake occurrence numbers
  3.5. Conclusion
Chapter 4. Hitting Time Intensity
  4.1. Introduction
  4.2. DTIHT for semi-Markov chains
    4.2.1. Statistical estimation of the DTIHT
  4.3. DTIHT for hidden Markov renewal chains
    4.3.1. Statistical estimation of the DTIHT
  4.4. Conclusion
Chapter 5. Models Comparison
  5.1. Introduction
  5.2. Markov framework
    5.2.1. HMM case
    5.2.2. HMRM case
  5.3. Markov renewal framework
    5.3.1. HMM case
    5.3.2. HMRM case
  5.4. Conclusion
Discussion & Concluding Remarks
Appendices
  Appendix 1: Markov Models
  Appendix 2: Hidden Markov Models
  Appendix 3: Dataset
References
Index

List of Abbreviations

ΔCFF: Coulomb failure function
AIC: Akaike’s information criterion
a.s.: almost surely
BIC: Bayesian information criterion
EM: expectation-maximization
EMC: embedded Markov chain
HMC: hidden Markov chain
HMRC: hidden Markov renewal chain
HMM: hidden Markov model
HMRM: hidden Markov renewal model
HSMM: hidden semi-Markov model
HSMC: hidden semi-Markov chain
MC: Markov chain
MLE: maximum likelihood estimator
MRC: Markov renewal chain
PHMM: Poisson hidden Markov model
SMC: semi-Markov chain
SMM: semi-Markov model
SMK: semi-Markov kernel

List of Symbols

$\mathbb{N}$: set of non-negative integers, $\mathbb{N}^* = \mathbb{N} \setminus \{0\}$
$E$: state space of the (semi-)Markov chain
$E^*$: state space of the hidden Markov renewal chain
$L$: state space of the Markov renewal chain
$A$: observation space
$M_E$: real matrices defined on $E \times E$
$M_{E \times A}$: real matrices defined on $E \times A$
$M_E(\mathbb{N})$: matrix-valued functions with values in $M_E$ and defined on $\mathbb{N}$
$A_1 * A_2$: (discrete-time) matrix convolution product of $A_1, A_2 \in M_E(\mathbb{N})$
$J = (J_n)_{n \in \mathbb{N}}$: embedded Markov chain
$X = (X_n)_{n \in \mathbb{N}}$: sojourn times between successive jumps
$S = (S_n)_{n \in \mathbb{N}}$: jump times
$Y = (Y_n)_{n \in \mathbb{N}}$: sequence of observations
$Z = (Z_k)_{k \in \mathbb{N}}$: (semi-)Markov chain
$U = (U_k)_{k \in \mathbb{N}}$: backward recurrence times


$(Z, U) = (Z_k, U_k)_{k \in \mathbb{N}}$: double Markov chain
$(J, S) = (J_n, S_n)_{n \in \mathbb{N}}$: Markov renewal chain
$(J, S, Y) = (J_n, S_n, Y_n)_{n \in \mathbb{N}}$: hidden Markov renewal chain
$(Z, Y) = (Z_k, Y_k)_{k \in \mathbb{N}}$: hidden (semi-)Markov chain
$M$: fixed censoring time
$H_M$: observation of a trajectory in $[0, M]$
$N(M)$: number of jumps of the EMC up to time $M$
$N_i(M)$: number of visits of the EMC to state $i$ up to time $M$
$N_{ij}(M)$: number of jumps of the EMC from state $i$ to state $j$ up to time $M$
$N_{(i,s)}(M)$: number of visits of the MRC to state $(i, s)$ up to time $M$
$N_{(i,s_1)(j,s_2)}(M)$: number of transitions of the MRC from state $(i, s_1)$ to state $(j, s_2)$ up to time $M$
$q = (q_{ij}(k))_{i,j \in E, k \in \mathbb{N}}$: semi-Markov kernel
$Q = (Q_{ij}(k))_{i,j \in E, k \in \mathbb{N}}$: cumulative semi-Markov kernel
$P = (p_{ij})_{i,j \in E}$: transition matrix of the EMC
$f = (f_{ij}(k))_{i,j \in E, k \in \mathbb{N}}$: conditional sojourn time distribution
$h = (h_i(k))_{i \in E, k \in \mathbb{N}}$: sojourn time distribution
$F = (F_{ij}(k))_{i,j \in E, k \in \mathbb{N}}$: cumulative conditional sojourn time distribution
$H = (H_i(k))_{i \in E, k \in \mathbb{N}}$: cumulative sojourn time distribution


$\overline{H} = (\overline{H}_i(k))_{i \in E, k \in \mathbb{N}}$: survival function of sojourn times
$\alpha = (\alpha(i))_{i \in E}$: initial distribution of the EMC
$m = (m_i)_{i \in E}$: mean sojourn time
$\mu = (\mu_i)_{i \in E}$: mean recurrence time
$\pi = (\pi_i)_{i \in E}$: stationary distribution of the SMC
$\nu = (\nu_i)_{i \in E}$: stationary distribution of the EMC

 a

initial distribution of the double Markov chain

P

transition matrix of the double Markov chain

 π

stationary distribution of the double Markov chain

a 

initial distribution of the MRC

P

transition matrix of the MRC

π

stationary distribution of the MRC

$T_{(i,j)}$: first passage time of the Markov chain $(Z, Y)$ in state $(i, j)$

a

initial distribution of the Markov chain (Z,Y)

$L$: likelihood function
$R$: emission probability matrix
$\widehat{p}_{ij}(M), \widehat{q}_{ij}(k, M), \dots$: estimators of $p_{ij}, q_{ij}(k), \dots$
$\mathcal{N}(0, 1)$: standard normal distribution ($\mu = 0$, $\sigma = 1$)
$\xrightarrow{\mathcal{L}}$: convergence in distribution
$\mathbb{1}_A$: indicator function of a set $A$
$\Delta CFF$: change in Coulomb failure function
$\Delta CS$: Coulomb stress change
$\Delta p$: pore pressure change
$\Delta\tau$: shear stress change
$\Delta\sigma$: fault-normal stress change


$\Delta t$: time shift
$\Sigma$: planar fault surface
$\mu$: coefficient of friction
$\mu'$: apparent coefficient of friction
$B$: Skempton’s coefficient
$\dot{\tau}$: long-term stressing rate
$U$: uniform dislocation
$\delta_{ij}$: Kronecker delta
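Several of the symbols above are tied together by standard identities of discrete-time semi-Markov theory. Assuming no instantaneous transitions ($f_{ij}(0) = 0$): $q_{ij}(k) = p_{ij} f_{ij}(k)$, $h_i(k) = \sum_j q_{ij}(k)$, $H_i(k) = \sum_{l \le k} h_i(l)$, $\overline{H}_i(k) = 1 - H_i(k)$ and $m_i = \sum_{k \ge 0} \overline{H}_i(k)$. For an ergodic chain, $\pi_i = \nu_i m_i / \sum_j \nu_j m_j$. The sketch below evaluates these quantities for a hypothetical three-state kernel whose conditional sojourn times are uniform on $\{1, \dots, d_{ij}\}$. The matrices `P` and `d` are invented for illustration and are not taken from the book's dataset. The code uses NumPy.

```python
import numpy as np

# Hypothetical 3-state embedded Markov chain (p_ii = 0, as for an EMC).
P = np.array([[0.0, 0.6, 0.4],
              [0.5, 0.0, 0.5],
              [0.7, 0.3, 0.0]])
# Hypothetical conditional sojourn times: f_ij uniform on {1, ..., d[i, j]}.
# Diagonal entries are placeholders; they are multiplied by p_ii = 0 below.
d = np.array([[1, 2, 4],
              [3, 1, 5],
              [2, 6, 1]])
K = int(d.max())                      # truncation horizon: k = 0, ..., K

f = np.zeros((3, 3, K + 1))           # f[i, j, k] = f_ij(k), with f_ij(0) = 0
for i in range(3):
    for j in range(3):
        f[i, j, 1:d[i, j] + 1] = 1.0 / d[i, j]

q = P[:, :, None] * f                 # semi-Markov kernel q_ij(k) = p_ij f_ij(k)
h = q.sum(axis=1)                     # sojourn time distribution h_i(k)
H = np.cumsum(h, axis=1)              # cumulative sojourn time distribution H_i(k)
H_bar = 1.0 - H                       # survival function of sojourn times
m = H_bar.sum(axis=1)                 # mean sojourn time m_i = sum_k H_bar_i(k)

# Stationary distribution nu of the EMC: left eigenvector of P for eigenvalue 1.
w, v = np.linalg.eig(P.T)
nu = np.real(v[:, np.argmin(np.abs(w - 1.0))])
nu = nu / nu.sum()

pi = nu * m / np.dot(nu, m)           # stationary distribution of the SMC
print("m  =", m.round(3))
print("nu =", nu.round(3))
print("pi =", pi.round(3))
```

The comparison between $\nu$ and $\pi$ illustrates the role of sojourn times. State 1 has the longest mean sojourn time, so its share of time, $\pi_1$, exceeds its share of visits, $\nu_1$.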

Preface

Statistical seismology attracts the attention of seismologists, statisticians, geologists, engineers, government officials and insurers, among others, since it serves as a powerful tool for seismic hazard assessment and, consequently, for risk mitigation. The field aims to connect physical and statistical models and to provide a conceptual basis for the earthquake generation process. To date, purely deterministic models have described earthquake dynamics inadequately. The main reason is our limited knowledge of fundamental state parameters of the causative process, such as the stress state and the properties of the medium. Comparing deterministic with stochastic approaches, we note that the latter are favored today. Stochastic processes allow real-life random phenomena to be modeled efficiently and the associated indicators to be quantified.

This book is intended as a first, yet systematic, approach to earthquake multi-state modeling by means of hidden (semi-)Markov models. It presents bibliographic sources, methodological studies and the development of stochastic models, in order to reveal the mechanism of future seismogenesis and to assess it. The book aims to help the reader acquire and exploit tools for applying multi-state models to concrete physical problems encountered in seismology. It also aims to encourage discussion and future modeling efforts in statistical seismology, covering topics that range from theoretical advances to very practical applications.

This book is concerned with several central themes in a rapidly developing field: earthquake occurrence modeling. It comprises seven parts (an introduction, five chapters and a concluding discussion) and three appendices. It begins with lists of the abbreviations and symbols used throughout the book. The introduction then describes state-of-the-art earthquake modeling approaches, focusing on multi-state models. Chapter 1 introduces the reader to the crustal stress state, stress changes and their evolution, and their association with earthquake generation. The complexity of this process is then investigated using advanced stochastic models in the chapters that follow. Chapter 2 presents a multi-state modeling approach that describes strong seismicity in the broader Aegean area from 1865 to 2008. This chapter aims to familiarize the reader with the application of hidden Markov models and presents a detailed example of multi-state modeling in seismology. In particular, hidden Markov models are used to shed new light on the “hidden” component that controls the generation of earthquakes: the stress field. Our purpose is to assess the evolution of the stress field and its inherently causative role in both the number and the size of earthquakes. Chapter 3 presents (hidden) semi-Markov models and their associated stochastic processes. It contains the statistical estimation tools that enable the reader to estimate indicators of interest associated with the occurrence of strong earthquakes.


Chapter 4 presents theoretical results for the statistics of stochastic processes that have direct applications in seismology. It aims to show how the study of a real-life phenomenon can lead to new developments in the statistics of stochastic processes. It also shows how these theoretical results can in turn be used to answer open questions about the phenomenon under study. Chapter 5 gives some guidelines for the comparison of multi-state models and provides specific numerical examples. The final part, Discussion & Concluding Remarks, collects concluding remarks, open questions and perspectives in the field. Three appendices are provided at the end of the book. Appendix 1 presents some main definitions of Markov models. Appendix 2 shows how the three classical problems for hidden Markov models (evaluation, training and decoding) can be solved. Appendix 3 presents the dataset used throughout the book.

The authors express their gratitude to M. Hamdaoui for his technical help and assistance. The changes and evolution of the stress field were calculated using the program written by J. Deng [DEN 97a]. This program is based on the DIS3D code by S. Dunbar and Erikson (1986) and on the expressions of G. Converse. Some of the figures were plotted using the Generic Mapping Tools software [WES 98].

This book will be useful to applied statisticians and geophysicists interested in the theory of multi-state modeling. It can also be useful to students, teachers and professional researchers interested in the statistical modeling of earthquakes.

Irene VOTSI
Nikolaos LIMNIOS
Eleftheria PAPADIMITRIOU
George TSAKLIDIS
October 2018

Introduction

“Act with awareness” – Pittacus of Mytilene

I.1. Motivation and objectives

Earthquakes are among the most lethal natural hazards, causing more than 200,000 casualties worldwide each decade. They can be devastating and life-threatening, as in the recent cases of the 2011 M 8.9 Japan, the 2008 M 8.0 Sichuan (China) and the 2004 M 9.3 Sumatra earthquakes. Earthquake forecasting is therefore a social demand, and scientific efforts toward it must be intensified. The quest for earthquake prediction dates back to times when superstition prevailed and prediction was the domain of occultism, and the search continues today. Despite more than a century of research, earthquake prediction has drawn broad criticism and skepticism and remains controversial. It is still an unsolved but highly attractive scientific problem.

At the outset, the usage of the term “prediction” in seismology needs clarification. A reliable earthquake prediction is one that specifies a space–time–magnitude range, including the magnitude scale (e.g. local magnitude or moment magnitude). It also specifies the number of earthquakes expected in this range (e.g. zero, one or at least one). The forecast or prediction of an earthquake is a statement about the time, hypocenter location, magnitude and probability of occurrence of an individual future event within reasonable error ranges [ZÖL 09]. Prediction has usually been expressed as the occurrence probability of an earthquake in a given time, space and magnitude range. Defining this range is a scientific target in itself. The techniques developed for this purpose are diverse, and earthquake prediction is accordingly classified into three types. It is short term when the time interval ranges from a day to a few hundred days before a strong earthquake. It is intermediate term when the interval covers roughly one year to one decade. It is long term for intervals longer than a decade [KNO 96].

I.2. Seismic hazard assessment

Seismic hazard is evaluated through a set of parameters that express the intensity of ground motion. The probability that predefined values of these parameters are exceeded within a specified exposure time must therefore be calculated. For the seismic hazard assessment at a specific site, either a deterministic or a probabilistic approach is followed. In the deterministic approach, the ground shaking at the site is estimated from one or more earthquakes of a specified location and magnitude. Deterministic earthquake scenarios may be based on actual past events, or they may be postulated scenarios supported by the analysis of seismological and geological data. In the probabilistic approach, the contributions of all possible earthquakes around the site are integrated. The result is the level of shaking that will not be exceeded at that place, with a given probability, within a given time period.
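The exceedance calculation described above can be sketched under one common simplification: exceedances of a given ground-motion level at the site are assumed to follow a homogeneous Poisson process with annual rate λ. The probability of at least one exceedance in an exposure time of t years is then P = 1 − exp(−λt). The sketch converts between an exceedance probability and a return period. The function names are hypothetical, and the 10%-in-50-years level is a conventional design choice used here purely as an illustration.

```python
import math


def exceedance_probability(annual_rate: float, exposure_years: float) -> float:
    """P(at least one exceedance within exposure_years), Poisson assumption."""
    return 1.0 - math.exp(-annual_rate * exposure_years)


def rate_for_probability(prob: float, exposure_years: float) -> float:
    """Annual exceedance rate that yields probability `prob` over `exposure_years`."""
    return -math.log(1.0 - prob) / exposure_years


# Illustrative design level: 10% probability of exceedance in 50 years.
rate = rate_for_probability(0.10, 50.0)
return_period = 1.0 / rate
print(f"annual rate = {rate:.6f}, return period = {return_period:.0f} years")
# prints: annual rate = 0.002107, return period = 475 years
print(f"check: P(50 yr) = {exceedance_probability(rate, 50.0):.3f}")
```

Inverting the relation gives the familiar return period of about 475 years associated with a 10% probability of exceedance in 50 years.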


Both approaches have strong and weak points. “The deterministic approach provides a clear and trackable method of computing seismic hazard, whose assumptions are easily discerned. It provides understandable scenarios that can be related to the problem at hand. However, it has no way for accounting for uncertainty. Conclusions based on deterministic analysis can be easily upset by the occurrence of new earthquakes”. The probabilistic approach to seismic hazard calculations was originally proposed by Cornell [COR 68]. It integrates the ground motion anticipated from all earthquake sources and magnitudes within a specified area around the site of interest, in order to calculate the probabilities of given levels of ground motion there. In this way, the probabilistic method provides the probability that a specific ground-motion level is exceeded during some time period. “The probabilistic approach is capable of integrating a wide range of information and uncertainties into a flexible framework. Unfortunately, its highly integrated framework can obscure those elements that drive the results and its highly quantitative nature can lead to false impressions of accuracy”.

I.3. Earthquake occurrence models

Comparing the deterministic approach to seismic hazard assessment with the probabilistic one, we note that the latter is favored today. It relies on stochastic models to estimate the probabilities of occurrence of strong earthquakes. Deterministic earthquake prediction is still far from feasible for practical applications, whereas probabilistic forecasting is realistic. Beyond data analysis, modeling the earthquake process is essential both for understanding it in depth and for potential forecasts. Progress in earthquake modeling can be assessed by examining different model classes. The two main classes are stochastic models and physics-based models [HAI 09]. Here, we focus on stochastic models as a tool for probabilistic seismic hazard assessment.

The fundamental difference between the two is that a stochastic model, unlike a physical model, considers that the physical process depends on random aspects and therefore cannot be fully determined. These random aspects are taken into account in stochastic modeling and are expressed by means of parameters or associated stochastic processes. Stochastic models enable us to quantify three things: the parts of the physical process that are accessible to direct measurement, the parts that are due to its randomness, and the associated uncertainties. Physical models, on the other hand, aim at a full understanding and prediction of the physical process. A strict distinction between stochastic and physical models cannot, however, be made unambiguously, since many models comprise physical, stochastic and empirical components. Stochastic models play two main roles in their diverse fields of application [VER 10b, VER 10a]. First, as in statistical mechanics, they aim to understand the associated physical process itself. Second, they support planning, decision-making and/or prediction.

Earthquake occurrence models are further divided into time-independent and time-dependent ones. The main assumption of time-independent earthquake occurrence models is that the number of earthquake occurrences follows the Poisson distribution. In this case, the only information needed to calculate the associated probabilities is the mean recurrence time. The most common time-independent stochastic model of earthquake occurrences is the Poisson model, which assumes that earthquake occurrence does not depend on time.
This model considers that the epicenters and times of earthquakes exceeding a certain threshold magnitude are a realization of a temporally homogeneous Poisson process. It serves as a test bed for comparisons with more complicated models. In contrast, time-dependent earthquake occurrence models assume that the probability of an earthquake occurrence depends on the times of the other events. These models require not only the mean recurrence time of earthquakes but also the variance of the recurrence times and the time elapsed since the last event. Since the memoryless property of the Poisson process is quite restrictive, other stochastic models have been considered. Time-dependent models are particularly appealing because their results are consistent with the elastic rebound theory of earthquakes. Renewal, Markov and semi-Markov models belong to this category.

Earthquake occurrence models, whether time-independent or time-dependent, rely on assumptions regarding the magnitude–frequency distribution. The simplest is the “characteristic earthquake model”. In this model, all strong earthquakes associated with a certain fault segment are assigned similar magnitudes, average displacements and rupture lengths. Gutenberg–Richter magnitude–frequency distributions and multi-segment ruptures involve more complexity. Time-dependent models are also more complicated, with more input parameters and assumptions. Stochastic models constitute an extended version of physical models, since they aim to explain the variability of the observations and the hidden features underlying the physical process.

Let us now describe some of the most important stochastic models that have been used to model earthquake occurrences. Each model provides a different type of insight into the physical process and its randomness. The merit of a model lies in the degree to which it can explain a composite phenomenon.
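The contrast between time-independent and time-dependent models drawn above can be made concrete. Under the Poisson model, the probability of at least one event in a window of ΔT years is 1 − exp(−ΔT/μ), where μ is the mean recurrence time. This value is the same whatever the time since the last event. Under a renewal model, the corresponding quantity is the conditional probability P(T ≤ t + ΔT | T > t) = [F(t + ΔT) − F(t)] / [1 − F(t)], which depends on the elapsed time t. The minimal sketch below assumes lognormal interevent times parameterized by their mean and coefficient of variation. The fault parameters and function names are hypothetical.

```python
import math


def poisson_probability(mean_recurrence: float, window: float) -> float:
    """Time-independent model: P(at least one event in `window`)."""
    return 1.0 - math.exp(-window / mean_recurrence)


def lognormal_cdf(t: float, mean: float, cv: float) -> float:
    """CDF of a lognormal inter-event time with given mean and coefficient of variation."""
    if t <= 0.0:
        return 0.0
    sigma2 = math.log(1.0 + cv * cv)           # variance of log(T)
    mu_log = math.log(mean) - sigma2 / 2.0     # mean of log(T)
    z = (math.log(t) - mu_log) / math.sqrt(2.0 * sigma2)
    return 0.5 * (1.0 + math.erf(z))


def renewal_conditional_probability(elapsed, window, mean, cv):
    """Time-dependent model: P(event in (elapsed, elapsed + window] | none up to elapsed)."""
    f0 = lognormal_cdf(elapsed, mean, cv)
    f1 = lognormal_cdf(elapsed + window, mean, cv)
    return (f1 - f0) / (1.0 - f0)


# Hypothetical fault segment: mean recurrence 200 yr, CV 0.5, 30-yr forecast window.
p_pois = poisson_probability(200.0, 30.0)
for elapsed in (20.0, 150.0, 250.0):
    p_ren = renewal_conditional_probability(elapsed, 30.0, 200.0, 0.5)
    print(f"elapsed {elapsed:5.0f} yr: renewal {p_ren:.3f}, Poisson {p_pois:.3f}")
```

For these parameter values, the renewal probability is well below the Poisson value shortly after an event and exceeds it once the elapsed time approaches the mean recurrence time. This is the elastic-rebound behavior that time-dependent models are meant to capture.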


I.3.1. Stress release models

Point process modeling of earthquake data was introduced by Vere-Jones [VER 66, VER 70, VER 78]. In this type of modeling, a seismic sequence is interpreted as a realization of a point process. Statistical inference for point processes is based on the conditional intensity function, which entirely characterizes a point process model. The corresponding models, known as stress release models, are the probabilistic translation of Reid’s elastic rebound theory [REI 10]. They have been widely used in the analysis of the historical catalogues of China, Japan and Iran. These models are based on the Cramér–Lundberg model, widely used in finance. The main idea is that the stress level along the fault drops suddenly when an earthquake occurs. The tectonic strain then rebuilds gradually over time, and the next earthquake occurs when the stress exceeds a certain threshold. Different versions of the stress release model exist, depending on the assumptions made about the physical process that evolves at the regional scale:

- in the simple stress release model, a unique physical process acts in the region;
- in the so-called independent model, a different physical process, with a different loading rate, is present in each sub-region;
- the linked (or coupled) stress release model takes into account positive or negative interactions among different zones.

Zheng and Vere-Jones [ZHE 91, ZHE 94] thoroughly studied the simple model and applied it to the historical catalogues of Japan, Iran and China. Liu et al. [LIU 99] introduced the linked stress release model, which considers the stress transfer between different seismic areas and therefore describes the spatial interaction of earthquakes. Lu et al. [LU 99] applied linked stress release models to a historical catalogue of Japanese earthquakes and provided forecasting results based on the best-fitting model. Bebbington and Harte [BEB 01, BEB 03], as well as Bebbington [BEB 05], introduced procedures for studying the robustness and significance of the interactions predicted by linked stress release models. In [ROT 07], a Bayesian approach was adopted to make inferences about the parameters of stress release models applied to some Italian seismogenic zones. Later, in [VOT 11], a frequentist approach was followed to describe the earthquake occurrences of the central Ionian Islands (Greece) by means of stress release models. Beyond modeling temporal sequences through simple point processes, multivariate or marked point processes can be used to model space–time or time–magnitude sequences. In this way, additional information coming from precursory phenomena can be included in the processes. For an overall presentation of the aforementioned models, see Vere-Jones [VER 95] and Ogata [OGA 88]. For recent advances on the topic, see Llenos and Michael [LLE 13] or Bray and Schoenberg [BRA 13] and the references therein.

I.3.2. Stress-based models

Over the last 20 years, the strong influence of stress changes on the location and timing of future earthquakes has been increasingly recognized. Deng and Sykes [DEN 97a, DEN 97b] proposed the stress evolutionary model, which is based on the calculation of Coulomb stress changes, and originally tested it in southern California. The model is based on the effect that displacement in the elastic seismogenic layer exerts on the components of the stress tensor. From these components, the shear and normal stress on the target (receiver) fault can be calculated. An increase of shear stress in the slip direction and a decrease of normal stress bring the receiver closer to failure. In this model, the slow tectonic loading on the elastic part of the major regional faults is taken into account and added to the coseismic displacements of the strong earthquakes.
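The failure criterion behind these calculations is commonly written, with the notation of the List of Symbols, as $\Delta CFF = \Delta\tau + \mu(\Delta\sigma + \Delta p)$. A frequently used simplification assumes that the pore-pressure change is proportional to the normal stress change, $\Delta p = -B\,\Delta\sigma$. This folds the pore-pressure term into an apparent coefficient of friction $\mu' = \mu(1 - B)$, giving $\Delta CFF = \Delta\tau + \mu'\Delta\sigma$.

The minimal sketch below assumes a tension-positive sign convention for $\Delta\sigma$ (unclamping positive). Under this convention, a decrease of compressive normal stress raises $\Delta CFF$, as stated above. The sketch only evaluates the criterion for given resolved stress changes. It does not compute stress tensors from dislocation sources, which is the job of programs such as the one based on DIS3D mentioned in the Preface. Function names and stress values are hypothetical.

```python
def apparent_friction(mu: float, skempton_b: float) -> float:
    """mu' = mu * (1 - B): pore-pressure effect folded into the friction coefficient."""
    return mu * (1.0 - skempton_b)


def coulomb_stress_change(d_tau: float, d_sigma: float, mu_apparent: float) -> float:
    """Delta CFF = d_tau + mu' * d_sigma (all stresses in MPa).

    d_tau   : shear stress change resolved in the slip direction of the receiver
    d_sigma : normal stress change on the receiver, tension (unclamping) positive
    """
    return d_tau + mu_apparent * d_sigma


mu_app = apparent_friction(mu=0.75, skempton_b=0.5)
# Hypothetical receiver: +0.05 MPa of shear stress, 0.04 MPa of extra clamping.
dcff = coulomb_stress_change(d_tau=0.05, d_sigma=-0.04, mu_apparent=mu_app)
print(f"mu' = {mu_app:.3f}, dCFF = {dcff:.3f} MPa")
# prints: mu' = 0.375, dCFF = 0.035 MPa
```

A positive result means that the receiver has been brought closer to failure. A negative result places it in a stress shadow.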


For the static stress calculations, the faults that failed are approximated as static dislocations embedded in an elastic half-space. The coseismic slip of the stronger events is summed with the long-term slip rates on the important regional faults. For the causative faults, the required parameters include the faulting type (strike, dip and rake), the fault dimensions (length and width) and the average coseismic slip. The faulting type of the target seismic source (the triggered fault) is also needed in order to resolve the resulting stress tensor onto it. Previous investigations based on the model have shown that the seismicity rate decreases in locations of stress relaxation, where Coulomb stress changes are negative, ΔCFF < 0 (shadow zones of ΔCFF). In contrast, the next large event is expected to occur in areas where ΔCFF was enhanced (bright regions of ΔCFF). For this reason, sites of increased stress should be regarded with caution as the more probable candidates to host future earthquake activity. These calculations are effective for qualitative analysis and for identifying possible locations of failure. Quantitative estimates, however, require translating these changes into earthquake occurrence probabilities. The broadly accepted method for this purpose is the rate-state formulation [DIE 94, DIE 96], which combines BPT renewal with a physical model [MAT 02] involving the computation of ΔCFF. The stress changes are incorporated either as a permanent shift in the time since the last event (clock advance or delay) or as a modification of the expected mean recurrence time [CON 10]. Positive Coulomb stress changes on a fault segment permanently decrease the expected time until the next earthquake due to tectonic loading alone, and negative changes increase it. In this way, they change the conditional probability.
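The clock-advance idea can be sketched as follows. The sketch assumes a Brownian passage time (BPT) renewal model with a given mean recurrence time and aperiodicity α, together with a long-term stressing rate τ̇. A stress change ΔCFF then shifts the elapsed time by Δt = ΔCFF/τ̇: the clock advances for a positive change and is delayed for a negative one. The conditional probability is re-evaluated at the shifted time. The BPT CDF is written through the standard normal CDF Φ as Φ((t − mean)/s) + exp(2/α²) Φ(−(t + mean)/s), with s = α√(mean·t).

This sketch covers only the permanent clock change. It does not implement the transient, rate-state part of the formulation. All parameter values and function names are hypothetical.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bpt_cdf(t: float, mean: float, alpha: float) -> float:
    """CDF of the Brownian passage time (inverse Gaussian) distribution."""
    if t <= 0.0:
        return 0.0
    s = alpha * math.sqrt(mean * t)
    return (std_normal_cdf((t - mean) / s)
            + math.exp(2.0 / alpha**2) * std_normal_cdf(-(t + mean) / s))


def conditional_probability(elapsed, window, mean, alpha):
    """P(event in (elapsed, elapsed + window] | no event up to elapsed), BPT model."""
    f0 = bpt_cdf(elapsed, mean, alpha)
    return (bpt_cdf(elapsed + window, mean, alpha) - f0) / (1.0 - f0)


def clock_advance(dcff_mpa: float, stressing_rate_mpa_per_yr: float) -> float:
    """Permanent time shift dt = dCFF / tau_dot in years (negative: clock delay)."""
    return dcff_mpa / stressing_rate_mpa_per_yr


# Hypothetical segment: mean recurrence 250 yr, alpha 0.5, 150 yr since the last
# event, loading 0.01 MPa/yr, and a coseismic dCFF of +0.2 MPa from a nearby event.
dt = clock_advance(0.2, 0.01)
p_before = conditional_probability(150.0, 30.0, 250.0, 0.5)
p_after = conditional_probability(150.0 + dt, 30.0, 250.0, 0.5)
print(f"clock advance {dt:.0f} yr: 30-yr probability {p_before:.3f} -> {p_after:.3f}")
```

Setting the shift to a negative value, for example with a ΔCFF of −0.1 MPa, models a receiver in a stress shadow whose clock is delayed.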


The pioneering work of Stein et al. [STE 97] showed that 9 out of 10 earthquakes with M ≥ 6.7 occurred at sites of positive Coulomb stress changes, and it also provided probability estimates that incorporate these stress changes. Nalbant et al. [NAL 98] studied the stress transfer due to the coseismic slip of M ≥ 6.0 earthquakes in the North Aegean Sea and northwestern Turkey, while Papadimitriou and Sykes [PAP 01b] calculated the stress changes resulting from both the coseismic slip and the long-term slip on the major regional faults. Hubert-Ferrari et al. [HUB 99] calculated the stress changes caused by the coseismic displacements of the M ≥ 6 earthquakes that had occurred since 1700, together with the continuous interseismic stress loading, and found that the two 1999 earthquakes were located in stress-enhanced areas. In the southeastern Aegean, the stress evolutionary model showed that strong (M ≥ 6.5) earthquakes form temporal clusters [PAP 05]. Paradisopoulou et al. [PAR 10] give a comprehensive account of the application of the model to the eastern Aegean Sea and western Turkey. The stress evolutionary model has been widely applied to sub-regions of the study area (see, for example, [PAP 02, PAP 03, PAP 05, PAR 12, LEP 12], among others).

I.3.3. Renewal models

Renewal models were proposed for long-term earthquake forecasting by Utsu [UTS 72], Rikitake [RIK 74] and Hagiwara [HAG 74] in the early 1970s. In these models, elastic strain energy accumulates gradually in the seismogenic layer during the interseismic period and is then released in the next earthquake on that specific fault or fault segment. This concept of continuous accumulation and abrupt release of strain energy stems from the elastic rebound theory introduced by Reid [REI 10]. Within the renewal framework, the roles of the specific survival function and the
implied hazard rate are crucial. To calculate the probability that a given earthquake occurs at a specified time in the future, an appropriate probability distribution has to be selected. To address this problem, a considerable number of renewal models have been applied with different distribution choices, including the Gaussian [RIK 74], Weibull [HAG 74, UTS 84, RIK 99], gamma [UDI 75, UTS 84], Poisson [JAC 98] and lognormal [NIS 87] distributions. All of these distributions have two parameters, except the Poisson, which has only one. They all represent the observations satisfactorily, and no single distribution stands out as the most efficient for representing earthquake occurrence. Even though the Poisson model may be rejected in favor of a renewal model [ELL 99], the selection of the most appropriate renewal model is obscured by the scarcity and uncertainty of earthquake recurrence data. Regardless of the choice of distribution, it is generally found that a distribution that fits the short interevent times well does not fit the long interevent times, and vice versa. To overcome this problem, Garavaglia et al. [GAR 11] proposed a renewal model based on a mixture of exponential and Weibull distributions. Using this mixture renewal model, they calculated conditional probabilities of occurrence with a reasonable degree of credibility for the Italian dataset. The renewal process is not free from criticism: (1) all past history, not only the last event, influences earthquake generation; (2) independent repeat times exhibit a uniform distribution; and (3) the main features of seismicity can be described by the Poisson process. Approaches other than purely mathematical ones have also been followed, for example in [MAT 02], where the seismic cycle on a fault segment is modeled by the time evolution of the so-called Brownian relaxation oscillator. The failure time distribution in this model, which is called the Brownian
passage time (BPT) distribution, has a two-parameter analytic closed form, known in the statistics literature as the inverse Gaussian distribution. This model performs well and has been used to forecast the recurrence intervals of large earthquakes in California [ELL 99].

I.3.4. Markov and semi-Markov models

Although renewal models do provide estimates of the probability of occurrence of strong earthquakes, these estimates are independent of earthquake magnitude and depend only on the time elapsed since the last great earthquake. Dependence on magnitude can be incorporated through Markov models, by composing the datasets and defining the state space in terms of earthquake magnitudes. Markov models have been used for seismic hazard assessment since 1980. Comparisons of the different models have shown that Poisson models are more efficient in regions of moderate but frequent activity, while Markov models have proved more capable of simulating the temporal succession of stronger, rather infrequent earthquakes. Tsapanos and Papadopoulou [TSA 99] modeled earthquake occurrences with a discrete-time Markov model in southern Alaska and the Aleutian Islands, which are among the most actively deforming areas in the world and frequently experience large earthquakes. The states of the model were based on seismic zonation [PAP 94]. The frequency of visits to each of the defined states and the transition probabilities between them were calculated for different magnitude thresholds. Moreover, each zone was considered to be in either an “active” or an “inactive” state; for each zone, the transition probabilities between the active and inactive states were calculated, along with the mean duration of an active period (in years). An overview of the investigations based on Poisson, renewal and branching models is given by Anagnos and Kiremidjian [ANA 84] in a thorough description of
different models, their stochastic expressions and the parameters required for the relevant applications. Markov models take into account the dependence of future earthquakes on the magnitudes of previous earthquakes; however, they do not consider the time elapsed since the last event. This is due to the memoryless property of the geometric distribution (discrete-time case) and of the exponential distribution (continuous-time case), which govern the sojourn times of Markov chains and processes. Semi-Markov models, of which Markov models are a special case, can combine both of the aforementioned dependencies and can be applied to characterize location-specific patterns of earthquake recurrence, using physical models that take into account strain accumulation and release as well as the sporadic times of earthquake occurrence. Semi-Markov models were introduced in seismology by Patwardhan et al. [PAT 80], who argued that the times of earthquake occurrence and the associated magnitudes are not randomly distributed. They applied a semi-Markov model to a dataset of earthquakes with magnitudes M ≥ 7.8 that occurred in the Pacific Belt. The same model was applied by Cluff et al. [CLU 80] to the Wasatch Fault zone, Utah, for earthquakes with M ≥ 6.5. Patwardhan et al. [PAT 80] and Cluff et al. [CLU 80] used historical and geological data, and they followed Bayesian procedures to enrich the available data and to calculate the corresponding transition probabilities. Many other semi-Markov models were developed on the basis of the slip- and time-predictable hypotheses. Anagnos and Kiremidjian [ANA 84] applied a semi-Markov model to describe the time-predictable earthquake sequences that occurred in the Parkfield region. A year later, Anagnos and Kiremidjian [ANA 85] applied a finite-state time-predictable stochastic model, assuming that the interarrival times follow the Weibull distribution and that the earthquakes are spatially dependent.
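The contrast between the memoryless sojourn times of Markov models and the time-dependent ones of renewal and semi-Markov models can be illustrated numerically. The Python sketch below computes, for a discrete-time sojourn time T, the probability that the next event occurs within the next h steps given that t steps have elapsed since the last one, P(T ≤ t + h | T > t) = 1 − S(t + h)/S(t), where S is the survival function. It assumes a discrete Weibull law with S(k) = q^(k^β), which reduces to the geometric distribution for β = 1; this parameterization and the values q = 0.9 and β = 1.5 are illustrative assumptions, not estimates from any of the studies cited here.

```python
import math


def survival(k: int, q: float, beta: float) -> float:
    """P(T > k) for a discrete Weibull sojourn time T on {1, 2, ...}.

    Assumed parameterisation: P(T > k) = q ** (k ** beta), 0 < q < 1, beta > 0.
    With beta = 1 this is the geometric distribution of a Markov chain sojourn time.
    """
    return q ** (k ** beta)


def conditional_prob(elapsed: int, horizon: int, q: float, beta: float) -> float:
    """P(T <= elapsed + horizon | T > elapsed) = 1 - S(elapsed + horizon) / S(elapsed)."""
    exponent = (elapsed + horizon) ** beta - elapsed ** beta
    # 1 - q ** exponent, computed with expm1 for accuracy when the probability is small
    return -math.expm1(exponent * math.log(q))


def hazard(k: int, q: float, beta: float) -> float:
    """Discrete hazard P(T = k | T >= k), k >= 1: a one-step conditional probability."""
    return conditional_prob(k - 1, 1, q, beta)


if __name__ == "__main__":
    q = 0.9  # hypothetical per-step parameter, for illustration only
    for beta in (1.0, 1.5):  # 1.0: geometric (memoryless); 1.5: hazard grows with time
        probs = [round(conditional_prob(t, 5, q, beta), 3) for t in (0, 5, 10)]
        print(f"beta={beta}: P(event within 5 steps | t steps without event), t=0,5,10: {probs}")
```

For β = 1 the three printed probabilities are identical, which is the memoryless property; for β = 1.5 they increase with the elapsed time t, which is the kind of dependence on the time since the last event that renewal and semi-Markov models can capture.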
Guagenti and Winterstein [GUA 84] applied a semi-Markov model to time- and slip-predictable earthquake sequences. Later, a semi-Markov model for combined time- and slip-predictable earthquake sequences was studied by Cornell and Winterstein [COR 86]. Lutz and Kiremidjian [LUT 95] used a generalized semi-Markov model (see, for example, [WHI 80]) to describe both the temporal and the spatial dependence of earthquakes on the northern San Andreas Fault. In this model, several earthquakes were associated with each state visited by the semi-Markov process, unlike the “classical” model, in which only one event is associated with each state. On the basis of this model, long-term seismic hazard estimates important for engineering applications were obtained. Sadeghian and Jalali-Naini [SAD 08] applied a discrete-time semi-Markov model in which the states correspond to both seismic zones in Iran [KAR 94] and earthquake magnitudes. The results were validated by comparing the observed earthquake occurrence probabilities with the forecasted ones. Altinok and Kolcak [ALT 99] applied a semi-Markov model to earthquakes that occurred in the North Anatolian region of Turkey in the 20th Century and estimated the transition probabilities between region and magnitude states. Sadeghian [SAD 10] applied the aforementioned model with seismic zones defined by means of fault lines, and studied the impact of the zone definition on earthquake forecasting. In more recent studies, sojourn times have been assumed to follow a Weibull distribution, and the maximum likelihood estimators of the model parameters have been obtained. A parametric approach was followed by Garavaglia and Pavani [GAR 11] to estimate the parameters of a model applied to the data of earthquakes that occurred in Turkey in the 20th Century.
This model differs from the one previously presented in [ALV 05] in that it assumes a mixture of exponential and Weibull distributions for the sojourn times, and is therefore more consistent with the physical process of earthquake generation. More recently, a parametric semi-Markov
approach has been adopted in [MAS 12]. Both homogeneous and non-homogeneous semi-Markov models in continuous time were applied and their characteristics were estimated. The sojourn times were assumed to follow generalized Weibull distributions, with parameters estimated by the maximum likelihood estimation (MLE) method.

I.3.5. Hidden (semi-)Markov models

Hidden Markov models (HMMs) are among the most successful statistical models and have gained popularity over the last 40 years. On the one hand, the use of hidden (unobservable) states makes the model generic enough to describe many complex real-life problems; on the other hand, its relatively simple dependence structure enables the use of efficient computational procedures. HMMs were introduced in 1966 in [BAU 66], where they were referred to as probabilistic functions of Markov chains. Since then, their theory has been substantially developed and many new results have been obtained. For a detailed introduction to the methods and applications of HMMs in speech recognition, we refer the interested reader to [RAB 89]. HMMs have been widely applied in speech recognition [RAB 89, RAB 93], computational biology [KRO 94] and signal processing [ELL 04]. Although Markov models are widely applied in seismology, very few applications of hidden Markov models have been made in the field, and these mostly concern continuous-time HMMs. We first give a brief description of the continuous-time HMMs applied in seismology. Granat and Donnellan [GRA 02] classified earthquakes that occurred in southern California by means of a modified HMM, which uses a deterministic annealing method, for the scientific analysis of global positioning system data. In the HMM proposed by Ebel et al. [EBE 07], observations correspond to both the sojourn times and the spatial quadrants of the associated earthquakes. In
particular, the sojourn times were assumed to follow exponential distributions with parameters associated with the hidden states. The probability distribution of the location consisted of the probabilities that the ensuing earthquake occurs in each of the different quadrants. The probability that an earthquake occurs within 1 day of the last event was calculated for each quadrant. Seismic signals of possible volcanic origin in Tenerife were identified and classified by means of continuous-time HMMs [BEY 08]. As for discrete-time HMMs in seismology, the applications concern exclusively a special case of HMMs: the Poisson hidden Markov models (PHMMs). In these models, the observations are governed by Poisson distributions (see, for example, [ZUC 09, COO 04, COO 91]). Various applications of this model exist, for example in traffic modeling [SCO 03] and inventory control [CHI 97]. Zucchini and MacDonald [ZUC 09] applied stationary PHMMs to model earthquake counts. More recently, Orfanogiannaki et al. [ORF 14] used PHMMs to model earthquake frequencies in a seismogenic area of western Greece for the period 1990–2006. They used these models to identify seismicity changes and seismicity clusters associated with strong events. Wu [WU 10] applied an HMM to earthquake declustering and compared it with the epidemic-type aftershock sequence (ETAS) model, using a dataset from the central and western regions of Japan. Recently, HMMs have attracted significant scientific interest; as a result, the understanding of their statistical properties has improved and asymptotically optimal algorithms have been suggested for the three corresponding problems, i.e. the estimation, evaluation and decoding problems. Several statistical inference aspects have been considered, such as the estimation of the order of HMMs [RYD 94] and the asymptotic behavior of the maximum likelihood estimators [LER 92, RYD 95, BIC 98, DOU 01, DOU 04].
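The evaluation problem amounts to computing the likelihood of an observed sequence under a given HMM. For the PHMMs mentioned above, this can be done with the forward recursion; the Python sketch below is a minimal, scaled version for a sequence of earthquake counts. The two hidden states, their Poisson means, the transition matrix and the counts are hypothetical, and the function names are illustrative rather than taken from any library. In practice, the parameters would be estimated, for example by maximizing this log-likelihood numerically or with the EM algorithm.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability P(X = x) for a mean lam > 0."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))


def phmm_log_likelihood(counts, init, trans, rates):
    """Log-likelihood of a count sequence under a Poisson HMM (scaled forward recursion).

    counts: observed counts, one per time step (e.g. earthquakes per year)
    init:   init[i] = probability that the hidden chain starts in state i
    trans:  trans[i][j] = P(hidden state j at t + 1 | hidden state i at t)
    rates:  rates[i] = mean of the Poisson counts emitted in hidden state i
    """
    m = len(init)
    # forward variables at t = 0: P(state i, first observation)
    alpha = [init[i] * poisson_pmf(counts[0], rates[i]) for i in range(m)]
    log_lik = 0.0
    for t in range(len(counts)):
        if t > 0:
            # one step of the hidden chain, then weight by the probability of the new count
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(m)) * poisson_pmf(counts[t], rates[j])
                for j in range(m)
            ]
        scale = sum(alpha)  # equals P(count_t | earlier counts)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # normalise to avoid numerical underflow
    return log_lik


if __name__ == "__main__":
    # Hypothetical "low" and "high" activity states and hypothetical yearly counts
    counts = [1, 0, 2, 5, 7, 1, 0, 1]
    init = [0.5, 0.5]
    trans = [[0.9, 0.1], [0.2, 0.8]]
    rates = [0.8, 5.0]
    print(f"log-likelihood = {phmm_log_likelihood(counts, init, trans, rates):.3f}")
```

Normalizing the forward vector at each step and accumulating the logarithms of the normalizing constants gives the log-likelihood without numerical underflow, which matters for long catalogs.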
The most popular method for solving the estimation problem in HMMs is the EM (expectation-maximization) algorithm, introduced in its full generality by Dempster et al. [DEM 77] in their landmark paper. The version of the EM algorithm applied to HMMs is the Baum–Welch algorithm [BAU 70]. However, since the log-likelihood function can be evaluated routinely, parameter estimation can also be performed by gradient-based algorithms. The advantages and disadvantages of both estimation procedures, in terms of convergence rates, model parameterizations and dependence on initial values, are discussed in detail in [CAP 10, BUL 06, ZUC 09]. A quick search of the literature shows that for HMMs in particular, and for incomplete-data models in general, the EM algorithm is much more popular than its gradient-based alternatives. A major limitation of HMMs is that the time spent in a given state can only follow a geometric distribution. To overcome this limitation, more general models, such as hidden semi-Markov models (HSMMs), were introduced by Ferguson [FER 80]. HSMMs, first applied to machine speech recognition, assume that the underlying process forms a semi-Markov chain (SMC) with a variable sojourn time in each state. Moreover, Ferguson [FER 80] showed that an HSMM can be seen as an HMM if the state and the sojourn time elapsed in this state are jointly considered as a compound HMM state. Since their introduction, HSMMs have been applied in numerous scientific areas, including speech recognition, human activity recognition and network anomaly detection. They have also been applied in meteorological contexts (see, for example, [SAN 98, SAN 99, SAN 00]). Depending on their type and field of application, HSMMs are also known as “explicit-duration HMMs”, “variable-duration HMMs”, “HMMs with explicit duration”, “generalized HMMs”, “segmental HMMs” or “segmental models”. Although
HSMMs have been successfully applied in many scientific fields, they have not been widely applied in earthquake modeling. In particular, HSMMs have been applied to automatically detect and classify earthquakes for the efficient processing of large seismic datasets [BEY 11]. The problems related to statistical inference include the forward–backward algorithm and the maximum likelihood and maximum a posteriori estimates of the state sequence. The interested reader can refer to Yu [YU 10] for important inference problems related to HSMMs. Concerning the estimation problem in particular, Ferguson [FER 80] proposed an EM algorithm to obtain the MLEs of the parameters of HSMMs. Since then, several variants of the EM algorithm have been introduced for both the parametric and the non-parametric cases; Levinson [LEV 86], Guédon and Cocozza-Thivent [GUÉ 90], Durbin et al. [DUR 98], Sansom and Thomson [SAN 01], Guédon [GUÉ 03, GUÉ 05], Bulla and Bulla [BUL 06] and Barbu and Limnios [BAR 08] are a few typical references. The asymptotic properties of the MLEs for non-parametric HSMMs were first studied in [BAR 06], and those of the MLEs for general HSMMs with backward recurrence time dependence were studied in [TRE 09].
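Ferguson's observation that an HSMM can be viewed as an HMM on an enlarged state space can be made concrete for the underlying semi-Markov chain. The Python sketch below builds a Markov chain on pairs (state, remaining sojourn time): while time remains, the chain counts down; when the visit ends, it jumps to a new state according to the embedded transition matrix and draws the length of the next visit. Tracking the remaining rather than the elapsed time, sojourn laws that depend only on the state entered, a common maximum sojourn length K, and all numerical values are simplifying assumptions of this sketch; attaching emission probabilities to the first component of each pair would turn the chain into an HMM.

```python
def expand_semi_markov(P, f):
    """Markov chain on pairs (state, remaining sojourn time) equivalent to a semi-Markov chain.

    P: embedded transition matrix of the semi-Markov chain (zero diagonal assumed)
    f: f[j][k - 1] = probability that a visit to state j lasts exactly k steps, k = 1..K
       (sojourn law depending only on the state entered: a simplifying assumption)
    Returns (labels, Q): labels[n] = (state, remaining) and Q[n][n'] transition probabilities.
    """
    m, K = len(P), len(f[0])
    labels = [(j, d) for j in range(m) for d in range(1, K + 1)]
    index = {label: n for n, label in enumerate(labels)}
    Q = [[0.0] * len(labels) for _ in labels]
    for (i, d), row in zip(labels, Q):
        if d > 1:
            # the current visit continues: count the remaining time down deterministically
            row[index[(i, d - 1)]] = 1.0
        else:
            # the visit ends: jump to j with probability P[i][j], then draw the new sojourn length
            for j in range(m):
                for k in range(1, K + 1):
                    row[index[(j, k)]] += P[i][j] * f[j][k - 1]
    return labels, Q


if __name__ == "__main__":
    # Hypothetical three-state chain (e.g. magnitude classes) with sojourn times of 1 to 3 steps
    P = [[0.0, 0.7, 0.3], [0.5, 0.0, 0.5], [0.6, 0.4, 0.0]]
    f = [[0.2, 0.5, 0.3], [0.6, 0.3, 0.1], [0.1, 0.1, 0.8]]
    labels, Q = expand_semi_markov(P, f)
    for label, row in zip(labels, Q):
        print(label, [round(p, 2) for p in row])
```

The price of this representation is the size of the state space, which grows from m to m·K states; this can make dedicated HSMM algorithms preferable when sojourn times can be long.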

1 Fundamentals on Stress Changes

“We still do not know one thousandth of one percent of what nature has revealed to us” – A. Einstein

1.1. Introduction

Earthquake generation is the result of the accumulation and release of strain on a given fault or fault segment. External stress produces deformation (strain), which under elastic conditions eventually leads to failure. Elastic deformation is instantaneous and completely recoverable when the applied stress is removed. When a linear relation exists between the applied stress and the resulting strain, the material is characterized as purely elastic; this assumption is a good approximation for small deformations. If the material continues to deform beyond the elastic limit, it undergoes permanent deformation, and failure occurs due to the breakdown of interatomic bonds. Earthquakes are generated by displacement on discontinuities in the elastic part of the lithosphere, with the seismogenic faults assumed to maintain their elastic properties.

Earthquake Statistical Analysis through Multi-state Modeling, First Edition. Irene Votsi, Nikolaos Limnios, Eleftheria Papadimitriou and George Tsaklidis. © ISTE Ltd 2019. Published by ISTE Ltd and John Wiley & Sons, Inc.

The continuous plate motion loads the faults and fault segments located, for example, along plate boundaries, and the resulting accumulated strain culminates in slip on the fault surfaces. Given that plate motion is considered steady, strain loading and release would be expected to be regular in time if the fault segments went through their stick and slip stages completely independently. However, successive earthquake occurrences are usually interdependent [SCH 90]: slip on one segment “loads” or “unloads” adjacent segments, and thus their earthquake recurrence cannot be independent. This agrees with the observation that earthquakes do not recur at regular interseismic intervals. The accumulation of strain, which governs earthquake recurrence times, differs among areas, since the strain rate depends on tectonic activity, such as the relative plate motion. At the interface between adjacent plates, for example, the strain rate attains its maximum values, which results in the most frequent and largest earthquakes. In continental regions, where the rates of strain accumulation are lower and the seismicity is more diffuse, appropriate approximations are required to estimate the anticipated earthquake hazard. Despite substantial advances in the last decades in our understanding of how the associated faults interact through their stress fields, we still have a long way to go before achieving reliable estimates of the recurrence times of the stronger earthquakes associated with the major faults in a given area. This highlights the need to intensify our efforts to identify the locations and occurrence times of anticipated strong earthquakes.
Substantial progress has been made in identifying the source regions of future earthquakes through stress interaction modeling, which led to the recognition that slip during a strong earthquake changes the stress field and increases the likelihood of nearby earthquakes. Outstanding examples are the stress calculations explaining the
preferential occurrence of the aftershocks of the 1992 Landers main shock (Mw = 7.3) [KIN 94] and the along-strike sequential occurrence of large earthquakes on the North Anatolian Fault [STE 97], which will be presented in more detail in the following sections. Although these cases, along with a considerable number of other earthquake occurrences, were consistent with the forecasts, other earthquake forecasts based on the same approach were not verified. The evidence that stress changes considerably influence the time and place of the next earthquake has been reviewed in [HAI 10]. Earthquake-prone areas encompass fault zones containing a large number of faults, the locations of some of which are unknown; for this reason, several recent devastating earthquakes were associated with faults whose hazard had been inadequately assessed. The Coulomb stress changes caused by the displacement during strong earthquakes on specific faults and fault segments of a fault population have been confirmed to explain many seismic observations, including aftershock locations, the spatial evolution of earthquake sequences and the absence of expected shocks in active regions after strong earthquakes. This is because the failure of one fault segment transfers stress to nearby segments, which encourages or discourages further earthquakes on them. Therefore, fault interaction is an indispensable component of any seismic hazard assessment. The Coulomb stress changes have a remarkable impact at distances of up to two or three fault lengths.
Remote triggering has also been observed after strong earthquakes at distances of several fault lengths, which can reach thousands of kilometers depending on the magnitude of the causative earthquake; it is attributed to the propagation of transient (dynamic) seismic waves, which are capable of inducing failure either immediately or through delayed triggering. The triggering role of
the passage of seismic waves is mainly important in the near field. Earthquake forecasts based on the calculation of stress changes are mainly assessed with the available earthquake catalogs, which span a period (100–150 years) much shorter than the recurrence intervals of strong earthquakes in a given study area, which may range from hundreds to thousands of years. This is the main reason why many strong earthquakes cannot be forecast, which makes deterministic seismic hazard assessment more uncertain. Stress modeling has proved effective in most of the places where it has been applied; nevertheless, it is not adequate for an integrated seismic hazard assessment, because it has been applied only to mapped (already known) active faults. For this purpose, we need techniques that can reveal the anticipated hazard by modeling complex interactions through mathematical analysis combined with stress change calculations, based on, and interpreted with, realistic physical models.

1.2. Stress interaction

The occurrence of an earthquake is influenced by the slow, continuous tectonic loading and by the stress changes due to the coseismic slip of previous earthquakes, particularly the stronger and closer ones. Earthquakes occurring close together in both time and space (or, conversely, a time “delay” in the occurrence of an anticipated earthquake) manifest these stress interactions: during an earthquake, the stress transferred to neighboring faults may increase or decrease the stress on them, and thus enhance or inhibit earthquake occurrence there. Earthquake interaction is of particular interest for understanding whether strong earthquakes cluster both spatially and temporally, occurring within intervals of some months or years, or even shorter, instead
of hundreds of years, which is the typical recurrence time of such earthquakes when the associated faults are considered individually. The state of stress is examined soon after an earthquake occurs, considering that the stress has been released from the activated fault. The causative fault then remains inactive during the interevent time, i.e. the time needed for the stress on this fault to be rebuilt and released again in the next earthquake, typically hundreds to thousands of years. When an earthquake occurs, the stress is not dissipated; instead, its changes exhibit a characteristic spatial pattern around the fault that failed, particularly at the fault tips. These stress changes have been found to be related to changes in seismicity and to triggering at distances much longer than the fault length, for stress changes as small as 0.1 bar [REA 92, KIN 94]. In any case, stress changes can trigger an earthquake only when the fault is in the late phase of its seismic cycle, i.e. when it is already mature and close to failure. The stress state of a particular fault or fault segment can be evaluated on the basis of its known stressing rate and recurrence history. Therefore, the recurrence time of strong earthquakes depends on the long-term tectonic loading, which is assumed constant with time, on the stress drop during the earthquake and on the stress at which the fault failed (failure stress). Stress changes may modify the mean return period and cause either an advance or a delay of the next earthquake. This time shift is simply expressed as $\Delta t = \Delta \mathrm{CFS}/\dot{\tau}$ (in years), where $\Delta \mathrm{CFS}$ is the Coulomb stress change (change in Coulomb failure stress), mainly attributed to the coseismic static stress changes, and $\dot{\tau}$ is the rate of continuous tectonic loading. This impact on the occurrence time is called “clock advance” or “clock delay” for the cases of
stress enhancement and stress inhibition, respectively. For example, the 1992 Landers earthquake sequence was found to have advanced the anticipated occurrence time of great earthquakes on segments of the San Andreas Fault by almost a decade [JAU 92]. Simpson et al. [SIM 88] found that the 1983 Coalinga earthquake delayed the occurrence of the subsequent moderate Parkfield shock by about one year. The changes in the stress field are the result of strain accumulation and release in the brittle layer, according to the seismic cycle concept. This concept is based on the assumption that the static stress change caused by a strong earthquake is completely recovered during the subsequent period of strain accumulation, i.e. the net change in stress over the earthquake cycle is zero. This assumption is equivalent to the time-predictable model of earthquake occurrence [SHI 80]. Stress changes are either “static”, resulting from the coseismic slip and taking place instantaneously and permanently, or “dynamic”, caused by the passage of seismic waves, in which case they are oscillatory and transient. In the near field and soon after a strong earthquake, static and dynamic stress changes cannot be distinguished either observationally or theoretically, and both decrease rapidly, approximately as some inverse power of the distance from the fault [STE 05].
At longer distances, the static stress changes attenuate faster, approximately following the inverse of the
cube of the distance from the source, while the dynamic stress changes reach longer distances because of their lower attenuation rate. The relation between coseismic Coulomb stress changes and the distribution of aftershock epicenters was first investigated in [DAS 81]. Nevertheless, the approach became popular only after the 1992 Landers (M = 7.3) earthquake [STE 92]. The stress changes often found to have promoted the occurrence of large earthquakes on neighboring faults or fault segments amount to a small percentage of the stress drop on the ruptured fault (generally in the range of 20–30 MPa). The stress values related to triggering are a hundred to a thousand times lower. Given that the triggering time shifts, either advances or delays, are much smaller than the earthquake recurrence times, triggering requires both faults, the causative fault and the target fault, to be synchronized and at the final stage of their seismic cycles. Paleoseismological data show that, in the same regions, earlier earthquakes occurred in clusters of ruptures on several faults separated by long quiescent periods [SCH 10]. Theoretical and experimental data reveal that synchronization may occur in areas of positive stress coupling between adjacent fault segments, for slip rates within certain coupling thresholds. Globally known cases of earthquake interaction include: (a) the M ≥ 7.0 triplet of 1811–1812 in New Madrid (USA) [WIL 10]; (b) the sequential along-strike occurrence of earthquakes on the North Anatolian Fault (NAF) that started in 1939 [STE 97], culminating in two earthquakes with M ≥ 7.0 that occurred less than three months apart on two adjacent fault segments in 1999 [PAP 01a]; and (c) the two M > 8.5 Sumatra earthquakes that occurred within a few months of each other in 2004 and 2005 [MCC 05].
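The quantities discussed in this section can be tied together in a short calculation. The Python sketch below uses the standard definition of the Coulomb failure stress change on a receiver fault, ΔCFS = Δτ + μ′Δσn, where Δτ is the shear stress change in the slip direction, Δσn is the normal stress change (positive for unclamping) and μ′ is the effective friction coefficient. It then converts ΔCFS into a clock advance or delay by dividing by the tectonic stressing rate, and applies this shift to a time-predictable recurrence time taken as the previous stress drop divided by the stressing rate. The value μ′ = 0.4 and all stress values are assumptions for illustration, not values from the studies cited here, and the calculation ignores the spatial variation of ΔCFS over the receiver fault.

```python
def coulomb_stress_change(d_shear: float, d_normal: float, mu_eff: float = 0.4) -> float:
    """Coulomb failure stress change dCFS = d_shear + mu_eff * d_normal (e.g. in MPa).

    d_shear:  shear stress change resolved in the slip direction of the receiver fault
    d_normal: normal stress change on the receiver fault, positive for unclamping
    mu_eff:   effective friction coefficient (0.4 is an assumed, commonly used value)
    """
    return d_shear + mu_eff * d_normal


def clock_shift(d_cfs: float, stressing_rate: float) -> float:
    """Time shift in years, dCFS / stressing rate: > 0 is a clock advance, < 0 a clock delay."""
    if stressing_rate <= 0:
        raise ValueError("the tectonic stressing rate must be positive")
    return d_cfs / stressing_rate


def time_to_next_event(stress_drop: float, stressing_rate: float, d_cfs: float = 0.0) -> float:
    """Time-predictable recurrence time (stress drop / stressing rate) minus the clock shift."""
    shift = clock_shift(d_cfs, stressing_rate)
    return stress_drop / stressing_rate - shift


if __name__ == "__main__":
    # Hypothetical values: shear increase 0.3 MPa, clamping 0.25 MPa,
    # stress drop 3 MPa and stressing rate 0.01 MPa per year.
    d_cfs = coulomb_stress_change(d_shear=0.3, d_normal=-0.25)
    print(f"dCFS = {d_cfs:.2f} MPa, clock shift = {clock_shift(d_cfs, 0.01):.0f} yr")
    print(f"next event after {time_to_next_event(3.0, 0.01, d_cfs):.0f} yr "
          f"instead of {time_to_next_event(3.0, 0.01):.0f} yr")
```

With these hypothetical numbers, ΔCFS = 0.2 MPa advances a 300-year time-predictable recurrence time by 20 years; a negative ΔCFS of the same size would delay it by 20 years.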
The notable 1999 NAF example is illustrated by the spatial variations of the calculated
Coulomb failure function (ΔCFF), expressing the changes caused by the coseismic slip of the Mw = 7.4 earthquake of August 17, 1999, which ruptured a fault segment in NW Turkey in the area of Izmit Bay (Figure 1.1). The stress pattern shown in Figure 1.1 was calculated according to the fault plane solution of the main shock, i.e. vertical strike-slip faulting [PAP 01b]. Large positive values of ΔCFF correlate well with the distribution of strong aftershock foci and with the adjacent eastern fault segment, where the 1766 event of M = 7.3 had been generated. This implies possible triggering of this segment, which indeed failed a relatively short time later, on November 12, 1999, in the M = 7.2 Düzce main shock.

[Figure 1.1: map of ΔCFF between 26° and 31° longitude and 40° and 42° latitude; color scale from −200 to 200; event labels 990920 and 991112; scale bar 0–150 km]

Figure 1.1. Coseismic Coulomb stress changes (ΔCFF) caused by the August 17, 1999 Izmit main shock on the North Anatolian Fault, resolved according to the main shock faulting type (vertical right-lateral strike-slip fault). The locations of the M ≥ 4.4 aftershocks that occurred during August 17–September 30, 1999 are superimposed. The focal mechanism of the triggered Düzce main shock (November 12, 1999, Mw = 7.0) is depicted as an equal-area lower-hemisphere projection and plotted at the epicenter, where the positive Coulomb stress changes reach values of about 1 bar (source: [PAP 01b]). For a color version of this figure, see www.iste.co.uk/votsi/multistate.zip

[Figure 1.2: year of occurrence (1860–2000) versus distance along the faults (0–120 km) for the Kefalonia Fault and the Lefkada Fault; labeled events: 1867 M=7.4, 1869 M=6.4, 1912 M=6.8, 1914 M=6.3, 1915 M=6.6, 1915 M=6.7, 1948 M=6.4, 1948 M=6.5, 1953 M=6.4, 1953 M=7.2, 1972 M=6.3, 1983 M=7.0]

Figure 1.2. Temporal presentation of the activated major faults in the Kefalonia Transform Fault Zone (KTFZ) since 1867, along the Kefalonia branch, whose strike is ∼N50°E, and the Lefkada branch, whose strike is ∼N10°E. The solid lines indicate the corresponding rupture lengths. Modified from [PAP 02]. For a color version of this figure, see www.iste.co.uk/votsi/multistate.zip

Synchronization of two adjacent fault branches was identified along the Kefalonia Transform Fault Zone (KTFZ). The spatiotemporal distribution of strong (M ≥ 6.3) crustal earthquakes that originated on either of its two branches, namely the Kefalonia or the Lefkada branch, showed that they were clustered in relatively short time intervals (of the order of a few years), alternating with much longer, relatively quiescent periods [PAP 02]. Each active period included a relatively large event or a series of two to four events, close in time and with abutting or slightly overlapping rupture zones (Figure 1.2). This synchronization has been identified four times since 1867, i.e. since the time from which the available earthquake catalog [PAP 97] was verified to be complete at this magnitude threshold. This seismic behavior was investigated through calculations of the Coulomb stress changes caused by the coseismic displacement of the consecutive events and the

Earthquake Statistical Analysis through Multi-state Modeling

continuous slow tectonic loading on the activated fault segments. In 13 out of 14 cases, the subsequent earthquakes occurred in stress-enhanced areas, with values ranging from 0.01 MPa to more than 0.1 MPa. This implies that the observed synchronization is well supported by stress transfer among neighboring fault segments in a fault population.

[Figure 1.3 plot: magnitude (5.5–7.0) versus time (years, 1500–2000); bar labels giving the durations of active and inactive periods: 57, 54, 76, 54, 56 and 117 yrs]

Figure 1.3. Temporal distribution of earthquake magnitudes, showing periods of clustering and quiescence. The alternation of active and inactive periods is indicated by bars, with their durations given in years at the top of the figure. Modified from [PAP 03]. For a color version of this figure, see www.iste.co.uk/votsi/multistate.zip

Another remarkable example from Greece concerns the episodic occurrence of M ≥ 6.2 earthquakes in the Thessalia Fault Zone between 1954 and 1957, when three seismic sequences took place in an area where no such events had occurred for about two centuries [PAP 03]. Figure 1.3 shows a magnitude–time plot illustrating this behavior since 1500, the year from which the earthquake catalog [PAP 97] is considered complete. Stars denote earthquakes with M ≥ 6.2, whereas circles denote those with 6.0 < M < 6.2; this distinction was made to secure a


reliable magnitude estimation. Four active periods are found to alternate with inactive ones. For the 20th century, all events were found to have occurred in areas of positive stress changes, produced by the coseismic displacements of previous shocks and by the continuous tectonic loading on the active faults of the area. In addition to static stress changes, fault interaction encompasses dynamic stress changes, which are time-varying and transient. Although static stress changes are critical for aftershocks close to the rupture, at distances greater than about one fault length they become smaller than even tidal stresses. Dynamic stress changes are carried by seismic waves, which transmit transient oscillatory stresses that do not permanently alter the net load on the fault but may alter its mechanical state. Dynamic stress changes can account for remote triggering as well as for aftershock activity. For the 1992 Mw = 7.3 Landers earthquake, Kilb et al. [KIL 00] found similar asymmetries in the aftershock pattern and in the dynamic stress pattern. Following the 1999 Mw = 7.4 Izmit (Turkey) main shock, seismicity intensified in Greece at distances of 400–1000 km from the main rupture [BRO 00]. Small events occurred soon after the surface waves of the main shock passed through the presumably triggered areas. In contrast to the long-distance triggering that followed the Landers main shock, the activated areas here are clearly non-volcanic. Dynamic triggering of seismicity has been found to take place in geothermal and magmatic fields [HIL 93]. A spatial correlation between geothermal and activated areas is plausible, although there is no recent magmatism in the study area. The strength of the triggering waves can be measured either by the amplitude of the transient stress, which scales with the particle velocity, or by the energy density delivered by


the waves. A physical mechanism is required to transform the transient stresses of the seismic waves into sustained stresses on the fault, capable of producing an earthquake hours or days later, and several possible mechanisms have been suggested (see, for example, [HIL 93, GOM 98]). The most favored interpretation is based on fluid mechanics, because both the observed triggering and geothermal activity take place in tectonic environments where stretching is the dominant style of active deformation.

1.3. Stress changes calculation

Static stress changes can be calculated relatively easily, whereas absolute stress values cannot be measured. The modeling requires knowledge of the fault geometry and of the sense and magnitude of the coseismic slip, although these details become less significant as the distance of the observation point from the rupture increases [AKI 02]. Stress changes associated with coseismic displacements are calculated using a dislocation model of a planar fault surface Σ embedded in a homogeneous elastic half-space. According to Steketee [STE 58], the displacement field component $u_k$ (the $k$th component of $\mathbf{u}$) in this model, for an arbitrary uniform dislocation $\mathbf{U}$ on Σ, is

$$u_k = \frac{U_i}{8\pi\mu} \iint_{\Sigma} w_{ij}^{k}\, v_j \, d\Sigma, \qquad [1.1]$$

where μ is the shear modulus, $v_j$ are the direction cosines of the normal to the dislocation surface, $U_i$ is the $i$th component of $\mathbf{U}$ and $w_{ij}^{k}$ are six sets of Green's functions (summation over repeated indices is implied). The displacement and strain components are obtained by evaluating the integral in [1.1] [OKA 92]. The elastic stress tensor components, $s_{ij}$, are then estimated according to Hooke's law,


assuming an isotropic medium, from the elastic strain components $e_{ij}$:

$$s_{ij} = \frac{2\mu\nu}{1 - 2\nu}\,\delta_{ij}\,e_{kk} + 2\mu\, e_{ij}, \qquad [1.2]$$

where ν is Poisson's ratio and $\delta_{ij}$ is the Kronecker delta. Faults fail when the stress acting on them exceeds their strength. Proximity to failure is measured by the change in the Coulomb failure function (ΔCFF) (see [SCH 90, HAR 98] and the references therein). The changes caused by the coseismic displacement of the main shock are estimated by

$$\Delta CFF = \Delta\tau + \mu\,(\Delta\sigma + \Delta p),$$

where Δτ and Δσ are the changes in shear and normal stress, respectively, on the fault plane, μ is the friction coefficient and Δp is the change in pore pressure in the rupture area. Both Δτ and Δσ are estimated from the stress tensor given in [1.2] for the causative fault plane. The shear stress change Δτ is considered positive when shear stress increases in the direction of slip; Δσ is positive when the tensional normal stress increases. When the compressional normal stress decreases, the fault is unclamped and its frictional resistance decreases. When both Δτ and Δσ are positive, the fault approaches failure; negative Δτ and Δσ move the fault away from failure. In general, a positive ΔCFF indicates that the fault is approaching failure. This may happen when the shear stress increases or when the effective compressive normal stress decreases, i.e. when the term μ(Δσ + Δp) increases. The pore pressure change during the coseismic phase, when the porous medium is considered to still be in undrained conditions [RIC 76], is given by

$$\Delta p = -B\,\frac{\Delta\sigma_{kk}}{3},$$


where B is Skempton's coefficient (0 ≤ B < 1), which depends on the bulk moduli of the medium and on the fraction of pore volume filled by fluid, and $\Delta\sigma_{kk}$ is the trace of the induced stress tensor. Rock experiments suggest typical values of B between 0.5 and 0.9 [ROE 96]. An alternative interpretation assumes that the fault zone material is more ductile than the surrounding material, so that the normal stress components are equal, $\sigma_{xx} = \sigma_{yy} = \sigma_{zz}$, and hence $\Delta\sigma_{kk}/3 = \Delta\sigma$ inside the fault zone. Under these conditions, i.e. a medium that is homogeneous and isotropic outside the fault zone and homogeneous and isotropic inside the more ductile fault zone, the pore pressure change becomes Δp = −BΔσ, and substituting it into the expression for ΔCFF gives

$$\Delta CFF = \Delta\tau + \mu'\,\Delta\sigma, \qquad \mu' = \mu\,(1 - B).$$

Here, μ' is the apparent friction coefficient, which includes the influence of pore fluids along with the material properties of the fault zone, whereas the friction coefficient μ typically ranges between 0.6 and 0.8 (see [HAR 98] and the references therein). For the homogeneous isotropic poroelastic model, μ' is a function of $\Delta\sigma_{kk}$ and Δσ:

$$\mu' = \mu\left(1 - \frac{\beta'}{3}\,\frac{\Delta\sigma_{kk}}{\Delta\sigma}\right).$$

The parameter β' for rock is analogous to Skempton's coefficient B for soils and depends on the bulk moduli of the material and on the fraction of the material filled by fluid. The undrained case is usually considered [BEE 00], in which Δp depends on the normal stress change on the receiver fault plane. The modeling requires an appropriate value of μ' to be selected. Values of μ' used in coseismic static stress change calculations range between 0.0 and 0.75. The most widely accepted value, 0.4, was suggested by [KIN 94], who found that


fluctuations of the friction coefficient had only a subtle influence on the correlation with aftershocks. Varying the friction coefficient affects the values of the Coulomb stress changes and, to a lesser extent, their spatial pattern. Smaller friction coefficient values result in smaller Coulomb stress changes, since the resistance to coseismic slip is smaller; the coseismic stress drop is therefore lower, which also leads to smaller Coulomb stress changes on the receiver faults.

1.4. Modeling of Coulomb stress changes for different faulting types

Stress is a tensor quantity that changes in space and time. Accordingly, the spatial pattern of the stress changes resolved on most target faults changes considerably when their faulting geometry and kinematic properties are varied. Thus, the sign of the Coulomb stress changes (ΔCS) should be investigated as a function of the faulting type of the target fault. The same site may lie in a stress-enhanced area for an E–W striking normal fault and in a stress-inhibited area for any other faulting type. The dip of the target fault considerably affects the static stress changes: variations in dip angle modify the spatial distribution of positive and negative stress changes, so that a fault plane located inside a stress-enhanced area could end up in a stress-inhibited area.

1.4.1. ΔCS for strike-slip faulting

Calculations of Coulomb stress changes were first performed for several cases of vertical strike-slip faults, because this faulting geometry facilitated the presentation and interpretation of the spatial distribution of the stress changes. In order to investigate the influence of the 1992


Landers M = 7.4 main shock on the future hazard of the San Andreas Fault system, King et al. [KIN 94] examined the possible triggering of one earthquake by another. They found that the distribution of aftershocks, along with several other moderate nearby earthquakes, could be explained by the Coulomb criterion: aftershocks were abundant where the Coulomb stress increased by more than 0.5 bar, and seismicity was sparse where it decreased by the same amount. The 1992 M = 7.4 strike-slip Landers earthquake was also found to have triggered the M = 6.5 strike-slip Big Bear earthquake on a neighboring fault segment, by increasing the static stress on it by 0.3 MPa. The spatial pattern of the stress field resolved for an almost vertical strike-slip receiver fault is shown in Figure 1.1.

1.4.2. ΔCS for dip-slip faulting

Studies of static stress changes on dip-slip faults followed, the first of which concerned the 1980 Irpinia (Italy) normal-faulting earthquake. The results were analogous to those of the strike-slip cases, revealing stress enhancement on the neighboring strike-slip Potenza fault, which was activated in 1990 and 1991 [NOS 97]. Similarly, an earthquake series that took place in the South Lunggar Rift (Tibet) between 2004 and 2008 is well explained by stress transfer among the failed fault segments [RYD 12]. The first main shock, in 2004, brought an along-strike receiver fault into an area of positive stress changes, and that fault failed in 2005. The 2005 event in turn increased the static stress on two antithetic faults, which ruptured in 2008 [RYD 12]. In the back-arc Aegean region, which is dominated by N–S extension, the clustering of strong earthquakes alternating with relatively quiescent periods was satisfactorily interpreted by stress transfer among the fault segments of a fault population, as in the southern Aegean [PAP 05] and in northern Greece and Bulgaria [PAP 07].
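Equation [1.2] and the two ΔCFF expressions of section 1.3 can be combined in a short calculation. The following Python sketch is an illustration under stated assumptions, not the procedure used in the studies cited here: it assumes numpy, a tension-positive sign convention, a strain-change tensor that is already known at the observation point (in practice it would come from a dislocation solution such as [OKA 92], which is not implemented here), and a receiver plane given directly by its unit normal and slip vectors. The function names and all numerical values are hypothetical.

```python
import numpy as np

def hooke_stress(strain, shear_modulus, poisson):
    """Equation [1.2]: s_ij = 2*mu*nu/(1 - 2*nu) * delta_ij * e_kk + 2*mu*e_ij."""
    e = np.asarray(strain, dtype=float)
    lame = 2.0 * shear_modulus * poisson / (1.0 - 2.0 * poisson)  # Lame's first parameter
    return lame * np.trace(e) * np.eye(3) + 2.0 * shear_modulus * e

def resolve(stress, normal, slip):
    """Normal (tension positive) and shear stress changes on a receiver plane."""
    n = np.asarray(normal, dtype=float)
    s = np.asarray(slip, dtype=float)
    n, s = n / np.linalg.norm(n), s / np.linalg.norm(s)
    traction = np.asarray(stress, dtype=float) @ n
    return float(traction @ n), float(traction @ s)

def dcff_undrained(stress, normal, slip, friction, skempton_b):
    """dCFF = dtau + mu * (dsigma + dp), with undrained dp = -B * trace / 3."""
    d_sigma, d_tau = resolve(stress, normal, slip)
    d_p = -skempton_b * np.trace(np.asarray(stress, dtype=float)) / 3.0
    return float(d_tau + friction * (d_sigma + d_p))

def dcff_apparent(stress, normal, slip, friction, skempton_b):
    """dCFF = dtau + mu' * dsigma, with apparent friction mu' = mu * (1 - B)."""
    d_sigma, d_tau = resolve(stress, normal, slip)
    return float(d_tau + friction * (1.0 - skempton_b) * d_sigma)

# Hypothetical strain change: pure shear of 1e-6 in the x-y plane.
strain = np.array([[0.0, 1.0e-6, 0.0],
                   [1.0e-6, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])
stress = hooke_stress(strain, shear_modulus=3.0e10, poisson=0.25)  # Pa
# Receiver plane with normal along x and slip along y.
dcff = dcff_undrained(stress, [1, 0, 0], [0, 1, 0], friction=0.6, skempton_b=0.5)
print(round(dcff))  # 60000 (Pa), i.e. 0.6 bar, here entirely from the shear term
```

For an isotropic stress change, for which Δσ_kk/3 = Δσ on every plane, the two ΔCFF functions return the same value, which gives a quick numerical check of the relation μ' = μ(1 − B).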
An example of the stress field change calculation and the resulting triggering is given here for the cascade of four M ≥ 6.0 earthquakes in the Thessalia district (Greece) between 1954 and 1957, together with the activation of two contiguous faults in 1980. Figure 1.4 shows a regional map with the focal mechanisms of these earthquakes plotted at their epicenters, with the year of occurrence written above the beach ball that represents each fault plane solution. The positions of the inferred surface traces of the faults associated with each earthquake make it evident that they form a fault population, namely adjacent fault segments that fail with the same mechanism. In each of the two doublets, of 1957 and of 1980, the two events occurred only a few minutes apart. Although Coulomb stress changes can explain possible triggering (prompt, as in the doublets of 1957 and 1980, or delayed, as for the other events), they do not make it possible to assess the order of the sequential occurrence.

[Figure 1.4 map: longitude 22°–23°E, latitude 39°–39.5°N; scale bar 0–25 km; focal mechanisms labeled 1954, 1955, 1957, 1957, 1980 and 1980]

Figure 1.4. Focal mechanisms of the M ≥ 6.3 shocks associated with the faults bounding the southern margin of the basin, shown as equal-area lower-hemisphere projections mapped at their epicentral positions, with their year of occurrence written above. The inferred surface expressions of the causative faults are also plotted, with the ticks showing their dip direction. For a color version of this figure, see www.iste.co.uk/votsi/multistate.zip


Figure 1.5. Coulomb stress changes associated with the 1957/03/08 M 6.5 earthquake in the southern margin of the Thessalia basin, central Greece, resolved for the faulting type of the source fault at 8.0 km depth; static stress changes (in bars) are given by the color scale. The fault surface expressions are depicted by white lines, with the ticks showing the dip direction, whereas the causative fault is shown in black. For a color version of this figure, see www.iste.co.uk/votsi/multistate.zip

Figure 1.5 presents the spatial pattern of the static stress changes due to the coseismic displacement of the M = 6.5 earthquake of 1957/03/08, from which their impact on the adjacent fault segments can be evaluated. These changes are computed for the sense of slip of the fault that failed (80/40/−90, i.e. strike/dip/rake in degrees) at a depth of 8 km. The major faults of this fault network are plotted along their inferred surface traces. Red areas represent stress increase, blue areas represent stress decrease, and the black and white lines represent the causative and the adjacent major regional faults, respectively. Visual inspection shows that all neighboring faults are located inside stress-enhanced areas. The positive static stress changes are concentrated in lobes beyond the fault edges, revealing increased stress concentration there. The next earthquake occurred only seven minutes later, on the eastern


adjacent fault segment. It should be mentioned here that an M = 7.0 earthquake occurred on this fault in 1954, the first and strongest event of this sequence. It is worth noting that the along-strike adjacent normal faults lie inside the positive lobes, where the Coulomb stress changes reach their maximum values. Similar static stress interactions after major earthquakes were inferred for contractional (thrust faulting) settings. For example, Lin et al. [LIN 11] associated most of the aftershocks of the 2003 Mw = 6.9 Zemmouri (Algeria) thrust earthquake with positive coseismic Coulomb stress changes. The coseismic slip of the 2008 Mw = 7.9 Wenchuan earthquake, which ruptured the Beichuan and Pengguan reverse faults, resulted in significant static stress changes, either positive or negative, on the regional faults [PAR 08]. Figure 1.6 shows the Coulomb stress changes associated with the coseismic slip of an Mw = 6.7 main shock that occurred in 2013 on a fault segment along the western Hellenic arc. Although the spatial pattern is quite similar to that in Figure 1.5, with four main lobes of positive and negative ΔCS values, the considerably shallower fault dip makes the lobes less symmetric. The position of the receiving dip-slip faults, either normal or thrust, relative to the causative fault also influences the static stress changes they receive. The displacement fields for normal and thrust faulting are considerably different.


Nevertheless, for both types of dip-slip faulting, failure is discouraged if the receiver faults are located directly in the hanging wall or footwall of the causative fault, because the coseismic displacements in the upper crust counteract the sense of slip on the receiver faults. Regardless of whether the setting is extensional or contractional, the maximum positive stress changes are located around the fault tips, as well as in smaller areas on the hanging walls of the target faults and the footwalls of the causative faults. The Coulomb stress changes were calculated assuming a slip model without the heterogeneities that arise from the particular earthquake rupture process and from localized variations in strength and friction.
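The dependence of the sign of ΔCS on the receiver geometry can be illustrated by resolving one and the same stress change on receivers with different strike, dip and rake. The Python sketch below assumes the strike/dip/rake convention of Aki and Richards (x north, y east, z down, dip to the right of the strike direction), which the text does not specify, a tension-positive stress tensor, and an apparent friction coefficient μ' = 0.4, the value attributed above to [KIN 94]. The function names and the 0.1 MPa N–S tension are hypothetical; a real application would use stress changes from a dislocation model.

```python
import numpy as np

def fault_vectors(strike_deg, dip_deg, rake_deg):
    """Unit normal n (pointing into the hanging wall) and unit slip vector d
    (hanging wall relative to footwall), Aki & Richards convention:
    x = north, y = east, z = down."""
    phi, delta, lam = np.radians([strike_deg, dip_deg, rake_deg])
    n = np.array([-np.sin(delta) * np.sin(phi),
                  np.sin(delta) * np.cos(phi),
                  -np.cos(delta)])
    d = np.array([np.cos(lam) * np.cos(phi) + np.cos(delta) * np.sin(lam) * np.sin(phi),
                  np.cos(lam) * np.sin(phi) - np.cos(delta) * np.sin(lam) * np.cos(phi),
                  -np.sin(lam) * np.sin(delta)])
    return n, d

def coulomb_change(stress, strike_deg, dip_deg, rake_deg, apparent_friction=0.4):
    """dCS = dtau + mu' * dsigma on a receiver plane (tension positive, Pa)."""
    n, d = fault_vectors(strike_deg, dip_deg, rake_deg)
    traction = np.asarray(stress, dtype=float) @ n  # traction exerted by hanging wall on footwall
    d_sigma = traction @ n                          # > 0 means unclamping
    d_tau = traction @ d                            # > 0 means shear in the slip direction
    return float(d_tau + apparent_friction * d_sigma)

# Hypothetical stress change: 0.1 MPa of N-S (x-axis) tension, in Pa.
stress = np.diag([1.0e5, 0.0, 0.0])
normal_fault = coulomb_change(stress, 90.0, 45.0, -90.0)  # E-W strike, dips S, normal slip
thrust_fault = coulomb_change(stress, 90.0, 45.0, 90.0)   # same plane, reverse slip
print(round(normal_fault), round(thrust_fault))  # 70000 -30000
```

Under this hypothetical N–S tension, the E–W striking normal fault is brought closer to failure, while the same plane with reverse slip is moved away from it, in line with the observation that one site can be stress-enhanced for one faulting type and stress-inhibited for another.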

Figure 1.6. Coulomb stress changes due to the coseismic slip of the 2013 Mw = 6.7 earthquake in the western Hellenic arc, resolved for a thrust faulting type. The epicenters of the main shock and its aftershocks are shown by the asterisk and circles, respectively, with aftershock epicenters colored and sized according to magnitude. For a color version of this figure, see www.iste.co.uk/votsi/multistate.zip


1.5. Seismicity triggered by stress transfer

Fault interaction has been investigated in several cases on a regional scale by evaluating the spatial distribution of stress changes for the faulting types characteristic of the regional fault network. These studies revealed the static stress changes transferred from the causative faults to the neighboring receiver faults and, in several cases, the accumulated stress changes including long-term tectonic loading. Stress triggering results from the stress redistribution caused by the coseismic slip of the main rupture. During an earthquake, the elastic stress built up in the crust is relieved, while at the same time the stress in certain regions is increased by the coseismic slip. Faults in these regions that are mature, i.e. in the late stage of their seismic cycle, are possible candidates for triggered failure. This may happen very quickly, as with the Big Bear earthquake, which occurred only hours after the Landers earthquake; after several years, as with the Hector Mine earthquake, which occurred seven years after the Landers earthquake; or after several decades, as with the 1995 M = 6.9 Kobe (Japan) main shock, which is considered to have been triggered by the 1944 M = 8.0 Tonankai and 1946 M = 8.2 Nankaido earthquakes [POL 97].

1.5.1. Triggering of strong earthquakes

Investigating earthquake interaction is significant because it points to the feasibility of forecasting the sites of future earthquakes. When the triggering of strong earthquakes is sought, the stress changes are estimated by considering the coseismic slip on the major fault segments of a fault population and summing the changes in each stress tensor component as they occur in time [DEN 97a]. These authors computed the Coulomb stress changes caused by the coseismic slip of seven M ≥ 7.0 main


shocks that have occurred since 1812, along with the tectonic loading on the major regional faults in southern California. They found that 95% of the M ≥ 6.0 earthquakes generated by either strike-slip or reverse faulting occurred in stress-enhanced areas. Continuing this investigation, [DEN 97b] confirmed that more than 85% of the M ≥ 5.0 earthquakes that occurred between 1932 and 1995 were located in areas of positive static stress changes, whereas the remaining events were located close to the boundaries between areas of positive and negative stress change. In the North Aegean and northwestern Turkey, four times more earthquakes since 1912 were found to be correlated with increased Coulomb stress due to the coseismic slip of previous events in the dataset [NAL 98]. Papadimitriou and Sykes [PAP 01b] investigated the evolving stress field in the North Aegean during the 20th century by considering the coseismic slip of the strong main shocks along with the slow tectonic loading on the significant fault segments of the study area, and by resolving the stress changes according to the focal mechanism of the next earthquake whose triggering was being examined. The calculations revealed that the large earthquakes occurred in stress-enhanced areas, and that most of the moderate shocks with known focal mechanisms were also located in areas of positive ΔCFF. A notable case concerns the earthquake sequence in western Sichuan, where strong (M ≥ 6.5) earthquakes have occurred frequently, most of them associated with fault segments of the 350 km long sinistral strike-slip Xianshuihe fault zone. Both historical information and instrumental recordings verify the alternation of highly active periods with quiescent ones, along with a notable epicentral migration. In the most recent active period, the rupture areas of the strong earthquakes abutted one another and covered the entire Xianshuihe fault.
Papadimitriou and her colleagues [PAP 04] investigated the possible triggering of each earthquake by the previous ones by calculating the evolution of the stress field since 1893. The static stress changes were calculated by considering the coseismic slip of the strong (M ≥ 6.5) earthquakes and the long-term slip rate on the different fault segments, and were resolved according to the faulting type of the faults of interest. The calculations showed that all of the strong events, and most of the moderate-magnitude ones with known focal mechanisms, were in areas of increased Coulomb stress. This further supports the calculation of Coulomb stress changes as a powerful tool for forecasting future seismic activity (Figure 1.7). By extending the stress change calculations up to 2025, an increased seismic hazard was identified for the fault segments located in stress-enhanced areas.

1.5.2. Aftershock triggering

Positive Coulomb stress changes are not only located at the tips of the causative faults, but also form off-fault lobes where aftershock activity is expected to be triggered. An interpretation of aftershock occurrence beyond the fault tips was first given by [DAS 81], who showed that aftershocks occur at specific locations where crack models predict a stress increase resulting from the main shock rupture. Large stress increases were noted near the crack tip, but there were also small stress increases on either side of the crack, or about one crack length away. These are the regions where off-fault aftershocks are often observed, as in the case of the Mw = 6.4 July 26, 2001 Skyros (Greece) main shock, which occurred in the western part of the North Aegean Sea [KAR 03].


[Figure 1.7 maps, panels (a)–(f): longitude 100°–102°E, latitude 30°–32°N; event dates 1893/08/29, 1904/06/30, 1923/03/24, 1948/05/25, 1955/04/14 and 1967/08/30; color scale: Coulomb failure function change (bars), from −200 to 200]

Figure 1.7. Stress evolution along the Xianshuihe and Litang fault zones since 1893, calculated at a depth of 8.0 km. The stress changes are calculated each time for the faulting type of the next strong event and are denoted by the color scale at the bottom (in bars). Fault plane solutions are plotted as lower-hemisphere equal-area projections, above which the occurrence date (year/month/day) is written. The fault traces are depicted by white lines, and the fault segment associated with the occurrence of each event in each stage of the evolutionary model is shown in black. (a) Coseismic Coulomb stress changes associated with the 1893 event. (b) Stress evolution until just before the 1904 event. (c) ΔCFF just before the 1923 event. (d) Stress evolution until just before the occurrence of the 1948 event. (e) State of stress just before the 1955 event. (f) Stress evolution just before the occurrence of the 1967 event, calculated for normal faulting type (modified from [PAP 04]). For a color version of this figure, see www.iste.co.uk/votsi/multistate.zip


The seismicity forms three distinct clusters aligned in different directions, which also differ from the orientations of the known normal and right-lateral strike-slip faults in the study area. The Coulomb stress changes caused by the main rupture were calculated, and the stress-enhanced areas are consistent with the off-fault aftershock activity (Figure 1.8), providing a means of evaluating the seismic hazard posed by strong aftershocks occurring far from the main shock epicenter. To test the robustness of this result, the ΔCS values were calculated for a range of frictional parameters and fault geometries (Figure 1.8), and the findings were further supported. On August 14, 2003, a strong (Mw = 6.2) main shock took place on Lefkada Island (central Ionian Sea). Numerous aftershocks occurred more than 40 km beyond the fault tip, including a dense cluster located well inside a lobe of high positive ΔCS values (Figure 1.9). The theoretical static stress changes from the main shock provide a plausible interpretation of the off-fault aftershock activity and of the triggered seismicity associated with the adjacent fault, as well as further evidence for the seismic hazard associated with this fault [KAR 04]. The static stress changes caused by the coseismic slip of the 1995 Mw = 6.5 Kozani-Grevena (Greece) main shock were investigated at the locations of 173 aftershocks recorded between 6 and 12 days after the main shock [LAS 09]. A detailed rupture model (comprising three sub-faults), relocated aftershock epicenters and reliable fault plane solutions were used for this purpose. A statistical testing method was developed to examine whether the same set of aftershocks inside a given area, whose occurrence was attributed to the static stress changes, would have been there even without any influence of the stress changes due to the coseismic slip of the main shock.
These changes were computed at each aftershock focus and for both nodal planes (Figure 1.10).
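One simple way to frame such a test is sketched below in Python. It is a simplified reading of the approach, not the exact procedure of [LAS 09]: it assumes that, under the null hypothesis of no stress influence, each aftershock falls in the area of interest independently, with a probability p taken from a Gaussian kernel density estimate of a reference set of epicenters, and it then evaluates the one-sided binomial probability of observing at least the counted number of aftershocks in that area. The reference points, the bandwidth, the circular test area, the counts and the function names are all hypothetical.

```python
import math
import numpy as np

def kde_region_probability(reference_xy, in_region, bandwidth, n_samples=20000, seed=0):
    """Monte Carlo estimate of the probability mass that a Gaussian kernel
    density estimate, built on reference epicentres, assigns to a region.
    Sampling from the KDE: pick a reference point at random and add
    isotropic Gaussian noise with standard deviation `bandwidth`."""
    rng = np.random.default_rng(seed)
    ref = np.asarray(reference_xy, dtype=float)
    picks = ref[rng.integers(0, len(ref), size=n_samples)]
    samples = picks + rng.normal(scale=bandwidth, size=(n_samples, 2))
    return float(np.mean(in_region(samples)))

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance that at least k of n events
    fall in the region if each does so independently with probability p."""
    return sum(math.comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k, n + 1))

# Hypothetical usage: reference epicentres (km) and a circular area of
# positive stress change, radius 5 km, centred at (10, 0).
rng = np.random.default_rng(1)
reference = rng.normal(scale=10.0, size=(200, 2))

def in_positive_area(xy):
    return np.hypot(xy[:, 0] - 10.0, xy[:, 1]) <= 5.0

p = kde_region_probability(reference, in_positive_area, bandwidth=2.0)
observed, total = 30, 50  # hypothetical aftershock counts (inside area, overall)
print(p, binomial_upper_tail(observed, total, p))
```

A small upper-tail probability would argue against the hypothesis that the aftershocks fall in the area regardless of the stress changes. Repeating the test separately for each stress-change threshold, positive and negative, mirrors the separate analyses described in the text.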


[Figure 1.8 maps, panels (a)–(j): longitude 24°–24.6°E, latitude 38.8°–39.4°N; scale bars 0–20 km; events labeled Mw=6.4 and Mw=5.4 in panel (a); color scale: Coulomb stress change (bars), from −50 to 50]

Figure 1.8. (a) Coulomb stress changes (in bars) caused by the 2001 Skyros main shock, for a target plane with strike = 140°, dip = 70° and rake = −10°. The epicenters of the main shock (large asterisk), foreshocks (squares) and best-located aftershocks (circles) are also plotted. The main shock focal mechanism is shown as a lower-hemisphere equal-area projection. Panels (b)–(j) show the 0.1 bar contours of the stress changes for receiver faults striking between 120° and 160° (b), dipping between 50° and 90° (c) and with slip angles between 0° and −30° (d); and of ΔCFF for μ in the range 0.2–0.9 (e), B between 0.5 and 0.9 (f), calculation depths between 8 and 15 km (g), strikes of the fault plane between 138° and 158° (h), dips of the fault plane between 50° and 90° (i) and rakes of the fault plane between −20° and 20° (j) (source: [KAR 03]). For a color version of this figure, see www.iste.co.uk/votsi/multistate.zip


[Figure 1.9 map: longitude 20.4°–20.8°E, latitude 38.4°–39°N; scale bar 0–10 km; labeled events 14/8/2003, 16/11/2003 and 14/5/1983; color scale: static stress change (bars), from −30 to 30]

Figure 1.9. Static stress changes (in bars) caused by the coseismic slip of the 2003 Lefkada main shock (solid star), calculated at a depth of 8 km with μ = 0.6 for a typical fault plane solution for this area (strike = 28°, dip = 82° and rake = 172°). Contours denote values of 0, 0.05 and 1 bar. The thick white line represents the main rupture. Aftershocks (small circles) not related to the main rupture are mostly located in stress-enhanced areas. Two clusters of aftershocks, south and north of the main rupture, lie inside areas with stress changes higher than 1 bar. The November 16, 2003 epicenter (open star) and the May 14, 1983 fault plane solution are also plotted (modified from [KAR 04]). For a color version of this figure, see www.iste.co.uk/votsi/multistate.zip

The probability distribution of the proportion of aftershocks expected to occur in these areas independently of the stress changes was obtained using a non-parametric kernel density estimator of their spatial distribution. Separate analyses were carried out for areas with positive stress change values larger than or equal to 0.1, 0.3, 1.0, 5.0 and 10.0 bar, and for those with negative values of stress


change smaller than or equal to −0.1, −0.3, −1.0, −5.0 and −10.0 bar. The tests indicated, with high confidence, an increased probability for aftershocks to be generated inside areas of increased stress, revealing triggering caused by the static stress changes. The analysis, however, did not provide evidence for stress shadows inside areas of larger negative stress change: a statistically significant increase in probability was in fact estimated for earthquakes inside areas with stress changes less than or equal to −5.0 and −10.0 bar. In locations with larger absolute values of stress change, this probability increases regardless of the sign of the change, although the increase is more pronounced in areas of positive change than in those of negative change [LAS 09]. The location of some aftershocks in regions of negative static stress changes might be attributed to the slip model being much simpler than the real one, to crustal heterogeneities not being taken into account, and to the activation of several small faults with geometries different from the dominant one. When seeking stress shadows, statistical analysis is hampered by the fact that background seismicity is usually quite sporadic, so that the required statistically significant postseismic rate decrease cannot be obtained.

1.5.3. Triggering of mining seismicity

The evaluation of seismic hazard in mining areas has both societal and scientific components, given that the risk to the nearby built environment is high even from low- or moderate-magnitude earthquakes, and that the rate of earthquake occurrence is comparatively high. It has been

Fundamentals on Stress Changes

29

shown that the activity is time-dependent and that small stress changes can promote or inhibit the anticipated seismicity. Such interactions among mining-induced earthquakes in the Rudna Mine of the Legnica-Glogów Copper District (LGCD) in south-west Poland were investigated using Coulomb stress change calculations [ORL 09]. On their own, these stress changes cannot induce new tremors, since they represent only a small fraction of the stress field in mining areas. Nevertheless, when the rock mass at the nucleation point is already close to failure, they can bring it further towards rupture. For each investigated case, the cumulative static stress changes caused by the previous earthquakes with energy greater than 10⁵ J and a known focal mechanism that occurred in the LGCD area during 1993–1999 were calculated. The calculations were resolved on the focal mechanism of the target rupture, i.e. the next event in the dataset. The stress was set to zero before the occurrence of the first event, when the calculations started. At each step of the calculation, the correlation between the derived stress field and the earthquake locations was examined. The results indicated that mining earthquakes often cause stress changes capable of triggering other shocks nearby. A large percentage of the shocks, reaching up to 60%, are located inside areas of positive stress change, most of them in regions where the positive ΔCFF exceeds 0.01 MPa. Even when the earthquake foci lie inside areas of negative Coulomb stress change, most of the ruptured zones are partially inside stress-enhanced areas, which still points to possible triggering at the nucleation location.
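The cumulative procedure described above can be sketched as follows. This is a minimal illustration, not the computation of [ORL 09]: it assumes that the stress change contributed by each earlier event at the hypocenter of each later event, already resolved on the later event's focal mechanism, has been computed beforehand (for example with an elastic dislocation code). The matrix `contrib` holds invented values in MPa, and the helper names are hypothetical. The threshold 0.01 MPa corresponds to 0.1 bar.

```python
# Hypothetical sketch: contrib[i][j] is the static Coulomb stress change (MPa)
# at the hypocenter of event j caused by event i, resolved on the focal
# mechanism of event j. Only entries with i < j are used. Values are invented.


def cumulative_stress_at_targets(contrib):
    """Cumulative stress change at each event due to all earlier events.

    The stress is set to zero before the first event, so the first event
    is not a target; event j (j >= 1) sums the contributions of events 0..j-1.
    """
    n = len(contrib)
    return [sum(contrib[i][j] for i in range(j)) for j in range(1, n)]


def share_above(values, threshold):
    """Fraction of target events whose cumulative stress change exceeds threshold."""
    return sum(v > threshold for v in values) / len(values)


contrib = [
    [0.0, 0.020, -0.004, 0.001, -0.002],
    [0.0, 0.0, 0.018, -0.006, -0.010],
    [0.0, 0.0, 0.0, 0.010, 0.003],
    [0.0, 0.0, 0.0, 0.0, -0.001],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
cum = cumulative_stress_at_targets(contrib)
print("share in positive areas:", share_above(cum, 0.0))  # 0.75
print("share above 0.01 MPa:", share_above(cum, 0.01))     # 0.5
```

Counting separately above zero and above a positive threshold mirrors the distinction made in the text between events in positive areas and events where ΔCFF exceeds 0.01 MPa.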

[Figure 1.10: panels (a)–(d), maps (longitude 21.4°E–22°E, latitude 40°N–40.2°N, scale bar 10 km) of Coulomb stress changes; gray scale from −200 to 200 bar]

Figure 1.10. Coulomb stress changes due to a detailed coseismic slip model [RES 05] for the Kozani-Grevena main shock, whose epicenter is shown by the large star. The Coulomb stress changes, indicated by the gray scale and contours, were calculated (in bars) for both nodal planes of each aftershock, plotted as small white circles. Focal mechanisms are shown as lower-hemisphere equal-area projections. Calculations were performed for the (a) north-dipping and (b) south-dipping nodal planes of the normal faulting aftershock (210/21/-90, 50/70/-83) that occurred on 1995/05/20 at 04:46:31.18, with M = 2.5 and a depth of 7.57 km. The stress changes are distributed differently: in the first case the event is encouraged (37.64 bar at its hypocenter), whereas in the second case it is discouraged (-4.65 bar). Analogously, different distributions are derived for (c) the E-W-oriented and (d) the N-S-oriented nodal planes of the strike-slip aftershock (78/78/-5, 169/85/-168) that occurred on 1995/05/24 at 22:09:17.99, with M = 2.8 at a depth of 7.01 km. In both cases the earthquake occurred in an area of positive static stress change, 1.0 and 3.97 bar, respectively (source: [LAS 09]). For a color version of this figure, see www.iste.co.uk/votsi/multistate.zip

1.6. Discussion on stress interaction

Interaction between faults is studied by calculating the changes in their associated stress field. Early results for the Landers earthquake sequence showed that Coulomb stress calculations can be a powerful tool for assessing the interaction between strong events, as well as between main shocks and their aftershocks. Convincing evidence has also been found for a correlation between Coulomb stress changes and seismicity rate variations for several years after the occurrence of strong earthquakes. Coulomb stress changes as small as 0.1 bar appear capable of influencing aftershock locations (see [HAR 00] and the references therein). This value is a small fraction of the stress drop during an earthquake, which is why the terms “enhancement” and “encouragement” are more appropriate than earthquake “generation”. Whether stress changes much smaller than the stress drop of a failure can influence seismicity, enhancing or discouraging the occurrence of moderate to large subsequent events, depends on the state of stress on the target faults and on whether these changes can advance or delay the next failures [GOM 00]. Earthquake triggering is thus not a function of the static stress changes alone but also of other factors, such as the current stage of the seismic cycle on the target fault: when a fault is at an early stage of its seismic cycle, Coulomb stress changes are not efficient in triggering the next rupture. Modeling of Coulomb stress changes assumes that a fault is locked during the interseismic period and is continuously loaded, and that the change in the time to the next rupture (either an advance or a delay, Δt) caused by a static stress step does not depend on when the step occurs. This means that earthquakes triggered by a clock advance would have taken place anyway, only later in time.
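As a worked illustration of the clock-advance reasoning, the sketch below assumes the commonly used form of the static Coulomb failure stress change, ΔCFF = Δτ + μ′Δσn (shear stress change in the slip direction plus effective friction times normal stress change, with Δσn positive for unclamping), and a fault loaded at a constant stressing rate, so that the clock change is Δt = ΔCFF divided by that rate. The function names and all numerical values are hypothetical and chosen only for illustration.

```python
def coulomb_stress_change(d_shear, d_normal, mu_eff):
    """Static Coulomb failure stress change on a receiver fault (assumed form).

    d_shear  : shear stress change in the slip direction of the receiver fault
    d_normal : normal stress change, positive for unclamping
    mu_eff   : effective friction coefficient
    All stresses must share one unit (bar here).
    """
    return d_shear + mu_eff * d_normal


def clock_change(d_cff, stressing_rate):
    """Clock advance (> 0) or delay (< 0), in years, of the next rupture.

    Assumes a locked fault loaded at a constant stressing rate (bar/year), so
    that a static step d_cff shifts the failure time by d_cff / stressing_rate,
    independently of when in the seismic cycle the step is applied.
    """
    if stressing_rate <= 0:
        raise ValueError("stressing rate must be positive")
    return d_cff / stressing_rate


# Hypothetical values, not taken from the text.
d_cff = coulomb_stress_change(d_shear=0.5, d_normal=-0.25, mu_eff=0.4)
print(f"dCFF = {d_cff:.2f} bar")                               # dCFF = 0.40 bar
print(f"clock change = {clock_change(d_cff, 0.05):.1f} years")  # clock change = 8.0 years
```

A positive ΔCFF advances the next rupture and a negative one delays it; because the shift does not depend on timing, this simple model cannot by itself express the dependence on the stage of the seismic cycle discussed above.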
One question of paramount importance is whether a static stress change
threshold exists above which triggering takes place. Further questions follow: which of the areas where such changes were calculated are most favorable for earthquake nucleation? How soon will the triggered earthquake occur, and why are there “delayed” triggered earthquakes? Is stress enhancement sufficient to trigger earthquakes on otherwise inactive faults, and if so, how? Whether static stress changes smaller than 0.1 bar are sufficient to influence subsequent earthquakes, by accelerating or delaying their occurrence, is still under investigation; several cases exist in which such small stress changes are clearly correlated with the seismicity distribution. Another important contribution to the long-term loading process arises from the viscoelastic relaxation of the lithosphere and asthenosphere, which is caused by coseismic stress perturbations and influences the long-term, time-dependent stress transfer. On short time scales, it may enhance the amplitude and extent of negative stress changes because of the relaxation taking place below the seismogenic layer, whereas over longer time scales it reloads the entire crust. More generally, viscoelastic relaxation, poroelastic effects (e.g. fluid flow), creep and rate- and state-dependent friction influence the postseismic stress distribution in ways that cannot yet be fully explained. Theoretical models have been developed to address these and related questions. Nevertheless, the mechanisms involved in the nucleation of triggered earthquakes are complex, and the combined effect of the coseismic stress changes caused by slip during strong earthquakes and of the long-term slip rates on all known seismogenic faults where strong earthquakes might be anticipated is difficult to calculate unequivocally.
It therefore becomes necessary to approach fault interaction through appropriate tools of statistical analysis with which the hidden stress state can be revealed. Combining appropriate earthquake catalogs, associated with specific fault populations with distinctive seismotectonic properties in selected areas, with modeling of stress interactions in these fault populations and proper statistical tools has yielded promising results in revealing earthquake generation patterns [VOT 13, PER 16].

2 Hidden Markov Models

“Everything in the world has a hidden meaning”
N. Kazantzakis

Earthquake Statistical Analysis through Multi-state Modeling, First Edition. Irene Votsi, Nikolaos Limnios, Eleftheria Papadimitriou and George Tsaklidis. © ISTE Ltd 2019. Published by ISTE Ltd and John Wiley & Sons, Inc.

2.1. Introduction

Hidden Markov models (HMMs) are used to describe random phenomena governed by inaccessible, “hidden” mechanisms. They are used extensively in many fields, including seismology, reliability, biology and pattern recognition. In particular, HMMs describe systems observed at discrete times, where the observations are induced by an underlying (“hidden”) process that is unknown or unexplained. The statistical aspects of HMMs were first considered by Baum and Petrie [BAU 66]. An introduction to HMMs for speech recognition was presented by Rabiner [RAB 89]. Later, many statistical inference results were obtained for HMMs, such as the estimation of their order [RYD 94] and the asymptotic properties of the maximum likelihood estimators of their parameters [BIC 98, DOU 01, RYD 95]. Markov models provide a valuable tool for seismic hazard assessment, and in seismology HMMs were introduced as innovative models for this purpose. Some classical references on hidden Markov models in this context are those of Granat and Donnellan [GRA 02], Ebel et al. [EBE 07], Beyreuther et al. [BEY 08], Zucchini and MacDonald [ZUC 09] and Orfanogiannaki et al. [ORF 14].

The main advantage of HMMs is their adaptability and flexibility, since they allow any structure of states and any choice of transition and emission distributions according to the problem under study. An HMM is fully determined by its initial law and its transition and emission probabilities. The observation process may be continuous or discrete, univariate or multivariate. Its conditional distribution given the hidden state is time-invariant and may belong to any parametric family; the Poisson, binomial, exponential and Gaussian distributions are widely used. Non-standard HMMs, including autoregressive, coupled and factor HMMs, are described in [KOS 01]. Here, we focus on HMMs with finite state and observation spaces. The hidden chain is a first-order Markov chain, considered to be stationary and time-homogeneous. Moreover, the models are not specified beforehand: their parameters are calibrated based only on the observations. In other words, no prior knowledge is used, since none is available to favor particular distributional choices for the hidden states or the observations. A fully non-parametric approach is therefore followed, i.e. no assumptions are made on the hidden and observational distributions.

The objective of this chapter is to briefly introduce HMMs and to focus on a real-life application in seismology. Earthquake occurrences are due to actual stress fields, which cannot be directly observed. Nevertheless, the static stress changes due to the coseismic slips of strong main shocks and to slow tectonic loading on the major regional faults ([STE 99, PAP 01a] among others) can be estimated. These estimates are important, given that stress changes influence future earthquake occurrence
by encouraging or discouraging failure of the fault segments to which stresses are transferred. Even then, however, the current state of the stress field is not completely revealed, since the calculations are limited in time by the availability of input information, i.e. by the time span of the earthquake catalogs. We note that even though changes in the stress field can be calculated, the actual stress field is unobservable, and shedding light on it is challenging. In this study, HMMs are applied to a complete data sample comprising strong (M ≥ 6.5) earthquakes that have occurred in Greece and its surrounding area since 1845. The hidden states are related to different levels of the stress field. The models are compared with respect to their number of states by means of different information criteria. Additional results are obtained, including the expected value and the variance of the number of steps needed to visit a specific state of the underlying Markov chain and a specific observation. All quantities of interest are obtained by solving the corresponding statistical problems.

2.2. Hidden Markov framework

A discrete-time hidden Markov process, or hidden Markov model (HMM), is a stochastic process related to two other stochastic processes: the observation process and the underlying Markov chain (MC). At each discrete time instant, the Markov chain visits a state, which in turn generates an observation through the random process associated with that state (Figure 2.1). The stochastic evolution of the Markov chain depends on its initial law and its transition probability matrix. In particular, the underlying MC visits its first state according to its initial law and, according to the transition probability
matrix, it visits the next states. The underlying Markov chain cannot be observed directly and is therefore called hidden or latent; here, the term “hidden” refers specifically to the chain and not to the whole process. At each time instant, the observer has access to the observation, i.e. to the output of the random function associated with the current state, and not to the state itself.

[Figure 2.1: hidden states $Z_{k-1}$, $Z_k$, $Z_{k+1}$ and the corresponding observations $Y_{k-1}$, $Y_k$, $Y_{k+1}$]

Figure 2.1. Typical trajectory of a hidden Markov chain
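The generative mechanism just described (an initial law for the first hidden state, a transition matrix for the following ones and an emission distribution for each state) can be sketched in a few lines of Python. The function name and the parameter values in the usage example are hypothetical, and states and observation types are indexed from 0 rather than 1.

```python
import random


def simulate_hmm(alpha, P, R, n, seed=None):
    """Simulate n steps (Z_0..Z_{n-1}, Y_0..Y_{n-1}) of a finite hidden Markov chain.

    alpha[i] = P(Z_0 = i)                (initial law)
    P[i][j]  = P(Z_{k+1} = j | Z_k = i)  (transition matrix)
    R[i][a]  = P(Y_k = a | Z_k = i)      (emission matrix)
    States and observation symbols are indexed 0, 1, 2, ...
    """
    rng = random.Random(seed)
    states, obs = [], []
    z = rng.choices(range(len(alpha)), weights=alpha)[0]
    for _ in range(n):
        states.append(z)
        # The observation depends only on the current hidden state.
        obs.append(rng.choices(range(len(R[z])), weights=R[z])[0])
        # The next hidden state depends only on the current one (Markov property).
        z = rng.choices(range(len(P[z])), weights=P[z])[0]
    return states, obs


# Hypothetical two-state, three-symbol parameters.
alpha = [0.5, 0.5]
P = [[0.9, 0.1], [0.2, 0.8]]
R = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
states, obs = simulate_hmm(alpha, P, R, n=20, seed=1)
print("hidden:  ", states)
print("observed:", obs)
```

With a fixed seed the simulated trajectory is reproducible, which is convenient for checking estimation code against data whose hidden states are known.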

Let $\mathbb{N}$ be the set of non-negative integers. The HMM consists of the underlying Markov chain $(Z_k)_{k\in\mathbb{N}}$, which is not observable, and the observable process $(Y_k)_{k\in\mathbb{N}}$. The observable process is related to the underlying Markov chain in the sense that the $k$th observation, $Y_k$, is determined by the corresponding hidden state, $Z_k$. Since $Z_k$ is unobservable, all statistical inference is drawn from the observation sequence. Here, we concentrate on discrete-time HMMs in which both the observation and the state spaces are finite. When the state spaces are general, the corresponding model is a state space model, and inference is made by means of filters such as the standard Kalman filter, the extended Kalman filter, the unscented Kalman filter and the particle filter (see [DEL 04] and references therein). Let $k, \ell \in \mathbb{N}$ be two non-negative integers such that $k \le \ell$. We further denote the vectors $Z_k^\ell = (Z_k, \dots, Z_\ell)$ and $Y_k^\ell = (Y_k, \dots, Y_\ell)$. In the following, we present a specific
application of HMMs in the field of seismology. In particular, we consider that the states of the underlying MC correspond to stress field levels, whereas the observations correspond to earthquake magnitude classes. We denote the state space of the Markov chain $(Z_k)$ by $E = \{1, 2, \dots, M\}$ and the observation space by $A$. Since $(Z_k)_{k\in\mathbb{N}}$ is a first-order Markov chain, the distribution of $Z_{k+1}$ given the history of the process, $Z_0^k$, depends only on the last visited state, $Z_k$ (Markov property), i.e.

$$P(Z_{k+1} \mid Z_0^k) = P(Z_{k+1} \mid Z_k), \quad k \in \mathbb{N}.$$

Throughout this book, we assume that the transition probabilities and the conditional distribution of $Y_k$ given $Z_k$ do not depend on the time index $k$, i.e. that the HMM is (time-)homogeneous. Moreover, the chain $(Z_k)_{k\in\mathbb{N}}$ is considered to be stationary. We denote the transition probability matrix of the underlying MC by $P = (p_{ij})_{i,j\in E}$, where $p_{ij} = P(Z_{k+1} = j \mid Z_k = i)$, $k \in \mathbb{N}$, and its initial law by $\alpha = (\alpha(i))_{i\in E}$, where $\alpha(i) = P(Z_0 = i)$. For any given sets $E$ and $A$, the set of non-negative two-dimensional arrays on $E \times A$ is denoted by $\mathcal{M}_{E\times A}$. We further denote by $R = (R_i(a);\, i \in E, a \in A) \in \mathcal{M}_{E\times A}$ the emission probability matrix, where $R_i(a) = P(Y_k = a \mid Z_k = i)$. We assume that

$$R_i(a) = P(Y_k = a \mid Z_k = i) = P(Y_k = a \mid Y_0^{k-1} = \cdot,\, Z_0^{k-1} = \cdot,\, Z_k = i)$$

for all $a \in A$, $i \in E$, $k \in \mathbb{N}$. A stationary HMM is entirely determined by its parameter set $\vartheta = (P, R)$, since its initial law is then the stationary law of $P$; a non-stationary HMM, in contrast, is determined by $\vartheta = (\alpha, P, R)$. Since we focus on the stationary case, the parameter set will hereafter be $\vartheta = (P, R)$. The
asymptotic properties of the maximum likelihood estimators (MLEs) of $\vartheta$ were studied in [BAU 66]. The conditions for consistency were weakened in [PET 69]. Later, Leroux [LER 92] and Bickel et al. [BIC 98] studied the asymptotic properties of the MLEs when the observable process takes values in a general space. Three statistical problems are associated with HMMs: the training, scoring and decoding problems. Here, we briefly describe them; for more details, see [FRU 06] and [CAP 10].

1) Training or estimation problem

Given the observations and the set of possible states, our objective is to estimate the parameter set $\vartheta$. In a non-parametric context, the objective is to find the most likely set of transition and emission probabilities; in other words, the estimation problem is solved by the parameter set $\vartheta$ that is most likely to have generated the observations. The likelihood function of the observation sequence $o_0, \dots, o_N$ for a given parameter set $\vartheta$ is

$$P(Y_0 = o_0, \dots, Y_N = o_N; \vartheta) = \sum_{j_0\in E} \cdots \sum_{j_N\in E} \alpha(j_0)\, R_{j_0}(o_0) \prod_{l=1}^{N} p_{j_{l-1} j_l}\, R_{j_l}(o_l).$$

No analytical method is known for finding the global maximum of the likelihood function. We can, however, choose a $\vartheta$ that locally maximizes it by means of an iterative procedure, the Baum–Welch algorithm [BAU 70]. The Baum–Welch algorithm is the most widely used method for obtaining the maximum likelihood estimates, defined by

$$\hat{\vartheta}_{ML} = \arg\max_{\vartheta} P(Y_0 = o_0, \dots, Y_N = o_N \mid \vartheta).$$

The most important characteristic of the Baum–Welch algorithm is that it avoids the computational explosion that comes from the direct calculation of the likelihood function. The algorithm is based on the forward and backward probabilities, which are calculated by solving the scoring or evaluation problem (see Appendix 2 section A2.1.1).

2) Scoring or evaluation problem

The objective is the computation of the likelihood function, which represents the joint probability of an observation sequence $o_0^N$ (when just one sequence is available) for a given value of the parameter set $\vartheta$, i.e. $P(Y_0 = o_0, \dots, Y_N = o_N \mid \vartheta)$. Direct computation requires summing over all $M^{N+1}$ possible state sequences, which results in a number of operations of the order of $2(N+1)M^{N+1}$. The challenge is therefore to compute the likelihood without this exponential growth. This problem is solved by means of the forward probabilities, as described in Appendix 2 section A2.1.

3) Decoding or alignment problem

Given an observation sequence $o_0^N$ and a parameter set $\vartheta$, we aim to find the most probable state path, i.e. the state path most likely to have given rise to the observations. Strictly speaking, the observations are not sufficient to reveal the actual hidden states and the actual model; the optimal state sequence can only be determined in a probabilistic sense, and there are many ways to define optimality. Several criteria for the “optimal” state sequence are available in the literature. The criterion chosen here is to maximize the probability $P(Y_0 = o_0, \dots, Y_N = o_N, Z_0 = j_0, \dots, Z_N = j_N \mid \vartheta)$ over $j_0, \dots, j_N$, for known $\vartheta$ and fixed $o_0, \dots, o_N$. This criterion is the most commonly used in practice and can be
implemented by means of the Viterbi algorithm [VIT 67] (see Appendix 2 section A2.1.2).

Given an observation sequence $o_0^N$, our objective is to estimate the conditional distribution of $(Y_k)_{k\in\mathbb{N}}$ as well as the characteristics of the underlying MC.

2.3. Seismotectonic regime and seismicity data

The study area is one of the most seismically active regions in the world, exhibiting faulting complexity and frequent strong, catastrophic earthquakes. With the aim of contributing to seismic hazard assessment, the methods described above were applied to the regional seismicity, using the earthquake catalog compiled by the Geophysics Department of the Aristotle University of Thessaloniki with data from the Hellenic Unified Seismological Network (HUSN) (http://geophysics.geo.auth.gr/ss/). The data are complete for M > 8.0 since 550 BC, M > 7.3 since 1500, M > 6.5 since 1845, M > 5.2 since 1911, M > 5.0 since 1950 and M > 4.5 since 1964. The prevailing geodynamic characteristic of the study area is the subduction of the Eastern Mediterranean oceanic lithosphere under the Aegean microplate [PAP 70] along the Hellenic Trough. Intense seismicity takes place along the convergence boundary (Figure 2.2), associated with thrust faulting and an NE-SW direction of the axis of maximum compression, a direction that remains almost unaltered along the entire arc. Thrust faulting dominates the continental collision in the former Yugoslavia and continues south along the coastal regions of Albania and northwestern Greece, with contractional convergence almost normal to the seismicity zone, which runs mainly parallel to the coastline. The continental collision and the oceanic subduction are separated by the dextral strike-slip Cephalonia Transform Fault (CTF) [SCO 85], which accommodates the highest seismicity in the Mediterranean area. Seismicity in the northern Aegean Sea
is mainly concentrated along the North Aegean Trough (NAT), which is the boundary between the Eurasian lithospheric plate and the south Aegean microplate. The NAT constitutes the westward prolongation of the North Anatolian Fault (NAF); here, the dextral strike-slip motion is combined with intense north–south extension [MCK 78] and continues into the Aegean in a southwesterly direction. This style of deformation is also expressed in reliable fault plane solutions of strong earthquakes [PAP 98]. The long-term seismogenesis of the recent strong earthquakes in this region has been discussed in [PAP 06].

Figure 2.2. Main active boundaries in the Aegean area shown as solid lines. The arrows indicate the approximate direction of relative plate motion. NAT: North Aegean Trough; CTF: Cephalonia Transform Fault; NAF: North Anatolian Fault; RTF: Rodos Transform Fault. For a color version of this figure, see www.iste.co.uk/votsi/multistate.zip

2.4. Application to earthquake occurrences

The effect of one earthquake on another, either by triggering or by suppressing it, was reviewed by Harris two decades ago [HAR 98]; since then, it has been widely used as an effective tool for seismic hazard assessment. Earthquake interaction dominates seismicity behavior and contributes both to a deeper understanding of the earthquake generation process and to a more reliable seismic hazard evaluation when stress transfer is incorporated into probability estimates [STE 97]. Stress transfer provides a powerful tool for deciphering certain patterns in earthquake occurrence, such as clustering and quiescence. There are, however, limits to our knowledge of the current stress state, owing to the shortage of the necessary data. It is feasible to estimate stress changes in a given time window, including coseismic stress changes, long-term slip rates and stress relaxation, but these estimates carry the uncertainties intrinsic to the parameters involved. Following this reasoning, HMMs are applied to fill the gap of unknown stress levels. The unknown mechanism that generates the observations is assumed to be the seismic stress, and the hidden states are related to lower or higher stress levels. The stress field is considered to be time-varying, so its values may differ significantly over time. In particular, discrete values of the hidden stress field are to be estimated; the number of these values is also unknown and has to be determined. Previous studies indicate a relation between earthquake magnitudes and the respective stress fields, which is why the observations are classified by earthquake magnitude. In this section, HMMs are applied to an earthquake catalog that covers the period 1845–2008 with magnitudes larger than or equal to 6.5 (see Appendix 3).
The catalog is complete for this threshold and homogeneous with respect to the magnitude scale. Here, the discrete time instants correspond to the sequential numbering of the earthquakes.

[Figure 2.3: map of the study area, approximately 18°E–30°E and 33°N–43°N, scale bar 0–200 km]

Figure 2.3. Epicentral distribution of seismicity in the study area from the 6th Century BC to May 2011. For a color version of this figure, see www.iste.co.uk/votsi/multistate.zip

2.4.1. Two hidden states and three observation types

The earthquake catalog contains 116 strong (M ≥ 6.5) earthquakes, which are divided into three subsets: the first contains earthquakes with magnitudes M ∈ [6.5, 6.7] (55 earthquakes), the second includes earthquakes with magnitudes
M ∈ [6.8, 7.1] (48 earthquakes), and the third consists of earthquakes with magnitudes larger than 7.1 (13 earthquakes). A clustering method, the k-means method, was used to classify the earthquakes into these subsets. We first assume that two hidden states underlie the observations and solve the three problems described above. To estimate the parameters of the two-state HMM, we apply the Baum–Welch algorithm, starting it many times from randomly chosen initial values. Once the likelihood function is maximized, the corresponding estimates are taken as the maximum likelihood estimators. There is, however, no certainty that these are the “global” maximum likelihood estimates: despite the large number of initial points, they could correspond to a local maximum. The application of the Baum–Welch algorithm also requires a stopping criterion, such as a predefined maximum number of iterations, the stationarity of the likelihood function or the stationarity of the parameter estimates. Clearly, the outcome of the algorithm depends on both the choice of the initial values and the selected stopping criterion; for more discussion on this topic, we refer the interested reader to [BIE 03] and the references therein. Here, the stopping criterion is the stationarity of the log-likelihood function, i.e. the log-likelihood no longer increases (to 3-digit accuracy). The estimated transition probability matrix, $\hat{P}$, is

$$\hat{P} = \begin{array}{c|cc} & \text{State 1} & \text{State 2} \\ \hline \text{State 1} & 0.481 & 0.519 \\ \text{State 2} & 0.340 & 0.660 \end{array},$$

whereas the estimated emission probability matrix, $\hat{R}$, is

$$\hat{R} = \begin{array}{c|ccc} & \text{Observ. Type 1} & \text{Observ. Type 2} & \text{Observ. Type 3} \\ \hline \text{State 1} & 0.936 & 0.003 & 0.061 \\ \text{State 2} & 0.163 & 0.691 & 0.146 \end{array}.$$
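The scoring problem for the fitted two-state model can be sketched with the scaled forward recursion, which needs on the order of N·M² operations instead of the M^(N+1) terms of the direct sum. In the sketch below, the initial law is taken as the stationary law of $\hat{P}$ (consistent with the stationarity assumption), states and observation types are indexed from 0, and the short observation sequence is invented for illustration; it is not the catalog of Appendix 3. The brute-force function evaluates the summation formula for the likelihood directly and is included only as a check on tiny sequences.

```python
import itertools
import math


def stationary(P, iters=1000):
    """Stationary law of an ergodic transition matrix, by power iteration."""
    m = len(P)
    pi = [1.0 / m] * m
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]
        s = sum(pi)
        pi = [x / s for x in pi]
    return pi


def log_likelihood(obs, alpha, P, R):
    """log P(Y_0 = o_0, ..., Y_N = o_N) by the scaled forward recursion."""
    m = len(alpha)
    f = [alpha[i] * R[i][obs[0]] for i in range(m)]
    ll = 0.0
    for t, o in enumerate(obs):
        if t > 0:
            f = [sum(f[i] * P[i][j] for i in range(m)) * R[j][o] for j in range(m)]
        c = sum(f)  # c = P(Y_t = o_t | Y_0, ..., Y_{t-1})
        ll += math.log(c)
        f = [x / c for x in f]  # normalized forward probabilities (no underflow)
    return ll


def brute_force_likelihood(obs, alpha, P, R):
    """Direct sum over all M^(N+1) hidden paths; only feasible for tiny N."""
    m = len(alpha)
    total = 0.0
    for path in itertools.product(range(m), repeat=len(obs)):
        p = alpha[path[0]] * R[path[0]][obs[0]]
        for l in range(1, len(obs)):
            p *= P[path[l - 1]][path[l]] * R[path[l]][obs[l]]
        total += p
    return total


P_hat = [[0.481, 0.519], [0.340, 0.660]]
R_hat = [[0.936, 0.003, 0.061], [0.163, 0.691, 0.146]]
alpha = stationary(P_hat)
obs = [0, 1, 1, 0, 2]  # hypothetical magnitude types (0 = Type 1, ...)
print(log_likelihood(obs, alpha, P_hat, R_hat))
print(math.log(brute_force_likelihood(obs, alpha, P_hat, R_hat)))
```

Both printed values agree; only the forward version remains practical for the 116 events of the catalog.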

According to the results, earthquakes with magnitudes M ∈ [6.5, 6.7] are mainly generated by the first stress field level, whereas the second level mainly generates earthquakes with magnitudes M ∈ [6.8, 7.1]. In particular, the first stress field level emits earthquakes of the first type, i.e. with M ∈ [6.5, 6.7], with probability 0.936, and the second level emits earthquakes with magnitudes M ∈ [6.8, 7.1] with probability 0.691. Figure 2.4 presents the occurrence of earthquakes with M ≥ 6.5 (top) and the revealed hidden states (bottom).

[Figure 2.4: top panel, magnitudes (6 to 8) versus time (years, 1845–2005); bottom panel, decoded states (1, 2) versus time]

Figure 2.4. Magnitudes and revealed stress field levels versus time for the two-state HMM. For a color version of this figure, see www.iste.co.uk/votsi/multistate.zip
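The decoding step behind the lower panel of Figure 2.4 can be sketched with a log-space Viterbi recursion. The sketch assumes the estimated two-state parameters given above and an initial law equal to the stationary law of $\hat{P}$; the observation sequence in the usage example is hypothetical, and states and types are indexed from 0 (state 0 corresponds to State 1).

```python
import math


def _log(x):
    """Natural log with log(0) = -inf, so impossible moves are never selected."""
    return math.log(x) if x > 0 else float("-inf")


def viterbi(obs, alpha, P, R):
    """Most probable hidden path argmax_z P(Y = obs, Z = z), computed in log space."""
    m = len(alpha)
    delta = [_log(alpha[i]) + _log(R[i][obs[0]]) for i in range(m)]
    back = []  # back[t][j]: best predecessor of state j at step t + 1
    for o in obs[1:]:
        ptr, new = [], []
        for j in range(m):
            best = max(range(m), key=lambda i: delta[i] + _log(P[i][j]))
            ptr.append(best)
            new.append(delta[best] + _log(P[best][j]) + _log(R[j][o]))
        back.append(ptr)
        delta = new
    # Backtrack from the most probable final state.
    z = max(range(m), key=lambda i: delta[i])
    path = [z]
    for ptr in reversed(back):
        z = ptr[z]
        path.append(z)
    return path[::-1]


P_hat = [[0.481, 0.519], [0.340, 0.660]]
R_hat = [[0.936, 0.003, 0.061], [0.163, 0.691, 0.146]]
# Stationary law of a two-state chain: pi_1 = p_21 / (p_12 + p_21).
pi1 = P_hat[1][0] / (P_hat[0][1] + P_hat[1][0])
alpha = [pi1, 1.0 - pi1]

obs = [0, 0, 1, 1, 2, 0]  # hypothetical sequence of magnitude types
print(viterbi(obs, alpha, P_hat, R_hat))  # [0, 0, 1, 1, 1, 0]
```

In this toy sequence the type-3 event is attributed to the second state, since $\hat{R}$ gives it a higher emission probability there (0.146 against 0.061) and the chain is already in that state.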

We observe that the underlying Markov chain stayed in the first state during the period 1870–1881, since no earthquakes with magnitudes larger than 6.7 occurred then. During the period 1952–1959, earthquakes were
exclusively emitted from the second hidden state. Likewise, stronger earthquakes, i.e. earthquakes with M ≥ 7.5, are associated with the second hidden state.

2.4.2. Three hidden states and three observation types

As a next step, we assume that the actual stress field has three levels. The initial values of the parameters are again randomly selected many times, and the three problems described above are solved. The parameter set $\vartheta$ is estimated as

$$\hat{P} = \begin{array}{c|ccc} & \text{State 1} & \text{State 2} & \text{State 3} \\ \hline \text{State 1} & 0.829 & 0.000 & 0.171 \\ \text{State 2} & 0.126 & 0.781 & 0.093 \\ \text{State 3} & 0.000 & 1.000 & 0.000 \end{array}$$

and

$$\hat{R} = \begin{array}{c|ccc} & \text{Observ. Type 1} & \text{Observ. Type 2} & \text{Observ. Type 3} \\ \hline \text{State 1} & 0.697 & 0.303 & 0.000 \\ \text{State 2} & 0.403 & 0.597 & 0.000 \\ \text{State 3} & 0.000 & 0.000 & 1.000 \end{array}.$$

In two successive iterations, the log-likelihood function takes the same value (log L = −108.12), which indicates the convergence of the Baum–Welch algorithm (320 iterations). The estimated emission probability matrix indicates that observations of the first type are mainly emitted by the first hidden state, whereas observations of the third type come exclusively from the third hidden state. The second hidden state, in turn, is associated with the generation of observations of the first two types. Earthquakes with magnitudes M ∈ [6.5, 7.1] are mainly followed by earthquakes of the same observation type. Periods during which strong earthquakes (M > 7.1) are rare or absent correspond to the underlying chain visiting the
first two stress field levels. Conversely, when the underlying chain visits the third stress field level, events with M ≤ 7.1 are rare or absent. According to the estimated transition probability matrix $\hat{P}$, visits of the Markov chain to the first hidden state are mainly followed by self-transitions, which in turn result in earthquakes with magnitudes smaller than 7.2. The second hidden state also mainly leads to self-transitions, whereas the third state always leads to the second hidden state (estimated probability 1.000). In other words, the hidden state responsible for the generation of stronger earthquakes (M > 7.1) results, at the next step, in occurrences of weaker earthquakes (M ≤ 7.1). Turning to the decoding problem, the lower part of Figure 2.5 depicts the optimal state sequence for the given sequence of observations, while the upper part presents the earthquake magnitudes and occurrence times.

[Figure 2.5: top panel, magnitudes (6 to 8) versus time (years, 1845–2005); bottom panel, decoded states (1, 2, 3) versus time]

Figure 2.5. Magnitudes and revealed stress field levels versus time for the three-state HMM. For a color version of this figure, see www.iste.co.uk/votsi/multistate.zip
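The restart strategy used for both fits (many random initializations, each run stopped once the log-likelihood no longer increases to about three decimal places) can be sketched as follows. This is a generic Baum–Welch (EM) implementation with Rabiner-style scaling, not the authors' code. Unlike the stationary model of the text, it re-estimates the initial law freely from the first time step. The tolerance, the number of restarts, the iteration cap, the small emission floor and the observation sequence in the usage example are all assumed values.

```python
import math
import random


def _normalise(row):
    s = sum(row)
    return [x / s for x in row]


def _forward_backward(obs, alpha, P, R):
    """Scaled forward-backward pass; returns (fwd, bwd, scaling factors c)."""
    m, n = len(alpha), len(obs)
    fwd, c = [], []
    for t in range(n):
        if t == 0:
            f = [alpha[i] * R[i][obs[0]] for i in range(m)]
        else:
            f = [sum(fwd[t - 1][i] * P[i][j] for i in range(m)) * R[j][obs[t]]
                 for j in range(m)]
        c.append(sum(f))
        fwd.append([x / c[t] for x in f])
    bwd = [[1.0] * m for _ in range(n)]
    for t in range(n - 2, -1, -1):
        bwd[t] = [sum(P[i][j] * R[j][obs[t + 1]] * bwd[t + 1][j] for j in range(m))
                  / c[t + 1] for i in range(m)]
    return fwd, bwd, c


def baum_welch(obs, m, k, restarts=20, tol=1e-3, max_iter=1000, seed=0):
    """EM fit of an m-state HMM with symbols 0..k-1 (needs >= 2 observations).

    Keeps the restart with the highest log-likelihood; each run stops when
    the log-likelihood improves by less than tol. Returns (loglik, alpha, P, R).
    """
    rng = random.Random(seed)
    n = len(obs)
    best = None
    for _ in range(restarts):
        alpha = _normalise([rng.random() for _ in range(m)])
        P = [_normalise([rng.random() for _ in range(m)]) for _ in range(m)]
        R = [_normalise([rng.random() for _ in range(k)]) for _ in range(m)]
        prev = -math.inf
        for _ in range(max_iter):
            fwd, bwd, c = _forward_backward(obs, alpha, P, R)
            ll = sum(math.log(x) for x in c)
            if ll - prev < tol:  # log-likelihood has stopped increasing
                break
            prev = ll
            gamma = [[fwd[t][i] * bwd[t][i] for i in range(m)] for t in range(n)]
            xi = [[sum(fwd[t][i] * P[i][j] * R[j][obs[t + 1]] * bwd[t + 1][j] / c[t + 1]
                       for t in range(n - 1)) for j in range(m)] for i in range(m)]
            alpha = gamma[0][:]
            P = [_normalise(row) for row in xi]
            # Tiny floor keeps emission probabilities strictly positive.
            R = [_normalise([sum(g[i] for g, o in zip(gamma, obs) if o == a) + 1e-12
                             for a in range(k)]) for i in range(m)]
        else:
            # max_iter reached: score the final parameters.
            _, _, c = _forward_backward(obs, alpha, P, R)
            ll = sum(math.log(x) for x in c)
        if best is None or ll > best[0]:
            best = (ll, alpha, P, R)
    return best


obs = [0, 0, 1, 1, 2, 1, 0, 0, 0, 1, 1, 2, 1, 1, 0, 0, 1, 2, 2, 1, 0, 0, 0, 1]
ll, alpha, P, R = baum_welch(obs, m=2, k=3, restarts=10)
print(f"log L = {ll:.3f}")
for row in P:
    print([round(x, 3) for x in row])
```

Because EM only guarantees a non-decreasing likelihood, keeping the best of several restarts reduces, but does not remove, the risk of ending at a local maximum, which is the caveat raised in Section 2.4.1.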


Earthquake Statistical Analysis through Multi-state Modeling

The magnitudes of the earthquakes that occurred in the periods 1912–1952 and 1957–1968 were greater than 6.7 and smaller than 7.2. According to the Viterbi algorithm, these earthquake occurrences were due to the second hidden state. Earthquake occurrences in the periods 1845–1855, 1858–1866, 1870–1883 and 1972–1981 were attributed to the first hidden state; the corresponding magnitudes were all smaller than 6.8.

2.4.3. Model selection and simulation

First, we use the Akaike information criterion (AIC) [AKA 74] to choose the best model, i.e. the model that best balances goodness of fit against the number of parameters. In particular, the fitted HMMs are compared with respect to their maximized likelihood and their number of states. The best model is the one that minimizes

$$\mathrm{AIC} = -2\log L + 2\ell,$$

where log L is the maximized log-likelihood and $\ell$ is the corresponding number of free parameters. According to the AIC values, the two-state HMM (AIC = 233.460) describes the observations better than the three-state HMM (AIC = 240.240). Alternatively, we can use the Bayesian information criterion (BIC) [SCH 78], defined by

$$\mathrm{BIC} = -2\log L + \ell\log n,$$

where n is the number of observations. For the model with two hidden states, BIC takes the value 249.984, whereas for the model with three states it is 273.288. Hence, both information criteria indicate that the two-state HMM describes the observations better than its three-state counterpart.

Focusing now on the optimal model, i.e. the two-state HMM, we compute the transition probabilities of the observation sequence, given by

$$P(Y_{k+1} = j \mid Y_k = i) = \frac{\sum_{m\in E}\sum_{\ell\in E} P(Z_k = m)\,R_m(i)\,p_{m\ell}\,R_\ell(j)}{P(Y_k = i)},$$

for all i, j ∈ A, k ∈ N. The estimated transition probabilities of the observation sequence are as follows (rows: current observation type i = 1, 2, 3; columns: next observation type j = 1, 2, 3):

$$\begin{pmatrix} 0.517 & 0.376 & 0.107 \\ 0.427 & 0.456 & 0.117 \\ 0.455 & 0.430 & 0.115 \end{pmatrix}.$$
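The model-selection arithmetic and the observation transition formula above can be checked with a short sketch. It assumes that the free parameters of an s-state HMM with m observation types are the s(s − 1) transition and s(m − 1) emission probabilities (the initial law is not counted); with s = 3, m = 3 and n = 116, this count together with log L = −108.12 reproduces the reported AIC of 240.240 and, up to the rounding of log L, the BIC of 273.288. For the observation transitions, the text does not state which law of Z_k enters the formula, so the sketch assumes the stationary law of the estimated chain; with the rounded MLEs of Tables 2.1 and 2.2, its output need not coincide exactly with the matrix above. All function names are illustrative.

```python
import math


def n_free_params(s, m):
    """Free parameters of an s-state HMM with m observation types
    (transition and emission rows only; the initial law is not counted)."""
    return s * (s - 1) + s * (m - 1)


def aic(log_lik, n_par):
    return -2.0 * log_lik + 2.0 * n_par


def bic(log_lik, n_par, n_obs):
    return -2.0 * log_lik + n_par * math.log(n_obs)


def stationary(p, iters=10_000):
    """Stationary law of an ergodic transition matrix (power iteration)."""
    s = len(p)
    pi = [1.0 / s] * s
    for _ in range(iters):
        pi = [sum(pi[i] * p[i][j] for i in range(s)) for j in range(s)]
    return pi


def observation_transitions(p, r, z_law):
    """Matrix of P(Y_{k+1} = j | Y_k = i), with z_law[m] = P(Z_k = m)."""
    s, n_types = len(p), len(r[0])
    rows = []
    for i in range(n_types):
        p_yi = sum(z_law[m] * r[m][i] for m in range(s))  # P(Y_k = i)
        rows.append([
            sum(z_law[m] * r[m][i] * p[m][l] * r[l][j]
                for m in range(s) for l in range(s)) / p_yi
            for j in range(n_types)
        ])
    return rows


# Three-state HMM reported in the text: log L = -108.12, n = 116 observations.
k3 = n_free_params(3, 3)
print(k3, round(aic(-108.12, k3), 3), round(bic(-108.12, k3, 116), 3))

# Two-state MLEs from Tables 2.1 and 2.2.
P = [[0.481, 0.519], [0.340, 0.660]]
R = [[0.936, 0.003, 0.061], [0.163, 0.691, 0.146]]
for row in observation_transitions(P, R, stationary(P)):
    print([round(v, 3) for v in row])
```

Each row of the resulting matrix sums to one, since summing the numerator over j gives back P(Y_k = i).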

We further compute the probabilities that the underlying Markov chain visits a particular state given the observations (Figure 2.6). We then go one step further and compute confidence intervals for the parameters of the two-state HMM by using the parametric bootstrap method [VIS 00, ZUC 09]. In particular, the percentile method [EFR 93] is used to obtain the bootstrap confidence intervals; asymptotic confidence intervals could also be obtained. We generate two sets of bootstrap samples (containing 500 and 1,000 samples, respectively), where each sample has the same size as the real dataset, i.e. 116 data points. The two-state HMM is then fitted to each sample, and the MLEs of the parameters along with their 95% confidence intervals are reported in Tables 2.1 and 2.2. In these tables, the second row gives the parameter estimates, whereas the third and fourth rows present the corresponding confidence bounds obtained by using 500 bootstrap samples. Similarly, the fifth and sixth rows present the respective bounds obtained from 1,000 bootstrap samples. The lower confidence bound is defined as the (n · α/2)-th order statistic of the n bootstrap estimates of the parameter, whereas the upper confidence bound is the (n · (1 − α/2))-th order statistic, where α denotes the significance level and n the number of bootstrap samples.

[Figure 2.6: two panels showing, versus event number, the conditional probability (0 to 1) of the 1st hidden state (upper panel) and of the 2nd hidden state (lower panel), given the observations.]

Figure 2.6. Conditional distribution of the hidden states, $(J_n)_{n\in\mathbb{N}}$, given the observations, $(Y_n)_{n\in\mathbb{N}}$, resulting from the two-state HMM. For a color version of this figure, see www.iste.co.uk/votsi/multistate.zip
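Two ingredients of the parametric bootstrap described above, simulating samples of the data size from the fitted HMM and reading percentile bounds off the ordered bootstrap estimates, can be sketched as follows. Refitting the HMM to each sample (Baum–Welch) is not shown. The rounding rule for a non-integer order n · α/2 (floor for the lower bound, ceiling for the upper bound), the uniform initial law and the function names are assumptions of this sketch.

```python
import math
import random


def simulate_hmm(p, r, init, n, rng):
    """Draw one observation sequence of length n from a discrete HMM."""
    states = range(len(p))
    symbols = range(len(r[0]))
    z = rng.choices(states, weights=init)[0]
    obs = []
    for _ in range(n):
        obs.append(rng.choices(symbols, weights=r[z])[0])  # emit from the current state
        z = rng.choices(states, weights=p[z])[0]            # then move the hidden chain
    return obs


def percentile_interval(estimates, alpha=0.05):
    """Percentile bootstrap bounds: the (n*alpha/2)-th and (n*(1-alpha/2))-th
    order statistics (1-based) of the n bootstrap estimates."""
    n = len(estimates)
    ordered = sorted(estimates)
    lo = max(1, math.floor(n * alpha / 2))
    hi = min(n, math.ceil(n * (1 - alpha / 2)))
    return ordered[lo - 1], ordered[hi - 1]


# Usage with the two-state MLEs of Tables 2.1 and 2.2.
P = [[0.481, 0.519], [0.340, 0.660]]
R = [[0.936, 0.003, 0.061], [0.163, 0.691, 0.146]]
rng = random.Random(2019)
samples = [simulate_hmm(P, R, [0.5, 0.5], 116, rng) for _ in range(500)]
# Placeholder statistic: share of type-1 observations (NOT an HMM parameter estimate).
share = [s.count(0) / len(s) for s in samples]
print(percentile_interval(share))
```

In the actual procedure, `share` would be replaced by the re-estimated parameter (for example p11) obtained by refitting the two-state HMM to each simulated sample.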

Parameter              p11     p12     p21     p22
MLE                    0.481   0.519   0.340   0.660
Lower bound (500)      0.000   0.047   0.000   0.158
Upper bound (500)      0.953   1.000   0.843   1.000
Lower bound (1,000)    0.258   0.036   0.000   0.182
Upper bound (1,000)    0.964   0.742   0.818   1.000

Table 2.1. Confidence intervals for the transition probabilities of the underlying Markov chain at 5% significance level

Parameter              R1(1)   R1(2)   R1(3)   R2(1)   R2(2)   R2(3)
MLE                    0.936   0.003   0.061   0.163   0.691   0.146
Lower bound (500)      0.538   0.000   0.000   0.000   0.396   0.000
Upper bound (500)      1.000   0.387   0.384   0.479   0.965   0.450
Lower bound (1,000)    0.512   0.000   0.000   0.000   0.386   0.000
Upper bound (1,000)    1.000   0.401   0.380   0.490   0.970   0.410

Table 2.2. Confidence intervals for the emission probabilities at 5% significance level

2.4.4. Steps number for the first earthquake occurrence

We first observe that the stochastic process $(Z, Y) = (Z_k, Y_k)_{k\in\mathbb{N}}$ is a (time-homogeneous) Markov chain with transition kernel

$$Q\big((i,a),(i',a')\big) = P\big((Z_{k+1}, Y_{k+1}) = (i',a') \mid (Z_k, Y_k) = (i,a)\big) = p_{ii'}\,R_{i'}(a'),$$

where (i, a), (i′, a′) ∈ E × A, k ∈ N, and we denote its initial law by α. We now partition the state space E × A of the MC (Z, Y) into two sets U and D, i.e. E × A = U ∪ D, where U ∩ D = ∅ and U, D ≠ ∅. We further denote by $\alpha_1$ the restriction of the row vector α to U, and by $T_{(i',j')}$ the first passage time to the state (i′, j′) ∈ D, i.e.

$$T_{(i',j')} = \inf\{k \in \mathbb{N} : (Z_k, Y_k) = (i', j')\}.$$
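Since every state of E × A other than the target belongs to U, the mean and variance of $T_{(i',j')}$ follow from the fundamental matrix $(I - Q_{11})^{-1}$ of the kernel restricted to U × U. The sketch below builds Q((i, a), (i′, a′)) = p_{ii′}R_{i′}(a′), removes the target state and solves two linear systems, one for $L_1 = (I - Q_{11})^{-1}\mathbf{1}_r$ (mean passage times) and one for $K = (I - Q_{11})^{-1}[\mathbf{1}_r + 2Q_{11}L_1]$ (second moments), so that the variance from a given starting state is K − L_1². States are 0-based; the function names and the small Gauss–Jordan solver are illustrative assumptions, not the authors' code.

```python
def solve(a, b):
    """Solve the linear system a x = b (Gauss-Jordan with partial pivoting)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def first_passage_moments(p, r, target, start):
    """Mean and variance of the first passage time of (Z_k, Y_k) to `target`,
    starting from `start`, with U = (E x A) minus {target}."""
    s, n_types = len(p), len(r[0])
    u = [(i, a) for i in range(s) for a in range(n_types) if (i, a) != target]
    # Q11: kernel Q((i,a),(i',a')) = p[i][i'] * r[i'][a'] restricted to U x U
    q11 = [[p[i][j] * r[j][b] for (j, b) in u] for (i, _a) in u]
    n = len(u)
    i_minus_q = [[(1.0 if x == y else 0.0) - q11[x][y] for y in range(n)]
                 for x in range(n)]
    l1 = solve(i_minus_q, [1.0] * n)  # L1 = (I - Q11)^{-1} 1_r
    rhs = [1.0 + 2.0 * sum(q11[x][y] * l1[y] for y in range(n)) for x in range(n)]
    k = solve(i_minus_q, rhs)         # K = (I - Q11)^{-1} (1_r + 2 Q11 L1)
    idx = u.index(start)
    return l1[idx], k[idx] - l1[idx] ** 2


# Sanity check: one hidden state, target symbol emitted with probability 0.25,
# so T is geometric with mean 1/0.25 = 4 and variance 0.75/0.25**2 = 12.
print(first_passage_moments([[1.0]], [[0.75, 0.25]], target=(0, 1), start=(0, 0)))
```

Because the kernel depends on the current state only through its hidden component, the mean passage time from (i, a) does not depend on a, which is a convenient check when the function is applied to a fitted model.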


If the initial state of the MC belongs to U, then the mean value of $T_{(i',j')}$ is given by

$$E\big[T_{(i',j')}\big] = \alpha_1 (I - Q_{11})^{-1}\,\mathbf{1}_r,$$

where $Q_{11}$ is the restriction of Q to U × U, r is the cardinality of U and $\mathbf{1}_r$ is the column vector of r ones. Let us now focus on the state (i′, j′) = (2, 3), which corresponds to a visit of the underlying MC to the second state together with the occurrence of an earthquake with magnitude greater than 7.1. We place all the states of E × A in U, except for (i′, j′). Assuming that at time k = 0 the MC is in the state (Z_0, Y_0) = (1, 1), on average 6.662 steps are needed to visit the state (i′, j′) for the first time. In other words, the first earthquake with magnitude greater than 7.1 generated by the second hidden state occurs, on average, after 6.662 earthquake occurrences. The variance of the number of steps for the first passage from the i-th state of U to D is given by

$$\mathrm{Var}_i(T) = K(i) - \big(L_1(i)\big)^2,$$

where

$$K = (I - Q_{11})^{-1}\big[\mathbf{1}_r + 2\,Q_{11}(I - Q_{11})^{-1}\mathbf{1}_r\big], \qquad L_1 = (I - Q_{11})^{-1}\mathbf{1}_r.$$

Then, the variance of the number of steps for the first transit from U to D is given by $\mathrm{Var}(T) = \sum_{i=1}^{r} \alpha_1(i)\,\mathrm{Var}_i(T)$. The overall variance of the number of steps until the first transit to the state (i′, j′) is equal to Var(T) = 35.663.

2.5. Conclusion

Statistical modeling through hidden Markov models could significantly contribute to the understanding of earthquake occurrences. In particular, the stress field and its temporal variation are among the most significant physical quantities that determine the earthquake process. The HMMs were applied to a set of strong (M ≥ 6.5) earthquakes that took place in Greece and its surrounding area over the last one and a half centuries, with the ultimate goal of revealing the stress field related to an ensuing earthquake. Since there is no a priori information on the number of existing stress field levels, different numbers of hidden states were considered. First, we considered three observation types based on earthquake magnitude and assumed that the unknown stress field has two levels. Second, we assumed that three different hidden states cause earthquake occurrences, and compared the two- and three-state HMMs by means of the AIC and BIC criteria. According to both criteria, the two-state HMM is selected over the three-state model. In the two-state HMM, which is optimal in the aforementioned sense, earthquakes of the first observation type (M ∈ [6.5, 6.7]) are due to the first hidden state, whereas the second state generates earthquakes of all three observation types, mainly those with magnitudes M ∈ [6.8, 7.1]. Additionally, to quantify the margin of error in the estimates, bootstrap confidence intervals were computed for the parameters of the two-state model. Finally, we computed the mean and the variance of the number of steps needed until the second hidden state first emits an earthquake with magnitude M > 7.1. HMMs could contribute to seismic hazard assessment by revealing the different levels of the stress field, their number and their association with earthquake magnitudes. Although HMMs could provide insights into the unknown stress field, they are restrictive since they assume geometrically distributed sojourn times, i.e. the time spent in each hidden state before leaving it follows a geometric distribution. The more general hidden semi-Markov models (HSMMs) [BAR 08] relax this restriction by allowing any distribution for the sojourn times. In this sense, the study of HSMMs could improve the understanding of the earthquake generation process.

3 Hidden Markov Renewal Models

“The two most powerful warriors are patience and time” – L. Tolstoy

3.1. Introduction

In the previous chapter, HMMs were applied to reveal the hidden physical parameter that controls the earthquake generation process, i.e. the stress field. This chapter aims to give insights into the genesis of strong earthquakes by estimating important indicators of the underlying “hidden” process, which is considered to be a semi-Markov chain. Taking into account the results presented in the previous chapter, we focus on the discrete-time hidden semi-Markov model (HSMM). In this model, the state durations attached to transitions can follow any discrete-time distribution. We further assume that the observations are recorded when the jumps occur. Our objective is to estimate important indicators related to the levels of the stress field based on the proposed HSMM.

Earthquake Statistical Analysis through Multi-state Modeling, First Edition. Irene Votsi, Nikolaos Limnios, Eleftheria Papadimitriou and George Tsaklidis. © ISTE Ltd 2019. Published by ISTE Ltd and John Wiley & Sons, Inc.


3.2. Semi-Markov framework

Semi-Markov chains (SMCs) are discrete-time stochastic processes that generalize both renewal processes and Markov chains. For a Markov chain, the sojourn time in each state follows a geometric distribution, whereas for a semi-Markov chain it can follow any distribution on N. Similarly, in continuous time, the sojourn time of a Markov process in each state follows an exponential distribution, whereas for a semi-Markov process it can follow any distribution on R+. For an introduction to homogeneous semi-Markov chains, we refer the interested reader to [HOW 71] and [MOD 00]. For non-homogeneous SMCs, see [VAS 92] and [VAS 94], and for the ergodic theory of SMCs, see [ANS 60]. A thorough presentation of the theory of semi-Markov models and their applications is given in [BAR 08].

From an applied point of view, the discrete-time framework has several advantages over the continuous-time one. An SMC cannot explode, and the Markov renewal function can be expressed through a finite sum of semi-Markov kernel convolution products, whereas in continuous time an infinite series of such products is needed. Thus, numerical computations are simpler and more precise in the discrete-time setting. Moreover, after an appropriate discretization, discrete-time models can serve as a basis for numerical computations on continuous-time problems. Although the discrete-time case can be obtained from the continuous-time case by considering a counting measure on the discrete time points, we focus on the discrete-time case because, in most real-life problems, the data are discrete. Here we are interested in the non-parametric case, i.e. the statistical estimation is based on counting processes. For a parametric multi-state approach via semi-Markov models (SMMs), see [BAR 17] and the references therein.

[Figure 3.1: sample path of the system state versus time, showing the successive states $J_0, J_1, \ldots, J_n$, the jump times $S_0, S_1, S_2, \ldots, S_n, S_{n+1}$ and the sojourn times $X_1, X_2, \ldots, X_{n+1}$.]

Figure 3.1. A representative sample path of the Markov renewal chain

Let us consider a random system with a finite state space E = {1, 2, . . . , s} and denote by N the set of non-negative integers (N∗ = N \ {0}). The system evolves in time according to the following chains (Figure 3.1):
– $J = (J_n)_{n\in\mathbb{N}}$ taking values in E, where $J_n$ represents the state visited by the system at the n-th jump time;
– $S = (S_n)_{n\in\mathbb{N}}$ taking values in N, where $S_n$ represents the n-th jump time. We assume that $S_0 = 0$ and $0 < S_1 < \ldots < S_n < S_{n+1} < \ldots$;
– $X = (X_n)_{n\in\mathbb{N}}$ taking values in N, where $X_0 = 0$ almost surely and $X_n = S_n - S_{n-1}$ for all n ∈ N∗. Then, for all n ∈ N∗, $X_n$ stands for the (sojourn) time spent in $J_{n-1}$ before the n-th jump.


DEFINITION 3.1.– (Markov renewal chain) The chain $(J, S) = (J_n, S_n)_{n\in\mathbb{N}}$ is a Markov renewal chain (MRC) if it satisfies almost surely (a.s.)

$$P(J_{n+1} = j, X_{n+1} = k \mid J_0, S_0, \ldots, J_n, S_n) = P(J_{n+1} = j, X_{n+1} = k \mid J_n), \qquad [3.1]$$

for all j ∈ E, n ∈ N and k ∈ N. If the right-hand side of [3.1] does not depend on n, then the Markov renewal chain (J, S) is called time-homogeneous, and the discrete-time semi-Markov kernel $q = (q_{ij}(k); i, j \in E, k \in \mathbb{N})$ is defined by

$$q_{ij}(k) = P(X_{n+1} = k, J_{n+1} = j \mid J_n = i).$$

Then, the cumulative semi-Markov kernel $Q = (Q_{ij}(k); i, j \in E, k \in \mathbb{N})$ is defined by

$$Q_{ij}(k) = P(X_{n+1} \le k, J_{n+1} = j \mid J_n = i) = \sum_{\ell=0}^{k} q_{ij}(\ell).$$

The process $J = (J_n)_{n\in\mathbb{N}}$ is called the embedded Markov chain (EMC) of the MRC (J, S), with transition probability matrix $P = (p_{ij}; i, j \in E)$. We further denote by α = (α(i); i ∈ E) the initial law of the EMC, where $\alpha(i) = P(J_0 = i)$, and by $\nu = (\nu_i; i \in E)$ the stationary distribution of the chain J. Two types of sojourn time distributions are considered: the sojourn time distribution in a given state, and the sojourn time distribution in a given state conditional on the next visited state. We first define the conditional sojourn time distribution matrix $f = (f_{ij}(k); i, j \in E, k \in \mathbb{N})$, where $f_{ij}(k) = P(X_{n+1} = k \mid J_n = i, J_{n+1} = j)$.
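The construction of (J, S, X) can be made concrete by simulation: draw $J_0$ from the initial law, then repeatedly draw the next state from the embedded transition matrix P and a sojourn time from the conditional distribution $f_{ij}$ just defined, accumulating the sojourn times into jump times. This is a minimal sketch; the input format (conditional sojourn pmfs as dictionaries over k ≥ 1), the function name and the two-state example are hypothetical.

```python
import random


def simulate_mrc(p, f, init, n_jumps, rng):
    """Simulate J_n, S_n and X_n (n = 0..n_jumps) of a Markov renewal chain.

    p    -- embedded transition matrix, p[i][j]
    f    -- conditional sojourn pmfs, f[i][j] = {k: f_ij(k)} with k >= 1
    init -- initial law of J_0
    """
    states = range(len(p))
    j = [rng.choices(states, weights=init)[0]]
    s, x = [0], [0]  # S_0 = 0 and X_0 = 0 by convention
    for _ in range(n_jumps):
        cur = j[-1]
        nxt = rng.choices(states, weights=p[cur])[0]  # next state drawn from P
        ks = list(f[cur][nxt])
        k = rng.choices(ks, weights=[f[cur][nxt][v] for v in ks])[0]  # X_{n+1} ~ f_{cur,nxt}
        j.append(nxt)
        x.append(k)
        s.append(s[-1] + k)  # S_{n+1} = S_n + X_{n+1}
    return j, s, x


# Hypothetical two-state example with sojourn times of 1 or 3 time units.
pmf = {1: 0.5, 3: 0.5}
J, S, X = simulate_mrc([[0.2, 0.8], [0.6, 0.4]], [[pmf, pmf], [pmf, pmf]],
                       [1.0, 0.0], 10, random.Random(0))
print(J, S, X)
```

Drawing the next state first and then the sojourn time given both endpoints mirrors the factorization $q_{ij}(k) = f_{ij}(k)\,p_{ij}$ stated below.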


Then, the cumulative conditional distribution of the sojourn time is given by

$$F_{ij}(k) = P(X_{n+1} \le k \mid J_n = i, J_{n+1} = j) = \sum_{\ell=0}^{k} f_{ij}(\ell).$$

The sojourn times are attached to the transitions, i.e. $q_{ij}(k) = f_{ij}(k)\,p_{ij}$ for any i, j ∈ E, k ∈ N. Second, for k ∈ N, we denote by $h_i(k)$ the sojourn time distribution in state i ∈ E:

$$h_i(k) = P(X_{n+1} = k \mid J_n = i) = \sum_{j\in E} q_{ij}(k).$$

Then, the cumulative sojourn time distribution in state i ∈ E is given by

$$H_i(k) = P(X_{n+1} \le k \mid J_n = i) = \sum_{\ell=1}^{k} h_i(\ell).$$

Moreover, the corresponding survival function of the sojourn times is given by $\overline{H}_i(k) = 1 - H_i(k)$.

DEFINITION 3.2.– (semi-Markov chain) Let $N(k) = \max\{n \in \mathbb{N} \mid S_n \le k\}$ be the discrete-time counting process of the number of jumps in [1, k] ⊂ N. The chain $Z = (Z_k)_{k\in\mathbb{N}}$ is called a semi-Markov chain associated with the Markov renewal chain (J, S) if $Z_k = J_{N(k)}$.
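Definition 3.2 and the sojourn time distributions translate directly into code: N(k) is the index of the last jump time not exceeding k, so $Z_k = J_{N(k)}$ can be read off with a binary search over the jump times, and $h_i$, $H_i$ follow from summing the kernel over destinations and time. A minimal sketch; the list-based kernel layout q[k][i][j] = $q_{ij}(k)$ and the function names are assumptions made for illustration.

```python
import bisect


def semi_markov_path(j, s, horizon):
    """Z_k = J_{N(k)} with N(k) = max{n : S_n <= k}, for k = 0..horizon."""
    return [j[bisect.bisect_right(s, k) - 1] for k in range(horizon + 1)]


def sojourn_law(q, i):
    """h_i(k) = sum_j q_ij(k), H_i(k) = sum_{l<=k} h_i(l) and 1 - H_i(k),
    from a kernel stored as q[k][i][j]."""
    h = [sum(qk[i]) for qk in q]
    big_h, acc = [], 0.0
    for v in h:
        acc += v
        big_h.append(acc)
    survival = [1.0 - v for v in big_h]
    return h, big_h, survival


# Jumps at S = (0, 3, 5) into states J = (0, 1, 0).
print(semi_markov_path([0, 1, 0], [0, 3, 5], horizon=6))  # [0, 0, 0, 1, 1, 0, 0]
```

`bisect.bisect_right` returns the position just after the last jump time ≤ k, which is exactly N(k) + 1 because the jump times are strictly increasing.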


Thus, $Z_k$ describes the system’s state at each (calendar) time k ∈ N. Furthermore, we have $J_n = Z_{S_n}$ and $S_n = \min\{k > S_{n-1} \mid Z_k \neq Z_{k-1}\}$, n ∈ N∗.

REMARK 3.1.– The counting process N(k) is defined on [1, k] instead of [0, k] for technical convenience.

Let us now introduce an operation on two matrix-valued functions, the (discrete-time) convolution product. The set of real matrices on E² is denoted by $M_E$, whereas $M_E(\mathbb{N})$ denotes the set of matrix-valued functions defined on N with values in $M_E$.

DEFINITION 3.3.– Let A, B ∈ $M_E(\mathbb{N})$ be two matrix-valued functions. The matrix convolution product A ∗ B is the matrix-valued function C ∈ $M_E(\mathbb{N})$ defined by

$$C_{ij}(m) = \sum_{r=0}^{m}\sum_{\ell\in E} A_{i\ell}(m - r)\,B_{\ell j}(r), \qquad i, j \in E,\ m \in \mathbb{N}.$$

Then, the following recursive formula can be used to define the n-fold convolution (n ∈ N) [BAR 08].

DEFINITION 3.4.– Let A ∈ $M_E(\mathbb{N})$ be a matrix-valued function. The n-fold convolution $A^{(n)}$ (n ∈ N) is the matrix-valued function recursively defined by

$$A^{(0)}_{ij}(m) = \begin{cases} 1, & \text{if } m = 0 \text{ and } i = j,\\ 0, & \text{elsewhere}, \end{cases} \qquad A^{(1)}_{ij}(m) = A_{ij}(m),$$

and

$$A^{(n)}_{ij}(m) = \sum_{r=0}^{m}\sum_{\ell\in E} A_{i\ell}(r)\,A^{(n-1)}_{\ell j}(m - r), \qquad m \in \mathbb{N},\ n \ge 2.$$
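Definitions 3.3 and 3.4 can be implemented literally on matrix-valued functions truncated at a horizon K and stored as a list over m = 0, ..., K of s × s nested lists. The sketch below relies on that storage assumption; with A = q, the n-fold convolution gives the probabilities $q^{(n)}_{ij}(k)$ discussed next. Function names are illustrative.

```python
def convolve(a, b):
    """(A * B)_ij(m) = sum_{r=0}^{m} sum_l A_il(m - r) B_lj(r), for m = 0..K."""
    horizon, s = len(a), len(a[0])
    return [[[sum(a[m - r][i][l] * b[r][l][j]
                  for r in range(m + 1) for l in range(s))
              for j in range(s)] for i in range(s)]
            for m in range(horizon)]


def n_fold(a, n):
    """A^(n): A^(0) is the identity at m = 0 (zero elsewhere), A^(n) = A * A^(n-1)."""
    horizon, s = len(a), len(a[0])
    res = [[[1.0 if (m == 0 and i == j) else 0.0 for j in range(s)]
            for i in range(s)] for m in range(horizon)]
    for _ in range(n):
        res = convolve(a, res)
    return res


# One state, one jump per time unit: A(1) = 1, so A^(2) is concentrated at m = 2.
a = [[[0.0]], [[1.0]], [[0.0]], [[0.0]]]
print([m[0][0] for m in n_fold(a, 2)])  # [0.0, 0.0, 1.0, 0.0]
```

`convolve(a, res)` sums $A_{i\ell}(m - r)\,A^{(n-1)}_{\ell j}(r)$, which is the sum of Definition 3.4 after the change of index r → m − r.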


Starting from a state i ∈ E at k = 0, the probability that the EMC makes its n-th jump, to the state j ∈ E, at time k ∈ N∗ is

$$q^{(n)}_{ij}(k) = P(S_n = k, J_n = j \mid J_0 = i),$$

which is the (i, j) element of the n-fold convolution of the kernel. For n = 0, we have $q^{(0)}_{ij}(k) = P(S_0 = k, J_0 = j \mid J_0 = i)$.

Let us now consider a trajectory of the MRC (J, S), censored at a fixed time M ∈ N, i.e.

$$\mathcal{H}(M) = (X_0, \ldots, X_{N(M)}, J_0, \ldots, J_{N(M)}, U_M),$$

where $N(M) = \max\{n \mid S_n \le M\}$ counts the number of jumps up to M, and $U_M = M - S_{N(M)}$ denotes the censored time spent in the last state $J_{N(M)}$. The trajectory of (J, S) evolves as follows: the EMC visits the first state $i_0$ according to the initial law α. Then, the next state to be visited, $i_1$, is chosen according to the transition matrix P. Before moving to the state $i_1$, the EMC stays in the state $i_0$ for a duration k drawn from the sojourn time distribution $(f_{i_0 i_1}(k); k \in \mathbb{N})$. Note that, contrary to Markov chains, the sojourn time distributions can be any discrete distributions; hence, semi-Markov chains are more flexible for real-life applications. For all i, j ∈ E and 1 ≤ k ≤ M, we denote:
– $N_i(M) = \sum_{n=1}^{N(M)} \mathbf{1}_{\{J_{n-1} = i\}}$;
– $N_{ij}(M) = \sum_{n=1}^{N(M)} \mathbf{1}_{\{J_n = j,\, J_{n-1} = i\}}$;
– $N_{ij}(k, M) = \sum_{n=1}^{N(M)} \mathbf{1}_{\{J_n = j,\, X_n = k,\, J_{n-1} = i\}}$.

In other words, $N_i(M)$ counts the visits of the EMC to the state i ∈ E that are completed by a jump up to time M, whereas $N_{ij}(M)$ (respectively $N_{ij}(k, M)$) counts the transitions of the EMC from i to j up to time M with any sojourn time (respectively with sojourn time equal to k). Considering a sample path $\mathcal{H}(M)$ of the MRC, for all i, j ∈ E and k ∈ N (k ≤ M), we define the empirical estimators of the transition probabilities $p_{ij}$ and of the semi-Markov kernel (SMK) $q_{ij}(k)$, respectively, by

$$\hat{p}_{ij}(M) = \frac{N_{ij}(M)}{N_i(M)} \quad \text{and} \quad \hat{q}_{ij}(k, M) = \frac{N_{ij}(k, M)}{N_i(M)}.$$
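The counting estimators are straightforward to compute from an observed trajectory. A minimal sketch, assuming the trajectory is given as the lists of visited states and sojourn times (the censored time $U_M$ is not needed for these two estimators); giving a zero row to states with $N_i(M) = 0$ is a convention of this sketch, and the function name is hypothetical.

```python
from collections import Counter


def empirical_kernel(j, x, n_states):
    """p_ij(M) = N_ij(M)/N_i(M) and q_ij(k, M) = N_ij(k, M)/N_i(M).

    j -- visited states J_0, ..., J_{N(M)} of the censored trajectory
    x -- sojourn times X_0, ..., X_{N(M)} (X_0 = 0)
    Returns p_hat as an s x s list and q_hat as a dict {(i, j, k): value}.
    """
    jumps = range(1, len(j))                                 # n = 1..N(M)
    n_i = Counter(j[n - 1] for n in jumps)                   # N_i(M)
    n_ij = Counter((j[n - 1], j[n]) for n in jumps)          # N_ij(M)
    n_ijk = Counter((j[n - 1], j[n], x[n]) for n in jumps)   # N_ij(k, M)
    p_hat = [[n_ij[(a, b)] / n_i[a] if n_i[a] else 0.0 for b in range(n_states)]
             for a in range(n_states)]
    q_hat = {key: c / n_i[key[0]] for key, c in n_ijk.items()}
    return p_hat, q_hat


p_hat, q_hat = empirical_kernel([0, 1, 0, 0], [0, 2, 1, 3], n_states=2)
print(p_hat, q_hat)
```

Storing $\hat{q}$ sparsely is convenient here, because only the observed (i, j, k) triples have non-zero estimates.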

Once the estimator of the SMK is obtained, the estimation of any other quantity of the SMC is straightforward. The empirical estimator of the cumulative sojourn time distribution in the state i ∈ E, $\hat{H}_i(k, M)$, is given by

$$\hat{H}_i(k, M) = \sum_{j\in E}\sum_{\ell=0}^{k} \hat{q}_{ij}(\ell, M), \qquad [3.2]$$

whereas the survival function $\overline{H}_i(k)$ is estimated by

$$\widehat{\overline{H}}_i(k, M) = 1 - \sum_{j\in E}\sum_{\ell=0}^{k} \hat{q}_{ij}(\ell, M),$$

for any k ∈ N. Similarly, for i, j ∈ E such that $N_{ij}(M) \neq 0$, the empirical estimator of $f_{ij}(k)$ is defined by $\hat{f}_{ij}(k, M) = N_{ij}(k, M)/N_{ij}(M)$, and the respective estimator of the cumulative distribution is given by

$$\hat{F}_{ij}(k, M) = \sum_{\ell=0}^{k} N_{ij}(\ell, M)/N_{ij}(M),$$

for any k ∈ N.
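The remaining plug-in estimators only need the transition counts $N_{ij}(k, M)$. A minimal sketch, assuming the counts are supplied as a dictionary keyed by (i, j, k) (as a `Counter` of observed transitions would provide) and a truncation horizon for k; the function name is hypothetical.

```python
def sojourn_estimators(n_ijk, i, j, horizon):
    """Plug-in estimators from counts n_ijk[(i, j, k)] = N_ij(k, M), for k = 0..horizon.

    Returns H_i(k, M) as in [3.2], the survival estimate 1 - H_i(k, M), and
    f_ij(k, M), F_ij(k, M) (empty lists when N_ij(M) = 0, where they are undefined).
    """
    n_i = sum(c for (a, _, _), c in n_ijk.items() if a == i)             # N_i(M)
    n_ij = sum(c for (a, b, _), c in n_ijk.items() if (a, b) == (i, j))  # N_ij(M)
    big_h, surv, f, big_f = [], [], [], []
    for k in range(horizon + 1):
        h_k = sum(c for (a, _, l), c in n_ijk.items() if a == i and l <= k) / n_i
        big_h.append(h_k)
        surv.append(1.0 - h_k)
        if n_ij:
            f.append(n_ijk.get((i, j, k), 0) / n_ij)
            big_f.append(sum(c for (a, b, l), c in n_ijk.items()
                             if (a, b) == (i, j) and l <= k) / n_ij)
    return big_h, surv, f, big_f


counts = {(0, 1, 2): 1, (0, 0, 3): 1, (1, 0, 1): 1}
H, H_bar, f, F = sojourn_estimators(counts, i=0, j=1, horizon=3)
print(H, H_bar, f, F)
```

Summing the survival estimates over k gives the empirical mean sojourn time in state i, since each completed visit contributes its sojourn length once.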


3.3. Hidden Markov renewal framework

The main idea of an HMM is as follows: we observe the temporal evolution of a certain phenomenon (observed process), but we are interested in the temporal evolution of another phenomenon that is not observable (hidden process). The observed and hidden processes are stochastically dependent, in the sense that the hidden process determines the realization of the observed process. Despite their advantages, HMMs have an important constraint: the only sojourn time distribution they allow is the geometric one. HSMMs, by contrast, allow for any sojourn time distribution and can therefore potentially describe real-life problems better. Let us assume that observation times coincide with jump times, i.e. observations are emitted only at the jump times $S_n$, n ∈ N. Then, the discrete-time process $(J, S, Y) = (J_n, S_n, Y_n)_{n\in\mathbb{N}}$ is called the hidden Markov renewal chain (HMRC) (Figure 3.2). If, instead, observations are generated at every time k ∈ N, then the process $(Z, Y) = (Z_k, Y_k)_{k\in\mathbb{N}}$ is called the hidden semi-Markov chain [BAR 08].

[Figure 3.2: the observations $Y_{n-1}$, $Y_n$, $Y_{n+1}$ are emitted at the successive states $(J_{n-1}, S_{n-1})$, $(J_n, S_n)$, $(J_{n+1}, S_{n+1})$ of the Markov renewal chain.]

Figure 3.2. Typical trajectory of a hidden Markov renewal chain


We consider that the sequence $Y = (Y_n)_{n\in\mathbb{N}}$ consists of random variables that are conditionally independent given $(J_n)_{n\in\mathbb{N}}$, i.e.

$$R_i(a) = P(Y_n = a \mid Y_0^{n-1} = \cdot,\, J_0^{n-1} = \cdot,\, J_n = i) = P(Y_n = a \mid J_n = i),$$

for all a ∈ A, i ∈ E, n ∈ N. We further denote by $R = (R_i(a); i \in E, a \in A) \in M_{E\times A}$ the emission probability matrix. A stationary HMRC is entirely determined by the parameter set Θ = (q, R), whereas a non-stationary HMRC is determined by Θ = (α, q, R). Since we focus on the stationary case, the parameter set is Θ = (q, R) or, equivalently, Θ = (f, p, R).

3.4. Modeling earthquakes in Greece

We now apply the hidden Markov renewal model to the (non-declustered) catalog of seismicity in Greece from 1865 to 2008 (see Appendix 3). The study area incorporates a variety of tectonic styles and is well cataloged. The catalog considered for the analysis is homogeneous and complete for magnitudes M ≥ 6.5 for the respective period, so the available dataset should capture the information needed for statistical modeling. The estimated emission probability matrix indicates that earthquakes with M ∈ [6.5, 6.7] are due to the first hidden state, whereas earthquakes with M ∈ [6.8, 7.1] are generated by the second hidden state. Stronger earthquakes (M > 7.1) are mainly due to the third state of the stress field. Once the hidden Markov chain has been decoded, and assuming that jump times coincide with emission times, important indicators of the underlying semi-Markov chain can be obtained. In the sequel, we present the empirical estimators of these indicators, where the states are taken to be the states revealed in Chapter 2. These estimators can also serve to initialize the EM algorithm [BAR 08], which is the reference algorithm for parameter estimation in HSMMs. Moreover, the states of the HMRM can be decoded via the Viterbi algorithm introduced in [PER 15].

First, the semi-Markov kernel is empirically estimated based on equation [3.3]. The maximum observed value of the sojourn time is 146 months. The empirical estimator of the semi-Markov kernel, $\hat{q}_{ij}(k, M)$, is shown in Figure 3.3 for all i, j ∈ E and k ∈ N, and Figure 3.4 presents the empirical estimator of the conditional sojourn time distribution, $\hat{f}_{ij}(k, M)$, for all i, j ∈ E and k ∈ N.

[Figure 3.3: semi-Markov kernels (0 to 0.08) versus time (months, 0 to 140), one curve for each transition (i, j) ∈ {(1, 1), (1, 2), (2, 1), (2, 2)}.]

Figure 3.3. Empirical estimator of the semi-Markov kernel, $\hat{q}_{ij}(k, M)$. For a color version of this figure, see www.iste.co.uk/votsi/multistate.zip

Before providing an explicit formula for the stationary distribution of $(Z_k)_{k\in\mathbb{N}}$, we make two assumptions:
A1 The underlying chain is irreducible;
A2 For any j ∈ E, $m_j = E(S_1 \mid J_0 = j) < \infty$.


[Figure 3.4: state transition functions (0 to 0.16) versus time (months, 0 to 140), one curve for each transition (i, j) ∈ {(1, 1), (1, 2), (2, 1), (2, 2)}.]

Figure 3.4. Empirical estimator of the conditional sojourn time distribution, $\hat{f}_{ij}(k, M)$. For a color version of this figure, see www.iste.co.uk/votsi/multistate.zip

The stationary distribution of the underlying SMC in the state j ∈ E can be estimated through the following plug-in type estimator [BAR 08]:

$$\hat{\pi}_j(M) = \frac{\hat{\nu}_j(M)\,\hat{m}_j(M)}{\sum_{i\in E} \hat{\nu}_i(M)\,\hat{m}_i(M)},$$

where $\hat{\nu}_i(M)$ estimates the stationary distribution of the EMC and $\hat{m}_i(M)$ estimates $m_i$ in the state i ∈ E. In particular, we use the following empirical estimators:

$$\hat{\nu}_i(M) = \frac{N_i(M)}{N(M)} \quad \text{and} \quad \hat{m}_i(M) = \sum_{k\ge 0}\big(1 - \hat{H}_i(k, M)\big),$$

for any i ∈ E. In addition, we denote by $(S^j_n)_{n\in\mathbb{N}}$ the successive passage times in the state j ∈ E and by $\mu_{jj} = E_j(S^j_1)$ the corresponding mean recurrence time. The mean recurrence time is then estimated by

$$\hat{\mu}_{jj}(M) = \frac{\sum_{i\in E} \hat{m}_i(M)\,\hat{\nu}_i(M)}{\hat{\nu}_j(M)},$$

for any state j ∈ E. These estimated quantities are presented in Table 3.1 for each state j ∈ E.

State j    μ̂jj(M)    π̂j(M)    m̂j(M)    ν̂j(M)
1          35.727    0.542    19.364    0.474
2          32.213    0.458    14.754    0.526

Table 3.1. Estimated mean recurrence times and stationary distribution of the underlying SMC
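The plug-in formulas for $\hat{\pi}_j(M)$ and $\hat{\mu}_{jj}(M)$ are simple weighted ratios. Plugging the rounded entries of Table 3.1 back into them recovers $\hat{\pi}_j(M)$ and, up to the rounding of $\hat{\nu}_j(M)$, $\hat{\mu}_{jj}(M)$, which provides a quick consistency check. A minimal sketch with hypothetical function names.

```python
def smc_stationary(nu, m):
    """pi_j = nu_j m_j / sum_i nu_i m_i and mu_jj = sum_i nu_i m_i / nu_j."""
    total = sum(n_i * m_i for n_i, m_i in zip(nu, m))
    pi = [n_j * m_j / total for n_j, m_j in zip(nu, m)]
    mu = [total / n_j for n_j in nu]
    return pi, mu


def nu_hat(visited, n_states):
    """nu_i(M) = N_i(M)/N(M) from J_0, ..., J_{N(M)} (the last visit is censored)."""
    completed = visited[:-1]
    return [completed.count(i) / len(completed) for i in range(n_states)]


# Rounded estimates of Table 3.1: nu_j(M) and m_j(M) for j = 1, 2.
pi, mu = smc_stationary([0.474, 0.526], [19.364, 14.754])
print([round(v, 3) for v in pi], [round(v, 3) for v in mu])
```

The ratio $\hat{\pi}_j/\hat{\nu}_j$ shows how the semi-Markov weighting works: a state visited less often by the embedded chain can still carry more stationary mass if its mean sojourn time is longer.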

3.4.1. Hitting times and earthquake occurrence numbers

The probability that the embedded Markov chain makes a jump to the state j ∈ E at time k (k ≥ 1), given that its initial state (at k = 0) is i ∈ E, is given by

$$\psi_{ij}(k) = P\Big(\bigcup_{n=0}^{k}\{S_n = k, J_n = j\} \,\Big|\, J_0 = i\Big) = \sum_{n=0}^{k} q^{(n)}_{ij}(k),$$

with the respective empirical estimator $\hat{\psi}_{ij}(k, M) = \sum_{n=0}^{k} \hat{q}^{(n)}_{ij}(k, M)$. Next, we are interested in estimating the mean number of times that the hidden state j ∈ E is visited up to time k ∈ N, given that the initial state (at k = 0) is i ∈ E. This is given by the Markov renewal function [BAR 08], defined by

$$\Psi_{ij}(k) = E_i\Big[\sum_{n=0}^{k} \mathbf{1}_{\{J_n = j,\, S_n \le k\}}\Big] = \sum_{\ell=0}^{k} \psi_{ij}(\ell).$$
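Because the jump times are strictly increasing (so $q_{ij}(0) = 0$), ψ satisfies the renewal-type recursion $\psi(k) = \mathbf{1}_{\{k=0\}} I + \sum_{\ell=1}^{k} q(\ell)\,\psi(k - \ell)$, which avoids computing each n-fold convolution separately; Ψ is then a running sum of ψ. A minimal sketch, assuming the list layout q[k][i][j] used here for illustration; feeding it the empirical kernel gives the plug-in estimators.

```python
def renewal_functions(q):
    """psi(k) = sum_n q^(n)(k) and Psi(k) = sum_{l<=k} psi(l) from a kernel q[k][i][j].

    Uses psi(k) = 1{k=0} I + sum_{l=1}^{k} q(l) psi(k - l), valid because q(0) = 0.
    """
    horizon, s = len(q), len(q[0])
    psi = []
    for k in range(horizon):
        m = [[1.0 if (k == 0 and i == j) else 0.0 for j in range(s)] for i in range(s)]
        for l in range(1, k + 1):
            for i in range(s):
                for j in range(s):
                    m[i][j] += sum(q[l][i][c] * psi[k - l][c][j] for c in range(s))
        psi.append(m)
    big_psi, acc = [], [[0.0] * s for _ in range(s)]
    for m in psi:
        acc = [[acc[i][j] + m[i][j] for j in range(s)] for i in range(s)]
        big_psi.append(acc)
    return psi, big_psi


# One state with a jump at every time unit: psi(k) = 1 and Psi(k) = k + 1.
q = [[[0.0]], [[1.0]], [[0.0]], [[0.0]]]
psi, Psi = renewal_functions(q)
print([m[0][0] for m in psi], [m[0][0] for m in Psi])  # [1.0, 1.0, 1.0, 1.0] [1.0, 2.0, 3.0, 4.0]
```

The recursion follows from $q^{(n)} = q * q^{(n-1)}$: summing over n moves the convolution with q outside the sum.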


We further use the Markov renewal function to estimate the mean number of earthquakes associated with the hidden state j ∈ E, given the initial hidden state i ∈ E (Figure 3.5).

[Figure 3.5: expected number of earthquake occurrences (0 to 0.09) versus time (months, 0 to 140), one curve for each pair (i, j) ∈ {(1, 1), (1, 2), (2, 1), (2, 2)}.]

Figure 3.5. Mean earthquake occurrence number from an initial hidden state i ∈ E to a hidden state j ∈ E, $\hat{\psi}_{ij}(k, M)$. For a color version of this figure, see www.iste.co.uk/votsi/multistate.zip

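In the following, we denote by D a non-empty subset of E and by $T_D$ the first hitting time of D, i.e.

$$T_D = \inf\{\ell \in \mathbb{N} : Z_\ell \in D\}, \qquad \inf\{\emptyset\} = \infty.$$

We call the distribution of $T_D$, R(k) = P(T_D ≤ k), k ∈ N, the hitting time distribution. It is given by

$$R(k) = P(T_D \le k) = 1 - \alpha_1\,\big[\psi_{11} * (I - H_1)\big](k)\,\mathbf{1}_{s_1},$$

where the index 1 denotes the restriction of the respective vector or matrix to the set U = E \ D of states outside D, $s_1$ is the cardinality of U, $H_1$ is the diagonal matrix of the cumulative sojourn time distributions $H_i(k)$, i ∈ U, and $\psi_{11}(k) = \sum_{n=0}^{k} q^{(n)}_{11}(k)$. The hitting time distribution can be estimated by the following plug-in type estimator:

$$\hat{R}(k, M) = 1 - \hat{\alpha}_1\Big[\hat{\psi}_{11}(\cdot, M) * \big(I - \mathrm{diag}(\hat{Q}(\cdot, M)\,\mathbf{1})\big)_{11}\Big](k)\,\mathbf{1}_{s_1}.$$

The conditional probability that the first hitting time is equal to k ∈ N, given that there has been no visit to the set D until time k − 1, is denoted by $\lambda(k) = P(T_D = k \mid T_D \ge k)$ and, since $P(T_D \ge k) = 1 - R(k-1)$, is given by

$$\lambda(k) = \begin{cases} R(0), & \text{if } k = 0,\\ \dfrac{R(k) - R(k-1)}{1 - R(k-1)}, & \text{if } k \ge 1 \text{ and } R(k-1) \neq 1,\\ 0, & \text{otherwise}. \end{cases}$$

The respective empirical estimator is

$$\hat{\lambda}(k, M) = \begin{cases} \hat{R}(0, M), & \text{if } k = 0,\\ \dfrac{\hat{R}(k, M) - \hat{R}(k-1, M)}{1 - \hat{R}(k-1, M)}, & \text{if } k \ge 1 \text{ and } \hat{R}(k-1, M) \neq 1,\\ 0, & \text{otherwise}. \end{cases}$$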
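The hitting time distribution and its conditional version can be computed from any kernel (for instance the empirical one) by combining the restricted renewal recursion for $\psi_{11}$ with the sojourn time survival functions, using $P(T_D > k) = \sum_{\ell=0}^{k} \alpha_1 \psi_{11}(\ell)\,(I - H_1(k - \ell))\,\mathbf{1}_{s_1}$. A minimal sketch, assuming the list layout q[k][i][j], an initial law that puts no mass on D, and hypothetical function names; it follows the formulas above and is not the authors' code.

```python
def hitting_cdf(q, alpha, up, horizon):
    """R(k) = P(T_D <= k), k = 0..horizon, for a semi-Markov chain with kernel q[k][i][j].

    alpha -- initial law over E (assumed to give no mass to D)
    up    -- indices of the states of U = E minus D
    """
    r, n = len(up), len(q[0])
    # psi_11(k) = sum_n q_11^(n)(k), with q_11 the kernel restricted to U x U.
    psi = []
    for k in range(horizon + 1):
        m = [[1.0 if (k == 0 and a == b) else 0.0 for b in range(r)] for a in range(r)]
        for l in range(1, k + 1):
            for a in range(r):
                for b in range(r):
                    m[a][b] += sum(q[l][up[a]][up[c]] * psi[k - l][c][b] for c in range(r))
        psi.append(m)

    def big_h(i, k):  # cumulative sojourn distribution H_i(k), all destinations
        return sum(q[l][i][j] for l in range(k + 1) for j in range(n))

    cdf = []
    for k in range(horizon + 1):
        surv = sum(alpha[up[a]] * psi[l][a][b] * (1.0 - big_h(up[b], k - l))
                   for l in range(k + 1) for a in range(r) for b in range(r))
        cdf.append(1.0 - surv)
    return cdf


def hitting_hazard(cdf):
    """lambda(k) = P(T_D = k | T_D >= k), computed from R(k) = P(T_D <= k)."""
    lam = [cdf[0]]
    for k in range(1, len(cdf)):
        tail = 1.0 - cdf[k - 1]  # P(T_D >= k)
        lam.append((cdf[k] - cdf[k - 1]) / tail if tail > 0 else 0.0)
    return lam


# Hypothetical kernel: from state 0, stay (0) or enter D = {1} after one unit, each w.p. 0.5.
zero = [[0.0, 0.0], [0.0, 0.0]]
q = [zero, [[0.5, 0.5], [0.0, 0.0]], zero, zero, zero]
cdf = hitting_cdf(q, [1.0, 0.0], [0], 4)
print(cdf, hitting_hazard(cdf))
```

In this example $T_D$ is geometric, so R(k) = 1 − 0.5^k and the conditional probability is constant (0.5) for k ≥ 1, which is the memoryless behavior expected of a geometric hitting time.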

In the sequel, $P^{(ij)}_{k|\delta}$ denotes the probability that the hidden state j ∈ E is visited during the next δ time instants, given that the last visited state is i ∈ E and that more than k time instants have elapsed since the last jump. We use the term instantaneous rate of earthquake occurrences for this probability, which is expressed by

$$P^{(ij)}_{k|\delta} = \frac{Q_{ij}(k + \delta) - Q_{ij}(k)}{\overline{H}_i(k)}$$

and is estimated by

$$\hat{P}^{(ij)}_{k|\delta}(M) = \frac{\hat{Q}_{ij}(k + \delta, M) - \hat{Q}_{ij}(k, M)}{\widehat{\overline{H}}_i(k, M)}.$$
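Both the rate and its estimator only require the cumulative kernel and the sojourn time survival function. A minimal sketch under the list layout q[l][i][j] used for illustration here, with k and δ expressed in the time unit of the kernel; the kernel is treated as zero beyond its stored horizon, and the function name and example kernel are hypothetical.

```python
def instantaneous_rate(q, i, j, k, delta):
    """P^(ij)_{k|delta} = (Q_ij(k + delta) - Q_ij(k)) / (1 - H_i(k)), from a kernel q[l][i][j]."""
    def big_q(a, b, t):
        # Cumulative kernel Q_ab(t); entries beyond the stored horizon count as zero.
        return sum(q[l][a][b] for l in range(min(t, len(q) - 1) + 1))

    survival = 1.0 - sum(big_q(i, b, k) for b in range(len(q[0])))  # 1 - H_i(k)
    if survival <= 0:
        raise ValueError("no sojourn in state i lasts longer than k")
    return (big_q(i, j, k + delta) - big_q(i, j, k)) / survival


# Hypothetical kernel: from state 0 the chain moves to state 1 after 1, 2, 3 or 4
# time units with equal probability.
q = [[[0.0, 0.0], [0.0, 0.0]]] + [[[0.0, 0.25], [0.0, 0.0]] for _ in range(4)]
print(instantaneous_rate(q, 0, 1, k=1, delta=1), instantaneous_rate(q, 0, 1, k=1, delta=3))
```

Conditioning on a sojourn longer than k is what makes these rates depend on the elapsed time, unlike the memoryless rates of a Markov chain.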


The estimated instantaneous rate of earthquake occurrences is presented in Table 3.2 for each type of transition. In particular, we compute the probability that an earthquake occurs in the next δ time instants (δ = 1/2, 1, 2, 3, 4 years), given that more than six months (k = 1 semester) have elapsed since the last earthquake occurrence. Moreover, given that the last visited state is the first state and that more than k time instants (k = 1, 2, 3 years) have elapsed without an earthquake, the probabilities of an earthquake occurrence during the next semester or years are computed (Table 3.3). When the last visited state is the second state ($J_n = 2$), the corresponding rates are reported in Table 3.4.

k = 1 semester; entries are $\hat{P}^{(ij)}_{k|\delta}(M)$
δ            (11)    (12)    (21)    (22)
1 semester   0.06    0.31    0.11    0.33
1 year       0.12    0.38    0.14    0.37
2 years      0.22    0.48    0.19    0.48
3 years      0.26    0.55    0.21    0.63
4 years      0.30    0.55    0.23    0.70

Table 3.2. Estimated instantaneous rate of earthquake occurrences

Entries are $\hat{P}^{(1j)}_{k|\delta}(M)$
             k = 1 year       k = 2 years      k = 3 years
δ            (11)    (12)     (11)    (12)     (11)    (12)
1 semester   0.10    0.11     0.13    0.23     0       0
1 year       0.16    0.11     0.22    0.38     0.11    0
2 years      0.32    0.38     0.26    0.38     0.33    0.19
3 years      0.35    0.38     0.35    0.45     0.33    0.37
4 years      0.42    0.44     0.35    0.53     0.33    0.56

Table 3.3. Estimated instantaneous rate of earthquake occurrences – starting state 1


Entries are $\hat{P}^{(2j)}_{k|\delta}(M)$
             k = 1 year       k = 2 years      k = 3 years
δ            (21)    (22)     (21)    (22)     (21)    (22)
1 semester   0.04    0.07     0.01    0.10     0.06    0.18
1 year       0.11    0.20     0.02    0.39     0.06    0.35
2 years      0.16    0.46     0.04    0.58     0.12    0.70
3 years      0.18    0.60     0.05    0.77     0.12    0.88
4 years      0.20    0.73     0.05    0.87     0.12    0.88

Table 3.4. Estimated instantaneous rate of earthquake occurrences – starting state 2

3.5. Conclusion

It is widely accepted that the earthquake occurrence time on a fault undergoing tectonic loading is controlled by the stress and frictional properties of that fault and by earthquakes on other nearby faults [STE 99]. The effects of a nearby earthquake are commonly associated with the static and dynamic stress changes that it produces; however, they may also be related to processes set in motion by these stress changes, such as crustal fluid flow and plastic deformation. The aim of this chapter was to identify the stress field, since its identification can help us to understand the dynamics and nature of the earthquake generation process and may eventually support forecasting. The data correspond to the realization of an unobservable state sequence, where the states are related to different stress field levels. Here, we applied an HSMM in which the underlying chain is a semi-Markov chain and the emission times coincide with the jump times. The model under study can be used to describe earthquake occurrences and to estimate the next most likely state along with its timing.

The modeling of earthquakes through HSMMs can be extended in many ways. Adding parameters to the model can provide more insight into the nature of the earthquake generation process. More importantly, additional information about the position of causative faults and the faulting type can be incorporated and tested in the HSMM framework. The next challenge is to extend the model in space and to incorporate variables based on both slow tectonic stress accumulation and coseismic stress changes related to large earthquakes [DEN 97a]. Once the empirical estimators of the distributions are obtained, a parametric framework can be used for the HSMM; for example, specific distributions can be chosen to describe the sojourn times. Distributions that lead to closed-form solutions of the estimation problem can reduce computational complexity and lead to an efficient version of the expectation-maximization algorithm. Several other extensions of HSMMs are possible. For example, the first-order Markov dependency of the underlying SMC can be extended to second or higher order, or the underlying chain can be considered non-homogeneous or non-stationary.

4 Hitting Time Intensity

“Success consists of going from failure to failure without loss of enthusiasm” — W. Churchill

Earthquake Statistical Analysis through Multi-state Modeling, First Edition. Irene Votsi, Nikolaos Limnios, Eleftheria Papadimitriou and George Tsaklidis. © ISTE Ltd 2019. Published by ISTE Ltd and John Wiley & Sons, Inc.

4.1. Introduction

In the stochastic modeling of random systems that can experience one or more failures, many reliability indicators have been studied. When multiple failures can occur, the system under study is called repairable; if it can experience only one failure, it is called non-repairable. Different reliability indicators have been introduced for both repairable and non-repairable systems, including mean failure times, hazard rates and the availability function. For a detailed study of reliability indicators for semi-Markov chains, we refer the reader to [BAR 08, GEO 13, GEO 17, BAR 17] and references therein. In this chapter, we explicitly deal with an important reliability indicator for hidden (semi-)Markov models: the rate of occurrence of failures (ROCOF). In a continuous-time framework, the ROCOF at time t > 0 describes the
derivative of the mean number of failures up to time t. In a discrete-time framework, on the contrary, it describes the probability that a failure occurs at time $k \in \mathbb{N}^*$. The ROCOF can decrease, increase or remain constant, which means that the system under study improves, deteriorates or remains unchanged over time. Here, in the discrete-time framework, the term discrete-time intensity hitting time (DTIHT) will be used to describe the ROCOF. For first-order Markov processes with a finite state space, the ROCOF was introduced in [YEH 97]. These results were generalized to higher-order Markov processes in [DAM 15]. The ROCOF of finite semi-Markov processes was studied in [OUH 02], and the general case in [LIM 12]. Later, explicit formulas were obtained for the DTIHT of systems described by semi-Markov models [VOT 14] or hidden Markov renewal models [VOT 15]. The authors provided empirical estimators and numerical examples based on simulated and real seismological data. In this chapter, we briefly present the results obtained in the last two references.

4.2. DTIHT for semi-Markov chains

We first denote the state space of Z by $E = \{1, \ldots, s\}$ and by $U = (U_k)_{k\in\mathbb{N}}$ the backward recurrence time sequence, i.e. $U_k = k - S_{N(k)}$. Then, following [LIM 01], the stochastic process $(Z, U) = (Z_k, U_k)_{k\in\mathbb{N}}$ is a (time-)homogeneous Markov chain, also called the double Markov chain. We denote the initial law, the stationary law and the transition kernel of the double Markov chain by $\tilde{a} = (\tilde{a}(i,0);\ i \in E)$, $\tilde{\pi} = (\tilde{\pi}(i,t);\ i \in E,\ t \in \mathbb{N})$ and $\tilde{P} = \big(\tilde{P}((i,t_1),(j,t_2));\ (i,t_1),(j,t_2) \in E\times\mathbb{N}\big)$, respectively. In particular, we have $\tilde{a}(i,0) = P(Z_0 = i, U_0 = 0)$ and, following [CHR 08],

$$
\tilde{P}\big((i,t_1),(j,t_2)\big) =
\begin{cases}
q_{ij}(t_1+1)/\overline{H}_i(t_1), & \text{if } i \neq j,\ t_2 = 0,\\
\overline{H}_i(t_1+1)/\overline{H}_i(t_1), & \text{if } i = j,\ t_2 - t_1 = 1,\\
0, & \text{otherwise},
\end{cases}
$$

for every $(i,t_1),(j,t_2) \in E\times\mathbb{N}$ such that $\overline{H}_i(t_1) > 0$.

Throughout, we make the following assumptions about the MRC:
A1) the MRC (J, S) is irreducible and aperiodic;
A2) the mean sojourn time in every state is finite.

We consider that E is partitioned into functioning (up) states and unworkable (down) states, denoted by $U = \{1, \ldots, r\}$ and $D = \{r+1, \ldots, s\}$, respectively, so that $U, D \neq \emptyset$, $U, D \neq E$, $U \cap D = \emptyset$ and $U \cup D = E$. The value of the DTIHT at time $k \in \mathbb{N}^*$ is the mean number of transitions of the SMC from the up states to the down states at time k; since at most one such transition can occur at each time step, it is equal to

$$r_D(k) = P(Z_{k-1} \in U,\ Z_k \in D).$$

We first present an explicit formula for the DTIHT introduced in [VOT 15] and denote $A_{k-1} = \tilde{a}\tilde{P}^{k-1}$ for every $k \in \mathbb{N}^*$.

THEOREM 4.1.– The DTIHT at time $k \in \mathbb{N}^*$ is given by

$$r_D(k) = \sum_{i\in U}\sum_{j\in D}\sum_{\ell=0}^{k-1} A_{k-1}(i,\ell)\,\tilde{P}\big((i,\ell),(j,0)\big).$$

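PROOF.– For a fixed $k \in \mathbb{N}^*$, we have

$$
\begin{aligned}
r_D(k) &= \sum_{i\in U}\sum_{j\in D} P(Z_{k-1}=i,\ Z_k=j)\\
&= \sum_{i\in U}\sum_{j\in D}\sum_{\ell=0}^{k-1} P(Z_{k-1}=i,\ Z_k=j,\ U_{k-1}=\ell,\ U_k=0)\\
&= \sum_{i\in U}\sum_{j\in D}\sum_{\ell=0}^{k-1} A_{k-1}(i,\ell)\,\tilde{P}\big((i,\ell),(j,0)\big).
\end{aligned}
$$

The second equality holds because $U_{k-1} \le k-1$ and because a transition from $i \in U$ to $j \in D$ at time k is a jump, so that $U_k = 0$.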
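Theorem 4.1 can be evaluated numerically by propagating the law $A_{n} = \tilde a\tilde P^{n}$ of the double Markov chain and accumulating the terms $A_{k-1}(i,\ell)\tilde P((i,\ell),(j,0))$ for $i\in U$, $j\in D$. The sketch below is a minimal NumPy illustration, not the authors' implementation; the array layout `q[i, j, k]` $= q_{ij}(k)$, the 0-based state labels, the assumption of a jump at time 0 ($U_0 = 0$), the absence of self-transitions in the EMC and the function names are our own assumptions.

```python
import numpy as np


def survival_from_kernel(q):
    """Hbar[i, k] = 1 - sum_j sum_{l <= k} q_ij(l) for a kernel q[i, j, k] = q_ij(k)."""
    return 1.0 - np.cumsum(q.sum(axis=1), axis=1)


def dtiht_smc(alpha, q, up, down, k_max):
    """r_D(k), k = 1..k_max, via Theorem 4.1 on the double Markov chain (Z, U).

    alpha    : initial law of Z_0 (a jump is assumed at time 0, so U_0 = 0)
    q        : semi-Markov kernel of shape (s, s, K + 1), K >= k_max, q[:, :, 0] = 0
    up, down : disjoint sets of 0-based state labels
    """
    if q.shape[2] <= k_max:
        raise ValueError("the kernel must be known up to time k_max")
    s = len(alpha)
    hbar = survival_from_kernel(q)
    dist = np.zeros((s, k_max))  # dist[i, u] = A_n(i, u) = P(Z_n = i, U_n = u)
    dist[:, 0] = alpha
    rates = np.zeros(k_max)
    for k in range(1, k_max + 1):  # dist currently holds A_{k-1}
        new = np.zeros_like(dist)
        for i in range(s):
            for u in range(k):  # U_{k-1} <= k - 1
                mass = dist[i, u]
                if mass == 0.0 or hbar[i, u] <= 0.0:
                    continue
                for j in range(s):
                    if j == i:
                        continue
                    p_jump = q[i, j, u + 1] / hbar[i, u]  # P~((i, u), (j, 0))
                    new[j, 0] += mass * p_jump
                    if i in up and j in down:
                        rates[k - 1] += mass * p_jump
                if u + 1 < k_max:  # no jump: P~((i, u), (i, u + 1))
                    new[i, u + 1] += mass * hbar[i, u + 1] / hbar[i, u]
        dist = new
    return rates
```

As a sanity check, when all sojourn times are geometric the SMC is an ordinary Markov chain with some transition matrix Q, and the returned values should coincide with $\sum_{i\in U}(\alpha Q^{k-1})_i \sum_{j\in D} Q_{ij}$.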

4.2.1. Statistical estimation of the DTIHT

Let us consider a trajectory of the MRC (J, S), censored at a fixed arbitrary time $M \in \mathbb{N}$:

$$\mathcal{H}(M) = \big(J_0, S_1, \ldots, J_{N(M)-1}, S_{N(M)}, J_{N(M)}, U_M\big).$$

We further denote the state space of the MRC by L ($|L| = l_0$) and set $T_M = \{1, \ldots, M\}$. Then, we introduce the following counting processes of transitions of the MRC in the time interval [1, M]:

– $N_{(i,t_1)}(M) = \sum_{n=1}^{N(M)} \mathbf{1}_{\{S_{n-1}=t_1,\ J_{n-1}=i\}}$;
– $N_{(i,t_1)(j,t_2)}(M) = \sum_{n=1}^{N(M)} \mathbf{1}_{\{S_{n-1}=t_1,\ J_{n-1}=i,\ S_n=t_2,\ J_n=j\}}$;

where $(i,t_1), (j,t_2) \in L$. We further define the counting process

– $N_i(M) = \sum_{n=1}^{N(M)} \mathbf{1}_{\{J_{n-1}=i\}}$.

At this point, we recall that the semi-Markov kernel can be estimated by [BAR 08]

$$\hat{q}_{ij}(k,M) = \frac{1}{N_i(M)} \sum_{n=1}^{N(M)} \mathbf{1}_{\{J_n=j,\ X_n=k,\ J_{n-1}=i\}},$$

for any $i, j \in E$, $k \in \mathbb{N}$. Then, the empirical estimator of the survival function is

$$\widehat{\overline{H}}_i(k,M) = 1 - \sum_{j\in E}\sum_{\ell=0}^{k} \hat{q}_{ij}(\ell,M), \qquad k \in \mathbb{N}.$$

Using the previous empirical estimators, the plug-in estimator of the transition kernel of the double Markov chain has elements given by

$$
\hat{P}_M\big((i,t_1),(j,t_2)\big) =
\begin{cases}
\widehat{\overline{H}}_i(t_1+1,M)/\widehat{\overline{H}}_i(t_1,M), & \text{if } t_2 - t_1 = 1,\ i = j,\\
\hat{q}_{ij}(t_1+1,M)/\widehat{\overline{H}}_i(t_1,M), & \text{if } t_2 = 0,\ i \neq j,\\
0, & \text{otherwise}.
\end{cases}
$$
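The empirical kernel $\hat q_{ij}(k,M)$ and survival function $\widehat{\overline H}_i(k,M)$ are simple normalized counts. The following sketch assumes the observed trajectory is given as the list of visited states $J_0, \ldots, J_{N(M)}$ and the completed sojourn times $X_1, \ldots, X_{N(M)}$ (the censored last sojourn $U_M$ is ignored); the function name, the 0-based labels and the array layout are illustrative assumptions. Its outputs can be substituted for $q$ and $\overline H$ in the formula for $\hat P_M$.

```python
import numpy as np


def empirical_kernel(jumps, sojourns, s, K):
    """Empirical semi-Markov kernel and survival function.

    jumps    : visited states J_0, ..., J_N (0-based labels)
    sojourns : completed sojourn times X_1, ..., X_N (X_n = S_n - S_{n-1})
    Returns q_hat[i, j, k] = N_ij(k) / N_i and hbar_hat[i, k] for 0 <= k <= K.
    """
    if len(jumps) != len(sojourns) + 1:
        raise ValueError("need exactly one more state than sojourn times")
    counts = np.zeros((s, s, K + 1))
    n_departures = np.zeros(s)  # N_i(M): number of jumps out of state i
    for n, x in enumerate(sojourns, start=1):
        i, j = jumps[n - 1], jumps[n]
        n_departures[i] += 1
        if x <= K:  # longer sojourns still count in N_i(M)
            counts[i, j, x] += 1
    denom = n_departures[:, None, None]
    q_hat = np.divide(counts, denom, out=np.zeros_like(counts), where=denom > 0)
    # hbar_hat[i, k] = 1 - sum_j sum_{l <= k} q_hat[i, j, l]
    hbar_hat = 1.0 - np.cumsum(q_hat.sum(axis=1), axis=1)
    return q_hat, hbar_hat
```

States that are never left keep a zero kernel and a survival function equal to 1, so the plug-in ratios remain well defined for them.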

Since the DTIHT is a functional of the characteristics of the double Markov chain, i.e. its initial law and transition kernel, we directly obtain the following empirical plug-in estimator:

$$\hat{r}_D(k,M) = \sum_{i\in U}\sum_{j\in D}\sum_{\ell=0}^{k-1} \hat{A}_{k-1;M}(i,\ell)\,\hat{P}_M\big((i,\ell),(j,0)\big),$$

for every $k \in \mathbb{N}^*$, where $\hat{A}_{k-1;M}(i,\ell)$ is the element of the vector $\tilde{a}\hat{P}_M^{\,k-1}$ in position $(i,\ell)$. The following proposition states that $\hat{r}_D(k,M)$ is a strongly consistent estimator.

PROPOSITION 4.1.– The empirical estimator of the DTIHT at time $k \in \mathbb{N}^*$ is strongly consistent, i.e.

$$\lim_{M\to\infty} \hat{r}_D(k,M) = r_D(k)$$

with probability 1.

PROOF.– We have that $\lim_{M\to\infty} \hat{P}_M = \tilde{P}$ with probability 1; since $\hat{r}_D(k,M)$ is a continuous function of a finite number of entries of $\hat{P}_M$, we directly obtain the desired result.

THEOREM 4.2.– Let the double Markov chain (Z, U) be ergodic and homogeneous. The random vector $\sqrt{M}\big(\hat{P}_M((i,t_1),(j,t_2)) - \tilde{P}((i,t_1),(j,t_2))\big)$ is asymptotically normal, i.e.

$$\sqrt{M}\Big(\hat{P}_M\big((i,t_1),(j,t_2)\big) - \tilde{P}\big((i,t_1),(j,t_2)\big)\Big) \xrightarrow[M\to\infty]{\mathcal{L}} \mathcal{N}(0,\Gamma),$$

where Γ is the block-diagonal matrix of dimension $d \times d$ ($d = s^2(M+1)^2$) defined by

$$
\Gamma = \operatorname{diag}\left(\frac{\Psi(1,0)}{\tilde{\pi}(1,0)},\ \ldots,\ \frac{\Psi(i,t_1)}{\tilde{\pi}(i,t_1)},\ \ldots,\ \frac{\Psi(s,M)}{\tilde{\pi}(s,M)}\right),
$$

with

$$
\Psi(i,t_1) = \Big(\delta_{(j,t_2)(r,t_4)}\,\tilde{P}\big((i,t_1),(j,t_2)\big) - \tilde{P}\big((i,t_1),(j,t_2)\big)\,\tilde{P}\big((i,t_1),(r,t_4)\big)\Big)_{(j,t_2),(r,t_4)},
$$

for every $(i,t_1), (j,t_2), (r,t_4) \in E \times T_M$.

THEOREM 4.3.– For any $k \in \mathbb{N}^*$, the random vector $\sqrt{M}\big(\hat{r}_D(k,M) - r_D(k)\big)$ is asymptotically normal, i.e.

$$\sqrt{M}\big(\hat{r}_D(k,M) - r_D(k)\big) \xrightarrow[M\to\infty]{\mathcal{L}} \mathcal{N}\big(0,\ \Phi'\,\Gamma\,\Phi'^{\top}\big),$$

where $\Phi : [0,1]^d \to \mathbb{R}_+$ is the function

$$
\Phi\Big(\tilde{P}\big((i',m'),(j',t')\big);\ (i',m'),(j',t') \in E\times T_M\Big) = \sum_{i\in U}\sum_{j\in D}\sum_{\ell=0}^{k-1} A_{k-1}(i,\ell)\,\tilde{P}\big((i,\ell),(j,0)\big)
$$

and

$$
\Phi' = \left(\frac{\partial\Phi}{\partial\tilde{P}\big((i',m'),(j',t')\big)};\ (i',m'),(j',t') \in E\times T_M\right)
$$

is the d-dimensional row vector of the derivatives of $r_D(k)$ with respect to $\tilde{P}((i',m'),(j',t'))$.

PROOF.– First, we note that

$$
\sqrt{M}\big(\hat{r}_D(k,M) - r_D(k)\big) = \sqrt{M}\Big(\Phi\big(\hat{P}_M((i',m'),(j',t'))\big) - \Phi\big(\tilde{P}((i',m'),(j',t'))\big)\Big).
$$

Then, since

$$
\sqrt{M}\Big(\hat{P}_M\big((i',m'),(j',t')\big) - \tilde{P}\big((i',m'),(j',t')\big)\Big) \xrightarrow[M\to\infty]{\mathcal{L}} \mathcal{N}(0,\Gamma),
$$

the delta method [VAN 98] directly leads to the desired result.

EXAMPLE.– Let us now illustrate the strong consistency of the empirical estimator of the DTIHT by means of Monte Carlo simulations. We consider an SMC with state space $E = \{1, 2, 3\}$, partitioned into $U = \{1, 2\}$ and $D = \{3\}$. The transition matrix P of the EMC and the initial law α are given by

$$
\mathbf{P} = \begin{pmatrix} 0 & 0.5 & 0.5\\ 0.3 & 0 & 0.7\\ 0.6 & 0.4 & 0 \end{pmatrix}
\quad\text{and}\quad
\alpha = \begin{pmatrix} 0.5 & 0.5 & 0 \end{pmatrix}.
$$

We assume that the sojourn times follow the discrete-time Weibull distribution

$$
f(k) = \begin{cases} q^{(k-1)^b} - q^{k^b}, & \text{if } k \ge 1,\\ 0, & \text{if } k = 0, \end{cases}
$$

where (q, b) = (0.1, 0.9) for the transitions 1 → 2 and 2 → 1, (q, b) = (0.6, 0.9) for the transitions 1 → 3 and 3 → 1, (q, b) = (0.1, 2) for the transition 2 → 3 and (q, b) = (0.6, 0.9) for the transition 3 → 2. We are particularly interested in the discrete-time Weibull distribution since it is often encountered in reliability. We generate a trajectory of the SMC in the time interval [0, M], where M = 50000, and compare the estimators obtained from the “full” trajectory with those obtained from a part of it of length M = 10000. In Figure 4.1, the theoretical and estimated values of the DTIHT are shown for the first 32 values of k.
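The simulation side of this example can be sketched as follows: the code evaluates the discrete Weibull probability mass function, samples from it by inverting its survival function $P(X > k) = q^{k^b}$, and generates a path $Z_0, \ldots, Z_M$ of the SMC. States are relabelled 0, 1, 2 (so U = {0, 1} and D = {2}); the inversion sampler, the seeds and all names are our own choices and are not taken from the book.

```python
import math
import numpy as np

# Example parameters with states relabelled 0, 1, 2 (U = {0, 1}, D = {2}).
P_EMC = np.array([[0.0, 0.5, 0.5],
                  [0.3, 0.0, 0.7],
                  [0.6, 0.4, 0.0]])
ALPHA = np.array([0.5, 0.5, 0.0])
WEIBULL = {(0, 1): (0.1, 0.9), (1, 0): (0.1, 0.9),
           (0, 2): (0.6, 0.9), (2, 0): (0.6, 0.9),
           (1, 2): (0.1, 2.0), (2, 1): (0.6, 0.9)}


def dweibull_pmf(k, q, b):
    """f(k) = q^{(k-1)^b} - q^{k^b} for k >= 1, and f(0) = 0."""
    if k < 1:
        return 0.0
    return q ** ((k - 1) ** b) - q ** (k ** b)


def dweibull_sample(rng, q, b):
    """Inversion: P(X > k) = q^{k^b}, hence X = max(1, ceil((ln U / ln q)^(1/b)))."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return max(1, math.ceil((math.log(u) / math.log(q)) ** (1.0 / b)))


def simulate_smc(P, alpha, params, M, rng):
    """Return Z_0, ..., Z_M, where Z_k = J_n for S_n <= k < S_{n+1}."""
    s = len(alpha)
    z = np.empty(M + 1, dtype=int)
    state = rng.choice(s, p=alpha)
    t = 0  # current jump time S_n
    while t <= M:
        nxt = rng.choice(s, p=P[state])
        x = dweibull_sample(rng, *params[(state, nxt)])  # sojourn before the jump
        z[t:min(t + x, M + 1)] = state
        t += x
        state = nxt
    return z


z = simulate_smc(P_EMC, ALPHA, WEIBULL, 10_000, np.random.default_rng(0))
```

The resulting path can then be cut into the visited states and sojourn times needed by the counting estimators of Section 4.2.1.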

[Figure 4.1: DTIHT (vertical axis, 0.00 to 0.06) versus time (horizontal axis, 0 to 30); curves: true value, M = 50000, M = 10000]

Figure 4.1. DTIHT values versus time. For a color version of this figure, see www.iste.co.uk/votsi/multistate.zip

REMARK 4.1.– As M increases, the DTIHT estimator gets closer to the true value, which illustrates its strong consistency.

REMARK 4.2.– We observe that the stationary DTIHT, i.e. $\lim_{k\to\infty} r_D(k)$, is equal to 0.058.

4.3. DTIHT for hidden Markov renewal chains

We consider a hidden Markov renewal model, where the observation space A is divided into two disjoint, non-empty subsets U and D such that $U, D \subset A$ ($U, D \neq A$). The subset $U = \{1, \ldots, r\}$ represents the functioning (or up) states, whereas the subset $D = \{r+1, \ldots, s\}$ represents the unworkable (or down) states. We denote the state space of the HMRC $(J, S, Y) = (J_n, S_n, Y_n)_{n\in\mathbb{N}}$ by $E^*$ ($|E^*| = d_0$). We further denote by $\tilde{\pi} = (\tilde{\pi}(i,s);\ i \in E,\ s \in \mathbb{N})$, where $\tilde{\pi}(i,s) = \sum_{u\in E} \nu_u q_{ui}(s)$, the stationary distribution of the MRC. In an HMRM context, we define the DTIHT as the mean number of transitions of the observation process to the set D at time $k \in \mathbb{N}^*$, i.e.

$$r'_D(k) = \mathbb{E}\big[N'_D(k)\big] - \mathbb{E}\big[N'_D(k-1)\big], \qquad \text{where} \quad N'_D(k) = \sum_{l=1}^{k} \mathbf{1}_{\{Y_{l-1}=i,\ Y_l=j\}}.$$

THEOREM 4.4.– The DTIHT at time $k \in \mathbb{N}^*$ is given by

$$r'_D(k) = \sum_{i\in U}\sum_{j\in D} r'_{ij}(k),$$

where

$$
r'_{ij}(k) = \sum_{(\ell,y)\in E\times A}\ \sum_{l=1}^{k}\ \sum_{s_0,s_1\in E}\ \sum_{k_0=0}^{k} a(\ell,y)\,R_\ell(y)\,R_{s_0}(i)\,R_{s_1}(j)\,q^{(l-1)}_{\ell s_0}(k_0)\,q_{s_0 s_1}(k-k_0).
$$

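PROOF.– For fixed $k \in \mathbb{N}^*$, we define

$$r'_{ij}(k) = \sum_{(\ell,y)\in E\times A} r'_{(\ell,y)}(k), \qquad [4.1]$$

where

$$
r'_{(\ell,y)}(k) = \mathbb{E}\big[N'_D(k) - N'_D(k-1) \mid J_0=\ell,\ Y_0=y\big] = \sum_{i\in U}\sum_{j\in D}\sum_{l=1}^{k} P(Y_{l-1}=i,\ Y_l=j,\ S_l=k \mid J_0=\ell,\ Y_0=y).
$$

Moreover,

$$
P(Y_{l-1}=i,\ Y_l=j,\ S_l=k \mid J_0=\ell,\ Y_0=y) = \sum_{s_0,s_1\in E}\sum_{k_0=0}^{k} R_{s_0}(i)\,R_{s_1}(j)\,q^{(l-1)}_{\ell s_0}(k_0)\,q_{s_0 s_1}(k-k_0). \qquad [4.2]
$$

Consequently, from equations [4.1] and [4.2], we obtain

$$
r'_{ij}(k) = \sum_{(\ell,y)\in E\times A}\ \sum_{l=1}^{k}\ \sum_{s_0,s_1\in E}\ \sum_{k_0=0}^{k} a(\ell,y)\,R_\ell(y)\,R_{s_0}(i)\,R_{s_1}(j)\,q^{(l-1)}_{\ell s_0}(k_0)\,q_{s_0 s_1}(k-k_0).
$$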
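Theorem 4.4 involves the $(l-1)$-fold convolution $q^{(l-1)}$ of the semi-Markov kernel. In the standard Markov renewal notation, $q^{(n)}_{ij}(k) = P(J_n = j, S_n = k \mid J_0 = i)$, with $q^{(0)}_{ij}(k) = 1$ if $i = j$ and $k = 0$, and 0 otherwise. The sketch below computes these discrete convolutions on a truncated time horizon; the array layout `q[i, j, k]` and the function names are assumptions made for illustration.

```python
import numpy as np


def convolve_kernels(a, b):
    """(a * b)_ij(k) = sum_l sum_{k0=0}^{k} a_il(k0) b_lj(k - k0).

    a, b: arrays of shape (s, s, K + 1) indexed as [i, j, k]; every entry with
    k <= K is exact, because it only involves values of a and b up to time k.
    """
    out = np.zeros_like(a)
    for k in range(a.shape[2]):
        for k0 in range(k + 1):
            out[:, :, k] += a[:, :, k0] @ b[:, :, k - k0]
    return out


def convolution_power(q, n):
    """n-fold convolution q^(n); q^(0)_ij(k) = 1 if i == j and k == 0, else 0."""
    s = q.shape[0]
    result = np.zeros_like(q)
    result[:, :, 0] = np.eye(s)
    for _ in range(n):
        result = convolve_kernels(result, q)
    return result
```

For instance, if $q_{01}(1) = q_{10}(2) = 1$, the chain is back in its starting state after two jumps at time 3, so $q^{(2)}_{00}(3) = q^{(2)}_{11}(3) = 1$.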


4.3.1. Statistical estimation of the DTIHT

First, we consider a trajectory of the HMRC (J, S, Y) in [0, M] defined by

$$\mathcal{H}(M) = \big(J_0, S_1, Y_0, \ldots, J_{N(M)}, S_{N(M)}, Y_{N(M)}\big).$$

When observations are only recorded at jump times, the maximum likelihood estimators (MLEs) of the parameters can be obtained by adapting the EM algorithm [BAR 08]. In the following, we denote the MLEs of $q_{ij}(k)$, $q^{(\ell)}_{ij}(k)$ and $R_i(a)$ by $\hat{q}_{ij}(k,M)$, $\hat{q}^{(\ell)}_{ij}(k,M)$ and $\hat{R}_i(a,M)$, respectively, for any $i, j \in E$, $k, \ell \in \mathbb{N}$, $a \in A$.

PROPOSITION 4.2.– The estimator of the DTIHT at time $k \in \mathbb{N}^*$ is strongly consistent, i.e.

$$\lim_{M\to\infty} \hat{r}'_D(k,M) = r'_D(k)$$

with probability 1.

PROOF.– Following [BAR 08], the estimators $\hat{q}_{ij}(k,M)$, $\hat{q}^{(n)}_{ij}(k,M)$ and $\hat{R}_i(a,M)$ are strongly consistent. Moreover, since we deal with a finite number of terms, the result is straightforward.

To study the asymptotic normality of the proposed estimators, we need the following results.

THEOREM 4.5.– Suppose that $(J_n, S_n)_{n\in\mathbb{N}}$ satisfies assumptions A1–A2. The random vector $\sqrt{M}\big(\hat{q}_{ij}(t_2-t_1,M)\hat{R}_j(m,M) - q_{ij}(t_2-t_1)R_j(m)\big)$ is asymptotically normal, i.e.

$$
\sqrt{M}\big(\hat{q}_{ij}(t_2-t_1,M)\,\hat{R}_j(m,M) - q_{ij}(t_2-t_1)\,R_j(m)\big) \xrightarrow[M\to\infty]{\mathcal{L}} \mathcal{N}(0,\Gamma),
$$

where Γ is the block-diagonal matrix of dimension $(l_0 d_0) \times (l_0 d_0)$ defined by

$$
\Gamma = \operatorname{diag}\left(\frac{\Psi(1,0)}{\tilde{\pi}(1,0)},\ \ldots,\ \frac{\Psi(i,t_1)}{\tilde{\pi}(i,t_1)},\ \ldots,\ \frac{\Psi(s,S_{N(M)})}{\tilde{\pi}(s,S_{N(M)})}\right),
$$

where

$$
\Psi(i,t_1) = \Big(\delta_{(j,t_2,m)(r,t_4,q)}\,q_{ij}(t_2-t_1)R_j(m) - q_{ij}(t_2-t_1)R_j(m)\,q_{ir}(t_4-t_1)R_r(q)\Big),
$$

for every $(i,t_1) \in L$ and $(j,t_2,m), (r,t_4,q) \in E^*$.

THEOREM 4.6.– For any $k \in \mathbb{N}^*$, the random vector $\sqrt{M}\big(\hat{r}'_D(k,M) - r'_D(k)\big)$ is asymptotically normal, i.e.

$$\sqrt{M}\big(\hat{r}'_D(k,M) - r'_D(k)\big) \xrightarrow[M\to\infty]{\mathcal{L}} \mathcal{N}\big(0,\ \Phi'\,\Gamma\,\Phi'^{\top}\big),$$

where $\Phi : \mathbb{R}^{l_0 d_0} \to \mathbb{R}_+$ is the function

$$
\Phi\Big(R_{j'}(m'),\ q_{i'j'}(k'-k'_0);\ (i',k'_0)\in L,\ (i',j',k')\in E^*\Big) = \sum_{i\in U}\sum_{j\in D}\ \sum_{(\ell,y)\in E\times A}\ \sum_{l=1}^{k}\ \sum_{s_0,s_1\in E}\ \sum_{k_0=0}^{k} a(\ell,y)\,R_\ell(y)\,R_{s_0}(i)\,R_{s_1}(j)\,q^{(l-1)}_{\ell s_0}(k_0)\,q_{s_0 s_1}(k-k_0),
$$

and

$$
\Phi' = \left(\frac{\partial\Phi}{\partial q_{i'j'}(k'-k'_0)},\ \frac{\partial\Phi}{\partial R_{j'}(m')};\ (i',k'_0)\in L,\ (i',j',k')\in E^*\right)
$$

is the $(l_0 d_0)$-dimensional row vector of the derivatives of $r'_D(k)$ with respect to $q_{i'j'}(k'-k'_0)$ and $R_{j'}(m')$.

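PROOF.– First, we note that

$$
\begin{aligned}
\sqrt{M}\big(\hat{r}'_D(k,M) - r'_D(k)\big) = \sqrt{M}\sum_{i\in U}\sum_{j\in D}\ \sum_{(\ell,y)\in E\times A}\ \sum_{l=1}^{k}\ \sum_{s_0,s_1\in E}\ \sum_{k_0=0}^{k} \Big(&a(\ell,y)\,\hat{R}_\ell(y,M)\,\hat{R}_{s_0}(i,M)\,\hat{R}_{s_1}(j,M)\,\hat{q}^{(l-1)}_{\ell s_0}(k_0,M)\,\hat{q}_{s_0 s_1}(k-k_0,M)\\
&- a(\ell,y)\,R_\ell(y)\,R_{s_0}(i)\,R_{s_1}(j)\,q^{(l-1)}_{\ell s_0}(k_0)\,q_{s_0 s_1}(k-k_0)\Big).
\end{aligned}
$$

Then, since

$$
\sqrt{M}\big(\hat{q}_{i'j'}(k'-k'_0,M)\,\hat{R}_{j'}(m',M) - q_{i'j'}(k'-k'_0)\,R_{j'}(m')\big) \xrightarrow[M\to\infty]{\mathcal{L}} \mathcal{N}(0,\Gamma),
$$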

the delta method [VAN 98] directly leads to the desired result.

4.4. Conclusion

Failure rates are important indicators in the study of random systems modeled by stochastic processes, since they can signal the improvement or degradation of these systems and can therefore be used for their operational management. In mechanical engineering, the ultimate goal of reliability studies is to prevent failures. In the multi-state modeling of biological data, by contrast, survival analysis is based on “time to failure” methods. In financial and insurance studies, failures are explicitly linked with the ruin time distribution. This chapter focused on the study of the DTIHT in a multi-state context. Further research includes the study of the DTIHT for hidden semi-Markov models and for models defined on continuous state spaces. The different hypotheses
taken into account can be relaxed, and the associated reliability indicator can be investigated. The obtained results can be extensively applied in many scientific fields including seismology, biology and finance. The asymptotic results have been obtained in a large-scale context. However, the study of estimators in a high-frequency framework is a topic of special interest.

5 Models Comparison

“There is no comparison between that which is lost by not succeeding and that which is lost by not trying” — F. Bacon

5.1. Introduction

Markov chains are the simplest stochastic models that can be used to describe time-varying, random phenomena. Despite their simple structure, HMMs are rich enough to describe many real-life systems. However, there are cases where the sojourn times are not necessarily geometrically distributed. In these cases, Markov chains and Markov renewal chains can be generalized by combining renewal processes and Markov chains. Our aim is to compare HMMs and HMRMs in a Markov context and in a Markov renewal context, and to determine whether they differ in terms of their estimated transition probability matrices. Following the applications of HMMs and HMRMs, it is of special interest to determine the context in which semi-Markov models should be preferred over Markov models. In addition, by incorporating the “hidden” feature into our models, we aim to explore the framework within which the application of HMMs is
satisfactory or the application of more complex HMRMs can be justified. Given that the computational burden is much heavier for HMRMs than for HMMs, we will investigate whether it is worth applying them in a specific Markov renewal environment or whether we should make a compromise by choosing the simpler HMMs. In a Markov framework, both HMMs and HMRMs are expected to be well adapted, whereas, unlike HMRMs, HMMs are not expected to describe suitably the observations generated in a Markov renewal framework. Our main objective is to check these expectations by means of explicit examples. For our purposes, we need some definitions. First, we recall that $\mathcal{M}_E$ denotes the set of square matrices on a given set E. We further define the total variation distance through the norm $\|L\|_{1,\infty} = \sup_{i\in E}\sum_{j\in E}|L_{ij}|$ on $\mathcal{M}_E$. Moreover, for every $L_1, L_2 \in \mathcal{M}_E$, we have $\|L_1 L_2\|_{1,\infty} \le \|L_1\|_{1,\infty}\,\|L_2\|_{1,\infty}$.

5.2. Markov framework

To obtain a trajectory of a hidden Markov model for a fixed number of steps M, denoted by $\mathcal{H}_M = \{Z_0, Y_0, \ldots, Z_M, Y_M\}$, we use Monte Carlo simulations. The generation is similar to that of a sample path of a Markov chain; however, an additional step is necessary: once a hidden state is generated, the corresponding observation should be drawn. To generate a state visited by the underlying Markov chain, the transition probability matrix P and the initial law α should be specified. As a next step, in order to generate the corresponding observation, we
additionally need to specify the emission probability matrix R. In other words, to generate a trajectory of an HMM, the parameter set and the trajectory's length M, i.e. the number of steps, should be determined. Let us now briefly describe the Monte Carlo algorithm that we use for our purposes (Algorithm 1).

Initialization: set k = 0, generate $Z_0$ by means of the initial law α and generate $Y_0 \sim R_{Z_0}(\cdot)$;
Iteration: generate $Z \sim \mathbf{P}(Z_k, \cdot)$ and set $Z_{k+1} = Z(\omega)$; generate $Y \sim R_{Z_{k+1}}(\cdot)$ and set $Y_{k+1} = Y(\omega)$;
if k + 1 = M, then end; else set k = k + 1 and repeat the iteration.

Algorithm 1: Monte Carlo Algorithm 1

The output of the algorithm, $\mathcal{H}_M$, includes the following:
1) the successive visited states, $(Z_0, \ldots, Z_M)$;
2) the successive observations, $(Y_0, \ldots, Y_M)$.

Let us now present a numerical example. We first consider that the underlying Markov chain can visit two states, E = {1, 2}. The transition kernel P of the Markov chain and the initial law α are respectively given by

$$
\mathbf{P} = \begin{pmatrix} 0.379 & 0.621\\ 0.209 & 0.791 \end{pmatrix}
\quad\text{and}\quad
\alpha = \begin{pmatrix} 1 & 0 \end{pmatrix}.
$$

Moreover, the emission probability matrix R is given by

$$
\mathbf{R} = \begin{pmatrix} 0.936 & 0.003 & 0.061\\ 0.163 & 0.691 & 0.146 \end{pmatrix}.
$$

In the following, we consider these values as the “true” (or “theoretical”) values of the parameters.
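Algorithm 1 translates almost line by line into Python. The sketch below is illustrative only: it relabels hidden states and observations from 0, samples each discrete law by inverting its cumulative distribution function, and uses the parameters of the numerical example above; the helper names and seeds are hypothetical.

```python
import numpy as np

# Parameters of the numerical example (states and observations relabelled from 0).
P_TRUE = np.array([[0.379, 0.621],
                   [0.209, 0.791]])
ALPHA = np.array([1.0, 0.0])
R = np.array([[0.936, 0.003, 0.061],
              [0.163, 0.691, 0.146]])


def draw(rng, probs):
    """Sample an index from a probability vector by inverting its CDF."""
    cdf = np.cumsum(probs)
    idx = int(np.searchsorted(cdf, rng.random(), side="right"))
    return min(idx, len(probs) - 1)  # guard against round-off in cdf[-1]


def simulate_hmm(P, alpha, R, M, rng):
    """Algorithm 1: hidden states Z_0..Z_M and observations Y_0..Y_M."""
    Z = np.empty(M + 1, dtype=int)
    Y = np.empty(M + 1, dtype=int)
    Z[0] = draw(rng, alpha)
    Y[0] = draw(rng, R[Z[0]])
    for k in range(M):
        Z[k + 1] = draw(rng, P[Z[k]])
        Y[k + 1] = draw(rng, R[Z[k + 1]])  # emission uses the new hidden state
    return Z, Y


Z, Y = simulate_hmm(P_TRUE, ALPHA, R, 1000, np.random.default_rng(2019))
```

Only the observation sequence Y would be passed to an HMM fitting routine such as the Baum–Welch algorithm; the fitting step is not shown here.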

5.2.1. HMM case

First, we generate two sample paths of the hidden Markov chain for a fixed number of steps, M = 1000 and M = 2000, and fit an HMM to both observation sequences. The parameters are estimated by means of the Baum–Welch algorithm and compared with the corresponding “true” values. The algorithm is considered to have converged when the log-likelihood stabilizes. For the shorter sample path, $\mathcal{H}_{1000}$, the algorithm converges (with an accuracy of 3 decimals) in eight iterations, and the estimated transition probability matrix is

$$\hat{\mathbf{P}}_1 = \begin{pmatrix} 0.445 & 0.555\\ 0.270 & 0.730 \end{pmatrix}.$$

In order to quantify the discrepancy between the “true” and the estimated transition probability matrices, we use the total variation distance. For the shorter trajectory, we obtain $\|\hat{\mathbf{P}}_1 - \mathbf{P}\|_{1,\infty} = 0.066$. For the longer trajectory, M = 2000, convergence is achieved in six iterations, and the estimated transition probability matrix is

$$\hat{\mathbf{P}}_2 = \begin{pmatrix} 0.360 & 0.640\\ 0.254 & 0.746 \end{pmatrix},$$

with $\|\hat{\mathbf{P}}_2 - \mathbf{P}\|_{1,\infty} = 0.045$.

5.2.2. HMRM case

We now fit an HMRM to the same trajectories and compare the transition probability matrices via the total variation distance. For the first trajectory (M = 1000), the estimated transition probability matrix is

$$\hat{\mathbf{P}}_1 = \begin{pmatrix} 0.429 & 0.571\\ 0.235 & 0.765 \end{pmatrix},$$

whereas for the second trajectory (M = 2000), it is

$$\hat{\mathbf{P}}_2 = \begin{pmatrix} 0.348 & 0.652\\ 0.189 & 0.811 \end{pmatrix}.$$

For the first case, the total variation distance is equal to $\|\hat{\mathbf{P}}_1 - \mathbf{P}\|_{1,\infty} = 0.050$, whereas for the second case it is $\|\hat{\mathbf{P}}_2 - \mathbf{P}\|_{1,\infty} = 0.031$.

REMARK.– In a Markov context, both models are well adapted, since both lead to small values of the total variation distance (< 0.1).

5.3. Markov renewal framework

Let us now turn to a more general context: the Markov renewal context. Our objective is to compare an HMM with an HMRM in this context and to quantify the discrepancies between the “true” and the estimated values of the parameters. In order to allow for arbitrarily distributed sojourn times, we relax the Markov hypothesis on the underlying chain (HMM) and keep it only for the embedded Markov chain (HMRM). In other words, in the Markov renewal context, the Markov property holds for the chain $(J_n)_{n\in\mathbb{N}}$ instead of the underlying chain $(Z_k)_{k\in\mathbb{N}}$. First, we generate sample paths of an HMRM in a fixed time interval [0, M], described as $\mathcal{H}_M = \{J_0, S_0, Y_0, \ldots, J_{N(M)}, S_{N(M)}, Y_{N(M)}\}$. For this purpose, we briefly present a Monte Carlo algorithm (Algorithm 2) that generates such sample paths. The algorithm is an extension of the algorithm used to generate trajectories of a semi-Markov
chain. In particular, the algorithm requires an additional step that generates the observations at the jump times $S_n$. The parameters of the model have to be specified and the length of the trajectory has to be fixed.

Initialization: set n = 0, $S_0 = 0$, sample $J_0$ from the initial distribution α and generate $Y_0 \sim R_{J_0}(\cdot)$;
Iteration: while $S_n \le M$ do
  generate $J \sim \mathbf{P}_{J_n,\cdot}$ and set $J_{n+1} = J(\omega)$;
  generate $X \sim f_{J_n J_{n+1}}(\cdot)$;
  generate $Y \sim R_{J_{n+1}}(\cdot)$ and set $Y_{n+1} = Y(\omega)$;
  set $S_{n+1} = S_n + X$ and n = n + 1;
end

Algorithm 2: Monte Carlo Algorithm 2

The output of the algorithm, $\mathcal{H}_M$, includes the following:
1) the successive visited states of the EMC, $(J_0, \ldots, J_{N(M)})$;
2) the successive jump times of the EMC, $(S_0, \ldots, S_{N(M)})$;
3) the successive observations, $(Y_0, \ldots, Y_{N(M)})$.

REMARK.– An alternative way of generating trajectories of an MRC is to use exclusively the initial distribution and the semi-Markov kernel. To generate sample paths with Algorithm 2, we additionally need to specify the conditional sojourn time distributions $f_{11}(k)$, $f_{12}(k)$, $f_{21}(k)$ and $f_{22}(k)$. For our purposes, we consider that the sojourn times are Weibull distributed, i.e.

$$
f_{ij}(k) = \begin{cases} q^{(k-1)^b} - q^{k^b}, & \text{if } k \ge 1,\\ 0, & \text{if } k = 0, \end{cases}
$$

where (q, b) = (0.1, 0.7) for self-transitions in state 1, (q, b) = (0.1, 1.5) for transitions from state 1 to state 2, (q, b) = (0.1, 2.0)
for transitions from state 2 to state 1 and (q, b) = (0.6, 0.9) for self-transitions in state 2. In the sequel, we use the initial distribution α, the emission probability matrix R, the conditional sojourn time distributions f and the transition probability matrix

$$\mathbf{P} = \begin{pmatrix} 0.379 & 0.621\\ 0.209 & 0.791 \end{pmatrix}$$

to construct trajectories of the HMRM with lengths M = 1000 and M = 2000.

5.3.1. HMM case

Next, an HMM is fitted to the two trajectories and the transition probability matrices are compared. For M = 1000, the algorithm converges (with an accuracy of 3 decimals) in 47 iterations, and the maximum value of the log-likelihood function is log L = −394.44. In this case, the estimated transition probability matrix is

$$\hat{\mathbf{P}}_1 = \begin{pmatrix} 0.780 & 0.220\\ 0.037 & 0.963 \end{pmatrix}.$$

We then validate the results by comparing this matrix with the “true” transition probability matrix P by means of the total variation distance described previously. We obtain $\|\hat{\mathbf{P}}_1 - \mathbf{P}\|_{1,\infty} = 0.401$, i.e. the HMM does not seem to be well adapted to the semi-Markov environment. We now repeat our calculations for the trajectory $\mathcal{H}_{2000}$. The log-likelihood function reaches the final value log L = −883.59 when the Baum–Welch algorithm converges (in 73 iterations). The estimated transition probability matrix is given by

$$\hat{\mathbf{P}}_2 = \begin{pmatrix} 0.413 & 0.586\\ 0.281 & 0.719 \end{pmatrix},$$

and $\|\hat{\mathbf{P}}_2 - \mathbf{P}\|_{1,\infty} = 0.072$. At this point, we note that the total variation distance takes higher values than in the Markov framework, which suggests that the HMM is not adequate to describe observations generated in a Markov renewal framework.

5.3.2. HMRM case

Fitting an HMRM to the observation sequences, for the first trajectory (M = 1000), we obtain the transition probability matrix

$$\hat{\mathbf{P}}_1 = \begin{pmatrix} 0.375 & 0.635\\ 0.202 & 0.795 \end{pmatrix},$$

whereas for the longer trajectory (M = 2000), we obtain

$$\hat{\mathbf{P}}_2 = \begin{pmatrix} 0.364 & 0.636\\ 0.215 & 0.785 \end{pmatrix}.$$

Furthermore, for the first trajectory (M = 1000), the total variation distance between the “true” and the estimated transition probability matrices is $\|\hat{\mathbf{P}}_1 - \mathbf{P}\|_{1,\infty} = 0.009$, while for the second trajectory (M = 2000) it is $\|\hat{\mathbf{P}}_2 - \mathbf{P}\|_{1,\infty} = 0.015$. Comparing these values with those obtained for the HMM, we conclude that an HMRM is more appropriate than an HMM in such a framework.
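The Markov renewal experiment of this section relies on Algorithm 2. A minimal Python sketch is given below, with 0-based labels, inversion sampling, hypothetical helper names and seeds, and the Weibull parameters listed in this section. One design choice is ours: since the while loop of Algorithm 2 performs one jump beyond M, the sketch drops that last jump so that the output stops at $S_{N(M)} \le M$.

```python
import math
import numpy as np

# Parameters of Section 5.3, states relabelled 0 and 1, observations 0, 1, 2.
P_EMC = np.array([[0.379, 0.621],
                  [0.209, 0.791]])
ALPHA = np.array([1.0, 0.0])
R = np.array([[0.936, 0.003, 0.061],
              [0.163, 0.691, 0.146]])
WEIBULL = {(0, 0): (0.1, 0.7), (0, 1): (0.1, 1.5),
           (1, 0): (0.1, 2.0), (1, 1): (0.6, 0.9)}


def draw(rng, probs):
    """Sample an index from a probability vector by inverting its CDF."""
    cdf = np.cumsum(probs)
    idx = int(np.searchsorted(cdf, rng.random(), side="right"))
    return min(idx, len(probs) - 1)  # guard against round-off in cdf[-1]


def dweibull_sample(rng, q, b):
    """Discrete Weibull draw with P(X > k) = q^{k^b}, by inversion."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return max(1, math.ceil((math.log(u) / math.log(q)) ** (1.0 / b)))


def simulate_hmrm(P, alpha, R, params, M, rng):
    """Algorithm 2: EMC states J_n, jump times S_n <= M and observations Y_n."""
    J, S = [draw(rng, alpha)], [0]
    Y = [draw(rng, R[J[0]])]
    while S[-1] <= M:
        j_next = draw(rng, P[J[-1]])
        x = dweibull_sample(rng, *params[(J[-1], j_next)])  # X ~ f_{J_n J_{n+1}}
        J.append(j_next)
        S.append(S[-1] + x)
        Y.append(draw(rng, R[j_next]))  # observation emitted at the jump time
    return J[:-1], S[:-1], Y[:-1]  # drop the jump that overshoots M


J, S, Y = simulate_hmrm(P_EMC, ALPHA, R, WEIBULL, 1000, np.random.default_rng(7))
```

Self-transitions are allowed here because the embedded chain of this example has non-zero diagonal entries.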


5.4. Conclusion

According to the previous results, HMRMs adapt better than HMMs in a Markov renewal framework, which suggests that they are more suitable for applications in a more general context. In a Markov context, the stochastic behavior of HMMs and HMRMs is almost the same with respect to the transition probability matrix, and both models lead to a good description of the corresponding observations. In a semi-Markov framework, however, the additional complexity of HMRMs is justified, since the total variation distance between the estimator and the “true” value of the parameter is smaller than the corresponding value for an HMM.
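The distances quoted in Sections 5.2 and 5.3 can be recomputed from the rounded matrices given in the text. In the sketch below (plain NumPy, illustrative names), the norm $\|L\|_{1,\infty} = \sup_i \sum_j |L_{ij}|$ is implemented as defined in Section 5.1. Recomputing it suggests that each reported value equals half of this norm, that is, the largest row-wise total variation distance $\tfrac12\sum_j|\hat P_{ij} - P_{ij}|$ between the estimated and “true” transition laws; for instance, for $\hat{\mathbf P}_1$ of Section 5.2.1 the maximum absolute row sum is 0.132, whereas 0.066 is reported. The relative comparisons between the models are unaffected, since all distances are scaled by the same factor.

```python
import numpy as np

P_TRUE = np.array([[0.379, 0.621],
                   [0.209, 0.791]])

# Estimated matrices quoted in Sections 5.2 and 5.3, with the reported distances.
ESTIMATES = {
    "5.2.1 HMM M=1000": ([[0.445, 0.555], [0.270, 0.730]], 0.066),
    "5.2.1 HMM M=2000": ([[0.360, 0.640], [0.254, 0.746]], 0.045),
    "5.2.2 HMRM M=1000": ([[0.429, 0.571], [0.235, 0.765]], 0.050),
    "5.2.2 HMRM M=2000": ([[0.348, 0.652], [0.189, 0.811]], 0.031),
    "5.3.1 HMM M=1000": ([[0.780, 0.220], [0.037, 0.963]], 0.401),
    "5.3.1 HMM M=2000": ([[0.413, 0.586], [0.281, 0.719]], 0.072),
    "5.3.2 HMRM M=1000": ([[0.375, 0.635], [0.202, 0.795]], 0.009),
    "5.3.2 HMRM M=2000": ([[0.364, 0.636], [0.215, 0.785]], 0.015),
}


def norm_1_inf(L):
    """||L||_{1,inf} = sup_i sum_j |L_ij|, the maximum absolute row sum."""
    return float(np.abs(np.asarray(L)).sum(axis=1).max())


for name, (est, reported) in ESTIMATES.items():
    d = norm_1_inf(np.array(est) - P_TRUE)
    print(f"{name}: norm = {d:.3f}, half = {d / 2:.3f}, reported = {reported:.3f}")
```

Because the matrices in the text are rounded to three decimals, this check is only meaningful up to that precision.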

Discussion & Concluding Remarks

“The sun is new each day” — Heraclitus

This book has a twofold purpose: to describe earthquake occurrences by means of multi-state models and to provide a deeper understanding of the earthquake generation process. Here, we discuss our main results and present some possible extensions. Earthquake interaction is a fundamental characteristic of seismicity, resulting in earthquake sequences, clusters and aftershocks. An earthquake can promote or inhibit subsequent events, depending on their location and orientation. The inclusion of earthquake interaction in multi-state models promises a better comprehension of earthquake occurrence and can contribute to seismic hazard assessment. Moreover, considering the triggering factor of earthquakes could eventually enable the real-time estimation of the quantities under study. The real challenge is to move from a multi-state approach to a physics-based multi-state approach through the incorporation of ΔCFF [DEN 97a]. For this purpose, the methods proposed in [DIE 94, MAT 02] and [PAR 05] can be
modified to allow the fusion of an SMM with a physical model that considers ΔCFF. The application of multi-state models based on stress changes is a topic of particular interest for further research. Such models can provide an adequate description of the space and time dependence of the earthquake generation process and its inherent variability. For example, multi-state models can be used when different states correspond to different earthquake magnitudes and to positive or negative values of ΔCFF. In Chapter 2, we considered that the evolution of the earthquake process is controlled by an unobservable characteristic: the stress field. The time-varying stress field is one of the most important physical parameters affecting the earthquake generation process. To explore the relation between earthquake occurrences and hidden levels, we used powerful multi-state models: HMMs. HMMs combine the power of scientific modeling with extensive statistical theory. They were used to decode a key characteristic of earthquake generation, the stress field related to the following event, and were applied to a catalog of strong (M ≥ 6.5) earthquakes that occurred in Greece and its surrounding areas over the last one and a half centuries. This study has gone some way towards improving our understanding of earthquake generation. The results obtained indicate that the existence of two states is more probable than the existence of three states. Earthquakes with magnitudes M ∈ [6.5, 6.7] are due to the first state, whereas stronger earthquakes (M ∈ [6.8, 7.1]) are mainly produced by the second state. We also estimated, for example, the mean value and the variance of the number of earthquakes expected to occur until the first visit to a specific state. In the present version of the models, observations exclusively correspond to earthquake magnitude classes. Seismic data have been carefully chosen in relation to the magnitude cut-off in order to ensure that the seismic catalog
is complete and the dataset is reliable. Although the spatial components of earthquakes are not taken into account, their inclusion could improve the understanding of earthquake generation and of its relation with the hidden characteristics. This could further shed light on similarities or differences between regions concerning their earthquake generation mechanisms. The HMM of interest can be generalized to incorporate more complicated dependency structures for the observations and/or the hidden states. For example, hidden states can form a higher-order, non-stationary or even non-homogeneous Markov chain. Furthermore, the assumption of conditionally independent observations can be relaxed. Of course, every generalization in the structure of HMMs brings its own estimation problems and solutions; in particular, extra dependencies in the model lead to additional computational burdens that should be taken into account. Building on the outcomes obtained, in Chapter 3 we presented the HMRM, which, on the one hand, is as flexible as the Markov renewal model and, on the other hand, presents modeling advantages. The model assumes that the next state visited depends on the last visited state and on the time spent during this last visit. The fundamental difference between this model and the hidden semi-Markov model is that the HMRM assumes that emission times coincide with jump times. Important indicators related to the levels of the stress field are computed by means of related measures of the underlying semi-Markov chain. These results add to our understanding of the genesis of strong earthquakes and outline the main objectives of future studies. This is one of the first attempts to model earthquake occurrences using HSMMs, with many possibilities of generalization. The consideration of states corresponding to magnitude classes, with different possible sojourn
(interevent) time distributions, is realistic and common in the modern statistical seismology literature. The semi-Markov framework, which allows the estimation of unobservable underlying generating mechanisms, is therefore a reasonable and attractive choice. Particular attention has been paid to ensuring data sufficiency. Including different types of observations could provide a better understanding of the earthquake generation process and yield more detailed results; the incorporation of spatial variables and/or variables associated with coseismic stress changes and slow tectonic stress accumulation is of special interest. A constraint of the HMRM under study is the assumption that observations are recorded only at jump times. This constraint can be overcome by applying the classical HSMM, for which different versions of the EM algorithm can be used in both the non-parametric and the parametric case. From a mathematical perspective, once the empirical estimators are obtained, a specific parametric approach can be adopted. As in the HMM context, many generalizations are possible for HMRMs or HSMMs. For example, an underlying second- or higher-order Markov chain can be considered, allowing second- or higher-order Markov dependence, respectively. Another type of generalization is obtained by relaxing some of the assumptions on the observation process. Alternatively, the HMRM can be assumed to be non-stationary or non-homogeneous. Chapter 4 described the computation and estimation of the hitting time intensity. First, an explicit formula is derived for the hitting time intensity of SMCs. Based on this formula, an empirical estimator built on counting processes is proposed. The main results concern the asymptotic behavior of this estimator, in particular its strong consistency and asymptotic normality. Second, the hitting time intensity is considered for HMRCs; a statistical estimator

Discussion & Concluding Remarks

103

is introduced and confidence intervals are constructed from its asymptotic properties. These results can be extended to general HMRCs and general SMCs. The hitting time intensity can also be studied in the continuous-time context, i.e. for both hidden (semi-)Markov processes and hidden (semi-)Markov chains. Finally, in Chapter 5, we used Monte Carlo simulations to compare HMMs and HMRMs in Markov and semi-Markov frameworks. We compared “true” parameters, i.e. parameters derived from the observations, with the corresponding estimators. Among the different parameters, we focused on the transition probability matrix, which partially determines the stochastic behavior of the models. To quantify the observed discrepancies, different norms and distances can be used, including the total variation distance. The results were obtained for the particular case in which sojourn times follow specific distributions. In summary, for the specific models and assumptions considered, the probabilistic behavior of HMCs and HMRCs is approximately the same in terms of the transition probability matrix in a discrete-time Markov environment. Even though an HMRM is more complex than an HMM, its application in a semi-Markov framework is justified, since it provides results that are more coherent with the observations. In contrast, in a Markov context, both HMMs and HMRMs can be used to describe the data. Of special interest would be the quantification of the discrepancies between “true” and estimated values of the emission probability matrix and of functions of the model parameters.
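As a rough illustration of how such discrepancies can be quantified, the sketch below computes the total variation distance $\frac{1}{2}\sum_j |p_{ij} - \hat{p}_{ij}|$ between corresponding rows of a “true” and an estimated transition matrix and reports the largest row-wise value. It is a simplified stand-in for the comparison described above, under stated assumptions: it uses a fully observed (not hidden) Markov chain, a hypothetical matrix `P_true` and the count-based estimator, and aggregating rows by their maximum is our choice, not necessarily the summary used in Chapter 5. All function names are illustrative.

```python
import random


def total_variation(p, q):
    """Total variation distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def max_row_tv(P, P_hat):
    """Largest row-wise total variation distance between two transition matrices."""
    return max(total_variation(row, row_hat) for row, row_hat in zip(P, P_hat))


def simulate_chain(P, x0, n_steps, rng):
    """Sample X_0 = x0, X_1, ..., X_{n_steps} from a homogeneous chain with matrix P."""
    path = [x0]
    for _ in range(n_steps):
        path.append(rng.choices(range(len(P)), weights=P[path[-1]])[0])
    return path


def estimate_P(path, s):
    """Count-based estimator: transitions i -> j divided by departures from i."""
    counts = [[0] * s for _ in range(s)]
    for a, b in zip(path, path[1:]):
        counts[a][b] += 1
    estimate = []
    for row in counts:
        total = sum(row)
        estimate.append([c / total if total else 0.0 for c in row])
    return estimate


# Hypothetical "true" matrix; not taken from the book's seismic catalog.
P_true = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]]

rng = random.Random(2019)
for n in (100, 1_000, 10_000):
    P_hat = estimate_P(simulate_chain(P_true, 0, n, rng), 3)
    print(n, round(max_row_tv(P_true, P_hat), 4))
```

The printed distances should shrink as the path length grows, mirroring the consistency of the estimator; repeating the loop over many seeds would turn this into a small Monte Carlo study.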

Appendices

Appendix 1 Markov Models

Here, we present some basic definitions from the theory of Markov chains.

DEFINITION A1.1.– The random variable sequence $X = (X_n)_{n\in\mathbb{N}}$, defined on a probability space $(\Omega, \mathcal{F}, P)$ with values in the finite set $E = \{1, \dots, s\}$, is a Markov chain if, for any states $i, j, i_0, i_1, \dots, i_{n-1} \in E$ and for any integer $n \in \mathbb{N}^*$, we have

$$P(X_{n+1} = j \mid X_0 = i_0, X_1 = i_1, \dots, X_{n-1} = i_{n-1}, X_n = i) = P(X_{n+1} = j \mid X_n = i) = p_{ij}(n). \qquad [A1.1]$$
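Definition A1.1 translates directly into a simulation procedure: draw $X_0$ from the initial distribution, then repeatedly draw the next state from the row of $P$ indexed by the current state. The following minimal sketch assumes a homogeneous chain with 0-based states $\{0, \dots, s-1\}$ instead of $\{1, \dots, s\}$; the matrix `P`, the vector `alpha` and the function `simulate_chain` are hypothetical illustrations, not taken from the book.

```python
import random

# Hypothetical homogeneous transition matrix on E = {0, 1, 2}; each row sums to 1.
P = [
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
]
# Hypothetical initial distribution: the chain starts in state 0.
alpha = [1.0, 0.0, 0.0]


def simulate_chain(P, alpha, n_steps, seed=None):
    """Return a sample path X_0, ..., X_{n_steps} of a homogeneous Markov chain.

    By the Markov property [A1.1], the law of X_{n+1} given the whole past
    depends only on X_n, so each step samples from the row P[X_n].
    """
    rng = random.Random(seed)
    states = range(len(P))
    path = [rng.choices(states, weights=alpha)[0]]  # X_0 ~ alpha
    for _ in range(n_steps):
        path.append(rng.choices(states, weights=P[path[-1]])[0])
    return path


path = simulate_chain(P, alpha, n_steps=20, seed=1)
print(path)
```

Because the rows of `P` are used as sampling weights, zero-probability transitions are never drawn; for instance, the deterministic matrix `[[0, 1], [1, 0]]` started in state 0 yields the alternating path 0, 1, 0, 1, ...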

Equation [A1.1] expresses the Markov property. If $p_{ij}(n) = p_{ij}$ for all $n \in \mathbb{N}$, then the Markov chain is called homogeneous (with respect to time). The function $(i, j) \mapsto p_{ij}$, defined on $E \times E$, is called the transition function of the chain. The $n$-step transition function is defined by $p^{(n)}_{ij} = P(X_{n+m} = j \mid X_m = i)$, for any $m, n \in \mathbb{N}$. As we are only concerned with Markov chains defined on a finite state space, the values of the $n$-step transition function

Earthquake Statistical Analysis through Multi-state Modeling, First Edition. Irene Votsi, Nikolaos Limnios, Eleftheria Papadimitriou and George Tsaklidis. © ISTE Ltd 2019. Published by ISTE Ltd and John Wiley & Sons, Inc.

108

Earthquake Statistical Analysis through Multi-state Modeling

$p^{(n)}_{ij}$ can be represented as the entries of a square matrix $P^{(n)} = (p^{(n)}_{ij})_{i,j\in E}$. We will then use the term $n$-step transition matrix to refer to the transition function $p^{(n)}_{ij}$. We define $P^{(0)} = I$, with $I(i, j) = \delta_{ij}$ (Kronecker symbol), and note that $P^{(1)} = P$. The transition function of a Markov chain satisfies the following properties:

1) $p_{ij} \ge 0$;
2) $\sum_{j\in E} p_{ij} = 1$;
3) $\sum_{k\in E} p^{(n)}_{ik}\, p^{(m)}_{kj} = p^{(n+m)}_{ij}$.

Note that, in matrix notation, $P^{(n)}$ is the usual $n$-fold matrix product of $P$. Hence, we can write $P^n$ instead of $P^{(n)}$. The third property above is called the Chapman–Kolmogorov identity (or equation). It can be written in matrix form as $P^m P^n = P^n P^m = P^{n+m}$, which shows that the $n$-step transition matrices $P^n$ form a semigroup. The distribution of $X_0$ is the initial distribution of the chain.

PROPOSITION A1.1.– Let $(X_n)_{n\in\mathbb{N}}$ be a Markov chain with initial distribution $\alpha$ and transition matrix $P = (p_{ij})$. For any $n \ge 1$, $k \ge 0$, and any states $i, j, i_0, i_1, \dots, i_n \in E$, we have

$$P(X_0 = i_0, X_1 = i_1, \dots, X_{n-1} = i_{n-1}, X_n = i_n) = \alpha_{i_0}\, p_{i_0 i_1} \cdots p_{i_{n-1} i_n},$$

Appendix 1

109

$$P(X_{k+1} = i_1, \dots, X_{k+n-1} = i_{n-1}, X_{k+n} = i_n \mid X_k = i_0) = p_{i_0 i_1} \cdots p_{i_{n-1} i_n},$$

$$P(X_{n+m} = j \mid X_m = i) = P(X_n = j \mid X_0 = i) = p^{(n)}_{ij}.$$

PROPOSITION A1.2.– Let $(X_n)_{n\in\mathbb{N}}$ be a Markov chain with transition matrix $P = (p_{ij})_{i,j\in E}$.

1) The sojourn time of the chain in state $i \in E$ is geometrically distributed on $\mathbb{N}^*$ with parameter $1 - p_{ii}$.
2) The probability that the chain visits state $j \ne i$ after leaving state $i$ is equal to $p_{ij}/(1 - p_{ii})$ (for $p_{ii} \ne 1$, which means that state $i$ is non-absorbing).

The states of a Markov chain can be characterized as recurrent or transient. This distinction is fundamental to the study of Markov chains. All of the following definitions and results are given for a Markov chain $(X_n)_{n\in\mathbb{N}}$ with transition matrix $P = (p_{ij})_{i,j\in E}$:

– $\eta_i = \min\{n \in \mathbb{N}^* : X_n = i\}$ (with $\min \emptyset = \infty$) is the first passage time of the chain to state $i$. If $X_0(\omega) = i$, then $\eta_i$ is the hitting time to state $i$. Note that $\eta_i > 0$;
– $N_i(n) = \sum_{k=0}^{n-1} \mathbf{1}_{\{X_k = i\}}$ is the time spent by the chain in state $i$ during the time interval $[0, n-1]$. For $n = \infty$, we write $N_i = N_i(\infty)$, with $N_i$ taking values in $\overline{\mathbb{N}} = \mathbb{N} \cup \{\infty\}$;
– $N_{ij}(n) = \sum_{k=1}^{n} \mathbf{1}_{\{X_{k-1} = i, X_k = j\}}$ is the number of (direct) transitions from state $i$ to state $j$ up to time $n$. For $n = \infty$, we write $N_{ij} = N_{ij}(\infty)$, with $N_{ij}$ taking values in $\overline{\mathbb{N}}$.

PROPOSITION A1.3.– A state $i \in E$ is called recurrent if $P_i(\eta_i < \infty) = 1$; otherwise, when $P_i(\eta_i < \infty) < 1$, state $i$ is called transient. Writing $\mu^*_{ii} = E_i[\eta_i]$ for the mean return time, a recurrent state $i$ is said to be null recurrent if $\mu^*_{ii} = \infty$ and positive recurrent if $\mu^*_{ii} < \infty$. The Markov chain is called (null/positive) recurrent (respectively, transient) if all its states are (null/positive) recurrent (respectively, transient).
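The matrix-level statements above can be checked numerically: $n$-step matrices as powers of $P$, the Chapman–Kolmogorov identity, the path probability of Proposition A1.1, and the jump probabilities and geometric sojourn times of Proposition A1.2. The sketch below is illustrative only; it assumes plain Python lists, 0-based states and a hypothetical three-state `P`, and the helper names are ours, not the book's.

```python
# Hypothetical three-state transition matrix (0-based states); rows sum to 1.
P = [
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
]


def mat_mul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_pow(P, n):
    """n-step transition matrix P^(n) = P^n, with P^(0) = I."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


def path_probability(alpha, P, states):
    """Proposition A1.1: alpha_{i_0} p_{i_0 i_1} ... p_{i_{n-1} i_n}."""
    prob = alpha[states[0]]
    for a, b in zip(states, states[1:]):
        prob *= P[a][b]
    return prob


def jump_probabilities(P):
    """Proposition A1.2(2): p_ij / (1 - p_ii) for j != i; absorbing rows stay at zero."""
    size = len(P)
    Q = [[0.0] * size for _ in range(size)]
    for i in range(size):
        if P[i][i] < 1.0:  # state i is non-absorbing
            for j in range(size):
                if j != i:
                    Q[i][j] = P[i][j] / (1.0 - P[i][i])
    return Q


def mean_sojourn(P, i):
    """Proposition A1.2(1): mean 1 / (1 - p_ii) of the geometric sojourn time (i non-absorbing)."""
    return 1.0 / (1.0 - P[i][i])


# Chapman-Kolmogorov check: P^2 P^3 should equal P^5 up to round-off.
P5 = mat_pow(P, 5)
P2P3 = mat_mul(mat_pow(P, 2), mat_pow(P, 3))
print(max(abs(x - y) for r1, r2 in zip(P2P3, P5) for x, y in zip(r1, r2)))
print(path_probability([1.0, 0.0, 0.0], P, [0, 1, 1]))  # alpha_0 * p_01 * p_11
print(jump_probabilities(P)[0], mean_sojourn(P, 1))
```

For this hypothetical `P`, leaving state 0 leads to states 1 and 2 with probabilities 0.3/0.5 = 0.6 and 0.2/0.5 = 0.4, and the mean sojourn in state 1 is 1/0.4 = 2.5 steps.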

110

Earthquake Statistical Analysis through Multi-state Modeling

DEFINITION A1.2.– If for any states $i, j \in E$ there exists $n \in \mathbb{N}^*$ such that $p^{(n)}_{ij} > 0$, then the Markov chain is said to be irreducible.

Let us set $\rho_{ij} = P_i(\eta_j < \infty)$.

LEMMA A1.1.– For any $m \in \mathbb{N}^*$ and $i, j \in E$, we have $P_i(N_j \ge m) = \rho_{ij}\, \rho_{jj}^{m-1}$.

PROPOSITION A1.4.–
1) A state $i \in E$ is transient if and only if $P_i(N_i = \infty) = 0$ or, equivalently, if and only if $\sum_n p^{(n)}_{ii} < \infty$.
2) A state $i \in E$ is recurrent if and only if $P_i(N_i = \infty) = 1$ or, equivalently, if and only if $\sum_n p^{(n)}_{ii} = \infty$.

REMARK A1.1.– For a Markov chain defined on a finite state space, every recurrent state is positive recurrent. An irreducible Markov chain defined on a finite state space is positive recurrent.

PROPOSITION A1.5.– For recurrent states $i$ and $j$, we have

$$\frac{N_i(n)}{n} \xrightarrow[n\to\infty]{\text{a.s.}} \frac{1}{\mu^*_{ii}}, \qquad \frac{N_{ij}(n)}{n} \xrightarrow[n\to\infty]{\text{a.s.}} \frac{p_{ij}}{\mu^*_{ii}}.$$
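For a positive recurrent state $i$, dividing the two limits in Proposition A1.5 gives $N_{ij}(n)/N_i(n) \to p_{ij}$ almost surely, which suggests estimating $p_{ij}$ by the ratio of transition counts to visit counts (the usual empirical estimator for a fully observed chain). With the definitions above, every visit to $i$ at a time $k \in \{0, \dots, n-1\}$ is followed by exactly one transition, so $\sum_j N_{ij}(n) = N_i(n)$ and each estimated row sums to 1. The sketch assumes an observed path with 0-based states and a hypothetical matrix `P_true`; the function names are illustrative.

```python
import random


def count_visits_and_transitions(path, s):
    """Return N_i(n) and N_ij(n) for a path X_0, ..., X_n with n = len(path) - 1."""
    n = len(path) - 1
    N_i = [0] * s
    N_ij = [[0] * s for _ in range(s)]
    for k in range(n):  # visits at times k = 0, ..., n - 1
        N_i[path[k]] += 1
    for k in range(1, n + 1):  # transitions X_{k-1} -> X_k, k = 1, ..., n
        N_ij[path[k - 1]][path[k]] += 1
    return N_i, N_ij


def estimate_transition_matrix(path, s):
    """Empirical estimator p_hat_ij = N_ij(n) / N_i(n); rows of unvisited states stay at 0."""
    N_i, N_ij = count_visits_and_transitions(path, s)
    return [
        [N_ij[i][j] / N_i[i] if N_i[i] > 0 else 0.0 for j in range(s)]
        for i in range(s)
    ]


def simulate_chain(P, x0, n_steps, rng):
    """Sample X_0 = x0, X_1, ..., X_{n_steps} from a homogeneous chain with matrix P."""
    path = [x0]
    for _ in range(n_steps):
        path.append(rng.choices(range(len(P)), weights=P[path[-1]])[0])
    return path


# Hypothetical ergodic chain used to illustrate the consistency of the estimator.
P_true = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]]
path = simulate_chain(P_true, 0, 100_000, random.Random(7))
P_hat = estimate_transition_matrix(path, 3)
max_error = max(abs(P_hat[i][j] - P_true[i][j]) for i in range(3) for j in range(3))
print(max_error)
```

On the short path `[0, 1, 1, 0, 2]`, for example, the counts are $N_0(4) = N_1(4) = 2$, $N_2(4) = 0$, and the estimated first row is $(0, 0.5, 0.5)$.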

DEFINITION A1.3.– A probability distribution $\nu$ on $E$ is said to be stationary or invariant for the Markov chain $(X_n)_{n\in\mathbb{N}}$ if, for any $j \in E$,

$$\sum_{i\in E} \nu(i)\, p_{ij} = \nu(j),$$

or, in matrix form, $\nu P = \nu$,

Appendix 1

111

where $\nu = (\nu(1), \dots, \nu(s))$ is a row vector.

PROPOSITION A1.6.– For a recurrent state $i$, we have $\nu(i) = 1/\mu^*_{ii}$.

DEFINITION A1.4.– A state $i \in E$ is called periodic of period $d > 1$, or $d$-periodic, if $\gcd\{n \ge 1 : p^{(n)}_{ii} > 0\} = d$. State $i$ is called aperiodic if $d = 1$.

DEFINITION A1.5.– An aperiodic recurrent state is called ergodic. An irreducible Markov chain with one ergodic state (and hence all states ergodic) is called ergodic.

PROPOSITION A1.7.– (Ergodic theorem for Markov chains) If a Markov chain is ergodic, then

$$p^{(n)}_{ij} \xrightarrow[n\to\infty]{} \nu(j)$$

for any $i, j \in E$.

PROPOSITION A1.8.– For an ergodic Markov chain, there exists a stationary distribution $\nu$ such that $P^n$ converges at an exponential rate to the matrix $\Pi = \mathbf{1}^\top \nu$, where $\mathbf{1} = (1, \dots, 1)$.
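Propositions A1.6–A1.8 suggest a simple numerical route to $\nu$: since $P^n \to \mathbf{1}^\top \nu$ for an ergodic chain, iterating $\nu \leftarrow \nu P$ from any starting distribution converges to the stationary distribution. The sketch below assumes a hypothetical ergodic matrix `P` (0-based states) and uses power iteration, one of several possible methods; solving the linear system $\nu P = \nu$, $\sum_i \nu(i) = 1$ directly would work as well. For this `P`, one can check by hand that $\nu = (5/21, 9/21, 7/21)$ and that the other eigenvalues are 0.4 and 0.3, so the error of $P^n$ decays roughly like $0.4^n$.

```python
# Hypothetical ergodic transition matrix (0-based states).
P = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]]


def vec_mat(v, P):
    """Row vector times matrix: (v P)_j = sum_i v_i p_ij."""
    return [sum(v[i] * P[i][j] for i in range(len(v))) for j in range(len(P[0]))]


def mat_mul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_pow(P, n):
    """n-step transition matrix P^n, with P^0 = I."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


def stationary_distribution(P, tol=1e-13, max_iter=10_000):
    """Power iteration nu <- nu P started from the uniform distribution.

    For an ergodic chain this converges geometrically (Proposition A1.8)
    to the unique solution of nu P = nu with sum(nu) = 1.
    """
    s = len(P)
    nu = [1.0 / s] * s
    for _ in range(max_iter):
        new = vec_mat(nu, P)
        if max(abs(a - b) for a, b in zip(new, nu)) < tol:
            return new
        nu = new
    raise RuntimeError("power iteration did not converge; is the chain ergodic?")


nu = stationary_distribution(P)
print(nu)                     # close to (5/21, 9/21, 7/21) for this P
print([1.0 / x for x in nu])  # mean return times mu*_ii = 1 / nu(i), Proposition A1.6
P50 = mat_pow(P, 50)
print(max(abs(P50[i][j] - nu[j]) for i in range(3) for j in range(3)))  # P^n -> 1^T nu
```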

Appendix 2 Hidden Markov Models

A2.1. Scoring or evaluation problem

For an HMM with state space $E = \{1, \dots, M\}$ and a sequence of $N$ observations, there exist $M^N$ possible hidden state sequences. Here, the problem is the evaluation of the likelihood function of the observation sequence. Specifically, given the parameters $\vartheta$ and an observation sequence $o_1^N$, we aim to compute the likelihood function

$$L_N(\vartheta) = P(Y_1 = o_1, \dots, Y_N = o_N \mid \vartheta).$$

For real-life problems, where $N$ and $M$ can be very large, we cannot compute the complete likelihood function by enumerating all possible hidden state sequences and summing over them. Rather than using such an exponential-time procedure, we use an efficient $O(NM^2)$ algorithm, the forward algorithm, described below. First, we define the forward probability as the joint probability of the first $k$ observations and of the $k$-th visited state being $i$, given the parameter $\vartheta$, i.e.

$$\alpha_i(k) = P(Y_1^k = o_1^k, Z_k = i \mid \vartheta).$$
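To make the cost argument concrete, the sketch below computes the same likelihood in two ways: by brute-force summation over all $M^N$ hidden paths (cost of order $N M^N$) and by the forward recursion $\alpha_i(1) = R_i(o_1)\alpha(i)$, $\alpha_i(k) = R_i(o_k)\sum_j \alpha_j(k-1)\,p_{ji}$ stated just below (cost $O(NM^2)$). Assumptions: a hypothetical two-state HMM with observation symbols $\{0, 1\}$, 0-based indices, emission probabilities stored as `R[i][o]`, and the standard termination $\sum_i \alpha_i(N)$, i.e. without the final-state factor $p_{if}$ used in the termination step below. The function names are illustrative.

```python
from itertools import product

# Hypothetical two-state HMM with two observation symbols {0, 1}.
init = [0.6, 0.4]             # alpha(i): initial probabilities of the hidden states
P = [[0.7, 0.3], [0.4, 0.6]]  # p_ij: hidden-state transition probabilities
R = [[0.9, 0.1], [0.2, 0.8]]  # R_i(o): probability of emitting symbol o in state i
obs = [0, 1, 1, 0]            # observation sequence o_1, ..., o_N


def forward_likelihood(init, P, R, obs):
    """Forward algorithm, O(N M^2): P(Y_1 = o_1, ..., Y_N = o_N).

    Termination uses sum_i alpha_i(N), i.e. no constraint on the last state.
    """
    M = len(init)
    alpha = [R[i][obs[0]] * init[i] for i in range(M)]  # initialization, k = 1
    for o in obs[1:]:  # recursion for 1 < k <= N
        alpha = [R[i][o] * sum(alpha[j] * P[j][i] for j in range(M)) for i in range(M)]
    return sum(alpha)  # termination


def brute_force_likelihood(init, P, R, obs):
    """Sum of the joint probability over all M^N hidden paths (exponential cost)."""
    M, total = len(init), 0.0
    for z in product(range(M), repeat=len(obs)):
        p = init[z[0]] * R[z[0]][obs[0]]
        for k in range(1, len(obs)):
            p *= P[z[k - 1]][z[k]] * R[z[k]][obs[k]]
        total += p
    return total


print(forward_likelihood(init, P, R, obs))
print(brute_force_likelihood(init, P, R, obs))  # agrees up to round-off
```

For a single observation $o_1 = 0$, both functions return $0.6 \cdot 0.9 + 0.4 \cdot 0.2 = 0.62$; in practice, long sequences also require scaling or log-space arithmetic to avoid numerical underflow.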

Earthquake Statistical Analysis through Multi-state Modeling, First Edition. Irene Votsi, Nikolaos Limnios, Eleftheria Papadimitriou and George Tsaklidis. © ISTE Ltd 2019. Published by ISTE Ltd and John Wiley & Sons, Inc.

114

Earthquake Statistical Analysis through Multi-state Modeling

We can efficiently compute $\alpha_i(k)$ recursively as follows:

1) initialization: $\alpha_i(1) = R_i(o_1)\,\alpha(i)$, $i \in E$;
2) recursion: $\alpha_i(k) = R_i(o_k) \sum_{j\in E} \alpha_j(k-1)\, p_{ji}$, $i \in E$, $1 < k \le N$;
3) termination: $P(Y_1 = o_1, \dots, Y_N = o_N \mid \vartheta) = \sum_{i\in E} \alpha_i(N)\, p_{if}$,

where $f \in E$ represents the last visited state and $\alpha(i)$ is the initial probability of state $i \in E$ ($k = 1$).

A2.1.1. Estimation or training problem

Given the set of possible states and an observation sequence $o_1^N$, our objective is to estimate the parameter set $\vartheta$. The classical algorithm for estimating an HMM is the Baum–Welch algorithm [BAU 70], which is a special case of the EM algorithm [DEM 77]. To describe this algorithm, we define the backward probability as follows:

$$\beta_i(k) = P(Y_{k+1}^N = o_{k+1}^N \mid Z_k = i, \vartheta).$$

The backward probability expresses the probability of the observation sequence from $k + 1$ up to $N$, given that $Z_k = i$ (and given the parameters $\vartheta$). We can efficiently compute $\beta_i(k)$ as follows:

N;

1) initialization: $\beta_i(N) = p_{if}$, $i \in E$;
2) recursion: $\beta_i(k) = \sum_{j\in E} p_{ij}\, R_j(o_{k+1})\, \beta_j(k+1)$, $i \in E$, 1 ≤ k