Statistical Techniques for Modelling Extreme Value Data and Related Applications [1 ed.] 1527532070, 9781527532076

This book tackles some modern trends and methods in the modelling of extreme data. Usually such data arise from random p


English Pages 285 [281] Year 2019


Table of contents :
Contents
Preface
Notations and abbreviations
List of illustrations
List of tables
1 Introduction: Some basic and miscellaneous results
2 Asymptotic theory of order statistics: A historical retrospective
3 Bootstrap order statistics and calibration of the sub-sample bootstrap method
4 Statistical modelling of extreme value data under linear normalization
5 Extreme value modelling under power normalization
6 Methods of threshold selection
7 Estimations under power normalization for the EVI
8 Some applications to real data examples
9 Miscellaneous results
Appendix A: Summary of Hill’s estimators in the L-model and P-model
References
Author index
Subject index


Statistical Techniques for Modelling Extreme Value Data and Related Applications By

Haroon M. Barakat, El-Sayed M. Nigm and Osama M. Khaled

This book first published 2019

Cambridge Scholars Publishing
Lady Stephenson Library, Newcastle upon Tyne, NE6 2PA, UK

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Copyright © 2019 by Haroon M. Barakat, El-Sayed M. Nigm and Osama M. Khaled

All rights for this book reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.

ISBN (10): 1-5275-3207-0
ISBN (13): 978-1-5275-3207-6

Contents

Preface
Notations and abbreviations
List of illustrations
List of tables
1 Introduction: Some basic and miscellaneous results
  1.1 The convergence concept in probability theory
    1.1.1 Modes of convergence of RVs
    1.1.2 Further limit theorems on weak convergence
  1.2 Statistical methods
    1.2.1 Maximum likelihood method
    1.2.2 Kolmogorov-Smirnov (K-S) test
    1.2.3 Genetic algorithms (GA)
  1.3 Bootstrap technique
2 Asymptotic theory of order statistics: A historical retrospective
  2.1 Order statistics
  2.2 Extreme value theory under linear normalization
  2.3 Max-domains of attraction of univariate ℓ-max-stable laws
  2.4 Limit theory of intermediate order statistics under linear normalization
    2.4.1 Asymptotic theory of intermediate order statistics
    2.4.2 Domains of attraction of intermediate limit laws
  2.5 Central order statistics: domains of attraction of central limit laws
    2.5.1 Asymptotic theory of central order statistics
    2.5.2 Domains of attraction of central limit laws

  2.6 Asymptotic theory of extremes under nonlinear normalization
    2.6.1 Characterization of the class of ML-laws and the GMA group
    2.6.2 The class of ML-laws
    2.6.3 The class of max-stable laws (MS-laws)
  2.7 Comments on Pancheva’s work
  2.8 Extreme value theory under power normalization
  2.9 Max-domains of attraction of univariate p-max-stable laws
  2.10 Comparison between domains of attraction
  2.11 Asymptotic central order statistics under nonlinear normalization
    2.11.1 The class of weak limits of central order statistics under general normalization
    2.11.2 Asymptotic central order statistics under power normalization
    2.11.3 Examples
    2.11.4 Comparisons between the domains of attraction of weak limits of central order statistics under linear and power normalizing constants
  2.12 Asymptotic intermediate order statistics under nonlinear normalization
    2.12.1 The class of weak limits of intermediate order statistics under general normalization
    2.12.2 The domains of attraction of the lower intermediate power types
  2.13 Asymptotic theory of order statistics under exponential normalization
  2.14 Generalized order statistics and dual generalized order statistics
    2.14.1 Distribution theory of the m-gos and m-dgos
    2.14.2 Asymptotic theory of univariate m-gos and m-dgos
    2.14.3 Further limit theorems of gos and dgos
  2.15 Restricted convergence
  2.16 Review of extreme value analysis in environmental studies

3 Bootstrap order statistics and calibration of the subsample bootstrap method
  3.1 Bootstrapping extremes under linear normalization
    3.1.1 Convergence bootstrap distributions when the normalizing constants are known
    3.1.2 Convergence bootstrap distributions when $a_n$ and $b_n$ are unknown
  3.2 Bootstrapping extremes under power normalization
    3.2.1 Convergence bootstrap distributions when the normalizing constants are known
    3.2.2 Convergence bootstrap distributions when the normalizing constants are unknown
  3.3 Verification of the sub-sample bootstrap method
  3.4 Bootstrapping of order statistics with variable rank in the L-model
    3.4.1 Bootstrapping of central order statistics under linear normalization
    3.4.2 Bootstrapping of intermediate order statistics under linear normalization
  3.5 Bootstrapping of order statistics with variable rank in the P-model
    3.5.1 Bootstrapping central order statistics under power normalization
    3.5.2 Bootstrapping intermediate order statistics under power normalization
  3.6 Simulation study
  3.7 Bootstrapping extreme generalized order statistics
4 Statistical modelling of extreme value data under linear normalization
  4.1 The National Environmental Radiation Monitoring Network (NERMN) in Egypt
  4.2 Environmental monitoring
  4.3 Chemical pollutants
    4.3.1 Particulate matter
    4.3.2 Sulphur dioxide
    4.3.3 Ozone
    4.3.4 Ambient gamma radiation
  4.4 Collected data
  4.5 Data treatments and simulation study

    4.5.1 Mathematical models
  4.6 Data treatments
5 Extreme value modelling under power normalization
  5.1 Generalized extreme value distribution under power normalization
  5.2 Statistical inference using the BM method
  5.3 The GPDP DFs and their related statistical inference
    5.3.1 The derivation of GPDP: the POT stability property
    5.3.2 Estimation of the shape and the scale parameters within the GPDP model
  5.4 Simulation study
  5.5 Parameter estimation for GEVL and GEVP by using the GA technique
6 Methods of threshold selection
  6.1 Some estimators for the EVI under linear normalization
  6.2 Some methods of threshold selection
    6.2.1 Graphical methods
  6.3 Comparison between $\gamma_{M}^{+}$ and $\gamma_{MR}^{++}$ via a simulation study
7 Estimations under power normalization for the EVI
  7.1 Counterparts of Hill’s estimators under power normalization
    7.1.1 Counterparts of HEPs
  7.2 Hill plot under power normalization (HPP)
  7.3 Simulation study
  7.4 Harmonic t-Hill estimator under power normalization
  7.5 Moment and moment-ratio estimators under power normalization
  7.6 Further contemporaneous Hill estimators under power normalization
  7.7 Four HEPs based on GPDP
    7.7.1 Four HEPs that do not have counterparts in the L-model
    7.7.2 Simulation study
  7.8 New Hill plot (NHP)
  7.9 Comparison between estimators under power normalization
    7.9.1 The first simulation study

    7.9.2 The second simulation study
  7.10 The weighting between the linear and power models
  7.11 Summary and conclusion
8 Some applications to real data examples
  8.1 The first application to real data-related air pollution
  8.2 Graphical methods to select the threshold of the given application
    8.2.1 HPL and HPP to select a suitable threshold
    8.2.2 MEPL and MEPP to select a suitable threshold
  8.3 Test for the choice of EVI in the GPDL and GPDP models
  8.4 Comparison between graphical methods of threshold selection
  8.5 Fitting of the GPDL and GPDP
  8.6 Comparison between some estimators of the EVI
  8.7 The second application to real data
9 Miscellaneous results
  9.1 Extreme value theory under linear-power normalization
    9.1.1 The class of weak limits of lp-model
    9.1.2 Statistical inference using BM method in lp-model
  9.2 Real data application related to AccuWeather
  9.3 Box-Cox transformation to improve the L-model and P-model
  9.4 Real data application
  9.5 The Kumaraswamy GEVL and GEVP DFs and further generalizations
Appendix A Summary of Hill’s estimators in the L-model and P-model
References
Author index
Subject index

Preface

Extreme value theory is a progressive branch of statistics dealing with extreme events. The restriction of the statistical analysis to this special field is justified by the fact that the extreme data, or the extreme part of the sample, can be of outstanding importance in studying floods, hurricanes, air pollutants, extreme claim sizes, life spans, etc. A quick look at the literature reveals that all the known books in the area of extreme value analysis deal with the modelling of extreme value data based on extreme value theory under linear normalization. In this book, we will tackle some modern trends in the modelling of extremes under linear normalization, such as the bootstrap technique. In addition, we consider the problem of the mathematical modelling of extremes under power normalization with the hope that this most recent approach will be more routinely applied in practice. Finally, the present book handles some recent approaches in order to achieve an improved fit of the generalized extreme value distribution for block maxima data and of the generalized Pareto distribution for peak-over-threshold data, either under linear or power normalization. Among these approaches is the use of the Box-Cox transformation, which provides additional flexibility in improving the model fit. This book is designed as an addition to the series of books about the modelling of extreme value data rather than as a competitor to them. To the best of the authors’ knowledge, no books now in print cover the modelling of extreme data under power normalization. It is worth mentioning that the advantage of using the power normalization is that the classical linear model (i.e., using extreme value theory under linear normalization) may fail to fit the given extreme data, while the power model (i.e., using extreme value theory under power normalization) succeeds.
On the other hand, although the book contains several applications, it meets the needs of readers who are interested in both the theoretical and the practical aspects of extreme value theory. In addition, the prerequisites for reading the book are minimal; readers do not need knowledge of advanced calculus or advanced theory of probability. The primary readership of this book will be researchers who have a strong mathematical background and are interested in extreme value theory and its applications in modelling extreme value data, including statisticians, and researchers who are interested in environmental and economic issues.


In fact, in some cases, the book may be a primary text (for students of departments of statistics in faculties of science and postgraduate students studying ecology) and it may be supplementary or recommended reading for all students or researchers who are interested in environmental studies and economics. I am indebted to the numerous researchers who have enriched this field, especially in the modelling of extreme data concerning air pollution. Usually, these researchers worked on their own data arising from their particular habitats; consequently, we may find some diversities or even divergences in their results. However, beneath these diversities or even divergences there lies a shared basis of a general theory. Actually, I am pleased to be part of this team. In this book, I am trying, with some members of my own research group, to present our own experience, which has extended over two decades in this field. Finally, I would like to thank my former Ph.D. student Dr Hafid A. Alaswed for many considerable contributions presented in this book, especially in Chapters 6–8. I would also like to extend my sincere gratitude to Adam Rummens, who encouraged me to write this book.

The principal author
H. M. Barakat
June 2018

Notations and abbreviations

AB            Asymptotic bias
AIC           Akaike information criterion
AM            Asymptotic mean squared error
AN            Asymptotic normality
AV            Asymptotic variance
BIC           Bayesian information criterion
Box-Cox-GL    Box-Cox-GEVL
BM            Block maxima
C.V           Coefficient of variation
CVC           Coefficient of variation criterion
CVCL          Coefficient of variation under linear normalization
CVCP          Coefficient of variation under power normalization
DF            Distribution function
DFs           Distribution functions
D             Domain of attraction under linear normalization
Dp            Domain of attraction under power normalization
dgos          Dual generalized order statistics
EEAA          Egyptian Environmental Affairs Agency
E(X)          Expected value of X
evir          Extreme values in R package
EVT           Extreme value theory
EVI           Extreme value index
GA            Genetic algorithms
GEVL          Generalized extreme value DF under linear normalization
GEVLs         Generalized extreme value DFs under linear normalization
gos           Generalized order statistics
GP            Generalized Pareto distribution
GPDL          Generalized Pareto DF under linear normalization
GEVPs         Generalized extreme value DFs under power normalization
GPDP          Generalized Pareto DF under power normalization
GPDPs         Generalized Pareto DFs under power normalization
GPVLP         Generalized Pareto DF under linear-power normalization
GPVLPs        Generalized Pareto DFs under linear-power normalization
K-S           Kolmogorov-Smirnov (test)
HP            Hill plot
HPL           Hill plot under linear normalization
HPP           Hill plot under power normalization
HE            Hill estimator
HEs           Hill estimators

HEL           Hill estimator under linear normalization
HELs          Hill estimators under linear normalization
HEP           Hill estimator under power normalization
HEPs          Hill estimators under power normalization
HMEL          Harmonic moment estimator under linear normalization
HMEP          Harmonic moment estimator under power normalization
HMEPs         Harmonic moment estimators under power normalization
iid           Independent and identically distributed
LI            Location-invariant
LAQN          London air quality network
LRT           Likelihood ratio test
MEL           Moment estimators under linear normalization
MEP           Moment estimators under power normalization
MEPL          Mean excess plot under linear normalization
MEPP          Mean excess plot under power normalization
m-gos         m-generalized order statistics
m-dgos        m-dual generalized order statistics
ML-laws       The class of maximum limit laws
ML            Maximum likelihood
MLE           Maximum likelihood estimate
MLEs          Maximum likelihood estimates
MREL          Moment-ratio estimator under linear normalization
MREP          Moment-ratio estimator under power normalization
MSE           Mean squared error
MSEs          Mean squared errors
MSEL          Mean squared error under linear normalization
MSEP          Mean squared error under power normalization
MS-laws       Class of max-stable laws
NERMN         National Environmental Radiation Monitoring Network
NCNSRC        National Center for Nuclear Safety and Radiation Control
NHP           New Hill plot
NO            Nitric oxide
NO2           Nitrogen dioxide
PDF           Probability density function, also density function
PDFs          Probability density functions, also density functions
PM            Particulate matter
PM10          PM of diameter less than 10 μm
POT           Peak over threshold
RLP           Return level plot
RV            Random variable

RVs           Random variables
SE            Standard error
SP            Stability plot
SO2           Sulphur dioxide
STD           Standard deviation
SI            Scale invariant
SC            Strong consistency
TCP           Threshold choice plot
UA            Uniform assumption
Var(X)        Variance of X
WC            Weak consistency
WHO           World Health Organization
$\bar{F} = 1 - F$          Survival function
$\mathbb{R}$               Real line
$N(\mu, \sigma)$           Normal distribution with mean μ and variance σ
$\Phi(\cdot)$              Standard normal distribution function
$\xrightarrow[n]{}$        Convergence, as n → ∞
$\xrightarrow[n]{w}$       Weak convergence, as n → ∞
$\xrightarrow[n]{d}$       Convergence in distribution, as n → ∞
$\xrightarrow[n]{p}$       Convergence in probability, as n → ∞
$\xrightarrow[n]{a.s.}$    Convergence almost surely, as n → ∞

Illustrations

4.1   Mobile gas monitoring station used to monitor pollution over the course of a full year and provided by NCNSRC
4.2   Hourly average of particulate matter concentration for 10th of Ramadan
4.3   Hourly average of particulate matter concentration for 10th of Ramadan
4.4   Hourly average sulphur dioxide concentration for 10th of Ramadan
4.5   Hourly average sulphur dioxide concentration for Zagazig
4.6   Thirty minutes average ozone concentration for 10th of Ramadan
4.7   Fifteen-minute average gamma radiation level for Zagazig
4.8   Fifteen-minute average gamma radiation level for Zagazig
4.9   SO2 in Zagazig
4.10  SO2 in 10th of Ramadan
4.11  PM10 in Zagazig
4.12  PM10 in 10th of Ramadan
4.13  O3 in 10th of Ramadan after bootstrap
4.14  Ambient gamma radiation in Zagazig after bootstrap
6.1   The relation between threshold selection and k
6.2   The threshold selection by using the MEPL for the River Nidd data. Vertical dashed lines mark these thresholds
6.3   The threshold selection by using the MEPP for the River Nidd data. Vertical dashed lines mark these thresholds
6.4   The threshold selection by using the HPL. Vertical dashed lines mark these thresholds
6.5   The threshold selection by using the TCP function for Nidd data
7.1   The left panel is the Hill plot of $\gamma_{p}^{++}$, with k = 112, and the right panel is the Hill plot of $\gamma_{p}^{++\prec}$, with k = 82
7.2   The threshold selection using the New Hill plot for Nidd data
7.3   The Hill plot of $\gamma_{PM}^{+\prec}$, with k = 19

7.4   The Hill plot of $\gamma_{PM}^{+}$, with k = 51
7.5   The Hill plot of $\gamma_{PH}^{+\prec}$, with k = 94
7.6   The Hill plot of $\gamma_{PH}^{+}$, with k = 9
7.7   Samples generated from GEVL (0.10 ≤ γ ≤ 2.60, μ = 7, σ = 1): n = 100, replicates = 1000
7.8   Samples generated from GEVP (0.10 ≤ γ ≤ 2.60): n = 100, replicates = 1000
8.1   Return level plot of two pollutants in the two cities
8.2   Four different plots: P-P plot, Q-Q plot, return level plot, and density plot of a daily period (24 hours) of PM10 in 10th of Ramadan
8.3   Four different plots: P-P plot, Q-Q plot, return level plot, and density plot of a daily period (24 hours) of PM10 in Zagazig
8.4   Four different plots: P-P plot, Q-Q plot, return level plot, and density plot of a daily period (24 hours) of SO2 in 10th of Ramadan
8.5   Four different plots: P-P plot, Q-Q plot, return level plot, and density plot of a daily period (24 hours) of SO2 in Zagazig
8.6   Selection of the threshold of PM10 in 10th of Ramadan. The left, middle, and right panels indicate respectively $\gamma^{++}$, $\gamma_{P}^{++}$, and $\gamma_{P}^{++}$
8.7   Selection of the threshold of PM10 in Zagazig. The left, middle, and right panels indicate respectively $\gamma^{++}$, $\gamma_{P}^{++}$, and $\gamma_{P}^{++}$
8.8   Selection of the threshold of SO2 in 10th of Ramadan. The left, middle, and right panels indicate respectively $\gamma^{++}$, $\gamma_{P}^{++}$, and $\gamma_{P}^{++}$
8.9   Selection of the threshold of SO2 in Zagazig. The left, middle, and right panels indicate respectively $\gamma^{++}$, $\gamma_{P}^{++}$, and $\gamma_{P}^{++}$
8.10  Mean excess plot for selecting the threshold of PM10 in 10th of Ramadan. The left and the right panels indicate respectively the MEPL and the MEPP
8.11  Mean excess plot for selecting the threshold of PM10 in Zagazig. The left and the right panels indicate respectively the MEPL and the MEPP
8.12  Mean excess plot for selecting the threshold of SO2 in 10th of Ramadan. The left and the right panels indicate respectively the MEPL and the MEPP
8.13  Mean excess plot for selecting the threshold of SO2 in Zagazig. The left and the right panels indicate respectively the MEPL and the MEPP
9.1   Graphical representation of the data set and the fitted distribution $P_{1;\hat{\gamma}}(x; \hat{c}; \hat{a}, \hat{b})$

Tables

2.1   Domains of attraction of the most common distributions
3.1   Estimated GEVL and GEVP models, for F1
3.2   Estimated GEVL and GEVP models, for F2
3.3   Estimated GEVL and GEVP models, for F3:1
3.4   Generated data from N(μ, σ = 1): bootstrap technique for quantiles
3.5   K-S test: bootstrap technique for quantiles
3.6   Simulation study for K = 1
3.7   Simulation study for K = 2
4.1   Zagazig and 10th of Ramadan for GEVL
4.2   Zagazig and 10th of Ramadan for GEVL, after bootstrap
4.3   K-S test for the data with and without bootstrap
4.4   Simulation study for choosing a suitable number of POT (k); k denotes the best value
4.5   Simulation study for choosing a suitable number of POT (k); k denotes the best value
4.6   Simulation study for chosen m sub-sample bootstrap; m denotes the best value
4.7   Simulation study for chosen m sub-sample bootstrap; m denotes the best value
4.8   Zagazig and 10th of Ramadan for GPDL
4.9   Zagazig and 10th of Ramadan for GPDL after bootstrap
4.10  Zagazig and 10th of Ramadan for GEVL
5.1   Estimating the shape parameter γ in the GEVP(γ, 1, 1), defined in (2.32), by using the ML method and the suggested estimate (5.3); “∗” in the superscript of a value means that this value is the best
5.2   Estimating the shape parameter γ in the GPDP, defined in (5.4), by using the ML method; “∗” in the superscript of a value means that this value is the best

5.3   Estimating the shape parameter γ in the GPDP (5.4) by using the suggested estimate (5.11); “∗” in the superscript of a value means that this value is the best
5.4   Estimate parameters of GEVP (5.4) by using the ML method for the sub-sample bootstrapping method
5.5   Estimating the shape parameter γ in the GEVP(0.4, 1, 1), defined in (5.4), by using the suggested estimate (5.3); “∗” in the superscript of a value means that this value is the best
5.6   Estimating the shape parameter γ in the GEVP(0.2, 1, 1), defined in (5.4), by using the suggested estimate (5.3); “∗” in the superscript of a value means that this value is the best
5.7   Estimating the shape parameter γ in the GEVP(0, 1, 1), defined in (5.4), by using the suggested estimate (5.3); “∗” in the superscript of a value means that this value is the best
5.8   Estimating the shape parameter γ in the GEVP(−0.2, 1, 1), defined in (5.4), by using the suggested estimate (5.3); “∗” in the superscript of a value means that this value is the best
5.9   Estimating the shape parameter γ in the GEVP(−0.4, 1, 1), defined in (5.4), by using the suggested estimate (5.3); “∗” in the superscript of a value means that this value is the best
5.10  Simulation study for estimate GEVL and GEVP by using the GA technique
6.1   Comparison between some desirable properties of HELs for EVI
6.2   Simulation output for assessing and comparing the estimators $\gamma_{M}^{+}$ and $\gamma_{MR}^{++}$
7.1   Simulation output for assessing the estimators $\gamma_{p}^{++}$ and $\gamma_{p}^{++\prec}$
7.2   Simulation output for assessing the estimators $\gamma_{p}^{+-}$ and $\gamma_{p}^{+-\prec}$
7.3   Simulation output for assessing the estimators $\gamma_{p}^{-+\prec}$ and $\gamma_{p}^{-+}$
7.4   Simulation output for assessing the estimators $\gamma_{p}^{--\prec}$ and $\gamma_{p}^{--}$
7.5   Simulation output for assessing the HEP $\gamma_{p}^{++}$, γ > 0
7.6   Simulation output for assessing the HEP $\gamma_{p}^{+-}$, γ < 0
7.7   Simulation output for assessing the HEP $\gamma_{p}^{-+}$, γ > 0
7.8   Simulation output for assessing the HEP $\gamma_{p}^{--}$, γ < 0
7.9   Simulation output for assessing and comparing the HEPs $\gamma_{p}^{++\prec}$, $\gamma_{p}^{++}$, $\gamma_{p}^{++}$, $\gamma_{PH}^{++\prec}$, and $\gamma_{PH}^{++}$
7.10  Simulation output for assessing and comparing the HEPs $\gamma_{p}^{+-\prec}$, $\gamma_{p}^{+-}$, and $\gamma_{p}^{+-}$
7.11  Simulation output for assessing and comparing the HEPs $\gamma_{p}^{-+\prec}$, $\gamma_{p}^{-+}$, and $\gamma_{p}^{-+}$
7.12  Simulation output for assessing and comparing the HEPs $\gamma_{p}^{--\prec}$, $\gamma_{p}^{--}$, and $\gamma_{p}^{--}$

7.13  Simulation output for assessing and comparing the estimators $\gamma_{P}^{++\prec}$, $\gamma_{PH}^{++\prec}$, $\gamma_{PM}^{+\prec}$, and $\gamma_{PMR}^{++\prec}$
7.14  Simulation output for assessing and comparing the estimators $\gamma_{P}^{++}$, $\gamma_{PH}^{++}$, $\gamma_{PM}^{+}$, and $\gamma_{PMR}^{++}$
7.15  Simulation output for assessing the criterion CVC for $\gamma_{M}^{+}$
7.16  Simulation output for assessing the criterion CVC for $\gamma_{P}^{++}$
7.17  Simulation output for assessing the criterion CVC for $\gamma^{++}$
7.18  Simulation output for assessing the criterion CVC for $\gamma_{P}^{++}$
8.1   Statistics summary of the two pollutants in the two cities
8.2   ML estimates for the shape parameter of the two pollutants in the two cities
8.3   The LRT of the two pollutants, p-value, and decision
8.4   Threshold selection by using HPL and HPP
8.5   MEPL and MEPP for selecting an appropriate threshold
8.6   Gomes and van Monfort test for GPDL and GPDP of PM10 in 10th of Ramadan
8.7   Gomes and van Monfort test for GPDL and GPDP of PM10 in Zagazig
8.8   Gomes and van Monfort test for GPDL and GPDP of SO2 in 10th of Ramadan
8.9   Gomes and van Monfort test for GPDL and GPDP of SO2 in Zagazig
8.10  Graphical methods of threshold selection of PM10
8.11  Graphical methods of threshold selection of SO2
8.12  Parameter estimates for the GP distribution of PM10, under linear and power normalizing constants
8.13  Parameter estimates for the GP distribution of SO2, under linear and power normalizing constants
8.14  Different estimates of the EVI for 10th of Ramadan under the linear and power models
8.15  Different estimates of the EVI for Zagazig under the linear and power models
8.16  Comparison between the linear and power models for pollutants PM10 and SO2
8.17  Estimates of the EVI of Danish fire insurance claims under linear and power normalizing constants
8.18  Comparison between the linear and power models
9.1   Summary statistics
9.2   Parameter estimation for maximum temperature
9.3   Descriptive statistics for maximum data for air pollution
9.4   The MLEs for the GEVL and GPDL models: the application of AIC and BIC
9.5   K-S test

9.6   The MLEs for Box-Cox-GL model and the application of AIC and BIC
9.7   K-S test
9.8   The MLEs for GEVP and GPVP models: the application of AIC and BIC
9.9   K-S test
9.10  The MLEs for the Box-Cox-GP1 model and the application of the AIC and BIC
9.11  K-S test
A.1   Four HELs
A.2   Eight counterparts of HEPs
A.3   Four HEPs based on GPDP
A.4   Four moment estimators under power normalization
A.5   Four moment ratio estimators under power normalization

1 Introduction: Some basic and miscellaneous results

In practice, we usually do not know the true probability models of random phenomena, such as human behaviour. George Box once said that there is no true model, but there are useful models. Even if there were a true probability model, we would never be able to observe it. Fortunately, in many cases a complicated situation can be replaced by a comparatively simple asymptotic model. The most important example of such cases is when the extremes govern the law of interest (e.g., air pollution, floods, strength of material, etc.). More precisely, the asymptotic theory of extreme order statistics provides approximate probability models that are not true but are definitely useful. Therefore, we must connect what we can observe with these approximate models. The key idea here is that we use a large set of observations (or a set of realizations) to figure out the approximate probability model given the data we have. Clearly, the cornerstone of the approximate probability model is the concept of convergence in probability theory. In Section 1.1, we will discuss different types of convergence in probability theory and statistics. In addition, some important tools of data treatment, such as the maximum likelihood method, genetic algorithms (GA), and the Kolmogorov-Smirnov (K-S) test, are discussed in Section 1.2.

1.1 The convergence concept in probability theory

There are several convergence concepts associated with the limiting behaviour of a sequence of RVs. Convergence in distribution (or weak convergence), convergence in probability, and almost sure convergence are the prominent ones. In the case of the sample mean, these concepts lead us to the classical central limit theorem, weak law of large numbers, and strong law of large numbers, respectively. In this book we will mostly be concerned with weak convergence results for order statistics. In the context of weak convergence, we are interested in identifying the possible non-degenerate limit distributions for appropriately normalized sequences of RVs of interest. These limiting distributions can be of direct use in suggesting inference procedures when the sample size is large. These concepts and some required theorems of a purely analytical nature will be briefly discussed in this section. Throughout what follows, the symbol $\xrightarrow[n]{}$ stands for convergence, as $n \to \infty$.

1.1.1 Modes of convergence of RVs

Definition 1.1 (almost sure convergence) We say that a sequence of RVs $X_1, X_2, \ldots$ converges to a RV $X$ almost surely, written $X_n \xrightarrow[n]{a.s.} X$, if $\{\omega \in \Omega : X_n(\omega) \xrightarrow[n]{} X(\omega)\}$ is an event whose probability is one, where $X_n$ and $X$ are defined on the same probability space $(\Omega, \mathcal{F}, P)$.

Definition 1.2 (convergence in probability) A sequence of RVs $\{X_n\}$ is said to converge in probability to a RV $X$, as $n \to \infty$, written $X_n \xrightarrow[n]{p} X$, if for every $\epsilon > 0$ we have $P(|X_n - X| < \epsilon) \xrightarrow[n]{} 1$, or equivalently $P(|X_n - X| \ge \epsilon) \xrightarrow[n]{} 0$.

Definition 1.3 (convergence in the rth mean) A sequence of RVs $X_1, X_2, \ldots$ is said to converge in the rth mean, or in the norm $\|\cdot\|_r$, to a RV $X$, written $X_n \xrightarrow[n]{r} X$, if $r \ge 1$, $E|X_n|^r < \infty$ for all $n$, and

$$\lim_{n \to \infty} E(|X_n - X|^r) = 0.$$

The most important cases of convergence in rth mean are:
• When $X_n$ converges in rth mean to $X$, for $r = 1$, we say that $X_n$ converges in mean to $X$.
• When $X_n$ converges in rth mean to $X$, for $r = 2$, we say that $X_n$ converges in mean square to $X$.

Convergence in the rth mean, for $r > 0$, implies convergence in probability (by Chebyshev’s inequality, $P(|X_n - X| \ge \epsilon) \le E|X_n - X|^r / \epsilon^r$ for every $\epsilon > 0$), while if $r > s \ge 1$, convergence in rth mean implies convergence in sth mean. Hence, convergence in mean square implies convergence in mean.

Definition 1.4 (convergence in distribution or weak convergence) Assume that $X_1, X_2, \ldots$ is a sequence of RVs with corresponding DFs $F_1, F_2, \ldots$ and the RV $X$ has the DF $F$. We say that the sequence of RVs $\{X_n\}$ converges in distribution to the RV $X$, as $n \to \infty$, written $X_n \xrightarrow[n]{d} X$ (or the sequence of DFs $\{F_n\}$ converges weakly to the DF $F$, as $n \to \infty$, written $F_n(x) \xrightarrow[n]{w} F(x)$) if $F_n(x)$ converges pointwise to $F(x)$ at all continuity points of $F$, that is, $F_n(x) \xrightarrow[n]{} F(x)$ at all points $x$ where $F$ is continuous.

Remark Many authors avoid using the notation $X_n \xrightarrow[n]{d} X$, since weak convergence pertains only to the DF of $X$ and not to $X$ itself. We use this notation only in this section, for the sake of notation uniformity; in the sequel we will use the notation $F_n(x) \xrightarrow[n]{w} F(x)$.
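Definition 1.2 can be checked numerically in the setting of the weak law of large numbers. The following is a minimal sketch, not taken from the book: it assumes iid Uniform(0, 1) observations (mean 0.5), a hypothetical helper name `prob_far_from_mean`, and arbitrary choices of $\epsilon$, sample sizes, and number of replications. It estimates $P(|\bar{X}_n - 0.5| \ge \epsilon)$ by Monte Carlo, which should shrink towards zero as $n$ grows.

```python
import random


def prob_far_from_mean(n, eps=0.05, reps=2000, seed=0):
    """Monte Carlo estimate of P(|mean of n Uniform(0,1) draws - 0.5| >= eps).

    Illustrative helper (an assumption of this sketch, not from the book).
    """
    rng = random.Random(seed)
    far = 0
    for _ in range(reps):
        sample_mean = sum(rng.random() for _ in range(n)) / n
        if abs(sample_mean - 0.5) >= eps:
            far += 1
    return far / reps


# The estimated probabilities should decrease towards 0 as n increases,
# which is what convergence in probability of the sample mean requires.
estimates = {n: prob_far_from_mean(n) for n in (10, 100, 1000)}
for n, p in estimates.items():
    print(f"n = {n:5d}: estimated P(|mean - 0.5| >= 0.05) = {p:.4f}")
```

The simulation only illustrates the definition for one distribution and one $\epsilon$; it is not a proof, and the Monte Carlo estimates themselves carry sampling error.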

Remark Unless otherwise stated, we assume that the limiting function $F(x)$ is a non-degenerate proper DF, i.e., that there exists a real number $x$ such that $0 < F(x) < 1$ and $F(\infty) - F(-\infty) = 1$. In this case, we say that $F_n(x)$ converges properly to $F(x)$, or simply that $F_n(x)$ converges weakly to $F(x)$. By contrast, if $F(\infty) - F(-\infty) < 1$, $F(x)$ will be called an improper DF and in this case the aforesaid convergence will be called improper convergence.

Some important relations between the modes of convergence are given in the next theorems.

Theorem 1.5 Assume that $X, X_1, X_2, \ldots$ are RVs on the same probability space $(\Omega, \mathcal{F}, P)$. Then the following implications hold:
• If $X_n \xrightarrow[n]{a.s.} X$, then $X_n \xrightarrow[n]{p} X$.
• If $X_n \xrightarrow[n]{p} X$, then $X_n \xrightarrow[n]{d} X$.
• If $X_n \xrightarrow[n]{r} X$, then $X_n \xrightarrow[n]{p} X$.

Theorem 1.6 (Continuous Mapping Theorem) Let $\{X_n\}_{n=1}^{\infty}$ be a sequence of RVs, $f : \mathbb{R} \to \mathbb{R}$ be a continuous function, and $X$ be an RV.
• If $X_n \xrightarrow[n]{a.s.} X$, then $f(X_n) \xrightarrow[n]{a.s.} f(X)$.
• If $X_n \xrightarrow[n]{d} X$, then $f(X_n) \xrightarrow[n]{d} f(X)$.
• If $X_n \xrightarrow[n]{p} X$, then $f(X_n) \xrightarrow[n]{p} f(X)$.

The preceding results hold equivalently for a sequence of random vectors and matrices. Also, an important special case here is that $X = c$, where $c \in \mathbb{R}$. In this case, we get $f(X_n) \xrightarrow[n]{a.s.} f(c)$, if $X_n \xrightarrow[n]{a.s.} c$. Similarly, if $X_n \xrightarrow[n]{p} c$, then $f(X_n) \xrightarrow[n]{p} f(c)$.

Theorem 1.7 (Slutzky’s Theorem) Let $X_n \xrightarrow[n]{d} X$ and $Y_n \xrightarrow[n]{p} C$, where $C \in \mathbb{R}$ is a constant. Then, $Y_n X_n \xrightarrow[n]{d} CX$ and $X_n + Y_n \xrightarrow[n]{d} X + C$.

An important special case of Theorem 1.7 is that if $X_n \xrightarrow[n]{d} X$ and $Y_n \xrightarrow[n]{p} 0$, then $X_n + Y_n \xrightarrow[n]{d} X$. In this case, we say that Zn = Xn + Yn and Xn are asymptotically equivalent because $Z_n - X_n \xrightarrow[n]{p} 0$. Clearly, Slutzky’s theorem, as well as the convergence concepts, can be readily extended to random vectors and random matrices.

Theorem 1.8 If $F_n \xrightarrow[n]{w} F$ and F is continuous, then
$$\sup_x |F_n(x) - F(x)| \xrightarrow[n]{} 0,$$
which means that the convergence is uniform with respect to x.

1.1.2 Further limit theorems on weak convergence

The meaning of any limit theorem for a random sequence {Xn} is that it gives a sufficiently simple approximation to the DF Fn(x) = P(Xn < x). Namely, let $F_n(G_n(x)) = P(G_n^{-1}(X_n) < x) \xrightarrow[n]{w} P(X < x)$, where Gn(.) is a monotone continuous function (we may take Gn(x) = an x + bn) and $G_n^{-1}(.)$ is the inverse of Gn. If the limit F(x) = P(X < x) is continuous, then Theorem 1.8 implies that
$$\varepsilon_n = \sup_x |P(G_n^{-1}(X_n) < x) - P(X < x)| = \rho(G_n^{-1}(X_n), X) \xrightarrow[n]{} 0.$$

Since the metric ρ is invariant with respect to strongly monotone continuous transformations of RVs, we have $\rho(X_n, G_n(X)) = \varepsilon_n \xrightarrow[n]{} 0$, i.e., we obtain a uniform approximation to P(Xn < x) = Fn(x) by means of some universal DF of the RV X (see Pancheva, 1984). Such a view of the limit theorems deprives the traditional linear transformation of its exclusiveness. Thus, it makes sense to extend the class of normalizing transformations, {Gn(x)}, to any strongly monotone continuous transformations for constructing a simplified approximation, provided that one can prove a suitable limit theorem. Chapter 5 will rely on this idea. The next result gives equivalent characterizations of weak convergence.

Theorem 1.9 If ψ and {ψn} are the characteristic functions of the DFs F and {Fn}, respectively, then the following statements are equivalent:
(i) $F_n \xrightarrow[n]{w} F$;
(ii) $\psi_n(t) \xrightarrow[n]{} \psi(t)$, for every t ∈ R;
(iii) $\int g(x)\,dF_n(x) \xrightarrow[n]{} \int g(x)\,dF(x)$ for every bounded continuous function g.
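As a numerical illustration of the uniform approximation under the linear normalization Gn(x) = an x + bn, the following sketch uses an example that is not taken from the text above: the maximum of n independent standard exponential RVs, with the assumed constants an = 1 and bn = log n. Here Fn(Gn(x)) = (1 − e^{−x}/n)^n for x > −log n, and the limit is exp(−e^{−x}). The supremum is only approximated on a finite grid (the grid limits and step count are arbitrary choices), so the printed values are indicative rather than exact.

```python
import math

def normalized_max_df(x, n):
    """DF of (maximum of n standard exponentials) - log n, evaluated at x."""
    y = x + math.log(n)  # G_n(x) = a_n * x + b_n with a_n = 1, b_n = log n
    if y <= 0.0:
        return 0.0
    return (1.0 - math.exp(-y)) ** n

def limit_df(x):
    """Limiting DF exp(-exp(-x))."""
    return math.exp(-math.exp(-x))

def sup_distance(n, lo=-10.0, hi=20.0, steps=3000):
    """Grid approximation of sup_x |F_n(G_n(x)) - F(x)|."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(abs(normalized_max_df(x, n) - limit_df(x)) for x in grid)

distances = {n: sup_distance(n) for n in (10, 100, 1000)}
for n, d in distances.items():
    print(n, d)
```

The grid distances should shrink as n grows, which is what uniform convergence predicts for a continuous limit.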


Let F and Fn be the DFs of the RVs X and Xn, respectively (notice that X1, X2, ... and X need not be defined on the same probability space). Let $F_n \xrightarrow[n]{w} F$ (or equivalently $X_n \xrightarrow[n]{d} X$). Then the DF F is usually called the asymptotic (or limiting) distribution of the sequence Xn. Clearly, convergence in distribution depends only on the DFs involved and does not require that the relevant RVs approximate each other. However, the only implication leading from weak convergence back to convergence in probability is given in the following theorem.

Theorem 1.10 If $X_n \xrightarrow[n]{d} C$, where C is a constant, then $X_n \xrightarrow[n]{p} C$.

The following definition and theorem, due to Helly (see Feller, 1979), are basic tools in studying the weak convergence of sequences of DFs.

Definition 1.11 Let {Xn} be a sequence of RVs with corresponding DFs {Fn}. Then, the sequences {Xn} and {Fn} are said to be stochastically bounded if, for each ε > 0, there exists a number c such that P(|Xn| ≥ c) < ε, for all sufficiently large n.

Theorem 1.12
(A) Every sequence of DFs {Fn} possesses a subsequence $\{F_{n_k}\}$ that converges (properly or improperly) to a limit F (remember that improper convergence means that the limit is an extended DF, i.e., F(∞) − F(−∞) < 1).
(B) In order that all such limits be proper, it is necessary and sufficient that {Fn} be stochastically bounded.
(C) In order that $F_n \xrightarrow[n]{w} F(x)$, it is necessary and sufficient that the limit of every convergent subsequence equals F.

We will end this section with an important known theorem, which will be needed in the sequel.

Theorem 1.13 (Khinchin’s type theorem) Let Fn(x) be a sequence of DFs. Furthermore, let
$$F_n(G_n(x)) \xrightarrow[n]{w} F(x),$$
with Gn(x) = an x + bn, an > 0. Then, with $G_n^*(x) = c_n x + d_n$, cn > 0, we have
$$F_n(G_n^*(x)) \xrightarrow[n]{w} F^*(x),$$
where F* is a non-degenerate DF, if and only if
$$G_n^{-1}(G_n^*(x)) = G_n^{-1} \circ G_n^*(x) \xrightarrow[n]{} g(x), \quad \forall x,$$
where g(x) = ax + b, $\frac{c_n}{a_n} \xrightarrow[n]{} a$, $\frac{d_n - b_n}{a_n} \xrightarrow[n]{} b$, and F*(x) = F(g(x)).
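A short worked example may help to fix ideas. It is an illustration chosen here, not one taken from the text: the maximum of n standard exponential RVs, with two assumed choices of normalizing constants. The two limits differ only by a linear change of argument, as Theorem 1.13 predicts.

```latex
\begin{align*}
% DF of the maximum of n standard exponential RVs and a first normalization
F_n(x) &= (1 - e^{-x})^n,\ x > 0, \qquad G_n(x) = x + \log n \quad (a_n = 1,\ b_n = \log n),\\
F_n(G_n(x)) &= \Bigl(1 - \tfrac{e^{-x}}{n}\Bigr)^{n} \xrightarrow[n]{w} F(x) = \exp(-e^{-x}).\\
% A second normalization
G_n^*(x) &= 2x + \log n + 1 \quad (c_n = 2,\ d_n = \log n + 1),\\
\frac{c_n}{a_n} &= 2 \xrightarrow[n]{} a = 2, \qquad \frac{d_n - b_n}{a_n} = 1 \xrightarrow[n]{} b = 1,\\
F_n(G_n^*(x)) &= \Bigl(1 - \tfrac{e^{-(2x+1)}}{n}\Bigr)^{n} \xrightarrow[n]{w} F^*(x) = F(2x + 1) = \exp\bigl(-e^{-(2x+1)}\bigr).
\end{align*}
```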

Theorem 1.13 leads to the following definition:


Definition 1.14 We say that the DFs G(x) and G*(x) are of the same type, under linear transformation, if there are real numbers A > 0 and B such that G*(x) = G(Ax + B).

Clearly the relation between G and G* in Definition 1.14 is symmetrical, reflexive, and transitive. Hence, it gives rise to equivalence classes of DFs. Sometimes we shall indicate a type by one representative of the equivalence class. These facts convince us that probability limit theory basically deals with the types of DFs rather than the DFs themselves.

Remark (Why the weak convergence mode?) It is natural to wonder why we use weak convergence in statistical modelling, although it is the weakest mode of convergence. Barakat and Nigm (1996) investigated the mixing property of order statistics. The notion of mixing sequences of RVs was first introduced by Rényi (1962, 1970). In the sense of Rényi, a sequence {Xn} of RVs is called mixing if, for any event E of positive probability, the conditional DF of Xn under the condition E converges weakly to a non-degenerate DF, which does not depend on E. Barakat and Nigm (1996) showed that any sequence of order statistics (extreme, intermediate, and central), under linear normalization, is mixing. On the other hand, they also showed in the same work that a mixing sequence of RVs X1, X2, ... cannot converge in probability to an RV X∞ that has a non-degenerate DF. This simply means that any sequence of order statistics, particularly the sequence of extreme order statistics, cannot converge in probability to any RV with a non-degenerate DF (convergence in probability to a constant is not excluded), and the only available mode of convergence is weak convergence.

1.2 Statistical methods 1.2.1 Maximum likelihood method A general and flexible method of estimation of the unknown parameter θ within a family F is the maximum likelihood method. Each value of θ ∈ Θ defines a model in F that attaches (potentially) different probabilities (or probability densities) to the observed data. The probability of the observed data as a function of θ is called the likelihood function. Plausible values of θ should have a relatively high likelihood. The principle of maximum likelihood estimation is choosing the model with greatest likelihood, among all the models under consideration, i.e., this is the one that assigns highest probability to the observed data.


To see this in greater detail, we can refer back to the situation in which we have a data set X whose density is defined by some d-dimensional parametric model with parameter θ = (θ1, ..., θd). Write the density evaluated at X = x in the form f(x; θ). The likelihood function for θ based on the data X is just f(x; θ) interpreted as a function of θ. Usually, we work with the log likelihood $\ell_X(\theta) = \log[f(x; \theta)]$. The maximum likelihood estimate (MLE) $\hat\theta$ (of the parameter θ) is the value of θ which maximizes $\ell_X(\theta)$. Usually, we assume $\ell_X(\theta)$ is differentiable with a unique interior maximum, so the MLE is given by solving the likelihood equations
$$\frac{\partial}{\partial\theta_j}\,\ell_X(\theta) = 0, \quad j = 1, ..., d.$$
For a general model indexed by θ, the maximization of $\ell_X(\theta)$ may be performed using a packaged nonlinear optimization subroutine, of which several excellent versions are available.

Example 1.15 Consider the general extreme value DF under linear normalization (GEVL)
$$G_\gamma(x; \mu, \sigma) = \exp\left\{-\left[1 + \gamma\left(\frac{x-\mu}{\sigma}\right)\right]^{-\frac{1}{\gamma}}\right\} \qquad (1.1)$$
defined on {x : 1 + γ(x − μ)/σ > 0}. In this distribution γ is a shape parameter, μ is a location parameter and σ is a scale parameter. This DF is the foremost pillar of the statistical modelling of extreme value data under linear normalization, which will be discussed in detail in Chapter 4. For the GEVL (1.1), the density g(x; μ, σ, γ) is obtained by differentiating $G_\gamma(x; \mu, \sigma)$ with respect to x. The likelihood function based on observations x1, ..., xk is
$$\prod_{i=1}^{k} g(x_i; \mu, \sigma, \gamma)$$
and so the log likelihood is given by
$$\ell_X(\mu, \sigma, \gamma; x) = -k\log\sigma + \sum_{i=1}^{k}\left\{-\left[1 + \gamma\left(\frac{x_i-\mu}{\sigma}\right)\right]^{-\frac{1}{\gamma}} - \left(1 + \frac{1}{\gamma}\right)\log\left[1 + \gamma\left(\frac{x_i-\mu}{\sigma}\right)\right]\right\}, \qquad (1.2)$$


provided {1 + γ(xi − μ)/σ > 0} for each i; otherwise, (1.2) is undefined. The following practical points should be considered for this example:
1. Although the maximization is unconstrained, there are some practical constraints. For example, (1.2) requires σ > 0 as well as {1 + γ(xi − μ)/σ > 0} for each i. It is advisable to test explicitly for such violations and to set $-\ell_X(\theta)$ equal to some very large value if the conditions are indeed violated.
2. All Newton-type routines require the user to supply starting values, but the importance of good starting values can be overemphasized. Simple guesses usually suffice, e.g., in (1.2), one might set μ and σ equal to the sample mean and sample standard deviation, respectively, with γ equal to some crude guess value such as 0.1. However, it is important to check that the initial conditions are feasible, and this is sometimes not so easy to achieve.
3. In cases of doubt about whether a true maximum has been found, the algorithm may be re-run from different starting values. If the results are highly sensitive to starting values, this indicates that the problem may have multiple local maxima, or alternatively that a mistake has been made in programming.

A few further comments are necessary regarding the specific application of numerical MLE to the GEVL family. There is a singularity in the likelihood for γ < 0, as μ → Xmax = max(X1, ..., Xk) in (1.2), and the effect is that $\ell_X(\theta) \to \infty$. However, in most practical cases, there is a local maximum (of $\ell_X(\theta)$) that is some distance from the singularity, and the presence of the singularity does not interfere with the convergence of the nonlinear optimization algorithm to the local maximum. In this case, the correct procedure is to ignore the singularity and use the local maximum. However, it is possible that no local maximum exists and the singularity dominates. In this case, MLE fails and some other method must be sought.
However, this very rarely happens with environmental data. Finally, we should say something about the theoretical status of the approximations involved. The asymptotic theory of MLE for the GEVL model is valid provided γ > −0.5 (cf. Smith, 1985). Cases with γ ≤ −0.5 correspond to an extremely short upper tail and hardly ever occur in environmental applications. A more serious problem is that even when γ > −0.5, the asymptotic theory may give rather poor results with small sample sizes; see Hosking et al. (1985). In summary: it is possible that MLEs will fail either numerically or in terms of their asymptotic properties, especially if the sample size is small. The user should be aware of these possible difficulties but should not be


deterred from using these extremely powerful and general methods. For more details about this subject, see Prescott and Walden (1980, 1983), Mached (1989), and Smith (1985).

An alternative method for quantifying the uncertainty in the MLE is based on the deviance function, or the likelihood ratio test (LRT) (see Theorems 2.6 and 2.7 in Coles, 2001), which is defined by
$$\mathrm{LRT} = -2(\log L_0 - \log L_1), \qquad (1.3)$$
where log L0 and log L1 are the maximized values of the log-likelihood under the null and alternative hypotheses, respectively. Under the null hypothesis, the statistic LRT is asymptotically distributed as $\chi^2_n$, with the degrees of freedom n equal to the number of restrictions under the null hypothesis. The method of the LRT is summarized as follows:
1. Let log L0 and log L1 be the maximized values of the log-likelihood for models M0 and M1, respectively.
2. Test the validity of model M0 relative to M1 at a suitably chosen level of significance α. Reject M0 in favour of M1 if LRT = −2(log L0 − log L1) > cα, where cα is the (1 − α) quantile of the $\chi^2_n$ distribution.
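The practical points of Example 1.15 and the LRT recipe can be sketched together in a few lines. The code below is a minimal illustration under stated assumptions, not a recommended implementation: the data are simulated, the helper names (`neg_loglik`, `random_search`, `simulate_gev`) are hypothetical, and a crude accept-if-better random search stands in for a packaged nonlinear optimizer. The restricted model M0 fixes γ = 0, for which the Gumbel limit of (1.2) is used, so the LRT has one degree of freedom and its chi-square tail probability can be written with the complementary error function.

```python
import math
import random

BIG = 1e10  # "very large value" returned when the constraints are violated

def neg_loglik(theta, xs):
    """-l_X(theta) from (1.2); the Gumbel limit is used when gamma is ~0."""
    mu, sigma, gamma = theta
    if sigma <= 0.0:
        return BIG
    total = len(xs) * math.log(sigma)
    try:
        for x in xs:
            t = (x - mu) / sigma
            if abs(gamma) < 1e-8:          # Gumbel case (limit gamma -> 0)
                total += t + math.exp(-t)
            else:
                z = 1.0 + gamma * t
                if z <= 0.0:               # support constraint violated
                    return BIG
                total += z ** (-1.0 / gamma) + (1.0 + 1.0 / gamma) * math.log(z)
    except OverflowError:                  # numerically infeasible point
        return BIG
    return total

def random_search(objective, start, free, rng, iters=5000, step=0.05):
    """Crude stand-in for a packaged optimizer: perturb the free coordinates
    and keep the candidate only if it lowers the objective."""
    best, best_val = list(start), objective(start)
    for _ in range(iters):
        cand = list(best)
        for j in free:
            cand[j] += rng.gauss(0.0, step)
        val = objective(cand)
        if val < best_val:
            best, best_val = cand, val
    return best, best_val

def simulate_gev(k, mu, sigma, gamma, rng):
    """Inverse-transform sampling from (1.1), for gamma != 0."""
    out = []
    while len(out) < k:
        u = rng.random()
        if u > 0.0:
            out.append(mu + sigma * ((-math.log(u)) ** (-gamma) - 1.0) / gamma)
    return out

rng = random.Random(1)
xs = simulate_gev(100, 0.0, 1.0, 0.2, rng)   # simulated, illustrative data
mean = sum(xs) / len(xs)
sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (len(xs) - 1))

def obj(theta):
    return neg_loglik(theta, xs)

# M0: gamma fixed at 0 (Gumbel); M1: gamma free, started from the M0 fit.
fit0, nll0 = random_search(obj, [mean, sd, 0.0], free=(0, 1), rng=rng)
fit1, nll1 = random_search(obj, fit0, free=(0, 1, 2), rng=rng)

lrt = -2.0 * ((-nll0) - (-nll1))             # (1.3)
p_value = math.erfc(math.sqrt(lrt / 2.0))    # chi-square tail, 1 d.f.
print(fit0, fit1, lrt, p_value)
```

Starting the M1 search from the M0 fit guarantees that the reported LRT is non-negative, but the random search gives no assurance of having reached the true maximum, so re-running from other starting values (point 3 above) remains advisable.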

1.2.2 Kolmogorov-Smirnov (K-S) test

In statistics, the K-S test is a nonparametric test of the equality of continuous one-dimensional DFs that can be used to compare a sample with a reference DF (one-sample K-S test), or to compare two samples (two-sample K-S test). It is named after Andrey Kolmogorov and Nikolai Smirnov. The K-S statistic quantifies a distance between the empirical DF of the sample and the DF of the reference distribution, or between the empirical DFs of two samples. The null distribution of this statistic is calculated under the null hypothesis that the sample is drawn from the reference distribution $\hat F(x)$ (in the one-sample case) or that the samples are drawn from the same distribution (in the two-sample case). In each case, the distributions considered under the null hypothesis are continuous DFs, but are otherwise unrestricted. Let X1, X2, ..., Xn be a random sample of independent and identically distributed RVs with DF F0 under the null hypothesis H0. Then the K-S test statistic Dn is defined by
$$D_n = \sup_x |F_0(x) - F_n(x)|,$$


where $\sup_x$ is the supremum of the set of distances and Fn(x) is the empirical DF, which increases by 1/n at each data value. Namely,
$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I_{[-\infty,x]}(X_i),$$
where $I_{[-\infty,x]}(X_i)$ is the indicator function, which is equal to 1 if Xi ≤ x and is equal to 0 otherwise. By the Glivenko-Cantelli theorem, if the sample comes from the DF F0(x), then the statistic Dn converges to 0 almost surely as n goes to infinity. Kolmogorov strengthened this result by effectively providing the rate of this convergence. In practice, the statistic requires a relatively large number of data points to properly reject the null hypothesis. The K-S statistic has been used for goodness-of-fit testing for continuous populations for decades, although other tests have made slight improvements in terms of power. The appeal of the K-S test includes the straightforward computation of the test statistic and the distribution-free characteristic of Dn. Its drawback is that the DF of Dn, under the null hypothesis (i.e., the assumption that the data were drawn from a population with DF F0(x)), is difficult to determine, leaving one to calculate critical values with various approximation methods. An algorithm for computing the distribution of Dn, for small to moderate values of n, was given by Drew et al. (2000). As the supremum must be achieved at a data value, the computational formula for Dn is $D_n = \max(D_n^+, D_n^-)$, where
$$D_n^+ = \sup_x [F_n(x) - F_0(x)] = \max\left\{\max_{1\le i\le n}\left[\frac{i}{n} - F_0(X_{i:n})\right], 0\right\},$$
$$D_n^- = \sup_x [F_0(x) - F_n(x)] = \max\left\{\max_{1\le i\le n}\left[F_0(X_{i:n}) - \frac{i-1}{n}\right], 0\right\}$$
and X1:n, X2:n, ..., Xn:n are the order statistics corresponding to the random sample X1, X2, ..., Xn. The maximum positive difference, $D_n^+$, detects the largest vertical deviation between the two DFs where the fitted DF F0(x) is below the empirical DF. Likewise, the maximum negative difference, $D_n^-$, detects the largest vertical deviation between the two DFs where the fitted DF is above the empirical DF. The smallest value of Dn that can be achieved is 1/(2n), which corresponds to the fitted DF F0(x) bisecting all the risers of the steps associated with the empirical DF. Assume we have the random sample X1, X2, ..., Xn and the hypothesis-testing situation H0 : FX(x) = F0(x), for all x, where F0(x) is a completely specified continuous DF. The differences between FX(x) and F0(x) should be


small for all x except for sampling variation, if the null hypothesis is true. For the usual two-sided goodness-of-fit alternative, H1 : FX(x) ≠ F0(x), for some x, large absolute values of these deviations tend to discredit the hypothesis. In this book, almost all computations are carried out with the Matlab package, where the K-S test returns four outputs [H, P, KSSTAT, CV]. Namely, H is equal to 0 or 1, P is the p-value, KSSTAT is the maximum difference between the data and the fitted curve, and CV is a critical value. Therefore,
• We accept H0, if H = 0, KSSTAT ≤ CV and P > level of significance,
• We reject H0, if H = 1, KSSTAT > CV and P ≤ level of significance.
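The computational formula for the one-sample statistic is short enough to write out directly. The sketch below is only an illustration in Python (the book's own computations use Matlab); the function name `ks_statistic` and the toy sample are assumptions made for the example, and no p-value or critical value is computed.

```python
def ks_statistic(sample, cdf):
    """Return (D_n, D_n_plus, D_n_minus) for a sample and a fully specified
    continuous reference DF `cdf`."""
    xs = sorted(sample)                      # order statistics X_{1:n}, ..., X_{n:n}
    n = len(xs)
    # enumerate() is 0-based, so i + 1 plays the role of i in i/n,
    # and i plays the role of i - 1 in (i - 1)/n.
    d_plus = max(max((i + 1) / n - cdf(x) for i, x in enumerate(xs)), 0.0)
    d_minus = max(max(cdf(x) - i / n for i, x in enumerate(xs)), 0.0)
    return max(d_plus, d_minus), d_plus, d_minus

def uniform_cdf(x):
    """DF of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)

d, d_plus, d_minus = ks_statistic([0.7, 0.1, 0.4], uniform_cdf)
print(d, d_plus, d_minus)
```

For this toy sample the largest gap occurs at the largest observation, where the empirical DF jumps to 1 while the uniform DF is only 0.7.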

1.2.3 Genetic algorithms (GA)

A genetic algorithm (GA) is a search-based optimization technique based on the principles of genetics and natural selection. It is frequently used to find optimal or near-optimal solutions to difficult problems, which otherwise would take a lifetime to solve, in different types of research and in machine learning. GA was developed by John Holland and his students and colleagues at the University of Michigan. The basic concept of GA is designed to simulate processes in natural systems necessary for evolution, especially those that follow the principle of the survival of the fittest first laid down by Charles Darwin; as such, they represent an intelligent exploitation of a random search within a defined search space to solve a problem. Basically, several random sets of parameters are applied to an algorithm and a fitness value (optimization value) is calculated for each. On the basis of this fitness value, the best sets are mixed together (by a combination of Selection, Crossover, and Mutation) and new sets are again applied to the algorithm until an optimal parameter is obtained. This effect is usually obtained by breaking the genetic algorithm into a few smaller parts. GA provides an alternative method for solving problems and can outperform traditional methods on many of them. Many real-world problems involve finding optimal parameters, which might prove difficult for traditional methods but is well suited to GA. The algorithm begins with a set of solutions (represented by chromosomes), which is called a population. Solutions from one population are taken and used to form a new population. This is motivated by the hope that the new population will be better than the old one. Solutions that are selected to form new solutions (offspring) are selected according to their fitness: the more suitable they are, the more chances they have to reproduce. This is repeated until some condition (for example, the number of populations or the improvement of the best solution) is satisfied (see Kramer, 2017).

Working principle: The GA is an iterative optimization procedure. Instead of working with a single solution in each iteration, a GA works with a number of solutions (collectively known as a population) in each iteration. In the absence of any knowledge of the problem domain, a GA begins its search from a random population of solutions. If a termination criterion is not satisfied, three different operators (reproduction, crossover, and mutation) are applied to update the population of strings. One iteration of these three operators is known as a generation in the parlance of GA. Since the representation of a solution in the GA is similar to a natural chromosome and GA operators are similar to genetic operators, the above procedure is called a genetic algorithm; see Chatterjee et al. (1996) and Chatterjee and Laudato (1997).

Outline of the basic GA: The basic steps of a GA can be given as follows:
1. Start: Generate a random population of n chromosomes (a set of suitable solutions).
2. Fitness: Evaluate the fitness f(x) of each chromosome x in the population.
3. New population: Create a new population by repeating the following steps until the new population is complete.
• Selection: Select two parent chromosomes from the population according to their fitness (the better the fitness, the bigger the chance to be selected).
• Crossover: With a crossover probability, cross over the parents to form new offspring (children). If no crossover is performed, the offspring will be an exact copy of the parents.
• Mutation: With a mutation probability, mutate the new offspring at each locus (position in a chromosome).
4. Use the newly generated population for a further run of the algorithm.
5. If the end condition is satisfied, stop, and return the best solution in the current population.
6. Go to Step 2.

Fitness function: The fitness function is a particular type of objective function that prescribes the optimality of a solution (that is, a chromosome) in a GA, so that a particular chromosome may be ranked against all the other chromosomes. Optimal chromosomes, or at least chromosomes that are more optimal, are allowed to breed and mix their data sets by any of several techniques, producing a new generation that will (hopefully) be even better.


Here our target is to maximize the complete log likelihood; hence the natural fitness function is the complete log-likelihood function. Below we list some advantages and disadvantages of the GA:

Advantages:
• It can be applied to any optimization problem that can be described with the chromosome encoding.
• It solves problems with multiple solutions.
• Since the GA execution technique is not dependent on the error surface, we can solve multi-dimensional, non-differentiable, non-continuous, and even nonparametric problems.
• Structural GA gives us the possibility to solve the solution-structure and solution-parameter problems at the same time.
• The GA method is easy to understand and demands almost no knowledge of mathematics. Genetic algorithms are easily transferred to existing simulations and models.

Disadvantages:
• Certain optimization problems (called variant problems) cannot be solved by means of genetic algorithms. This occurs due to poorly known fitness functions that generate bad chromosome blocks despite the fact that only good chromosome blocks cross over.
• There is no absolute assurance that a GA will find a global optimum. This happens very often when populations have a lot of subjects.
• Like other artificial intelligence techniques, the GA cannot assure constant optimization response times. Moreover, the difference between the shortest and the longest optimization response times is much larger than with conventional gradient methods. This unfortunate property limits the use of genetic algorithms in real-time applications.
• Genetic algorithm applications in controls, which are performed in real time, are limited because of random solutions and convergence. In other words, the entire population is improving, but this cannot be said for an individual within this population. Therefore, it is unreasonable to use genetic algorithms for online controls in real systems without testing them first on a simulation model.
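The outline of the basic GA can be sketched as follows, with a log-likelihood as the fitness function. This is a minimal sketch under several assumptions that are not in the text: chromosomes are real-valued vectors, selection is by small tournaments rather than fitness-proportional sampling, crossover is a random convex combination of the parents, and the best chromosome is copied unchanged into each new generation (elitism). The fitness used for the demonstration is a normal log likelihood with chromosome (mu, log sigma) on simulated data; all names and tuning values are illustrative.

```python
import math
import random

def normal_loglik(theta, xs):
    """Fitness: normal log likelihood, chromosome theta = (mu, log_sigma)."""
    mu, log_sigma = theta
    sigma = math.exp(log_sigma)
    total = 0.0
    for x in xs:
        z = (x - mu) / sigma
        total += -0.5 * math.log(2.0 * math.pi) - log_sigma - 0.5 * z * z
    return total

def tournament(pop, fits, rng, size=3):
    """Selection: the fittest of `size` randomly drawn chromosomes."""
    idx = rng.sample(range(len(pop)), size)
    return pop[max(idx, key=lambda i: fits[i])]

def run_ga(fitness, bounds, rng, pop_size=40, generations=60,
           p_cross=0.8, p_mut=0.1):
    # Start: random population inside the given bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []                                  # best fitness per generation
    for _ in range(generations):
        fits = [fitness(c) for c in pop]          # Fitness step
        i_best = max(range(pop_size), key=lambda i: fits[i])
        history.append(fits[i_best])
        new_pop = [list(pop[i_best])]             # elitism (an added assumption)
        while len(new_pop) < pop_size:
            a = tournament(pop, fits, rng)
            b = tournament(pop, fits, rng)
            if rng.random() < p_cross:            # Crossover
                w = rng.random()
                child = [w * u + (1.0 - w) * v for u, v in zip(a, b)]
            else:
                child = list(a)                   # exact copy of a parent
            for j, (lo, hi) in enumerate(bounds): # Mutation at each locus
                if rng.random() < p_mut:
                    child[j] += rng.gauss(0.0, 0.1 * (hi - lo))
            new_pop.append(child)
        pop = new_pop
    fits = [fitness(c) for c in pop]
    i_best = max(range(pop_size), key=lambda i: fits[i])
    history.append(fits[i_best])
    return pop[i_best], fits[i_best], history

rng = random.Random(7)
data = [rng.gauss(2.0, 1.5) for _ in range(50)]   # simulated, illustrative data
best, best_fit, history = run_ga(lambda c: normal_loglik(c, data),
                                 bounds=[(-10.0, 10.0), (-3.0, 3.0)], rng=rng)
print(best[0], math.exp(best[1]), best_fit)
```

Because of the elitism step, the best fitness recorded in `history` never decreases from one generation to the next; without it, the outline above gives no such guarantee.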
1.3 Bootstrap technique Bootstrapping is a general approach to statistical inference based on building a sampling distribution for a statistic by re-sampling from the data at hand. The term “bootstrapping,” due to Efron (1979), is an allusion to the


expression “pulling oneself up by one’s bootstraps”; in this case, using the sample data as a population from which repeated samples are drawn. At first blush, the approach seems circular, but it has been shown to be sound. Two S libraries for bootstrapping are associated with extensive treatments of the subject: the bootstrap library of Efron and Tibshirani (1993) and the boot library of Davison and Hinkley (1997). There are several forms of the bootstrap and, additionally, several other re-sampling methods that are related to it, such as jackknifing, cross-validation, randomization tests, and permutation tests. The bootstrap method has been shown to be successful in many situations and is accepted as an alternative to the asymptotic methods. Consider the problem of estimating the variability of location estimates by the bootstrap method. If we view the observations x1, x2, ..., xn as realizations of independent RVs with common DF F, it is appropriate to investigate the variability and sampling distribution of a location estimate calculated from a sample of size n. Suppose we denote the location estimate as $\hat\theta$. We would like to know the sampling distribution of $\hat\theta$, but we are faced with two problems:
1. We don’t know F, and
2. Even if we did know F, $\hat\theta$ may be such a complicated function of x1, x2, ..., xn that finding its distribution would exceed our analytic abilities.

First, we address the second problem. Suppose we know F. How could we find the DF of $\hat\theta$ without going through incredibly complicated analytic calculations? The computer comes to our rescue: we can do it by simulation. We generate many samples, say B in number, of size n from F; then, from each sample we calculate the value of $\hat\theta$. The empirical DF of the resulting values $\hat\theta^*_1, ..., \hat\theta^*_B$ is an approximation to the DF of $\hat\theta$, which is good if B is very large. All this would be well and good if we knew F, but we don’t. So, what do we do? We will consider two different cases.
In the first case, F is known up to an unknown parameter ϕ, i.e., F(x|ϕ). Without knowing ϕ, the above approximation cannot be used. The idea of the parametric bootstrap is to simulate data from $F(x|\hat\varphi)$, where $\hat\varphi$ should be a good estimate of ϕ. In this way, it utilizes the structure of F. In the second case, F is completely unknown. The idea of the nonparametric bootstrap is to simulate data from the empirical DF Fn. Here, Fn is a discrete DF that gives probability 1/n to each observed value x1, ..., xn. A sample of size n from Fn is thus a sample of size n drawn with replacement from the collection x1, ..., xn. The standard deviation of $\hat\theta$ is then estimated by
$$S_{\hat\theta} = \sqrt{\frac{1}{B}\sum_{i=1}^{B}(\theta_i^* - \bar\theta^*)^2},$$
where $\theta_1^*, ..., \theta_B^*$ are produced from B samples, each of size n, from the collection x1, x2, ..., xn, and $\bar\theta^*$ is their mean. Now, we rewrite the above nonparametric bootstrap procedure as the following steps:
Step 1: Construct Fn from the sample by placing a probability of 1/n at each point, x1, x2, ..., xn, of the sample. This is the empirical DF of the sample, which is the nonparametric MLE of the population distribution, F.
Step 2: From Fn, draw a random sample of size n with replacement. This is a re-sample.
Step 3: Calculate the statistic of interest, Tn, for this re-sample, yielding $T_n^*$.
Step 4: Repeat Steps 2 and 3 B times, where B is a large number, in order to create B re-samples.
Step 5: Construct the relative frequency histogram from the B values of $T_n^*$ by placing a probability of 1/B at each point, $T_n^{*1}, T_n^{*2}, ..., T_n^{*B}$. The distribution obtained is the bootstrapped estimate of the sampling distribution of Tn. This distribution can be used to make inferences about the parameter θ, which is to be estimated by Tn.

Refer to Efron and Tibshirani (1993) for detailed discussions. The bootstrap technique has been extended, modified, and refined to handle a wide variety of problems, including confidence intervals, hypothesis tests, linear and nonlinear regression, and time series analysis. Some of these problems will be briefly discussed in the following. Consider a sample of N independent observations (indexed by n = 1, ..., N) of a dependent variable y = (y1, ..., yN) and an independent variable x = (x1, ..., xN). A paired bootstrap sample is obtained by independently drawing N pairs (xi, yi) from the observed sample with replacement. The bootstrap sample has the same number of observations; however, some observations appear several times and others never. The bootstrap involves drawing a large number B of bootstrap samples. An individual bootstrap sample is denoted by $(x_b^*, y_b^*)$.

Bootstrap standard errors

The empirical standard deviation of a series of bootstrap replications of $\hat\theta$ can be used to approximate the standard error $SE(\hat\theta)$ of an estimator $\hat\theta$.
• Draw B independent bootstrap samples $(x_b^*, y_b^*)$ of size N from (x, y). Usually B = 100 replications is sufficient.
• Estimate the parameter θ of interest for each bootstrap sample: $\hat\theta^*_{N,b}$, b = 1, 2, ..., B.
• Estimate $SE(\hat\theta)$ by
$$\sqrt{\frac{1}{B-1}\sum_{b=1}^{B}(\hat\theta^*_{N,b} - \hat\theta^*_N)^2},$$
where $\hat\theta^*_N = \frac{1}{B}\sum_{b=1}^{B}\hat\theta^*_{N,b}$.
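The re-sampling steps and the standard error estimate can be sketched in a few lines. The example below is illustrative only: it uses a single simulated variable rather than pairs (x, y), takes the sample median as the statistic of interest, and the function names are hypothetical.

```python
import math
import random
import statistics

def bootstrap_replicates(data, stat, B, rng):
    """Steps 2 to 4: draw B re-samples of size n with replacement from the
    data and evaluate the statistic on each of them."""
    n = len(data)
    return [stat([rng.choice(data) for _ in range(n)]) for _ in range(B)]

def bootstrap_se(reps):
    """Empirical standard deviation of the replications (divisor B - 1)."""
    B = len(reps)
    mean_rep = sum(reps) / B
    return math.sqrt(sum((r - mean_rep) ** 2 for r in reps) / (B - 1))

rng = random.Random(3)
data = [rng.expovariate(1.0) for _ in range(40)]   # simulated, illustrative data
reps = bootstrap_replicates(data, statistics.median, B=100, rng=rng)
se_median = bootstrap_se(reps)
print(statistics.median(data), se_median)
```

Here `rng.choice` draws one observation with probability 1/n, so each re-sample is a sample of size n from the empirical DF.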

Bootstrap confidence intervals

We can construct a two-sided equal-tailed (1 − α) confidence interval for an estimate $\hat\theta$ from the empirical DF of a series of bootstrap replications. The $\frac{\alpha}{2}$ and the $1 - \frac{\alpha}{2}$ empirical percentiles of the bootstrap replications are used as lower and upper confidence bounds. This procedure is called a percentile bootstrap.
• Draw B independent bootstrap samples $(x_b^*, y_b^*)$ of size N from (x, y). Usually B = 1000 replications is sufficient.
• Estimate the parameter θ of interest for each bootstrap sample, $\hat\theta^*_{N,b}$, b = 1, 2, ..., B.
• Order the bootstrap replications of $\hat\theta$ such that $\hat\theta^*_{N(1)} \le \hat\theta^*_{N(2)} \le ... \le \hat\theta^*_{N(B)}$.

The lower and upper confidence bounds are the $[\frac{\alpha}{2}B]$th and $[(1-\frac{\alpha}{2})B]$th ordered elements, where [x], as usual, denotes the integer part of x. The estimated (1 − α) confidence interval of $\hat\theta$ is $[\hat\theta^*_{N([\frac{\alpha}{2}B])}, \hat\theta^*_{N([(1-\frac{\alpha}{2})B])}]$. Note that the confidence interval need not be symmetrical.

Bootstrap hypothesis tests

The approximate confidence interval from the above section can be used to perform an approximate two-sided test of a null hypothesis of the form H0 : θ = θ0. The null hypothesis is rejected at the significance level α if θ0 lies outside the two-tailed (1 − α) confidence interval.

Bootstrap t confidence intervals

If the bootstrap distribution of a statistic shows a normal shape and small bias, we can get a confidence interval for the parameter by using the bootstrap standard error and the familiar t distribution.
1. Estimate the t-value of $\hat\theta$ for each bootstrap sample:
$$t^*_{N,b} = \frac{\hat\theta^*_{N,b} - \hat\theta}{\widehat{SE}^*_{N,b}}, \quad b = 1, 2, ..., B,$$
where $\hat\theta^*_{N,b}$ and $\widehat{SE}^*_{N,b}$ are estimates of the parameter θ and its standard error using the bootstrap sample.

2. Order the bootstrap replications of t such that $t^*_{N(1)} \le t^*_{N(2)} \le ... \le t^*_{N(B)}$. The lower and upper critical values (which are then the $[\frac{\alpha}{2}B]$th and $[(1-\frac{\alpha}{2})B]$th elements, respectively) are
$$t_{\frac{\alpha}{2}} = t^*_{N([\frac{\alpha}{2}B])} \quad \text{and} \quad t_{1-\frac{\alpha}{2}} = t^*_{N([(1-\frac{\alpha}{2})B])}, \text{ respectively.}$$

These critical values can now be used in otherwise usual t-tests for θ. In addition, the bootstrap-t procedure can create confidence intervals as in asymptotic theory, but using bootstrap critical values instead of the ones from the standard normal tables:
$$\left[\hat\theta - t^*_{N([(1-\frac{\alpha}{2})B])}\,\widehat{SE}(\hat\theta),\ \hat\theta - t^*_{N([\frac{\alpha}{2}B])}\,\widehat{SE}(\hat\theta)\right],$$
where $\widehat{SE}(\hat\theta)$ is the standard error estimated from the original sample.

The confidence interval from bootstrap-t is not necessarily better than the percentile method. However, it is consistent with bootstrap-t hypothesis testing. It is worth mentioning that Efron’s bootstrap does not approximate the distribution of some statistics at all. The maximum of a set of RVs, and actually all order statistics, are examples for which Efron’s bootstrap fails to be consistent. This subject will be discussed in detail in Chapter 3.
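The percentile interval and the associated two-sided test can be sketched as follows. The example is illustrative only: the data are simulated, the statistic is the sample mean, the function names are hypothetical, and a small tolerance is added before taking the integer part to guard against floating-point round-off in products such as (1 − α/2)B.

```python
import math
import random

def percentile_ci(reps, alpha):
    """Percentile bootstrap interval: the [alpha/2 * B]th and
    [(1 - alpha/2) * B]th ordered replications (ranks counted from 1)."""
    ordered = sorted(reps)
    B = len(ordered)
    lo_rank = max(int(math.floor(alpha / 2.0 * B + 1e-9)), 1)
    hi_rank = int(math.floor((1.0 - alpha / 2.0) * B + 1e-9))
    return ordered[lo_rank - 1], ordered[hi_rank - 1]

def reject_null(theta0, reps, alpha):
    """Two-sided test: reject H0: theta = theta0 if theta0 lies outside
    the (1 - alpha) percentile interval."""
    lower, upper = percentile_ci(reps, alpha)
    return theta0 < lower or theta0 > upper

rng = random.Random(11)
data = [rng.gauss(5.0, 2.0) for _ in range(30)]     # simulated, illustrative data
n = len(data)
reps = []
for _ in range(1000):                               # B = 1000 re-samples
    resample = [rng.choice(data) for _ in range(n)]
    reps.append(sum(resample) / n)                  # statistic: sample mean

lower, upper = percentile_ci(reps, alpha=0.05)
print(lower, upper, reject_null(0.0, reps, 0.05))
```

With B = 1000 and α = 0.05 the bounds are the 25th and 975th ordered replications, and the resulting interval need not be symmetric about the sample mean.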

2 Asymptotic theory of order statistics: A historical retrospective

Most of this chapter provides an overview of the principal results in, and related to, the limit theory of order statistics under linear and nonlinear normalizing transformations. In addition to the increased emphasis on the asymptotic theory of order statistics, we briefly discuss some generalizations of the order statistics model, such as generalized order statistics as a unification of the distribution theory for order statistics and upper record values, among other ordered RVs. The chapter will end with a review of applications of extreme value theory in environmental studies. Although the material of this chapter is mathematically rigorous, the prerequisites for reading it are minimal. This is because the proofs of the theorems, which require going somewhat beyond basic calculus and basic probability theory, are almost entirely omitted.

2.1 Order statistics

The theory of order statistics may be regarded as roughly half a century old. The rigorous demand of the natural sciences to study the statistical aspects of floods and drought durability, fatigue breaking strength of materials, air pollution, corrosion, natural disasters, selection problems, etc., called for a further development of order statistics theory. The preceding list of applications of order statistics is not exhaustive, but it should serve to convince the reader that this text will not focus on some abstract concept of little practical utility. Order statistics were discussed more or less extensively in a number of earlier books, but the first book devoted exclusively to this subject was a book edited by Gumbel (1958). Developments in order statistics until 1962 were synthesized in an edited volume by Sarhan and Greenberg (1962). An encyclopedic article on order statistics was written by David (1981). Arnold


et al. (1992) present a good textbook on order statistics at an introductory level. An excellent survey of the developments concerning the theory of order statistics and its applications was made by David and Nagaraja (2003).

Observations on a chance variable usually occur in random order. In some fields, however, for example in medical science, biological science, physics, engineering, and manufacturing, the data can be interpreted as lifetimes. In this case, they afford an ideal application of order statistics, since by the nature of the experiment the observations arrive in ascending order of magnitude. The rigorous definition of the model of order statistics is as follows:

Definition 2.1 (the definition of order statistics) Let the vector $x = (x_1, x_2, \dots, x_n) \in \mathbb{R}^n$, $n \ge 2$, denote the observed value of the random vector $X = (X_1, X_2, \dots, X_n)$, where $\mathbb{R}^n$ is the n-dimensional Euclidean space. Let the function $\phi(\cdot): \mathbb{R}^n \to \mathbb{R}^n$ be given by $\phi(x) = (x_{1:n}, x_{2:n}, \dots, x_{n:n})$, $x \in \mathbb{R}^n$, where $(x_{1:n}, x_{2:n}, \dots, x_{n:n})$ is the vector in $\mathbb{R}^n$ obtained from the vector $x$ by arranging its components from the smallest to the largest, i.e., the components $x_{1:n} \le x_{2:n} \le \dots \le x_{n:n}$ are realizations of the statistics $X_{1:n} \le X_{2:n} \le \dots \le X_{n:n}$, respectively. The statistics $X_{1:n}, X_{2:n}, \dots, X_{n:n}$ are called the order statistics corresponding to the random vector $X$. In particular, $X_{k:n}$ is called the kth order statistic. The statistic $X_{1:n}$ is called the minimum (the smallest value) and the statistic $X_{n:n}$ is called the maximum (the largest value). The minimum and the maximum will be called extremes regardless of n being large or small.

When the RVs $X_1, X_2, \dots, X_n$ are independent and have the same DF $F(x)$, the DF of the kth order statistic is given by
\[ \Phi_{k:n}(x) = P(X_{k:n} < x) = I_{F(x)}(k, n-k+1) = E_{F(x)}(n, k), \quad |x| < \infty, \]
where
\[ I_x(a, b) = \frac{1}{\beta(a, b)} \int_0^x t^{a-1}(1-t)^{b-1}\,dt, \qquad \beta(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)} = \frac{(a-1)!\,(b-1)!}{(a+b-1)!}, \]
and
\[ E_x(a, b) = \sum_{i=b}^{a} \binom{a}{i} x^i (1-x)^{a-i}. \]
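The binomial-sum form of the DF of the kth order statistic can be evaluated directly. The following minimal sketch uses a hypothetical helper `order_stat_cdf` that takes the value F(x) as input, and compares it with the closed forms for the minimum and the maximum.

```python
from math import comb


def order_stat_cdf(F_x, k, n):
    """P(X_{k:n} < x) = sum_{i=k}^{n} C(n, i) F(x)^i (1 - F(x))^(n - i),
    where F_x is the value F(x) of the common DF at the point x."""
    return sum(comb(n, i) * F_x ** i * (1.0 - F_x) ** (n - i)
               for i in range(k, n + 1))


n, F_x = 10, 0.3
# Minimum: the sum reduces to 1 - (1 - F(x))^n.
print(order_stat_cdf(F_x, 1, n), 1 - (1 - F_x) ** n)
# Maximum: the sum reduces to F(x)^n.
print(order_stat_cdf(F_x, n, n), F_x ** n)
```

For large n the terms of this sum can underflow, so a log-space evaluation or an incomplete beta routine is preferable in practice.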

As a special case, we have $\Phi_{1:n}(x) = P(X_{1:n} < x) = 1 - (1 - F(x))^n$, which is the DF of the minimum, and $\Phi_{n:n}(x) = P(X_{n:n} < x) = F^n(x)$, which is the DF of the maximum. For fixed $k \ge 1$ and $n \to \infty$, $X_{k:n}$ and $X_{n-k+1:n}$ will be called the lower and upper kth extremes, respectively. By using the well-known equality


\[ \max(x_1, x_2, \dots, x_n) = -\min(-x_1, -x_2, \dots, -x_n), \]
the theory for lower extremes is identical to that for upper extremes. For this reason, some statements will be made for either $X_{k:n}$ or $X_{n-k+1:n}$. A sequence $\{X_{k:n}\}$ is called a sequence of order statistics with variable rank if $1 \le k = k_n \le n$ and $k_n \xrightarrow[n]{} \infty$. Here, we have the following two cases:

(1) If $\frac{k_n}{n} \xrightarrow[n]{} 0$ (or $\frac{k_n}{n} \xrightarrow[n]{} 1$), then $X_{k:n}$ is called the lower intermediate order statistic (or the upper intermediate order statistic).
(2) If $\frac{k_n}{n} \xrightarrow[n]{} \lambda$ $(0 < \lambda < 1)$, then $X_{k:n}$ is called the central order statistic.

The following diagram assigns the relative location of each of the extreme, intermediate, and central terms in the sequence of the order statistics:
\[
\begin{array}{ccccc}
\text{L.E.} & \text{L.I.} & \text{C.} & \text{U.I.} & \text{U.E.} \\
r \text{ is fixed}, & \min(r, n) \xrightarrow[n]{} \infty, & \min(r, n) \xrightarrow[n]{} \infty, & r = n - r' + 1, & r = n - r' + 1, \\
\frac{r}{n} \xrightarrow[n]{} 0 & \frac{r}{n} \xrightarrow[n]{} 0 & \frac{r}{n} \xrightarrow[n]{} \lambda, & r' \xrightarrow[n]{} \infty, & r' \text{ is fixed}, \\
 & & \lambda \in (0, 1) & \frac{r}{n} \xrightarrow[n]{} 1 & \frac{r}{n} \xrightarrow[n]{} 1
\end{array}
\]
In the above diagram the words upper, lower, extreme, intermediate, and central are abbreviated U., L., E., I. and C., respectively. A remarkable example of central order statistics is the pth sample quantile, where $k_n = [np]$, $0 < p < 1$, and $[x]$ denotes the largest integer not exceeding $x$; see David (1981). Most of the aforesaid applications of order statistics concern the order statistics with fixed and central ranks. However, intermediate order statistics also have many applications. For example, in the theory of statistics, they can be used to estimate probabilities of future extreme observations and to estimate tail quantiles of the underlying distribution that are extreme relative to the available sample size; see Pickands (1975). Many authors, e.g., Teugels (1981) and Mason (1982), have also found estimates that are based, in part, on intermediate order statistics.
2.2 Extreme value theory under linear normalization

One of the most important parts of the field of order statistics is asymptotic theory, which plays an important role in all applications of order statistics, because the exact distributions of order statistics are often analytically simple but numerically complicated. By contrast, asymptotic distributions are often analytically complicated but numerically simple. Galambos (1987) prepared a textbook dealing primarily with the asymptotic theory of extreme order statistics, while the asymptotic theory of central and intermediate order statistics has been briefly discussed by Leadbetter et al. (1983). Let $X_1, X_2, \dots$ be a sequence of iid RVs. Then, much of “classical” extreme


value theory deals with the distribution of $X_{n:n} = \max(X_1, \dots, X_n)$ and especially with its properties as $n \to \infty$. The DF of $X_{n:n}$ may be written exactly as $P(X_{n:n} < x) = P(X_1 < x, \dots, X_n < x) = F^n(x)$, where $F$ denotes the common DF of each $X_i$; nevertheless, it is important to obtain asymptotic distributions, which are less dependent on the precise form of $F$, i.e., relations of the form
\[ F_{n:n}(a_n x + b_n) = P(X_{n:n} < a_n x + b_n) \xrightarrow[n]{w} H(x). \tag{2.1} \]

The central result of classical extreme value theory was discovered first by Fisher and Tippett (1928) and later it was derived in complete generality by Gnedenko (1943). It was shown that the possible non-degenerate DFs, which may occur as limits in (2.1), form precisely the class of max-stable distributions, which consists of three different types (commonly called the three extreme value distributions).

Theorem 2.2 (Extremal type theorem) Let $X_{n:n} = \max(X_1, X_2, \dots, X_n)$, where $X_i$ are iid RVs. If (2.1) holds for some constants $a_n > 0$, $b_n$ and some non-degenerate DF $H(x)$, then $H(x)$ must have one and only one of the following three types $(\alpha > 0)$:

Type I (Gumbel):
\[ H_{1;\alpha}(x) = H_1(x) = \exp(-U_{1,\alpha}(x)) = \exp(-e^{-x}), \quad \forall x. \]
Type II (Fréchet):
\[ H_{2;\alpha}(x) = \exp(-U_{2,\alpha}(x)) = \begin{cases} 0, & x \le 0, \\ \exp(-x^{-\alpha}), & x > 0. \end{cases} \]
Type III (max-Weibull):
\[ H_{3;\alpha}(x) = \exp(-U_{3,\alpha}(x)) = \begin{cases} \exp(-(-x)^{\alpha}), & x \le 0, \\ 1, & x > 0. \end{cases} \tag{2.2} \]

Conversely, each DF $H_{i;\alpha}$, $i \in \{1, 2, 3\}$ (with $U_{1,\alpha}(x) = U_1(x)$), of extreme value type may appear as a limit in (2.1) and, in fact, appears when $H_{i;\alpha}$ itself is the DF of each $X_i$. The parameter $\alpha$, appearing in the second and third types, is a shape parameter. The limiting results for minima can clearly be obtained from those for maxima. It is easy to show that Theorem 2.2 also holds for minima with the possible limit types $L_{i,\alpha}(x) = 1 - H_{i,\alpha}(-x)$, $i = 1, 2, 3$, $\alpha > 0$ (with $U^*_{1,\alpha}(x) = U^*_1(x)$), where


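Type I (min-Gumbel):
\[ L_1(x) = 1 - \exp(-U^*_1(x)) = 1 - \exp(-e^{x}), \quad \forall x. \]
Type II (min-Fréchet):
\[ L_{2;\alpha}(x) = 1 - \exp(-U^*_{2,\alpha}(x)) = \begin{cases} 1 - \exp(-(-x)^{-\alpha}), & x \le 0, \\ 1, & x > 0. \end{cases} \]
Type III (Weibull):
\[ L_{3;\alpha}(x) = 1 - \exp(-U^*_{3,\alpha}(x)) = \begin{cases} 0, & x \le 0, \\ 1 - \exp(-x^{\alpha}), & x > 0. \end{cases} \tag{2.3} \]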
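The convergence in (2.1) and its counterpart for minima can be checked numerically in two cases where the exact DF is available in closed form. The sketch below is an illustration only. The normalizing constants are standard textbook choices, not taken from this chapter: a_n = 1, b_n = log n for the maximum of standard exponential RVs (Gumbel limit), and alpha_n = 1/n, beta_n = 0 for the minimum of uniform(0, 1) RVs (Weibull limit with alpha = 1). All function names are hypothetical.

```python
from math import exp, log


def gumbel(x):
    """Type I limit H_1(x) = exp(-exp(-x))."""
    return exp(-exp(-x))


def weibull_min(x, alpha=1.0):
    """Type III limit for minima, L_{3;alpha}(x)."""
    return 1.0 - exp(-x ** alpha) if x > 0 else 0.0


def max_exp_cdf(x, n):
    """Exact P(X_{n:n} < x + log n) for iid standard exponential RVs."""
    u = x + log(n)
    return (1.0 - exp(-u)) ** n if u > 0 else 0.0


def min_unif_cdf(x, n):
    """Exact P(X_{1:n} < x / n) for iid uniform(0, 1) RVs."""
    y = min(max(x / n, 0.0), 1.0)
    return 1.0 - (1.0 - y) ** n


for n in (10, 100, 10_000):
    err_max = max(abs(max_exp_cdf(x, n) - gumbel(x)) for x in (-1, 0, 1, 2))
    err_min = max(abs(min_unif_cdf(x, n) - weibull_min(x)) for x in (0.5, 1, 2))
    print(n, err_max, err_min)
```

Both error columns shrink as n grows, which is what the weak convergence statements assert at these points.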

The third type is the well-known Weibull distribution, widely used in problems of fatigue. For statistical purposes, it is inconvenient to work with the three limit laws $H_{i;\alpha}$, $i = 1, 2, 3$, so it is preferable to adopt a parameterization that unifies these laws. Following von Mises (1936) and Jenkinson (1955), we can merge the three standard distributions defined in (2.2) into a one-parameter family of DFs. This family, which is known as the standard generalized extreme value distribution (GEVL) or as the Jenkinson-von Mises extreme value distribution, is defined by
\[ G_\gamma(x) = \begin{cases} \exp[-(1 + \gamma x)^{-\frac{1}{\gamma}}], & \text{if } \gamma \ne 0, \\ \exp(-\exp(-x)), & \text{if } \gamma = 0, \end{cases} \tag{2.4} \]

where $1 + \gamma x > 0$ and $\gamma$ is the shape parameter, known as the extreme value index (EVI). Actually, the EVI is the primary parameter of extreme events, and it is mathematically related to the asymptotic behaviour of the right tail of the DF $F$. Moreover, with $\gamma = 0$, $\gamma = \frac{1}{\alpha} > 0$ and $\gamma = -\frac{1}{\alpha} < 0$, the GEVL $G_\gamma(x)$ corresponds to the Gumbel, Fréchet, and max-Weibull types, respectively, i.e., $H_1(x) = G_0(x)$, $H_{2,\alpha}(x) = G_{\frac{1}{\alpha}}(\alpha(x - 1))$, and $H_{3,\alpha}(x) = G_{-\frac{1}{\alpha}}(\alpha(x + 1))$. The location-scale parameter family of the standard GEVL (2.4) can be introduced by replacing the argument $x$ above by $(x - \mu)/\sigma$ for $\mu \in \mathbb{R}$, $\sigma > 0$, that is, $G_\gamma(\frac{x-\mu}{\sigma}) = G_\gamma(x; \mu, \sigma)$ (this model is considered in Example 1.15, relation (1.1)). In this case, the model has three parameters: the location parameter ($\mu$), the scale parameter ($\sigma > 0$), and the shape parameter ($\gamma$, which is the EVI). The GEVL provides an approach (which is known as the block maxima [BM] approach) for the extreme value analysis. Its application consists of partitioning a data


set into blocks of equal length and fitting the GEVL to the set of BM, which are usually taken as the annual maxima. This method is known in the literature as Gumbel's approach. As in the case of the sample maximum, we can unify the three types defined in (2.3) by
\[ G^*_\gamma(x) = 1 - G_\gamma(-x) = \begin{cases} 1 - \exp[-(1 - \gamma x)^{-\frac{1}{\gamma}}], & \text{if } \gamma \ne 0, \\ 1 - \exp(-\exp(x)), & \text{if } \gamma = 0. \end{cases} \]
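The unified family (2.4) and its link to the three types in (2.2) can be verified numerically. The sketch below is illustrative and rests on the assumption that outside the support (1 + gamma * x <= 0) the DF is extended by 0 when gamma > 0 and by 1 when gamma < 0. It checks the identities H_{2,alpha}(x) = G_{1/alpha}(alpha(x - 1)) and H_{3,alpha}(x) = G_{-1/alpha}(alpha(x + 1)), which follow by direct substitution into (2.4). Function names are hypothetical.

```python
from math import exp


def gev_cdf(x, gamma):
    """Standard GEVL G_gamma(x) of (2.4), extended outside 1 + gamma*x > 0."""
    if gamma == 0.0:
        return exp(-exp(-x))
    z = 1.0 + gamma * x
    if z <= 0.0:
        return 0.0 if gamma > 0 else 1.0
    return exp(-z ** (-1.0 / gamma))


def gev_min_cdf(x, gamma):
    """Unified law for minima, G*_gamma(x) = 1 - G_gamma(-x)."""
    return 1.0 - gev_cdf(-x, gamma)


def frechet(x, alpha):
    """Type II law H_{2;alpha}(x)."""
    return exp(-x ** (-alpha)) if x > 0 else 0.0


def max_weibull(x, alpha):
    """Type III law H_{3;alpha}(x)."""
    return exp(-(-x) ** alpha) if x <= 0 else 1.0


alpha = 2.0
for x in (0.5, 1.0, 2.5):
    print(x, frechet(x, alpha), gev_cdf(alpha * (x - 1), 1 / alpha))
for x in (-2.0, -0.5):
    print(x, max_weibull(x, alpha), gev_cdf(alpha * (x + 1), -1 / alpha))
# The gamma = 0 case is the limit of the gamma != 0 case.
print(gev_cdf(0.7, 1e-8), gev_cdf(0.7, 0.0))
```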

Moreover, the location-scale parameter family of the standard GEVL $G^*_\gamma(x)$ is given by $G^*_\gamma(\frac{x-\mu}{\sigma})$, where $-\infty < \mu < \infty$, $\sigma > 0$ and $\gamma \ne 0$. A straightforward proof of Theorem 2.2 is given in Leadbetter et al. (1983); here we mention only that the proof consists of two parts. The first part is to show that the class of limit laws $H(x)$ in (2.1) is precisely the class of max-stable DFs. Specifically, a DF $H(x)$ is called max-stable if, for each $n = 1, 2, \dots$, the DF $H^n(x)$ is of the same type as $H(x)$ (under linear transformation). The second part is to identify the class of max-stable DFs with the types I, II, and III extreme value DFs. If the relation (2.1) holds for some sequences $a_n > 0$, $b_n$, we shall say that $F$ belongs to the (iid) domain of attraction (for maxima) of $H$ under linear transformation, and write $F \in D_\ell(H)$. It is of course important to know which (if any) of the three types of limit laws applies when $\{X_i\}$ has a given DF $F$. Necessary and sufficient conditions are known, involving only the behaviour of the tail $\bar F(x) = 1 - F(x)$ as $x$ increases, for each possible limit in (2.2). These conditions will be discussed in the next section. The following almost trivially proved result is also used in “domain of attraction” determination.

Theorem 2.3 (Leadbetter et al., 1983) Let $\{u_n\}$, $n > 1$, be a sequence of real numbers and let $\tau$ be such that $0 \le \tau \le \infty$. If $X_1, X_2, \dots$ are iid RVs with DF $F(x)$, then
\[ P(X_{n:n} < u_n) \xrightarrow[n]{} e^{-\tau} \tag{2.5} \]
if and only if
\[ n(1 - F(u_n)) \xrightarrow[n]{} \tau. \tag{2.6} \]

It may be noted that (2.1) is a special case of (2.5) using a linear parameterization, namely, by making the identifications $\tau = -\log H(x)$ and $u_n = a_n x + b_n$. Thus a necessary and sufficient condition for the limit


$H_{i;\alpha}(x)$, $i \in \{1, 2, 3\}$, is
\[ n(1 - F(a_n x + b_n)) \xrightarrow[n]{} -\log H_{i;\alpha}(x) = U_{i,\alpha}(x), \quad i \in \{1, 2, 3\}, \]
for each $x$ and some $a_n > 0$, $b_n$, where $H_{i;\alpha}(x)$ is defined in (2.2). This explains the relevance of the tail $1 - F(x)$ for domain of attraction criteria. Now we present the parallel result for the asymptotic theory of the minimum order statistic $X_{1:n}$. The necessary and sufficient condition for the limit $L_{i;\alpha}(x)$, $i \in \{1, 2, 3\}$, is
\[ nF(\alpha_n x + \beta_n) \xrightarrow[n]{} -\log(1 - L_{i;\alpha}(x)) = U^*_{i,\alpha}(x), \quad i \in \{1, 2, 3\}, \]
for some suitable normalizing constants $\alpha_n > 0$, $\beta_n$, where $L_1$, $L_{2;\alpha}$ and $L_{3;\alpha}$ are defined in (2.3).

We now turn, in this brief tour of classical results, to other extreme order statistics. Let $X_{n-k+1:n}$ be the kth upper extreme among the iid $X_1, X_2, \dots, X_n$ with a common DF $F(x)$. Suppose that $X_{n:n}$ has the limiting distribution $H(x) = H_{i;\alpha}(x)$, $i \in \{1, 2, 3\}$, as in (2.1). By identifying $u_n = a_n x + b_n$, $\tau = -\log H(x)$, it follows that (2.6) holds. Let $S_n$ be the number of exceedances of $u_n$ by $X_1, X_2, \dots, X_n$, i.e., the number of $i$, $1 \le i \le n$, such that $X_i > u_n$. Then, $S_n$ is a binomial RV with parameters $(n, p_n = 1 - F(u_n))$. Moreover, $np_n \xrightarrow[n]{} \tau$, so that the DF of $S_n$ has a Poisson limit with mean $\tau$. The obvious equivalence of the events $\{X_{n-k+1:n} < u_n\}$ and $\{S_n < k\}$ leads directly to the relation
\[ P(X_{n-k+1:n} < a_n x + b_n) \xrightarrow[n]{w} 1 - \Gamma_k(-\log H_{i;\alpha}(x)) = 1 - \Gamma_k(U_{i,\alpha}(x)), \quad i \in \{1, 2, 3\}, \tag{2.7} \]

where $\Gamma_k(\cdot)$ is the incomplete gamma function (ratio), $\Gamma_k(x) = \frac{1}{(k-1)!}\int_0^x t^{k-1} e^{-t}\,dt$. Thus, if the maximum $X_{n:n}$ has the limiting distribution $H(x)$, then the kth largest $X_{n-k+1:n}$ has the limiting distribution $H_k(x) = H(x) \sum_{i=0}^{k-1} \frac{1}{i!} (-\log H(x))^i$ (with the same normalizing constants $a_n$, $b_n$ as the maximum itself). The same result for lower extremes can easily be obtained by the same argument. For example, if $X_{1:n}$ has the limiting distribution $L_{i;\alpha}(x)$, $i \in \{1, 2, 3\}$, the relation corresponding to (2.7) will be
\[ P(X_{k:n} < \alpha_n x + \beta_n) \xrightarrow[n]{w} \Gamma_k(-\log(1 - L_{i;\alpha}(x))) = \Gamma_k(U^*_{i,\alpha}(x)), \quad i \in \{1, 2, 3\}, \]
with the same normalizing constants $\alpha_n > 0$, $\beta_n$ as the minimum itself.

We end this section with an important asymptotic result concerning the extreme order statistics, which provides an essential tool in extreme value analysis. Actually, in general, we are not only interested in the maxima


of observations, but also in the behaviour of large observations which exceed a high threshold. Balkema and de Haan (1974) derived the limit distribution of scaled excesses over high thresholds. Specifically, the conditional DF $F^{[u]}(x + u) = P(X < x + u \mid X > u)$ may be approximated for large $u$ (i.e., the threshold $u$ is close to the right endpoint $\rho$) by a family $W_\gamma(x)$, which is called the generalized Pareto distribution under linear normalization (GPDL), provided that the DF of BM converges weakly to the limit $G_\gamma$. Balkema and de Haan (1974) showed that there is a close relation between the GPDL and the generalized extreme value distribution under linear normalization (i.e., GEVL), namely
\[ W_\gamma(x) = 1 + \log G_\gamma(x) = \begin{cases} 1 - (1 + \gamma x)^{-\frac{1}{\gamma}}, & 1 + \gamma x > 0,\ x > 0, \ \text{if } \gamma \ne 0, \\ 1 - \exp(-x), & x > 0, \ \text{if } \gamma = 0. \end{cases} \tag{2.8} \]
Notice that the GPDL nests the Pareto, uniform, and exponential distributions. Moreover, the GPDL is the only continuous DF such that, for a certain choice of constants $b_u$ and $a_u > 0$,
\[ W^{[u]}_\gamma(b_u + a_u x) = W_\gamma(x) \]
is again the exceedance DF at $u$. This property is known as the POT-stability of the GPDL (cf. Balkema and de Haan, 1974). On the basis of the result of Balkema and de Haan (1974), Pickands (1975) proposed the second essential approach for analyzing extreme value data, called the peak-over-threshold (POT) approach. In the POT approach, we deal with the right tail $\bar F(x) = 1 - F(x)$ for large $x$, i.e., with the top order observations. Instead of just the annual maxima, we use several of the largest order observations in each year. Evidently, the POT approach makes use of the whole data set, in contrast to the block maxima method. The steps to analyze the tail behaviour of extremes using the POT approach can be summarized as follows:

• Selection of a suitable threshold, over which the GPDL is fitted.
• Estimation of the EVI, as well as the location and the scale parameters of the GPDL.
• Evaluating the goodness of fit.
• Performing the inference of the extreme value model.

Remark In the BM method, the choice of the extreme value distributions (2.2) (i.e., the choice of (2.4) with $\gamma > 0$, or with $\gamma < 0$, or with $\gamma = 0$) is motivated by the following facts:


1. The extreme value distributions (2.2) are the only ones that can appear as the limit of linearly normalized maxima.
2. They are the only ones that are “max-stable,” i.e., such that a change of block size only leads to a change of location and scale parameters in the distribution.

On the other hand, the POT method is based on all observed values that are larger than a suitably chosen threshold. These values are then assumed to follow the GPDL family defined by (2.8). The choice of the GPDL is motivated by two characterizations:

1. The distribution of scale-normalized exceedances over a threshold converges asymptotically to a limit belonging to the GPDL if and only if the distribution of BM converges (as the block length tends to infinity) to one of the extreme value distributions (2.2).
2. The distributions belonging to the GPDL are the only “stable” ones, i.e., the only ones for which the conditional distribution of an exceedance is a scale transformation of the original distribution.
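Two properties of the GPDL discussed in this section can be checked numerically: the relation W_gamma(x) = 1 + log G_gamma(x) in (2.8), and the stability of the family under conditioning on an exceedance. The sketch below is illustrative. The scale factor 1 + gamma * u used in the stability check is obtained by direct substitution into (2.8) and is an assumption about which constants a_u, b_u are meant. Function names are hypothetical.

```python
from math import exp, log


def gev_cdf(x, gamma):
    """Standard GEVL G_gamma(x) of (2.4), extended outside its support."""
    if gamma == 0.0:
        return exp(-exp(-x))
    z = 1.0 + gamma * x
    if z <= 0.0:
        return 0.0 if gamma > 0 else 1.0
    return exp(-z ** (-1.0 / gamma))


def gpd_cdf(x, gamma):
    """GPDL W_gamma(x) of (2.8); equal to 0 for x <= 0."""
    if x <= 0.0:
        return 0.0
    if gamma == 0.0:
        return 1.0 - exp(-x)
    z = 1.0 + gamma * x
    if z <= 0.0:          # only possible for gamma < 0: beyond the endpoint
        return 1.0
    return 1.0 - z ** (-1.0 / gamma)


def one_plus_log_gev(x, gamma):
    """Right-hand side of the relation W_gamma(x) = 1 + log G_gamma(x), x > 0."""
    return 1.0 + log(gev_cdf(x, gamma))


def excess_cdf(x, u, gamma):
    """P(X - u < x | X > u) when X has DF W_gamma."""
    w_u = gpd_cdf(u, gamma)
    return (gpd_cdf(u + x, gamma) - w_u) / (1.0 - w_u)


u, x = 0.7, 0.4
for gamma in (-0.5, 0.0, 0.5):
    # The excess over u is again of GPDL form, rescaled by 1 + gamma * u.
    print(gamma, excess_cdf(x, u, gamma), gpd_cdf(x / (1.0 + gamma * u), gamma))
    print(gamma, one_plus_log_gev(0.9, gamma), gpd_cdf(0.9, gamma))
```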

2.3 Max-domains of attraction of univariate $\ell$-max-stable laws

It is of course important to know which (if any) of the limit types (2.2) applies when each RV has a given DF $F$. Various necessary and sufficient conditions are known for each limit type. We shall state these conditions in Theorems 2.5–2.8, omitting the proofs.

Definition 2.4 A DF $F$ is said to belong to the max-domain of attraction of a DF $H$ under linear normalization if there exist norming constants $a_n > 0$ and $b_n \in \mathbb{R}$ such that $F^n(a_n x + b_n) \xrightarrow[n]{w} H(x)$. In this case, we write $F \in D_\ell(H)$ and call $H$ a max-stable DF under linear normalization or simply an $\ell$-max-stable DF.

First, we give simple and useful sufficient conditions, due to von Mises (see de Haan, 1976), which can be applied when the DF $F$ has a PDF $f$. Define the two notations $\ell(F) = \inf\{x : F(x) > 0\}$ and $r(F) = \sup\{x : F(x) < 1\}$ for the left and right endpoints of the DF $F$, respectively. Sometimes, we will abbreviate the right endpoint by $\rho = r(F)$, and the left endpoint correspondingly (e.g., as in Theorems 2.5 and 2.6), unless there is confusion due to the existence of two or more different DFs with different endpoints.

Theorem 2.5 Suppose that the DF $F$ is absolutely continuous with density


function $f$. Then, sufficient conditions for $F$ to belong to each of the three possible limit types (2.2) are:

Type I: $f$ has a negative derivative $f'$ for all $x$ in the interval $(\ell(F), \rho)$ between the left and right endpoints, $\rho < \infty$, $f(x) = 0$ for $x \ge \rho$, and
\[ \lim_{t \uparrow \rho} \frac{f'(t)(1 - F(t))}{f^2(t)} = -1. \]
Type II: $f(x) > 0$ for all $x \ge \ell(F)$, where the left endpoint $\ell(F)$ is finite, and
\[ \lim_{t \to \infty} \frac{t f(t)}{1 - F(t)} = \beta > 0. \]
Type III: $f(x) > 0$ for all $x$ in the finite interval $(\ell(F), \rho)$, $f(x) = 0$ for $x > \rho$, and
\[ \lim_{t \uparrow \rho} \frac{(\rho - t) f(t)}{1 - F(t)} = \alpha > 0. \]

In the case that $F$ is an arbitrary DF, the necessary and sufficient conditions for $F \in D_\ell(H)$, for each $H$ of the three $\ell$-max-stable laws defined in (2.2), are obtained in the next theorem.

Theorem 2.6 (Galambos, 1987) The necessary and sufficient conditions for a DF $F$ to belong to any one of the three types, which are defined in (2.2), are:

Type I: There exists some strictly positive function $g(t)$ such that
\[ \lim_{t \uparrow \rho} \frac{1 - F(t + x g(t))}{1 - F(t)} = e^{-x} \]
for all real $x$. It may be shown that $\int_0^\infty (1 - F(u))\,du < \infty$ when the type I limit holds, and one appropriate choice of $g$ is given by
\[ g(t) = \int_t^\rho \frac{1 - F(u)}{1 - F(t)}\,du, \quad \text{for } t < \rho. \]
Type II: $\rho = \infty$ and $\lim_{t \to \infty} \frac{1 - F(tx)}{1 - F(t)} = x^{-\alpha}$, $\alpha > 0$, for each $x > 0$.

Type III: $\rho < \infty$ and $\lim_{h \downarrow 0} \frac{1 - F(\rho - xh)}{1 - F(\rho - h)} = x^{\alpha}$, $\alpha > 0$, for each $x > 0$.

Determining the values of the constants $a_n$ and $b_n$ is as important as establishing their existence. These constants, while not unique, depend on the type of $H(x)$. Meanwhile, the Khinchin-type theorem (Theorem 1.13) determines the extent to which these constants can be varied. Convenient choices are indicated in the following result:


Corollary 2.1 (Leadbetter et al., 1983) With the normalization written as in (2.1), i.e., $P(X_{n:n} < a_n x + b_n)$, the constants $a_n$ and $b_n$ may be taken in each case above as:

For type I: $a_n = g(\gamma_n)$, $b_n = \gamma_n$, with $\gamma_n = F^-(1 - \frac{1}{n}) = \inf\{x : F(x) \ge 1 - \frac{1}{n}\}$, where $F^-$ is the generalized inverse of $F$.
For type II: $a_n = \gamma_n$, $b_n = 0$.
For type III: $a_n = \rho - \gamma_n$, $b_n = \rho$.

Remark When $F$ is strictly monotone we always use the usual notation for the inverse function of $F$, i.e., $F^{-1}$ (in this case clearly we have $F^- = F^{-1}$).

An alternative (equivalent) necessary and sufficient condition, under which $F \in D_\ell(G_\gamma)$, is given by Castillo et al. (2014).

Theorem 2.7 (Castillo et al., 2014) A necessary and sufficient condition for a continuous DF $F$ to belong to the max-domain of attraction of the GEVL $G_\gamma(x)$ defined in (2.4) is that
\[ \lim_{\varepsilon \to 0} \frac{F^-(1 - \varepsilon) - F^-(1 - 2\varepsilon)}{F^-(1 - 2\varepsilon) - F^-(1 - 4\varepsilon)} = 2^{\gamma}. \]
This implies that
• if $\gamma = 0$, then $F \in D_\ell$(Gumbel);
• if $\gamma > 0$, then $F \in D_\ell$(Fréchet);
• if $\gamma < 0$, then $F \in D_\ell$(max-Weibull).

We can also easily state Castillo's corresponding theorem for the minimum order statistics.

Theorem 2.8 (Castillo et al., 2014) A necessary and sufficient condition for a continuous DF $F$ to belong to the min-domain of attraction of the GEVL (for the minimum order statistics) $G^*_\gamma(x) = 1 - G_\gamma(-x)$ is that
\[ \lim_{\varepsilon \to 0} \frac{F^-(\varepsilon) - F^-(2\varepsilon)}{F^-(2\varepsilon) - F^-(4\varepsilon)} = 2^{\gamma}. \]
This implies that:
• if $\gamma = 0$, then $F \in D_\ell$(min-Gumbel);
• if $\gamma > 0$, then $F \in D_\ell$(min-Fréchet);
• if $\gamma < 0$, then $F \in D_\ell$(Weibull).

Table 2.1 shows the maximum and minimum limit laws of some selected common distributions.

Table 2.1 Domains of attraction of the most common distributions

| Distribution | Maxima | Minima |
|---|---|---|
| Weibull domains, γ < 0 | | |
| max-Weibull DF | Type III | Type I |
| Weibull DF | Type I | Type III |
| uniform DF | Type III | Type III |
| Rayleigh DF | Type I | Type III |
| beta DF | Type III | Type III |
| Gumbel domains, γ = 0 | | |
| Gumbel DF | Type I | Type I |
| min-Gumbel DF | Type I | Type I |
| normal DF | Type I | Type I |
| exponential DF | Type I | Type III |
| logistic DF | Type I | Type I |
| Fréchet domains, γ > 0 | | |
| Fréchet DF | Type II | Type I |
| min-Fréchet DF | Type I | Type II |
| Pareto DF | Type II | Type III |
| Cauchy DF | Type II | Type II |
| log-normal DF | Type I | Type I |
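The quantile-spacing criterion of Theorem 2.7 lends itself to a quick numerical probe. The sketch below evaluates the ratio of quantile spacings at a small epsilon for three distributions whose quantile functions are available in closed form, and reports its base-2 logarithm. In the parameterization (2.4), where gamma > 0 corresponds to the Fréchet type, this logarithm approaches the EVI: 0 for the exponential DF, 1/alpha for the Pareto DF with F(x) = 1 - x^(-alpha), and -1 for the uniform DF. The function names and the choice epsilon = 1e-6 are assumptions made for illustration.

```python
from math import log, log2


def spacing_ratio(quantile, eps):
    """[Q(1-eps) - Q(1-2eps)] / [Q(1-2eps) - Q(1-4eps)] for a quantile function Q."""
    q1 = quantile(1.0 - eps)
    q2 = quantile(1.0 - 2.0 * eps)
    q4 = quantile(1.0 - 4.0 * eps)
    return (q1 - q2) / (q2 - q4)


def evi_from_spacings(quantile, eps=1e-6):
    """Base-2 logarithm of the spacing ratio at a small eps."""
    return log2(spacing_ratio(quantile, eps))


QUANTILES = {
    "exponential": lambda p: -log(1.0 - p),           # Gumbel domain
    "Pareto(alpha=2)": lambda p: (1.0 - p) ** -0.5,   # Fréchet domain
    "uniform": lambda p: p,                           # max-Weibull domain
}

for name, q in QUANTILES.items():
    print(name, evi_from_spacings(q))
```

For these three examples the ratio does not depend on epsilon at all; for distributions such as the normal the convergence is slow, so a single small epsilon gives only a rough indication.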


2.4 Limit theory of intermediate order statistics under linear normalization

The limit theory of order statistics with variable rank $\{k_n\}$ (i.e., intermediate and central cases) has been studied by many authors, such as Smirnov (1952), Chibisov (1964), Wu (1966), and Balkema and de Haan (1978a, 1978b). Smirnov (1952) has shown that, for any nondecreasing variable rank $\{k_n\}$, there exist constants $a_n > 0$, $b_n$ such that
\[ P(X_{k_n:n} < a_n x + b_n) \xrightarrow[n]{w} G(x), \tag{2.9} \]
for some DF $G(x)$, if and only if
\[ \frac{nF(a_n x + b_n) - k_n}{\sqrt{k_n(1 - \frac{k_n}{n})}} \xrightarrow[n]{} V(x), \tag{2.10} \]
where $V(\cdot)$ is a nondecreasing, right continuous, extended real function satisfying $\lim_{x \to -\infty} V(x) = -\infty$, $\lim_{x \to \infty} V(x) = +\infty$, and $G(x) = \Phi(V(x))$, where $\Phi$ is the standard normal DF. As we have seen in Section 2.1, the variable ranks are classified into intermediate (lower or upper) and central ranks. We will consider the intermediate order statistics in this section, while in the next section we will consider the central order statistics.
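Smirnov's equivalence between (2.9) and (2.10) can be illustrated for a lower intermediate rank in the simplest case, the uniform(0, 1) DF. The sketch below is an illustration under stated assumptions: F(x) = x, and the normalizing constants a_n = sqrt(k_n)/n, b_n = k_n/n are chosen here for convenience (they are not taken from the text). With this choice the left side of (2.10) equals x / sqrt(1 - k_n/n), so V(x) = x and the limit in (2.9) is the standard normal DF. The exact DF of the order statistic is computed as a binomial tail in log space. Function names are hypothetical.

```python
from math import erf, exp, lgamma, log, log1p, sqrt


def uniform_order_stat_cdf(y, k, n):
    """Exact P(X_{k:n} < y) for iid uniform(0, 1): P(Bin(n, y) >= k)."""
    if y <= 0.0:
        return 0.0
    if y >= 1.0:
        return 1.0
    log_y, log_1y = log(y), log1p(-y)
    below = 0.0
    for i in range(k):  # P(Bin(n, y) <= k - 1), each term evaluated in log space
        below += exp(lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                     + i * log_y + (n - i) * log_1y)
    return 1.0 - below


def std_normal_cdf(x):
    """Standard normal DF Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def smirnov_lhs(x, k, n):
    """Left side of (2.10) for F(x) = x with a_n = sqrt(k)/n, b_n = k/n."""
    a_n, b_n = sqrt(k) / n, k / n
    return (n * (a_n * x + b_n) - k) / sqrt(k * (1.0 - k / n))


def normalized_cdf(x, k, n):
    """Exact P(X_{k:n} < a_n x + b_n) under the same assumed constants."""
    return uniform_order_stat_cdf(sqrt(k) / n * x + k / n, k, n)


n, k = 10 ** 6, 1000   # k_n grows like sqrt(n), so k_n / n tends to 0
for x in (-1.0, 0.0, 1.0):
    print(x, smirnov_lhs(x, k, n), normalized_cdf(x, k, n), std_normal_cdf(x))
```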

2.4.1 Asymptotic theory of intermediate order statistics

If $\frac{k_n}{n} \xrightarrow[n]{} 0$ (i.e., lower intermediate), then (2.10) reduces to
\[ \frac{nF(a_n x + b_n) - k_n}{\sqrt{k_n}} \xrightarrow[n]{} V(x). \]
When the intermediate rank-sequence $\{k_n\}$ satisfies the limit relation (Chibisov's condition)
\[ \lim_{n \to \infty} \left(\sqrt{k_{n + z_n(\nu)}} - \sqrt{k_n}\right) = \frac{\alpha \nu l}{2}, \tag{2.11} \]
for any sequence of integer values $\{z_n(\nu)\}$ for which $\frac{z_n(\nu)}{n^{1 - \frac{\alpha}{2}}} \xrightarrow[n]{} \nu$, where $0 < \alpha < 1$, $l > 0$ and $\nu$ is any real number, Chibisov (1964) has proved that the only possibilities for $G(x)$ in (2.9) are

Type I:
\[ G_1(x) = \Phi(V_1(x; \beta)) = \Phi(x), \quad \forall x, \]


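Type II:
\[ G_{2;\beta}(x) = \Phi(V_2(x; \beta)) = \begin{cases} \Phi(-\beta \log|x|), & x \le 0, \\ 1, & x > 0, \end{cases} \]
Type III:
\[ G_{3;\beta}(x) = \Phi(V_3(x; \beta)) = \begin{cases} 0, & x \le 0, \\ \Phi(\beta \log x), & x > 0, \end{cases} \tag{2.12} \]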

where $\beta$ is some positive constant that depends only on the type of $F(x)$ and the values of $\alpha$ and $l$. The corresponding possible non-degenerate limiting distributions for the upper intermediate term $X_{k:n}$ ($k = k_n$, $\frac{k_n}{n} \xrightarrow[n]{} 1$) are $1 - \Phi(V_i(-x; \beta))$, $i = 1, 2, 3$ (note that $V_1(x; \beta) = V_1(x) = x$). As Chibisov himself noted, the condition (2.11) implies that $\frac{k_n}{n^{\alpha}} \xrightarrow[n]{} l^2$. Barakat and Omar (2011b) showed that the latter condition implies Chibisov's condition. Actually, the result of Barakat and Omar (2011b) reveals that the class of intermediate rank-sequences that satisfy Chibisov's condition is a very wide class and, consequently, Chibisov's limit types are widely applicable. However, the importance of these types was also emphasized when Wu (1966) generalized Chibisov's result to any nondecreasing intermediate rank-sequence, where instead of Chibisov's condition he assumed that
\[ \liminf_{n \to \infty} \frac{k_{n+1} - k_n}{\sqrt{k_n}} = 0. \tag{2.13} \]

The condition (2.13) is wider than the condition (2.11). Moreover, Wu (1966) proved that, under the condition (2.13), the only possibilities for $G(\cdot)$ are the same types defined by (2.12). Some emendations and complements to the work of Wu (1966) were made by Barakat and Ramachandran (2001). The corresponding possible non-degenerate limiting distributions for the upper intermediate term $X_{r_n:n}$, under linear normalization, are $T_{i;\beta}(x) = 1 - \Phi(V_i(-x; \beta))$, $i = 1, 2, 3$ (also, note that $T_{1;\beta} = T_1$).

Remark Note that $T_{1;\beta} = G_{1;\beta}$, $T_{2;\beta} = G_{3;\beta}$ and $T_{3;\beta} = G_{2;\beta}$. Therefore, we have $\{G_{i;\beta}, i = 1, 2, 3\} \equiv \{T_{i;\beta}, i = 1, 2, 3\}$.

After the types of limit distributions have been obtained, it is natural to consider their domains of attraction. Some results in this direction were obtained by Chibisov (1964) and Smirnov (1967, 1970). However, these are highly dependent on the rank-sequence $\{k_n\}$. For example, a class of rank-sequences $\{k_n\}$ such that $k_n \sim l^2 n^{\alpha}$ $(0 < \alpha < 1)$ was studied by Chibisov (1964). If $F$ is any DF, it is known that there is at most one


pair of $(l, \alpha)$ such that $F(x) \in D(G_{1,\beta}(x))$, and the same statement holds for $G_{2;\beta}(x)$. In addition, there are rank-sequences such that only the normal law $\Phi(x)$ is a possible limit. In the next subsection, we will discuss the domains of attraction of the limit laws defined in (2.12) in more detail.

2.4.2 Domains of attraction of intermediate limit laws

Chibisov (1964) derived three theorems for the domains of attraction of the intermediate rank corresponding to the three types defined in (2.12). These theorems are summarized as follows:

Theorem 2.9 In order that a distribution $F(x)$ belongs to the domain of attraction of the type $G_1(x)$, it is necessary and sufficient that the sequence $b_n$, defined as the smallest numbers for which $F(b_n) \le \frac{k_n}{n} \le F(b_n + 0)$, satisfy the condition
\[ \lim_{n \to \infty} \frac{b_{n + z_n(\nu)} - b_n}{b_{n + z_n(\mu)} - b_n} = \frac{\nu}{\mu}, \]
for any sequences $z_n(\nu)$ and $z_n(\mu)$ that satisfy $\frac{z_n(\nu)}{n^{1 - \frac{\alpha}{2}}} \xrightarrow[n]{} \nu$ and $\frac{z_n(\mu)}{n^{1 - \frac{\alpha}{2}}} \xrightarrow[n]{} \mu$, respectively.

Theorem 2.10 In order that a distribution $F(x)$ belongs to the domain of attraction of the type $G_{2;\beta}(x)$ and $k_n$ satisfies (2.11), it is necessary and sufficient that

1. there exists an $x_0$ such that $F(x_0) = 0$ and $F(x_0 + \epsilon) > 0$ for any $\epsilon > 0$, and
2. for any $\tau > 0$,
\[ \lim_{x \to 0^+} \frac{F(\tau x + x_0) - F(x_0 + x)}{F^{\frac{2 - \alpha}{2 - 2\alpha}}(x_0 + x)} = l^{\frac{-1}{1 - \alpha}} \beta \log \tau. \]

Remark In Chibisov's (1964) original paper, there is a typo in limit relation (2.12), namely, it was written $x \to x_0$ (instead of $x \to 0^+$).

Theorem 2.11 In order that a distribution $F(x)$ belongs to the domain of attraction of the type $G_{3;\beta}(x)$ and $k_n$ satisfies (2.11), it is necessary and sufficient that
\[ \lim_{x \to -\infty} \frac{F(\tau x) - F(x)}{F^{\frac{2 - \alpha}{2 - 2\alpha}}(x)} = -l^{\frac{-1}{1 - \alpha}} \beta \log \tau, \]
for every $\tau > 0$.

Remark In Chibisov's (1964) original paper, there is a typo in the limit relation in Theorem 2.11, namely, it was written $x \to \infty$ (instead of $x \to -\infty$).

Wu (1966) classified certain nondecreasing rank-sequences so that the rank-sequences in the same class may possess some common properties with


respect to the types of limit distributions and their domains of attraction. This classification is given in the following definition.

Definition 2.12 (Wu, 1966) Two rank-sequences $\{\frac{k_n}{n}\}$ and $\{\frac{k'_n}{n}\}$ are called asymptotically identical in rank if $\sqrt{k_n}\,(1 - \frac{k'_n}{k_n})$ is bounded, provided that $k_n$ is nondecreasing in $n$, $k_n \xrightarrow[n]{} +\infty$ and $\frac{k_n}{n} \xrightarrow[n]{} 0$.

It is easy to verify that the relation defined in Definition 2.12 (asymptotically identical in rank) is an equivalence relation; hence it divides all the rank-sequences under consideration into equivalence classes. Wu (1966) derived the following relevant theorem.

Theorem 2.13 (Wu, 1966) For every rank-sequence in the same equivalence class, the domains of attraction of the same type of limit distributions are identical.

2.5 Central order statistics: domains of attraction of central limit laws

In this section, we turn to the investigation of the limiting distribution for sequences of central terms, supposing as before that the ratio $\frac{k}{n}$ of the considered terms converges, as $n$ increases, to a limit $\lambda$, $0 < \lambda < 1$. This case of central ranks has been studied by Smirnov (1952).

2.5.1 Asymptotic theory of central order statistics

First, we note that it is possible for two sequences $\{k_n\}$ and $\{k'_n\}$ with $\lim_{n \to \infty} \frac{k_n}{n} = \lim_{n \to \infty} \frac{k'_n}{n}$ to lead to different non-degenerate limiting DFs for $X_{k_n:n}$ and $X_{k'_n:n}$. Specifically, as shown by Smirnov (1952), we may have
\[ P(X_{k_n:n} < a_n x + b_n) \xrightarrow[n]{w} G(x) \tag{2.14} \]
and
\[ P(X_{k'_n:n} < a'_n x + b'_n) \xrightarrow[n]{w} G'(x), \tag{2.15} \]
where $a_n, a'_n > 0$, $\frac{k_n}{n} \xrightarrow[n]{} \lambda$ and $\frac{k'_n}{n} \xrightarrow[n]{} \lambda$, and at the same time $G(x)$ and $G'(x)$ are non-degenerate DFs of different types. However, this is not possible if
\[ \sqrt{n}\left(\frac{k_n}{n} - \lambda\right) \xrightarrow[n]{} 0, \tag{2.16} \]
i.e., $k_n = \lambda n + o(\sqrt{n})$, as shown in the next lemma.


Lemma 2.14 Suppose that (2.14) and (2.15) hold, where $G(x)$ and $G'(x)$ are non-degenerate DFs and $k_n$, $k'_n$ both satisfy (2.16). Then, $G'(x) = G(ax + b)$ for some $a > 0$, $b$, i.e., $G(x)$ and $G'(x)$ are of the same type.

It turns out that (cf. Smirnov, 1952) for any sequence $\{k_n\}$ satisfying (2.16) just four forms of limiting distributions $G(x)$ satisfying (2.14) are possible for $X_{k_n:n}$. For completeness, we state this result in the next theorem.

Theorem 2.15 The types of limiting distribution laws that can have domains of normal $\lambda$-attraction (i.e., that satisfy (2.16)) are $\Phi(W_{i;\beta})$, $i = 1, 2, 3, 4$, where

Type I:
\[ W_{1;\beta}(x) = \begin{cases} -\infty, & x \le 0, \\ c x^{\beta}, & x > 0, \ c, \beta > 0, \end{cases} \]
Type II:
\[ W_{2;\beta}(x) = \begin{cases} -c|x|^{\beta}, & x \le 0, \\ \infty, & x > 0, \ c, \beta > 0, \end{cases} \]
Type III:
\[ W_{3;\beta}(x) = \begin{cases} -c_1 |x|^{\beta}, & x \le 0, \ c_1 > 0, \\ c_2 x^{\beta}, & x > 0, \ c_2, \beta > 0, \end{cases} \]
Type IV:
\[ W_{4;\beta}(x) = W_4(x) = \begin{cases} -\infty, & x \le -1, \\ 0, & -1 < x \le 1, \\ \infty, & x > 1. \end{cases} \tag{2.17} \]

Remark It is worth mentioning that, whatever $A$ and $B$ are such that $-\infty < A < B < \infty$, the family of DFs $\Phi(W_4(x; A, B))$ has the same type (under linear transformation) as $\Phi(W_4(x))$, where

$$W_4(x; A, B) = \begin{cases} -\infty, & x \le A, \\ 0, & A < x \le B, \\ \infty, & x > B. \end{cases}$$

Proof To prove this fact it is sufficient to show that there exist $c > 0$ and $d \in \mathbb{R}$ such that $W_4(cx + d; A, B) = W_4(x)$, where

$$W_4(cx + d; A, B) = \begin{cases} -\infty, & cx + d \le A, \\ 0, & A < cx + d \le B, \\ \infty, & cx + d > B \end{cases} = \begin{cases} -\infty, & x \le \frac{A-d}{c}, \\ 0, & \frac{A-d}{c} < x \le \frac{B-d}{c}, \\ \infty, & x > \frac{B-d}{c}. \end{cases}$$

Put $\frac{B-d}{c} = 1$ and $\frac{A-d}{c} = -1$. Then, solving for $c$ and $d$, we get $d = \frac{A+B}{2}$ and $c = B - \frac{A+B}{2} = \frac{B-A}{2}$, which was to be proved.
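The constants found in the proof can be checked numerically. The following is a minimal sketch, not part of the original argument: the function names `w4` and `affine_constants` and the sample values of $A$ and $B$ are illustrative assumptions.

```python
import math

def w4(x, A=-1.0, B=1.0):
    """W_4(x; A, B): -inf for x <= A, 0 for A < x <= B, +inf for x > B.
    The defaults A = -1, B = 1 give the canonical W_4 of (2.17)."""
    if x <= A:
        return -math.inf
    if x <= B:
        return 0.0
    return math.inf

def affine_constants(A, B):
    """Constants c, d from the proof: c = (B - A)/2, d = (A + B)/2."""
    return (B - A) / 2.0, (A + B) / 2.0

# Hypothetical interval endpoints, chosen only for illustration.
A, B = 2.0, 5.0
c, d = affine_constants(A, B)

# W_4(c*x + d; A, B) should coincide with W_4(x) at every x,
# including the boundary points x = -1 and x = 1.
grid = [-3.0, -1.0, -0.5, 0.0, 1.0, 1.5]
matches = [w4(c * x + d, A, B) == w4(x) for x in grid]
```

Any other pair $A < B$ can be substituted; the comparison list should contain only `True`.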

If the restriction (2.16) is removed, the situation becomes more complicated, and the range of possible limit distributions is much larger. Namely:

(i) When $\sqrt{n}\,(k_n/n - \lambda) \to t$, $-\infty < t < \infty$, Smirnov (1952) has shown that the only possible limit DFs of the normalized $X_{k_n:n}$ are $\Phi(W_{i;\beta}(x) + c_\lambda t)$, $i = 1, 2, 3, 4$, where $c_\lambda = \frac{1}{\sqrt{\lambda(1-\lambda)}}$ and the functions $W_{i;\beta}(x)$, $i = 1, 2, 3, 4$, are defined in (2.17).

(ii) When $\sqrt{n}\,(k_n/n - \lambda) \to \pm\infty$, Wu (1966) has shown that all possible limit DFs of $\frac{X_{k_n:n} - b_n}{a_n}$, $a_n > 0$, belong to the normal type or the log-normal type.

(iii) When $\sqrt{n}\,(k_n/n - \lambda)$ is bounded but does not tend to a limit, Wu (1966) has shown that the only possible type of the limit DF of $\frac{X_{k_n:n} - b_n}{a_n}$, $a_n > 0$, is the normal one.

Balkema and de Haan (1987a, 1978b) studied the limit DFs of the order statistic $X_{k_n:n}$ under more general conditions. They called a DF $F$ prolific in $\lambda \in [0, 1]$ if, for every DF $G$, there exist a sequence $\{k_n\}$ satisfying $\min(k_n, n - k_n) \to \infty$ and $k_n/n \to \lambda$, and normalizing constants $a_n > 0$, $b_n$, such that

$$I_{F(a_n x + b_n)}(k_n, n - k_n + 1) \xrightarrow[n]{w} G(x). \tag{2.18}$$

The following theorem, taken from Balkema and de Haan (1987a, 1978b), shows that the order statistics behave in an unruly way unless we impose a regularity condition on the sequence $k_n$.

Theorem 2.16 There are DFs that are prolific in each $\lambda \in [0, 1]$. Moreover, the set of such DFs is dense.

Definition 2.17 The sequence $\{k_n\}$ is regular if $1 \le k_n \le n$, $\min(k_n, n - k_n) \to \infty$ and $k_{n+1} - k_n = o\big(\min(\sqrt{k_n}, \sqrt{n - k_n})\big)$.


Theorem 2.18 Assume that $\{k_n\}$ is regular. Then $G$ in (2.18) can only have the form $\Phi(U(\cdot))$, where $U(\cdot)$ is defined by (2.12) or (2.17).

2.5.2 Domains of attraction of central limit laws

We now turn to the conditions that completely characterize the domains of normal $\lambda$-attraction of the limit laws (types) found in (2.17).

Theorem 2.19 (Smirnov, 1952) In order that a distribution $F(x)$ belongs to the domain of normal $\lambda$-attraction of the type $\Phi(W_{1;\beta})$, it is necessary and sufficient that there exists a value $x_0$ such that

1. $F(x_0 + 0) = \lambda$, while for each $\epsilon > 0$, $F(x_0 + \epsilon) > \lambda$.
2. $\dfrac{F(x_0 + x) - \lambda}{\lambda - F(x_0 - x)} \to 0$ as $x \to +0$.
3. For each $\tau > 0$, $\dfrac{F(x_0 + \tau x) - \lambda}{F(x_0 + x) - \lambda} \to \tau^{\beta}$ as $x \to +0$.

Theorem 2.20 (Smirnov, 1952) In order that a distribution $F(x)$ belongs to the domain of $\Phi(W_{2;\beta})$, it is necessary and sufficient that there exists a value $x = x_0$ such that

1. $F(x_0 - 0) = \lambda$, and also $F(x_0 - \epsilon) < \lambda$ when $\epsilon > 0$.
2. $\dfrac{\lambda - F(x_0 - x)}{F(x_0 + x) - \lambda} \to 0$ as $x \to +0$.
3. For each $\tau > 0$, $\dfrac{\lambda - F(x_0 - \tau x)}{\lambda - F(x_0 - x)} \to \tau^{\beta}$ as $x \to +0$.

Theorem 2.21 (Smirnov, 1952) In order that a distribution $F(x)$ belongs to the domain of $\Phi(W_{3;\beta})$, it is necessary and sufficient that there exists a point of continuity, $x = x_0$, of the DF $F(x)$ such that

1. $F(x_0) = \lambda$, and also for each $\epsilon > 0$, $F(x_0 - \epsilon) < F(x_0) < F(x_0 + \epsilon)$.
2. $\lim_{x \to +0} \dfrac{F(x_0 + x) - F(x_0)}{F(x_0) - F(x_0 - x)} = A$, where $A$ is a positive real number.
3. For each $\tau > 0$, $\lim_{x \to +0} \dfrac{F(x_0 + \tau x) - F(x_0)}{F(x_0 + x) - F(x_0)} = \tau^{\beta}$.
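As an illustration of conditions 2 and 3 of Theorem 2.21, the sketch below evaluates the two ratios for the standard normal DF at its median ($x_0 = 0$, $\lambda = 1/2$), where one expects $A = 1$ and $\beta = 1$. The choice of the normal DF, the helper names and the step size are assumptions made only for this example.

```python
import math

def norm_cdf(x):
    """Standard normal DF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_sided_ratio(F, x0, x):
    """(F(x0 + x) - F(x0)) / (F(x0) - F(x0 - x)), condition 2 of Theorem 2.21."""
    return (F(x0 + x) - F(x0)) / (F(x0) - F(x0 - x))

def scaling_ratio(F, x0, tau, x):
    """(F(x0 + tau*x) - F(x0)) / (F(x0 + x) - F(x0)), condition 3 of Theorem 2.21."""
    return (F(x0 + tau * x) - F(x0)) / (F(x0 + x) - F(x0))

x0, tau, step = 0.0, 3.0, 1e-4   # illustrative values
A_hat = two_sided_ratio(norm_cdf, x0, step)        # close to A = 1 by symmetry
tau_beta_hat = scaling_ratio(norm_cdf, x0, tau, step)  # close to tau**beta with beta = 1
```

A smooth DF with positive density at the $\lambda$-quantile behaves in the same way, which is why the sample median of such a DF is asymptotically normal.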

Theorem 2.22 (Smirnov, 1952) In order that a distribution $F(x)$ belongs to the domain of $\Phi(W_4)$, it is necessary and sufficient that one of the following conditions is satisfied:

1. $\underline{a}_\lambda < \overline{a}_\lambda$;
2. $\underline{a}_\lambda = \overline{a}_\lambda = a_\lambda$, $\lambda = F(a_\lambda + 0)$ and $F(a_\lambda - 0) < \lambda$;
3. $\underline{a}_\lambda = \overline{a}_\lambda = a_\lambda$, $\lambda = F(a_\lambda - 0)$ and $F(a_\lambda + 0) > \lambda$;
4. $\underline{a}_\lambda = \overline{a}_\lambda = a_\lambda$, $F(a_\lambda - 0) = F(a_\lambda + 0) = \lambda$, so that $a_\lambda$ is a point of continuity (and a point of increase) of the DF $F(x)$,

where $\overline{a}_\lambda = \inf\{x : F(x) > \lambda\}$ and $\underline{a}_\lambda = \sup\{x : F(x) < \lambda\}$.

From Lemma 2.14, if the DF $F(x)$ belongs to a domain of attraction of one of the types defined in (2.17), and the condition $\sqrt{n}\,(k_n/n - \lambda) \to 0$ is satisfied, then we say that $F(x)$ belongs to the normal $\lambda$-domain of attraction of that type. On the other hand, if $\sqrt{n}\,(k_n/n - \lambda) \to t$, $-\infty < t < \infty$, $t \neq 0$, then we say that $F(x)$ belongs to the $(\lambda, t)$-domain of attraction of the given type, or in other words, that type has a $(\lambda, t)$-domain of attraction. In particular, under the condition $\sqrt{n}\,(k_n/n - \lambda) \to t$, $-\infty < t < \infty$, we get the following theorem, which was derived by Smirnov (1952).

Theorem 2.23 In order that the non-degenerate law $\Phi(W(x))$ can have a domain of $(\lambda, t)$-attraction, it is necessary and sufficient that the law $\Phi(W(x) + t c_\lambda)$, where $c_\lambda = \frac{1}{\sqrt{\lambda(1-\lambda)}}$, possesses a domain of normal $\lambda$-attraction.

Theorem 2.24 (Wu, 1966) If $k_n/n \to \lambda$ and $\sqrt{n}\,(k_n/n - \lambda)$ is bounded but without limit, then the necessary and sufficient condition for $F(x)$ to belong to the normal $\lambda$-domain of attraction is that there exists a real number $x_o$ such that

(i) $F(x_o) = F(x_o + 0) = \lambda$, $F(x_o - \epsilon) < \lambda$ and $F(x_o + \epsilon) > \lambda$, $\forall \epsilon > 0$,
(ii) $\lim_{x \to 0+} \dfrac{F(x_o + \tau x) - \lambda}{F(x_o + x) - \lambda} = \tau$, $\forall \tau > 0$,
(iii) $\lim_{x \to 0+} \dfrac{F(x_o + x) - \lambda}{\lambda - F(x_o - x)} = 1$.

2.6 Asymptotic theory of extremes under nonlinear normalization

In Section 2.2, we saw that under linear normalization a sequence of maxima of iid RVs converges in distribution to one of the three max-stable laws. Therefore, in view of the discussion in Subsection 1.1.2, the max-stable theory provides a sufficiently simple approximation to the distribution of the maxima; meanwhile, any nonlinear strictly monotone continuous transformation may achieve the same purpose. Specifically, the power transformation $G_n(x) = b_n |x|^{a_n} S(x)$, $a_n, b_n > 0$, where

$$S(x) = \operatorname{sign}(x) = \begin{cases} +1, & x > 0, \\ -1, & x < 0, \\ 0, & x = 0, \end{cases}$$

with $G_n^{-1}(x) = \left|\frac{x}{b_n}\right|^{1/a_n} S(x)$, will serve for constructing a simplified approximation, provided one can prove a suitable limit theorem.

During the last two decades, E. Pancheva and her collaborators developed the limit theory for extremes and extremal processes under nonlinear but monotone increasing normalizing mappings. In fact, any limit theorem for convergence of normalized maxima of iid RVs to a max-stable law $H$ separates a subclass of DFs called the max-domain of attraction of $H$, $D(H)$. Thus, if we use a wider class of normalizing mappings than the linear ones, we get a wider class of limit laws, which can be used in solving approximation problems. Another reason for using nonlinear normalization concerns the accuracy of approximation in limit theorems: in certain cases, relatively simple monotone mappings achieve a better rate of convergence (see Barakat et al., 2010).
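The power transformation and its inverse can be written down directly. The sketch below is only illustrative: the function names and the numerical constants are assumptions, and the checks merely confirm that the stated inverse undoes the map and that the map is strictly increasing.

```python
import math

def sign(x):
    """S(x): +1 for x > 0, -1 for x < 0, 0 for x = 0."""
    return (x > 0) - (x < 0)

def power_norm(x, a, b):
    """G(x) = b * |x|**a * S(x), with a, b > 0."""
    return b * abs(x) ** a * sign(x)

def power_norm_inv(y, a, b):
    """G^{-1}(y) = |y / b|**(1/a) * S(y)."""
    return abs(y / b) ** (1.0 / a) * sign(y)

a, b = 2.5, 0.3                      # hypothetical normalizing constants
xs = [-3.2, -0.5, 0.0, 0.7, 4.0]     # increasing grid covering both signs
images = [power_norm(x, a, b) for x in xs]
round_trip = [power_norm_inv(y, a, b) for y in images]
```

Note that the sign of `x` is preserved by `power_norm`, which is the loss of flexibility relative to linear normalization discussed in the text.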

Clearly, using a strictly monotone continuous transformation does not waste any of the information contained in the data under consideration (e.g., the sufficiency property is preserved under a one-to-one transformation). Nevertheless, we may lose some flexibility when using nonlinear normalization. For example, under linear normalization, all negative data can be transformed to positive numbers and vice versa, but this cannot be done under the power normalization $G_n(x) = b_n |x|^{a_n} S(x)$, $a_n, b_n > 0$. It is important to emphasize that no one can claim that nonlinear normalization is preferable in general, but as Pancheva (1994) and many other authors showed, in some cases of practical interest it is not only better to use a nonlinear transformation, but we have to use it. For example, when Weinstein (1973) used real data (maxima drawn from groups of $n = 100$ Gaussian samples), he observed that the nonlinearly normalized maximum converges weakly to its asymptote much faster than the linearly normalized maximum, and that tail probability estimates of a given accuracy can be obtained from a much smaller number of data. In this way, nonlinear normalization suggests a generalization that could be advantageous in certain estimation problems (for other important advantages, see Pancheva, 1994, p. 316).

Pancheva (1984) considered the power normalization and derived all the possible limit DFs of $X_{n:n}$ subjected to this normalization. These limit DFs are usually called the power max-stable DFs (p-max-stable DFs). Mohan and Ravi (1993) showed that the p-max-stable DFs (six p-types of DFs) attract more than the linear stable DFs (see also Subramanya, 1994). Therefore, using the power normalization, we get a wider class of limit DFs which can be used in solving approximation problems. Accordingly, one can essentially extend the field of applications of the extreme value model. A unified approach to the results of Mohan and Ravi (1993) and Subramanya (1994) has been obtained by Christoph and Falk (1996).

2.6.1 Characterization of the class of ML-laws and the GMA group

We begin our discussion of the class of maximum limit laws (ML-laws) under general normalization by posing the following question: can an arbitrary continuous function serve as a normalizing function in the class of ML-laws? The answer is no. Pancheva (1993) (see also Pancheva, 2010) emphasized that any normalizing function in the class of ML-laws should be a strictly increasing continuous max-automorphism mapping. The next definition gives more details.

Definition 2.25 A max-automorphism mapping is a function $L : \mathbb{R} \to \mathbb{R}$ that preserves the max-operation, i.e., $L(\max(X, Y)) = \max(L(X), L(Y))$.

According to the restrictions imposed on the mapping $L$ (i.e., it should be a strictly increasing continuous function), it has an inverse $L^{-1}$. Moreover, the class of these mappings constitutes a group with respect to the composition "$\circ$". This group is denoted by GMA (clearly both linear and power transformations satisfy these conditions). On the other hand, the new norming mappings call for a new understanding of the notion type$(F)$ and a new formulation of the convergence to type theorem (Theorem 1.13). We call this new formulation the modification of Khinchin's theorem. We say that a DF $F_1$ belongs to type$(F_2)$ if there exists $g \in$ GMA such that $F_1 = F_2 \circ g$. A convergence to type takes place if the limit relations $F_n \xrightarrow[n]{w} F_1$ and $F_n \circ G_n \xrightarrow[n]{w} F_2$, where $G_n \in$ GMA, imply $F_2 \in$ type$(F_1)$. Pancheva (1993) showed that the compactness of the normalizing sequence $\{G_n(x)\}_n$ is necessary and sufficient for the convergence to type, i.e., for the modification of Khinchin's theorem to be applicable. Clearly, this theorem is applicable for linear and power transformations.
Therefore, in the sequel, we consider only normalization functions that satisfy the previous conditions.

2.6.2 The class of ML-laws

Let $X_1, X_2, \ldots, X_n$ be independent RVs taking values in $\mathbb{R}$ with the corresponding DFs $F_{X_k}(x) = P(X_k < x)$, $k = 1, 2, \ldots, n$. Assume that there exists a sequence $\{G_n(x)\}_n$ (where $G_n \in$ GMA) such that

$$P(X_{n:n} < G_n(x)) \xrightarrow[n]{w} H(x), \tag{2.19}$$

where $H(x)$ is a non-degenerate DF. The following two assumptions are the main pillars of the theory of ML-laws.

Assumption 1 (uniform assumption) The random sequence $\{X_n\}$ is said to satisfy the uniform assumption (UA) with respect to the sequence $\{G_n(\cdot)\}_n$ if

$$\sup_{k \le n}\, [1 - F_{X_k}(G_n(x))] \xrightarrow[n]{} 0.$$

Assumption 2 The following implication

$$\left\{ \lim_{n \to \infty} \sum_{k=1}^{n} [1 - F_{X_k}(G_n(x))] < \infty \right\} \Longrightarrow \left\{ \lim_{n \to \infty} \sum_{k=1}^{m_n} [1 - F_{X_k}(G_n(x))] < \infty \right\} \tag{2.20}$$

holds for each sequence of integers $\{m_n\}$ such that $m_n < n$, $m_n \to \infty$ and $m_n/n \to \theta \in (0, 1)$.

Clearly, if the RVs $X_1, X_2, \ldots, X_n, \ldots$ are identically distributed with a common DF $F(x)$ and the relation (2.19) is satisfied, then Assumptions 1 and 2 are satisfied. Indeed, the relation (2.19) is satisfied in this case if and only if $n(1 - F(G_n(x))) \to -\log H(x)$, which implies $1 - F(G_n(x)) \to 0$, $\forall x \in \mathbb{R}$, $x < \rho$, and $m_n(1 - F(G_n(x))) \sim \theta n (1 - F(G_n(x))) \to -\theta \log H(x) < \infty$. By using Assumptions 1 and 2, Pancheva (1984) proved the next theorem.

Theorem 2.26 (Pancheva, 1984) Let the sequence $\{X_n\}$ satisfy the UA 1. Then the relation (2.19) is true if and only if

$$u(x) = \lim_{n \to \infty} \sum_{k=1}^{n} [1 - F_{X_k}(G_n(x))] < \infty.$$

Moreover, $H(x) = \exp[-u(x)]$.

Theorem 2.26 characterizes the ML-laws through the next definition.

Definition 2.27 A non-degenerate DF $H$ belongs to the class of ML-laws if it is a weak limit of $P(X_{n:n} < G_n(x))$ under the UA 1 and the condition (2.20).

Remark If the UA 1 is satisfied, then the condition (2.20) is equivalent to the statement that

$$\lim_{n \to \infty} P(X_{m_n:m_n} < G_n(x)) = \lim_{n \to \infty} P(\max(X_1, X_2, \ldots, X_{m_n}) < G_n(x))$$

exists and is a non-degenerate DF.
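In the iid case, the statement of Theorem 2.26 reduces to $n(1 - F(G_n(x))) \to u(x)$ with $H(x) = \exp[-u(x)]$. The following sketch illustrates this for a standard textbook choice that is not taken from the text: the unit exponential DF with the linear map $G_n(x) = x + \log n$ (a member of GMA), for which $u(x) = e^{-x}$. All names and numerical values are illustrative assumptions.

```python
import math

def exp_cdf(x):
    """Unit exponential DF, chosen here only as a convenient example."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def G(n, x):
    """Assumed normalizing map G_n(x) = x + log n (linear, hence in GMA)."""
    return x + math.log(n)

def tail_sum(n, x):
    """sum_{k<=n} [1 - F(G_n(x))] = n * (1 - F(G_n(x))) in the iid case."""
    return n * (1.0 - exp_cdf(G(n, x)))

def max_cdf(n, x):
    """P(X_{n:n} < G_n(x)) = F(G_n(x))**n in the iid case."""
    return exp_cdf(G(n, x)) ** n

n, x = 10**6, 0.5
u_hat = tail_sum(n, x)          # approximates u(x) = exp(-x)
H_hat = max_cdf(n, x)           # approximates H(x) = exp(-u(x))
u_limit = math.exp(-x)
H_limit = math.exp(-u_limit)
```

The same two functions can be reused with any other DF and normalizing map by replacing `exp_cdf` and `G`.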

The condition (2.20) is essential for the DF $H$ to belong to the class of self-decomposable laws (see Definition 2.30), according to the next theorem.

Theorem 2.28 Let the sequence $\{X_n\}$ satisfy the relation (2.19) and Assumptions 1 and 2. Then, for every $\theta \in (0, 1)$, there exist a DF $H_\theta(x)$ and a function $g_\theta(x)$ such that

$$H(x) = H(g_\theta(x))\, H_\theta(x), \tag{2.21}$$

where $H(x)$ is the limit DF in (2.19) and $g_\theta(x)$ is determined by the next lemma.

Lemma 2.29 Under the conditions of Theorem 2.28, the limit

$$g_\theta(x) = \lim_{n \to \infty} G_{m_n}^{-1}(G_n(x)) \tag{2.22}$$

exists and satisfies the functional equation

$$g_t(g_s(x)) = g_{ts}(x), \quad t, s \in (0, 1), \tag{2.23}$$

at each continuity point $x$ of $H(x)$.

Definition 2.30 Any class of DFs characterized by the decomposition (2.21) is called a class of self-decomposable laws.

If for each $x$ the function $g_\theta(x)$, considered as a function of $\theta$, is solvable (i.e., each equation of the form $g_\theta(x) = t$ for given $x$ and $t$ has a unique solution $\theta = \bar{g}(t, x)$), then the solution of the functional equation (2.23) has the form

$$g_\theta(x) = h^{-1}[h(x) - \log \theta], \tag{2.24}$$

where $h(\cdot)$ is an invertible continuous function (see Pancheva, 1984, and the Subsection 1.5.3 of the comments on Pancheva's work). We will consider only normalizing transformations $\{G_n\}$ such that the limit function (2.22) is solvable with respect to $\theta$.

Remark The above assumptions regarding the normalizing transformations $\{G_n(x)\}$, and consequently the function $g_\theta(x)$, were made by Pancheva (1984). These assumptions are automatically satisfied if we consider only normalizing transformations that belong to GMA (see Definition 2.25).

Now, we can write $H(g_\theta(x)) = H \circ h^{-1}(h(x) - \log \theta)$, and we get the next lemma.

Lemma 2.31 Under the conditions of Theorem 2.28, the function $\log (H \circ h^{-1})(x)$ is concave.
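That the form (2.24) solves the functional equation (2.23) follows in one line, since $h(g_s(x)) = h(x) - \log s$. The sketch below checks this numerically for one assumed choice, $h(x) = \beta \log\log x$ on $x > 1$, for which (2.24) reduces to the power map $g_\theta(x) = x^{\theta^{-1/\beta}}$. The helper `make_g`, the value of `BETA` and the test points are hypothetical.

```python
import math

def make_g(h, h_inv):
    """Build g_theta(x) = h^{-1}(h(x) - log(theta)) as in (2.24)."""
    def g(theta, x):
        return h_inv(h(x) - math.log(theta))
    return g

BETA = 2.0  # hypothetical shape parameter

def h(x):
    """Assumed example: h(x) = BETA * log(log(x)), defined for x > 1."""
    return BETA * math.log(math.log(x))

def h_inv(y):
    """Inverse of h: exp(exp(y / BETA))."""
    return math.exp(math.exp(y / BETA))

g = make_g(h, h_inv)
s, t, x = 0.6, 0.35, 1.7
lhs = g(t, g(s, x))                 # g_t(g_s(x))
rhs = g(t * s, x)                   # g_{ts}(x)
closed_form = x ** ((t * s) ** (-1.0 / BETA))  # power-map form of g_{ts}(x)
```

Passing a different invertible pair `h`, `h_inv` to `make_g` checks the same identity for other normalizations.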

It is easy to prove that the function $H_\theta(x)$ in Theorem 2.28 is a DF, i.e., the limit $H(x)$ is self-decomposable in the sense that it may be represented in the form (2.21). It turns out that the converse is also true.

Theorem 2.32 (Pancheva, 1984) Let a non-degenerate DF $H$ have the decomposition (2.21) for each $\theta \in (0, 1)$, where $H_\theta(x)$ is a DF and $g_\theta(x)$ is a solution of (2.23). Then $H$ belongs to the class of ML-laws. More precisely, there exist a sequence $\{X_n\}$ of independent RVs and a sequence $\{G_n(x)\}_n \subset$ GMA satisfying Assumptions 1 and 2 such that $P(X_{n:n} < G_n(x)) \xrightarrow[n]{w} H(x)$.

The last two theorems imply that the class of ML-laws coincides with the class of self-decomposable laws, in the sense of (2.21).

2.6.3 The class of max-stable laws (MS-laws)

Let $X_1, X_2, \ldots, X_n$ be independent identically distributed RVs with a non-degenerate common DF $F(x) = P(X_j < x)$, for all $j = 1, 2, \ldots, n$.

Definition 2.33 A DF $F$ is said to be max-stable if for each positive integer $n$ there exists a strongly monotone continuous transformation $G_n(x) \in$ GMA such that $P(X_{n:n} < G_n(x)) = F^n(G_n(x)) = F(x)$.

Suppose now that there exists a sequence $\{G_n(x)\}_n$ such that

$$P(X_{n:n} < G_n(x)) = F^n(G_n(x)) \xrightarrow[n]{w} H(x). \tag{2.25}$$

This convergence means, as usual, that $F$ belongs to the domain of attraction of the DF $H$ (notation $F \in D(H)$). Obviously, Assumptions 1 and 2 of the last subsection are fulfilled in the case of iid RVs if (2.25) is assumed. The characteristic decomposition (2.21) of the limit distribution $H$ reduces to the functional equation $H(x) = H(g_\theta(x))\, H(g_{1-\theta}(x))$. Now, Theorem 2.28 can be formulated in a simpler way.

Theorem 2.34 (Pancheva, 1984) If the weak convergence (2.25) holds, then the limit distribution $H$ has the form

$$H(x) = \exp[-e^{-h(x)}], \tag{2.26}$$

where $h(x)$ is the invertible continuous function determined by (2.24) (see Section 2.7, comments on Pancheva's work). Since $H(x)$ is a DF, we have

$$\lim_{x \to \rho} h(x) = \infty, \qquad \lim_{x \to \underline{\rho}} h(x) = -\infty,$$

where $\rho = r(H)$ (the right endpoint of $H$) and $\underline{\rho} = \ell(H)$ (the left endpoint of $H$).

Corollary (2.2) Each limit DF $H$ of (2.24) is max-stable.

Indeed, consider independent copies $X_1, X_2, \ldots, X_n$ of a RV $X$ with DF $H(x) = \exp[-e^{-h(x)}]$ and choose the normalizing transformation $G_n(x)$ as follows: $G_n(x) = h^{-1}[h(x) + \log n]$. Then $H(G_n(x)) = \exp[-\frac{1}{n} e^{-h(x)}]$, that is, $H^n(G_n(x)) = H(x)$. Here the tail of the DF $H$ has the asymptotic behaviour $e^{-h(x)} = -\log H(x) = -\log[1 - (1 - H(x))] = [1 + o(1)][1 - H(x)]$, as $x \to \rho$.

Corollary (2.3) Each strongly monotone continuous DF $F$ is max-stable. In fact, in this case we have

$$h(x) = -\log \log \frac{1}{F(x)}.$$
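The construction behind Corollary 2.2 can be verified numerically for any invertible increasing $h$. The sketch below uses $h(x) = x^3$, an arbitrary choice made for this illustration (it is neither a linear nor a power normalization); the function names and the values of `n` and `points` are assumptions.

```python
import math

def h(x):
    """Assumed example of an invertible, continuous, increasing h."""
    return x ** 3

def h_inv(y):
    """Real cube root, valid for negative arguments as well."""
    return math.copysign(abs(y) ** (1.0 / 3.0), y)

def H(x):
    """H(x) = exp(-exp(-h(x))), the form (2.26)."""
    return math.exp(-math.exp(-h(x)))

def G(n, x):
    """Normalizing map G_n(x) = h^{-1}(h(x) + log n) from Corollary 2.2."""
    return h_inv(h(x) + math.log(n))

n = 50
points = [-0.8, 0.3, 1.2]
# Max-stability: H(G_n(x))**n should reproduce H(x) up to rounding.
gaps = [abs(H(G(n, x)) ** n - H(x)) for x in points]
```

The largest entry of `gaps` is at the level of floating-point rounding, in line with the exact identity $H^n(G_n(x)) = H(x)$.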

Pancheva (1984) considered a non-max-stable DF $F$ that belongs to the domain of attraction of $H$, where $H$ is a max-stable DF. The construction of the normalizing transformation $G_n(\cdot)$ and the asymptotic behaviour of the tail of $F$ are given in the next theorem, due to Pancheva (1984) (for the correction of this theorem, see Theorem 2.36).

Theorem 2.35 A non-degenerate DF $F$ belongs to $D(H)$ if and only if

$$1 - F(x) = [1 + o(1)]\, L(h(x))\, e^{-h(x)},$$

as $x \to \rho$ $(= r(H))$, where $L(x)$ is a regularly varying function (i.e., $\frac{L(tx)}{L(x)} \to t^{\rho}$, $-\infty < \rho < \infty$, as $x \to \infty$, $\forall t > 0$, see de Haan, 1970). The normalizing transformations can be chosen as $G_n(x) = h^{-1}\{h(x) + \log[n L(\log n)]\}$.

2.7 Comments on Pancheva's work

In this section, some emendations are made to the seminal work of Pancheva (1984). Some of these emendations are due to Sreehari (2009) and the others are due to Barakat and Omar (2011a). Sreehari (2009) gave two examples to show that the necessary part of Pancheva's result (Theorem 2.35) is incorrect. He gave the correct necessary and sufficient condition and demonstrated its usefulness. He also introduced some notation that helps in the derivation of $G_n(x)$ given in Pancheva (1984). The corrected result, given by Sreehari (2009), is presented in the next theorem.

Theorem 2.36 If a non-degenerate DF $F \in D(H)$, where $H$ is a max-stable DF, then there exists a sequence of positive functions $\{L^*(x, n)\}$ such that

$$\frac{K(h(x) + \log[n L^*(x, n)])}{L^*(x, n)} \xrightarrow[n]{} 1, \tag{2.27}$$

for $x \in (\underline{\rho}, \rho)$, where $K(x) = [1 - F \circ h^{-1}(x)]\, e^{x}$. Conversely, if (2.27) holds for some strictly increasing continuous function $h(x)$ and some sequence of positive functions $\{L^*(x, n)\}$, then $F \in D(H)$, where $H(x) = \exp[-\exp(-h(x))]$. In this case $G_n(x)$ can be chosen as $h^{-1}[h(x) + \log(n L^*(x, n))]$.

There is confusion over the exact form of $H$ given by Pancheva (1988) (also in Pancheva 1984, 1994). In two texts (1984, 1988), Pancheva proved that $H = e^{-e^{-h(x)}}$ (see Theorem 2.34, Equation (2.24) and Corollary 2.2). However, in Pancheva (1994), it was mentioned that $H = H(h; \beta) = e^{-e^{-\beta h}}$, $\beta > 0$. This is a two-parameter $(h, \beta)$ general max-stable law, $h$ being a parametric function. To show where this confusion comes from, we first note that the solution (2.24) of the functional equation (2.23) does not exhaust all solutions of this equation. Namely, the complete class of solutions of the equation (2.23) is given by

$$g_\theta(x) = h^{-1}(h(x) - \mu \log \theta), \quad \mu > 0,\ \theta \in (0, 1),\ x \in \mathbb{R}, \tag{2.28}$$

$$g_\theta(x) = \ell^{-1}(\ell(x) + \acute{\mu} \log \theta), \quad \acute{\mu} > 0,\ \theta \in (0, 1),\ x \in \mathbb{R}, \tag{2.29}$$

$$g_\theta(x) = x, \quad \forall \theta \in (0, 1),\ x \in \mathbb{R}, \tag{2.30}$$

see Sreehari (2009) and Barakat and Omar (2011a). Actually, Pancheva (1984) derived all possible non-degenerate limit types of the DF of the maximum order statistic, under a general strongly monotone transformation $G_n(x)$ that generates the functional equation (2.23), by considering only the solution (2.28) with $\mu = 1$ and ignoring the two other solutions (2.29) and (2.30). Fortunately, Barakat and Omar (2011a) showed that this gap in the proof of Pancheva's result does not alter the result, if we assume $\mu = 1$.


Moreover, when $\mu \neq 1$ (as it should be in general), we can reduce the solution $H = H(h; \beta)$ to the solution (2.26) as follows: $g_\theta(x) = h^{-1}(h(x) - \mu \log \theta) = h^{-1}(\mu[\mu^{-1} h(x) - \log \theta]) = h^{*-1}(h^*(x) - \log \theta)$, where $h^*(x) = \mu^{-1} h(x)$. On the other hand, which of the two representations $H(x)$ and $H(h; \beta)$ to use depends on the problem under consideration. More precisely, for nonparametric investigations, e.g., tail and quantile estimation, we simply put $h(x) = -\log(-\log H(x))$, and this function takes part in determining the extremal behaviour of the process, e.g., $1 - H(x) \sim e^{-h(x)}$, $x \to \infty$. But for parametric investigations we are obliged to take the representation $H = H(h; \beta)$. However, in view of the modification of Khinchin's type theorem, nothing guarantees that the two functions $h$ and $h^*$ belong to the same type. Therefore, in our opinion, the representation $H = H(h; \beta)$ (or, in our notation, $H = e^{-e^{-\mu h}}$) should be used in all cases. It is worth noting that in the linear and power normalization cases there is no difference between the two representations. For example, for the linear cases we have $e^{-h} = x^{-\beta}$, $x > 0$; or $e^{-h} = (-x)^{\beta}$, $x < 0$; or $e^{-h} = e^{-x}$, $\forall x$, which implies $h(x) = \beta \log x$, $x > 0$; or $h(x) = -\beta \log |x|$, $x < 0$; or $h(x) = x$, $\forall x$, respectively. Clearly, $h^*$ has the same type as $h$ in all three cases under linear transformation.

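2.8 Extreme value theory under power normalization

Let $G_n(x) = b_n |x|^{a_n} S(x)$, where $a_n, b_n > 0$. In this case, Pancheva (1984) showed that $g_\theta(x) = \beta_\theta |x|^{\alpha_\theta} S(x)$, where

$$\beta_\theta = \lim_{n \to \infty} \left(\frac{b_n}{b_{m_n}}\right)^{1/a_{m_n}}, \qquad \alpha_\theta = \lim_{n \to \infty} \frac{a_n}{a_{m_n}}, \qquad \text{and} \qquad \frac{m_n}{n} \xrightarrow[n]{} \theta \in (0, 1).$$

Moreover, there are two possibilities for the coefficients $\beta_\theta$ and $\alpha_\theta$, namely

(i) $\alpha_\theta = 1$ and $\beta_\theta = \theta^{k}$;
(ii) $\alpha_\theta = \theta^{m}$ and $\beta_\theta = \exp[k(\theta^{m} - 1)]$,

where $m$ and $k$ are constants. Finally, Pancheva (1984) derived the following max-stable limit distributions under the power normalization $G_n(x)$ (with $\widetilde{H}_{i;\beta}(x) = \exp[-u_{i;\beta}(x)]$, $i = 1, 2, \ldots, 6$, $\beta > 0$, where $\beta = \beta(m)$):

$$u_{1;\beta}(x) = \begin{cases} \infty, & x \le 1, \\ (\log x)^{-\beta}, & x > 1; \end{cases} \qquad u_{2;\beta}(x) = \begin{cases} \infty, & x \le 0, \\ (-\log x)^{\beta}, & 0 < x \le 1, \\ 0, & x > 1; \end{cases}$$

$$u_{3;\beta}(x) = \begin{cases} \infty, & x \le -1, \\ (-\log(-x))^{-\beta}, & -1 < x \le 0, \\ 0, & x > 0; \end{cases} \qquad u_{4;\beta}(x) = \begin{cases} (\log(-x))^{\beta}, & x \le -1, \\ 0, & x > -1; \end{cases}$$

$$u_{5}(x) = \begin{cases} \infty, & x \le 0, \\ \frac{1}{x}, & x > 0; \end{cases} \qquad u_{6}(x) = \begin{cases} |x|, & x \le 0, \\ 0, & x > 0, \end{cases} \tag{2.31}$$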
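The p-max-stability of the first two types in (2.31) can be checked by direct substitution. The sketch below assumes $b_n = 1$ with $a_n = n^{1/\beta}$ for type 1 and $a_n = n^{-1/\beta}$ for type 2; these constants are worked out here from $u_{1;\beta}(x^{a}) = a^{-\beta} u_{1;\beta}(x)$ and $u_{2;\beta}(x^{a}) = a^{\beta} u_{2;\beta}(x)$, not quoted from the text, and the function names and test values are illustrative.

```python
import math

def H1(x, beta):
    """Type 1 of (2.31): exp(-(log x)**(-beta)) for x > 1, else 0."""
    return 0.0 if x <= 1 else math.exp(-math.log(x) ** (-beta))

def H2(x, beta):
    """Type 2 of (2.31): exp(-(-log x)**beta) on 0 < x <= 1, 0 below, 1 above."""
    if x <= 0:
        return 0.0
    if x > 1:
        return 1.0
    return math.exp(-(-math.log(x)) ** beta)

beta, n = 2.0, 7   # illustrative values

# Type 1: H1(x**a_n)**n == H1(x) with a_n = n**(1/beta), b_n = 1.
x1 = 1.8
lhs1 = H1(x1 ** (n ** (1.0 / beta)), beta) ** n

# Type 2: H2(x**a_n)**n == H2(x) with a_n = n**(-1/beta), b_n = 1.
x2 = 0.4
lhs2 = H2(x2 ** (n ** (-1.0 / beta)), beta) ** n
```

Types 3 and 4 can be handled in the same way after the reflection $x \mapsto -x$.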

and we adopt the notation $u_{i;\beta}(x) = u_i(x)$, $i = 5, 6$. Notice that the first four types are examples of distributions that are max-stable under power normalization but not max-stable under linear normalization. The corresponding min-stable distributions can be easily written as

$$\widetilde{L}_{i;\beta}(x) = 1 - \widetilde{H}_{i;\beta}(-x) = 1 - \exp\left(-u^*_{i;\beta}(x)\right) = 1 - \exp\left(-u_{i;\beta}(-x)\right), \quad i = 1, 2, \ldots, 6,\ \beta > 0.$$

Clearly, under the power normalization the notion of type takes the following form: we say that two DFs $H_1$ and $H_2$ are of the same p-type if there exist $A > 0$ and $B > 0$ such that $H_1(x) = H_2(A|x|^{B} S(x))$, $\forall x$. As in the case of linear normalization, Nasri-Roudsari (1999) has summarized these types by the following von Mises-type representations:

$$P_{1;\gamma}(x; a, b) = \exp\left[-\left(1 + \gamma \log a x^{b}\right)^{-\frac{1}{\gamma}}\right], \quad x > 0,\ 1 + \gamma \log a x^{b} > 0, \tag{2.32}$$

and

$$P_{2;\gamma}(x; a, b) = \exp\left[-\left(1 - \gamma \log a(-x)^{b}\right)^{-\frac{1}{\gamma}}\right], \quad x < 0,\ 1 - \gamma \log a(-x)^{b} > 0. \tag{2.33}$$

Both families (2.32) and (2.33) are called the generalized extreme value distribution under power normalization (GEVP). Moreover, it can be shown (see Barakat et al., 2013a) that the DFs (2.32) and (2.33) are the only ones satisfying the p-max-stable property, i.e., for every $n$ there exist power normalizing constants $a_n, b_n > 0$ for which we have $P^{n}_{t;\gamma}(C_n(x); a, b) = P_{t;\gamma}(x; a, b)$, $t = 1, 2$. This shows that both GEVP families (2.32) and (2.33) satisfy the p-max-stable property (i.e., $P_{t;\gamma} \in D_p(P_{t;\gamma})$). Barakat et al. (2013a) incorporated (2.32) and (2.33) into a unified formula and showed (see also Barakat et al., 2014a and Barakat et al., 2015a) that the two parametric models enable us to apply the BM method under power normalization. Moreover, an estimator for the shape parameter was proposed within the GEVP model. This estimator corresponds to the Dubey estimate in the GEVL model.
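The p-max-stable property of (2.32) can be illustrated numerically. In the sketch below, $C_n(x)$ is assumed to be the power map $b_n x^{a_n}$ on $x > 0$, and the constants $a_n = n^{\gamma}$, $b_n = \exp[(n^{\gamma} - 1)(\log a + 1/\gamma)/b]$ are obtained here by solving $1 + \gamma \log(a\,C_n(x)^{b}) = n^{\gamma}(1 + \gamma \log(a x^{b}))$; they are a derivation for this example (with $\gamma \neq 0$), not constants quoted from Barakat et al. (2013a). All function names and parameter values are hypothetical.

```python
import math

def P1(x, gamma, a=1.0, b=1.0):
    """GEVP DF (2.32) on its support: x > 0 and 1 + gamma*log(a*x**b) > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    z = 1.0 + gamma * math.log(a * x ** b)
    if z <= 0:
        raise ValueError("x is outside the support")
    return math.exp(-z ** (-1.0 / gamma))

def power_constants(n, gamma, a=1.0, b=1.0):
    """Constants (a_n, b_n) derived for this sketch; gamma must be nonzero."""
    a_n = n ** gamma
    b_n = math.exp((a_n - 1.0) * (math.log(a) + 1.0 / gamma) / b)
    return a_n, b_n

def C(n, x, gamma, a=1.0, b=1.0):
    """Assumed power normalization C_n(x) = b_n * x**a_n for x > 0."""
    a_n, b_n = power_constants(n, gamma, a, b)
    return b_n * x ** a_n

# Illustrative parameter sets: one with gamma > 0, one with gamma < 0.
cases = [(0.3, 2.0, 1.5, 5, 1.2), (-0.2, 1.0, 1.0, 5, 1.2)]
pairs = [(P1(C(n, x, g, a, b), g, a, b) ** n, P1(x, g, a, b))
         for g, a, b, n, x in cases]
```

Each pair in `pairs` holds $P^{n}_{1;\gamma}(C_n(x); a, b)$ and $P_{1;\gamma}(x; a, b)$, which should agree up to rounding.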

Barakat et al. (2013a) (see also Barakat et al., 2014a, 2015a) derived the generalized Pareto distribution under power normalization (GPDP) for each of the models (2.32) and (2.33), respectively, by

$$Q_{1;\gamma}(x; b) = 1 + \log P_{1;\gamma}(x; 1, b), \qquad Q_{2;\gamma}(x; b) = 1 + \log P_{2;\gamma}(x; 1, b). \tag{2.34}$$

It can be shown (cf. Theorem 2.2, Barakat et al., 2013a) that each of the GPDPs satisfies the POT stability property, i.e., the left truncated GPDP yields again a GPDP. In Chapter 5, we will give a simple derivation of the two models defined in (2.34), as well as the proof of their POT stability property. We end this section with an extension of Pancheva's (1984) result to extreme order statistics under power normalization.

Theorem 2.37 (Barakat and Nigm, 2002) The DF of the normalized $k$th upper extreme order statistic $\left|\frac{X_{n-k+1:n}}{b_n}\right|^{1/a_n} S(X_{n-k+1:n})$ converges weakly to a non-degenerate DF $\widetilde{H}_k(x)$, where $a_n > 0$ and $b_n > 0$ are suitable normalizing constants, if and only if $n(1 - F(b_n |x|^{a_n} S(x))) \xrightarrow[n]{} u_{i,\beta}(x)$, $i \in \{1, 2, \ldots, 6\}$, where the $u_{i,\beta}$'s are defined in (2.31) and $\widetilde{H}_k(x) = 1 - \Gamma_k(u_{i,\beta}(x))$.

The corresponding result for the lower extreme order statistics is given by the following theorem.

Theorem 2.38 The DF of the normalized $k$th lower extreme order statistic $\left|\frac{X_{k:n}}{\beta_n}\right|^{1/\alpha_n} S(X_{k:n})$ converges weakly to a non-degenerate DF $\widetilde{L}_k(x)$, where $\alpha_n > 0$ and $\beta_n > 0$ are suitable normalizing constants, if and only if $n F(\beta_n |x|^{\alpha_n} S(x)) \xrightarrow[n]{} \widetilde{u}_{i,\beta}(x)$, $i \in \{1, 2, \ldots, 6\}$, where $\widetilde{u}_{i,\beta}(x) = u_{i,\beta}(-x)$ and $\widetilde{L}_k(x) = \Gamma_k(\widetilde{u}_{i,\beta}(x))$.

2.9 Max-domains of attraction of univariate p-max-stable laws

Necessary and sufficient conditions for DFs to belong to the max-domains of attraction of p-max-stable laws have been given in Mohan and Ravi (1993). The work of Mohan and Ravi (1993) extends the results given in Galambos (1987), which concern linear normalization, to p-max-stable laws. Moreover, it has been shown that every DF attracted to an $\ell$-max-stable law is necessarily attracted to some p-max-stable law, and that p-max-stable laws in fact attract more. The results of Mohan and Ravi (1993) are summarized in the next definition and the next theorem.
Definition 2.39 A DF $F$ is said to belong to the max-domain of attraction of a DF $\widetilde{H}$ under power normalization if there exist norming constants $a_n > 0$ and $b_n > 0$ such that $F^{n}(b_n |x|^{a_n} S(x)) \xrightarrow[n]{w} \widetilde{H}(x)$. In this case, we write $F \in D_p(\widetilde{H})$, where $\widetilde{H}$ is called a max-stable DF under power normalization or simply a p-max-stable DF.

Theorem 2.40 Let $F$ be a given DF with right endpoint $\rho = r(F)$. Then,

(1) $F \in D_p(\widetilde{H}_{1,\beta}(x))$ if and only if

$$\rho = \infty \quad \text{and} \quad \lim_{t \to \infty} \frac{1 - F(\exp[ty])}{1 - F(\exp[t])} = y^{-\beta}, \quad y > 0.$$

In this case, we may set $a_n = \log F^{-}(1 - 1/n)$ and $b_n = 1$, where $F^{-}$ is the generalized inverse of the DF $F$, i.e., $F^{-}(y) = \inf\{x : F(x) > y\}$.

(2) $F \in D_p(\widetilde{H}_{2,\beta}(x))$ if and only if

$$0 < \rho < \infty \quad \text{and} \quad \lim_{t \to \infty} \frac{1 - F(\rho \exp[-y/t])}{1 - F(\rho \exp[-1/t])} = y^{\beta}, \quad y > 0.$$

Here we may choose $a_n = \log[\rho / F^{-}(1 - 1/n)]$ and $b_n = \rho$.

(3) $F \in D_p(\widetilde{H}_{3,\beta}(x))$ if and only if

$$\rho = 0 \quad \text{and} \quad \lim_{t \to \infty} \frac{1 - F(-\exp[-ty])}{1 - F(-\exp[-t])} = y^{-\beta}, \quad y > 0.$$

The normalizing constants can be chosen as $a_n = -\log[-F^{-}(1 - 1/n)]$ and $b_n = 1$.

(4) $F \in D_p(\widetilde{H}_{4,\beta}(x))$ if and only if

$$\rho < 0 \quad \text{and} \quad \lim_{t \to \infty} \frac{1 - F(\rho \exp[y/t])}{1 - F(\rho \exp[1/t])} = y^{\beta}, \quad y > 0.$$

In this case a choice for the norming constants is $a_n = \log[F^{-}(1 - 1/n)/\rho]$ and $b_n = -\rho$.

(5) $F \in D_p(\widetilde{H}_{5,\beta}(x))$ if and only if

$$\text{(i)}\ \rho > 0 \quad \text{and} \quad \text{(ii)}\ \lim_{t \uparrow \rho} \frac{1 - F(t \exp[y f(t)])}{1 - F(t)} = e^{-y},$$

for some positive valued function $f$. If (ii) holds for some function $f$, then $\int_a^{\rho} [(1 - F(x))/x]\,dx < \infty$ for $0 < a < \rho$ and (ii) holds with the choice $f(t) = [1 - F(t)]^{-1} \int_t^{\rho} [(1 - F(x))/x]\,dx$. The normalizing constants here may be chosen as $a_n = f(b_n)$ and $b_n = F^{-}(1 - 1/n)$.

(6) $F \in D_p(\widetilde{H}_{6,\beta}(x))$ if and only if

$$\text{(i)}\ \rho \le 0 \quad \text{and} \quad \text{(ii)}\ \lim_{t \uparrow \rho} \frac{1 - F(t \exp[y f(t)])}{1 - F(t)} = e^{y},$$

for some positive valued function $f$. If (ii) holds for some function $f$, then $\int_a^{\rho} [(1 - F(x))/x]\,dx < \infty$ for $a < \rho$ and (ii) holds with $f(t) = -[1 - F(t)]^{-1} \int_t^{\rho} [(1 - F(x))/x]\,dx$. The normalizing constants in this case may be chosen as $a_n = f(-b_n)$ and $b_n = -F^{-}(1 - 1/n)$.
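Part (1) of Theorem 2.40 can be illustrated with a log-Pareto DF, $F(x) = 1 - (\log x)^{-\beta}$ for $x \ge e$, which is an example chosen here and not taken from the text. For it, the tail ratio equals $y^{-\beta}$ exactly, $F^{-}(1 - 1/n) = \exp(n^{1/\beta})$, and hence $a_n = n^{1/\beta}$, $b_n = 1$. The sketch works on the log scale ($s = \log x$) to avoid overflow; all names and values are assumptions.

```python
import math

BETA = 2.0  # hypothetical tail index

def tail_log(s):
    """1 - F(e**s) for the assumed log-Pareto DF F(x) = 1 - (log x)**(-BETA), x >= e."""
    return s ** (-BETA) if s >= 1.0 else 1.0

def tail_ratio(t, y):
    """(1 - F(exp(t*y))) / (1 - F(exp(t))), the ratio in Theorem 2.40(1)."""
    return tail_log(t * y) / tail_log(t)

def a_n(n):
    """a_n = log F^-(1 - 1/n) = n**(1/BETA) for this DF; b_n = 1."""
    return n ** (1.0 / BETA)

def max_cdf(n, x):
    """F(x**a_n)**n, evaluated through s = a_n * log(x)."""
    s = a_n(n) * math.log(x)
    return (1.0 - tail_log(s)) ** n

def H1(x):
    """Limit law exp(-(log x)**(-BETA)) for x > 1, the type u_{1;beta} of (2.31)."""
    return math.exp(-math.log(x) ** (-BETA)) if x > 1 else 0.0

ratio = tail_ratio(50.0, 3.0)          # equals 3**(-BETA) up to rounding
gap = abs(max_cdf(10**6, 1.5) - H1(1.5))
```

Increasing `n` in `max_cdf` shrinks `gap` further, at a rate of order $1/n$ for this particular DF.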

2.10 Comparison between domains of attraction

In the following theorem, the three max-domains of attraction under linear normalization are compared with the six max-domains of attraction under power normalization. To distinguish between the max-domains of attraction under linear and power normalization, it is suitable now and in what follows to use the notation $F \in D_\ell(H)$ to indicate that $F$ belongs to the max-domain of attraction under linear normalization.

Theorem 2.41 (Mohan and Ravi, 1993) Let $F$ be a DF. Then we have the following implications:

(1) $F \in D_\ell(H_{1;\beta})$ or $F \in D_\ell(H_3)$, $\rho = \infty$ $\Longrightarrow$ $F \in D_p(\widetilde{H}_{5;\beta})$;
(2) $F \in D_\ell(H_3)$, $0 < \rho < \infty$ $\Longleftrightarrow$ $F \in D_p(\widetilde{H}_{5;\beta})$, $\rho < \infty$;
(3) $F \in D_\ell(H_3)$, $\rho < 0$ $\Longleftrightarrow$ $F \in D_p(\widetilde{H}_{6;\beta})$, $\rho < 0$;
(4) $F \in D_\ell(H_{2;\beta})$ or $F \in D_\ell(H_3)$, $\rho = 0$ $\Longrightarrow$ $F \in D_p(\widetilde{H}_{6;\beta})$;
(5) $F \in D_\ell(H_{2;\beta})$, $\rho > 0$ $\Longleftrightarrow$ $F \in D_p(\widetilde{H}_{2;\beta})$;
(6) $F \in D_\ell(H_{2;\beta})$, $\rho < 0$ $\Longleftrightarrow$ $F \in D_p(\widetilde{H}_{4;\beta})$.

The right endpoint $\rho \in (-\infty, \infty]$ of the DF $F$ plays a crucial role, namely the upper-tail behaviour of $F$ might determine whether $F \in D_p(\widetilde{H}_{i;\beta})$, $i = 1, 2, 5$, or $F \in D_p(\widetilde{H}_{i;\beta})$, $i = 3, 4, 6$. In the first case, the right endpoint $\rho$ has to be positive, while in the second case necessarily $\rho \le 0$. Moreover, $\rho > 0$ is linked to the max-stable distributions $H$ (under linear normalization) and $\rho \le 0$ to the min-stable distributions $L$ (under linear normalization). This explains why there are six types of p-max-stable DFs. Furthermore, if $\rho < \infty$ is not a point of continuity of $F$ and $P(X_1 = \rho) = p > 0$, then $F \notin D_p(\widetilde{H})$ for any non-degenerate DF $\widetilde{H}$. In this case, $P(X_{n:n} = \rho) = 1 - P\{X_{n:n} < \rho\} = 1 - (1 - p)^{n} \xrightarrow[n]{} 1$, and

$$F^{n}(b_n |x|^{a_n} S(x)) \begin{cases} \le (1 - p)^{n}, & \text{if } b_n |x|^{a_n} S(x) < \rho, \\ = 1, & \text{if } b_n |x|^{a_n} S(x) \ge \rho. \end{cases}$$

Hence, the limiting DF $\widetilde{H}$ has to be degenerate. The following result was obtained by Christoph and Falk (1996).

50

Asymptotic theory of order statistics: A historical retrospective

Theorem 2.42 (i) Suppose that ρ > 0. Put F ∗ (x) = 0, if x ≤ min{log( ρ2 ), 0} and F ∗ (x) = F (exp(x)), elsewhere. Then,  for some non-degenerate H}  ⇐⇒ {F ∗ ∈ Dmax (ξ(x))}, {F ∈ Dp (H),  with H(x) = ξ((log(x) − b)/a), x > 0, for some a > 0 and b ∈ R. (ii) Suppose that ρ ≤ 0. Put F∗ (x) = 1 − F (− exp(x)). Then,  for some non-degenerate H}  ⇐⇒ {F∗ ∈ Dmin (η)}, {F ∈ Dp (H),  with H(x) = 1 − η((log(−x) − b)/a), x < 0, for some a > 0, b ∈ R, where Dmax and Dmin denote respectively the max and min domains of attraction under linear normalization.

2.11 Asymptotic central order statistics under nonlinear normalization

In this section, Pancheva's (1984) work on extreme order statistics under nonlinear normalization is extended to central order statistics. Two unexpected results are given. The first is that, under nonlinear normalization, the non-degenerate type (actually the family of types) of DFs with two finite growth points is a weak limit of any central order statistic with a regular rank-sequence. The second is that the possible non-degenerate weak limits of any central order statistic with regular rank are the same under the traditional linear normalization and under power normalization. Some examples and comparisons between the domains of attraction of the weak limits under linear and power normalization are presented.

In Section 2.5, we considered the central order statistic $X_{k_n:n}$ with variable rank-sequence $\{k_n\}$ satisfying the condition $\sqrt{n}\,(k_n/n - \lambda) \to t \in R$, λ ∈ (0, 1). We begin with the normal λ-attraction case, in which t = 0 (the results for t ≠ 0 then follow easily). Let $G_n(x) \in GMA$ and let $G_n^{-1}$ be the inverse function of $G_n$. In the normal λ-attraction case, we have (cf. Smirnov, 1952, Leadbetter et al., 1983, p. 46–47)

$$F_{k_n:n}(G_n(x)) = I_{F(G_n(x))}(k_n, n - k_n + 1) \xrightarrow{w} \Psi(x), \qquad (2.35)$$

if and only if

$$\sqrt{n}\,\frac{F(G_n(x)) - \lambda}{\sqrt{\lambda(1-\lambda)}} \to \Phi^{-1}(\Psi(x)).$$

Moreover, we have the following result.

Lemma 2.43 (Barakat and Omar, 2011a) Let the relation (2.35) be satisfied with a non-degenerate limit DF Ψ(x). Then, for each sequence of integers $\{m_n\}$ such that $m_n < n$, $m_n \to \infty$ and $m_n/n \to \theta \in (0, 1)$, we have

$$\sqrt{\theta}\,\Phi^{-1}(\Psi(x)) = \Phi^{-1}(\Psi(g_\theta(x))), \qquad (2.36)$$

where $g_\theta(x) = \lim_{n\to\infty} G_{m_n}^{-1} \circ G_n(x)$ exists and satisfies the functional equation (2.23).

Smirnov (1952) solved the functional equation (2.36) when $G_n(x) = a_n x + b_n$, $a_n > 0$ and $b_n \in R$. This functional equation provides the limit types (2.17) for the DF of $X_{k_n:n}$, where each of these limit types has a domain of normal λ-attraction, see Subsection 2.5.2. In the next subsection, we introduce another functional equation, which characterizes the possible non-degenerate limit DFs of $X_{k_n:n}$ under strictly increasing continuous transformations $\{G_n(x)\} \in GMA$. The general solution of the functional equation (2.23) is given by (2.28)–(2.30) (see Section 2.7).

2.11.1 The class of weak limits of central order statistics under general normalization

We begin this section by deriving a functional equation, due to Barakat and Omar (2011a), which characterizes the possible non-degenerate limit DFs of $X_{k_n:n}$ under general strictly increasing continuous transformations $\{G_n(x)\}$. Let the relation (2.35) be satisfied with a non-degenerate DF Ψ and the transformation $G_n(x) \in GMA$. Then, we have

$$m_n \left(\frac{F(G_n(x)) - \lambda}{\sqrt{\lambda(1-\lambda)}}\right)^{2} \to \left(\Phi^{-1} \circ \Psi(g_\theta(x))\right)^{2},$$

$$(n - m_n) \left(\frac{F(G_n(x)) - \lambda}{\sqrt{\lambda(1-\lambda)}}\right)^{2} \to \left(\Phi^{-1} \circ \Psi(g_{1-\theta}(x))\right)^{2}$$

and

$$n \left(\frac{F(G_n(x)) - \lambda}{\sqrt{\lambda(1-\lambda)}}\right)^{2} \to \left(\Phi^{-1} \circ \Psi(x)\right)^{2}.$$

Now, adding the first two relations and comparing the resulting sum with the third one, we get $(\Phi^{-1} \circ \Psi(g_\theta(x)))^2 + (\Phi^{-1} \circ \Psi(g_{1-\theta}(x)))^2 = (\Phi^{-1} \circ \Psi(x))^2$. Putting $J(x) = \Phi^{-1} \circ \Psi(x)$, we get the functional equation

$$J^2(g_\theta(x)) + J^2(g_{1-\theta}(x)) = J^2(x). \qquad (2.37)$$
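The functional equation (2.37) can be checked numerically for a candidate pair $(g_\theta, J)$. The sketch below assumes the solution form $g_\theta(x) = h^{-1}(h(x) - \mu\log\theta)$ quoted from (2.28) and the candidate $J(x) = -c\,e^{-h(x)/(2\mu)}$; the choice h(x) = x and the constants are purely illustrative assumptions.

```python
import math

def g(theta, x, h, h_inv, mu):
    """g_theta(x) = h^{-1}(h(x) - mu * log(theta)), the solution form assumed from (2.28)."""
    return h_inv(h(x) - mu * math.log(theta))

def J(x, h, mu, c=1.0):
    """Candidate J(x) = -c * exp(-h(x) / (2 * mu)); then J^2(g_theta(x)) = theta * J^2(x)."""
    return -c * math.exp(-h(x) / (2.0 * mu))

def residual(theta, x, h, h_inv, mu):
    """Left-hand side minus right-hand side of the functional equation (2.37)."""
    lhs = (J(g(theta, x, h, h_inv, mu), h, mu) ** 2
           + J(g(1.0 - theta, x, h, h_inv, mu), h, mu) ** 2)
    return lhs - J(x, h, mu) ** 2

identity = lambda t: t  # hypothetical choice h(x) = x, so h^{-1} is also the identity
residuals = [residual(theta, x, identity, identity, mu=2.0)
             for theta in (0.1, 0.3, 0.8) for x in (-1.0, 0.0, 1.7)]
```

The residuals vanish up to rounding because $J^2(g_\theta(x)) = \theta J^2(x)$ and $J^2(g_{1-\theta}(x)) = (1-\theta) J^2(x)$ for this pair.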


Since both Φ and Ψ are nondecreasing functions of x, J(x) is also nondecreasing in x. Moreover, if $\overline{\rho} = \sup\{x : \Psi(x) < 1\}$ and $\underline{\rho} = \inf\{x : \Psi(x) > 0\}$, we have $J(\overline{\rho}) = \infty$ and $J(\underline{\rho}) = -\infty$. On the other hand, the transformation $G_n(x)$ is monotone and continuous and consequently $g_\theta(x)$ is nondecreasing in x, because it is the limit of the nondecreasing function of x $(F_{k_{m_n}:m_n}(G_n(x)))$. Finally, from (2.28)–(2.30) and the fact that 0 < θ < 1 and $\mu, \acute{\mu} > 0$ (i.e., $\mu \log\theta < 0$ and $\acute{\mu} \log\theta < 0$, which implies $h - \mu\log\theta > h$ and $\ell + \acute{\mu}\log\theta < \ell$, respectively) we have

$$g_\theta(x) = \begin{cases} h^{-1}(h(x) - \mu \log\theta) > x, & \text{or} \\ \ell^{-1}(\ell(x) + \acute{\mu} \log\theta) < x, & \text{or} \\ x, & \forall \theta \in (0, 1). \end{cases}$$

Therefore, $g_\theta(x) \ge$ (or $\le$) $x$, ∀θ ∈ (0, 1), x ∈ R. In particular, $g_\theta(0) \ge (\le)\, 0$. The next lemma describes the solution of the functional equation (2.37).

Lemma 2.44 (Barakat and Omar, 2011a) Let the functional equation (2.37) be satisfied. Then, $g_\theta(\underline{\rho}) = \underline{\rho}$ and $g_\theta(\overline{\rho}) = \overline{\rho}$, ∀θ ∈ (0, 1).

The next lemma, due to Barakat and Omar (2011a), reveals the interesting fact that the DF with two finite growth points is always a possible non-degenerate limit type of the central order statistics under any general monotone continuous transformation $G_n(x) \in GMA$.

Lemma 2.45 (Barakat and Omar, 2011a) Under any general strictly increasing continuous transformation $G_n(x)$, the non-degenerate DF

$$\Psi^{(0)}(x) = \begin{cases} 0, & x \le \underline{\rho}, \\ \tfrac{1}{2}, & \underline{\rho} < x \le \overline{\rho}, \\ 1, & x > \overline{\rho}, \end{cases}$$

where $-\infty < \underline{\rho} < \overline{\rho} < \infty$, is a possible limit type of the central order statistic $X_{k_n:n}$.

It is worth mentioning that the type $\Psi^{(0)}(x)$, under linear transformation, represents one type for all $-\infty < \underline{\rho} < \overline{\rho} < \infty$ (see the remark in Subsection 2.5.1). Thus, we can choose $\underline{\rho} = -1$ and $\overline{\rho} = 1$, and this type will be denoted by $\Psi^{(0)}(x)$. On the other hand, Barakat and Omar (2011a) showed that the DF $\Psi^{(0)}(x)$ is distinguished into six types $\Psi^{(01)}(x), \ldots, \Psi^{(06)}(x)$, according to $\underline{\rho} < \overline{\rho} < 0$; $0 < \underline{\rho} < \overline{\rho}$; $\underline{\rho} < 0 < \overline{\rho}$, $|\underline{\rho}| \ne \overline{\rho}$; $|\underline{\rho}| = \overline{\rho}$; $0 = \underline{\rho} < \overline{\rho}$ and $0 = \overline{\rho} > \underline{\rho}$, respectively. The next two theorems give a general result concerning the non-degenerate limit types for central order statistics under general normalization, $G_n(x) \in GMA$.


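Theorem 2.46 (Barakat and Omar, 2011a) Let $k_n \to \infty$, $\sqrt{n}\,(k_n/n - \lambda) \to 0$ and let $G_n(x)$ be any strictly increasing continuous transformation for which (2.35) is satisfied and $G_n(x) \in GMA$. Then, the possible non-degenerate types of $F_{k_n:n}(G_n(x))$ are

$$\Psi^{(0)}(x) = \begin{cases} 0, & x \le \underline{\rho}, \\ \tfrac{1}{2}, & \underline{\rho} < x \le \overline{\rho}, \\ 1, & x > \overline{\rho}, \end{cases}$$

where $-\infty < \underline{\rho} < \overline{\rho} < \infty$, $g_\theta(\underline{\rho}) = \underline{\rho}$, $g_\theta(\overline{\rho}) = \overline{\rho}$;

$$\Psi^{(1)}(x) = \begin{cases} 0, & x \le x_{01}, \\ \Phi\!\left(c_1 e^{\ell(x)/(2\acute{\mu})}\right), & x_{01} < x \le \overline{\rho}, \\ 1, & x > \overline{\rho}, \end{cases}$$

where $g_\theta(x_{01}) = x_{01} > -\infty$ ($\ell(x_{01}) = -\infty$) and $g_\theta(x) < x$, $\forall x > x_{01}$. Moreover, $g_\theta(\overline{\rho}) = \overline{\rho} \le \infty$ ($\ell(\overline{\rho}) = \infty$);

$$\Psi^{(2)}(x) = \begin{cases} 0, & x \le \underline{\rho}, \\ 1 - \Phi\!\left(c_2 e^{-h(x)/(2\mu)}\right), & \underline{\rho} < x \le x_{02}, \\ 1, & x > x_{02}, \end{cases}$$

where $g_\theta(x_{02}) = x_{02} < \infty$ ($h(x_{02}) = \infty$) and $g_\theta(x) > x$, $\forall x < x_{02}$. Moreover, $g_\theta(\underline{\rho}) = \underline{\rho} \ge -\infty$ ($h(\underline{\rho}) = -\infty$); and finally,

$$\Psi^{(3)}(x) = \begin{cases} 0, & x \le \underline{\rho}, \\ 1 - \Phi\!\left(c_2 e^{-h(x)/(2\mu)}\right), & \underline{\rho} < x \le x_{03}, \\ \Phi\!\left(c_1 e^{\ell(x)/(2\acute{\mu})}\right), & x_{03} < x \le \overline{\rho}, \\ 1, & x > \overline{\rho}, \end{cases}$$

where $g_\theta(x_{03}) = x_{03}$, $g_\theta(\underline{\rho}) = \underline{\rho}$, $g_\theta(\overline{\rho}) = \overline{\rho}$ and $-\infty \le \underline{\rho} < x_{03} < \overline{\rho} \le \infty$. Moreover, $g_\theta(x) > x$, $\forall x < x_{03}$ and $g_\theta(x) < x$, $\forall x > x_{03}$.

Remark Apart from $g_\theta(x) \equiv x$, ∀x, in order to get non-degenerate limit types, the roots of the equation $g_\theta(x) = x$ can be

1. only two roots, $\underline{\rho} = x_{01} > -\infty$ and $\overline{\rho} \le \infty$ (see Diagram 1), or
2. only two roots, $\overline{\rho} = x_{02} < \infty$ and $\underline{\rho} \ge -\infty$ (see Diagram 2), or
3. only three roots, $\underline{\rho}$, $\overline{\rho}$ and $x_{03}$, such that $-\infty \le \underline{\rho} < x_{03} < \overline{\rho} \le \infty$ (see Diagram 3).

From 1–3 we can see that in order to get non-degenerate limit types, apart from $\Psi^{(0)}(x)$, the equation $g_\theta(x) = x$ must have at least one finite root ($x_{01}$ or $x_{02}$ or $x_{03}$) and at most three roots.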


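[Diagrams 1–3: graphs of $y = g_\theta(x)$ against the line $y = x$. Diagram 1: roots $\underline{\rho} = x_{01} > -\infty$ and $\overline{\rho} \le \infty$. Diagram 2: roots $\overline{\rho} = x_{02} < \infty$ and $\underline{\rho} \ge -\infty$. Diagram 3: roots $\underline{\rho} < x_{03} < \overline{\rho}$.]

… In this case Barakat and Omar (2011a) stated the following interesting theorem.

Theorem 2.48 Let $k_n \to \infty$, $\sqrt{n}\,(k_n/n - \lambda) \to 0$ and let $G_n(x) \in GMA$ be any power transformation, i.e., $G_n(x) = b_n |x|^{a_n} S(x)$, $a_n, b_n > 0$. Then, the possible non-degenerate types of $F_{k_n:n}(G_n(x))$ are $\Psi^{(01)}(x), \Psi^{(02)}(x), \ldots, \Psi^{(06)}(x)$ and

$$\Psi^{(1)}(x) = \begin{cases} 0, & x \le 0, \\ \Phi(x), & x > 0, \end{cases} \qquad \Psi^{(2)}(x) = \begin{cases} \Phi(-|x|), & x \le 0, \\ 1, & x > 0, \end{cases} \qquad \Psi^{(3)}(x) = \begin{cases} \Phi(-c_2 |x|), & x \le 0, \\ \Phi(c_1 x), & x > 0. \end{cases}$$

The type $\Psi^{(3)}(x)$ represents a family of two types: $c_1 \ne c_2$ and $c_1 = c_2 = 1$.

Remark Theorem 2.48 states that the possible non-degenerate limit types for the central order statistics under the traditionally linear normalization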


and under the power normalization are the same. Although this fact is interesting, it should not surprise us. In the extreme case, where the lower and the upper extremes are sharply distinguished, Christoph and Falk (1996) (Theorem 1.5.8) showed that the upper, as well as the lower, tail behaviour of the original DF F might determine whether F belongs to one or other of the six possible power limit types. For central order statistics, on the one hand, neither the lower nor the upper tail behaviour has any influence on the weak convergence; on the other hand, there is no sharp distinction between the lower and upper central order statistics. Theorem 2.48 prompts us to compare the domains of attraction of each of these possible limit types (as well as the speed of convergence to each) under linear and power normalizing constants. For the comparison between the rates of convergence of extremes under linear and power normalizing constants, see Barakat et al. (2010).

2.11.3 Examples

The following examples, besides being illustrative, show that the domains of attraction of the four possible limits for central order statistics under power normalization are not empty.

Example 2.49 Consider the DF

$$F(x) = \begin{cases} 0, & x \le -1, \\ \tfrac{1}{2}, & -1 < x \le 1, \\ \tfrac{1}{2}\left(1 + e^{-1/x}\right), & x > 1. \end{cases}$$

Let $\lambda = \tfrac{1}{2}$. Furthermore, let $a_n = 1$ and $b_n \to \rho^{-1} > 0$. Then,

$$\sqrt{n}\,\frac{F(b_n |x|^{a_n} S(x)) - \lambda}{\sqrt{\lambda(1-\lambda)}} = \begin{cases} -\sqrt{n}, & x \le -\tfrac{1}{b_n}, \\ 0, & -\tfrac{1}{b_n} < x \le \tfrac{1}{b_n}, \\ \sqrt{n}\,\exp\!\left(-\tfrac{1}{b_n x}\right), & x > \tfrac{1}{b_n} \end{cases} \;\to\; \begin{cases} -\infty, & x \le -\rho, \\ 0, & -\rho < x \le \rho, \\ \infty, & x > \rho. \end{cases}$$

Therefore, $\forall k_n$ such that $\sqrt{n}\,(k_n/n - \tfrac{1}{2}) \to 0$, we get $F_{k_n:n}(b_n |x|^{a_n} S(x)) \xrightarrow{w} \Psi^{(04)}(x)$. Clearly, if we consider the linear normalization $c_n \to \rho^{-1}$ and $d_n = 0$, we get

$$F_{k_n:n}(c_n x + d_n) \xrightarrow{w} \Psi^{(04)}(x).$$


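Example 2.50 Consider the DF

$$F(x) = \begin{cases} 0, & x \le 0, \\ \tfrac{1}{2}(x + 1), & 0 < x \le 1, \\ 1, & x > 1. \end{cases}$$

Let $\lambda = \tfrac{1}{2}$. Furthermore, let $a_n = 1$ and $b_n = \tfrac{1}{\sqrt{n}}$. Then,

$$\sqrt{n}\,\frac{F(b_n |x|^{a_n} S(x)) - \lambda}{\sqrt{\lambda(1-\lambda)}} = \begin{cases} -\sqrt{n}, & x \le 0, \\ x, & 0 < x \le \sqrt{n}, \\ \sqrt{n}, & x > \sqrt{n} \end{cases} \;\to\; \begin{cases} -\infty, & x \le 0, \\ x, & x > 0. \end{cases}$$

Therefore, $\forall k_n$ such that $\sqrt{n}\,(k_n/n - \tfrac{1}{2}) \to 0$, we get $F_{k_n:n}(b_n |x|^{a_n} S(x)) \xrightarrow{w} \Psi^{(1)}(x)$. Clearly, if we consider the linear normalization $c_n = \tfrac{1}{\sqrt{n}}$ and $d_n = 0$, we get

$$F_{k_n:n}(c_n x + d_n) \xrightarrow{w} \Phi_1^{(1)}(x) = \Psi^{(1)}(x).$$

Example 2.51 Consider the DF

$$F(x) = \begin{cases} 0, & x \le -1, \\ \tfrac{1}{2}(x + 1), & -1 < x \le 0, \\ 1, & x > 0. \end{cases}$$

Let $\lambda = \tfrac{1}{2}$. Furthermore, let $a_n = 1$ and $b_n = \tfrac{1}{\sqrt{n}}$. Then,

$$\sqrt{n}\,\frac{F(b_n |x|^{a_n} S(x)) - \lambda}{\sqrt{\lambda(1-\lambda)}} = \begin{cases} -\sqrt{n}, & x \le -\sqrt{n}, \\ -|x|, & -\sqrt{n} < x \le 0, \\ \sqrt{n}, & x > 0 \end{cases} \;\to\; \begin{cases} -|x|, & x \le 0, \\ \infty, & x > 0. \end{cases}$$

Therefore, $\forall k_n$ such that $\sqrt{n}\,(k_n/n - \tfrac{1}{2}) \to 0$, we get $F_{k_n:n}(b_n |x|^{a_n} S(x)) \xrightarrow{w} \Psi^{(2)}(x)$. Clearly, $F_{k_n:n}(c_n x + d_n) \xrightarrow{w} \Phi_1^{(2)}(x) = \Psi^{(2)}(x)$, where $c_n = \tfrac{1}{\sqrt{n}}$ and $d_n = 0$.

Example 2.52 Let $F(x) = \tfrac{1}{2}(x + 1)$, $-1 \le x \le 1$. Furthermore, let $\lambda = \tfrac{1}{2}$, $b_n = \tfrac{1}{\sqrt{n}}$ and $a_n = 1$. Therefore,

$$\sqrt{n}\,\frac{F(b_n |x|^{a_n} S(x)) - \lambda}{\sqrt{\lambda(1-\lambda)}} = x, \qquad -\sqrt{n} \le x \le \sqrt{n},$$

which implies that

$$F_{k_n:n}(b_n |x|^{a_n} S(x)) \xrightarrow{w} \Phi(x).$$

Clearly, $F_{k_n:n}(c_n x + d_n) \xrightarrow{w} \Phi(x)$, where $c_n = \tfrac{1}{\sqrt{n}}$ and $d_n = 0$.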
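The standardized quantity used in these examples is easy to evaluate directly. The following minimal sketch does so for the DF of Example 2.50 with $a_n = 1$ and $b_n = 1/\sqrt{n}$; the function names and the value n = 10 000 are illustrative assumptions.

```python
import math

def F(x: float) -> float:
    """DF of Example 2.50: 0 for x <= 0, (x + 1)/2 for 0 < x <= 1, 1 for x > 1."""
    if x <= 0.0:
        return 0.0
    if x <= 1.0:
        return 0.5 * (x + 1.0)
    return 1.0

def sign(x: float) -> float:
    """S(x): the sign function."""
    return (x > 0) - (x < 0)

def power_map(x: float, a: float, b: float) -> float:
    """Power normalization G(x) = b * |x|^a * S(x)."""
    return b * abs(x) ** a * sign(x)

def standardized(x: float, n: int, lam: float = 0.5) -> float:
    """sqrt(n) * (F(G_n(x)) - lam) / sqrt(lam * (1 - lam)) with a_n = 1, b_n = 1/sqrt(n)."""
    a_n, b_n = 1.0, 1.0 / math.sqrt(n)
    return math.sqrt(n) * (F(power_map(x, a_n, b_n)) - lam) / math.sqrt(lam * (1.0 - lam))

n = 10_000  # sqrt(n) = 100
values = {x: standardized(x, n) for x in (-3.0, 0.7, 250.0)}
```

For 0 < x ≤ √n the quantity equals x, while it equals −√n for x ≤ 0 and √n for x > √n, matching the piecewise expression in Example 2.50.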


2.11.4 Comparisons between the domains of attraction of weak limits of central order statistics under linear and power normalizing constants

In all the preceding examples, the DF of the central order statistic converges weakly to the same limit DF under linear and power normalizing constants. The following theorem shows that this is not true in general.

Theorem 2.53 (Barakat and Omar, 2011a) Under the linear normalization, let the DF F belong to the domain of normal λ-attraction of the limit type $\Phi_\beta^{(3)}(x)$, such that $F^{-1}(\lambda) \ne 0$. Then, the domains of normal λ-attraction of the limit types $\Psi^{(1)}(x)$, $\Psi^{(2)}(x)$ and $\Psi^{(3)}(x)$, under the power normalization, do not contain the DF F.

Remark Example 2.52 shows that Theorem 2.53 is not true if $F^{-1}(\lambda) = 0$. Moreover, the theorem shows that if any DF F belongs to the domain of normal λ-attraction of the standard normal DF, where $F^{-1}(\lambda) \ne 0$ (as happens in many cases, e.g., when F is absolutely continuous with finite positive probability density function at $F^{-1}(\lambda)$), then, under the power normalization, the domains of normal λ-attraction of the three limit types $\Psi^{(1)}(x)$, $\Psi^{(2)}(x)$ and $\Psi^{(3)}(x)$ do not contain the DF F. Clearly, the uniform DF over (0, 1) satisfies these conditions.

Remark As we have seen, in contrast to the case of extreme order statistics, power normalization and linear normalization lead to the same families of limit DFs. But even the question of the existence of nonlinear normalization within a larger domain of attraction is still open for central order statistics.

2.12 Asymptotic intermediate order statistics under nonlinear normalization We begin this section (in the next subsection) with some of Pancheva’s (1984) results concerning weak limits of extremes under nonlinear normalization (with slight emendations). These results lead to a method, by which the class of all possible weak limits for lower and upper intermediate order statistics is derived under power normalization from the corresponding weak limits of extremes under power normalization. Moreover, the suggested method may be extended to any other nonlinear normalizing transformation, which belongs to GM A. Clearly, we can translate the result obtained for left


order statistics into the case of right order statistics and vice versa. Therefore our concentration will mostly focus on left intermediate order statistics.

2.12.1 The class of weak limits of intermediate order statistics under general normalization

The next lemma, concerning the asymptotic behaviour of the maximum order statistic under $G_n(x) \in GMA$, is an essential tool for developing the limit theory of intermediate order statistics under general normalization.

Lemma 2.54 (Leadbetter et al., 1983) Let H be a non-degenerate possible limit of the DF of the maximum order statistic under a general strongly monotone transformation $G_n(x) \in GMA$. Then, $F_{n:n}(G_n(x)) = F^n(G_n(x)) \xrightarrow{w} H(x)$ if and only if $n(1 - F(G_n(x))) \to -\log H(x) = u(x)$.

By proceeding along the same lines as in Subsection 2.2.1, we can deduce the functional equation $\log H(g_\theta(x)) + \log H(g_{1-\theta}(x)) = \log H(x)$. Putting $J(x) = \log H(x) \le 0$, we can easily show that J(x) is a nondecreasing function of x and satisfies the functional equation

$$J(g_\theta(x)) + J(g_{1-\theta}(x)) = J(x). \qquad (2.38)$$

Moreover, if we again proceed as in Subsection 2.11.1, we get $g_\theta(\underline{\rho}) = \underline{\rho}$ and $g_\theta(\overline{\rho}) = \overline{\rho}$, ∀θ ∈ (0, 1), where $\underline{\rho} = \ell(H)$ and $\overline{\rho} = r(H)$. On the other hand, in order to get all possible non-degenerate types of H, we should consider the three solutions (2.28)–(2.30) of the functional equation (2.23). The solution $g_\theta(x) \equiv x$ (i.e., (2.30)) implies $2J(x) \equiv J(x)$, which leads to $J(x) \equiv -\infty$, or $J(x) \equiv 0$, or $J(x) \equiv \infty$, where the last possibility should be rejected, since $J(x) \le 0$. The other two possibilities give H(x) = 0 and H(x) = 1, respectively, which are degenerate types. Therefore, we seek the solutions of (2.38) under (2.28) and (2.29) for $\underline{\rho} < x < \overline{\rho}$. The solution (2.29) gives a unique monotone solution of (2.38), $J(x) = c\,e^{\ell(x)/\acute{\mu}}$, where c > 0, since J(x) is nondecreasing in x. However, this solution should be rejected because J(x) must be non-positive. The remaining solution (2.28) gives the only possible non-degenerate type (see Sreehari, 2009)

$$J(x) = \begin{cases} -\infty, & x \le \underline{\rho}, \\ -c\,e^{-h(x)/\mu}, & \underline{\rho} < x \le \overline{\rho}, \\ 0, & x > \overline{\rho}, \end{cases}$$

where $h(\underline{\rho}) = -\infty$, $h(\overline{\rho}) = \infty$. Pancheva (1984) showed that, in the above representation, we can take c = 1. Therefore,

$$H(x) = e^{J(x)} = \begin{cases} 0, & x \le \underline{\rho}, \\ \exp\!\left(-e^{-h(x)/\mu}\right), & \underline{\rho} < x \le \overline{\rho}, \\ 1, & x > \overline{\rho}. \end{cases}$$

The corresponding non-degenerate limit DF L(x) of the minimum order statistic under a general strictly monotone transformation $G_n(x) \in GMA$ is

$$L(x) = 1 - \exp\!\left[-e^{\ell(x)/\acute{\mu}}\right], \qquad \underline{\rho} < x \le \overline{\rho},$$

where $\underline{\rho} = \ell(L)$ and $\overline{\rho} = r(L)$. Finally, by choosing $G_n(x) = c_n x + d_n$, $c_n > 0$, and $G_n(x) = b_n |x|^{a_n} S(x)$, $a_n, b_n > 0$, we get the known possible non-degenerate limit types for the maximum order statistic under linear and power normalizing constants. However, these types can be derived by considering another functional equation, which describes and represents the class of all these limit types. Namely, for the maximum case and any arbitrary subsequence $m_n \to \infty$, where $m_n/n \to m > 0$, Barakat and Omar (2016) used the corresponding modification of Khinchin's theorem to derive the following functional equation:

$$m\,u(x) = u(\tilde{g}_m(x)), \qquad x \in R, \qquad (2.39)$$

where $\tilde{g}_m(x) = \lim_{n\to\infty} G_{m_n}^{-1} \circ G_n(x)$. Moreover, $\tilde{g}_m(x)$ satisfies the functional equation

$$\tilde{g}_{m'} \circ \tilde{g}_{m''}(x) = \tilde{g}_{m'm''}(x), \qquad \forall m', m'' > 0. \qquad (2.40)$$

Clearly, the functional equation (2.40) also provides the general solution (2.28)–(2.30) by replacing θ with m. In view of Lemma 2.54, the limit DF H is given by $H(x) = e^{-u(x)}$. For linear normalization, the functional equation (2.39) was introduced by Gnedenko (1943) and Smirnov (1952) to derive all the possible non-degenerate limit types for the maximum and upper extreme order statistics, respectively. Moreover, under power normalization, in view of the results of Pancheva (1984) and Christoph and Falk (1996), the functional equation (2.39) must only have the possible solutions given in (2.31). On the other hand, Barakat and Omar (2010) (see also Barakat and Omar, 2016) used the next lemma to study the class of the limit types of intermediate order statistics under the transformation $G_n(x) \in GMA$.

Lemma 2.55 (Wu, 1966 and Leadbetter et al., 1983) Let $G_n$ be a real sequence and $\min(k_n, n - k_n) \to \infty$. Furthermore, let τ ∈ R. Then, $F_{k_n:n}(G_n) = I_{F(G_n)}(k_n, n - k_n + 1) \to \tau$, if and only if

$$\sqrt{n}\,\frac{F(G_n) - \frac{k_n}{n}}{\sqrt{\frac{k_n}{n}\left(1 - \frac{k_n}{n}\right)}} \to \Phi^{-1}(\tau) = v \qquad (v \text{ may be } \pm\infty).$$

Namely, Barakat and Omar (2010) used the modification of Khinchin's theorem and Lemma 2.55 to get the functional equation $v(g_\nu(x)) + l\nu(1 - \alpha) = v(x)$, where $g_\nu(x) = \lim_{n\to\infty} G_{n+z_n(\nu)}^{-1} \circ G_n(x)$ and $g_\nu \circ g_\mu(x) = g_{\nu+\mu}(x)$. Note that this functional equation also has only three solutions, which are of the same forms as the solutions (2.28)–(2.30) (with log|·| in place of log ·). However, in our case we have $g_\nu(x) = \beta_\nu |x|^{\alpha_\nu} S(x)$, where

$$\alpha_\nu = \lim_{n\to\infty} \frac{a_n}{a_{n+z_n(\nu)}} \qquad \text{and} \qquad \beta_\nu = \lim_{n\to\infty} \left(\frac{b_n}{b_{n+z_n(\nu)}}\right)^{a_{n+z_n(\nu)}^{-1}}.$$

Putting $u(x) = e^{-v(x)}$ and $m = e^{l\nu(1-\alpha)}$, we get

$$m\,u(x) = u\!\left(\tilde{\beta}_m |x|^{\tilde{\alpha}_m} S(x)\right), \qquad (2.41)$$

where $\tilde{\beta}_m = \beta_{\frac{\log m}{l(1-\alpha)}}$ and $\tilde{\alpha}_m = \alpha_{\frac{\log m}{l(1-\alpha)}}$. Comparing the functional equations (2.41) and (2.40), we can easily see that the only non-degenerate solutions of the functional equation (2.40) are $u_{i;\beta}(x)$, i = 1, 2, ..., 6, where $u_{i;\beta}(x)$ is defined in (2.12). Therefore, the possible types of the limits G(x) and T(x) are $\Phi(-\log u_{i;\beta}(x)) = 1 - \Phi(\log u_{i;\beta}(x))$, i = 1, 2, ..., 6, and $\Phi(\log u_{i;\beta}(-x))$, i = 1, 2, ..., 6, respectively, i.e.,

$$\tilde{G}_{1;\beta}(x) = \begin{cases} 0, & x \le 1, \\ \Phi(\beta \log\log x), & x > 1; \end{cases}$$

$$\tilde{G}_{2;\beta}(x) = \begin{cases} 0, & x \le 0, \\ \Phi(-\beta \log(-\log x)), & 0 < x \le 1, \\ 1, & x > 1; \end{cases}$$

$$\tilde{G}_{3;\beta}(x) = \begin{cases} 0, & x \le -1, \\ \Phi(\beta \log(-\log|x|)), & -1 < x \le 0, \\ 1, & x > 0; \end{cases}$$

$$\tilde{G}_{4;\beta}(x) = \begin{cases} \Phi(-\beta \log\log|x|), & x \le -1, \\ 1, & x > -1; \end{cases}$$

$$\tilde{G}_{5}(x) = \begin{cases} 0, & x \le 0, \\ \Phi(\log x), & x > 0; \end{cases}$$

$$\tilde{G}_{6}(x) = \begin{cases} \Phi(-\log|x|), & x \le 0, \\ 1, & x > 0. \end{cases} \qquad (2.42)$$

The corresponding types of the upper intermediate order statistics are

$$T_{1;\beta}(x) = \begin{cases} 1, & x > -1, \\ 1 - \Phi(\beta \log(\log|x|)), & x \le -1; \end{cases}$$

$$T_{2;\beta}(x) = \begin{cases} 1, & x > 0, \\ \Phi(\beta \log(-\log|x|)), & -1 < x \le 0, \\ 0, & x \le -1; \end{cases}$$

$$T_{3;\beta}(x) = \begin{cases} 1, & x > 1, \\ 1 - \Phi(\beta \log(-\log x)), & 0 < x \le 1, \\ 0, & x \le 0; \end{cases}$$

$$T_{4;\beta}(x) = \begin{cases} \Phi(\beta \log\log x), & x > 1, \\ 0, & x \le 1; \end{cases}$$

$$T_{5}(x) = \begin{cases} 1, & x > 0, \\ \Phi(-\log|x|), & x \le 0; \end{cases}$$

$$T_{6}(x) = \begin{cases} \Phi(\log x), & x > 0, \\ 0, & x \le 0. \end{cases}$$

Remark Although, in general, we have $\tilde{G}_{i;\beta} \ne T_{i;\beta}$, i = 1, 2, ..., 6, a closer look at the two classes of possible limit laws of lower and upper intermediate order statistics under power normalization shows that they coincide, i.e., $\{\tilde{G}_{i;\beta}, i = 1, 2, ..., 6\} \equiv \{T_{i;\beta}, i = 1, 2, ..., 6\}$. This fact is also true for the two classes of possible limit laws of lower and upper intermediate order statistics under linear normalization.

Remark It is worth mentioning that the possible limiting types of the intermediate order statistics under linear normalization, which are defined in (2.12), coincide with those of a suitably linearly normalized record value (see Barakat, 2010). This resemblance arises because both classes are governed by the same functional equation (see Barakat, 2010). Therefore, it is no accident that the possible limiting power types (2.42) coincide with the possible limiting types of record values under power normalization, which were obtained by Grigelionis (2006) (see also Nigm, 2009).
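The coincidence of the lower and upper intermediate classes under power normalization can be checked numerically. The sketch below codes the six lower types (2.42) and the six upper types and compares them pairwise (lower type 1 with upper type 4, 2 with 3, 3 with 2, 4 with 1, 5 with 6, 6 with 5); the helper names, the grid and the value β = 1.5 are illustrative assumptions.

```python
import math

def Phi(z):
    """Standard normal DF; math.erf accepts +/-inf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def slog(t):
    """log(t) extended by log(0) = -inf, so boundary points need no special casing."""
    return -math.inf if t == 0 else math.log(t)

# Lower intermediate types (2.42); b plays the role of beta > 0.
def G1(x, b=1.0): return 0.0 if x <= 1 else Phi(b * slog(math.log(x)))
def G2(x, b=1.0): return 0.0 if x <= 0 else (1.0 if x > 1 else Phi(-b * slog(-math.log(x))))
def G3(x, b=1.0): return 0.0 if x <= -1 else (1.0 if x >= 0 else Phi(b * slog(-math.log(-x))))
def G4(x, b=1.0): return 1.0 if x > -1 else Phi(-b * slog(math.log(-x)))
def G5(x, b=1.0): return 0.0 if x <= 0 else Phi(math.log(x))
def G6(x, b=1.0): return 1.0 if x >= 0 else Phi(-math.log(-x))

# Upper intermediate types, coded from their own piecewise definitions.
def T1(x, b=1.0): return 1.0 if x > -1 else 1.0 - Phi(b * slog(math.log(-x)))
def T2(x, b=1.0): return 1.0 if x >= 0 else (0.0 if x <= -1 else Phi(b * slog(-math.log(-x))))
def T3(x, b=1.0): return 1.0 if x > 1 else (0.0 if x <= 0 else 1.0 - Phi(b * slog(-math.log(x))))
def T4(x, b=1.0): return 0.0 if x <= 1 else Phi(b * slog(math.log(x)))
def T5(x, b=1.0): return 1.0 if x >= 0 else Phi(-math.log(-x))
def T6(x, b=1.0): return 0.0 if x <= 0 else Phi(math.log(x))

pairs = [(G1, T4), (G2, T3), (G3, T2), (G4, T1), (G5, T6), (G6, T5)]
grid = [-3.0, -1.5, -0.6, -0.2, 0.3, 0.8, 1.7, 4.0]
beta = 1.5
max_gap = max(abs(G(x, beta) - T(x, beta)) for G, T in pairs for x in grid)
```

A vanishing `max_gap` reflects that each lower type reappears among the upper types with a different index, which is the content of the first remark above.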


Corollary (2.3) The above argument, which was applied to get the class of all possible weak limits for lower and upper intermediate order statistics under power normalization from the corresponding weak limits of extremes under power normalization, may also be applied to any other nonlinear normalization.

Moreover, as we have mentioned before, Wu (1966) generalized Chibisov's result to any nondecreasing intermediate rank-sequence and proved that the only possible limit types in these cases are the types defined by (2.12). Since, in Wu's argument, the crucial issue is the behaviour of the rank and not the normalizing constants used, we also expect that, for any nondecreasing intermediate rank-sequence under power normalization, the only possible limit types are those defined in (2.42).

Example 2.56 Consider the DF $F(x) = 1 - x^{-\kappa}$, x ≥ 1, κ > 0. In view of Christoph and Falk (1996) we have

$$F^n(b_n |x|^{a_n} S(x)) \xrightarrow{w} \begin{cases} 0, & x \le 0, \\ e^{-1/x}, & x > 0 \end{cases} \;=\; e^{-u_5(x)},$$

where $a_n = \tfrac{1}{\kappa}$ and $b_n = n^{1/\kappa}$. Thus, according to the preceding procedure, the DF of the upper intermediate order statistic $X_{r_n:n}$, $r_n = n - k_n + 1$, converges weakly to $\Phi(\log u_5(-x)) = T_5(x)$.
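The limit in Example 2.56 can be checked numerically. The following minimal sketch evaluates $F^n(b_n |x|^{a_n} S(x))$ with $a_n = 1/\kappa$ and $b_n = n^{1/\kappa}$ and compares it with $e^{-1/x}$; the values κ = 2 and n = 10^6 and the function names are illustrative assumptions.

```python
import math

def F(x: float, kappa: float) -> float:
    """Pareto DF F(x) = 1 - x^(-kappa) for x >= 1, and 0 otherwise."""
    return 1.0 - x ** (-kappa) if x >= 1.0 else 0.0

def max_df_power_normalized(x: float, n: int, kappa: float) -> float:
    """F^n(b_n |x|^{a_n} S(x)) with a_n = 1/kappa and b_n = n^(1/kappa)."""
    if x <= 0.0:
        return 0.0  # the normalized argument is non-positive, where F vanishes
    y = n ** (1.0 / kappa) * x ** (1.0 / kappa)
    return F(y, kappa) ** n

kappa, n = 2.0, 10**6
errors = [abs(max_df_power_normalized(x, n, kappa) - math.exp(-1.0 / x))
          for x in (0.5, 1.0, 3.0)]
```

Here $n(1 - F(b_n x^{a_n})) = 1/x$ exactly, so the only discrepancy comes from replacing $(1 - 1/(nx))^n$ by its limit.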

2.12.2 The domains of attraction of the lower intermediate power types

Throughout this subsection, we write $F \in D_{\ell}(G_{i;\beta})$ and $F \in D_p(\tilde{G}_{i;\beta})$ to indicate that F belongs to the domains of attraction of the laws $G_{i;\beta}$ (defined in (2.12), under linear normalization $G_n(x) = c_n x + d_n$) and $\tilde{G}_{i;\beta}$ (defined in (2.42), under power normalization $G_n(x) = b_n |x|^{a_n} S(x)$), respectively. The next theorems determine the domain of attraction of each limit law defined in (2.42), where the normalizing constants given at the end of each of these theorems are defined in some cases for large n only. Here $F^{-}$ denotes the quantile function of F.

Theorem 2.57 (Barakat and Omar, 2011b, 2016) A DF $F \in D_p(\tilde{G}_{1;\beta})$ if and only if $\exists\, x_0$ such that $F(e^{x_0}) = 0$ and $F(e^{x_0} + \epsilon) > 0$, ∀ε > 0, and for any τ > 0,

$$\lim_{x \downarrow 0} \frac{F(\exp(x_0 + x\tau)) - F(\exp(x_0 + x))}{[F(\exp(x_0 + x))]^{\frac{2-\alpha}{2-2\alpha}}} = l^{\frac{-1}{1-\alpha}}\,\beta \log\tau.$$

In this case we may set $b_n = e^{x_0}$ and $a_n = \log(F^{-}(k_n/n)) - x_0$.

Theorem 2.58 (Barakat and Omar, 2011b, 2016) A DF $F \in D_p(\tilde{G}_{2;\beta})$ if and only if $\ell(F) = 0$ and, for any τ > 0,

$$\lim_{x \to -\infty} \frac{F(\exp(\tau x)) - F(\exp(x))}{[F(\exp(x))]^{\frac{2-\alpha}{2-2\alpha}}} = -l^{\frac{-1}{1-\alpha}}\,\beta \log\tau.$$

We may set $b_n = 1$ and $a_n = -\log(F^{-}(k_n/n))$.

Corollary (2.4) If $\ell(F) = 0$, then $F(x) \in D_p(\tilde{G}_{2;\beta})$ if and only if $F(e^x) \in D_{\ell}(G_{2;\beta})$.

Theorem 2.59 (Barakat and Omar, 2011b, 2016) A DF $F \in D_p(\tilde{G}_{3;\beta})$ if and only if $\exists\, x_0$ such that $F(-e^{-x_0}) = 0$ and $F(-e^{-x_0} + \epsilon) > 0$, ∀ε > 0, and for any τ > 0,

$$\lim_{x \downarrow 0} \frac{F(-\exp(-(x_0 + x\tau))) - F(-\exp(-(x_0 + x)))}{[F(-\exp(-(x_0 + x)))]^{\frac{2-\alpha}{2-2\alpha}}} = l^{\frac{-1}{1-\alpha}}\,\beta \log\tau.$$

In this case we may set $b_n = e^{-x_0}$ and $a_n = -(\log(-F^{-}(k_n/n)) + x_0) \to \infty$.

Theorem 2.60 (Barakat and Omar, 2011b, 2016) A DF $F \in D_p(\tilde{G}_{4;\beta})$ if and only if $\ell(F) = -\infty$ and, for any τ > 0,

$$\lim_{x \to -\infty} \frac{F(-\exp(-\tau x)) - F(-\exp(-x))}{[F(-\exp(-x))]^{\frac{2-\alpha}{2-2\alpha}}} = -l^{\frac{-1}{1-\alpha}}\,\beta \log\tau.$$

In this case, we may set $b_n = 1$ and $a_n = \log(-F^{-}(k_n/n)) \to \infty$.

Theorem 2.61 (Barakat and Omar, 2011b, 2016) A DF $F \in D_p(\tilde{G}_{5})$ if and only if $\ell(F) \ge 0$ and the sequence $\{d_n\}$, which is defined as the smallest number for which $F(e^{d_n}) \le k_n/n \le F(e^{d_n} + 0)$ (i.e., $d_n = \log F^{-}(k_n/n) \to \log \ell(F)$), satisfies the condition

$$\lim_{n\to\infty} \frac{d_{n+z_n(\nu)} - d_n}{d_{n+z_n(\mu)} - d_n} = \frac{\nu}{\mu}, \qquad (2.43)$$

for all sequences $\{z_n(t)\}$, t ∈ R, satisfying $z_n(t)/n^{1-\frac{\alpha}{2}} \to t$. In this case, we may set $b_n = e^{d_n} = F^{-}(k_n/n)$ and

$$a_n = \log \frac{F^{-}\!\left(\frac{k_n + \sqrt{k_n}}{n}\right)}{F^{-}\!\left(\frac{k_n}{n}\right)}.$$

Theorem 2.62 (Barakat and Omar, 2011b, 2016) A DF $F \in D_p(\tilde{G}_{6})$ if and only if $\ell(F) < 0$ and the sequence $\{d_n\}$, which is defined as the smallest number for which $F(-e^{-d_n}) \le k_n/n \le F(-e^{-d_n} + 0)$ (i.e., $d_n = -\log(-F^{-}(k_n/n)) \to -\log(-\ell(F)) > -\infty$), satisfies the condition (2.43). In this case, we may set $b_n = e^{-d_n}$ and

$$a_n = \log \frac{F^{-}\!\left(\frac{k_n}{n}\right)}{F^{-}\!\left(\frac{k_n + \sqrt{k_n}}{n}\right)}.$$
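The normalizing constants suggested in Theorem 2.61 can be tried out on a simple case. The sketch below assumes the uniform DF on (0, 1) (so that the quantile function is the identity and the left endpoint is 0) and an intermediate rank $k_n = \sqrt{n}$; both choices are illustrative assumptions and not examples from the text. It evaluates the standardized quantity of Lemma 2.55, which should be close to log x if the limit is $\Phi(\log x)$.

```python
import math

def quantile(p: float) -> float:
    """Quantile function of the uniform DF on (0, 1); a hypothetical test case."""
    return p

def constants(n: float, k_n: float):
    """b_n = F^-(k_n/n) and a_n = log(F^-((k_n + sqrt(k_n))/n) / F^-(k_n/n))."""
    b_n = quantile(k_n / n)
    a_n = math.log(quantile((k_n + math.sqrt(k_n)) / n) / b_n)
    return a_n, b_n

def standardized(x: float, n: float, k_n: float) -> float:
    """sqrt(n) (F(G_n(x)) - k_n/n) / sqrt((k_n/n)(1 - k_n/n)), G_n(x) = b_n x^{a_n}, x > 0."""
    a_n, b_n = constants(n, k_n)
    p = k_n / n
    f_val = min(max(b_n * x ** a_n, 0.0), 1.0)  # uniform DF evaluated at G_n(x)
    return math.sqrt(n) * (f_val - p) / math.sqrt(p * (1.0 - p))

n, k_n = 1e12, 1e6  # k_n = sqrt(n): an intermediate rank
gaps = [abs(standardized(x, n, k_n) - math.log(x)) for x in (0.5, math.e, 4.0)]
```

For this choice the standardized quantity differs from log x by terms of order $1/\sqrt{k_n}$, which is consistent with a limit of the form $\tilde{G}_5(x) = \Phi(\log x)$.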


The next theorem compares the domains of attraction of limit laws of intermediate order statistics under power normalization with those of limit laws of intermediate order statistics under linear normalization.

Theorem 2.63 (Barakat and Omar, 2011b) Any univariate continuous DF F produces the following implications:

1. If $0 < \ell(F) < \infty$, then $F \in D_p(\tilde{G}_{1;\beta}) \Longleftrightarrow G \in D_{\ell}(G_{1;\beta})$, where
$$G(y) = \begin{cases} 0, & y \le \log \ell(F), \\ F(e^{y}), & y > \log \ell(F). \end{cases}$$
2. If $\ell(F) = 0$, then $F \in D_p(\tilde{G}_{2;\beta}) \Longleftrightarrow G(y) = F(e^{y}) \in D_{\ell}(G_{2;\beta})$.
3. If $-\infty < \ell(F) < 0$, then $F \in D_p(\tilde{G}_{3;\beta}) \Longleftrightarrow G(y) = \frac{F(-e^{-y})}{F(0)} \in D_{\ell}(G_{1,\beta^*})$, with rank-sequence $k_n^* = \left[\frac{k_n}{F(0)}\right]$ and $\beta^* = \frac{\beta}{\sqrt{F(0)}}$, where [θ] denotes the integer part of θ.
4. If $\ell(F) = -\infty$, then $F \in D_p(\tilde{G}_{4;\beta}) \Longleftrightarrow G \in D_{\ell}(G_{2;\beta})$, where
$$G(y) = \begin{cases} F(-e^{-y}), & y \le 0, \\ 1, & y > 0. \end{cases}$$
5. (i) If $\ell(F) = 0$, then $F \in D_p(\tilde{G}_{5}) \Longleftrightarrow F(e^{y}) \in D_{\ell}(G_{3})$.
   (ii) If $\ell(F) > 0$, then $F \in D_p(\tilde{G}_{5}) \Longleftrightarrow G(y) \in D_{\ell}(G_{3})$, where
$$G(y) = \begin{cases} 0, & y \le 0, \\ F(e^{y}), & y > 0. \end{cases}$$
6. If $\ell(F) < 0$, then $F \in D_p(\tilde{G}_{6}) \Longleftrightarrow G(y) = \frac{F(-e^{-y})}{F(0)} \in D_{\ell}(G_{3})$, with rank-sequence $k_n^* = \left[\frac{k_n}{F(0)}\right]$.

The normalizing constants that can be used in each of the above cases are defined in Barakat and Omar (2011b). Moreover, these normalizing constants are defined in some cases for large n only.

2.13 Asymptotic theory of order statistics under exponential normalization

Ravi and Mavitha (2016) introduced the exponential normalization, which is of the form $T_n(x) = T_{u_n,v_n}(x) = \exp\{u_n (|\log|x||)^{v_n} S(\log|x|)\}\,S(x)$. Under this transformation, we naturally adopt the following notion of the term "type."

Definition 2.64 We say that any DFs $F_1$ and $F_2$ are of the same e-type if

$$F_1(x) = F_2\!\left(\exp\{u (|\log|x||)^{v}\,\mathrm{sign}(\log|x|)\}\,S(x)\right) = F_2(T_{u,v}(x)),$$

for some constants u > 0, v > 0. In this case, a non-degenerate DF Λ(·) is said to be an e-max-stable law if there exists a DF F and norming constants $u_n > 0$, $v_n > 0$ such that

$$P(T_n^{-1}(X_{n:n}) \le x) = P\!\left(\exp\!\left\{\left(\frac{|\log|X_{n:n}||}{u_n}\right)^{1/v_n} S(\log|X_{n:n}|)\right\} S(X_{n:n}) \le x\right) = P(X_{n:n} \le T_n(x)) = F^n(T_n(x)) \xrightarrow{w} \Lambda(x). \qquad (2.44)$$

Moreover, if (2.44) is satisfied, then we say that the DF F belongs to the e-max-domain of attraction of the non-degenerate DF Λ under e-normalization, denoted by $F \in D_e(\Lambda)$. Finally, in view of Lemma 2.54, the relation (2.44) is satisfied if and only if

$$n[1 - F(T_n(x))] \to -\log \Lambda(x). \qquad (2.45)$$
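The exponential normalization and its inverse can be coded directly from the formulas in (2.44). The sketch below checks the round trip $T_{u,v}^{-1}(T_{u,v}(x)) = x$ and the composition rule $T_{u_1,v_1}^{-1} \circ T_{u_2,v_2} = T_{(u_2/u_1)^{1/v_1},\,v_2/v_1}$, which follows from the definition by direct substitution; the function names and numerical constants are illustrative assumptions.

```python
import math

def sgn(t: float) -> int:
    """S(t): the sign function."""
    return (t > 0) - (t < 0)

def T(x: float, u: float, v: float) -> float:
    """T_{u,v}(x) = exp{u |log|x||^v S(log|x|)} S(x), defined for x != 0."""
    L = math.log(abs(x))
    return math.exp(u * abs(L) ** v * sgn(L)) * sgn(x)

def T_inv(y: float, u: float, v: float) -> float:
    """Inverse map: exp{(|log|y|| / u)^(1/v) S(log|y|)} S(y), defined for y != 0."""
    L = math.log(abs(y))
    return math.exp((abs(L) / u) ** (1.0 / v) * sgn(L)) * sgn(y)

points = [-5.0, -0.4, 0.3, 1.0, 7.5]
u1, v1, u2, v2 = 0.7, 1.8, 1.3, 0.9

# Round trip: T_inv undoes T on each of the four regions separated by -1, 0 and 1.
round_trip = [T_inv(T(x, u1, v1), u1, v1) for x in points]

# Composition of an inverse with another exponential map is again an exponential map.
composed = [T_inv(T(x, u2, v2), u1, v1) for x in points]
direct = [T(x, (u2 / u1) ** (1.0 / v1), v2 / v1) for x in points]
```

The map preserves the sign of x and the sign of log|x|, so each of the intervals (−∞, −1), (−1, 0), (0, 1) and (1, ∞) is mapped onto itself.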

The following lemma, due to Ravi and Mavitha (2016), gives a chain of equivalences that can be used to obtain p-max-domains of attraction from l-max-domains of attraction and vice versa; here $F_X$ denotes the DF of a RV X. Moreover, this chain of equivalences is extended from power normalization to the exponential normalization.

Lemma 2.65 (Ravi and Mavitha, 2016) We have the following implications:

1. $F_X \in D_l(F_\xi) \Longleftrightarrow F_{\exp(X)} \in D_p(F_{\exp(\xi)}) \Longleftrightarrow F_{-\exp(-X)} \in D_p(F_{-\exp(-\xi)})$, where $F_\xi$ denotes an l-max stable DF, and $F_{\exp(\xi)}$ and $F_{-\exp(-\xi)}$ denote p-max stable DFs.
2. $F_X \in D_p(F_\xi) \Longleftrightarrow F_{\exp(X)} \in D_e(F_{\exp(\xi)}) \Longleftrightarrow F_{-\exp(-X)} \in D_e(F_{-\exp(-\xi)})$, where $F_\xi$ denotes a p-max stable DF, and $F_{\exp(\xi)}$ and $F_{-\exp(-\xi)}$ denote e-max stable laws.

Ravi and Mavitha (2016) utilized Lemma 2.65 to derive the following e-max-stable laws, wherein the first six e-max-stable DFs have the right endpoint $r(\Lambda) = \sup\{x : \Lambda(x) < 1\} > 0$ and the subsequent six e-max-stable DFs have $r(\Lambda) \le 0$.

[1] $\Lambda_{1,\beta}(x) = \exp(-(\log\log x)^{-\beta})\,I_{\{x \ge e\}}$.
[2] $\Lambda_{2,\beta}(x) = \exp(-(-\log\log x)^{\beta})\,I$ {1≤x … 0.

Clearly, this property justifies the name "e-max-stable laws." In order to establish the asymptotic theory of order statistics with variable rank (central and intermediate) under the exponential normalization, we need the modification of Khinchin's theorem for this normalizing mapping, which is stated below as a lemma.

Lemma 2.66 (Barakat and Omar, 2019) Let $F_n$ be a sequence of DFs and $H_1$ a non-degenerate DF. Let $u_n > 0$ and $v_n > 0$ be constants for which

$$F_n(T_{u_n,v_n}(x)) \xrightarrow{w} H_1(x).$$

Then, for some non-degenerate DF $H_2$ and constants $u'_n > 0$ and $v'_n > 0$,

$$F_n(T_{u'_n,v'_n}(x)) \xrightarrow{w} H_2(x),$$

if and only if

$$\left(\frac{u'_n}{u_n}\right)^{\frac{1}{v_n}} \to u > 0 \qquad \text{and} \qquad \frac{v'_n}{v_n} \to v > 0.$$


In this case, we have $H_2(x) = H_1(T_{u,v}(x))$.

It is worth mentioning that the modification of Khinchin's theorem (Lemma 2.66) holds for the power normalization $G_n(x) = u_n |x|^{v_n} S(x)$ (cf. Barakat and Nigm, 2002). Barakat and Omar (2019) used Theorem 2.46 and Lemma 2.66 to prove the next theorem concerning the class of possible non-degenerate limit types of central order statistics under the exponential normalization.

Theorem 2.67 (Barakat and Omar, 2019) Let $k_n \to \infty$, $\sqrt{n}\,(k_n/n - \lambda) \to 0$ and let $T_{u_n,v_n}(x)$ be an exponential transformation. Then, the possible non-degenerate limit types of $F_{k_n:n}(T_n(x))$ are $\Omega^{(0i)}(x)$, i = 1, ..., 6,

$$\Omega^{(1)}(x) = \Phi(c(\log x)^{\beta})\,I_{\{x > 1\}}, \qquad \Omega^{(2)}(x) = \Phi(-c\,|\log|x||^{\beta} S(\log|x|))\,I_{\{x \le 0\}} + I_{\{x > 0\}}$$

and

$$\Omega^{(3)}(x) = \Phi(-c_1 |\log x|^{\beta})\,I_{\{0 < x \le 1\}} + \Phi(c_2 (\log x)^{\beta})\,I_{\{x > 1\}},$$

where $c, c_1, c_2, \beta > 0$.

Remark Barakat and Omar's (2011a) result shows that the possible non-degenerate limit types for central order statistics under linear and power normalizing constants are the same. Theorem 2.67 reveals that the possible non-degenerate limit types for the central order statistics under exponential normalization are different from those under linear and power normalizing constants. Moreover, it is notable that under exponential normalization the normal DF is no longer a possible limit type for quantiles (central order statistics); it is superseded by the log-normal DF (the log-normal type appears as the type $\Psi^{(3)}(\cdot)$, when $c_1 = c_2 = \beta = 1$).

Example 2.68 Consider the DF $F(x) = \tfrac{1}{2}(\log x + 1)\,I_{\{1 < x \le e\}} + I_{\{x > e\}}$. Let $\lambda = \tfrac{1}{2}$. Furthermore, let $v_n = 1$ and $u_n = \tfrac{1}{\sqrt{n}}$. Then,

$$\sqrt{n}\,\frac{F(T_{u_n,v_n}(x)) - \lambda}{\sqrt{\lambda(1-\lambda)}} = -\sqrt{n}\,I_{\{x \le 1\}} + (\log x)\,I_{\{1 < x \le e^{\sqrt{n}}\}} + \sqrt{n}\,I_{\{x > e^{\sqrt{n}}\}} \;\to\; \begin{cases} -\infty, & x \le 1, \\ \log x, & x > 1. \end{cases}$$

Therefore, we get $F_{k_n:n}(T_{u_n,1}(x)) \xrightarrow{w} \Psi^{(1)}(x)$.

We now consider again the sequences $\{k_n\}$ that satisfy Chibisov's condition (2.11). In this case, by using Corollary (2.3), the possible e-max-stable laws (2.46) enable us to derive the following theorem, which characterizes the class of possible non-degenerate limit types of $F_{k_n:n}(T_{u_n,v_n}(x))$.


Theorem 2.69 (Barakat and Omar, 2019) Let $k_n$ be a Chibisov rank-sequence and $T_{u_n,v_n}(x)$ be an exponential normalization. Then, the possible non-degenerate limit types of $F_{k_n:n}(T_n(x))$ are

$L_{1,\beta}(x) = \Phi(\beta \log(\log\log x))\,I_{\{x > e\}}$;
$L_{2,\beta}(x) = \Phi(-\beta \log(-\log\log x))\,I_{\{1 < x \le e\}} + I_{\{x > e\}}$;
$L_{3,\beta}(x) = \Phi(\beta \log(-\log(-\log x)))\,I_{\{\frac{1}{e} < x \le 1\}} + I_{\{x > 1\}}$;
$L_{4,\beta}(x) = \Phi(-\beta \log(\log(-\log x)))\,I_{\{0 < x \le \frac{1}{e}\}} + I_{\{x > \frac{1}{e}\}}$;
$L_{5,\beta}(x) = \Phi(\beta \log(\log(-\log(-x))))\,I_{\{-\frac{1}{e} < x \le 0\}} + I_{\{x > 0\}}$;
$L_{6,\beta}(x) = \Phi(-\beta \log(-\log(-\log(-x))))\,I_{\{-1 < x \le -\frac{1}{e}\}} + I_{\{x > -\frac{1}{e}\}}$;
$L_{7,\beta}(x) = \Phi(\beta \log(-\log\log(-x)))\,I_{\{-e < x \le -1\}} + I_{\{x > -1\}}$;
$L_{8,\beta}(x) = \Phi(-\beta \log(\log\log(-x)))\,I_{\{x \le -e\}} + I_{\{x > -e\}}$;
$L_{9,\beta}(x) = \Phi(\log\log x)\,I_{\{x > 1\}}$;
$L_{10,\beta}(x) = \Phi(-\log(-\log x))\,I_{\{0 \le x \le 1\}} + I_{\{x > 1\}}$;
$L_{11,\beta}(x) = \Phi(\log(-\log(-x)))\,I_{\{-1 < x \le 0\}} + I_{\{x > 0\}}$ and
$L_{12,\beta}(x) = \Phi(-\log(-\log(-\tfrac{1}{x})))\,I_{\{x \le -1\}} + I_{\{x > -1\}}$.

2.14 Generalized order statistics and dual generalized order statistics

Generalized order statistics (gos) were introduced by Kamps (1995) as a unification of several models of RVs arranged in ascending order. The gos model enables us to handle well-known limit laws for upper and lower extremes, intermediate and central order statistics, record values, Pfeiffer record values, and m-generalized order statistics (m-gos) within one framework. On the other hand, Burkschat et al. (2003) introduced the concept of dual generalized order statistics (dgos) to enable a common approach to RVs arranged in descending order, like reversed order statistics and lower records models. The two models gos and dgos are introduced via a distributional approach, where both models depend on n unknown parameters.


However, the sub-models m−gos and m−dgos, which contain most of the known important models of ordered RVs, are defined only by two unknown parameters.

2.14.1 Distribution theory of the m-gos and m-dgos

Using notations from Kamps and Cramer (2001), ordered RVs $U_{r:n} \equiv U(r, n, \tilde{m}, k)$, $r = 1, 2, \dots, n$, are called uniform gos based on (positive) parameters $\gamma_1, \gamma_2, \dots, \gamma_n$, if they have the probability density function (PDF)

$$f^{U_{1:n},U_{2:n},\dots,U_{n:n}}(u_1, u_2, \dots, u_n) = \left(\prod_{j=1}^{n}\gamma_j\right)\left(\prod_{j=1}^{n-1}(1-u_j)^{\gamma_j-\gamma_{j+1}-1}\right)(1-u_n)^{\gamma_n-1},$$

where $0 \le u_1 \le u_2 \le \dots \le u_n < 1$. The parameters $\gamma_1, \gamma_2, \dots, \gamma_n$ are defined by $\gamma_n = k > 0$ and $\gamma_r = k + n - r + M_r$, $r = 1, 2, \dots, n-1$, where $M_r = \sum_{j=r}^{n-1} m_j$ and $\tilde{m} = (m_1, m_2, \dots, m_{n-1}) \in \mathbb{R}^{n-1}$. The RVs $X(1, n, \tilde{m}, k), \dots, X(n, n, \tilde{m}, k)$ are called gos based on the DF $F$ and on the parameters $\gamma_1, \gamma_2, \dots, \gamma_n$, if $X(r, n, \tilde{m}, k) \stackrel{d}{=} F^{-}(U_{r:n})$, $r = 1, 2, \dots, n$, where $F^{-}$ denotes the quantile function of $F$ (i.e., the generalized inverse of $F$) and “$\stackrel{d}{=}$” means identical distributions.

Choosing particular values for the parameters $\gamma_1, \gamma_2, \dots, \gamma_n$, the distributions of well-known models of ordered data result. For instance:

m-gos ($\gamma_n = k$, $\gamma_r = k + (n-r)(m+1)$, $r = 1, 2, \dots, n-1$);
ordinary order statistics ($\gamma_n = 1$, $\gamma_r = n - r + 1$, $r = 1, 2, \dots, n-1$, i.e., $k = 1$, $m_i = 0$, $i = 1, 2, \dots, n-1$);
sequential order statistics (sos) ($\gamma_n = \alpha_n$, $\gamma_r = (n-r+1)\alpha_r$, $r = 1, 2, \dots, n-1$);
progressive type II censored order statistics with censoring scheme $(R_1, \dots, R_M)$ (pos) ($\gamma_n = R_M + 1$, $\gamma_r = n - r + 1 + \sum_{j=r}^{M} R_j$, if $r \le M-1$, and $\gamma_r = n - r + 1 + R_M$, if $r \ge M$);
upper records ($\gamma_r = 1$, $1 \le r \le n$, i.e., $k = 1$, $m_i = -1$, $i = 1, 2, \dots, n-1$)

(see Kamps, 1995, 1999, and Cramer, 2003). A crucial result in a distributional analysis of gos is a stochastic representation in terms of a product of iid uniform RVs $B_1, \dots, B_n$ due to Kamps and Cramer (2001). The RVs

$$F^{-}\left(1 - B_1^{1/\gamma_1}\right), \dots, F^{-}\left(1 - \prod_{i=1}^{n} B_i^{1/\gamma_i}\right) \tag{2.47}$$

are gos based on the DF $F$ and on the parameters $\gamma_1, \gamma_2, \dots, \gamma_n$.
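The parameter formula and the product representation (2.47) can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names are hypothetical, $F$ is taken to be the uniform DF on (0, 1) so that $F^{-}$ is the identity, and the exact mean uses the elementary fact $E[B^{1/\gamma}] = \gamma/(\gamma+1)$ for a uniform RV $B$.

```python
from fractions import Fraction
import random


def gos_gammas(n, k, m_vec):
    """gamma_r = k + n - r + sum_{j=r}^{n-1} m_j for r < n, and gamma_n = k."""
    assert len(m_vec) == n - 1
    gammas = []
    for r in range(1, n):
        tail = sum(m_vec[r - 1:])  # M_r = m_r + ... + m_{n-1}
        gammas.append(k + n - r + tail)
    gammas.append(k)
    return gammas


def uniform_gos_sample(gammas, rng):
    """One draw of the uniform gos via 1 - prod_{i<=r} B_i^(1/gamma_i), as in (2.47)."""
    prod, out = 1.0, []
    for g in gammas:
        prod *= rng.random() ** (1.0 / g)
        out.append(1.0 - prod)
    return out


def exact_mean_uniform_gos(gammas, r):
    """E[U_{r:n}] = 1 - prod_{i<=r} gamma_i / (gamma_i + 1)."""
    p = Fraction(1)
    for g in gammas[:r]:
        p *= Fraction(g) / (Fraction(g) + 1)
    return 1 - p


n = 5
oos = gos_gammas(n, 1, [0] * (n - 1))    # ordinary order statistics
rec = gos_gammas(n, 1, [-1] * (n - 1))   # upper records
mgos = gos_gammas(n, 2, [1] * (n - 1))   # m-gos with m = 1, k = 2

rng = random.Random(0)
draws = [uniform_gos_sample(oos, rng) for _ in range(20000)]
mc_mean_3 = sum(d[2] for d in draws) / len(draws)
```

For ordinary order statistics the exact mean reduces to $r/(n+1)$, the familiar mean of the rth uniform order statistic, which gives a quick sanity check on the representation.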


In a wide subclass of gos, specifically when $m_1 = m_2 = \dots = m_{r-1} = m$, a representation of the marginal DF $\Phi^{(\tilde{m},k)}_{r:n}(x) = P(X(r, n, \tilde{m}, k) \le x)$ was given in Kamps (1995). Namely,

$$\Phi^{(\tilde{m},k)}_{r:n}(x) = 1 - C_{r-1,n}\,(1-F(x))^{\gamma_r} \sum_{j=0}^{r-1} \frac{1}{j!\,C_{r-j-1,n}}\,\big(g_m(F(x))\big)^{j},$$

where

$$g_m(x) = \begin{cases} \dfrac{1}{m+1}\left[1 - (1-x)^{m+1}\right], & m \ne -1,\\[2mm] -\log(1-x), & m = -1, \end{cases} \qquad \text{for } x \in [0, 1),$$

and $C_{r-1,n} = \prod_{i=1}^{r}\gamma_i$, $r = 1, 2, \dots, n$, with $\gamma_n = k$. If $m \ne -1$, $(m+1)g_m(F(x)) = G_m(x) = 1 - (1-F(x))^{m+1} = 1 - \bar{F}^{m+1}(x)$ is a DF.

Burkschat et al. (2003) defined the uniform dgos $U_{d;r:n}$, $r = 1, 2, \dots, n$, by their PDF

$$f^{U_{d;1:n},U_{d;2:n},\dots,U_{d;n:n}}(u_1, u_2, \dots, u_n) = \left(\prod_{j=1}^{n}\gamma_j\right)\left(\prod_{j=1}^{n-1} u_j^{\gamma_j-\gamma_{j+1}-1}\right) u_n^{\gamma_n-1},$$

where $1 \ge u_1 \ge \dots \ge u_n > 0$. The dgos based on a DF $F$ are defined by the quantile transformation $X_d(r, n, \tilde{m}, k) \stackrel{d}{=} F^{-}(U_{d;r:n})$, $r = 1, 2, \dots, n$. Nasri-Roudsari (1996) (see also Barakat, 2007) derived the marginal DF of the rth m-gos, $m \ne -1$, in the form

$$\Phi^{(m,k)}_{r:n}(x) = I_{G_m(x)}(r, N-r+1),$$

where $N = \frac{k}{m+1} + n - 1$. By using the well-known relation $I_x(a, b) = 1 - I_{\bar{x}}(b, a)$, where $\bar{x} = 1 - x$, the marginal DF of the (n − r + 1)th m-gos, $m \ne -1$, is given by

$$\Phi^{(m,k)}_{n-r+1:n}(x) = I_{G_m(x)}(N - R_r + 1, R_r),$$

where $R_r = \frac{k}{m+1} + r - 1$. Similarly, by putting $T_m(x) = F^{m+1}(x)$, the marginal DFs of the rth and (n − r + 1)th m-dgos, $m \ne -1$, can be written, respectively, as

$$\Phi^{d(m,k)}_{r:n}(x) = I_{T_m(x)}(N - r + 1, r) \tag{2.48}$$

and

$$\Phi^{d(m,k)}_{n-r+1:n}(x) = I_{T_m(x)}(R_r, N - R_r + 1). \tag{2.49}$$
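As a numerical cross-check of the two representations of the marginal DF of the rth m-gos, the sketch below compares Kamps’ finite-sum form with the incomplete beta form $I_{G_m(x)}(r, N-r+1)$. It is a sketch under assumptions: the function names are hypothetical, the argument `u` stands for $F(x)$, the finite sum is evaluated with $g_m$ applied to $F(x)$, and the incomplete beta ratio is approximated by a plain composite Simpson rule, which is adequate only because both beta parameters are at least 1 in the cases tried.

```python
import math


def g_m(u, m):
    """g_m(u) = (1 - (1 - u)^(m+1)) / (m + 1) for m != -1, and -log(1 - u) for m = -1."""
    if m == -1:
        return -math.log(1.0 - u)
    return (1.0 - (1.0 - u) ** (m + 1)) / (m + 1)


def mgos_cdf_kamps(u, r, n, m, k):
    """Finite-sum form of the DF of the r-th m-gos, evaluated at u = F(x)."""
    gam = [k + (n - j) * (m + 1) for j in range(1, n + 1)]
    c = [1.0]  # c[i] = gamma_1 * ... * gamma_i, i.e. C_{i-1,n} in the text
    for g in gam:
        c.append(c[-1] * g)
    s = sum(g_m(u, m) ** j / (math.factorial(j) * c[r - j]) for j in range(r))
    return 1.0 - c[r] * (1.0 - u) ** gam[r - 1] * s


def reg_inc_beta(x, a, b, steps=4000):
    """I_x(a, b) by composite Simpson on [0, x]; assumes a >= 1, b >= 1 and x < 1."""
    h = x / steps
    f = lambda t: t ** (a - 1) * (1.0 - t) ** (b - 1)
    total = f(0.0) + f(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    beta = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return total * h / 3.0 / beta


def mgos_cdf_beta(u, r, n, m, k):
    """Incomplete beta form I_{G_m}(r, N - r + 1) with N = k/(m+1) + n - 1."""
    big_n = k / (m + 1) + n - 1
    g = 1.0 - (1.0 - u) ** (m + 1)
    return reg_inc_beta(g, r, big_n - r + 1)


def binom_upper_tail(n, p, a):
    """P(Bin(n, p) >= a), the DF of the a-th order statistic at level p."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(a, n + 1))


kamps_val = mgos_cdf_kamps(0.3, 3, 5, 1, 3)
beta_val = mgos_cdf_beta(0.3, 3, 5, 1, 3)
symmetry_gap = reg_inc_beta(0.3, 2, 3.5) + reg_inc_beta(0.7, 3.5, 2) - 1.0
```

With $m = 0$ and $k = 1$ both forms reduce to the binomial tail that gives the DF of an ordinary order statistic, and `symmetry_gap` checks the relation $I_x(a, b) = 1 - I_{\bar{x}}(b, a)$ used above.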


We can investigate the connections between m-gos and m-dgos. Let $D_j$, $1 \le j \le n$, be independent RVs with respective beta distribution $\beta(\gamma_j, 1)$, i.e., $D_j$ follows a power function distribution with exponent $\gamma_j = k + (n-j)(m+1)$. The central distribution-theoretical result concerning m-gos and m-dgos is that they can alternatively be defined by the product of the independent power function distributed RVs $D_j$, $1 \le j \le n$ (see Cramer, 2003 and Burkschat et al., 2003). According to (2.47), we have

$$X(r, n, m, k) \stackrel{d}{=} F^{-}\Big(1 - \prod_{j=1}^{r} D_j\Big), \quad r = 1, 2, \dots, n,$$

and

$$X_d(r, n, m, k) \stackrel{d}{=} F^{-}\Big(\prod_{j=1}^{r} D_j\Big), \quad r = 1, 2, \dots, n.$$
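The product representation of the m-dgos lends itself to a small simulation. The sketch below is an illustration only: the function names are hypothetical, $F$ is assumed uniform on (0, 1), each $D_j$ is generated as $B_j^{1/\gamma_j}$ with $B_j$ uniform, and the comparison uses (2.48) in the special case $m = 0$, $k = 1$, where $N = n$ and the incomplete beta ratio becomes a binomial tail.

```python
import math
import random


def uniform_mdgos_sample(n, m, k, rng):
    """(U_{d;1:n}, ..., U_{d;n:n}) as running products of D_j = B_j^(1/gamma_j)."""
    prod, out = 1.0, []
    for j in range(1, n + 1):
        gamma_j = k + (n - j) * (m + 1)
        prod *= rng.random() ** (1.0 / gamma_j)  # D_j follows beta(gamma_j, 1)
        out.append(prod)
    return out


def binom_upper_tail(n, p, a):
    """P(Bin(n, p) >= a) = I_p(a, n - a + 1) for integer a."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(a, n + 1))


n, r, x = 6, 2, 0.7
rng = random.Random(1)
draws = [uniform_mdgos_sample(n, 0, 1, rng) for _ in range(20000)]
empirical = sum(d[r - 1] <= x for d in draws) / len(draws)
theory = binom_upper_tail(n, x, n - r + 1)  # (2.48) with m = 0, k = 1, N = n
```

In this special case the rth dgos is the rth largest of n uniform observations, so the empirical frequency should be close to the binomial tail; the sample size and seed are arbitrary choices.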

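Moreover, by using the results of Kamps (1995) and Burkschat et al. (2003), we can write explicitly the marginal PDFs of the rth m-gos and m-dgos, respectively, as

$$f^{(m,k)}_{r:n}(x) = \frac{C_{r-1,n}}{\Gamma(r)}\,\bar{F}^{\gamma_r-1}(x)\, g_m^{r-1}(F(x))\, f(x), \quad -\infty < x < \infty,$$

$$f^{d(m,k)}_{r:n}(x) = \frac{C_{r-1,n}}{\Gamma(r)}\,F^{\gamma_r-1}(x)\, g_m^{r-1}(\bar{F}(x))\, f(x), \quad -\infty < x < \infty,$$

where $\bar{F}(x) = 1 - F(x)$ and $f$ is the PDF of $F$.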
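A quick numerical sanity check of the marginal densities is to verify that they integrate to one. The sketch below does this for a uniform parent DF on (0, 1). It rests on assumptions that should be read as such: the function names are hypothetical, the gos density is taken as $C_{r-1,n}/\Gamma(r)\,(1-u)^{\gamma_r-1} g_m^{r-1}(u)$, and the dual density is assumed to follow from it by replacing $u$ with $1-u$, which is what the product representations of the uniform gos and dgos suggest.

```python
import math


def g_m(u, m):
    """g_m(u) for m != -1 (the only case used in this sketch)."""
    return (1.0 - (1.0 - u) ** (m + 1)) / (m + 1)


def mgos_pdf_uniform(u, r, n, m, k):
    """Density of the r-th uniform m-gos at u in [0, 1]."""
    gam = [k + (n - j) * (m + 1) for j in range(1, n + 1)]
    c = math.prod(gam[:r])  # C_{r-1,n} = gamma_1 * ... * gamma_r
    return c / math.gamma(r) * (1.0 - u) ** (gam[r - 1] - 1) * g_m(u, m) ** (r - 1)


def mdgos_pdf_uniform(u, r, n, m, k):
    """Density of the r-th uniform m-dgos, assumed to be the gos density at 1 - u."""
    return mgos_pdf_uniform(1.0 - u, r, n, m, k)


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


mass_gos = simpson(lambda u: mgos_pdf_uniform(u, 3, 5, 1, 2), 0.0, 1.0)
mass_dgos = simpson(lambda u: mdgos_pdf_uniform(u, 3, 5, 1, 2), 0.0, 1.0)
```

For $n = 5$, $m = 1$, $k = 2$, $r = 3$ the integrand is a polynomial, so the total mass can also be confirmed by hand to equal one.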

2.14.2 Asymptotic theory of univariate m-gos and m-dgos

The possible limit DFs of $\Phi^{(\tilde{m},k)}_{n:n}$, i.e., the limit DF of the maximum gos, under the condition $m_1 = m_2 = \dots = m_{n-1} = m \ne -1$ (in this case, clearly, the record values are excluded) and their domains of attraction under linear normalization are shown in Nasri-Roudsari (1996). By using Christoph and Falk’s (1996) technique, the analogous results under power normalization were derived by Nasri-Roudsari (1999). If $m_i = m$, $1 \le i \le n-1$, the corresponding gos are called m-gos (cf. Cramer, 2003). The possible non-degenerate limit distributions and the convergence rate of the upper extreme m-gos, i.e., the (n − r + 1)th m-gos for fixed r, were discussed in Nasri-Roudsari (1999). The asymptotic normality of intermediate and central gos, which depends on the differentiability of the underlying DF $F$, was derived by Cramer (2003) (see Section 5.7 in Cramer, 2003). Moreover, the necessary and sufficient conditions for the weak convergence, as $n \to \infty$, as well as the form of the possible limit DFs of extreme, intermediate and central m-gos were derived in Barakat (2007).


The derivation of the extreme value theory for arbitrary gos is based on limit theorems obtained for order statistics and record values. It turns out that the limiting behaviour of gos can be categorized by a series of powers of the underlying parameters. Moreover, one class behaves like order statistics with respect to the weak limits, whereas the asymptotic behaviour of another family shows similarities to that of record values. The following two theorems extend the well-known results concerning the asymptotic theory of extreme order statistics to the case of univariate extreme m-gos and m-dgos. These theorems can be easily proved by applying the following asymptotic relations, due to Smirnov (1952) (see also Barakat, 1997a):

$$\Gamma_r(nA_n) - \delta_{1n} \le I_{A_n}(r, n-r+1) \le \Gamma_r(nA_n) - \delta_{2n}, \quad \text{if } nA_n \sim A < \infty, \text{ as } n \to \infty,$$

and

$$1 - \Gamma_r(nA_n) - \delta_{2n} \le I_{A_n}(n-r+1, r) \le 1 - \Gamma_r(nA_n) - \delta_{1n}, \quad \text{if } nA_n \sim A < \infty, \text{ as } n \to \infty,$$

where $\delta_{in} > 0$, $\delta_{in} \xrightarrow[n]{} 0$, $i = 1, 2$, and $0 < A_n < 1$. However, the results concerning gos were originally derived by Nasri-Roudsari (1996) and Nasri-Roudsari (1999) (see also Barakat, 1997a), while those concerning the dgos can be easily derived by using (2.48), (2.49) and the relations between gos and dgos, see Burkschat et al. (2003).

Theorem 2.70 (see Barakat, 2010, Barakat et al., 2014b, and Barakat et al., 2014c) Let $m > -1$ and r be a fixed rank. Then, there exist normalizing constants $c_n, \tilde{c}_n > 0$ and $d_n, \tilde{d}_n$, for which

$$\Phi^{(m,k)}_{r:n}(c_n x + d_n) = I_{G_m(c_n x + d_n)}(r, N-r+1) \xrightarrow[n]{w} \Phi^{(m,k)}_{r}(x) \tag{2.50}$$

and

$$\Phi^{d(m,k)}_{n-r+1:n}(\tilde{c}_n x + \tilde{d}_n) = I_{T_m(\tilde{c}_n x + \tilde{d}_n)}(R_r, N-R_r+1) \xrightarrow[n]{w} \hat{\Phi}^{d(m,k)}_{r}(x), \tag{2.51}$$

where $\Phi^{(m,k)}_{r}(x)$ and $\hat{\Phi}^{d(m,k)}_{r}(x)$ are non-degenerate DFs, if and only if there exist normalizing constants $\alpha_n > 0$ and $\beta_n$, for which

$$\Phi^{(0,1)}_{r:n}(\alpha_n x + \beta_n) = \Phi^{d(0,1)}_{n-r+1:n}(\alpha_n x + \beta_n) \xrightarrow[n]{w} \Gamma_r(U^{*}_{j,\alpha}(x)), \quad \alpha > 0.$$

In this case, $\Phi^{(m,k)}_{r}(x) = \Gamma_r(U^{*}_{j,\alpha}(x))$ and $\hat{\Phi}^{d(m,k)}_{r}(x) = \Gamma_{R_r}(U^{*\,m+1}_{j,\alpha}(x))$, $j \in \{1, 2, 3\}$. Moreover, $c_n$, $d_n$, $\tilde{c}_n$, and $\tilde{d}_n$ may be chosen such that $c_n = \alpha_{\psi(n)}$, $d_n = \beta_{\psi(n)}$, $\tilde{c}_n = \alpha_{\phi(n)}$ and $\tilde{d}_n = \beta_{\phi(n)}$, where $\phi(n) = n^{1/(m+1)}$ and $\psi(n) = n(m+1)$. Finally, (2.50) and (2.51) hold, if and only if

$$N G_m(c_n x + d_n) \xrightarrow[n]{} U^{*}_{j,\alpha}(x)$$

and

$$N T_m(\tilde{c}_n x + \tilde{d}_n) \xrightarrow[n]{} U^{*\,m+1}_{j,\alpha}(x)$$

(note that $N \sim n$, as $n \to \infty$), respectively.

Theorem 2.71 (Barakat, 2007 and Barakat et al., 2014b) Let $m > -1$ and r be a fixed rank. Then, there exist normalizing constants $a_n, \tilde{a}_n > 0$ and $b_n, \tilde{b}_n$, for which

$$\Phi^{(m,k)}_{n-r+1:n}(a_n x + b_n) = I_{G_m(a_n x + b_n)}(N-R_r+1, R_r) \xrightarrow[n]{w} \hat{\Phi}^{(m,k)}_{r}(x) \tag{2.52}$$

and

$$\Phi^{d(m,k)}_{r:n}(\tilde{a}_n x + \tilde{b}_n) = I_{T_m(\tilde{a}_n x + \tilde{b}_n)}(N-r+1, r) \xrightarrow[n]{w} \Phi^{d(m,k)}_{r}(x), \tag{2.53}$$

where $\hat{\Phi}^{(m,k)}_{r}(x)$ and $\Phi^{d(m,k)}_{r}(x)$ are non-degenerate DFs, if and only if there exist normalizing constants $\hat{\alpha}_n > 0$ and $\hat{\beta}_n$, for which

$$\Phi^{(0,1)}_{n-r+1:n}(\hat{\alpha}_n x + \hat{\beta}_n) = \Phi^{d(0,1)}_{r:n}(\hat{\alpha}_n x + \hat{\beta}_n) \xrightarrow[n]{w} 1 - \Gamma_r(U_{i,\alpha}(x)), \quad \alpha > 0.$$

In this case, $\hat{\Phi}^{(m,k)}_{r}(x) = 1 - \Gamma_{R_r}(U^{m+1}_{i,\alpha}(x))$ and $\Phi^{d(m,k)}_{r}(x) = 1 - \Gamma_r(U_{i,\alpha}(x))$, $i \in \{1, 2, 3\}$. Moreover, $a_n$, $b_n$, $\tilde{a}_n$ and $\tilde{b}_n$ may be chosen such that $a_n = \hat{\alpha}_{\phi(n)}$, $b_n = \hat{\beta}_{\phi(n)}$, $\tilde{a}_n = \hat{\alpha}_{\psi(n)}$ and $\tilde{b}_n = \hat{\beta}_{\psi(n)}$. Finally, (2.52) and (2.53) hold, if and only if

$$N \bar{G}_m(a_n x + b_n) \xrightarrow[n]{} U^{m+1}_{i;\alpha}(x)$$

and

$$N \bar{T}_m(\tilde{a}_n x + \tilde{b}_n) \xrightarrow[n]{} U_{i;\alpha}(x),$$

where $\bar{G}_m = 1 - G_m$ and $\bar{T}_m = 1 - T_m$.

The following theorems extend the well-known results concerning the asymptotic theory of univariate central and intermediate m-gos and m-dgos. These theorems are due to Barakat (2007) and Barakat et al. (2014c, 2015b).

Theorem 2.72 (Barakat, 2007, and Barakat et al., 2014c) Let $r = r_n$ be such that $\sqrt{n}\left(\frac{r_n}{n} - \lambda\right) \xrightarrow[n]{} 0$, where $0 < \lambda < 1$. Furthermore, let $m_1 = m_2 = \dots = m_{n-1} = m > -1$. Then, there exist normalizing constants $a_n > 0$ and $b_n$ for which

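$$\Phi^{(m,k)}_{r:n}(a_n x + b_n) \xrightarrow[n]{w} \Phi^{(m,k)}(x; \lambda), \tag{2.54}$$

where $\Phi^{(m,k)}(x; \lambda)$ is a non-degenerate DF, if and only if

$$\sqrt{n}\,\frac{G_m(a_n x + b_n) - \lambda}{C_\lambda} \xrightarrow[n]{} W(x),$$

where $C_\lambda = \sqrt{\lambda(1-\lambda)}$ and $\Phi^{(m,k)}(x; \lambda) = \Phi(W(x))$. Moreover, (2.54) is satisfied for some non-degenerate DF $\Phi^{(m,k)}(x; \lambda)$, if and only if $F \in D_{\lambda(m)}(\Phi(W_{i,\beta}(x)))$, for some $i \in \{1, 2, 3, 4\}$, where $\lambda(m) = 1 - \bar{\lambda}^{1/(m+1)}$ and $\bar{\lambda} = 1 - \lambda$. In this case, we have $W(x) = \frac{C^{*}_{\lambda(m)}}{C^{*}_{\lambda}}(m+1)W_{i,\beta}(x)$, where $C^{*}_{\lambda} = \frac{C_\lambda}{\bar{\lambda}}$ (note that, when $m = 0$, we get $W(x) = W_{i,\beta}(x)$).

Theorem 2.73 (Barakat et al., 2015b) Let $r = r_n$ be such that $\sqrt{n}\left(\frac{r_n}{n} - \lambda\right) \xrightarrow[n]{} 0$, where $0 < \lambda < 1$. Furthermore, let $m_1 = m_2 = \dots = m_{n-1} = m > -1$. Then, there exist normalizing constants $\tilde{a}_n > 0$ and $\tilde{b}_n$ for which

$$\Phi^{d(m,k)}_{r:n}(\tilde{a}_n x + \tilde{b}_n) \xrightarrow[n]{w} \Phi^{d(m,k)}(x; \lambda), \tag{2.55}$$

where $\Phi^{d(m,k)}(x; \lambda)$ is a non-degenerate DF, if and only if

$$\sqrt{n}\,\frac{\lambda - \bar{T}_m(\tilde{a}_n x + \tilde{b}_n)}{C_\lambda} \xrightarrow[n]{} U(x),$$

where $\bar{T}_m = 1 - T_m$ and $\Phi^{d(m,k)}(x; \lambda) = \Phi(U(x))$. Moreover, (2.55) is satisfied for some non-degenerate DF $\Phi^{d(m,k)}(x; \lambda)$, if and only if

$$\Phi^{d(0,1)}_{r:n}(\tilde{a}_n x + \tilde{b}_n)\left(= \Phi^{(0,1)}_{n-r+1:n}(\tilde{a}_n x + \tilde{b}_n)\right) \xrightarrow[n]{w} \Phi(W_{i,\beta}(x)),$$

for some $i \in \{1, 2, 3, 4\}$. In this case, we have $U(x) = \frac{C^{*}_{\lambda(m)}}{C^{*}_{\lambda}}(m+1)W_{i,\beta}(x)$, where $C^{*}_{\lambda} = \frac{C_\lambda}{\bar{\lambda}}$ and $\bar{\lambda} = 1 - \lambda$.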

Theorem 2.74 (Barakat, 2007, and Barakat et al., 2014c, 2015b) Let $m_1 = m_2 = \dots = m_{n-1} = m > -1$, and let $r_n$ be a nondecreasing intermediate rank-sequence. Then, there exist normalizing constants $a_n > 0$ and $b_n$ such that

$$\Phi^{(m,k)}_{r_n:n}(a_n x + b_n) \xrightarrow[n]{w} \Phi^{(m,k)}(x), \tag{2.56}$$

where $\Phi^{(m,k)}(x)$ is a non-degenerate DF, if and only if

$$\frac{N G_m(a_n x + b_n) - r_N}{\sqrt{r_N}} \xrightarrow[n]{} V(x),$$

where $\Phi^{(m,k)}(x) = \Phi(V(x))$. Furthermore, let $r^{*}_n$ be a variable rank-sequence defined by $r^{*}_n = r_{\theta^{-1}(N)}$, where $\theta(n) = (m+1)N$ (remember that $N = \frac{k}{m+1} + n - 1$; then $\theta(n) = n$, if $m = 0$, $k = 1$, i.e., in the case of order statistics). Then, there exist normalizing constants $a_n > 0$ and $b_n$ for which (2.56) is satisfied for some non-degenerate DF $\Phi^{(m,k)}(x)$, if and only if there are normalizing constants $\alpha_n > 0$ and $\beta_n$ for which

$$\Phi^{(0,1)}_{r^{*}_n:n}(\alpha_n x + \beta_n) \xrightarrow[n]{w} \Phi^{(0,1)}(x),$$

where $\Phi^{(0,1)}(x)$ is some non-degenerate DF, or equivalently

$$\frac{nF(\alpha_n x + \beta_n) - r^{*}_n}{\sqrt{r^{*}_n}} \xrightarrow[n]{} V_i(x; \beta), \quad i \in \{1, 2, 3\},$$

and $\Phi^{(0,1)}(x) = \Phi(V_i(x; \beta))$. In this case, $a_n$ and $b_n$ may be chosen such that $a_n = \alpha_{\theta(n)}$ and $b_n = \beta_{\theta(n)}$. Moreover, $\Phi^{(m,k)}(x)$ must have the form $\Phi(V_i(x; \beta))$, i.e., $V(x) = V_i(x; \beta)$.

Theorem 2.75 (Barakat et al., 2014c, 2015b) Let $m_1 = m_2 = \dots = m_{n-1} = m > -1$, and let $r_n$ be a nondecreasing intermediate rank-sequence. Then, there exist normalizing constants $\tilde{a}_n > 0$ and $\tilde{b}_n$ such that

$$\Phi^{d(m,k)}_{r_n:n}(\tilde{a}_n x + \tilde{b}_n) \xrightarrow[n]{w} \Phi^{d(m,k)}(x), \tag{2.57}$$

where $\Phi^{d(m,k)}(x)$ is a non-degenerate DF, if and only if

$$\frac{r_N - N\bar{T}_m(\tilde{a}_n x + \tilde{b}_n)}{\sqrt{r_N}} \xrightarrow[n]{} t(x),$$

where $\Phi^{d(m,k)}(x) = \Phi(t(x))$. Furthermore, let $r^{*}_n$ be a variable rank-sequence defined by $r^{*}_n = r_{\theta^{-1}(N)}$, where $\theta(n) = (m+1)N$. Then, there exist normalizing constants $\tilde{a}_n > 0$ and $\tilde{b}_n$ for which (2.57) is satisfied for some non-degenerate DF $\Phi^{d(m,k)}(x)$, if and only if there are normalizing constants $\tilde{\alpha}_n > 0$ and $\tilde{\beta}_n$, for which

$$\Phi^{d(0,1)}_{r^{*}_n:n}(\tilde{\alpha}_n x + \tilde{\beta}_n) \xrightarrow[n]{w} \Phi^{d(0,1)}(x),$$

where $\Phi^{d(0,1)}(x)$ is some non-degenerate DF, or equivalently

$$\frac{r^{*}_n - n\bar{F}(\tilde{\alpha}_n x + \tilde{\beta}_n)}{\sqrt{r^{*}_n}} \xrightarrow[n]{} V_i(x; \beta), \quad i \in \{1, 2, 3\},$$

and $\Phi^{d(0,1)}(x) = \Phi(V_i(x; \beta))$. In this case, $\tilde{a}_n$ and $\tilde{b}_n$ may be chosen such that $\tilde{a}_n = \tilde{\alpha}_{\theta(n)}$ and $\tilde{b}_n = \tilde{\beta}_{\theta(n)}$. Moreover, $\Phi^{d(m,k)}(x)$ must have the form $\Phi(V_i(x; \beta))$, i.e., $t(x) = V_i(x; \beta)$.

2.14.3 Further limit theorems of gos and dgos

Although Theorems 2.70 and 2.71 provide a set-up which includes many interesting models, such as ordinary order statistics, sos and pos with censoring scheme (R1, ..., RM), R1 = ... = RM, a large number of models contained in the families gos and dgos are excluded from this set-up, e.g., pos with a general censoring scheme (R1, ..., RM). A more general set-up, which includes


pos with a general censoring scheme, is obtained when the vector $\tilde{m}$ is arbitrarily chosen such that $m_i > -1$, $i = 1, 2, \dots, n-1$, and the parameters $\gamma_1, \gamma_2, \dots, \gamma_n$ are pairwise different, i.e., $\gamma_i \ne \gamma_j$, $i \ne j$, for all $i, j \in \{1, \dots, n\}$. For instance, this assumption does not restrict the pos with a general censoring scheme $(R_1, \dots, R_M)$. The marginal DF of the rth gos in this general case was given in Kamps and Cramer (2001) as

$$\Phi^{(\tilde{m},k)}_{r:n}(x) = 1 - C_{r-1}\sum_{i=1}^{r}\frac{a_i(r)}{\gamma_i}\,(1-F(x))^{\gamma_i}.$$

Moreover, the DF of the rth dgos is given by (cf. Kamps and Cramer, 2001)

$$\Phi^{d(\tilde{m},k)}_{r:n}(x) = C_{r-1}\sum_{i=1}^{r}\frac{a_i(r)}{\gamma_i}\,F^{\gamma_i}(x),$$

where $C_{r-1} = \prod_{j=1}^{r}\gamma_j$ and $a_i(r) = \prod_{j=1,\, j\ne i}^{r}\frac{1}{\gamma_j - \gamma_i}$. Since the parameters $\gamma_1, \gamma_2, \dots, \gamma_n$ eventually depend on n, we indicate this attribute subsequently by an additional index, i.e., $\gamma_{i,n}$, $i = 1, 2, \dots, n$. The next two theorems, which were derived by Barakat and El-Adll (2009, 2012), extend Theorem 2.70 to wide subclasses of gos and dgos.

Theorem 2.76 (cf. Barakat and El-Adll, 2012) Let $\gamma_{1,n} \xrightarrow[n]{} \infty$. Furthermore, under the condition

$$\Phi^{(0,1)}_{r:n}(d_n x + c_n) \xrightarrow[n]{w} \Gamma_r(U^{*}_{i;\alpha}(x))$$

and for all $\tilde{m} \in \mathbb{R}^{n-1}$ such that $m_i > -1$, $i = 1, \dots, n-1$, and $\gamma_{i,n} \ne \gamma_{j,n}$, $i \ne j$, for all $i, j \in \{1, \dots, n\}$, we get

$$\Phi^{(\tilde{m},k)}_{r:n}(d_{n,\tilde{m}}\, x + c_{n,\tilde{m}}) \xrightarrow[n]{w} \Gamma_r(U^{*}_{i;\alpha}(x)) = 1 - \sum_{j=0}^{r-1}\frac{(U^{*}_{i;\alpha}(x))^{j}}{j!}\exp(-U^{*}_{i;\alpha}(x)), \quad i = 1, 2, 3,$$

where $c_{n,\tilde{m}} = c_{\gamma_{1,n}}$ and $d_{n,\tilde{m}} = d_{\gamma_{1,n}}$.

Theorem 2.77 (cf. Barakat and El-Adll, 2009) Let $\gamma_{1,n} \xrightarrow[n]{} \infty$. Furthermore, under the condition

$$\Phi^{d(0,1)}_{n-r+1:n}(\acute{b}_n x + \acute{a}_n) \xrightarrow[n]{w} \Gamma_r(U^{*}_{i;\alpha}(x))$$

and for all $\tilde{m} \in \mathbb{R}^{n-1}$ such that $m_i > -1$, $i = 1, \dots, n-1$, and $\gamma_{i,n} \ne \gamma_{j,n}$, $i \ne j$, for all $i, j \in \{1, \dots, n\}$, we get

$$\Phi^{d(\tilde{m},k)}_{n-r+1:n}(\acute{b}_{n,\tilde{m}}\, x + \acute{a}_{n,\tilde{m}}) \xrightarrow[n]{w} \Gamma_r(U^{*}_{i;\alpha}(x)) = 1 - \sum_{j=0}^{r-1}\frac{(U^{*}_{i;\alpha}(x))^{j}}{j!}\exp(-U^{*}_{i;\alpha}(x)),$$

where the normalizing constants can be chosen such that $\acute{a}_{n,\tilde{m}} = c_{\gamma_{1,n}}$ and $\acute{b}_{n,\tilde{m}} = d_{\gamma_{1,n}}$.

The following two theorems extend Theorems 2.76 and 2.77 to the case when $\gamma_{1,n} \xrightarrow[n]{} \gamma_1$, $0 < \gamma_1 < \infty$. It is worth mentioning that most of the known models, e.g., ordinary order statistics, sos, and pos, are excluded from this situation.

Theorem 2.78 (cf. Barakat and El-Adll, 2012) Under the conditions of Theorem 2.76, except $\gamma_{1,n} \xrightarrow[n]{} \gamma_1$, $0 < \gamma_1 < \infty$, we get

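$$\Phi^{(\tilde{m},k)}_{r:n}(x) \xrightarrow[n]{w} 1 - \sum_{i=1}^{r}\Bigg(\prod_{j=1,\, j\ne i}^{r}\frac{\gamma_j}{\gamma_j - \gamma_i}\Bigg)\bar{F}^{\gamma_i}(x),$$

where $\gamma_{j,n} \xrightarrow[n]{} \gamma_j$, $j = 1, 2, \dots, r$.

Theorem 2.79 (cf. Barakat and El-Adll, 2009) Under the conditions of Theorem 2.77, except $\gamma_{1,n} \xrightarrow[n]{} \gamma_1$, $0 < \gamma_1 < \infty$, we get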

w

Φn−r+:n (x) −→ n

r r

(

i=1

j=1

γj )F γi (x), γj − γi

j=i

where γj,n −→ γj , j = 2, 3, ..., r. 2.15 Restricted convergence In this section, an important stability property of extreme order statistics (as well as order statistics with variable rank, gos and dgos) is discussed. It is proved that the restricted convergence of the normalized extremes on an arbitrary non-degenerate interval implies the weak convergence. Namely, we consider the situation where the DF of the suitably normalized extreme order statistics, on an interval [c, d], c < d, converges to arbitrary nondecreasing function. The continuation of this convergence (weak) on the whole real line to the extreme types (which are given in Section 2.2) is then proved (stability of the convergence of the extremes under linear normalization).


This problem is analogous to a problem that arose in the framework of a new theory of sums of independent RVs which was stimulated by an idea of V. M. Zolotarev (this idea was explained in Rossberg, 1995, see also Riedel, 1977). The first result in this field was given by Rossberg (1974). Some results of this property concerning the asymptotic theory of order statistics have been obtained by Gnedenko (1982) and Gnedenko and Senocy Bereksy (1982a, 1982b, 1983). The results of these papers may be summarized as follows: (1) If the DF of a suitably linear normalized extreme order statistic converges to a nondecreasing function N (x), for all continuity points of N (x), in a restricted set S = (c, d), where c < 0, d > 0, N (d) − N (c) > 0, and N (x) is equal to one of the extreme value distributions, for all x ∈ S, then this convergence will continue weakly, for all x, to this extreme value distribution. (2) If the DF of a suitably linear normalized extreme order statistic converges to a nondecreasing function N (x), for all continuity points of N (x), in a set S = {x : x ≤ A}, A is a constant, where N (x) has at least two growth points on S and N (−∞) = 0, then the convergence will continue weakly, for all x, to an extreme value distribution which coincides with N (x) on (−∞, A]. When r is a positive fixed integer, the first item was discussed, for rth upper extremes, in Gnedenko and Sherif (1983), while in Gnedenko et al. (1985) the second item was investigated for the joint DF of the rth and sth (r < s) upper extremes, with the same original DF F (x). Some important applications and interpretations “due to A. D. Solove’ev” of the above stated results were given in Gnedenko and Senocy Bereksy (1982b). Rossberg’s (1995) survey paper stresses different points of the restricted convergence of the DFs of sums of independent RVs and DFs of order statistics. 
The item (1), for the rth extreme order statistics, was discussed by Barakat (1997b), without the restrictive conditions c < 0, d > 0, and Nr (x) is equal to one of the extreme value distributions, for all x ∈ (c, d). The continuation property for order statistics with variable ranks, in the first item, was also investigated by Barakat (1997b) and more recently by Barakat and Ramachandran (2001). The continuation property is of obvious theoretical importance, but it also has important practical applications, especially in the modelling of extreme value data, because the actual application of the extreme value distributions is that they provide in some cases exact but in most cases approximate probabilistic models for random quantities when the extremes govern the laws of interest (strength of materials, floods, drought, etc.). Therefore, a


complicated situation can be handled by a comparatively simple asymptotic model. This can often be done by collecting a statistical sample on a random quantity and performing a goodness of fit test. However, there are often two major difficulties, which may restrict the advantage of this approach. The first difficulty is that the range of values, for which the statistical samples are collected, is always limited. Therefore, the data actually enable us to identify the limit DF of the extreme order statistic, which is of interest, only on a finite interval. The second difficulty is that many sampling procedures give only discrete statistical observations while many physical phenomena are (seemingly) continuous by nature. These two difficulties may be considered as extrapolation and interpolation problems, respectively, and they become serious in some situations, for example in medical research and for agencies which regulate food and drug safety standards. Further results on this subject concerning order statistics under power normalization and gos were obtained by Barakat (2003, 2010) and Barakat et al. (2013b, 2016).

2.16 Review of extreme value analysis in environmental studies

Extreme value theory has found widespread applications, primarily in the modelling of environmental phenomena, where extreme levels of some physical process may lead to substantial damage, e.g., pollutant concentrations and river heights. In a number of areas of statistical applications, one is primarily interested in drawing inferences about the extreme values of a population. In the 1920s, a number of individuals simultaneously began deriving the statistical theory of extreme values. An early theoretical breakthrough was produced by the British statisticians R. A. Fisher and L. H. C. Tippett, who derived the limiting form of the distribution of the maximum or minimum value in a random sample (see Fisher and Tippett, 1928).
Tippett immediately applied this theory to the strength of cotton yarn, a situation in which the “weakest link” governs failure. This application could be viewed as a precursor to the field of engineering reliability, in which structural failure is modelled statistically. In subsequent decades, extreme value theory found applications in other areas in which extreme events naturally play an important role. Perhaps the ultimate extreme event is the extinction of a population; extinction probabilities have received much attention in the ecological literature (see Ludwig, 1996). Another topic with ecological implications is longevity, particularly the variation in life spans among different species of plants and animals; it was discussed in Carey (2003).


In the second half of the last century, Gumbel was a pioneer in applications of the statistical theory of extreme values, particularly in fields such as climatology and hydrology (see Gumbel, 1958). Gumbel (1958) divided the data into sub-samples and fitted a model to the extreme data point of each sub-sample. Now, several decades after this work appeared, engineers are indeed convinced of the theory’s utility in water resources management, building design, etc. (e.g., Katz et al., 2002). The same cannot yet be said, perhaps, of ecologists. Extreme events, rare but not necessarily unprecedented, play an important role in ecology (cf. Gutschick and BassiriRad, 2003). The problem of ecological disturbances is commonly associated with the occurrence of extreme events, such as an excursion of a climate variable like temperature outside of a particular range (e.g., above a relatively high, or below a relatively low, threshold). Compounding the problem is the spectre of global climate change. Given that the occurrence of such an excursion is by definition unusual, it has been a challenge for statisticians to devise appropriate methods for quantifying the likelihood and intensity of anticipated increases in the frequency of extreme events such as hot spells or intense precipitation (cf. Folland and Karl, 2001). Applications directly relevant to ecology have included environmental variables such as those in climate (e.g., temperature, precipitation, wind speed), hydrology (e.g., stream flow), and oceanography (e.g., sea level, wave height), with several of these variables being included in the examples in Gaines and Denny (1993). From a scientific perspective, the importance of extreme events in ecology is well recognized.
Focusing on plants, Gutschick and BassiriRad (2003) developed the thesis that extreme events “play a disproportionate role in shaping the physiology, ecology and evolution of organisms.” Despite fire being an integral component of ecosystems, large fires are a graphic example of a disturbance that can disrupt ecosystem-level processes, cf. Moritz (1997). In recent years, numerous studies on air pollution problems using statistical methods have been published, considering troposphere or ground-level ozone. In these studies, techniques such as time series analysis, regression methods, multivariate statistical analysis, or spatial statistics were used to deal with problems like forecasting high levels of O3 , identifying trends in high levels of this pollutant, or mapping the spatial distribution of this element in a region. There are also some interesting works on the analysis of ground-level ozone using extreme value theory (see Smith, 1989, Kuchenhoff and Thamerus, 1996, and Reyes et al., 2009). This methodology could be used, for example, to estimate the probability of obtaining an ozone level


higher than a threshold, or to estimate the mean number of times that a threshold could be exceeded in a certain period of time. Therefore, it can be helpful when environmental agencies give out public health warnings or evaluate the effectiveness of their regulation programs. The study of extreme values and prediction of ozone data is an important topic of research when dealing with environmental problems (see Quintela-del-Rio and Francisco-Fernández, 2011). Another application of extreme value analysis is the assessment of a meteorological event in site evaluations for nuclear power plants, which is important in safety standards series. Nuclear Power Plants (NPPs) are sources of four types of environmental impact (radiation, chemical, thermal, and that caused by urbanization of the NPP region). These impacts should be limited in order to conserve a certain quality of the ecosystem, so that the monitoring system around the nuclear power station must include the extreme value model, which is used in emergencies when a pollutant exceeds the limit value; the use of this model and the meteorological data enables us to avoid the damage caused by this pollutant. Nowadays, there exist several research papers about air pollution. In the following, we review some selected works concerning air pollution:

1. Sungpurwalla (1972) showed how certain empirical relationships observed in an analysis of air pollution data can be interpreted by using extreme value theory.
2. Sharma et al. (1999) used extreme value theory for making predictions of the expected number of violations of national ambient air quality standards.
3. Perez (2001) reported a study on the possibility of predicting hourly averages of atmospheric SO2 concentrations on the basis of data obtained in a station located at a fixed point near downtown Santiago.
4. Tatsuya and Kanda (2002) presented a comparison of two probabilistic models for directional annual maximum wind speeds to clarify the characteristics of directional maximum wind speeds.
5. Hurairah et al. (2003) modified the extreme value distribution by introducing a new parameter, namely a shape parameter. The results indicate that the new GEVL is better with extreme values data than the original data.
6. Kan and Chen (2004) determined that the log-normal, Pearson V, and extreme value distributions are the best statistical distributions for the daily average concentration data of PM10, SO2, and NO2 in Shanghai, respectively. They concluded that the results can be further applied to local air pollution prediction and control.
7. Hurairah et al. (2005) applied a new extreme value model to air pollution data in Malaysia.
8. Sfetsos et al. (2006) modelled daily PM10 concentration values from an industrial area in west Macedonia by using extreme value theory.
9. Zhou et al. (2012) used the generalized Pareto distribution (under linear normalization, GPDL) in extreme value theory to fit the extreme pollution concentrations of three main pollutants, PM10, NO2, and SO2, from 2005 to 2010 in Changsha, China.
10. Barakat et al. (2014a) used the BM and POT methods to model the air pollution in two cities in Egypt, where a new method was suggested to choose a suitable threshold value.
11. Marlier et al. (2016) reviewed and assessed recent findings on the impacts of extreme air pollution in the most populous megacities.
12. Barakat et al. (2017a) derived several Hill estimators under power normalization. Moreover, the authors used these estimators in studying the air pollution in two cities in Egypt.
13. Barakat et al. (2018) developed the modelling of extreme values via the power model by suggesting a simple technique to obtain, for every known estimator of the extreme value index (EVI) in the linear model, a parallel estimator of the corresponding parameter in the power model. The authors applied the linear and the power models to a real data set of the air pollutant Particulate Matter (PM10) during the year 2008 to assess the quality of the air in two cities in Egypt.

It is worth noting that, although the papers in the above sample on air pollution by using extreme value analysis mostly focused on some specific air pollutants in the authors’ countries, the approaches developed by these authors may be applied to other pollutants in other regions in any country.

3 Bootstrap order statistics and calibration of the sub-sample bootstrap method

The bootstrap is an extremely flexible technique that can be applied to a wide variety of problems. A good introduction is Efron and Gong (1983) or the monograph by Efron (1982). The bootstrap allows us to approximate the distribution of a statistic. One of the desired properties of the bootstrapping method is consistency, which guarantees that the limit of the bootstrap distribution is the same as that of the distribution of the given statistic. It has been known for a long time that for the bootstrap of the maximum of a sample to be consistent, the bootstrap sample size needs to be of a smaller order than the original sample size. Actually, Athreya and Fukuchi (1993) showed that by employing a sub-sample bootstrap, where the re-samples have a size of an order of magnitude smaller than the size of the original sample, the bootstrap distribution of maximum order statistics converges to one of Gnedenko’s extreme value distributions (2.2). The inconsistency, weak consistency and strong consistency of bootstrapping maximum order statistics under linear normalization were investigated by Athreya and Fukuchi (1993), while for maximum order statistics under power normalization this study was extended by Nigm (2006). Barakat et al. (2015a) studied the inconsistency, weak consistency, and strong consistency of bootstrapping central and intermediate order statistics under linear normalization with an appropriate choice of re-sample size. Barakat et al. (2015c) also studied the consistency property of bootstrapping central and intermediate order statistics under power normalization. Although the bootstrap is inconsistent for the maximum order statistics, the works of Bretagnolle (1983), Athreya (1987), and Arcones and Giné (1989) revealed that if the re-sample size is m = o(n), then the bootstrap could be made consistent for maximum order statistics. This is indeed the case and it is the focus of this chapter.


3.1 Bootstrapping extremes under linear normalization

This section presents results on the asymptotic behaviour of the bootstrap distribution. Bickel and Freedman (1981) showed that the bootstrap of the maximum of iid uniform RVs is inconsistent. Angus (1993) further showed that the bootstrap distribution converges in distribution to a random distribution and obtained the explicit form of the limit when normalizing constants are estimated. In this section, it is shown that, when normalizing constants are known, the bootstrap distribution converges in distribution to a random distribution. Now, assume $Y_j$, $j = 1, 2, \ldots, m$, where $m = m(n) \xrightarrow[n]{} \infty$, are conditionally independent RVs with

$$P(Y_1 = X_j \mid \underline{X}_n) = \frac{1}{n}, \quad j = 1, 2, \ldots, n, \tag{3.1}$$

where $\underline{X}_n = (X_1, X_2, \ldots, X_n)$ is a random sample of size $n$ from an unknown DF $F$. Hence, $Y_1, \ldots, Y_m$ is a re-sample of size $m$ from the empirical DF

$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I_{(-\infty,x)}(X_i) = \frac{1}{n} S_n(x), \tag{3.2}$$

where $I_A(x)$ is the usual indicator function and $S_n(x)$ is an RV with the binomial distribution $B(n, F(x))$. Furthermore, let $Y_{1:m}, Y_{2:m}, \ldots, Y_{m:m}$ be the corresponding order statistics of $Y_1, \ldots, Y_m$. Now, let (2.1) be satisfied with the normalizing constants $a_n > 0$ and $b_n$, and define

$$H_{n,m}(x) = P\{a_m^{-1}(Y_{m:m} - b_m) \le x \mid \underline{X}_n\} = F_n^m(a_m x + b_m). \tag{3.3}$$

$H_{n,m}(x)$ is called the bootstrap distribution of $a_n^{-1}(X_{n:n} - b_n)$, where $n$ and $m$ are the sample size and re-sample size, respectively.
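The bootstrap distribution (3.3) can be evaluated directly, which gives a quick numerical feel for the role of the re-sample size. The following is a minimal sketch under stated assumptions, not part of the source: $F$ is taken to be standard exponential, for which $a_m = 1$ and $b_m = \log m$ are known and the limit in (2.1) is $\exp(-e^{-x})$; the function names, the grid and the choice $m = [\sqrt{n}]$ are illustrative only.

```python
import bisect
import math
import random


def bootstrap_max_df(sorted_sample, m, x):
    """H_{n,m}(x) = F_n(a_m x + b_m)^m with a_m = 1, b_m = log m (exponential case)."""
    n = len(sorted_sample)
    # F_n(t) computed as the fraction of observations <= t; for continuous
    # data this agrees with the indicator in (3.2) almost surely.
    f_n = bisect.bisect_right(sorted_sample, x + math.log(m)) / n
    return f_n ** m


def sup_distance_to_limit(sorted_sample, m, grid):
    """Largest gap on the grid between H_{n,m} and the limit exp(-exp(-x))."""
    return max(
        abs(bootstrap_max_df(sorted_sample, m, x) - math.exp(-math.exp(-x)))
        for x in grid
    )


def compare_resample_sizes(n=20000, seed=1, reps=20):
    """Average sup-distance for the naive choice m = n and for m = [sqrt(n)]."""
    rng = random.Random(seed)
    grid = [-2.0 + 0.1 * k for k in range(81)]  # x from -2 to 6
    m_small = int(math.sqrt(n))
    naive, sub = 0.0, 0.0
    for _ in range(reps):
        sample = sorted(rng.expovariate(1.0) for _ in range(n))
        naive += sup_distance_to_limit(sample, n, grid) / reps
        sub += sup_distance_to_limit(sample, m_small, grid) / reps
    return naive, sub
```

Calling `compare_resample_sizes()` returns the two average distances; under these assumptions the value for $m = n$ stays well away from zero, while the value for the smaller re-sample size is markedly smaller.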

3.1.1 Convergence bootstrap distributions when the normalizing constants are known

The following theorems, taken from Athreya and Fukuchi (1993), show that if $m = n$, $H_{n,m}(x)$ has a random limit and thus the naive bootstrap fails to approximate $H_n(x)$. Theorem 3.2 shows that if $m$ increases to infinity, but more slowly than $n$, the bootstrap distribution is consistent.

Theorem 3.1 Let $F \in D(H)$. Furthermore, let $H_{n,n}$, $a_n > 0$, $b_n$ be such that (2.1) holds. Then,

$$H_{n,n}(x) \xrightarrow[n]{d} \exp\{-P(x)\},$$

where $P(x)$ is Poisson distributed with parameter $U_{i,\alpha}$, defined in (2.2), and "$\xrightarrow[n]{d}$" denotes convergence in distribution.

Theorem 3.2 Let (2.1) be satisfied and $m = o(n)$. Then,

$$\sup_{x \in \mathbb{R}} | H_{n,m}(x) - H(x) | \xrightarrow[n]{p} 0. \tag{3.4}$$

Furthermore, if $\sum_{n=1}^{\infty} \lambda^{n/m} < \infty$, for every $0 < \lambda < 1$, then the relation (3.4) holds with probability 1.
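The summability condition of Theorem 3.2 is easier to read through a sufficient growth rate. The following worked step is a sketch that assumes $m = o(n/\log n)$; the constant 2 in the exponent is an arbitrary convenient choice.

```latex
% Assumption: m = m(n) satisfies m \log n / n \to 0, i.e. m = o(n / \log n).
Fix $0 < \lambda < 1$ and put $c = \log(1/\lambda) > 0$.
Since $\frac{n}{m \log n} \to \infty$, there is an $n_0$ with
\[
  \frac{n}{m} \ge \frac{2}{c}\,\log n \quad \text{for all } n \ge n_0 .
\]
Hence, for $n \ge n_0$,
\[
  \lambda^{n/m} = \exp\Bigl(-c\,\frac{n}{m}\Bigr) \le \exp(-2\log n) = n^{-2},
  \qquad\text{so}\qquad
  \sum_{n=1}^{\infty} \lambda^{n/m} \le n_0 + \sum_{n \ge n_0} n^{-2} < \infty .
\]
```

The first $n_0 - 1$ terms are each smaller than 1, which is why they contribute at most $n_0$ to the sum.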

3.1.2 Convergence bootstrap distributions when $a_n$ and $b_n$ are unknown

If $F$ is unknown, $a_m$ and $b_m$ need to be estimated from the data for $H_{n,m}(x)$ to be used. Let $\hat{a}_m$ and $\hat{b}_m$ be some estimators of $a_m$ and $b_m$ based on $X_1, X_2, \ldots, X_n$. Now, define the bootstrap distribution of $a_n^{-1}(X_{n:n} - b_n)$ with estimated normalizing constants by

$$\hat{H}_{n,m}(x) = P\{\hat{a}_m^{-1}(Y_{m:m} - \hat{b}_m) \le x \mid \underline{X}_n\}.$$

The next theorem gives a sufficient condition for $\hat{H}_{n,m}(x)$ to be consistent.

Theorem 3.3 (cf. Athreya and Fukuchi, 1993) Assume that $\hat{a}_m$, $\hat{b}_m$ and $m(n)$ satisfy the following conditions:
(i) $H_{n,m}(x) \xrightarrow[n]{} H(x)$, with probability one,
(ii) $\hat{a}_m / a_m \xrightarrow[n]{} 1$, with probability one,
(iii) $a_m^{-1}(\hat{b}_m - b_m) \xrightarrow[n]{} 0$, with probability one.
Then,

$$\sup_{x \in \mathbb{R}} | \hat{H}_{n,m}(x) - H(x) | \xrightarrow[n]{} 0,$$

with probability one.

For the bootstrap distribution to be consistent, as Theorems 3.2 and 3.3 indicate, we need to choose $\hat{a}_m$ and $\hat{b}_m$ satisfying (ii) and (iii) in Theorem 3.3. Since $a_m$ and $b_m$ are functions of $F$, the natural choices of $\hat{a}_m$ and $\hat{b}_m$ are the empirical counterparts of $a_m$ and $b_m$. Let $l_n = [\frac{n}{m}]$ and $l'_n = [\frac{n}{em}]$, where $[\cdot]$ is the integer part. In the following theorem, we define $\hat{a}_m$ and $\hat{b}_m$ for each domain of attraction.

Theorem 3.4 (cf. Athreya and Fukuchi, 1993) Let $m = o(n)$. Then,

1. if $F \in D(H_1)$, take $\hat{a}_m = F_n^{-1}(1 - \frac{1}{em}) - F_n^{-1}(1 - \frac{1}{m}) = X_{n-l'_n:n} - X_{n-l_n:n}$, $\hat{b}_m = F_n^{-1}(1 - \frac{1}{m}) = X_{n-l_n:n}$,
2. if $F \in D(H_{2,\alpha})$, take $\hat{a}_m = F_n^{-1}(1 - \frac{1}{m}) = X_{n-l_n:n}$, $\hat{b}_m = 0$,
3. if $F \in D(H_{3,\alpha})$, take $\hat{a}_m = \rho_{F_n} - F_n^{-1}(1 - \frac{1}{m}) = X_{n:n} - X_{n-l_n:n}$, $\hat{b}_m = \rho_{F_n} = X_{n:n}$.

Under the above choices of the estimates of the normalizing constants, we get

$$\sup_{x \in \mathbb{R}} | \hat{H}_{n,m}(x) - H(x) | \xrightarrow[n]{p} 0, \tag{3.5}$$

where $H = H_1$, or $H = H_{2,\alpha}$, or $H = H_{3,\alpha}$ ($H_1$, $H_{2,\alpha}$ and $H_{3,\alpha}$ are defined in (2.2)) according to the domain of attraction of $F$. Moreover, if $\sum_{n=1}^{\infty} \lambda^{n/m} < \infty$, $0 < \lambda < 1$, then the relation (3.5) holds with probability 1.

Remark By combining (2.1) and (3.5) we see that

$$\sup_{x \in \mathbb{R}} | \hat{H}_{n,m}(x) - H_{n:n}(a_n x + b_n) | \xrightarrow[n]{p} 0,$$

if $m = o(n)$, and with probability 1, if $\sum_{n=1}^{\infty} \lambda^{n/m} < \infty$, $0 < \lambda < 1$. Therefore, $\hat{H}_{n,m}(x)$ approximates $H_{n:n}(a_n x + b_n)$ uniformly in $\mathbb{R}$, when $n \to \infty$. Note also that $m = o(\frac{n}{\log n})$ is sufficient for $\sum_{n=1}^{\infty} \lambda^{n/m} < \infty$, for every $0 < \lambda < 1$. Deheuvels et al. (1993) proved the strong consistency under the weaker condition $m = o(\frac{n}{\log \log n})$.
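The empirical normalizing constants of Theorem 3.4 are simple functions of the order statistics. The sketch below illustrates the first case of the theorem only; it assumes a standard exponential sample (chosen because its constants $a_m = 1$, $b_m = \log m$ and its limit $\exp(-e^{-x})$ are known), and all function names are hypothetical.

```python
import bisect
import math
import random


def exponential_sample(n, seed=0):
    """Sorted standard exponential sample (an illustrative choice of F)."""
    rng = random.Random(seed)
    return sorted(rng.expovariate(1.0) for _ in range(n))


def first_case_constants(sorted_sample, m):
    """Estimates (a_hat, b_hat) from the first case of Theorem 3.4.

    With l_n = [n/m] and l'_n = [n/(e m)]:
    b_hat = X_{n-l_n:n} and a_hat = X_{n-l'_n:n} - X_{n-l_n:n}.
    """
    n = len(sorted_sample)
    l_n = int(n / m)
    l_prime = int(n / (math.e * m))
    # X_{k:n} is sorted_sample[k - 1] because order statistics are 1-indexed.
    b_hat = sorted_sample[n - l_n - 1]
    a_hat = sorted_sample[n - l_prime - 1] - b_hat
    return a_hat, b_hat


def estimated_bootstrap_df(sorted_sample, m, x):
    """H-hat_{n,m}(x) = F_n(a_hat * x + b_hat)^m."""
    a_hat, b_hat = first_case_constants(sorted_sample, m)
    n = len(sorted_sample)
    f_n = bisect.bisect_right(sorted_sample, a_hat * x + b_hat) / n
    return f_n ** m
```

For example, `first_case_constants(exponential_sample(20000), 100)` should return values near the known constants $a_m = 1$ and $b_m = \log 100$, up to sampling error.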

The following theorem shows that the joint distribution of $a_n^{-1}(X_{n:n} - b_n)$, $a_n^{-1}(X_{n-1:n} - b_n)$, $a_n^{-1}(X_{n-2:n} - b_n)$, ..., $a_n^{-1}(X_{n-r+1:n} - b_n)$ can be bootstrapped consistently.

Theorem 3.5 Let

$$P\{a_n^{-1}(X_{n:n} - b_n) \le x_1, \ldots, a_n^{-1}(X_{n-r+1:n} - b_n) \le x_r\} \xrightarrow[n]{w} H_r(x_1, \ldots, x_r).$$

Assume the hypotheses on $F$ and choose $\hat{a}_m$ and $\hat{b}_m$ as in Theorem 3.4 according to the domain of attraction of $F$. If $m = o(n)$, then

$$\sup_{x_1 > x_2 > \ldots > x_r} \left| P\{\hat{a}_m^{-1}(Y_{m:m} - \hat{b}_m) \le x_1, \ldots, \hat{a}_m^{-1}(Y_{m-r+1:m} - \hat{b}_m) \le x_r \mid \underline{X}_n\} - H_r(x_1, \ldots, x_r) \right| \xrightarrow[n]{p} 0. \tag{3.6}$$

Moreover, if $\sum_{n=1}^{\infty} \lambda^{n/m} < \infty$, $0 < \lambda < 1$, then the relation (3.6) holds with probability 1.

3.2 Bootstrapping extremes under power normalization

In this section, we define different bootstrap distributions for different types of the domain of attraction of extremes under power normalization. Nigm (2006) studied the inconsistency, weak consistency and strong consistency of bootstrapping with an appropriate choice of re-sample size when the normalizing constants are known, investigated the same problem when the normalizing constants are unknown, and also treated the bootstrap for the joint distributions. Now, assume $Y_j$, $j = 1, 2, \ldots, m$, where $m = m(n) \xrightarrow[n]{} \infty$, are defined as in (3.1). Define

$$G_n^*(x) = P\left( \left| \frac{X_{n:n}}{b_n} \right|^{\frac{1}{a_n}} S(X_{n:n}) \le x \,\Big|\, \underline{X}_n \right), \tag{3.7}$$

and suppose that

$$F^n(b_n |x|^{a_n} S(x)) \xrightarrow[n]{w} \tilde{H}(x) \tag{3.8}$$

(recall that the limit DF $\tilde{H}(x)$ can take one and only one of the types $\tilde{H}_{i;\beta}(x) = \exp[-u_{i;\beta}(x)]$, $i = 1, 2, \ldots, 6$, where $u_{i;\beta}(x)$, $i = 1, 2, \ldots, 6$, are defined in (2.31) in Section 2.8) and

$$\tilde{H}_{n,m}(x) = P\left( \left| \frac{Y_{m:m}}{b_m} \right|^{\frac{1}{a_m}} S(Y_{m:m}) \le x \,\Big|\, \underline{X}_n \right) = F_n^m(b_m |x|^{a_m} S(x)).$$

$\tilde{H}_{n,m}(x)$ is called the bootstrap distribution of $\left| \frac{X_{n:n}}{b_n} \right|^{\frac{1}{a_n}} S(X_{n:n})$. Moreover, $n$ and $m$ are called the sample size and re-sample size, respectively.

3.2.1 Convergence bootstrap distributions when the normalizing constants are known

Theorem 3.6 shows that if $m = n$, $\tilde{H}_{n,m}(x)$ has a random limit and thus the naive bootstrap fails to approximate $G_n^*(x)$. Theorem 3.7 shows that if $m$ increases to infinity, but more slowly than $n$, the bootstrap distribution is consistent.

Theorem 3.6 (cf. Nigm, 2006) Let $F \in D_p(\tilde{H})$. Furthermore, let $\tilde{H}_{n,n}(x)$, $a_n > 0$, $b_n > 0$, be such that (3.8) holds. Then,

$$\tilde{H}_{n,n}(x) \xrightarrow[n]{d} \exp\{-P(x)\},$$

where $P(x)$ is Poisson distributed with parameter $u_{i,\beta}$, defined in (2.31).

Theorem 3.7 (cf. Nigm, 2006) Let (3.8) be satisfied. Moreover, if $m = o(n)$, then

$$\sup_{x \in \mathbb{R}} | \tilde{H}_{n,m}(x) - \tilde{H}(x) | \xrightarrow[n]{p} 0. \tag{3.9}$$

If $\sum_{n=1}^{\infty} \lambda^{n/m} < \infty$, $0 < \lambda < 1$, then the relation (3.9) holds with probability 1.


3.2.2 Convergence bootstrap distributions when the normalizing constants are unknown

If $F$ is unknown, $a_m$ and $b_m$ need to be estimated from the data for $\tilde{H}_{n,m}(x)$ to be used. Let $a_m^*$ and $b_m^*$ be some estimators of $a_m$ and $b_m$ based on $X_1, X_2, \ldots, X_n$. Now, define the bootstrap distribution of $\left| \frac{X_{n:n}}{b_n} \right|^{\frac{1}{a_n}} S(X_{n:n})$, with estimated normalizing constants, by

$$\tilde{H}^*_{n,m}(x) = P\left( \left| \frac{Y_{m:m}}{b_m^*} \right|^{\frac{1}{a_m^*}} S(Y_{m:m}) \le x \,\Big|\, \underline{X}_n \right).$$

Since $a_m$ and $b_m$ are functions of $F$, the natural choices of $a_m^*$ and $b_m^*$ are the empirical counterparts of $a_m$ and $b_m$. We define $a_m^*$ and $b_m^*$ for each domain of attraction in the following theorem (see Theorem 2.40 in Section 2.9).

Theorem 3.8 (cf. Nigm, 2006) Let $F$ be a given DF with right endpoint $\rho = \sup\{x : F(x) < 1\}$. Furthermore, let $m = o(n)$ and $r_n = [\frac{n}{m}]$. Then,

if $F \in D_p(\tilde{H}_{1,\beta}(x))$, take $a_n^* = \log F^-(1 - 1/n) = \log X_{n-r_n:n}$ and $b_n^* = 1$;

if $F \in D_p(\tilde{H}_{2,\beta}(x))$, take $a_n^* = \log[\rho / F^-(1 - 1/n)] = \log X_{n:n} - \log X_{n-r_n:n}$ and $b_n^* = X_{n:n}$;

if $F \in D_p(\tilde{H}_{3,\beta}(x))$, take $a_n^* = -\log[-F^-(1 - 1/n)] = -\log X_{n-r_n:n}$ and $b_n^* = 1$;

if $F \in D_p(\tilde{H}_{4,\beta}(x))$, take $a_n^* = \log[F^-(1 - 1/n) / \rho] = \log X_{n-r_n:n} - \log X_{n:n}$ and $b_n^* = -\rho = -X_{n:n}$;

if $F \in D_p(\tilde{H}_{5,\beta}(x))$, take $a_n^* = f(b_n^*)$ and $b_n^* = F^-(1 - 1/n) = \log X_{n-r_n:n}$;

if $F \in D_p(\tilde{H}_{6,\beta}(x))$, take $a_n^* = f(-b_n^*)$ and $b_n^* = -F^-(1 - 1/n) = -\log X_{n-r_n:n}$.

Under the above choices of the estimates of the normalizing constants, we get

$$\sup_{x \in \mathbb{R}} | \tilde{H}^*_{n,m}(b_m |x|^{a_m} S(x)) - \tilde{H}(x) | \xrightarrow[n]{p} 0, \tag{3.10}$$

where $\beta > 0$ and $\tilde{H}(x) = \tilde{H}_{i;\beta}(x) = \exp[-u_{i;\beta}(x)]$, $i = 1, 2, \ldots, 6$, are defined in (2.31) according to the domain of attraction of $F$. Moreover, if $\sum_{n=1}^{\infty} \lambda^{n/m} < \infty$, $0 < \lambda < 1$, then the relation (3.10) holds with probability 1.

Remark Combining (3.8) and (3.10), we get

$$\sup_{x \in \mathbb{R}} | \tilde{H}^*_{n,m}(x) - G_n^*(x) | \xrightarrow[n]{p} 0,$$

if $m = o(n)$, and with probability 1 if $\sum_{n=1}^{\infty} \lambda^{n/m} < \infty$, $0 < \lambda < 1$. Therefore, $\tilde{H}^*_{n,m}(x)$ approximates $G_n^*(x)$ uniformly in $\mathbb{R}$, when $n \to \infty$. Note also that $m = o(\frac{n}{\log n})$ is sufficient for $\sum_{n=1}^{\infty} \lambda^{n/m} < \infty$, for every $0 < \lambda < 1$.

The next theorem shows that the joint distribution of $\left| \frac{X_{n:n}}{b_n} \right|^{\frac{1}{a_n}} S(X_{n:n})$, $\left| \frac{X_{n-1:n}}{b_n} \right|^{\frac{1}{a_n}} S(X_{n-1:n})$, ..., $\left| \frac{X_{n-r+1:n}}{b_n} \right|^{\frac{1}{a_n}} S(X_{n-r+1:n})$ can be bootstrapped consistently. Let

$$P\left( \left| \frac{X_{n:n}}{b_n} \right|^{\frac{1}{a_n}} S(X_{n:n}) \le x_1, \ldots, \left| \frac{X_{n-r+1:n}}{b_n} \right|^{\frac{1}{a_n}} S(X_{n-r+1:n}) \le x_r \right) \xrightarrow[n]{w} \tilde{H}_r(x_1, \ldots, x_r).$$

Theorem 3.9 (cf. Nigm, 2006) Assume the hypotheses on $F$ and choose $a_m^*$ and $b_m^*$ as in Theorem 3.8 according to the domain of attraction of $F$. If $m = o(n)$, then

$$\sup_{x_1 > x_2 > \ldots > x_r} \left| P\left\{ \left| \frac{X_{n:n}}{b_n^*} \right|^{\frac{1}{a_n^*}} S(X_{n:n}) \le x_1, \ldots, \left| \frac{X_{n-r+1:n}}{b_n^*} \right|^{\frac{1}{a_n^*}} S(X_{n-r+1:n}) \le x_r \right\} - \tilde{H}_r(x_1, \ldots, x_r) \right| \xrightarrow[n]{} 0, \tag{3.11}$$

in probability. Moreover, if $\sum_{n=1}^{\infty} \lambda^{n/m} < \infty$, $0 < \lambda < 1$, then the relation (3.11) holds with probability 1.

3.3 Verification of the sub-sample bootstrap method

In this section, we study the modelling of maxima by GEVL and GEVP for different DFs using the sub-sample bootstrap technique. We consider some distributions and apply the sub-sample bootstrap technique based on sample size $n = 20000$ and different values of $m$. For each $m$, we take 400 bootstrap replicates, or blocks, each of size $m$. In order to estimate the mean squared error (MSE) of the estimated parameter $\gamma$ of the models GEVL and GEVP, we repeat the procedure 50 times. The following three examples are taken from Barakat et al. (2015d).
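The re-sampling step of this procedure can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: it draws one sample from the DF $\exp(-1/x)$, $x > 0$, by inversion, takes 400 re-samples of size $m$ with replacement, and replaces the ML fit of the GEVL model by a simple moment-type estimate of $\gamma$ that is valid only when $\gamma > 0$. All function names are hypothetical.

```python
import math
import random
import statistics


def frechet_sample(n, rng):
    """Draw n values from the DF exp(-1/x), x > 0, by inversion."""
    return [-1.0 / math.log(rng.uniform(1e-12, 1.0 - 1e-12)) for _ in range(n)]


def subsample_maxima(sample, m, replicates, rng):
    """Maximum of each of `replicates` re-samples of size m (with replacement)."""
    return [max(rng.choices(sample, k=m)) for _ in range(replicates)]


def gamma_from_log_maxima(maxima):
    """Moment-type estimate of gamma > 0: sd(log maxima) * sqrt(6) / pi.

    Assumption: the maxima are approximately Frechet distributed, so their
    logarithms are approximately Gumbel with scale gamma. This is NOT the
    ML fit behind Tables 3.1-3.3; it only illustrates the re-sampling step.
    """
    logs = [math.log(v) for v in maxima]
    return statistics.stdev(logs) * math.sqrt(6.0) / math.pi


def estimate_gamma(n=20000, m=50, replicates=400, seed=0):
    """One run of the sketch: sample, re-sample, estimate gamma."""
    rng = random.Random(seed)
    sample = frechet_sample(n, rng)
    return gamma_from_log_maxima(subsample_maxima(sample, m, replicates, rng))
```

Repeating `estimate_gamma` over several seeds and averaging the squared deviations from the true value would mimic the MSE computation described above.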


Example 3.10 Consider the DF $F_1$, which is defined by

$$F_1(x) = \begin{cases} 0, & x \le 0, \\ \exp\left(-\frac{1}{x}\right), & x > 0. \end{cases}$$

Clearly $F_1 \in D(G_1(x))$ ($G_\gamma$ is defined in (2.4)) and $F_1 \in D_P(P_{1,0}(x; 1, 1))$ ($P_{1,\gamma}(x; 1, 1)$ is defined in (2.32)). Table 3.1 gives the convergence of the DF in cases of GEVL and GEVP. It is noted that the value $\gamma = 1.0013$ is close to one with MSE = 0.0129 in GEVL at $m = 50$. Also, the value $\gamma = -0.0060$ is close to zero with MSE = 1.064 in GEVP at $m = 50$. This result agrees with the theoretical result given by Barakat et al. (2010).

Example 3.11 Let $F_2$ be the uniform distribution over the interval (2, 22), denoted by $U(2, 22)$. We already know that $F_2 \in D(G_{-1})$ ($G_\gamma$ is defined in (2.4)) and $F_2 \in D_P(H_{1,-1})$ ($P_{1,\gamma}(x; 1, 1)$ is defined in (2.32)). In Table 3.2, we note that the value $\gamma = -0.9954$ is close to $-1$, with MSE = 0.0019 in GEVL at $m = 50$. Also, the value $\gamma = -0.9981$ is close to $-1$, with MSE = 0.0026 in GEVP at $m = 50$. Again, this result agrees with the theoretical result given in Barakat et al. (2010).

Example 3.12 Consider the DF

$$F_{3:k}(x) = \begin{cases} 0, & x \le 1, \\ 1 - \exp\left(-(\log x)^k\right), & x > 1, \ k > 0. \end{cases}$$

For $k = 1$, we have $F_{3:1} \in D(G_1)$ ($G_\gamma$ is defined in (2.4)) and $F_{3:1} \in D_P(H_{1,0})$ ($P_{1,\gamma}(x; 1, 1)$ is defined in (2.32)). From Table 3.3, we note that the value $\gamma = 0.9995$ is close to one, with MSE = 0.0066 in GEVL at $m = 50$. Also, the value $\gamma = -0.0035$ is close to zero, with MSE = 1.0088 in GEVP at $m = 50$. Again, this result agrees with the theoretical result given by Barakat et al. (2010).


Table 3.1 Estimated GEVL and GEVP models, for F1
Random sample from F1(x), n = 20000

m              30        50        100       150       200       300       400
GEVL   γ       1.0076    1.0013    0.9982    0.9983    0.9978    0.9915    0.9806
       MSE     0.023     0.0129    0.0133    0.0172    0.0234    0.0243    0.0271
GEVP   γ       -0.0064   -0.0060   -0.0098   -0.0105   -0.0126   -0.0170   -0.0276
       MSE     1.0111    1.0146    1.0229    1.0237    1.020     1.0416    1.0679

Table 3.2 Estimated GEVL and GEVP models, for F2
Random sample from U(2, 22), n = 20000

m              30        50        100       150       200       300       400
GEVL   γ       -0.9907   -0.9954   -1.0116   -1.271    -1.0281   -1.0372   -1.0332
       MSE     0.0022    0.0019    0.0044    0.0063    0.0077    0.0102    0.0066
GEVP   γ       -1.0029   -0.9981   -1.0079   -1.0099   -1.0012   -1.014    -1.0179
       MSE     0.0034    0.0026    0.0044    0.0056    0.0058    0.0065    0.0064

Table 3.3 Estimated GEVL and GEVP models, for F3:1
Random sample from F3:1(x), n = 20000

m              30        50        100       150       200       300       400
GEVL   γ       1.014     0.9995    1.0059    0.9782    0.9882    0.9804    0.9873
       MSE     0.0072    0.0066    0.0102    0.0166    0.0166    0.0211    0.0217
GEVP   γ       -0.0102   -0.0035   -0.0079   -0.0083   -0.0232   -0.0231   -0.0387
       MSE     0.9815    1.0088    1.0094    1.0198    1.0517    1.0548    1.0883


3.4 Bootstrapping of order statistics with variable rank in the L-model

This section presents the inconsistency and weak consistency of bootstrapping central and intermediate order statistics under linear and power normalizing constants for an appropriate choice of re-sample size. A simulation study is given as an illustrative numerical example.

3.4.1 Bootstrapping of central order statistics under linear normalization

In this subsection, we study the asymptotic behaviour of the bootstrap distribution of central order statistics under linear normalization. The inconsistency and weak consistency of bootstrapping central order statistics under linear normalization for an appropriate choice of re-sample size are investigated. A simulation study is given as an illustrative numerical example. When the rank sequence $r = r_n$ of the central order statistic $X_{r:n}$ is assumed to satisfy the regular condition (2.16), i.e., $\sqrt{n}\left(\frac{r}{n} - \lambda\right) \xrightarrow[n]{} 0$, and

$$\Phi_{\lambda:n}(a_n x + b_n) = P(X_{r:n} \le a_n x + b_n) \xrightarrow[n]{w} G(x), \tag{3.12}$$

where $G$ is a non-degenerate DF, then Subsection 3.3.1 shows that $G$ must have one and only one of the types $\Phi(W_{i;\beta}(x))$, $i = 1, 2, 3, 4$, where $W_{i;\beta}(x)$, $i = 1, 2, 3, 4$, are defined in (2.17). Moreover, due to Smirnov (1952) (see also Leadbetter et al., 1983, pp. 46-47), (3.12) is satisfied with $G(x) = \Phi(W_{i;\beta}(x))$, for some $i \in \{1, 2, 3, 4\}$, if and only if

$$\sqrt{n}\, \frac{F(a_n x + b_n) - \lambda}{C_\lambda} \xrightarrow[n]{} W_{i;\beta}(x), \quad i \in \{1, 2, 3, 4\}, \tag{3.13}$$

where $C_\lambda = \sqrt{\lambda(1 - \lambda)}$. Although, in general, the convergence in (3.12) is not to continuous types, the following lemma (cf. Barakat and El-Shandidy, 2004) shows that this convergence is uniform.

Lemma 3.13 Under the condition $\sqrt{n}\left(\frac{r}{n} - \lambda\right) \xrightarrow[n]{} 0$, we have for large $n$

$$\Phi_{\lambda:n}(a_n x + b_n) = \Phi\left( \sqrt{n}\, \frac{F(a_n x + b_n) - \lambda}{C_\lambda} \right) + R_n(x),$$

where $R_n(x) \xrightarrow[n]{} 0$, uniformly with respect to $x$.

The following theorem gives the asymptotic behaviour of the bootstrap distribution $H_{\lambda,n,m}(a_m x + b_m) = P(X_{r_m:m} \le a_m x + b_m \mid \underline{X}_n)$ of the central order statistic $X_{r_n:n}$ of $\underline{X}_n$.

Theorem 3.14 (cf. Barakat et al., 2015a) Let (3.12) be satisfied with $G(x) = \Phi(W_{i;\beta}(x))$, $i \in \{1, 2, 3, 4\}$. Then,

$$H_{\lambda,n,n}(a_n x + b_n) \xrightarrow[n]{d} \Phi(\eta(x)),$$

where $\eta(x)$ has a normal distribution with mean $W_{i;\beta}(x)$ and unit variance, i.e., $P(\eta(x) \le z) = \Phi(z - W_{i;\beta}(x))$. Moreover, if $m = o(n)$, then

$$\sup_{x \in \mathbb{R}} | H_{\lambda,n,m}(a_m x + b_m) - \Phi(W_{i;\beta}(x)) | \xrightarrow[n]{p} 0. \tag{3.14}$$

Theorem 3.14 shows that if $m = n$, $H_{\lambda,n,m}(a_m x + b_m)$ has a random limit and thus the naive bootstrap fails to approximate $\Phi_{\lambda:n}(a_n x + b_n)$. In other words, the naive bootstrap of the $r$th central order statistic, when $m = n$, fails to be a consistent estimator for the limit DF $\Phi(W_{i;\beta}(x))$, while the relation (3.14) shows that this bootstrap is consistent if $m = o(n)$.

Proof By applying Lemma 3.13, we get

$$H_{\lambda,n,m}(a_m x + b_m) = \Phi(B_{\lambda,n,m}(x)) + R_m, \tag{3.15}$$

where $B_{\lambda,n,m}(x) = \sqrt{m}\, \frac{F_n(a_m x + b_m) - \lambda}{C_\lambda}$ and $R_m \xrightarrow[n]{} 0$, uniformly in $x$. Now assume that the condition $m = n$ is satisfied. Then, in view of (3.2) and by applying the central limit theorem, we get

$$\frac{S_n(a_n x + b_n) - nF(a_n x + b_n)}{\sqrt{nF(a_n x + b_n)(1 - F(a_n x + b_n))}} \xrightarrow[n]{d} Z, \tag{3.16}$$

where $Z$ is the standard normal RV. On the other hand, under the condition of Theorem 3.14, the relation (3.13) is satisfied. Thus, we get $F(a_n x + b_n) \xrightarrow[n]{} \lambda$, for all $x$ such that $W_{i;\beta}(x) < \infty$. Therefore, we get

$$\frac{\sqrt{nF(a_n x + b_n)(1 - F(a_n x + b_n))}}{\sqrt{n}\, C_\lambda} \xrightarrow[n]{} 1$$

and

$$\frac{nF(a_n x + b_n) - n\lambda}{\sqrt{n}\, C_\lambda} \xrightarrow[n]{} W_{i;\beta}(x).$$

The above two limit relations thus enable us to apply Khinchin's theorem on the relation (3.16) to get

$$B_{\lambda,n,n}(x) = \frac{S_n(a_n x + b_n) - n\lambda}{\sqrt{n}\, C_\lambda} \xrightarrow[n]{d} Z + W_{i;\beta}(x). \tag{3.17}$$

Combining relations (3.15) and (3.17), we get the first claim of the theorem. We now turn to prove the relation (3.14). In view of (3.2) and (3.13), we get

$$E(B_{\lambda,n,m}(x)) = \sqrt{m}\, \frac{F(a_m x + b_m) - \lambda}{C_\lambda} \xrightarrow[n]{} W_{i;\beta}(x). \tag{3.18}$$

Moreover, in view of (3.2), (3.13), and the condition $m = o(n)$, we get

$$\mathrm{Var}(B_{\lambda,n,m}(x)) = \frac{m}{C_\lambda^2} \mathrm{Var}(F_n(a_m x + b_m)) = \frac{m}{n^2 C_\lambda^2}\, nF(a_m x + b_m)(1 - F(a_m x + b_m)) = \frac{mF(a_m x + b_m)(1 - F(a_m x + b_m))}{nC_\lambda^2} \xrightarrow[n]{} 0. \tag{3.19}$$

The relation (3.14) follows immediately by using both relations (3.18) and (3.19). This completes the proof of Theorem 3.14.

In the rest of this subsection, we present a simulation study showing that the sub-sample bootstrap technique, based on the BM technique and Theorem 3.14, offers an efficient technique for modelling quantile values such as the median. To apply the suggested technique, we first generated five random samples from the normal distributions $N(\mu = -2, \sigma = 1)$, $N(\mu = -1, \sigma = 1)$, $N(\mu = 0, \sigma = 1)$, $N(\mu = 1, \sigma = 1)$, and $N(\mu = 2, \sigma = 1)$. All these samples have the same size $n = 20000$. Our aim is to estimate the limit DF of the sample quantile $X_{r_n:n}$, $r_n = [\frac{1}{2}n + 1]$, i.e., the sample median, of the preceding five populations. We know that in this case the limit DF of this quantile is again a normal distribution with mean $\mu$. Therefore, we apply the sub-sample technique to get the estimated models and then check the compatibility of these estimates with the theoretical models.

First, we choose a suitable value for the size $m$ of the bootstrap replicates, or blocks. Theorem 3.14 shows that this value should be small enough to satisfy the condition $m = o(n)$ and at the same time large enough to satisfy the condition $m \xrightarrow[n]{} \infty$. In our case ($n = 2 \times 10^4$), we can initially discriminate between the two values $m = 2 \times 10^3 = 2000$ and $m = 2 \times 10^2 = 200$ based on the accuracy of the estimates of $\mu$. These estimates, denoted by $\hat{\mu}$, are obtained by drawing from each of the original samples a large number of bootstrap replicates, namely 300 replicates or blocks, each of size $m$, and determining the sample median of each block. Then, these medians are treated as a sample drawn from a normal DF, and its mean and standard deviation, the latter denoted by $\hat{\sigma}$, are estimated by the ML method. For all cases, we found that the value $m = 200$ gives the best estimates. To get a more accurate value of $m$, we consider five values centred at $m = 200$, namely $m = 100, 150, 200, 250, 300$, and repeat the preceding procedure for all these values. Table 3.4 presents the estimates of $\mu$ and the estimated standard deviations for these values of $m$; the value marked with an asterisk is the best one in each row. All the obtained estimates are close to the true values of the median ($\mu$), and the best of them is denoted by $\hat{\mu}^*$. Finally, the sub-samples corresponding to these best values were tested for goodness of fit by using the K-S test. Table 3.5 shows that we have a good fit.
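The procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the blocks are drawn with replacement as in (3.1), the normal ML estimates are the sample mean and the (biased) sample standard deviation, and the function names and default seed are hypothetical.

```python
import math
import random


def subsample_medians(sample, m, replicates, rng):
    """Sample median X_{[m/2 + 1]:m} of each re-sample of size m."""
    medians = []
    for _ in range(replicates):
        block = sorted(rng.choices(sample, k=m))
        medians.append(block[m // 2])  # 0-based index of rank [m/2 + 1]
    return medians


def normal_ml_fit(values):
    """ML estimates (mean, standard deviation) under a normal model."""
    mu_hat = sum(values) / len(values)
    sigma_hat = math.sqrt(sum((v - mu_hat) ** 2 for v in values) / len(values))
    return mu_hat, sigma_hat


def fit_median_model(mu, n=20000, m=200, replicates=300, seed=0):
    """Generate N(mu, 1) data, re-sample, and fit a normal model to the medians."""
    rng = random.Random(seed)
    sample = [rng.gauss(mu, 1.0) for _ in range(n)]
    return normal_ml_fit(subsample_medians(sample, m, replicates, rng))
```

For a normal parent with $\sigma = 1$, the asymptotic standard deviation of the median of $m$ observations is $\sqrt{\pi/(2m)}$, about 0.089 for $m = 200$, which gives a reference value for the second component returned by `fit_median_model`.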


Table 3.4 Generated data from N(μ, σ = 1): bootstrap technique for quantiles

μ            m = 100     m = 150     m = 200     m = 250     m = 300
−2    μ̂      −1.9918*    −2.0285     −2.0163     −1.9793     −1.9864
      σ̂      0.1238      0.1003      0.0864      0.0819      0.0723
−1    μ̂      −0.9951     −0.9986*    −0.9956     −1.0024     −0.9901
      σ̂      0.1221      0.1023      0.0884      0.0805      0.0706
0     μ̂      −0.0171     −0.0042     0.0142      −0.0016*    −0.0133
      σ̂      0.1267      0.1027      0.0899      0.0825      0.0703
1     μ̂      0.9968      0.9906      0.9969      1.0018*     1.0082
      σ̂      0.1291      0.1044      0.0906      0.0801      0.0705
2     μ̂      2.0021*     1.9959      1.9896      2.0141      2.0159
      σ̂      0.1268      0.1022      0.0861      0.0780      0.0656

Table 3.5 K-S Test: bootstrap technique for quantiles

m, N(μ̂*, σ̂*)                   H    P         KSSTAT    CV        Decision
m = 100, N(−1.9918, 0.1238)     0    0.5333    0.0276    0.0608    accept the null hypothesis
m = 150, N(−0.9986, 0.1023)     0    0.6900    0.0211    0.0608    accept the null hypothesis
m = 200, N(−0.0016, 0.0825)     0    0.8092    0.0159    0.0608    accept the null hypothesis
m = 250, N(1.0018, 0.0801)      0    0.6510    0.0228    0.0608    accept the null hypothesis
m = 100, N(2.0021, 0.1268)      0    0.4900    0.0295    0.0608    accept the null hypothesis




3.4.2 Bootstrapping of intermediate order statistics under linear normalization

In this subsection, we study the asymptotic behaviour of the bootstrap distribution of intermediate order statistics under linear normalization. The inconsistency and weak consistency of bootstrapping intermediate order statistics under linear normalization for an appropriate choice of re-sample size are obtained. Chibisov (1964) studied a wide class of intermediate order statistics $X_{r_n:n}$, where $r_n = l^2 n^{\alpha}(1 + o(1))$, $0 < \alpha < 1$, $l^2 > 0$ (see Subsection 2.4.1). Chibisov (1964) showed that if there are normalizing constants $a_n > 0$ and $b_n$ such that

$$\Phi_{r_n:n}(a_n x + b_n) = P(X_{r_n:n} \le a_n x + b_n) \xrightarrow[n]{w} G(x), \tag{3.20}$$

where $G(x)$ is a non-degenerate DF, then $G(x)$ must have one and only one of the types $\Phi(V_{i;\beta}(x))$, $i = 1, 2, 3$, where $V_{i;\beta}(x)$, $i = 1, 2, 3$, are defined in (2.12). Moreover, (3.20) is satisfied with $G(x) = \Phi(V_{i;\beta}(x))$, for some $i \in \{1, 2, 3\}$, if and only if

$$\frac{nF(a_n x + b_n) - r_n}{\sqrt{r_n}} = \sqrt{n}\, \frac{F(a_n x + b_n) - \bar{r}_n}{\sqrt{\bar{r}_n}} \xrightarrow[n]{} V_{i;\beta}(x), \tag{3.21}$$

where $\bar{r}_n = \frac{r_n}{n}$. We note that the convergence in (3.20) is always to a continuous type. Thus, the convergence in (3.20) is uniform with respect to $x$. This implies that for large $n$ we can write

$$\Phi_{r_n:n}(a_n x + b_n) = \Phi\left( \sqrt{n}\, \frac{F(a_n x + b_n) - \bar{r}_n}{\sqrt{\bar{r}_n}} \right) + \rho_n(x), \tag{3.22}$$

where $\rho_n(x) \xrightarrow[n]{} 0$, uniformly with respect to $x$.

The following theorem studies the asymptotic behaviour of the bootstrap distribution $H_{r_m,n,m}(a_m x + b_m) = P(X_{r_m:m} \le a_m x + b_m \mid \underline{X}_n)$ of the intermediate order statistic $X_{r_n:n}$ of $\underline{X}_n$.

Theorem 3.15 (cf. Barakat et al., 2015a) Let (3.20) be satisfied with $G(x) = \Phi(V_{i;\beta}(x))$, $i \in \{1, 2, 3\}$. Then,

$$H_{r_n,n,n}(a_n x + b_n) \xrightarrow[n]{d} \Phi(\xi(x)), \tag{3.23}$$

where $\xi(x)$ has a normal distribution with mean $V_{i;\beta}(x)$ and unit variance, i.e., $P(\xi(x) \le z) = \Phi(z - V_{i;\beta}(x))$. Moreover, if $m = o(n)$, then

$$\sup_{x \in \mathbb{R}} | H_{r_m,n,m}(a_m x + b_m) - \Phi(V_{i;\beta}(x)) | \xrightarrow[n]{p} 0. \tag{3.24}$$


Theorem 3.15 shows that if $m = n$, the naive bootstrap of the $r_n$th intermediate order statistic fails to be a consistent estimator for the limit DF $\Phi(V_{i;\beta}(x))$, while the relation (3.24) shows that this bootstrap is consistent if $m = o(n)$.

Proof In view of (3.22), we get

$$H_{r_m,n,m}(a_m x + b_m) = \Phi(B_{r_m,n,m}(x)) + \rho_m, \tag{3.25}$$

where $B_{r_m,n,m}(x) = \sqrt{m}\, \frac{F_n(a_m x + b_m) - \bar{r}_m}{\sqrt{\bar{r}_m}}$ and $\rho_m \xrightarrow[n]{} 0$, uniformly in $x$. Now, assume that the condition $m = n$ is satisfied. Then, in view of (3.2) and by applying the central limit theorem, we get

$$\frac{S_n(a_n x + b_n) - nF(a_n x + b_n)}{\sqrt{nF(a_n x + b_n)(1 - F(a_n x + b_n))}} \xrightarrow[n]{d} Z, \tag{3.26}$$

where $Z$ is the standard normal RV. On the other hand, under the condition of Theorem 3.15, the relation (3.21) is satisfied. Thus, we get $F(a_n x + b_n) \sim \bar{r}_n \xrightarrow[n]{} 0$, for all $x$ such that $V_{i;\beta}(x) < \infty$. Therefore, we get

$$\frac{\sqrt{nF(a_n x + b_n)(1 - F(a_n x + b_n))}}{\sqrt{r_n}} \sim \sqrt{\frac{n\bar{r}_n}{r_n}} = 1$$

and

$$\frac{nF(a_n x + b_n) - r_n}{\sqrt{r_n}} = \frac{nF(a_n x + b_n) - n\bar{r}_n}{\sqrt{r_n}} \xrightarrow[n]{} V_{i;\beta}(x).$$

The above two limit relations thus enable us to apply Khinchin's theorem on the relation (3.26) to get

$$B_{r_n,n,n}(x) = \frac{S_n(a_n x + b_n) - r_n}{\sqrt{r_n}} \xrightarrow[n]{d} Z + V_{i;\beta}(x). \tag{3.27}$$

Combining relations (3.25) and (3.27), we get (3.23). We now turn to prove the relation (3.24). In view of (3.2) and (3.21), we get

$$E(B_{r_m,n,m}(x)) = \sqrt{m}\, \frac{F(a_m x + b_m) - \bar{r}_m}{\sqrt{\bar{r}_m}} \xrightarrow[n]{} V_{i;\beta}(x). \tag{3.28}$$

Moreover, in view of (3.2), (3.21), and the condition $m = o(n)$, we get

$$\mathrm{Var}(B_{r_m,n,m}(x)) = \frac{m}{\bar{r}_m} \mathrm{Var}(F_n(a_m x + b_m)) = \frac{m}{n^2 \bar{r}_m}\, nF(a_m x + b_m)(1 - F(a_m x + b_m)) = \frac{mF(a_m x + b_m)(1 - F(a_m x + b_m))}{n\bar{r}_m} \sim \frac{m}{n} \sim o(1) \xrightarrow[n]{} 0. \tag{3.29}$$


The relation (3.24) follows immediately by using both relations (3.28) and (3.29). This completes the proof of Theorem 3.15.
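The variance calculation (3.29) can also be checked by simulation. The sketch below rests on assumptions that are not in the source: $F$ is uniform on (0, 1), the rank is $r_m = m^{\alpha}$ with $\alpha = 1/2$, and $B_{r_m,n,m}$ is evaluated only at the single point $t_m$ with $F(t_m) = \bar{r}_m$, where its mean is zero. The function names are hypothetical.

```python
import math
import random
import statistics


def b_statistic(sample, m, alpha=0.5):
    """B_{r_m,n,m} at the point t_m with F(t_m) = r_m / m, for F uniform on (0, 1)."""
    n = len(sample)
    r_bar = m ** (alpha - 1.0)  # r_m / m with r_m = m^alpha (assumed rank)
    # Empirical DF at t_m = r_bar (for the uniform DF, F(t) = t).
    f_n = sum(1 for v in sample if v <= r_bar) / n
    return math.sqrt(m) * (f_n - r_bar) / math.sqrt(r_bar)


def variance_of_b(n=2000, m_sub=44, reps=400, seed=0):
    """Sample variance of B over `reps` samples, for m = n and for m = m_sub."""
    rng = random.Random(seed)
    naive, sub = [], []
    for _ in range(reps):
        sample = [rng.random() for _ in range(n)]
        naive.append(b_statistic(sample, n))
        sub.append(b_statistic(sample, m_sub))
    return statistics.variance(naive), statistics.variance(sub)
```

Under these assumptions the exact variance is $m(1 - \bar{r}_m)/n$, so the first returned value should be close to 1 (the unit variance of the random limit when $m = n$), while the second should be close to $m/n$, in line with (3.29).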

3.5 Bootstrapping of order statistics with variable rank in the P-model

It has been known for a long time that to bootstrap the distribution of the extremes of a sample consistently under the traditional linear normalization, the bootstrap sample size needs to be of a smaller order than the original sample size (see the preceding sections in this chapter). In this section, we show that the same is true if we use the bootstrap for estimating a central or an intermediate quantile under power normalization.

3.5.1 Bootstrapping central order statistics under power normalization

Barakat and Omar (2011a) considered the weak convergence of the power normalized central order statistic $\left| \frac{X_{r_n:n}}{b_n} \right|^{\frac{1}{a_n}} S(X_{r_n:n})$, $a_n, b_n > 0$,

$$P\left( \left| \frac{X_{r_n:n}}{b_n} \right|^{\frac{1}{a_n}} S(X_{r_n:n}) \le x \right) = I_{F(b_n |x|^{a_n} S(x))}(r_n, n - r_n + 1) \xrightarrow[n]{w} \Psi(x). \tag{3.30}$$

Barakat and Omar (2011a) investigated the class of possible limit distributions $\Psi$ (see Theorem 2.48). Moreover, (3.30) is satisfied with $\Psi(x) = \Psi_i(x)$, $i \in \{1, 2, 3, 4\}$ (see Theorem 2.48), if and only if

$$\sqrt{n}\, \frac{F(b_n |x|^{a_n} S(x)) - \lambda}{\sqrt{\lambda(1 - \lambda)}} \xrightarrow[n]{} \Phi^{-1}(\Psi_i(x)) = W_{i;\beta}(x). \tag{3.31}$$

Let

$$H_{\lambda,n,m}(b_m |x|^{a_m} S(x)) = P\left( \left| \frac{X_{r_m:m}}{b_m} \right|^{\frac{1}{a_m}} S(X_{r_m:m}) \le x \,\Big|\, \underline{X}_n \right)$$

be the bootstrap DF of $\left| \frac{X_{r_m:m}}{b_m} \right|^{\frac{1}{a_m}} S(X_{r_m:m})$. A full-sample bootstrap is the case when $m = n$. In contrast, a sub-sample bootstrap is the case when $m < n$.
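The power normalization and the threshold $b|x|^{a}S(x)$ that appears inside $F$ are inverse to each other, which is what allows a probability statement about the normalized statistic to be rewritten in terms of the DF. A minimal sketch follows, with hypothetical helper names, assuming $a > 0$, $b > 0$ and $S$ the sign function.

```python
import bisect


def sign(x):
    """S(x): -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def power_normalize(y, a, b):
    """Map y to |y / b|^(1/a) * S(y), with a > 0 and b > 0."""
    return abs(y / b) ** (1.0 / a) * sign(y)


def power_threshold(x, a, b):
    """Inverse map: the threshold b * |x|^a * S(x) appearing inside F."""
    return b * abs(x) ** a * sign(x)


def empirical_df_power(sorted_sample, x, a, b):
    """F_n(b |x|^a S(x)): fraction of observations not exceeding the threshold."""
    threshold = power_threshold(x, a, b)
    return bisect.bisect_right(sorted_sample, threshold) / len(sorted_sample)
```

Because `power_normalize` is increasing in `y`, the event `power_normalize(y, a, b) <= x` coincides with `y <= power_threshold(x, a, b)`, which is the identity used when the bootstrap DF is written as a function of $F_n(b_m |x|^{a_m} S(x))$.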

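The following theorem determines the asymptotic behaviour of the bootstrap distribution $H_{\lambda,n,m}(b_m |x|^{a_m} S(x)) = P(X_{r_m:m} \le b_m |x|^{a_m} S(x) \mid \underline{X}_n)$ of the central order statistic $X_{r_n:n}$ of $\underline{X}_n$.

Theorem 3.16 (cf. Barakat et al., 2015c) Let (3.30) be satisfied with $\Psi(x) = \Phi(W_{i;\beta}(x))$, $i \in \{1, 2, 3, 4\}$. Then,

$$H_{\lambda,n,n}(b_n |x|^{a_n} S(x)) \xrightarrow[n]{d} \Phi(Z(x)), \tag{3.32}$$

where $Z(x)$ has a normal distribution with mean $W_{i;\beta}(x)$ and unit variance, i.e., $P(Z(x) \le z) = \Phi(z - W_{i;\beta}(x))$. Moreover, if $m = o(n)$ and $m \xrightarrow[n]{} \infty$, then

$$\sup_{x \in \mathbb{R}} | H_{\lambda,n,m}(b_m |x|^{a_m} S(x)) - \Phi(W_{i;\beta}(x)) | \xrightarrow[n]{p} 0. \tag{3.33}$$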

Theorem 3.16 shows that if $m = n$, $H_{\lambda,n,m}(b_m |x|^{a_m} S(x))$ has a random limit and thus the naive bootstrap fails to approximate $\Phi_{\lambda:n}(b_n |x|^{a_n} S(x))$. In other words, the naive bootstrap of the $r_n$th central order statistic, when $m = n$, fails to be a consistent estimator for the limit DF $\Phi(W_{i;\beta}(x))$, while the relation (3.33) shows that this bootstrap is consistent if $m = o(n)$.

Proof By applying Lemma 2.1 in Barakat et al. (2015c) (see also Remark 2.2 of Barakat et al., 2015c), we get

$$H_{\lambda,n,m}(b_m |x|^{a_m} S(x)) = \Phi(T_{\lambda,n,m}(x)) + R_m, \tag{3.34}$$

where $T_{\lambda,n,m}(x) = \sqrt{m}\, \frac{F_n(b_m |x|^{a_m} S(x)) - \lambda}{\sqrt{\lambda(1 - \lambda)}}$ and $R_m \xrightarrow[n]{} 0$. Now, assume that the condition $m = n$ is satisfied. Then, in view of (3.2) and by applying the central limit theorem, we get

$$\frac{S_n(b_n |x|^{a_n} S(x)) - nF(b_n |x|^{a_n} S(x))}{\sqrt{nF(b_n |x|^{a_n} S(x))(1 - F(b_n |x|^{a_n} S(x)))}} \xrightarrow[n]{d} Z, \tag{3.35}$$

where $Z$ is the standard normal RV. On the other hand, under the condition of Theorem 3.16, the relation (3.31) is satisfied. Thus, we get $F(b_n |x|^{a_n} S(x)) \xrightarrow[n]{} \lambda$, for all $x$ such that $W_{i;\beta}(x) < \infty$. Therefore, we get

$$\frac{\sqrt{nF(b_n |x|^{a_n} S(x))(1 - F(b_n |x|^{a_n} S(x)))}}{\sqrt{n}\sqrt{\lambda(1 - \lambda)}} \xrightarrow[n]{} 1$$

and

$$\frac{nF(b_n |x|^{a_n} S(x)) - n\lambda}{\sqrt{n}\sqrt{\lambda(1 - \lambda)}} \xrightarrow[n]{} W_{i;\beta}(x).$$

The above two limit relations enable us to apply the modification of Khinchin's theorem (cf. Barakat and Omar, 2011a) on the relation (3.35) to get

$$T_{\lambda,n,n}(x) = \frac{S_n(b_n |x|^{a_n} S(x)) - n\lambda}{\sqrt{n}\sqrt{\lambda(1 - \lambda)}} \xrightarrow[n]{d} Z + W_{i;\beta}(x) = Z(x). \tag{3.36}$$


Combining relations (3.34) and (3.36), we get (3.32). To prove the relation (3.33), we first notice that (3.2) and (3.31) imply that

$$E(T_{\lambda,n,m}(x)) = \sqrt{m}\, \frac{F(b_m |x|^{a_m} S(x)) - \lambda}{\sqrt{\lambda(1 - \lambda)}} \xrightarrow[n]{} W_{i;\beta}(x). \tag{3.37}$$

Moreover, in view of (3.2), (3.31), and the condition $m = o(n)$, we get

$$\mathrm{Var}(T_{\lambda,n,m}(x)) = \frac{m}{\lambda(1 - \lambda)} \mathrm{Var}(F_n(b_m |x|^{a_m} S(x))) = \frac{m}{n^2 \lambda(1 - \lambda)}\, nF(b_m |x|^{a_m} S(x))(1 - F(b_m |x|^{a_m} S(x))) = \frac{mF(b_m |x|^{a_m} S(x))(1 - F(b_m |x|^{a_m} S(x)))}{n\lambda(1 - \lambda)} \xrightarrow[n]{} 0. \tag{3.38}$$

The relation (3.33) follows immediately by combining (3.37) and (3.38). This completes the proof of Theorem 3.16.

3.5.2 Bootstrapping intermediate order statistics under power normalization

Barakat and Omar (2011b) extended the work of Chibisov to the power normalization case by considering the limit relation

$$\Phi_{r_n}(\beta_n |x|^{\alpha_n} S(x)) = P\left( \left| \frac{X_{r_n:n}}{\beta_n} \right|^{\frac{1}{\alpha_n}} S(X_{r_n:n}) \le x \right) \xrightarrow[n]{w} L(x), \tag{3.39}$$

where $\alpha_n, \beta_n > 0$ and $L(x)$ is a non-degenerate DF. Barakat and Omar (2011b) proved that the class of possible limit distributions of $L(x)$ is $\{\tilde{G}_{i;\beta}(x), i = 1, 2, \ldots, 6\}$ (see (2.42), Subsection 2.12.1). Moreover, (3.39) is satisfied with $L(x) = \tilde{G}_{i;\beta}(x)$, $i \in \{1, 2, \ldots, 6\}$, if and only if

$$\sqrt{n}\, \frac{F(\beta_n |x|^{\alpha_n} S(x)) - \bar{r}_n}{\sqrt{\bar{r}_n}} \xrightarrow[n]{} \Phi^{-1}(\tilde{G}_{i;\beta}(x)) = U_i(x), \tag{3.40}$$

where $\bar{r}_n = \frac{r_n}{n}$. Since all the limit types in (3.39) are continuous, the convergence in (3.39) is uniform with respect to $x \in \mathbb{R}$. Therefore, we have

$$\Phi_{r_n}(\beta_n |x|^{\alpha_n} S(x)) = \Phi\left( \sqrt{n}\, \frac{F(\beta_n |x|^{\alpha_n} S(x)) - \bar{r}_n}{\sqrt{\bar{r}_n}} \right) + \rho_n(x), \quad \text{for large } n, \tag{3.41}$$

where $\rho_n(x) \xrightarrow[n]{} 0$, uniformly with respect to $x \in \mathbb{R}$.

The next theorem determines the asymptotic behaviour of the bootstrap distribution $H_{r_m,n,m}(\beta_m |x|^{\alpha_m} S(x)) = P(X_{r_m:m} \le \beta_m |x|^{\alpha_m} S(x) \mid \underline{X}_n)$ of the intermediate order statistic $X_{r_n:n}$ of $\underline{X}_n$.

Theorem 3.17 (cf. Barakat et al., 2015c) Let (3.39) be satisfied with $L(x) = \Phi(U_i(x))$, $i \in \{1, 2, \ldots, 6\}$. Then,

$$H_{r_n,n,n}(\beta_n |x|^{\alpha_n} S(x)) \xrightarrow[n]{d} \Phi(\xi(x)), \tag{3.42}$$

where $\xi(x)$ has a normal distribution with mean $U_i(x)$ and unit variance, i.e., $P(\xi(x) \le z) = \Phi(z - U_i(x))$. Moreover, if $m = o(n)$, then

$$\sup_{x \in \mathbb{R}} | H_{r_m,n,m}(\beta_m |x|^{\alpha_m} S(x)) - \Phi(U_i(x)) | \xrightarrow[n]{p} 0. \tag{3.43}$$

Theorem 3.17 shows that if m = n, the naive bootstrap of the rn th intermediate order statistic fails to be a consistent estimator for the limit DF Φ(Ui (x)), while the relation (3.43) shows that this bootstrap is consistent if m = ◦(n). Proof

In view of (3.41), we get
\[
H_{r_m,n,m}(\beta_m|x|^{\alpha_m}S(x)) = \Phi(T_{r_m,n,m}(x)) + \rho_m, \tag{3.44}
\]
where $T_{r_m,n,m}(x) = \sqrt{m}\,\frac{F_n(\beta_m|x|^{\alpha_m}S(x)) - \bar r_m}{\sqrt{\bar r_m}}$ and $\rho_m \xrightarrow[n]{} 0$, uniformly in $x$. Now, assume that the condition $m = n$ is satisfied. Then, in view of (3.2) and by applying the central limit theorem, we get
\[
\frac{S_n(\beta_n|x|^{\alpha_n}S(x)) - nF(\beta_n|x|^{\alpha_n}S(x))}{\sqrt{nF(\beta_n|x|^{\alpha_n}S(x))\,(1 - F(\beta_n|x|^{\alpha_n}S(x)))}} \xrightarrow[n]{d} Z. \tag{3.45}
\]
On the other hand, under the condition of Theorem 3.17, the relation (3.40) is satisfied. Thus, we get $F(\beta_n|x|^{\alpha_n}S(x)) \sim \bar r_n \xrightarrow[n]{} 0$, for all $x$ such that $U_i(x) < \infty$. Therefore, we get
\[
\sqrt{\frac{nF(\beta_n|x|^{\alpha_n}S(x))\,(1 - F(\beta_n|x|^{\alpha_n}S(x)))}{r_n}} \sim \sqrt{\frac{n\bar r_n}{r_n}} = 1
\]
and
\[
\frac{nF(\beta_n|x|^{\alpha_n}S(x)) - n\bar r_n}{\sqrt{r_n}} = \frac{nF(\beta_n|x|^{\alpha_n}S(x)) - r_n}{\sqrt{r_n}} \xrightarrow[n]{} U_i(x).
\]
The above two limit relations enable us to apply the modification of Khinchin's type theorem (cf. Barakat and Omar, 2011a) to the relation (3.45) to get
\[
T_{r_n,n,n}(x) = \frac{S_n(\beta_n|x|^{\alpha_n}S(x)) - r_n}{\sqrt{r_n}} \xrightarrow[n]{d} Z + U_i(x). \tag{3.46}
\]
Thus, by combining relations (3.44) and (3.46), we get (3.42). We turn now to prove the relation (3.43). In view of (3.2) and (3.40) we get
\[
E(T_{r_m,n,m}(x)) = \sqrt{m}\,\frac{F(\beta_m|x|^{\alpha_m}S(x)) - \bar r_m}{\sqrt{\bar r_m}} \xrightarrow[n]{} U_i(x). \tag{3.47}
\]
Moreover, in view of (3.42), (3.40), and the condition $m = o(n)$, we get
\[
\begin{aligned}
\mathrm{Var}(T_{r_m,n,m}(x)) &= \frac{m}{\bar r_m}\,\mathrm{Var}(F_n(\beta_m|x|^{\alpha_m}S(x))) \\
&= \frac{m}{n^2\bar r_m}\, nF(\beta_m|x|^{\alpha_m}S(x))\,(1 - F(\beta_m|x|^{\alpha_m}S(x))) \\
&= \frac{mF(\beta_m|x|^{\alpha_m}S(x))\,(1 - F(\beta_m|x|^{\alpha_m}S(x)))}{n\bar r_m} \sim \frac{m}{n} = o(1) \xrightarrow[n]{} 0.
\end{aligned} \tag{3.48}
\]

The relation (3.43) follows immediately by combining (3.47) and (3.48). This completes the proof of Theorem 3.17.

3.6 Simulation study

In this section, via a simulation study, we show that the sub-sample bootstrap technique provides an efficient technique for modelling quantile values such as the median. We consider the uniform DF $F(x) = \frac{1}{2}(x + 1)$, $-1 \le x \le 1$. Let $b_n = \frac{1}{\sqrt{n}}$ and $a_n = K > 0$. Therefore, for any $\lambda \in (0, 1)$, we get
\[
\sqrt{n}\,\frac{F(b_n|x|^{a_n}S(x)) - \lambda}{\sqrt{\lambda(1-\lambda)}} = |x|^K S(x), \quad -\sqrt{n} \le x \le \sqrt{n},
\]
which implies that
\[
\Phi_{\lambda:n}(b_n|x|^{a_n}S(x)) \xrightarrow[n]{w} \Phi(|x|^K S(x))
\]
(cf. Barakat and Omar, 2011a). For this limit DF, we have μ = mean = median = 0. Our aim is to apply the suggested technique given in Theorem 3.16 to estimate the limit DF of the sample median $X_{r_n:n}$, $r_n = [\lambda n + 1] = [\frac{1}{2}n + 1]$, where $[\theta]$ denotes the integer part of $\theta$, for the two values $K = 1, \frac{1}{2}$.

For $K = 1$, we first generate a random sample of size $n = 20000$ from the DF $F(x)$. We apply the sub-sample technique to get the estimated model and then check the compatibility of this estimate with the theoretical model. We first choose a suitable value of the bootstrap replicate size $m$. Theorem 3.16 shows that this value should be small enough to satisfy the condition $m = o(n)$ and at the same time large enough to satisfy the condition $m \xrightarrow[n]{} \infty$. A simple way to determine a suitable value of $m$ is to put $n$ in the form $a(10)^b + c$, where $a$, $b$, and $c$ are integers such that $1 \le a < 10$, $0 \le c \le (10)^{b-1}$. Thus, in our case $a = 2$, $b = 4$, and $c = 0$. Moreover, in view of the two conditions $m = o(n)$ and $m \xrightarrow[n]{} \infty$, we can initially take two possible values of $m$, namely $m = 2 \times (10)^3 = 2000$ and $m = 2 \times (10)^2 = 200$. After that, we can differentiate between these two values on the basis of the accuracy of the


estimated value of the median. This estimate, denoted by $\hat\mu$, is obtained by withdrawing from the original sample a large number of bootstrap replicates or blocks, namely 1000 blocks, each of which has size $m$, and determining the sample median of each block. Then, these medians are treated as a sample drawn from a normal distribution, and the ML method is used to estimate its mean, $\hat\mu$, and its standard deviation, $\hat\sigma$. We found that the value $m = 200$ gives the best estimate. To get a more accurate value of $m$, we consider an appropriate discrete neighbourhood of $m = 200$, namely $m = 100, 150, 200, 250, 300$ (Table 3.6). Moreover, we repeat the preceding procedure for all values of this neighbourhood to select the value that gives the best estimate for μ = 0. Table 3.6 presents the results for these values and the corresponding estimates for μ, as well as the estimated standard deviations. All the obtained estimates for these values are close to the true value of the median μ = 0 and the best of them was obtained at $m = 200$. Finally, the sub-samples corresponding to these values were fitted by using the K-S test.

For the value $K = \frac{1}{2}$ (see Table 3.7), a similar procedure is applied, except that we generate the original sample from the DF $F(\sqrt{|x|}\,S(x))$ and we choose the values $m = 100, 150, 200, 250, 300$ (we can differentiate between these values by the corresponding values of KSSTAT, i.e., the best value of $m$, which is $m = 200$, has the minimum value of KSSTAT). Moreover, the sub-samples corresponding to these values were fitted by using the K-S test to the DF $\Phi(\hat\sigma\sqrt{|x|}\,S(x))$, where here $\hat\sigma$ is the ML estimate of the scale parameter.

In this study, all computations are carried out with the Matlab package, where the K-S test returns four outputs [H, P, KSSTAT, CV]: H = 0 or H = 1 indicates the test decision, P is the p-value, KSSTAT is the maximum difference between the data (i.e., the empirical DF) and the fitted curve, and CV is the critical value. We accept H0 if H = 0, KSSTAT ≤ CV and P > level of significance; otherwise, we reject H0. Tables 3.6 and 3.7 show that the estimated models, for all values of $m$, are compatible with the theoretical models.
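The procedure described above for K = 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' Matlab code: the function names `candidate_block_sizes` and `subsample_median_fit`, the seed, and the choice to draw each block with replacement are assumptions made here, and the K-S distance is computed by hand rather than with Matlab's test (so no critical value or p-value is produced).

```python
import math
import random
import statistics


def candidate_block_sizes(n):
    """Write n = a*10**b + c (1 <= a < 10) and return a*10**(b-1), a*10**(b-2).

    Hypothetical helper; it assumes n >= 100 and does not check the bound on c.
    """
    b = len(str(n)) - 1            # power of ten of the leading digit
    a = n // 10 ** b               # leading digit
    return [a * 10 ** (b - 1), a * 10 ** (b - 2)]


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def subsample_median_fit(sample, m, blocks=1000, rng=random):
    """Fit a normal law to the medians of `blocks` resamples of size m."""
    medians = []
    for _ in range(blocks):
        block = sorted(rng.choices(sample, k=m))   # assumption: with replacement
        medians.append(block[m // 2])              # order statistic of rank [m/2 + 1]
    mu_hat = statistics.fmean(medians)
    sigma_hat = statistics.pstdev(medians, mu_hat)  # ML estimate (divides by blocks)
    # Kolmogorov-Smirnov distance between the empirical DF and the fitted normal DF
    medians.sort()
    ks = 0.0
    for i, value in enumerate(medians):
        cdf = normal_cdf(value, mu_hat, sigma_hat)
        ks = max(ks, cdf - i / blocks, (i + 1) / blocks - cdf)
    return mu_hat, sigma_hat, ks


rng = random.Random(2019)
n = 20000
sample = [rng.uniform(-1.0, 1.0) for _ in range(n)]
for m in candidate_block_sizes(n):
    mu_hat, sigma_hat, ks = subsample_median_fit(sample, m, rng=rng)
    print(m, round(mu_hat, 4), round(sigma_hat, 4), round(ks, 4))
```

Under the limit law stated above for K = 1, the fitted standard deviation should be roughly $1/\sqrt{m}$, which gives a quick sanity check on any run of the sketch.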

Table 3.6 Simulation study for K = 1

m, normal(μ̂, σ̂) = Φ(μ̂, σ̂)      H    P         KSSTAT    CV        Decision
m = 300, Φ(0.0011, 0.0578)       0    0.1676    0.0297    0.0385    accept H0
m = 250, Φ(−0.0010, 0.0624)      0    0.1058    0.0333    0.0385    accept H0
m = 200, Φ(0.0003, 0.0702)       0    0.2936    0.0246    0.0385    accept H0
m = 150, Φ(0.0013, 0.0799)       0    0.1940    0.0285    0.0385    accept H0
m = 100, Φ(0.0006, 0.0987)       0    0.6890    0.0135    0.0385    accept H0

Table 3.7 Simulation study for K = 1/2

m, σ̂                   H    P         KSSTAT    CV        Decision
m = 300, σ̂ = 295.2     0    0.4645    0.0284    0.0385    accept H0
m = 250, σ̂ = 256.3     0    0.4128    0.0277    0.0385    accept H0
m = 200, σ̂ = 203.3     0    0.5324    0.0242    0.0385    accept H0
m = 150, σ̂ = 152       0    0.4419    0.0250    0.0385    accept H0
m = 100, σ̂ = 101.3     0    0.6004    0.0296    0.0385    accept H0

3.7 Bootstrapping extreme generalized order statistics

In this section, we review Barakat et al.'s (2011b) result, in which Athreya and Fukuchi's (1993) results were extended to extreme generalized order statistics. Assume that (3.1) is satisfied. Let $B_j$, $1 \le j \le N$, be independent RVs with respective beta distribution $\beta(\gamma_j, 1)$, i.e., $B_j$ follows a power function distribution with exponent $\gamma_j = k + (N - j)(m + 1)$, $j = 1, ..., N$. Therefore, in view of Cramer's (2003) results (see Subsection 2.14.1), we get
\[
Y_{r:N} = Y(r, N, m, k) = F_n^{-1}\left(1 - \prod_{j=1}^{r} B_j\right), \quad r = 1, 2, ..., N,
\]
where $Y_{r:N}$ is the $r$th $m$-gos based on the empirical DF $F_n$. Furthermore, let
\[
H^{(m,k)}_{r,n,N}(x) = P\left(\frac{Y_{N-r+1:N} - b_N}{a_N} \le x \,\Big|\, X_n\right) = I_{\Psi_{n,m:N}(x)}(N' - r' + 1, r') \tag{3.49}
\]
be the bootstrap distribution of $\frac{X_{n-r+1:n} - b_n}{a_n}$, where $r' = r + \frac{k}{m+1} - 1$, $N' = N + \frac{k}{m+1} - 1$ and $\Psi_{n,m:N}(x) = 1 - (1 - F_n(a_N x + b_N))^{m+1}$ is a DF.

In the next theorem, we show that if $N = n$, $H^{(0,k)}_{r,n,N}(x)$ has a random limit and thus the naive bootstrap fails to approximate $\Phi^{(0,k)}_{n-r+1:n}(a_n x + b_n)$. This means that the naive bootstrap of the $r$th upper $m$-gos with $m = 0$, $n = N$ fails to be a consistent estimator for the DF $\Phi^{(0,k)}_r(x)$. Furthermore, the same theorem shows that if $m > 0$ and $N = n$, $H^{(m,k)}_{r,n,N}(x)$ is a consistent estimator for $\Phi^{(m,k)}_r(x)$ (see (2.50)).
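The product representation of the bootstrap $m$-gos used in this section can be simulated directly. The following Python sketch is illustrative only: the function names `empirical_quantile` and `bootstrap_mgos`, the standard normal test data, and the seed are assumptions made here. It uses the fact that $U^{1/\gamma}$ has the power function distribution with exponent $\gamma$ when $U$ is uniform on (0, 1), and it evaluates $F_n^{-1}$ as the generalised inverse of the empirical DF.

```python
import math
import random


def empirical_quantile(sorted_sample, u):
    """Generalised inverse F_n^{-1}(u) of the empirical DF of a sorted sample."""
    n = len(sorted_sample)
    index = min(max(math.ceil(n * u), 1), n) - 1
    return sorted_sample[index]


def bootstrap_mgos(sorted_sample, N, m, k, rng=random):
    """Return Y_{1:N}, ..., Y_{N:N} with Y_{r:N} = F_n^{-1}(1 - B_1 * ... * B_r)."""
    values, product = [], 1.0
    for j in range(1, N + 1):
        gamma_j = k + (N - j) * (m + 1)
        b_j = rng.random() ** (1.0 / gamma_j)   # power function law with exponent gamma_j
        product *= b_j
        values.append(empirical_quantile(sorted_sample, 1.0 - product))
    return values


rng = random.Random(7)
data = sorted(rng.gauss(0.0, 1.0) for _ in range(500))
ys = bootstrap_mgos(data, N=50, m=0, k=1, rng=rng)
print(ys[-3:])   # the three largest bootstrap m-gos
```

Because the running product can only decrease, the simulated values are automatically non-decreasing in $r$; with $m = 0$ and $k = 1$ the construction reduces to the ordinary order statistics of a resample of size $N$.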

Theorem 3.18 (cf. Barakat et al., 2011b) Let the relation (2.50) be satisfied with $\Phi^{(m,k)}_r(x) = 1 - \Gamma_{r'}(U^{m+1}_{i,\alpha}(x))$, $i \in \{1, 2, 3\}$, $a_n > 0$ and $b_n \in \mathbb{R}$. Then,
\[
H^{(0,k)}_{r,n,n}(x) \xrightarrow[n]{d} 1 - \Gamma_{r'}(P(x)), \tag{3.50}
\]
where $P(x)$ has a Poisson distribution with parameter $U_{i,\alpha}(x)$. Moreover, if $m > 0$, then
\[
H^{(m,k)}_{r,n,n}(x) \xrightarrow[n]{p} \Phi^{(m,k)}_r(x), \quad \forall x \in \mathbb{R}. \tag{3.51}
\]
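The random limit in (3.50) comes from the scaled exceedance count $N\bar F_n(a_N x + b_N)$, which stays random when $N = n$ and $m = 0$ but concentrates when $N$ is much smaller than $n$ (the situation treated in Theorem 3.19). A minimal Python sketch of this contrast follows. It assumes, purely for illustration, an Exp(1) population, for which $a_N = 1$, $b_N = \log N$ and $N\bar F(x + \log N) = e^{-x}$; the function names, sample sizes and seed are hypothetical choices.

```python
import math
import random
import statistics


def scaled_exceedances(sample, N, x):
    """N * (1 - F_n(a_N x + b_N)) with a_N = 1, b_N = log N (Exp(1) assumption)."""
    level = x + math.log(N)
    return N * sum(1 for v in sample if v > level) / len(sample)


def replicate(n, N, x, reps, rng):
    """Recompute the scaled exceedance count on `reps` independent samples of size n."""
    out = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        out.append(scaled_exceedances(sample, N, x))
    return out


rng = random.Random(318)
naive = replicate(n=500, N=500, x=0.0, reps=1000, rng=rng)   # N = n
sub = replicate(n=500, N=20, x=0.0, reps=1000, rng=rng)      # N much smaller than n
print(statistics.fmean(naive), statistics.pvariance(naive))
print(statistics.fmean(sub), statistics.pvariance(sub))
```

In both settings the mean is close to $e^{-x} = 1$, but only the small-$N$ version has a variance near zero; with $N = n$ the variance stays close to the Poisson value $e^{-x}$, which is why the naive bootstrap limit remains random.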

Proof Since $\Psi_{n,m:n}(x) = 1 - (1 - F_n(x))^{m+1} = 1 - \bar F_n^{m+1}(x)$ is a DF, $\bar\Psi_{n,m:n}(x) = 1 - \Psi_{n,m:n}(x) = \bar F_n^{m+1}(x)$ and $n' = n + \frac{k}{m+1} - 1 \xrightarrow[n]{} \infty$, the result of Smirnov (1952) (Theorem 3, Page 133, or Lemma 3.1 in Barakat, 1997a) and (3.49) yield that
\[
1 - \Gamma_{r'}\left(n'\bar\Psi_{n,m:n}(a_n x + b_n)\right) - \sigma_n \le H^{(m,k)}_{r,n,n}(x) \le 1 - \Gamma_{r'}\left(n'\bar\Psi_{n,m:n}(a_n x + b_n)\right) + \rho_n,
\]
where $\sigma_n$ and $\rho_n$ converge to zero, as $n' \to \infty$ (or equivalently, as $n \to \infty$). Since $\bar\Psi_{n,m:n}(x) = \bar F_n^{m+1}(x)$ and $n' \sim n$, the above inequalities can be written in the form
\[
1 - \Gamma_{r'}\left(n\bar F_n^{m+1}(a_n x + b_n)\right) - \sigma_n \le H^{(m,k)}_{r,n,n}(x) \le 1 - \Gamma_{r'}\left(n\bar F_n^{m+1}(a_n x + b_n)\right) + \rho_n,
\]
or
\[
1 - \Gamma_{r'}\left(P^{m+1}_{n,m:n}(x)\right) - \sigma_n \le H^{(m,k)}_{r,n,n}(x) \le 1 - \Gamma_{r'}\left(P^{m+1}_{n,m:n}(x)\right) + \rho_n, \tag{3.52}
\]
where
\[
P_{n,m:n}(x) = n^{\frac{1}{m+1} - 1} S_{n,m:n} = n^{\frac{1}{m+1} - 1} \sum_{j=1}^{n} I_{(a_n x + b_n, \infty)}(X_j)
\]
and $S_{n,m:n}$ is binomially distributed, namely $\mathrm{BIN}(n, \bar F(a_n x + b_n))$. Clearly, when $m = 0$, we get $P_{n,0:n}(x) = S_{n,0:n}$ being binomially distributed as $\mathrm{BIN}(n, \bar F(\alpha_n x + \beta_n))$, where $n\bar F(\alpha_n x + \beta_n) \xrightarrow[n]{} U_{i,\alpha}(x)$ (recall that $a_n = \alpha_{\phi(n)}$, $b_n = \beta_{\phi(n)}$ and $\phi(n) = n^{\frac{1}{m+1}}$, see Theorem 2.71, Subsection 2.14.1), which by Poisson's theorem implies $P_{n,0:n}(x) = S_{n,0:n} \xrightarrow[n]{d} P(x)$, where $P(x)$ has a Poisson distribution with parameter $U_{i,\alpha}(x)$. Therefore, upon using (3.52) with $m = 0$, we get the relation (3.50). Hence the first result.

We turn now to the case when $N = n$ and $m > 0$. In this case, we have $P_{n,m:n}(x) = n^c S_{n,m:n}$, where $c = \frac{1}{m+1} - 1 < 0$ and $S_{n,m:n}$ is binomially distributed as $\mathrm{BIN}(n, \bar F(a_n x + b_n))$. Thus the characteristic function of $S_{n,m:n}$ is given by $C_S(t) = E\left(e^{itS_{n,m:n}}\right) = \left(F(a_n x + b_n) + \bar F(a_n x + b_n)e^{it}\right)^n$. Therefore, the characteristic function of $P_{n,m:n}(x)$ is given by
\[
C_P(t) = \left(F(a_n x + b_n) + \bar F(a_n x + b_n)e^{in^c t}\right)^n = \left(1 + itn^c\bar F(a_n x + b_n)(1 + o(1))\right)^n. \tag{3.53}
\]
On the other hand, since $n^{c+1}\bar F(a_n x + b_n) = \phi(n)\bar F(\alpha_{\phi(n)} x + \beta_{\phi(n)}) \xrightarrow[n]{} U_{i,\alpha}(x)$, we get
\[
C_P(t) = \left(1 + \frac{itn^{c+1}\bar F(a_n x + b_n)(1 + o(1))}{n}\right)^n \xrightarrow[n]{} e^{itU_{i,\alpha}(x)}, \quad \forall x \in \mathbb{R}. \tag{3.54}
\]
Consequently, the converse limit theorem and the elementary limit probability theory lead to $P_{n,m:n}(x) \xrightarrow[n]{p} U_{i,\alpha}(x)$, $\forall x \in \mathbb{R}$, which in turn, in view of (3.52) and Theorem 2.71, implies the relation (3.51). This completes the proof of (3.51), as well as of Theorem 3.18.

Theorem 3.19 (cf. Barakat et al., 2011b) Assume that there exist constants $a_n > 0$ and $b_n \in \mathbb{R}$ such that the weak convergence (2.50) holds and let $\phi(N) = N^{\frac{1}{m+1}} = o(n)$. Then
\[
\sup_{x \in \mathbb{R}} \left| H^{(m,k)}_{r,n,N}(x) - \Phi^{(m,k)}_r(x) \right| \xrightarrow[n]{p} 0. \tag{3.55}
\]
Moreover, if $\sum_{n=1}^{\infty} \theta^{\frac{n}{\phi(N)}} < \infty$, $\forall \theta \in (0, 1)$, then the relation (3.55) holds with probability 1, written
\[
\sup_{x \in \mathbb{R}} \left| H^{(m,k)}_{r,n,N}(x) - \Phi^{(m,k)}_r(x) \right| \xrightarrow[n]{a.s.} 0. \tag{3.56}
\]

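Proof Since $N' \sim N$, as $n \to \infty$, the relation (3.52) takes the form
\[
1 - \Gamma_{r'}\left(P^{m+1}_{n,m:N}(x)\right) - \sigma_{N'} \le H^{(m,k)}_{r,n,N}(x) \le 1 - \Gamma_{r'}\left(P^{m+1}_{n,m:N}(x)\right) + \rho_{N'}, \tag{3.57}
\]
where $P_{n,m:N}(x) = N^{\frac{1}{m+1}}\bar F_n(a_N x + b_N)$. Recalling that $F_n$ is the empirical distribution of the random sample $X_n$, in view of (2.50) and by applying Theorem 2.71, we get
\[
E(P_{n,m:N}) = N^{\frac{1}{m+1}}\bar F(a_N x + b_N) = \phi(N)\bar F(\alpha_{\phi(N)} x + \beta_{\phi(N)}) \xrightarrow[n]{} U_{i,\alpha}(x) \tag{3.58}
\]
and
\[
\mathrm{VAR}(P_{n,m:N}) = \frac{N^{\frac{2}{m+1}} F(a_N x + b_N)\bar F(a_N x + b_N)}{n} = \frac{\phi(N)}{n}\left(\phi(N)\bar F(\alpha_{\phi(N)} x + \beta_{\phi(N)})\right) F(\alpha_{\phi(N)} x + \beta_{\phi(N)}) \xrightarrow[n]{} 0 \tag{3.59}
\]
(since $\frac{\phi(N)}{n} \xrightarrow[n]{} 0$ and $F(\alpha_{\phi(N)} x + \beta_{\phi(N)}) \xrightarrow[n]{} 1$). Therefore,
\[
P_{n,m:N} \xrightarrow[n]{p} U_{i,\alpha}(x), \quad \forall x \in \mathbb{R}, \tag{3.60}
\]
which implies by virtue of (3.57) and Theorem 2.70 that
\[
H^{(m,k)}_{r,n,N}(x) \xrightarrow[n]{p} \Phi^{(m,k)}_r(x), \quad \forall x \in \mathbb{R}. \tag{3.61}
\]
This proves the relation (3.55). We turn now to prove the relation (3.56). Since $P_{n,m:N} = \frac{\phi(N)}{n} Q_{n,m:N} + E(P_{n,m:N})$, where $Q_{n,m:N} = n\bar F_n(a_N x + b_N) - n\bar F(a_N x + b_N)$ and $E(P_{n,m:N}) \xrightarrow[n]{} U_{i,\alpha}(x)$, we have to show that $\frac{\phi(N)}{n} Q_{n,m:N} \xrightarrow[n]{a.s.} 0$ in order to prove $P_{n,m:N}(x) \xrightarrow[n]{a.s.} U_{i,\alpha}(x)$. By the Borel-Cantelli lemma, it is enough to prove that
\[
\sum_{n=1}^{\infty} P\left(\frac{\phi(N)}{n}|Q_{n,m:N}| > \epsilon\right) < \infty, \quad \forall \epsilon > 0. \tag{3.62}
\]
Let $M_N(t) = F(a_N x + b_N) + \bar F(a_N x + b_N)e^t$ be the moment generating function of the Bernoulli distribution $(\bar F(a_N x + b_N))$. Then, for all $t > 0$,
\[
\begin{aligned}
\frac{\phi(N)}{n}\log P\left(\frac{\phi(N)}{n} Q_{n,m:N} > \epsilon\right) &= \frac{\phi(N)}{n}\log P\left(e^{tQ_{n,m:N}} > e^{\epsilon t\frac{n}{\phi(N)}}\right) \\
&\le \frac{\phi(N)}{n}\log\left(e^{-\epsilon t\frac{n}{\phi(N)}} E\left(e^{tQ_{n,m:N}}\right)\right) \\
&= -\epsilon t - t\phi(N)\bar F(a_N x + b_N) + \frac{\phi(N)}{n}\log E\left(e^{tn\bar F_n(a_N x + b_N)}\right) \\
&= -\epsilon t - t\phi(N)\bar F(\alpha_{\phi(N)} x + \beta_{\phi(N)}) + \log\left(M_N(t)\right)^{\phi(N)} = L_n(\epsilon, t) \\
&\xrightarrow[n]{} -\epsilon t - tU_{i,\alpha}(x) + \log e^{U_{i,\alpha}(x)(e^t - 1)} \\
&= -\epsilon t + U_{i,\alpha}(x)(e^t - t - 1) = L(t, \epsilon).
\end{aligned} \tag{3.63}
\]
It is easy to verify that the function $L(t, \epsilon)$ attains its minimum value at $t_0 = \log\left(\frac{U_{i,\alpha}(x) + \epsilon}{U_{i,\alpha}(x)}\right) > 0$. Let $L(\epsilon) = L(t_0, \epsilon)$. Then, $L(0+) = 0$ and $\frac{d}{d\epsilon}L(\epsilon) = -t_0 < 0$, $\forall \epsilon > 0$. On the other hand, we note that $L_n(\epsilon) = L_n(\epsilon, t_0) \to L(\epsilon, t_0) = L(\epsilon)$. Let $t = t_0$ in (3.63). Then,
\[
\sum_{n=1}^{\infty} P\left(\frac{\phi(N)}{n} Q_{n,m:N} > \epsilon\right) = \sum_{n=1}^{\infty} e^{\log P\left(\frac{\phi(N)}{n} Q_{n,m:N} > \epsilon\right)} \le \sum_{n=1}^{\infty} e^{\frac{n}{\phi(N)} L_n(\epsilon)}.
\]
Given $\epsilon > 0$, there exists $\delta = \delta(\epsilon) > 0$ such that $L(\epsilon) + \delta < 0$ and there exists $n' = n'(\epsilon) \in \mathbb{N}$, for which $L_n(\epsilon) < L(\epsilon) + \delta$, $\forall n \ge n'$. Therefore,
\[
\sum_{n=1}^{\infty} e^{\frac{n}{\phi(N)} L_n(\epsilon)} = \left(\sum_{n=1}^{n'-1} + \sum_{n=n'}^{\infty}\right) e^{\frac{n}{\phi(N)} L_n(\epsilon)} \le \sum_{n=1}^{n'-1} e^{\frac{n}{\phi(N)} L_n(\epsilon)} + \sum_{n=n'}^{\infty} e^{\frac{n}{\phi(N)}(L(\epsilon) + \delta)} < \infty,
\]
by the assumption of the second part of Theorem 3.19. Thus, we have
\[
\sum_{n=1}^{\infty} P\left(\frac{\phi(N)}{n} Q_{n,m:N} > \epsilon\right) < \infty, \quad \forall \epsilon > 0.
\]
By a similar reasoning we can show that
\[
\sum_{n=1}^{\infty} P\left(\frac{\phi(N)}{n} Q_{n,m:N} < -\epsilon\right) < \infty, \quad \forall \epsilon > 0.
\]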

This completes the proof of Theorem 3.19.

Suppose now the normalizing constants $a_n$ and $b_n$ are unknown. Let $\hat a_N$ and $\hat b_N$ be the estimators of $a_n$ and $b_n$, respectively, based on $X_n$. Let the bootstrap distribution of $\Phi^{(m,k)}_{n-r+1:n}(a_n x + b_n)$, with the estimated normalizing constants, be
\[
\hat H^{(m,k)}_{r,n,N}(x) = P\left(\frac{Y_{N-r+1:N} - \hat b_N}{\hat a_N} \le x \,\Big|\, X_n\right).
\]
The next theorem shows that the same choice of $N(n)$ as in the case of known normalizing constants, i.e., as in Theorem 3.19, gives the same consistency result for $\hat H^{(m,k)}_{r,n,N}(x)$, if $\hat a_N$ and $\hat b_N$ are correctly chosen.

Theorem 3.20 Let $k = \left[\frac{n}{\phi(N)}\right]$ and $k' = \left[\frac{n}{e\phi(N)}\right]$. Furthermore, let the assumptions of Theorem 3.19 be satisfied. Then, the bootstrap distribution $\hat H^{(m,k)}_{r,n,N}(x)$ satisfies the relations (3.55) and (3.56) with
\[
\Phi^{(m,k)}_r(x) =
\begin{cases}
1 - \Gamma_{r'}(U^{m+1}_{1,\alpha}(x)), & \text{if } \hat a_N = F_n^{-1}\left(1 - \frac{1}{e\phi(N)}\right) - F_n^{-1}\left(1 - \frac{1}{\phi(N)}\right) = X_{n-k':n} - X_{n-k:n} \\
 & \text{and } \hat b_N = F_n^{-1}\left(1 - \frac{1}{\phi(N)}\right) = X_{n-k:n}, \\[4pt]
1 - \Gamma_{r'}(U^{m+1}_{2,\alpha}(x)), & \text{if } \hat a_N = F_n^{-1}\left(1 - \frac{1}{\phi(N)}\right) = X_{n-k:n} \text{ and } \hat b_N = 0, \\[4pt]
1 - \Gamma_{r'}(U^{m+1}_{3,\alpha}(x)), & \text{if } \hat a_N = \omega_n - F_n^{-1}\left(1 - \frac{1}{\phi(N)}\right) = X_{n:n} - X_{n-k:n} \\
 & \text{and } \hat b_N = \omega_n = X_{n:n},
\end{cases}
\]
where $\omega_n = \sup\{x : F_n(x) < 1\}$.

Proof In view of the definition of the normalizing constants, which is given in Theorem 2.71, we can see that the proof of Theorem 3.20 is similar to the case of maximum order statistics under linear normalization, which is given in Athreya and Fukuchi (1993).
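As an illustration of how such estimated constants are computed from data, the sketch below evaluates the pair of estimators built from the spacing of two upper sample quantiles, $X_{n-k':n} - X_{n-k:n}$ and $X_{n-k:n}$. The function name `quantile_spacing_constants`, the Exp(1) test population, the value $\phi(N) = 100$ and the seed are assumptions made here for the example; they are not part of the theorem.

```python
import math
import random


def quantile_spacing_constants(sample, phi_N):
    """Return (a_hat_N, b_hat_N) = (X_{n-k':n} - X_{n-k:n}, X_{n-k:n}).

    Hypothetical helper; it assumes 1 < phi_N < n so that both ranks exist.
    """
    xs = sorted(sample)
    n = len(xs)
    k = int(n / phi_N)                      # k  = [n / phi(N)]
    k_prime = int(n / (math.e * phi_N))     # k' = [n / (e * phi(N))]
    b_hat = xs[n - k - 1]                   # X_{n-k:n} (0-based index n-k-1)
    a_hat = xs[n - k_prime - 1] - b_hat     # X_{n-k':n} - X_{n-k:n}
    return a_hat, b_hat


rng = random.Random(320)
sample = [rng.expovariate(1.0) for _ in range(10000)]
a_hat, b_hat = quantile_spacing_constants(sample, phi_N=100.0)
print(round(a_hat, 3), round(b_hat, 3))
```

For an Exp(1) population the corresponding quantiles are $\log \phi(N)$ and $\log(e\phi(N))$, so in this example the estimates should be near $\log 100 \approx 4.6$ for $\hat b_N$ and near 1 for $\hat a_N$.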

4 Statistical modelling of extreme value data under linear normalization

This chapter focuses primarily on case studies and associated data analytic methods in the extreme value theory under linear normalization. In addition, this chapter provides an up-to-date review of that topic to the readership. More precisely, the BM and POT methods are used to evaluate and compare the measurements of some pollutants in two cities, Zagazig and 10th of Ramadan of the Al Sharqiya governorate, in Egypt. The simulation study is used to choose the threshold of GPDL. Furthermore, the ML method is used to estimate the parameters of the models. These estimates are improved by using the bootstrap technique. The data treatment and the methods used may be applied to other pollutants in other regions in any country (see Khaled, 2012).

4.1 The National Environmental Radiation Monitoring Network (NERMN) in Egypt

After the Chernobyl accident, it became clear that radioactive pollution could take place in other countries apart from those where the accident took place, as radioactive pollutants can cross borders and affect different countries. As Egypt is surrounded by many nuclear facilities from the east in Israel, from the west in Libya, and from the north in the European countries and especially from the Mediterranean countries, accidents could have effects on its environment. Also, the wide use of radioactive isotopes in modern technology in medical fields, industrial fields, agricultural fields, and other research activities makes contamination of the environment probable. In addition, due to the existence of a 2MW nuclear research reactor and a new 22MW research reactor in Egypt, continuous monitoring of adjacent areas is needed. Also, continuous planning for a power reactor for peaceful purposes necessitates the continuous monitoring of the environment many years before building the reactor to know the normal background and compare it continuously with radiation levels after the operation; this will identify any releases that were too small to be detected on their initial release but that could be traced after some accumulation, and will allow countermeasures to be taken. In addition, ships transporting radioactive materials through the Suez Canal and the use of nuclear warships and submarines, which run in the Mediterranean and Red Seas, make it necessary to monitor the environment, especially water, for radioactive pollutants. For these reasons, the Egyptian government took steps to set up a national response plan to deal with internal and external accidents. The key elements of the plan are the establishment of the Nuclear Emergency Response System (NERS) and NERMN. NERMN will detect radioactivity resulting from any accident affecting the Egyptian territory, even if it is not formally reported under international agreements or if there are delays in notification. Thereafter, the system provides the means of assembling and analyzing the radiological monitoring data related to the accident.

4.2 Environmental monitoring

Environmental monitoring design deals mainly with two quite different sorts of design problems: monitoring for a trend, where spatial and temporal dependence are of importance, and monitoring for “hot spots,” or regions of local high intensity, which is often used for monitoring compliance with pollution regulations. The basic theory of optimal design for spatial random fields was outlined in Ripley (1981), Chapter 3. Among the popular designs are systematic random sampling designs, in which a point is chosen uniformly over the study area, and a regular design (consisting of squares, triangles, or hexagons) is put down starting at the chosen point. When the sample mean is used to estimate the spatial mean of an isotropic random field over a region, the regular sampling plans are most efficient (cf. Matérn, 1960, Chapter 5). The hexagonal design requires fewer sampling sites than the square or triangular one to cover the same area, but does not take into account spatial covariance heterogeneity or temporal non-stationarity. Guttorp et al. (1993) developed an approach to network design, which can deal with heterogeneous random fields. The basic idea is to consider a number of potential monitoring sites, some of which are gauged and some ungauged. In a multivariate normal setting, the design maximizes the amount of information about the ungauged sites that can be obtained from the gauged sites. This can be particularly useful when trying to redesign a current network, by adding and removing stations. It is frequently the case that data from a monitoring network will serve more than one purpose:

1. Primary pollutants consist of materials that enter the atmosphere through natural and human-made events, such as nitrogen dioxide and hydrocarbons, which remain in the same chemical form as when they were released.
2. Secondary pollutants consist of primary pollutants that have reacted with each other or with the basic components of the atmosphere to form new toxic substances, such as ozone.

4.3 Chemical pollutants The assessment of air quality is presently being linked to levels of the air pollution and to population distribution. For the protection of health, the concentrations of selected harmful air pollutants should be limited and related to given ambient air quality standards. Several investigations have been performed by the Egyptian Environmental Affairs Agency (EEAA) to estimate the impact on human health from various air pollutants. Air quality limit values are given in the Executive Regulations of Environmental Law no. 4 of Egypt, 1994.

4.3.1 Particulate matter

Particulate matter (PM) may be divided into two broad classes depending upon the manner in which it is introduced into the atmosphere. A primary component consists of those particles released directly from their source, while a secondary component consists of those particles created in the atmosphere via chemical reactions between pollutants that were originally emitted as gases. There are numerous examples of secondary particulate compounds that can form in the atmosphere, but only a few of these compounds are present in large enough concentrations to contribute significantly to particulate air pollution; in general, secondary particles are dominated by sulphates $\mathrm{SO}_4^{2-}$ and nitrates $\mathrm{NO}_3^{-}$ (see Schlesinger and Cassee, 2003). Epidemiological studies provide fairly convincing evidence for an association between exposure to ambient PM and increased mortality and morbidity, particularly among those people with respiratory or cardiovascular diseases (see Krewski et al., 2004). The assessment of human exposure to multiple compounds and the modelling of risks for constantly changing mixtures of chemicals, like PM, present an enormous challenge. The simplest approach is to measure the toxicological activity of the total mass. For example, the Salmonella assay has been a convenient and usable assay to compare genotoxic activity by composition, location, meteorological conditions, sources, and other modifying conditions. The associated mutagenic compounds of PM could be hydrocarbons, oxygen-containing compounds, nitrogen-containing compounds, sulphur-containing compounds, halogen-containing compounds, or organometallic compounds. The PM polar organic fraction, which would contain nitroaromatic compounds, aromatic amines, and aromatic ketones, consistently contributes the highest percentage of mutagenicity. The various studies in the literature used volatile organic solvents to remove the chemical matrix from the solid underlying matrix; acetone and dichloromethane are commonly used solvents in mutagenic research (see Claxton et al., 2004).

PM exposure may be the largest potential health problem from air pollution in all European areas. PM10 (particle diameter less than 10 μm, often called “inhalable particles”) concentration is at present the legislative standard for PM in Egypt. The air quality guidelines, published by the World Health Organization (WHO), represent the most widely agreed and up-to-date assessment of health effects of air pollution, recommending targets for air quality at which the health risks are significantly reduced. By reducing particulate matter (PM10) pollution from 70 to 20 μg/m³, we can cut air-quality-related deaths by around 15%. By reducing air pollution levels, we can help countries to reduce the global burden of disease from respiratory infections, heart disease, and lung cancer. The WHO guidelines provide interim targets for countries that still have very high levels of air pollution to encourage the gradual cutting down of emissions. These interim targets are: a maximum of three days a year with up to 150 micrograms of PM10 per cubic metre (for short-term peaks of air pollution) and 70 μg/m³ for long-term exposures to PM10. More than half of the burden from air pollution on human health is borne by people in developing countries. In many cities, the average annual levels of PM10 (the main source of which is the burning of fossil fuels) exceed 70 μg/m³. The guidelines say that, to prevent ill health, those levels should be lower than 20 micrograms per cubic metre.

4.3.2 Sulphur dioxide Sulphur dioxide (SO2 ) is formed by the oxidation of sulphur impurities in fuels during combustion processes. A very high proportion of SO2 emissions originate from power stations and industrial sources. Though virtually no


SO2 is emitted from petrol engine vehicles, it is emitted from diesels, which leads to an increase of SO2 concentrations in urban and roadside areas. The diurnal pattern of SO2 is characterized by the maximum concentration in the morning. The higher levels of SO2 during morning hours in Cairo are due to a combination of anthropogenic emissions, boundary layer processes, chemistry, and the local surface wind patterns. During night hours, the boundary layer descends and remains low till early morning; thereby, this action resists the mixing of the anthropogenic emissions with the upper layer. Hence, pollutants get trapped in the shallow surface layer and show higher levels (see El-Dars et al., 2004). During these hours of maximum concentrations of SO2 , anthropogenic emissions are also prominent due to rush hours. The monthly variation in the average of SO2 concentrations ranges from a minimum value of about 20 μg/m3 during January to a high value of about 55 μg/m3 during December (see El-Hussainy and Sharobiem, 2002). 4.3.3 Ozone In nature, ozone is usually found as a result of electric shocks produced in storms and in the upper layers of the atmosphere, as a result of the action of ultraviolet rays on the molecules of dioxygen. In the earth’s atmosphere, it is usually concentrated in the stratosphere, forming a protective layer called the ozone layer (stratospheric ozone). This layer protects people from shortwavelength ionizing radiation. Ozone can also be found concentrated in the lower layers of the atmosphere (tropospheric ozone), where it has become one of the most common pollutants in urban areas, with clear adverse effects on human health. The ozone layer (stratospheric ozone) is an effective filter that protects all life from harmful ultraviolet radiation, whose effects cause changes in the DNA of living organisms, as well as adversely affecting the photosynthesis in plants and phytoplankton. 
Hence, it is important that the processes of stratospheric ozone formation and depletion are not affected by human action. The destruction of the stratospheric ozone, with its potential effects on the increase of the UV-B radiation at ground level, represents a characteristic of the atmosphere at a global scale. The causes of ozone destruction are fairly well understood and there are many studies on this topic. The anthropogenic emissions of certain chemicals are largely responsible. Based on extensive scientific evidence, international agreements have restricted the use of these ozone-destroying chemicals, particularly the chlorofluorocarbons. As a result of these agreements, concentrations of these compounds in the atmosphere have already stabilized, giving hope for future closure of the ozone


hole. The amount of ozone in the atmosphere is often expressed in Dobson Units (DU), defined as the amount equivalent to $2.7 \times 10^{20}$ ozone molecules per square metre. The average value for the earth's surface is 300 DU. The normally available DU values refer to the amount of ozone measured in a graduated column and these values change depending on latitude, climate, or season. Considering this quantity measured through time in a particular geographical area, a time series can be formulated and analyzed using statistical methods to obtain reliable predictions. The corresponding time series (e.g., the mean monthly ozone concentrations) is complex and nonlinear, cf. Monge-Sanz and Medrano-Marqués (2004). Moreover, different types of modelling, like ARIMA (Autoregressive Integrated Moving Average) models (see Box et al., 2015), can be used for this aim (cf. Rieder et al., 2010). In the troposphere, ozone has become a first-order contaminant. It is a secondary pollutant produced from the reaction between nitrogen dioxide (NO2), hydrocarbons, and sunlight. High levels of O3 can produce harmful effects on human health and the environment in general. The WHO, in the 2005 global update of its quality guidelines (WHO, 2006), reduced the guideline levels for ozone from 120 μg/m³ (8-h daily average) given in its second edition (WHO, 2000) to 100 μg/m³ for a daily maximum 8-h mean. Ozone levels above this value can possibly produce health problems. These problems depend on the sensitivity of each individual and on the type of exposure. Moreover, these problems range from slight disabilities to permanent damage. Therefore, controlling high levels of ozone, as well as other harmful sources which can cause global warming and serious environmental problems, is an important task for governments.

4.3.4 Ambient gamma radiation

Instability of nuclei takes place under one of several conditions. Instability leads to a decay or a series of decay steps ending in a stable state and each condition of instability leads to a few different types of decay. Decay takes place among the heavy elements by the emission of an α-particle, which is the nucleus of $^4\mathrm{He}$. This decay carries the element down on a mass scale by 4 units and a charge scale by 2 units. These α-emitters occur naturally and most of them have long lives. The α-particles have a strong interaction with matter and very small penetrating power. The β-decay can occur by emitting either a negative or a positive electron. The result of such decay is the transformation of one neutron into a proton or vice versa, respectively. This carries the element one step on the charge scale. β-particles are not highly penetrating because of their interaction with matter. Another form of decay, which is very closely related to β-decay, takes place through electron capture by the nucleus from the inner atomic shells. γ-decay usually accompanies other forms of decay. It results from the presence of extra energy within the nucleus and it is a form of short-wave electromagnetic radiation. This radiation, due to its extremely short wavelength, behaves like particles according to the quantum theory of radiation. These particles are sometimes called photons. Photons have high penetrating power because they do not interact strongly with matter. The data give the gamma radiation dose rate, which is measured in microsieverts per hour (μSv/h) and recorded every fifteen minutes.

4.4 Collected data

The main source for this chapter is the findings of a 2009 project between Zagazig University and the National Center for Nuclear Safety and Radiation Control (NCNSRC), entitled “Order Statistics and Modelling Study of Pollution Episodes over Al Sharqiya Governorate.” This project studied air pollution over the Al Sharqiya governorate, particularly in the cities of Zagazig and 10th of Ramadan. The governorate’s geographic location and its industrial and population development make it vulnerable to the problems caused by atmospheric pollutants. Because of this situation and the governorate’s location at latitude 30° N, with more than 340 days of sunshine per year, it is expected that high levels of photochemical smog occur in the area today, with higher levels predicted for the near future. We can summarize the project through its objectives, outcomes, and beneficiaries as follows:

Objectives: The objectives of this project are to use statistical modelling to assess pollution and to help environmental decision-making in the Al Sharqiya governorate to control air pollution.

Outcomes: The overall objective of this project is to support environmental decision-making in the Al Sharqiya governorate to control air and water pollution. The project focuses on the following major components:
1. Detailed analysis of available emission data, ambient concentrations, and meteorological data during high-pollution episodes in urban locations in the Al Sharqiya governorate.
2. The use of the classical extreme value model under linear normalizing constants and the modern extreme value model under power normalizing constants (in this chapter we confine our discussion to the classical extreme value model) to simulate the spatial and temporal distributions of different pollutants in the Al Sharqiya governorate.
3. The use of models and measurements to estimate the emission controls needed to control pollution in the Al Sharqiya governorate.

Beneficiaries: Through its use of complementary observational and modelling analysis to study pollution episodes in the Al Sharqiya governorate, the research project provides valuable guidance to regulatory environmental agencies in Egypt. Additionally, this study helps the private sector in the governorate to develop its own voluntary measures to mitigate the pollution problem. The results, analysis, and modelling tools developed under this project are disseminated through active communication with scientists and engineers from the private sector and academia, and will also be available on the internet.

Air pollutants such as SO2, NOx, O3, CO, PM10, non-methane hydrocarbons (NMHCs), and gamma radiation are collected by the EEAA and the Egyptian Radiation and Environmental Monitoring Network through automatic monitoring stations, see Figure 4.1. In this project, we focus on SO2, O3, PM10, and ambient gamma radiation, see Figures 4.2–4.8. Devices have been installed to monitor these pollutants in different places in the cities of Zagazig and 10th of Ramadan. The first city is one of the largest industrial cities in Egypt and the second is one of the most populous. The locations of these devices were selected very carefully by experts in site identification and environmental measurement. Moreover, the devices were calibrated by an expert in the calibration of environmental monitoring devices. The data for these pollutants were recorded each hour, on the hour, twenty-four hours a day throughout 2009 for the two cities, except for ozone and ambient gamma radiation, which were recorded every 30 and 15 minutes, respectively. Moreover, the study of ozone was restricted to 10th of Ramadan, while the study of the gamma radiation level was restricted to Zagazig.


Statistical modelling of extreme value data under linear normalization

Figure 4.1 Mobile gas monitoring station used to monitor pollution over the course of a full year and provided by NCNSRC

Figure 4.2 Hourly average of particulate matter concentration for 10th of Ramadan


Figure 4.3 Hourly average of particulate matter concentration for 10th of Ramadan

Figure 4.4 Hourly average sulphur dioxide concentration for 10th of Ramadan


Figure 4.5 Hourly average sulphur dioxide concentration for Zagazig

Figure 4.6 Thirty-minute average ozone concentration for 10th of Ramadan

Figure 4.7 Fifteen-minute average gamma radiation level for Zagazig


Figure 4.8 Fifteen-minute average gamma radiation level for Zagazig


4.5 Data treatments and simulation study

In this section, the BM and POT methods are used to model the air pollution in two cities in Egypt. A simulation technique is suggested for choosing a suitable threshold value. The validity of the full bootstrap technique for improving the parameter estimates in extreme value models is checked by the K-S test. An efficient approach for the modelling of extreme values is also suggested; it can convert any ordered data to enlarged block data by using a sub-sample bootstrap. Although this study was applied to three pollutants in two cities in Egypt, the applied methods can be extended to any other pollutants and to any other place in the world. The results of this section are quoted from Barakat et al. (2011a, 2012, 2014a, 2015a).

4.5.1 Mathematical models

The traditional method of analyzing extreme values is based on the extreme value limiting distributions defined in (2.2), which are often used to model natural phenomena such as sea levels, river heights, rainfall, and air pollution. As we have seen at the end of Section 2.2, there are two main methods for modelling the data sets that arise from these phenomena: the BM and POT methods. The BM method depends on the GEVL, which is defined in (2.4). Apart from a change of origin (the location parameter $\mu$) and a change in the unit on the $x$-axis (the scale parameter $\sigma > 0$), the GEVL yields the three extreme value distributions defined by (2.2). In this case, any suitable standard statistical methodology from parametric estimation theory can be utilized to derive estimates of the parameters $\mu$, $\sigma$, and $\gamma$. In this chapter, we use the ML method and improve the obtained estimates by the bootstrap technique.

The classic bootstrap approach uses Monte Carlo simulation to generate an empirical estimate of the sampling distribution of the statistic by randomly drawing a large number of samples of the same size $n$ from the data, where $n$ is the size of the sample under consideration. Therefore, the classic bootstrap is a way of finding the sampling distribution, at least approximately, from just one sample. Here is the procedure:

Step 1: Re-sampling. A sampling distribution is based on many random samples from the population. In place of many samples from the population, create many re-samples by repeatedly sampling with replacement from this one random sample. Thus, generate bootstrap samples by sampling with replacement from the original sample, using the same sample size.


Step 2: Bootstrap distribution. Compute the statistic of interest, called the bootstrap statistic, for each of the bootstrap samples. Collecting these statistics over many bootstrap samples gives the bootstrap distribution.

The BM approach is adopted whenever the data set consists of maxima of independent samples. In practice, some blocks may contain several of the largest observations, while other blocks may contain none, so important information may be lost. Moreover, when only a small number of observations is available, the BM method cannot actually be implemented. For all these reasons, the BM method may seem restrictive and not very realistic. In our study, we used this method to get a preliminary result, which helps us simulate data with the same nature as the real data.

An alternative approach for determining the type of asymptotic distribution for extremes, the POT method (see Section 2.2), is based on the concept of the GPDL defined in (2.8). This approach is used to model data arising as independent threshold exceedances. Actually, the POT method is based on the fact that the conditional DF $F^{[u]}(x+u) = P(X < x+u \mid X > u)$ may be approximated for large $u$ by the family (cf. Balkema and de Haan, 1974, see also Section 2.2)
\[ W_\gamma(x;\bar\sigma) = 1 - \left(1 + \gamma\,\frac{x}{\bar\sigma}\right)^{-1/\gamma}, \]
provided that the DF of the BM converges weakly to the limit $G_\gamma$. In this case, we have $\bar\sigma = \sigma - \gamma\mu$ (see Reiss and Thomas, 2007). This family is connected with the GEVL by the simple relationship $W_\gamma(x;\sigma) = 1 + \log G_\gamma(x;0,\sigma)$, $\log G_\gamma(x;0,\sigma) > -1$. Moreover, the left truncated GPDL again yields a GPDL (see Section 2.2). Namely:
\[ W_\gamma^{[c]}(x;\sigma^*) = W_\gamma(x;\bar\sigma), \tag{4.1} \]
where $\sigma^* = \bar\sigma + \gamma c$.

Possibly the most important issue in the statistical modelling of threshold exceedance data is the choice of the threshold $u$. Is the chosen threshold high enough? The threshold should be high enough to justify the assumptions of the model but low enough to capture a reasonable number of observations. A threshold choice based on the observed sample is required to balance these two opposing demands. In this chapter, we use a simulation technique to choose a suitable threshold value. Let $\gamma_0$, $\sigma_0$, and $\mu_0$ be the preliminary estimates of the parameters $\gamma$, $\sigma$, and $\mu$, respectively (obtained by the BM method). Then, simulate data with the same size $n$ as the realistic collected data from the GPDL $W^{[c]}_{\gamma_0}(x;\sigma_0^*)$, with $c = \min\{x_1, x_2, ..., x_n\}$, where $\{x_1, x_2, ..., x_n\}$ is the realistic data set (this choice of $c$ guarantees that the simulated and realistic data have nearly the


same range) and $\sigma_0^* = \sigma_0 + \gamma_0(c - \mu_0)$ (by using equation (4.1)). In view of the POT stability property of the GPDL, the simulated data will have the same nature as the realistic collected data. Moreover, the exceedances over any threshold $u$ of the simulated data follow the GPDL with the same shape parameter. Therefore, we choose the value of $u$ that gives the best estimate of the known shape parameter. Finally, we take this value of $u$ as a suitable threshold for our real data. All the models described so far can be fitted by the method of ML. Actually, the log-likelihood function of the GEVL is given by Example 1.15 and equation (1.2). Also, the log-likelihood function for the GPDL, defined in (2.8), is given by
\[ l^*(x;\bar\sigma,\gamma) = -k\log\bar\sigma - \left(1 + \frac{1}{\gamma}\right)\sum_{i=1}^{k}\log\left(1 + \frac{\gamma x_i}{\bar\sigma}\right), \tag{4.2} \]
where $k$ is the number of POT.

Sub-sample bootstrap technique

Although the bootstrap has been widely used in many areas, the method has its limitations for extremes. It was shown that in some cases a full-sample bootstrap does not work for extremes (see Chapter 3). Namely, assume $Y_j$, $j = 1, 2, ..., m$, where $m = m(n) \to \infty$ as $n \to \infty$, are conditionally iid RVs with
\[ P(Y_1 = X_j \mid \underline{X}_n) = \frac{1}{n}, \quad j = 1, 2, ..., n, \]
where $\underline{X}_n = (X_1, X_2, ..., X_n)$ is a random sample of size $n$ from the unknown DF $F$. A full-sample bootstrap is the case when $m = n$. In contrast, a sub-sample bootstrap is the case when $m < n$. If the DF of the BM converges to the limit $G_\gamma$, the bootstrap technique suggests an efficient estimate for the GEVL by using the BM method, even if the data set does not consist of blocks (in this case the bootstrap replicates of size $m$, from $F_n$, are treated as blocks). To apply the suggested technique, we have to choose a suitable value of $m$ (i.e., the size of the bootstrap replicates, or the block size). Actually, the suitable choice of the value $m$ is the cornerstone of this technique. This value should be small enough to satisfy the stipulation $m = o(n/\log n)$ (cf. Chapter 3) and at the same time large enough to satisfy the stipulation $m \to \infty$ as $n \to \infty$. To determine a suitable value of $m$, we first simulate a sample with the same size as the realistic data set from the known GEVL $G_{\gamma_0}(\cdot;\mu_0,\sigma_0)$. Then, put $n/\log n$ in the form $a(10)^b + c$, where $a$, $b$ and $c$ are integers such that $1 \le a < 10$, $0 \le c \le (10)^{b-1}$. Thus, in view of the above two stipulations, we can take $m \approx \hat m = a(10)^{b-1}$. Consequently, to choose a suitable value of $m$, we select a value from an appropriate discrete


neighbourhood of $\hat m$ (see Example 4.1) that gives the best estimate $\hat\gamma_0$ for the shape parameter $\gamma_0$. The estimate $\hat\gamma_0$ is obtained by drawing, from each of the original samples, a large number of bootstrap replicates (each of size $m$) and determining the corresponding maxima. Then, we use these maxima, as a sample drawn from the parametric $G_{\gamma_0}$, to estimate the shape parameter $\gamma_0$ by the ML method.

Example 4.1 Suppose we have $n = 20000$; then $a = 2$, $b = 3$ and $c = 19.490588$. Consequently, $\hat m = 200$. In this case, we can select a suitable value of $m$ from the discrete neighbourhood {100, 150, 200, 250, 300} that gives the best estimate $\hat\gamma_0$ compared with the other values in the neighbourhood, provided that this value does not equal 100 or 300. Otherwise, we should enlarge this neighbourhood.

4.6 Data treatments

In this section, we will answer the following three questions:
• Did the bootstrap improve the estimation of the parameters of the extreme models?
• How can we choose a suitable POT number for each pollutant?
• How can we choose the re-sample size $m$?

To answer the first question, we use the observed maximum values over 365 blocks (daily maxima through one year) for each pollutant and estimate the shape, scale, and location parameters of $G_\gamma$ (see Table 4.1). We then apply the full bootstrap 50000 times to the data and again estimate the same parameters for each pollutant (see Table 4.2). To assess the fit to the real data, we use the K-S test and calculate its outputs H, P, KSSTAT, and CV, with and without the bootstrap (see Table 4.3). In the case “without bootstrap,” Table 4.3 shows that we do not have a good fit for SO2 in Zagazig or for PM10 in 10th of Ramadan, where H = 1, KSSTAT > CV, and P ≤ the significance level. On the other hand, in the case “with bootstrap” we have a good fit for both pollutants in the two cities. Moreover, the maximum distances between the fitted curve and the given data (KSSTAT) in the case “with bootstrap” are less than those in the case “without bootstrap,” see Figures 4.9–4.14 (Figures 4.9–4.14 compare the empirical GEVL and $G_{\gamma_0}(\cdot;\mu_0,\sigma_0)$ curves for all pollutants after the bootstrap). Therefore, the bootstrap works to improve the parameter estimation.

To answer the second question, we generate 2000 random samples, each of which has the same size $n$ (say) as the realistic data of the pollutant under


consideration, from the GPDL $W^{[c]}_{\gamma_0}(\cdot;\sigma_0^*)$, see Tables 4.4 and 4.5 (see also Subsection 4.5.1). Note that the size of the generated samples is actually less than 365 × 24 = 8760 for SO2 and PM10, or 365 × 48 = 17520 for O3; this is due to the inactivation and maintenance of the monitoring devices in some hours on some days. In view of the imposed stipulations on the threshold $u$ (and consequently on the number of POT $k$), we vary the number of POT $k$ over the values $[n/20], [n/19], ..., [n/4]$, where $[\theta]$ is the integer part of $\theta$, see Tables 4.4 and 4.5. Actually, we report only 7 values of $k$ in Tables 4.4 and 4.5, including $[n/20]$ and the best value. Then, we look for the value of $k$ (or $u$) that gives the best estimate $\hat\gamma_0$ of the shape parameter (its true value $\gamma_0$ is known), where the estimate $\hat\gamma_0$ here is the mean value of 2000 estimates. When two values of $k$ give the same best mean estimate, we choose between them by using the coefficient of variation (C.V). For example, in the case of SO2 in 10th of Ramadan in Table 4.4, we see that the values $k = 2047$ and $k = 2132$ give the same best estimate $\hat\gamma_0 = 0.0987$ (the true value is $\gamma_0 = 0.1$). Since the second value corresponds to C.V = 1.389, which is less than the C.V = 1.4044 of the first value, we choose the second value, i.e., the suitable number of POT is $k^\star = 2132$. In this case, the corresponding threshold $u$ is the upper quantile of order $[\lambda n] = [0.7500586n] = 6397$ (note that $\lambda \approx \frac{n-k^\star}{n} = \frac{8530-2132}{8530}$). Now, by using the suitable threshold values determined from Tables 4.4 and 4.5, we can apply the POT method to the realistic data for each pollutant to determine its extreme value model, see Table 4.8. Finally, we apply the full bootstrap technique (50000 times) to improve the obtained estimates, see Table 4.9.
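The simulation used above to choose the number of POT can be sketched in a few lines. The following is only a minimal illustration under stated assumptions, not the authors' implementation: the shape is estimated by a crude grid search over the profile likelihood of the GPDL (with the auxiliary variable theta = γ/σ), the sample size and number of repetitions are much smaller than the 2000 samples used in the text, the C.V is taken as the standard deviation divided by the absolute mean, and all function names are hypothetical. The parameter values 0.1, 31.5 and 2.5 are those reported for SO2 in 10th of Ramadan in Table 4.4.

```python
import math
import random
import statistics


def gpd_sample(gamma, sigma, loc, n, rng):
    """Inverse-transform sample from a GPD with shape gamma != 0, starting at loc."""
    return [loc + sigma / gamma * ((1.0 - rng.random()) ** (-gamma) - 1.0)
            for _ in range(n)]


def profile_loglik(theta, xs):
    """GPD log-likelihood profiled over gamma, where theta = gamma / sigma."""
    k = len(xs)
    gamma = sum(math.log1p(theta * x) for x in xs) / k  # maximizer for fixed theta
    return -k * math.log(gamma / theta) - k * gamma - k, gamma


def fit_gpd_shape(xs, grid=150):
    """Crude ML estimate of the shape: grid search over theta on both signs."""
    xmax = max(xs)
    ts = [10.0 ** (-3 + 5 * i / (grid - 1)) for i in range(grid)]
    # negative theta must keep 1 + theta * x > 0 for every exceedance
    thetas = [t / xmax for t in ts] + [-t / xmax for t in ts if t < 0.99]
    best = max(thetas, key=lambda th: profile_loglik(th, xs)[0])
    return profile_loglik(best, xs)[1]


def choose_k(gamma0, sigma_star, c, n, ks, reps, seed=1):
    """Mean and C.V of the shape estimate for each candidate number of POT k."""
    rng = random.Random(seed)
    estimates = {k: [] for k in ks}
    for _ in range(reps):
        xs = sorted(gpd_sample(gamma0, sigma_star, c, n, rng))
        for k in ks:
            u = xs[n - k - 1]  # threshold: the (n - k)th order statistic
            estimates[k].append(fit_gpd_shape([x - u for x in xs[n - k:]]))
    summary = {}
    for k, vals in estimates.items():
        mean = statistics.fmean(vals)
        summary[k] = (mean, statistics.stdev(vals) / abs(mean))
    # best k: mean estimate closest to gamma0 (to 4 decimals), ties broken by C.V
    best = min(ks, key=lambda k: (round(abs(summary[k][0] - gamma0), 4),
                                  summary[k][1]))
    return summary, best


# Small illustrative run (sizes reduced so that it finishes quickly).
summary, best_k = choose_k(0.1, 31.5, 2.5, n=1000, ks=[50, 100, 250], reps=20)
for k, (mean, cv) in summary.items():
    print(k, round(mean, 4), round(cv, 3))
print("selected k:", best_k)
```

With the full settings of the text, `ks` would run over $[n/20], ..., [n/4]$ and `reps` would be 2000; a dedicated optimizer would then be preferable to the grid search used here.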
To answer the third question, we generate 2000 random samples, each of which has the same size $n$ as the realistic data of the pollutant under consideration, from the GEVL $G_{\gamma_0}(\cdot;\mu_0,\sigma_0)$, see Tables 4.6 and 4.7. We determine, for each pollutant, the value $\hat m = a(10)^{b-1}$. We can see that $\hat m = 90$ for SO2 and PM10, while $\hat m = 170$ for O3. By checking the discrete neighbourhood {60, 70, 80, 90, 100, 110, 120}, we find that the best value of $m$ (according to the method given in Subsection 4.5.1) is the lowest value, 60. Thus, we consider a new discrete neighbourhood {20, 30, 50, 60}, which yields the value $m = 30$, see Tables 4.6 and 4.7. In a similar way, for O3 we check the discrete neighbourhoods {110, 130, 150, 170, 190, 210}, {60, 70, 80, 90, 100, 110}, and {20, 30, 50, 60}. The last neighbourhood gives the value $m = 30$. Therefore, for all pollutants the value 30 is a suitable value of $m$. We take this value and apply the sub-sample bootstrap technique to the realistic data to get more suitable extreme value models for these pollutants (as shown in Subsection 4.5.1), see Table 4.10.
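The two mechanical steps behind the choice of $m$, namely the decomposition of $n/\log n$ and the drawing of sub-sample maxima, can be sketched as follows. This is a minimal illustration, assuming the natural logarithm and the single-digit rule $1 \le a < 10$ of Subsection 4.5.1; the function names and the stand-in data are hypothetical, and the subsequent ML fit of the GEVL to the maxima is not shown.

```python
import math
import random


def m_hat(n):
    """Write n/log(n) as a*10**b + c with 1 <= a < 10; return (a, b, c, a*10**(b-1))."""
    x = n / math.log(n)
    b = math.floor(math.log10(x))
    a = int(x // 10 ** b)
    c = x - a * 10 ** b          # remainder of the decomposition
    return a, b, c, a * 10 ** (b - 1)


def subsample_maxima(data, m, reps, rng):
    """Maxima of `reps` bootstrap replicates of size m drawn with replacement."""
    return [max(rng.choices(data, k=m)) for _ in range(reps)]


# Example 4.1: n = 20000 gives a = 2, b = 3 and m_hat = 200.
a, b, c, m0 = m_hat(20000)
print(a, b, round(c, 2), m0)

# Hypothetical data standing in for a pollutant series; m = 30 as selected in the text.
rng = random.Random(0)
data = [rng.expovariate(1.0) for _ in range(5000)]
maxima = subsample_maxima(data, m=30, reps=1000, rng=rng)
print(len(maxima), round(max(maxima), 3))
```

Each replicate of size $m$ plays the role of one block, so `maxima` is the sample to which the GEVL would then be fitted.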


As a direct and important application of the estimated statistical models of the considered pollutants, given in Table 4.9, we calculate the probabilities of exceeding the allowed upper limits for those pollutants in the light of Law Number 4 of 1994 for Egypt (ambient air quality limit values). These probabilities for SO2, O3, and PM10 in the city of 10th of Ramadan are $8.7 \times 10^{-4}$, 0.0, and 0.038, respectively. Moreover, these probabilities for SO2, gamma radiation, and PM10 in Zagazig are $1.7 \times 10^{-6}$, 0.0, and 0.23, respectively. Evidently, the probability of exceedance for SO2 is very small, and the exceedance event for O3 and gamma radiation is nearly impossible, since the allowed upper limit of these pollutants lies above the right endpoint of the DF of the estimated model (given by Table 4.9). Therefore, these pollutants do not represent any real danger to public health in the two cities. On the other hand, the probabilities of exceedance for PM10 in Zagazig and 10th of Ramadan seem small; however, if we adopt the interpretation that in every 1000 hours there are 38 and 230 hours in which the level of PM10 will exceed the allowed upper limit, respectively in Zagazig and 10th of Ramadan, we see that this pollutant represents a concrete danger to public health in the two cities, especially in 10th of Ramadan. Therefore, we invite the decision-makers to take precautions to reduce the concentration of this pollutant in the ambient air in the two cities. Also, if we calculate these probabilities by using the GEVL models with the sub-sample bootstrap in Table 4.10, we again obtain the probabilities of exceeding the maximum limits of the Egyptian law. These probabilities for SO2, O3, and PM10 in 10th of Ramadan are $4.7 \times 10^{-4}$, 0.0 and 0.03, respectively. Moreover, these probabilities for SO2, gamma radiation, and PM10 in Zagazig are $3.4 \times 10^{-6}$, 0.0, and 0.49, respectively. From these results we can see that the two models are close to each other.

Remark (the validity of approximation by iid RVs) Although the assumption that our variables are iid RVs is rarely correct in practice, there are many dependent models, such as stationary and mixing models, for which the asymptotic results remain the same as for iid variables (see Galambos, 1987). To be more specific, let $X_j$ be the concentration of a given pollutant in the $j$th time interval (in our study, an hour or half an hour). It is reasonable to assume that the variables are identically distributed but that successive values are dependent. However, the dependence weakens as time passes. As a first approximation to the dependence model, this is reasonable. More cautious researchers would incline toward mixing models (see Galambos, 1987). In any case, the approximation by iid variables is reasonable if the asymptotic extreme value distributions are of interest.
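Returning to the exceedance probabilities discussed above, the calculation from a fitted GPDL can be sketched as follows. This is only a sketch under assumptions that are not taken from the text: the threshold `u`, the regulatory `limit`, and the fraction `p_u` of observations above the threshold are hypothetical placeholders, the probability is taken as `p_u` times the conditional GPDL tail, and only the shape and scale values (−0.087, 8.89 for O3 and 0.14, 67.9 for PM10 in 10th of Ramadan) are read from Table 4.9.

```python
def gpd_exceedance_prob(limit, u, gamma, sigma, p_u):
    """P(X > limit) = p_u * (1 + gamma*(limit - u)/sigma)**(-1/gamma), limit >= u, gamma != 0."""
    z = 1.0 + gamma * (limit - u) / sigma
    if z <= 0.0:
        # only possible for gamma < 0: the limit lies beyond the right endpoint
        return 0.0
    return p_u * z ** (-1.0 / gamma)


def hours_per_thousand(p):
    """Expected number of exceedance hours in 1000 hours of hourly records."""
    return round(1000 * p)


# Negative shape: the right endpoint is u - sigma/gamma, so a high limit gives 0.
print(gpd_exceedance_prob(limit=200.0, u=70.0, gamma=-0.087, sigma=8.89, p_u=0.25))

# Positive shape: the tail is unbounded and the probability decreases with the limit.
for limit in (300.0, 400.0, 500.0):
    print(limit, gpd_exceedance_prob(limit, u=300.0, gamma=0.14, sigma=67.9, p_u=0.25))

# Interpretation used in the text: a probability of 0.038 means 38 hours per 1000.
print(hours_per_thousand(0.038))
```

The first call illustrates why an exceedance is reported as nearly impossible when the shape parameter is negative and the limit lies above the right endpoint of the fitted model.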


Remark We note that all the data sets of the considered pollutants, with the exception of ozone, lead to positive values of the shape parameter, which implies a Fréchet domain of attraction, that is, an unlimited range. This difference arises because the three pollutants differ radically in their chemical and physical properties, so we do not expect to obtain the same models for them. Probably, negative values for the estimated shape parameter of ozone were obtained because the monitored values fell in a narrow range compared to the other pollutants. On the other hand, we can find many works in which the shape parameter has a negative value for ozone, e.g., Example 5.1.2 in Reiss and Thomas (2007). Furthermore, a change in the location at which a pollutant is measured can lead to a sharp change in its estimated shape parameter, and even the sign of the shape parameter may change. For example, the values -0.06, +0.06, 0.00, 0.12, and 0.11 were obtained as estimates of the shape parameter for ozone for the years 1983–1987 at five different monitoring stations in San Francisco, cf. Example 12.3.2 in Reiss and Thomas (2007).


Table 4.1 Zagazig and 10th of Ramadan for GEVL: ML parameter estimation

Pollutant         City              γ0       μ0       σ0
SO2               Zagazig           0.16     21.9     11.72
SO2               10th of Ramadan   0.106    81.24    39.49
PM10              Zagazig           0.099    196.78   66.01
PM10              10th of Ramadan   0.22     249.75   67
O3                10th of Ramadan   -0.087   54.9     9.6

ML parameter estimation for ambient gamma radiation

Pollutant         City              γ        μ        σ
Gamma radiation   Zagazig           0.003    41.54    3.93

Table 4.2 Zagazig and 10th of Ramadan for GEVL, after bootstrap: ML parameter estimation

Pollutant         City              γ0       μ0       σ0
SO2               Zagazig           0.15     21.6     11.69
SO2               10th of Ramadan   0.1      81.3     39.4
PM10              Zagazig           0.094    197      67.5
PM10              10th of Ramadan   0.21     249.8    65.9
O3                10th of Ramadan   -0.1     54.98    9.5

ML parameter estimation for ambient gamma radiation

Pollutant         City              γ        μ        σ
Gamma radiation   Zagazig           -0.01    41.55    3.9


Figure 4.9 SO2 in Zagazig

Figure 4.10 SO2 in 10th of Ramadan


Figure 4.11 PM10 in Zagazig

Figure 4.12 PM10 in 10th of Ramadan


Figure 4.13 O3 in 10th of Ramadan after bootstrap

Figure 4.14 Ambient gamma radiation in Zagazig after bootstrap


Table 4.3 K-S test for the data with and without bootstrap

Data set                             Bootstrap   H   P        KSSTAT   CV       Decision
SO2 in Zagazig                       without     1   0.0446   0.0656   0.0644   reject the null hypothesis
SO2 in Zagazig                       with        0   0.0709   0.0605   0.0644   accept the null hypothesis
SO2 in 10th of Ramadan               without     0   0.2962   0.0507   0.0706   accept the null hypothesis
SO2 in 10th of Ramadan               with        0   0.3065   0.0502   0.0706   accept the null hypothesis
PM10 in Zagazig                      without     0   0.4389   0.0450   0.0706   accept the null hypothesis
PM10 in Zagazig                      with        0   0.4614   0.0442   0.0706   accept the null hypothesis
PM10 in 10th of Ramadan              without     1   0.0305   0.0752   0.0706   reject the null hypothesis
PM10 in 10th of Ramadan              with        0   0.0548   0.0697   0.0706   accept the null hypothesis
O3 in 10th of Ramadan                without     0   0.1845   0.0565   0.0707   accept the null hypothesis
O3 in 10th of Ramadan                with        0   0.2537   0.0528   0.0707   accept the null hypothesis
Ambient gamma radiation in Zagazig   without     0   0.0623   0.0735   0.0759   accept the null hypothesis
Ambient gamma radiation in Zagazig   with        0   0.0701   0.0723   0.0759   accept the null hypothesis
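The quantities KSSTAT and CV reported in Table 4.3 can be illustrated with a short sketch. It is a minimal example under assumptions, not the routine used by the authors: KSSTAT is computed as the largest distance between the empirical DF and a fully specified GEVL, CV is approximated by Stephens' formula for the 5% level (for 365 daily maxima this gives about 0.0706), the sample is simulated rather than taken from the real data, and the function names are hypothetical. The parameters 0.1, 81.3 and 39.4 are the bootstrap estimates for SO2 in 10th of Ramadan from Table 4.2.

```python
import math
import random


def gev_cdf(x, gamma, mu, sigma):
    """GEV distribution function G_gamma(x; mu, sigma) for gamma != 0."""
    z = 1.0 + gamma * (x - mu) / sigma
    if z <= 0.0:
        # outside the support: below it when gamma > 0, above it when gamma < 0
        return 0.0 if gamma > 0 else 1.0
    return math.exp(-z ** (-1.0 / gamma))


def ks_statistic(data, cdf):
    """Largest distance between the empirical DF of the data and the given DF."""
    xs = sorted(data)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))


def ks_critical_5pct(n):
    """Stephens' approximation to the 5% critical value of the K-S statistic."""
    return 1.358 / (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n))


# Simulate 365 block maxima from the fitted GEVL by inverse transform.
gamma0, mu0, sigma0 = 0.1, 81.3, 39.4
rng = random.Random(2009)
maxima = [mu0 + sigma0 / gamma0 * ((-math.log(rng.random())) ** (-gamma0) - 1.0)
          for _ in range(365)]

ksstat = ks_statistic(maxima, lambda x: gev_cdf(x, gamma0, mu0, sigma0))
cv = ks_critical_5pct(len(maxima))
h = int(ksstat > cv)  # H = 1 means the null hypothesis is rejected
print(round(ksstat, 4), round(cv, 4), h)
```

As in the table, the fit is rejected (H = 1) exactly when KSSTAT exceeds CV.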


Table 4.4 Simulation study for choosing a suitable number of POT ($k$); $k^\star$ denotes the best value

SO2 in Zagazig: GPDL with $\gamma_0 = 0.15$, $\sigma_0^* = 8.48$, $c = 0.226$, $n = 8633$

k                   431      1033     1549     1721     1979     2056     2151
$\hat\gamma_0$      0.144    0.1504   0.1506   0.1505   0.1504   0.1502   0.1505
C.V                 0.624    0.538    0.738    0.565    0.544    0.4144   0.336
$\hat\sigma_0^*$    13.45    11.67    10.99    10.8     10.59    10.5     10.45

SO2 in 10th of Ramadan: GPDL with $\gamma_0 = 0.1$, $\sigma_0^* = 31.5$, $c = 2.5$, $n = 8530$

k                   432      1027     1549     1707     1962     2047     2132
$\hat\gamma_0$      0.0934   0.098    0.0982   0.0985   0.0986   0.0987   0.0987
C.V                 4.69     2.322    2.708    1.585    1.4156   1.4044   1.389
$\hat\sigma_0^*$    42.7     38.68    37.67    37.02    36.5     36.35    36.21

PM10 in Zagazig: GPDL with $\gamma_0 = 0.094$, $\sigma_0^* = 49.2$, $c = 2$, $n = 8540$

k                   460      970      1480     1735     1990     2075     2160
$\hat\gamma_0$      0.0857   0.0891   0.0901   0.0911   0.0913   0.0914   0.0914
C.V                 4.94     3.84     4.12     3.77     3.3      2.95     2.93
$\hat\sigma_0^*$    65.22    60.66    58.63    57.36    56.6     56.39    56.187

Table 4.5 Simulation study for choosing a suitable number of POT ($k$); $k^\star$ denotes the best value

PM10 in 10th of Ramadan: GPDL with $\gamma_0 = 0.21$, $\sigma_0^* = 14.8$, $c = 3.6$, $n = 8720$

k                   440       962       1484      1745      2006      2093      2180
$\hat\gamma_0$      0.2047    0.2092    0.2092    0.2097    0.2098    0.2096    0.2097
C.V                 1.33      0.5372    0.4247    0.3832    0.3727    0.3736    0.3239
$\hat\sigma_0^*$    27.92     23.52     21.48     20.74     20.33     20.14     19.8

O3: GPDL with $\gamma_0 = -0.1$, $\sigma_0^* = 14.25$, $c = 7.46$, $n = 17000$

k                   850       2040      3060      3400      3910      4080      4250
$\hat\gamma_0$      -0.1053   -0.1026   -0.102    -0.1018   -0.1018   -0.1018   -0.1017
C.V                 0.68      0.52      0.36      0.23      0.2003    0.2333    0.2427
$\hat\sigma_0^*$    10.6      11.56     12.03     12.266    12.32     12.38     12.43

Gamma radiation in Zagazig: GPDL with $\gamma_0 = -0.01$, $\sigma_0^* = 4.3$, $c = 1.43$, $n = 29786$

k                   1486      3870      5658      5950      6850      7148      7446
$\hat\gamma_0$      -0.0123   -0.011    -0.0108   -0.0107   -0.0106   -0.0106   -0.0105
C.V                 1.725     1.233     1.042     0.9916    1.0253    0.9856    0.8898
$\hat\sigma_0^*$    4.192     4.207     4.233     4.238     4.241     4.243     4.244


Table 4.6 Simulation study for choosing the sub-sample bootstrap size $m$; $m$ denotes the best value

SO2 in Zagazig: GEVL with $\gamma_0 = 0.15$, $\sigma_0 = 11.69$, $\mu_0 = 21.6$, $n = 8633$

m                 20       30       50       60
$\hat\gamma_0$    0.147    0.152    0.1402   0.1355
C.V               0.352    0.374    0.421    0.507

SO2 in 10th of Ramadan: GEVL with $\gamma_0 = 0.1$, $\sigma_0 = 39.4$, $\mu_0 = 81.3$, $n = 8530$

m                 20       30       50       60
$\hat\gamma_0$    0.0844   0.0994   0.0925   0.087
C.V               0.742    0.517    0.622    0.76

PM10 in Zagazig: GEVL with $\gamma_0 = 0.094$, $\sigma_0 = 67.5$, $\mu_0 = 197$, $n = 8640$

m                 20       30       50       60
$\hat\gamma_0$    0.0854   0.0987   0.0782   0.074
C.V               0.5911   0.7977   1.741    0.941

Table 4.7 Simulation study for choosing the sub-sample bootstrap size $m$; $m$ denotes the best value

PM10 in 10th of Ramadan: GEVL with $\gamma_0 = 0.21$, $\sigma_0 = 65.9$, $\mu_0 = 249.8$, $n = 8720$

m                 20        30        50         60
$\hat\gamma_0$    0.2017    0.2064    0.1906     0.1909
C.V               0.2987    0.2890    0.3552     0.3652

O3: GEVL with $\gamma_0 = -0.1$, $\sigma_0 = 9.5$, $\mu_0 = 54.98$, $n = 17000$

m                 20        30        50         60
$\hat\gamma_0$    -0.1122   -0.1077   -0.1168    -0.1178
C.V               0.4165    0.4033    0.3807     0.4212

Gamma radiation: GEVL with $\gamma_0 = -0.01$, $\sigma_0 = 3.9$, $\mu_0 = 41.54$, $n = 29786$

m                 20        30        40         60
$\hat\gamma_0$    -0.011    -0.0115   -0.01038   -0.0167
C.V               3.6322    3.4217    3.4045     02.514


Table 4.8 Zagazig and 10th of Ramadan for GPDL: ML parameter estimation

Pollutant         City              γ        σ
SO2               Zagazig           0.164    7.16
SO2               10th of Ramadan   0.046    33.44
PM10              Zagazig           0.047    57.64
PM10              10th of Ramadan   0.13     68.27
O3                10th of Ramadan   -0.083   8.8
Gamma radiation   Zagazig           0.031    4.58

Table 4.9 Zagazig and 10th of Ramadan for GPDL after bootstrap

Pollutant         City              γ        σ
SO2               Zagazig           0.157    7.13
SO2               10th of Ramadan   0.062    32.4
PM10              Zagazig           0.052    57.3
PM10              10th of Ramadan   0.14     67.9
O3                10th of Ramadan   -0.087   8.89
Gamma radiation   Zagazig           0.026    4.6

Table 4.10 Zagazig and 10th of Ramadan for GEVL: ML parameter estimation by sub-sample bootstrap

Pollutant         City              γ        C.V      μ        C.V      σ        C.V
SO2               Zagazig           0.176    0.253    26.39    0.0134   7.34     0.0463
SO2               10th of Ramadan   0.119    0.258    108.9    0.187    32.02    0.0489
PM10              Zagazig           0.117    0.3728   264.41   0.0121   55.05    0.043
PM10              10th of Ramadan   0.26     0.17     340.67   0.0124   70.587   0.088
O3                10th of Ramadan   -0.08    0.739    64.36    0.0056   6.8      0.044
Gamma radiation   Zagazig           -0.033   2.719    36.24    0.0078   4.3      0.0483

5 Extreme value modelling under power normalization

As we have seen in Sections 2.8–2.10, the max-stable laws under power normalization attract more distributions than those under linear normalization. In practice, this means that the classical linear model (L-model) may fail to fit the given extreme data, while the power model (P-model) succeeds. The main objective of this chapter and the subsequent chapters is to develop the modelling of extreme values via the P-model in the following stages:
• Suggesting an effective technique to obtain, for every known estimator of the EVI in the L-model, a parallel estimator of the EVI in the P-model (to enable us to use the P-model for modelling extreme data). For example, as we will see in Chapter 7, an application of this technique yields two classes of moment and moment ratio estimators for the EVI in the P-model. Moreover, in Chapter 7, some suggested estimators based on the GPDP and the MLE technique, which cannot be obtained by using the suggested technique, do not have any counterparts in the L-model. Therefore, besides the remark made by Mohan and Ravi (1993), this fact again emphasizes that the P-model is richer than the L-model.
• Assessing the performance of the obtained estimators for the EVI in the P-model via a simulation study.
• Proposing a criterion to compare the two models and thus choose the best model.

In the present chapter, we give some estimates of the shape parameter (the EVI) within the GEVP. Moreover, statistical inference about the upper tail of a DF by using the power normalization is studied. Two models for the GPDP are discussed in detail. In addition, estimates for the shape and scale parameters within these GPDPs are obtained. Finally, a simulation study illustrates and corroborates the theoretical results. The theoretical results of this chapter are quoted from Barakat et al. (2013a) and Barakat (2013).

5.1 Generalized extreme value distribution under power normalization

As we have seen before in Section 2.8, DFs (2.32) and (2.33) are the only ones that satisfy the p-max-stable property, i.e., for every n there exist power normalizing


constants $a_n, b_n > 0$, for which we have $P^n_{t;\gamma}(G_n(x); a, b) = P_{t;\gamma}(x; a, b)$, $t = 1, 2$, where $G_n(x) = b_n|x|^{a_n}S(x)$. It is worth mentioning that DFs (2.32) and (2.33), which are based on a DF $F$ (the population distribution), can be incorporated into a unified formula by using the result of Christoph and Falk (1996) and by adopting the notation $S^*(x) = -1$ if $x \le 0$ and $S^*(x) = +1$ if $x > 0$ (cf. Barakat, 2013):
\[ P_\gamma(x; a, b) = \exp\left[-\left(1 + S^*(\rho)\,\gamma \log a|x|^b\right)^{-1/\gamma}\right], \quad 1 + S^*(\rho)\,\gamma \log a|x|^b > 0, \tag{5.1} \]
where $\rho = r(F) = \sup\{x : F(x) < 1\}$. However, for the modelling of extreme values, the two models (2.32) and (2.33) are more flexible than model (5.1). Actually, the two parametric models (2.32) and (2.33) enable us to apply the BM method under power normalization. For these models, the parametric approach for modelling the extremes is based on the assumption that the data in hand form a random sample (i.e., iid RVs) from an exact GEVP($\gamma, a, b$) DF in (2.32) or (2.33). The application of the BM method consists of partitioning a data set into blocks of equal length and fitting the GEVP distribution to the set of block maxima (which are usually taken as annual maxima). However, the result in Christoph and Falk (1996) (see Section 2.10, Theorem 2.42) reveals that the upper tail behaviour of $F$ might determine whether $F$ belongs to the domain of attraction of $P_{1;\gamma}(x; a, b)$ or $P_{2;\gamma}(x; a, b)$. In the first case, the right endpoint $\rho$ has to be positive, while in the second case necessarily $\rho \le 0$. Moreover, $\rho > 0$ is linked to the max-stable distributions under linear normalization and $\rho \le 0$ to the min-stable distributions under linear normalization. This explains why there are six types of p-max-stable DFs. Therefore, modelling under power normalization by using the BM method can only be applied if all the observations in the data set of the maximum values have the same sign, that is, if all the maximum observations are positive or all are negative.

Although the P-model theoretically outperforms the L-model, because the L-model may fail to fit the given data while the P-model succeeds, we have the following practical drawbacks in using the power normalization:
1. We may lose some flexibility when using power normalization. For example, under linear normalization all negative data can be transformed to positive numbers and vice versa, but this cannot be done under power normalization (this is clearly because of the lack of a location parameter in power normalization).
2. Modelling under power normalization by using the BM method can only be applied if all observations in the data set of the maxima have the same sign. Actually, as we will see in Chapters 7–8, modelling under power normalization by using the POT method also can only be applied if all observations in the data set of the maxima have the same sign.

In Chapter 9, we will attempt to overcome the above two drawbacks by suggesting the linear-power normalization, which was introduced by Barakat et al. (2017b). At the end of this section, it is appropriate to emphasize that, although the L-model and P-model are connected by a relation given by Christoph and Falk (1996), they are governed by two DFs of different types. This means that the estimators suggested in this chapter and the subsequent chapters are not competitors to any known estimators of the EVI under the L-model, and no one should

5.2 Statistical inference using the BM method

141

compare the suggested estimators under the P-model with any other estimators under the L-model, but the compare only should be done, after estimating the shape parameter in each model, between the two models to decide which of them is more favorable for the given data. In Chapters 7 and 8 this issue will be discussed in more detail.

5.2 Statistical inference using the BM method

Considering the BM method, let $x_{1:n} \le x_{2:n} \le \dots \le x_{n:n}$ be the given data for the maximum values of the given blocks. Theorem 2.42 shows that modelling under power normalization can be applied only if all these maximum values have the same sign. More specifically, if $0 < x_{1:n} \le x_{2:n} \le \dots \le x_{n:n}$ we select model (2.32), and if $x_{1:n} \le x_{2:n} \le \dots \le x_{n:n} < 0$ we select model (2.33). In this case, the ML estimate $(\hat\gamma, \hat a, \hat b)$ of $(\gamma, a, b)$ must be numerically evaluated as a solution to the likelihood equations based on model (2.32) or model (2.33). The estimate of the shape parameter $\gamma$ that corresponds to the Dubey estimate of the GEVL model (see Reiss and Thomas, 2007, p. 111) is built from the ratio of spacings

$$R_n = \frac{\log|x_{nq_2:n}| - \log|x_{nq_1:n}|}{\log|x_{nq_1:n}| - \log|x_{nq_0:n}|},$$

where $q_0 < q_1 < q_2$ and $q_i = n_i$. Note that this statistic is invariant under power transformations. Now, relying on the following facts:

1. $F_{t;n}^{-1}(q_i) = x_{i:n}$, with $t = 1$ if $x_{i:n} > 0$ and $t = 2$ if $x_{i:n} < 0$, $i = 1, 2, \dots, n$, where $F_{t;n}$ is the sample DF,
2. for large $n$, we have $F_{t;n}(x) \approx F^n(x) \approx H_{t;\gamma}(C_n^{-1}(x))$,
3. $F_{t;n}^{-1}(C_n(x)) = C_n^{-1}(F_{t;n}^{-1}(x))$,

we get

$$R_n = \begin{cases} \dfrac{\log P_{1;\gamma}^{-1}(q_2) - \log P_{1;\gamma}^{-1}(q_1)}{\log P_{1;\gamma}^{-1}(q_1) - \log P_{1;\gamma}^{-1}(q_0)}, & \text{if } x_{i:n} > 0,\ i = 1, 2, \dots, n, \\[2ex] \dfrac{\log P_{2;\gamma}^{-1}(q_2) - \log P_{2;\gamma}^{-1}(q_1)}{\log P_{2;\gamma}^{-1}(q_1) - \log P_{2;\gamma}^{-1}(q_0)}, & \text{if } x_{i:n} < 0,\ i = 1, 2, \dots, n, \end{cases}$$

which implies, after routine calculations,

$$R_n = \frac{(-\log q_2)^{-\gamma} - (-\log q_1)^{-\gamma}}{(-\log q_1)^{-\gamma} - (-\log q_0)^{-\gamma}} = \left(\frac{\log q_0}{\log q_2}\right)^{\gamma/2}, \tag{5.2}$$

where $q_0, q_1, q_2$ satisfy the equation $(-\log q_1)^2 = (-\log q_2)(-\log q_0)$. In this manner, by taking the logarithm of both sides of (5.2), one obtains the estimate

$$\hat\gamma = \frac{2\log R_n}{\log(\log q_0/\log q_2)}.$$

Such an estimate was suggested in Johnson et al. (1994), attributed to S. D. Dubey, for estimating the shape parameter of the Weibull distribution.
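The routine calculation behind (5.2) can be sketched as follows. The sketch assumes the standard quantile form $\log P_{1;\gamma}^{-1}(q) = ((-\log q)^{-\gamma} - 1)/\gamma$, which is consistent with the numerator and denominator displayed in (5.2); the abbreviations $A$, $B$, $C$ are introduced here only for illustration.

```latex
% Assumption: \log P_{1;\gamma}^{-1}(q) = ((-\log q)^{-\gamma} - 1)/\gamma,
% so additive and multiplicative constants cancel in the ratio R_n.
\begin{align*}
A &= -\log q_0, \quad B = -\log q_1, \quad C = -\log q_2, \qquad B^2 = AC
   \;\Longrightarrow\; B^{-\gamma} = A^{-\gamma/2} C^{-\gamma/2},\\
C^{-\gamma} - B^{-\gamma} &= C^{-\gamma/2}\bigl(C^{-\gamma/2} - A^{-\gamma/2}\bigr),\\
B^{-\gamma} - A^{-\gamma} &= A^{-\gamma/2}\bigl(C^{-\gamma/2} - A^{-\gamma/2}\bigr),\\
R_n &= \frac{C^{-\gamma} - B^{-\gamma}}{B^{-\gamma} - A^{-\gamma}}
     = \Bigl(\frac{A}{C}\Bigr)^{\gamma/2}
     = \Bigl(\frac{\log q_0}{\log q_2}\Bigr)^{\gamma/2}.
\end{align*}
```

The common factor $C^{-\gamma/2} - A^{-\gamma/2}$ cancels, which is exactly why the constraint $(-\log q_1)^2 = (-\log q_2)(-\log q_0)$ is imposed on the three levels.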

On the other hand, if $q_0 = q$, $q_1 = q^{a}$, $q_2 = q^{a^2}$, for some $0 < q, a < 1$, we get the family of estimates $\hat\gamma = -\dfrac{\log R_n}{\log a}$. By taking $a = \frac{1}{2}$, we get

$$\hat\gamma = \frac{\log R_n}{\log 2}. \tag{5.3}$$
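As a hedged illustration of the estimate (5.3), the following Python sketch computes $R_n$ from three order statistics and returns $-\log R_n/\log a$. The function names, the rounding rule used to pick the order statistic $x_{nq:n}$, and the standard form $P_{1;\gamma}(x;1,1) = \exp(-(1+\gamma\log x)^{-1/\gamma})$ used to simulate positive data are assumptions made for this sketch, not taken from the text.

```python
import numpy as np


def gevp_quantile(q, gamma):
    """Quantile function of exp(-(1 + gamma*log x)^(-1/gamma)), x > 0.

    Assumed standard positive-data form (a = b = 1); gamma = 0 is the limit.
    """
    q = np.asarray(q, dtype=float)
    if gamma == 0.0:
        return np.exp(-np.log(-np.log(q)))
    return np.exp(((-np.log(q)) ** (-gamma) - 1.0) / gamma)


def ratio_of_spacings_estimate(x, q, a=0.5):
    """Estimate gamma from the ratio of log-spacings R_n.

    Uses the levels q0 = q, q1 = q**a, q2 = q**(a*a), which satisfy
    (-log q1)^2 = (-log q2)(-log q0); a = 1/2 gives log(R_n)/log(2).
    """
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    levels = np.array([q, q ** a, q ** (a * a)])
    # Index of the order statistic x_{nq:n}; the rounding rule is an assumption.
    idx = np.clip(np.ceil(n * levels).astype(int) - 1, 0, n - 1)
    l0, l1, l2 = np.log(np.abs(xs[idx]))
    r_n = (l2 - l1) / (l1 - l0)
    return -np.log(r_n) / np.log(a)


# Illustrative run on simulated positive data with gamma = 0.2.
rng = np.random.default_rng(12345)
sample = gevp_quantile(rng.uniform(size=200_000), gamma=0.2)
print(ratio_of_spacings_estimate(sample, q=0.1))
```

Because only logarithms of absolute values enter $R_n$, any power transformation $b|x|^{a}$ of the data leaves the estimate unchanged, which is the invariance noted above.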

5.3 The GPDP DFs and their related statistical inference

In this section, we derive the GPDP for both models (2.32) and (2.33), i.e., we give a simple proof of the two models defined in (2.34). Moreover, some estimators for the shape parameter (EVI) within the two models (2.32) and (2.33) are discussed.

5.3.1 The derivation of GPDP: the POT stability property

The following theorem, taken from Barakat et al. (2013a), gives the GPDP for both models (2.32) and (2.33).

Theorem 5.1 (a) Let (2.32) be satisfied with $P(x) = P_{1;\gamma}(x; a, b)$. Then, there exists $\alpha(u) > 0$ such that

$$F^{[u]}(u x^{\alpha(u)}) \xrightarrow{w} Q_{1;\gamma}(x; \bar b), \quad \text{as } u \uparrow x^{o} > 0, \tag{5.4}$$

where $Q_{1;\gamma}(x; \bar b) = 1 + \log P_{1;\gamma}(x; 1, \bar b)$, $\bar b = \dfrac{b}{c}$ and $c = 1 + \gamma \log a$.

(b) Let (2.33) be satisfied with $P(x) = P_{2;\gamma}(x; a, b)$. Then, there exists $\alpha(u) > 0$ such that

$$F^{[u]}(u |x|^{\alpha(u)}) \xrightarrow{w} Q_{2;\gamma}(x; \bar b), \quad \text{as } u \uparrow x^{o} \le 0, \tag{5.5}$$

where $Q_{2;\gamma}(x; \bar b) = 1 + \log P_{2;\gamma}(x; 1, \bar b)$, $\bar b = \dfrac{b}{\bar c}$ and $\bar c = 1 - \gamma \log a$.

Proof The proof of Part a: First, we note that (2.32) implies $n(1 - F(b_n |x|^{a_n} S(x))) \xrightarrow[n]{} -\log P_{1;\gamma}(x; a, b)$, for all $x$, which in view of Theorem 2.42 implies

$$n(1 - F(b_n x^{a_n})) \xrightarrow[n]{} -\log P_{1;\gamma}(x; a, b), \tag{5.6}$$

for each $x$ such that $P_{1;\gamma}(x; a, b) > 0$. But (5.6) cannot hold unless $F(b_n x^{a_n}) \xrightarrow[n]{} 1$. Therefore, it follows from (5.6) that

$$\lim_{n\to\infty} (n+1)\bigl(1 - F(b_{n+1} x^{a_{n+1}})\bigr) = \lim_{n\to\infty} n\bigl(1 - F(b_{n+1} x^{a_{n+1}})\bigr) = -\log P_{1;\gamma}(x; a, b). \tag{5.7}$$

An application of the modification of Khinchin's theorem for the P-model (see Lemma 2.1 in Barakat and Nigm, 2002) to (5.6) and (5.7) thus yields

$$\frac{a_{n+1}}{a_n} \xrightarrow[n]{} 1 \quad \text{and} \quad \left(\frac{b_{n+1}}{b_n}\right)^{1/a_n} \xrightarrow[n]{} 1. \tag{5.8}$$

Now, let $u$ be any real number, $u < x^{o}$, and let $n$ be so chosen that $b_n \le u \le b_{n+1}$


(note that by putting $x = 1$ in (5.6), we get $b_n \uparrow x^{o}$). This is possible, since the sequence $b_n$ can be chosen to be nondecreasing. Therefore, in view of (5.8), we get

$$1 \le \left(\frac{u}{b_n}\right)^{1/a_n} \le \left(\frac{b_{n+1}}{b_n}\right)^{1/a_n} \xrightarrow[n]{} 1.$$

Then, if we let $\alpha(u) \equiv a_n$ and apply again the modification of Khinchin's theorem, (5.6) may be written in the form

$$n\bigl(1 - F(u x^{\alpha(u)})\bigr) \xrightarrow[n]{w} -\log P_{1;\gamma}(x; a, b). \tag{5.9}$$

On the other hand, by putting $x = 1$ in (5.9) we get

$$n(1 - F(u)) \xrightarrow[n]{w} -\log P_{1;\gamma}(1; a, b). \tag{5.10}$$

Combining (5.9) and (5.10), we get, as $n \to \infty$, or equivalently as $u \uparrow x^{o}$,

$$F^{[u]}(u|x|^{\alpha(u)}) = 1 - \frac{1 - F(u|x|^{\alpha(u)})}{1 - F(u)} \xrightarrow[n]{w} 1 - \frac{\log P_{1;\gamma}(x; a, b)}{\log P_{1;\gamma}(1; a, b)}$$

$$= 1 - \left(1 + \frac{\gamma b \log x}{1 + \gamma \log a}\right)^{-1/\gamma} = 1 - \bigl(1 + \gamma \bar b \log x\bigr)^{-1/\gamma} = 1 + \log P_{1;\gamma}(x; 1, \bar b),$$

which was to be proved. The proof of Part b is very similar to the proof of Part a, with only the obvious changes. We shall not repeat the details here. This completes the proof of Theorem 5.1.

The following elementary theorem shows that the GPDPs (5.4) and (5.5) satisfy the POT stability property. This property will be needed in the next section.

Theorem 5.2 (The POT stability property, cf. Barakat et al., 2013a) The left truncated GPDP yields again a GPDP. This means that, for every $0 < k < x$, we have $Q^{[k]}_{1;\gamma}(x; \sigma) = Q_{1;\gamma}\bigl(\frac{x}{k}; \bar\sigma\bigr)$, where $\bar\sigma = \dfrac{\sigma}{c}$ and $c = 1 + \gamma\sigma \log k$. Moreover, for every $-1 < k < x < 0$, we have $Q^{[k]}_{2;\gamma}(x; \sigma) = Q_{2;\gamma}\bigl(\frac{x}{k}; \bar\sigma\bigr)$, where $\bar\sigma = \dfrac{\sigma}{c}$ and $c = 1 - \gamma\sigma \log(-k)$.

Proof Let $k \le x$. Then,

$$Q^{[k]}_{t;\gamma}(x; \sigma) = \frac{Q_{t;\gamma}(x; \sigma) - Q_{t;\gamma}(k; \sigma)}{1 - Q_{t;\gamma}(k; \sigma)}.$$

Now, consider $Q_{1;\gamma}(x; \sigma) = 1 - (1 + \gamma\sigma \log x)^{-1/\gamma}$, $x \ge 1$. Then,

$$Q^{[k]}_{1;\gamma}(x; \sigma) = 1 - \left(1 + \frac{\gamma\sigma}{c} \log\frac{x}{k}\right)^{-1/\gamma}, \quad \frac{x}{k} \ge 1,\ x, k > 0,$$

where $c = 1 + \gamma\sigma \log k$. On the other hand, consider $Q_{2;\gamma}(x; \sigma) = 1 - (1 - \gamma\sigma \log(-x))^{-1/\gamma}$, $0 > x \ge -1$. Then, for every $-1 < k < x < 0$, we get

$$Q^{[k]}_{2;\gamma}(x; \sigma) = 1 - \left(1 - \frac{\gamma\sigma}{c} \log\frac{x}{k}\right)^{-1/\gamma}, \quad 1 \ge \frac{x}{k} > 0,$$

i.e., $Q^{[k]}_{2;\gamma}(x; \sigma) = 1 - \left(1 - \frac{\gamma\sigma}{c} \log\bigl(-\bigl(-\frac{x}{k}\bigr)\bigr)\right)^{-1/\gamma}$, $0 > -\frac{x}{k} \ge -1$, where $c = 1 - \gamma\sigma \log(-k)$. This completes the proof of Theorem 5.2.
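The POT stability property for the positive case can be checked numerically. The sketch below, with hypothetical helper names `q1`, `q1_truncated` and `pot_stability_gap`, evaluates the left-truncated DF directly from its definition and compares it with $Q_{1;\gamma}(x/k; \sigma/c)$, $c = 1 + \gamma\sigma\log k$; it is a numerical illustration under these assumptions, not a substitute for the proof.

```python
import numpy as np


def q1(x, gamma, sigma):
    """Q_{1;gamma}(x; sigma) = 1 - (1 + gamma*sigma*log x)^(-1/gamma), x >= 1."""
    x = np.asarray(x, dtype=float)
    if gamma == 0.0:
        return 1.0 - x ** (-sigma)  # limit as gamma -> 0
    return 1.0 - (1.0 + gamma * sigma * np.log(x)) ** (-1.0 / gamma)


def q1_truncated(x, k, gamma, sigma):
    """Left truncation at k: (Q(x) - Q(k)) / (1 - Q(k)) for x >= k."""
    qk = q1(k, gamma, sigma)
    return (q1(x, gamma, sigma) - qk) / (1.0 - qk)


def pot_stability_gap(x, k, gamma, sigma):
    """Largest absolute difference between the two sides of the identity."""
    c = 1.0 + gamma * sigma * np.log(k)
    return np.max(np.abs(q1_truncated(x, k, gamma, sigma) - q1(x / k, gamma, sigma / c)))


# Illustrative check on a grid of points above the threshold k = 1.5.
grid = np.linspace(1.5, 10.0, 50)
print(pot_stability_gap(grid, k=1.5, gamma=0.3, sigma=2.0))
```

For $\gamma < 0$ the grid must stay inside the support, i.e., $1 + \gamma\sigma\log x > 0$.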

5.3.2 Estimation of the shape and the scale parameters within the GPDP model

In this subsection, we follow Pickands's method (cf. Pickands, 1975) to get estimates for the shape and the scale parameters within the GPDPs (5.4) and (5.5). These estimates correspond to the Pickands estimates in the GEVL model (2.4) (cf. Pickands, 1975). Let $n$ be the sample size and let $m = m(n)$ be an integer much smaller than $n$. Let $\{Y_i, i = 1, 2, \dots, n\}$ be the descending order statistics, i.e., $Y_i = X_{n-i+1:n}$ is the $i$th largest observation in the sample. We treat the values $\frac{Y_i}{Y_{4m}}$, $i = 1, 2, \dots, 4m - 1$, as though they were the descending order statistics from a sample of size $4m - 1$ from a population with a DF of the form $Q_{1;\gamma}(x; \bar\sigma)$ for some $\bar\sigma, \gamma$, $0 < \bar\sigma < \infty$, $-\infty < \gamma < \infty$. The parameters $\gamma$ and $\bar\sigma$ can be estimated by the following percentile method. Since, for any $0 \le y \le 1$, we have

$$Q^{-1}_{1,\gamma}(y; \bar\sigma) = \exp\left(\frac{1}{\gamma\bar\sigma}\bigl((1 - y)^{-\gamma} - 1\bigr)\right),$$

then, we get

$$Q^{-1}_{1,\gamma}\left(\tfrac{1}{2}; \bar\sigma\right) = e^{\frac{1}{\gamma\bar\sigma}(2^{\gamma} - 1)} \quad \text{and} \quad Q^{-1}_{1,\gamma}\left(\tfrac{3}{4}; \bar\sigma\right) = e^{\frac{1}{\gamma\bar\sigma}(2^{2\gamma} - 1)}.$$

Clearly,

$$L = \frac{\log Q^{-1}_{1,\gamma}(\frac{3}{4}; \bar\sigma) - \log Q^{-1}_{1,\gamma}(\frac{1}{2}; \bar\sigma)}{\log Q^{-1}_{1,\gamma}(\frac{1}{2}; \bar\sigma)} = 2^{\gamma},$$

which implies $\gamma = \dfrac{\log L}{\log 2}$. Moreover,

$$\bar\sigma = \frac{2^{\gamma} - 1}{\gamma \log Q^{-1}_{1,\gamma}(\frac{1}{2}; \bar\sigma)}.$$

In order to estimate $\gamma$ and $\bar\sigma$, we replace the population quantiles $Q^{-1}_{1,\gamma}\bigl(\frac{1}{2}; \bar\sigma\bigr)$ and $Q^{-1}_{1,\gamma}\bigl(\frac{3}{4}; \bar\sigma\bigr)$ with the sample quantiles $\hat Q^{-1}_{1,\gamma}\bigl(\frac{1}{2}; \bar\sigma\bigr) = \frac{Y_{2m}}{Y_{4m}}$ and $\hat Q^{-1}_{1,\gamma}\bigl(\frac{3}{4}; \bar\sigma\bigr) = \frac{Y_{m}}{Y_{4m}}$. Then,

$$\hat\gamma = (\log 2)^{-1} \log \frac{\log Y_m - \log Y_{2m}}{\log Y_{2m} - \log Y_{4m}} \tag{5.11}$$

and

$$\hat{\bar\sigma} = \frac{2^{\hat\gamma} - 1}{\hat\gamma\,(\log Y_{2m} - \log Y_{4m})}. \tag{5.12}$$

Similarly, by following the same argument we can show that the estimates of the shape and the scale parameters $\gamma$ and $\bar\sigma$ in the GPDP (5.5) are given by

$$\hat\gamma = (\log 2)^{-1} \log \frac{\log|Y_m| - \log|Y_{2m}|}{\log|Y_{2m}| - \log|Y_{4m}|} \tag{5.13}$$

and

$$\hat{\bar\sigma} = \frac{1 - 2^{\hat\gamma}}{\hat\gamma\,(\log|Y_{2m}| - \log|Y_{4m}|)}. \tag{5.14}$$

The problem of choosing $m$ was considered by Pickands (1975). The value $m = m(n)$ should satisfy the two conditions $m \xrightarrow[n]{} \infty$ and $\frac{m}{n} \xrightarrow[n]{} 0$. For further details about this problem, we refer to Pickands (1975). However, in the next section we consider this problem within a simulation study. It is well known that, for some statistics, Efron's bootstrap does not approximate their distribution at all. The maximum of RVs is one of the examples for which Efron's bootstrap fails to be consistent (see Chapter 3).
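The percentile estimates (5.11) and (5.12) for positive data can be sketched in Python as follows. The function names are hypothetical, and the handling of $\hat\gamma = 0$ through the limit $(2^{\gamma}-1)/\gamma \to \log 2$ is an assumption added for numerical safety; the simulated data use the quantile function $Q^{-1}_{1,\gamma}$ stated above.

```python
import numpy as np


def gpdp_quantile(y, gamma, sigma):
    """Q_{1,gamma}^{-1}(y; sigma) = exp(((1 - y)^(-gamma) - 1) / (gamma*sigma))."""
    y = np.asarray(y, dtype=float)
    if gamma == 0.0:
        return np.exp(-np.log1p(-y) / sigma)  # limit as gamma -> 0
    return np.exp(((1.0 - y) ** (-gamma) - 1.0) / (gamma * sigma))


def pickands_type_estimates(y_m, y_2m, y_4m):
    """Return (gamma_hat, sigma_bar_hat) as in (5.11)-(5.12).

    y_m, y_2m, y_4m are the m-th, 2m-th and 4m-th largest positive observations.
    """
    upper = np.log(y_m) - np.log(y_2m)
    lower = np.log(y_2m) - np.log(y_4m)
    gamma_hat = np.log(upper / lower) / np.log(2.0)
    # (2^g - 1)/g, replaced by its limit log 2 when g is exactly 0 (assumption).
    if gamma_hat == 0.0:
        factor = np.log(2.0)
    else:
        factor = np.expm1(gamma_hat * np.log(2.0)) / gamma_hat
    return gamma_hat, factor / lower


def estimate_from_sample(x, m):
    """Apply (5.11)-(5.12) to a sample, with Y_i the i-th largest value."""
    y = np.sort(np.asarray(x, dtype=float))[::-1]
    return pickands_type_estimates(y[m - 1], y[2 * m - 1], y[4 * m - 1])


# Illustrative run: n = 20000 draws from Q_{1;0.4}(x; 1), m = 250.
rng = np.random.default_rng(2019)
x = gpdp_quantile(rng.uniform(size=20_000), gamma=0.4, sigma=1.0)
print(estimate_from_sample(x, m=250))
```

Note that $\hat{\bar\sigma}$ estimates the scale of the truncated distribution above the threshold $Y_{4m}$, which by Theorem 5.2 generally differs from the scale of the untruncated DF.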

5.4 Simulation study

Since most of the data sets arising from environmental pollution and other natural phenomena are positive by nature, we confine our discussion to the positive case of the P-model (i.e., models (2.32) and (5.4)). In this section, we compare the ML method and the suggested estimate (5.3) for estimating the shape parameter $\gamma$ within the GEVP (2.32) (see Table 5.1). Also, from Table 5.1, we can determine the value of $q$ that gives the best estimate for $\gamma$. Moreover, we consider in Table 5.2 the problem of choosing a suitable POT number for estimating the shape parameter within the GPDP model (5.4) by using the ML method. Finally, in Table 5.3, the problem of choosing a suitable value of $m$ in the estimate (5.11) is considered, and we then compare the ML method and the suggested estimator (5.11) for estimating the shape parameter $\gamma$ within the GPDP model $Q_{1;\gamma}(\cdot\,; \bar\sigma = 1)$.

In Table 5.1, for each value of $\gamma = 0, \pm 0.2, \pm 0.4$ (these values probably cover most environmental applications, see Smith, 1985), we generate from $P_{1;\gamma}(x; 1, 1)$ a random sample of 1000 blocks, each containing 500 observations, i.e., we get 1000 maximum values. In view of the p-max-stable property, the DF of these independent maximum values is again $P_{1;\gamma}$. Therefore, we apply the ML method and the suggested estimator (5.3) for the quantiles $q = 0.1, 0.3, 0.5, 0.7, 0.9$ to get the corresponding estimates for $\gamma$. This procedure is repeated 100 times to get the average ML estimate and the average estimates (5.3) (for the given values of $q$) for $\gamma$, with their MSEs. Table 5.1 summarizes the obtained results. Table 5.1 shows that the suggested estimator (5.3) is most accurate at the lower values of $q$ ($q = 0.1, 0.3$). Moreover, for these lower quantiles, the estimator (5.3) and the ML estimate have similar accuracy.

In Table 5.2, for each value of $\gamma = 0, \pm 0.2, \pm 0.4$, we generate from $Q_{1;\gamma}(x; 1)$ a random sample of size $n = 20000$. Next, we choose the POT values $k = 5000, 4500, \dots, 1000$ (in the range $k \le \frac{n}{4}$). In view of Theorem 5.2, the DF of the simulated data exceeding any threshold value is again of the type $Q_{1;\gamma}$. Therefore, we can estimate the parameter $\gamma$ by using the ML method for each of these threshold values. This procedure is repeated 100 times to get the average ML estimates of $\gamma$ corresponding to these threshold values, with their MSEs. Finally, we determine the POT value $k$ that gives the best estimate for the parameter $\gamma$ by using the ML method. Table 5.2 summarizes the obtained results. From Table 5.2, we see that the best value of $k$ is about 4500–5000, i.e., $\frac{n}{5} \le k \le \frac{n}{4}$. In Table 5.3, the same procedure is applied with the exception that we choose $m$ instead of $k$, with $m = 125, 250, \dots, 1250$ (note that $k = 4m$). Table 5.3 shows that the best value of $m$ is about 125–250, i.e., $125 \le m \le 250$. Moreover, Tables 5.2 and 5.3 show that both methods (the ML estimate and the suggested estimator (5.11)) for estimating the shape parameter $\gamma$ within the GPDP (5.4) have very high accuracy, but in all cases the ML estimate is better than the suggested estimator (5.11). Finally, to apply the sub-sample bootstrap method for the GEVP we generate


Table 5.1 Estimating the shape parameter γ in the GEVP(γ, 1, 1), defined in (2.32), by using the ML method and the suggested estimate (5.3). A superscript "∗" marks the best value. The columns q = 0.1, ..., 0.9 refer to the suggested estimate (5.3).

| γ | | ML estimate | q = 0.1 | q = 0.3 | q = 0.5 | q = 0.7 | q = 0.9 |
|---|---|---|---|---|---|---|---|
| 0.4 | γ̂ | 0.3969 | 0.3818 | 0.4036∗ | 0.387 | 0.4069 | 0.3878 |
| | MSE×10⁻⁴ | 8.6 | 661 | 567 | 327 | 31 | 9.9 |
| 0.2 | γ̂ | 0.1996 | 0.2098 | 0.1960∗ | 0.1866 | 0.1957 | 0.2267 |
| | MSE×10⁻⁴ | 5.94 | 193 | 198 | 357 | 100 | 140 |
| 0 | γ̂ | 0.0005 | −0.0009∗ | 0.0117 | −0.0080 | −0.0166 | 0.0464 |
| | MSE×10⁻⁴ | 5.6 | 1.8 | 1.9 | 128 | 549 | 562 |
| −0.2 | γ̂ | −0.203 | −0.2011∗ | −0.1898 | −0.2157 | −0.2104 | −0.1453 |
| | MSE×10⁻⁴ | 4.2 | 2.4 | 2.5 | 493 | 217 | 597 |
| −0.4 | γ̂ | −0.4050 | −0.4098 | −0.3995∗ | −0.3952 | −0.4056 | −0.3393 |
| | MSE×10⁻⁴ | 2.8 | 190 | 195 | 46 | 62 | 737 |

20000 random samples from the GEVP $P_{1;\gamma}(x; 1, 1)$, with $\gamma = 0, \pm 0.2, \pm 0.4$. Determining the value m̂ = a(10)b−1 (as shown in Subsection 4.5.1, see also Example 4.1), we find that m̂ = 90. By checking the discrete neighbourhood {30, 50, 100, 150, 200, 250, 300, 350}, we may find the best value of $m$. For all the simulated data, the best value satisfies $m \in [30, 100]$. We used different methods to estimate $\gamma$, namely, the ML method (see Table 5.4) and the suggested estimate (5.3) with the quantiles $q = 0.1, 0.3, 0.5, 0.7, 0.9$ (see Tables 5.5–5.9), which give us the best $m$ and the best quantile.


Table 5.2 Estimating the shape parameter γ in the GPDP, defined in (5.4), by using the ML method (σ̄ = 1, n = 20000 in all cases). A superscript "∗" marks the best value.

| GPDP (5.4) | k | 5000 | 4500 | 4000 | 3500 | 3000 | 2500 | 2000 | 1000 |
|---|---|---|---|---|---|---|---|---|---|
| γ = 0.4 | γ̂ | 0.3998∗ | 0.3998 | 0.3995 | 0.3982 | 0.3978 | 0.3974 | 0.3979 | 0.3984 |
| | MSE×10⁻⁴ | 0.105 | 0.120 | 0.168 | 0.22 | 0.261 | 0.319 | 0.443 | 0.935 |
| γ = 0.2 | γ̂ | 0.1994 | 0.1996∗ | 0.1993 | 0.1993 | 0.1993 | 0.1995 | 0.1985 | 0.1986 |
| | MSE×10⁻⁴ | 0.072 | 0.07 | 0.082 | 0.084 | 0.067 | 0.814 | 0.0767 | 0.284 |
| γ = 0.0 | γ̂ | −0.0009∗ | −0.0012 | −0.0014 | −0.0020 | −0.0026 | −0.0031 | −0.0034 | −0.0046 |
| | MSE×10⁻⁴ | 0.027 | 0.032 | 0.045 | 0.079 | 0.097 | 0.119 | 0.13 | 0.26 |
| γ = −0.2 | γ̂ | −0.2021∗ | −0.2022 | −0.2024 | −0.2027 | −0.2029 | −0.2035 | −0.2040 | −0.2060 |
| | MSE×10⁻⁴ | 0.062 | 0.062 | 0.08 | 0.097 | 0.112 | 0.152 | 0.205 | 0.437 |
| γ = −0.4 | γ̂ | −0.4022∗ | −0.4025 | −0.4026 | −0.4030 | −0.4032 | −0.4038 | −0.4045 | −0.4070 |
| | MSE×10⁻⁴ | 0.067 | 0.075 | 0.077 | 0.103 | 0.127 | 0.183 | 0.243 | 0.644 |


Table 5.3 Estimating the shape parameter γ in the GPDP (5.4) by using the suggested estimate (5.11) (σ̄ = 1, n = 20000 in all cases). A superscript "∗" marks the best value.

| GPDP (5.4) | m | 125 | 250 | 375 | 500 | 625 | 750 | 1000 | 1250 |
|---|---|---|---|---|---|---|---|---|---|
| γ = 0.4 | γ̂ | 0.3866 | 0.3969∗ | 0.3908 | 0.3845 | 0.3819 | 0.3740 | 0.3625 | 0.3500 |
| | MSE×10⁻⁴ | 3 | 1 | 2 | 3 | 4 | 7 | 14 | 25 |
| γ = 0.2 | γ̂ | 0.1874 | 0.1971∗ | 0.1852 | 0.1737 | 0.1701 | 0.1617 | 0.1517 | 0.1379 |
| | MSE×10⁻⁴ | 2 | 1 | 4.1 | 3 | 8 | 9 | 24 | 39 |
| γ = 0.0 | γ̂ | −0.0087∗ | −0.0168 | −0.0199 | −0.0296 | −0.0374 | −0.0439 | −0.059 | −0.077 |
| | MSE×10⁻⁴ | 3 | 3 | 5 | 9 | 14 | 20 | 352 | 60 |
| γ = −0.2 | γ̂ | −0.208∗ | −0.2218 | −0.2263 | −0.2316 | −0.2418 | −0.2516 | −0.2714 | −0.2924 |
| | MSE×10⁻⁴ | 3 | 6 | 8 | 11 | 18 | 27 | 51 | 86 |
| γ = −0.4 | γ̂ | −0.4133∗ | −0.4171 | −0.4294 | −0.4404 | −0.4526 | −0.4635 | −0.4865 | −0.509 |
| | MSE×10⁻⁴ | 4 | 4 | 10 | 17 | 28 | 41 | 75 | 121 |

Table 5.4 Estimate parameters of GEVP (5.4) by using the ML method for the sub-sample bootstrapping method m

100

150

Sample 1, GEVP (5.4) with

30

γ = 0.4,

γ MSE

0.3980 0.0031

γ MSE

0.1894 0.0024

γ MSE

-0.0046 0.0018

50

0.3993∗ 0.0025

0.3923 0.0048

0.3909 0.0062

Sample 2, GEVP (5.4) with 0.1994 0.0025

γ = 0.2,

0.1999∗ 0.0034

Sample 3, GEVP (5.4) with −0.0093∗ 0.0020

γ MSE

−0.2037∗

γ MSE

-0.4054 0.011

0.0014

-0.2052 0.0015

−0.4045∗ 0.0014

a = 1,

-0.2149 0.0036

γ = −0.4,

-0.4100 0.0025

0.1854 0.0075

a = 5,

-0.0113 0.0041

γ = −0.2,

-0.2090 0.0020

Sample 5, GEVP (5.4) with

0.3878 0.0089

0.1814 0.0058 γ = 0,

-0.0073 0.0034

Sample 4, GEVP (5.4) with

200 a = 1,

-0.4184 0.0026

-0.0123 0.0049 a = 1, -0.2236 0.0042 a = 1, -0.4183 0.0037

250

300

400

b = 1, n = 20000 0.3867 0.0085

0.3829 0.0122

0.3785 0.0152

b = 10, n = 20000 0.1840 0.0078

0.1852 0.0098

0.1846 0.0122

b = 1, n = 20000 -0.0127 0.0065

-0.027 0.0079

-0.0310 0.0103

b = 1, n = 20000 -0.2253 0.0047

-0.2225 0.0063

-0.2321 0.0097

b = 1, n = 20000 -0.4318 0.0047

-0.4289 0.0052

-0.4455 0.0085


Table 5.5 Estimating the shape parameter γ in the GEVP(0.4, 1, 1), defined in (5.4) by using the suggested estimate (5.3)—“∗” in the superscript of a value means that this value is the best m

Suggested estimate (5.3) q = 0.1

q = 0.3

q = 0.5

q = 0.7

q = 0.9

0.4250 0.0125

0.3436 0.0125



0.3945 0.0006

0.4618 0.0765

0.3289 0.1012

γˆ MSE

0.3878 0.0029

0.4216 0.0025

0.3726 0.015

0.4051∗ 0.0005

0.4721 0.1054

100

γˆ MSE

0.3857 0.0041

0.3918∗ 0.0050

0.3812 0.0070

0.3598 0.0313

0.3816 0.0110

150

γˆ MSE

0.4033∗ 0.0002

0.3760 0.0003

0.4132 0.0094

0.4207 0.26

0.3591 0.154

200

γˆ MSE

0.4259 0.0335

0.3861∗ 0.0335

0.3736 0.0349

0.3789 0.025

0.4838 0.3510

400

γˆ MSE

0.3895 0.0055

0.410∗ 0.0055

0.3850 0.0013

0.3557 0.0980

0.3359 0.2051



30

γˆ MSE

50

Table 5.6 Estimating the shape parameter γ in the GEVP(0.2, 1, 1), defined in (5.4), by using the suggested estimate (5.3)—“∗” in the superscript of a value means that this value is the best m

Suggested estimate (5.3) q = 0.1

q = 0.3

q = 0.5

q = 0.7

q = 0.9



0.24976 0.005

0.1706 0.0173

0.2177 0.0065

0.3183 0.0067

30

γˆ MSE

0.1842 0.005

50∗

γˆ MSE

0.2519 0.0539

0.2075∗ 0.0539

0.1461 0.0582

0.1439 0.063

0.1737 0.0138

100

γˆ MSE

0.1864 0.0041

0.1867 0.0050

0.1920∗ 0.0070

0.1465 0.0313

0.2392 0.0110

150

γˆ MSE

0.1864∗ 0.0034

0.2365 0.0043

0.1627 0.0013

0.1832 0.0573

0.2592 0.7712

200

γˆ MSE

0.1864∗ 0.0197

0.2497 0.0199

0.2012 0.0

0.1861 0.0038

0.2264 0.014

300

γˆ MSE

0.212 0.0035

0.2408 0.0031

0.186∗ 0.0003

0.2446 0.0398

0.2726 0.1053


Table 5.7 Estimating the shape parameter γ in the GEVP(0, 1, 1), defined in (5.4), by using the suggested estimate (5.3)—“∗” in the superscript of a value means that this value is the best m

Suggested estimate (5.3) q = 0.1

q = 0.3

q = 0.5

q = 0.7

q = 0.9

30

γˆ MSE

0.0102∗ 0.0023

- 0.0235 0.0025

- 0.03706 0.0276

- 0.0489 0.0479

-0.0624 0.0779

50

γˆ MSE

- 0.0251 0.0539

−0.0195 0.0539

0.0191∗ 0.0582

- 0.044 0.063

0.148 0.0138

100∗

γˆ MSE

0.001∗ 0.0392

- 0.03868 0.0390

-0.0734 0.1078

0.0826 0.1364

-0.0061 0.0007

150

γˆ MSE

0.012∗ 0.0034

0.033 0.0043

-0.0184 0.0013

- 0.0363 0.0573

0.1084 0.7712

200

γˆ MSE

0.1864∗ 0.0001

0.2497 0.0003

0.2012 0.0068

0.1861 0.0268

0.2264 0.2358

400

γˆ MSE

0.0377 0.0153

−0.0354 0.0154

0.0232∗ 0.0108

- 0.0493 0.0487

0.1728 0.5970

Table 5.8 Estimating the shape parameter γ in the GEVP(−0.2, 1, 1), defined in (5.4) by using the suggested estimate (5.3)—“∗” in the superscript of a value means that this value is the best m

Suggested estimate (5.3) q = 0.1

q = 0.5

q = 0.7

q = 0.9



γˆ MSE

−0.1968 0.0002

-0.1455 0.0003

-0.2450 0.0405

-0.1983 0.0001

-0.1205 0.1267

50

γˆ MSE

−0.2354∗ 0.0251

−0.1632 0.0253

-0.2899 0.1618

-0.0957 0.2175

-0.1208 0.1254

100

γˆ MSE

-0.1772 0.0104

-0.1792 0.0106

-0.2286 0.0163

−0.1965∗ 0.0002

- 0.2451 0.0407

150

γˆ MSE

−0.1818 0.0066

-0.1731 0.0056

-0.2319 0.1034

-0.1881 0.0029

−0.2072∗ 0.0012

200

γˆ MSE

-0.2360 0.0260

- 0.2156 0.0235

-0.2172 0.0059

-0.2546 0.1790

- 0.1746 0.0110

400

γˆ MSE

-0.228 0.0157

0.1707 0.0131

-0.2339 0.023

-0.1355 0.0381

−0.1846∗ 0.0047

30



q = 0.3




Table 5.9 Estimating the shape parameter γ in the GEVP(−0.4, 1, 1), defined in (5.4) by using the suggested estimate (5.3)—“∗” in the superscript of a value means that this value is the best m

Suggested estimate (5.3) q = 0.1 ∗

q = 0.3

q = 0.5

q = 0.7

q = 0.9

-0.3789 0.0033

-0.4360 0.026

-0.4311 0.0194

-0.3098 0.1629

30

γˆ MSE

−0.4128 0.0033

50

γˆ MSE

−0.3890∗ 0.0024

−0.3877 0.0026

-0.4409 0.0335

-0.4145 0.0042

-0.3834 0.0055

100∗

γˆ MSE

-0.3942∗ 0.0007

-0.4077 0.0008

-0.3989 0.0

-0.4303 0.0183

- 0.4245 1.499

150

γˆ MSE

−0.3747∗ 0.0128

-0.3497 0.0122

-0.5011 0.2043

-0.3536 0.0437

-0.2792 0.2906

200

γˆ MSE

-0.3875 0.0031

- 0.4204 0.0035

-0.3862 0.0003

-0.4581 0.07670

- 0.2966 0.2141

400

γˆ MSE

-0.4393 0.0309

0.3798∗ 0.042

-0.4380 0.0289

-0.3596 0.0326

-0.2974 0.2103




5.5 Parameter estimation for GEVL and GEVP by using the GA technique

In this section, we consider the GA (see Subsection 1.2.3) as another technique for estimating the parameters of GEVL and GEVP. Since the genetic algorithm is a randomized method, a simulation study is required to assess the efficiency of this technique. We simulate 50 random samples, each of size 100, from each of the following GEVLs (the standard GEVL is defined in (2.4)): GEVL(μ = 20, σ = 3, γ = 0.2), GEVL(μ = 10, σ = 2, γ = −0.2), and GEVL(μ = 30, σ = 4, γ = 0). A similar simulation is implemented for the GEVPs (defined in (2.32)) $P_{1,-0.2}(x; 0.1, 1)$, $P_{1,0}(x; 0.1, 1)$ and $P_{1,0.2}(x; 0.1, 1)$. By using these samples and the GA technique, we estimated the parameters (μ, σ, γ) (in the GEVL model) and (a, b, γ) (in the GEVP model). The true and the estimated values of these parameters are presented in Table 5.10. This table shows that the GA method is promising and suitable for the modelling of extremes in the L-model as well as in the P-model. The MSE is between 0.01 and 0.13 for the GEVL and between $2.4 \times 10^{-6}$ and 0.04 for the GEVP. It is worth mentioning that the GA technique was recently applied by Kamal (2017) to the data given in Chapter 4, where in some cases this technique performed better.

Table 5.10 Simulation study for estimating GEVL and GEVP by using the GA technique

| GEVL(μ, σ, γ) | μ̂ | σ̂ | γ̂ |
|---|---|---|---|
| GEVL(20, 3, 0.2) | 20.05 | 3.04 | 0.18 |
| MSE | 0.12 | 0.09 | 0.01 |
| GEVL(10, 2, −0.2) | 10.05 | 2.02 | −0.22 |
| MSE | 0.01 | 0.05 | 0.01 |
| GEVL(30, 4, 0) | 30.05 | 3.95 | −0.01 |
| MSE | 0.13 | 0.10 | 0.01 |

| P1,γ(x; a, b) | â | b̂ | γ̂ |
|---|---|---|---|
| P1,−0.2(x; 0.1, 1) | 0.08 | 1.25 | −0.29 |
| MSE | 2.4 × 10⁻⁶ | 1.39 × 10⁻⁵ | 2.06 × 10⁻⁵ |
| P1,0(x; 0.1, 1) | 0.10 | 1.15 | −0.02 |
| MSE | 0.01 | 0.04 | 0.02 |
| P1,0.2(x; 0.1, 1) | 0.18 | 0.83 | 0.33 |
| MSE | 4.52 × 10⁻⁶ | 1.21 × 10⁻⁵ | 0.02 |

6 Methods of threshold selection

Clearly, an important issue in the estimation of the EVI is the determination of an appropriate threshold value. Various works have discussed the problem of threshold selection, which involves a compromise between bias and variance. Moreover, the estimation of the EVI and the determination of an appropriate threshold value are two intertwined problems. Therefore, we begin this chapter (Section 6.1) by reviewing some important estimators of the EVI in the L-model. Most of these estimators are of Hill type. The Hill-type estimators were initially suggested by Hill (1975) for heavy-tailed distributions (in probability theory, heavy-tailed distributions are DFs whose tails are not exponentially bounded, that is, they have heavier tails than the exponential distribution). In Section 6.2, we describe some methods of threshold selection under linear normalization. Moreover, we introduce some graphical methods of threshold selection under linear and power normalizing constants. Finally, in Section 6.3, we select two Hill estimators and compare their performance via a simulation study.

6.1 Some estimators for the EVI under linear normalization

(I) Pickands's estimator
Pickands (1975) was the first to derive an estimator of the EVI for $\gamma \in \mathbb{R}$. The Pickands estimator, $\gamma^{P}_{n,k}$, for $\gamma$ is given by

$$\gamma^{P}_{n,k} = (\log 2)^{-1} \log \frac{X_{n-[k/4]+1:n} - X_{n-[k/2]+1:n}}{X_{n-[k/2]+1:n} - X_{n-k+1:n}}, \quad k = 1, \dots, n,$$

where $[X]$ stands for the integer part of $X$. This estimator involves $k$ of the top observations. It is worth noting that the Pickands estimator is very sensitive to the choice of the intermediate order statistics used for estimation. On the other hand, the estimator $\gamma^{P}_{n,k}$ has some advantages, such as weak and strong consistency and asymptotic normality (cf. Dekkers et al., 1989). Also, the Pickands estimator is shift- and scale-invariant. Pereira (1994) and Alves (1995) proposed a generalization of the Pickands estimator by introducing a tuning or control parameter $\theta$, defined as follows:

$$\gamma^{GPE}_{n,k,\theta} = (\log 2)^{-1} \log \frac{X_{n-[\theta^2 k]+1:n} - X_{n-[\theta k]+1:n}}{X_{n-[\theta k]+1:n} - X_{n-k+1:n}}, \quad k = 1, \dots, n, \text{ and } 0 < \theta < 1.$$

The estimator $\gamma^{GPE}_{n,k,\theta}$ involves $k$ of the top observations and the tuning or control parameter $0 < \theta < 1$. The traditional Pickands estimator corresponds to $\gamma^{GPE}_{n,k,\frac{1}{2}}$. We refer to Alves (1995) and Yun (2002), who, with slightly different nuances, introduced a tuning or control parameter $\theta$. Drees (1995) established the asymptotic normality of linear combinations of Pickands estimators. In Segers (2005), the Pickands estimator for the EVI was generalized in a way that includes all of its previously known variants.

(II) Hill estimator under linear normalization (abbreviated by HEL)
Hill (1975) proposed another estimator for the EVI. This estimator is restricted to the heavy-tailed DFs, which belong to the Fréchet max-domain of attraction, i.e., $\gamma > 0$. The proposed HEL, which is the most popular estimator for the tail index $\gamma$ based on the $k$ upper-order statistics, is given by

$$\gamma^{++}_{\ell} = \frac{1}{k} \sum_{i=1}^{k} \log X_{n-i+1:n} - \log X_{n-k:n}. \tag{6.1}$$
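The classical Pickands estimator of part (I) can be sketched in a few lines of Python. The function name and the 0-based index translation are assumptions of this sketch; the simulated data are strict Pareto with $\gamma = 0.5$, chosen only because the population ratio of quantile differences then equals $2^{\gamma}$ exactly.

```python
import numpy as np


def pickands_estimator(x, k):
    """Pickands estimator based on X_{n-[k/4]+1:n}, X_{n-[k/2]+1:n}, X_{n-k+1:n}."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    # 1-based order statistic X_{j:n} is xs[j - 1]; [.] is the integer part.
    top = xs[n - k // 4]
    mid = xs[n - k // 2]
    low = xs[n - k]
    return np.log((top - mid) / (mid - low)) / np.log(2.0)


# Illustrative run on Pareto data, X = (1 - U)^(-gamma) with gamma = 0.5.
rng = np.random.default_rng(1975)
pareto = (1.0 - rng.uniform(size=400_000)) ** (-0.5)
print(pickands_estimator(pareto, k=20_000))
```

Since only differences of order statistics enter the ratio, replacing the data `x` by `c * x + d` with `c > 0` leaves the estimate unchanged, which is the shift- and scale-invariance mentioned above.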

Remark The two signs “++” in the superscript of the estimator (6.1) follow a convention adopted in the next section: for all the Hill estimators of the EVI considered, the first superscript denotes the sign of the data, i.e., “+” and “−” denote non-negative data and negative data, respectively. Moreover, the second superscript in all those estimators denotes the sign of the EVI, $\gamma$, i.e., “+” and “−” denote a non-negative EVI ($\gamma \ge 0$) and a negative EVI ($\gamma < 0$), respectively. On the other hand, when $\gamma \in \mathbb{R}$, we drop the second superscript (e.g., see the moment estimator V). Finally, the subscript “$\ell$” always denotes that the estimator is under the L-model (in the next section and in the sequel, the subscript “P” denotes that the estimator is under the P-model).

The value of $k$ is chosen as the largest value (i.e., lowest threshold) such that $\gamma^{++}_{\ell}$ is stabilized. The HEL $\gamma^{++}_{\ell}$ is weakly consistent, as established by Mason (1982), and strongly consistent, as proved by Deheuvels et al. (1988). Moreover, the asymptotic normality of $\gamma^{++}_{\ell}$ was proved by Hall (1982). The properties of the estimator $\gamma^{++}_{\ell}$ have been studied by many authors, among them de Haan and Ferreira (2006) and Embrechts et al. (1997). Unlike the Pickands estimator, this estimator is not invariant under shifts of the data, although it remains scale-invariant. This raised the natural problem of extending the HEL to the general case $\gamma \in \mathbb{R}$. Such an attempt led Beirlant et al. (2005) to the so-called adapted HEL, which is applicable for $\gamma \in \mathbb{R}$.

(III) Negative Hill estimator (γ < 0)
The negative Hill estimator $\gamma^{+-}_{\ell}$ is an estimator for $\gamma < 0$, which was proposed by Falk (1995). If $\gamma < 0$, then the right endpoint of $F$ is finite. Moreover, the DF $F$ is in the Weibull max-domain of attraction, so that $\gamma^{+-}_{\ell}$, for $\gamma < 0$, is given by

$$\gamma^{+-}_{\ell} = \frac{1}{k} \sum_{i=1}^{k-1} \log(X_{n:n} - X_{n-i:n}) - \log(X_{n:n} - X_{n-k:n}), \tag{6.2}$$

which again involves $k$ upper-order observations. The HEL (6.2) is shift- and scale-invariant. De Haan (2006) enunciated the conditions for weak consistency of this estimator, namely, the estimator $\gamma^{+-}_{\ell}$ is consistent if $\gamma < -0.5$. A very simple and direct way to get this estimator is to note that if $F_X(x) = P(X \le x) \in D_{\ell}(G_{\gamma}(x))$, with $\gamma < 0$, then the upper endpoint $\rho$ of $F_X(x)$ is finite and $F_{\xi}(x) = P(\xi \le x) \in D_{\ell}(G_{-\gamma}(x))$, where $\xi = \frac{1}{\rho - X}$ (cf. Theorem 1.2.1 in de Haan and Ferreira, 2006). Therefore, after replacing $\rho$ with its estimate $X_{n:n}$, provided that $\gamma < -0.5$ (see Remark 4.5.5 in de Haan and Ferreira, 2006), we get the estimator $\gamma^{+-}_{\ell}$, as defined by (6.2).

By using the well-known relation between the lower- and upper-order statistics, we can easily deduce the HEL for the tail index when the data are negative. In this case, the two corresponding HELs are based on the extreme value theory for the minimum case. More specifically, the two corresponding HELs are based on a lower threshold and the minimum GPDL (its corresponding GEVL is $1 - G_{\gamma}(-x)$). In this case we get the following corresponding result.

Lemma 6.1 For the ordered sample data $X_{1:n} \le \dots \le X_{n:n}$, let $X_{k+1:n} < 0$ be a suitable threshold for which $X_{i:n} < 0$, $i = 1, 2, \dots, k$. Then, the HEL, for the shape parameter $\gamma > 0$, is given by

$$\gamma^{-+}_{\ell} = \frac{1}{k} \sum_{i=1}^{k} \log|X_{i:n}| - \log|X_{k+1:n}|. \tag{6.3}$$

Moreover, the negative HEL, for $\gamma < -0.5$, is given by

$$\gamma^{--}_{\ell} = \frac{1}{k} \sum_{i=1}^{k-1} \log(|X_{1:n}| - |X_{i+1:n}|) - \log(|X_{1:n}| - |X_{k+1:n}|). \tag{6.4}$$
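The four Hill-type estimators (6.1)–(6.4) can be sketched in Python as follows. The function names are hypothetical. For (6.2) and (6.4) the sketch assumes the reading $\frac{1}{k}\sum_{i=1}^{k-1}\log\bigl[(X_{n:n} - X_{n-i:n})/(X_{n:n} - X_{n-k:n})\bigr]$, i.e., the reference term is subtracted inside the sum; the alternative reading differs only by a term of order $1/k$.

```python
import numpy as np


def hill_positive(x, k):
    """HEL (6.1): mean of log X_{n-i+1:n}, i = 1..k, minus log X_{n-k:n}."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    return np.mean(np.log(xs[n - k:])) - np.log(xs[n - k - 1])


def negative_hill(x, k):
    """Negative Hill estimator (6.2) under the reading stated in the lead-in."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    top = xs[n - 1]
    gaps = top - xs[n - k:n - 1]   # X_{n:n} - X_{n-i:n}, i = 1..k-1
    ref = top - xs[n - k - 1]      # X_{n:n} - X_{n-k:n}
    return np.sum(np.log(gaps) - np.log(ref)) / k


def hill_negative_data(x, k):
    """HEL (6.3): mean of log|X_{i:n}|, i = 1..k, minus log|X_{k+1:n}|."""
    xs = np.sort(np.asarray(x, dtype=float))
    return np.mean(np.log(np.abs(xs[:k]))) - np.log(np.abs(xs[k]))


def negative_hill_negative_data(x, k):
    """Negative HEL (6.4), same reading as negative_hill, for negative data."""
    xs = np.sort(np.asarray(x, dtype=float))
    low = np.abs(xs[0])
    gaps = low - np.abs(xs[1:k])   # |X_{1:n}| - |X_{i+1:n}|, i = 1..k-1
    ref = low - np.abs(xs[k])      # |X_{1:n}| - |X_{k+1:n}|
    return np.sum(np.log(gaps) - np.log(ref)) / k


# Illustrative data: Pareto with gamma = 0.5 and uniform(0, 1) with gamma = -1.
rng = np.random.default_rng(1)
u = rng.uniform(size=100_000)
pareto = (1.0 - u) ** (-0.5)
print(hill_positive(pareto, 2000), negative_hill(u, 2000))
```

Mirroring the data ($x \mapsto -x$) turns (6.1) into (6.3) and (6.2) into (6.4), which is the relation between lower- and upper-order statistics used in Lemma 6.1.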

(IV) Generalized Hill estimator (γ ∈ R)
The generalized Hill estimator, $\gamma^{GHE}_{n,k}$, is another attempt to generalize the HEL to the case $\gamma \in \mathbb{R}$. The estimator $\gamma^{GHE}_{n,k}$ was suggested by Gomes and Martins (2001) as

$$\gamma^{GHE}_{n,k} = \gamma^{++}_{\ell} + \frac{1}{k} \sum_{i=1}^{k} \bigl(\log \gamma^{++}_{\ell,i} - \log \gamma^{++}_{\ell}\bigr), \tag{6.5}$$

where $\gamma^{++}_{\ell,i} = \frac{1}{i} \sum_{j=1}^{i} \log X_{n-j+1:n} - \log X_{n-i:n}$ and $\gamma^{++}_{\ell}$ stands for the HEL defined in (6.1). Since the estimator defined in (6.5) is based on $\gamma^{++}_{\ell}$, it involves $k$ top observations. Also, the estimator $\gamma^{GHE}_{n,k}$ is scale- but not location-invariant and it is consistent for $\gamma \in \mathbb{R}$. For more details about the properties of this estimator, refer to Beirlant et al. (2005).

(V) Moment estimator (γ ∈ R)
Another estimator defined for $\gamma \in \mathbb{R}$ was suggested by Dekkers et al. (1989), who also proved its consistency and asymptotic normality. It is known as the moment estimator under linear normalization (MEL). The MEL is defined as follows. Define, for $j = 1, 2, \dots$,

$$\hat M_{\ell:j} = \frac{1}{k} \sum_{i=1}^{k} (\log X_{n-i+1:n} - \log X_{n-k:n})^{j},$$

where $X_{1:n} \le X_{2:n} \le \dots \le X_{n:n}$ are the order statistics of the given sample $X_1, X_2, \dots, X_n$. Clearly, $\hat M_{\ell:1} = \gamma^{++}_{\ell}$ (i.e., $\hat M_{\ell:1}$ is a HEL, which is defined by (6.1) with the parameter $\gamma_{+} = \max\{0, \gamma\}$ and it is valid for $\gamma > 0$). Then, the MEL for $\gamma = \gamma_{+} + \gamma_{-} \in \mathbb{R}$, where $\gamma_{-} = \min\{0, \gamma\}$, is defined by

$$\gamma^{+}_{\ell M} = \gamma^{++}_{\ell} + \hat\gamma_{-}, \tag{6.6}$$

where

$$\hat\gamma_{-} = 1 - \frac{1}{2} \left(1 - \frac{\hat M^{2}_{\ell:1}}{\hat M_{\ell:2}}\right)^{-1}. \tag{6.7}$$

The estimator (6.7) is called the negative moment estimator, which is valid for + γ < 0. We observe that the moment estimator, γM , has two pieces, γ++ is valid for γ > 0, while γˆ− is valid for γ < 0. The MEL satisfies the properties of weak consistency and strong consistency. Also, the MEL is not location-invariant, but is scale-invariant. (VI) Moment ratio estimator (γ > 0) Danielsson et al. (1996) proposed a new estimator, known as the moment ratio estimator under linear normalization (MREL). This estimator is based on the ratio of the second to the first conditional moment. Namely, k 2 ˆ :2 M ++ i=1 (log Xn−i+1:n − log Xn−k:n ) . (6.8) γM = =  R k ˆ :1 2M 2 i=1 (log Xn−i+1:n − log Xn−k:n ) The estimator (6.8) can be used only for γ > 0. Moreover, Danielsson et al. (1996) proved that the MREL has a lower asymptotic square bias than the MEL (6.6) when they are evaluated at the same threshold, i.e., for the same k. (VII) Harmonic t-Hill estimator under linear normalization (γ > 0) Under linear normalization, Beran et al. (2014) defined and studied the harmonic moment tail index estimator class (HMEL). They showed that the estimators in this class are consistent and they show good robustness properties. The HMEL, for γ > 0, is defined by ⎫ ⎧# β−1 $−1 k  ⎬ 1 ⎨ −1 Xn−k:n ++ γH(β) = k − 1 , β > 0, β = 1, 1 ≤ k ≤ n − 1. ⎭ ⎩ β−1 Xn−i+1:n i=1 (6.9) ++ ++ ++ For β → 1, γH(β) is interpreted as the limit, when β → 1, i.e., γH(β) = limβ→1 γH(β) ++ = γ is the HEL (6.1). Stehl´ık et al. (2010) proposed a t-score moment estimator ++ . (harmonic t-Hill estimator), which is a HMEL with β = 2, denoted by γH (VIII) Several new tail index estimators by Paulauskas and Vaiˇ ciulis (2013, 2017) Paulauskas and Vaiˇciulis (2013) proposed a new idea in the EVI estimation by including the function log x, which is essential in all Hill estimators, into the family


Methods of threshold selection

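of the functions

$$f_r(x) = \begin{cases} \frac{1}{r}(x^{r} - 1), & r \ne 0, \\ \log x, & r = 0. \end{cases} \qquad (6.10)$$

By applying the family (6.10) to the order statistics, we get

$$H_{\ell:n}(k, r) = \frac{1}{k}\sum_{i=1}^{k} f_r\left(\frac{X_{n-i+1:n}}{X_{n-k:n}}\right).$$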

A generalized Hill estimator is defined by $\gamma_{\ell:n}(r) = \frac{H_{\ell:n}(k,r)}{1 + rH_{\ell:n}(k,r)}$, and the Hill estimator $\gamma_\ell^{++}$ is obtained by taking $r = 0$, i.e., $\gamma_\ell^{++} = H_{\ell:n}(k, 0)$. Another estimator of the EVI in the L-model, which can be written as a function of the statistic $H_{\ell:n}(k, r)$, is the HMEL, which is defined in (6.9), for $\gamma > 0$. Paulauskas and Vaičiulis (2017) showed that $\gamma_{H(1-r)}^{++} = \gamma_{\ell:n}(r)$. Paulauskas and Vaičiulis (2017) introduced another parametric family of functions, which also includes the logarithmic function, and constructed new estimators using this family. For $x > 1$, Paulauskas and Vaičiulis (2017) considered the family of functions $g_{r,u}(x) = x^{r}(\log x)^{u}$, where the parameters r and u can be arbitrary real numbers, but for consistency they should satisfy $\gamma r < 1$ and $u > -1$. Using the functions of this family one can form statistics, similar to $H_{\ell:n}(k, r)$,

$$G_{\ell:n}(k, r, u) = \frac{1}{k}\sum_{i=1}^{k} g_{r,u}\left(\frac{X_{n-i+1:n}}{X_{n-k:n}}\right). \qquad (6.11)$$

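Paulauskas and Vaičiulis (2017) showed that $\gamma_\ell^{++} = G_{\ell:n}(k, 0, 1)$,

$$\gamma_M^{+} = G_{\ell:n}(k, 0, 1) + \frac{1}{2}\left[1 - \left(\frac{G_{\ell:n}(k, 0, 2)}{G_{\ell:n}^{2}(k, 0, 1)} - 1\right)^{-1}\right]$$

and

$$\gamma_{MR}^{++} = \frac{G_{\ell:n}(k, 0, 2)}{2G_{\ell:n}(k, 0, 1)}.$$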
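The three identities above can be checked numerically. The following is a minimal sketch, assuming NumPy; the function names (`g_stat`, `hill`, `moment`, `moment_ratio`) are illustrative choices and not taken from the book or from any package.

```python
import numpy as np

def g_stat(x, k, r, u):
    """Statistic (6.11): average of g_{r,u}(X_{n-i+1:n} / X_{n-k:n}), i = 1..k."""
    xs = np.sort(np.asarray(x, dtype=float))
    ratios = xs[-k:] / xs[-k - 1]      # k largest values over the threshold X_{n-k:n}
    return np.mean(ratios ** r * np.log(ratios) ** u)

def hill(x, k):
    """Hill estimator written as G(k, 0, 1)."""
    return g_stat(x, k, 0, 1)

def moment(x, k):
    """Moment estimator written through G(k, 0, 1) and G(k, 0, 2)."""
    g1, g2 = g_stat(x, k, 0, 1), g_stat(x, k, 0, 2)
    return g1 + 0.5 * (1.0 - 1.0 / (g2 / g1 ** 2 - 1.0))

def moment_ratio(x, k):
    """Moment ratio estimator G(k, 0, 2) / (2 G(k, 0, 1))."""
    return g_stat(x, k, 0, 2) / (2.0 * g_stat(x, k, 0, 1))
```

The expression used in `moment` is algebraically the same as (6.6)-(6.7), since $1 - \frac{1}{2}(1 - M_1^2/M_2)^{-1} = \frac{1}{2}[1 - (M_2/M_1^2 - 1)^{-1}]$.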

Moreover, all estimators which were introduced in Paulauskas and Vaičiulis (2013) can be written by means of the statistics (6.11) only. In addition, many known estimators can be expressed in terms of the statistics (6.11); for example, in Gomes and Martins (2001) the estimator $\frac{G_{\ell:n}(k,0,u)}{\Gamma(1+u)\,G_{\ell:n}^{u-1}(k,0,1)}$ was considered. Finally, Paulauskas and Vaičiulis (2017) provided a general method to prove limit theorems for estimators constructed by means of the statistics $G_{\ell:n}(k, r, u)$, and they proved the weak consistency of these statistics for general values of the parameters $r < \frac{1}{\gamma}$ and $u > -1$. We now summarize some desirable properties, namely scale-invariance (SI), location-invariance (LI), weak consistency (WC), strong consistency (SC), and asymptotic normality (AN), of the above-defined estimators of the EVI in Table 6.1.

6.2 Some methods of threshold selection


Table 6.1 Comparison between some desirable properties of HELs for EVI

Estimator                  SI    LI    WC    SC                   AN    Restriction             Year
$\gamma^{P}_{n,k}$         Yes   Yes   Yes   Yes                  Yes   $\gamma \in R$          1975
$\gamma^{GPE}_{n,k}$       Yes   Yes   Yes   Yes                  Yes   $\gamma \in R$          1994
$\gamma_\ell^{++}$         Yes   No    Yes   Yes                  Yes   $\gamma > 0$            1975
$\gamma_\ell^{+-}$         Yes   Yes   Yes   Yes                  Yes   $\gamma < -\frac{1}{2}$ 1995
$\gamma^{GHE}_{n,k}$       Yes   No    Yes   Yes                  Yes   $\gamma \in R$          2001
$\gamma_M^{++}$            Yes   No    Yes   Yes                  Yes   $\gamma \in R$          1989
$\gamma_{MR}^{++}$         Yes   No    Yes   Yes                  Yes   $\gamma > 0$            1996
$\gamma_{H(\beta)}^{++}$   Yes   No    Yes   No, $\forall\beta$   Yes   $\gamma > 0$            2014

6.2 Some methods of threshold selection

The selection of an appropriate threshold is one of the important concerns of the POT approach and it is still an unsolved problem. This problem is an area of ongoing research in the literature, which can be of critical importance. Traditionally, the threshold is chosen before fitting the model. Coles (2001) stated that the threshold selection process is always a trade-off between bias and variance. If a lower threshold is taken, the variance decreases because the number of observations is larger, but the bias increases. On the other hand, if the threshold is too high, the bias decreases while the variance increases, as there are not enough data above this threshold. Therefore, the following points should be taken into account when one selects the threshold:
1. The threshold must be sufficiently high,
2. The threshold choice involves balancing between bias and variance,
3. The threshold stability property is satisfied by the GPDL (as well as the GPDP). Therefore, a suitable threshold can be chosen when the estimators of the shape parameter remain stable above the threshold.
On the other hand, the selection of a suitable threshold is equivalent to the selection of a suitable number, k, of upper order observations to estimate the EVI. Theoretically, k should satisfy the two conditions $k \to \infty$ and $\frac{k}{n} \to 0$, as $n \to \infty$. Thus, in view of Section 2.4, we can easily see that the kth order statistic $X_{k:n}$ is the intermediate order statistic. Moreover, this reveals that the choice of k depends on the sample size n and it should increase moderately as the sample size increases. The correct choice of k is crucial for the estimators to have the desirable properties needed for proper inference. Therefore, k must be large enough, but not too large. If k is too low, then few order statistics will be used. This implies that the estimators will have a large variance. On the other hand, if k is too high,


the number of used order statistics increases, which implies a decrease of the estimator variance, but it will result in a larger bias. Therefore, the optimal value of k is the result of balancing between bias and variance. We now present some methods for selecting k in the estimation of the EVI. In Section 6.1, we presented a series of estimators for the EVI, γ. These estimators are sensitive to the choice of k (the number of top observations used in the estimation). Several criteria for choosing k have been proposed. For example, Pickands (1975) suggested a specific criterion for choosing k, together with the estimator (I), but this method was never widely adopted, unlike the estimator itself. Embrechts et al. (1997) proposed a more practical way for the choice of k, the Pickands-plot (PE). Namely, for each value of k = 1, 2, . . . , n, we calculate the PE and plot it against k. The range of the values of k that corresponds to a plateau, i.e., those values of k for which the plot is reasonably horizontal, is considered for selecting the value of the estimator, but it may be very difficult to choose the range where a stable plot is evident. Also, the same strategy is proposed for the other HELs (III) (known as Hill-plots). However, Embrechts et al. (1997) warned us that the results of Hill-plots can be very misleading. Drees et al. (2000) proved that Hill-plots are most effective only when the underlying DF is Pareto or very close to it. Therefore, some adaptive methods for choosing k were proposed for special classes of DFs (see Beirlant et al., 1996 and Resnick and Stărică, 1997). Drees and Kaufmann (1998) proposed a sequential approach to construct a consistent estimator of k, which works asymptotically without any prior knowledge about the underlying DF. At the turn of this century, a simple diagnostic method for selecting k was suggested by Guillou and Hall (2001).
An important criterion, which is popular among statisticians, is choosing the k which minimizes the asymptotic mean squared error (denoted by AM) of the HEL estimator, where the AM (HEL estimator) is defined as follows:

AM (HEL estimator) = AV (HEL estimator) + (AB (HEL estimator))^2,

where AV and AB stand for the asymptotic variance and the asymptotic bias, respectively. Then, the optimal k is a value which minimizes the AM over the AM plot, {(k, AM (HEL estimator))}. Actually, many authors have suggested methods for choosing k, but no method has been universally accepted. Generally speaking, no really effective practical algorithm has yet been proposed to get an optimal value of k. Figure 6.1 illustrates the effect of threshold selection on the number of upper observations, k. If the threshold was selected in Region 1, the threshold would be too low (minimum value), which implies that more of the data are used in the estimation of the EVI. But if the threshold was selected in Region 4 (high), the threshold would exceed the largest observation and so there would be no remaining exceedances to be used in the estimation. Thus, we must make a trade-off between choosing the threshold too low and too high. Such a trade-off is resolved by an appropriate estimation of the optimal k (minimum-variance reduced bias). Consequently, we have a high variance for high thresholds, i.e., for small k, and a high bias for low thresholds, i.e., for large k.
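The Hill-plot strategy described above only needs the Hill estimate for every k. The following is a minimal sketch of those plot ordinates, assuming NumPy and strictly positive data; the name `hill_path` is a hypothetical helper, not a function of the R packages cited in the text.

```python
import numpy as np

def hill_path(x):
    """Return k = 1..n-1 and the Hill estimate based on the k largest observations."""
    logs = np.log(np.sort(np.asarray(x, dtype=float)))[::-1]  # log order statistics, largest first
    n = logs.size
    k = np.arange(1, n)
    top_sums = np.cumsum(logs[:-1])        # sum of the k largest log-observations
    # mean of the k largest logs minus the log of the threshold X_{n-k:n}
    return k, top_sums / k - logs[1:]
```

Plotting the second array against the first gives the Hill plot. For small k the path fluctuates strongly (high variance), while for large k it drifts away from the true value whenever the DF is not exactly Pareto (bias), which is the trade-off the AM criterion formalizes.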


Figure 6.1 The relation between threshold selection and k

6.2.1 Graphical methods

Several graphical methods have been proposed to assist in threshold selection. In this subsection, we present some graphical plots to select a suitable threshold to fit the GPDL, as well as the GPDP. Actually, these methods may be considered just as exploratory ways to subjectively choose the value of the suitable threshold based on the stable region of the estimator-plot. Coles (2001) outlined the diagnostics for the choice of threshold. Various diagnostic plots, which are used in evaluating the model fit, are commonly used for choosing the threshold, for example, the mean excess plot, the Hill plot (HP) under linear and power normalizing constants, and the stability plot (SP), or the threshold choice plot (TCP), under linear normalization. These methods are diagnostic plots drawn before fitting any model (i.e., GPDL and GPDP) to choose an appropriate threshold. In the following, we will describe four diagnostic plots of threshold selection.

(I) Mean excess plot under linear normalization (MEPL)
The mean excess plot method for choosing a suitable threshold, introduced by Davison and Smith (1990), is one of the most widely used methods in the study of the EVT under linear transformation. This method is also known as the mean residual life function, especially in survival analysis. This method basically depends on what is called the sample mean excess plot $\{(u, e_n(u)) : X_{1:n} < u < X_{n:n}\}$, where $e_n(u)$ is the sample mean excess function defined by

$$e_n(u) = \frac{\sum_{i=1}^{n} (X_i - u)\, I_{\{X_i > u\}}}{\sum_{i=1}^{n} I_{\{X_i > u\}}},$$
where IA is the usual indicator function of the set A, i.e., en (u) is the sum of the excesses over the threshold u, divided by the number of data points, which exceed the threshold u. The sample mean excess function en (u) is an empirical estimate


of the mean excess function, which describes the expected overshoot of a threshold given that an exceedance occurs. In particular, if the empirical plot seems to follow a reasonably straight line with a positive gradient above a certain value of u, then this is an indication that the excesses over this threshold follow a GPDL, with $\gamma > 0$. On the other hand, a horizontal mean excess plot indicates exponentially distributed data, with $\gamma \cong 0$, while short-tailed data with a negatively sloped line indicate $\gamma < 0$. In the GPDL case, the mean excess function is given by

$$e(u) = E(X - u \mid X > u) = \frac{\sigma_u + \gamma u}{1 - \gamma}, \quad \text{for } \gamma < 1.$$

Therefore, the empirical mean excess plot is a positive straight line with intercept $\sigma_u/(1 - \gamma)$ and slope $\gamma/(1 - \gamma)$ above the threshold u. The interpretation of the sample mean excess plot under linear normalization was explained in Embrechts et al. (1997). One difficulty with this method is that the mean excess plot typically shows very high variability, particularly at high thresholds. This can make it difficult to decide whether an observed departure from linearity is genuine or merely due to sampling variability. Under linear normalization, the mean excess functions of various DFs were given by Beirlant et al. (2005). Figure 6.2 gives an example of a MEPL for the River Nidd at Hunsingore Weir from 1934 to 1969 (35 years). The set of data consists of 154 exceedances of the threshold 65 from the evir package in R (see Pfaff et al., 2012).
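The sample mean excess function is simple to compute directly from its definition. The sketch below assumes NumPy; `mean_excess` and `mean_excess_plot` are illustrative names and are not the functions of the evir package mentioned above.

```python
import numpy as np

def mean_excess(x, u):
    """Sample mean excess e_n(u): average of (X_i - u) over the observations with X_i > u."""
    x = np.asarray(x, dtype=float)
    excesses = x[x > u] - u
    if excesses.size == 0:
        raise ValueError("no observation exceeds the threshold u")
    return excesses.mean()

def mean_excess_plot(x):
    """Points (u, e_n(u)) with u running over the order statistics below the maximum."""
    xs = np.sort(np.asarray(x, dtype=float))
    thresholds = xs[:-1]                   # evaluating at the sample maximum would leave no excesses
    return thresholds, np.array([mean_excess(xs, u) for u in thresholds])
```

Evaluating the function at the observed order statistics is a common convention and is an assumption here; any grid of thresholds strictly between the smallest and largest observation could be used instead.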

Figure 6.2 The threshold selection by using the MEPL for the River Nidd data. Vertical dashed lines mark these thresholds

(II) Mean excess plot under power normalization (MEPP) By using the link between the affine and power norming due to Christoph and Falk (1996), see Theorem 2.42, we can address the problematic issue of threshold selection under power normalization. Actually, for any order sample data Y1:n ≤


$\dots \le Y_{n:n}$ for which $Y_{1:n} > 0$, the sample mean excess plot $\{(\log u, e_n(u)) : Y_{1:n} < u < Y_{n:n}\}$, where

$$e_n(u) = \frac{\sum_{i=1}^{n} (\log Y_i - \log u)\, I_{\{\log Y_i > \log u\}}}{\sum_{i=1}^{n} I_{\{\log Y_i > \log u\}}},$$

concerns the order data Xi:n = log Yi:n , i = 1, 2, ..., n. Similarly, for any order sample data Y1:n ≤ ... ≤ Yn:n for which Yn:n < 0, the sample mean excess plot {(log u, e n (u)) : |Yn:n | < u < |Y1:n |}, where

n 

e n (u) =

i=1

(log |u| − log |Yi |)I{log |Yi | u (u is a suitable threshold and ν be another threshold), follow the GPDL. The shape parameters of the two GPDLs are identical (i.e., the two GPDLs have the same

6.3 Comparison between $\gamma_M^{+}$ and $\gamma_{MR}^{++}$ via a simulation study


shape), but for the scale parameters, we have the relation $\sigma_\nu = \sigma_u + \gamma(\nu - u)$. The modified scale re-parameterization $\sigma^* = \sigma_\nu - \gamma\nu$ is constant above u, i.e., once the GPDL provides an adequate tail approximation. A stability plot is a procedure for threshold selection in which the model is estimated at a range of thresholds, and the parameter estimates are then checked for stability. If the GPDL is a reasonable model for excesses of a threshold u, then excesses of a higher threshold ν should also follow a GPDL. The argument suggests plotting both $\sigma^*$ and $\hat{\gamma}$ together with confidence intervals for each of these quantities (for more details see Coles, 2001, and Ribatet, 2009), and selecting u as the lowest value for which the estimates remain near-constant. The TCP represents the points defined by $\{(u_1, \sigma^*)$ and $(u_1, \hat{\gamma})$, where $u_1 < X_{n:n}\}$. The plots of $\sigma^*$ and $\hat{\gamma}$ against $u_1$, using the POT package (POT has several tools to define a reasonable threshold; for more details see Ribatet, 2009), are shown in Figure 6.5.
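The constancy of the modified scale follows in one line from the scale relation stated above. It is written out here as a check, using only the symbols already defined:

```latex
\sigma^{*} \;=\; \sigma_\nu - \gamma\nu
          \;=\; \sigma_u + \gamma(\nu - u) - \gamma\nu
          \;=\; \sigma_u - \gamma u .
```

The right-hand side does not depend on ν, which is why both $\sigma^*$ and $\hat{\gamma}$ are expected to be flat in the threshold choice plot above a valid threshold u.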

Figure 6.5 The threshold selection by using the TCP function for Nidd data

The change in pattern for very high thresholds that was observed in the mean excess plot is also apparent here, but the perturbations are now seen to be small relative to sampling errors. Hence, the selected threshold of u = 110 appears reasonable in the TCP. For the Nidd data, Davison and Smith (1990) used Q-Q plots for the fitted models at the thresholds 70 and 100.

6.3 Comparison between $\gamma_M^{+}$ and $\gamma_{MR}^{++}$ via a simulation study

In this subsection, we compare the performance of the estimators $\gamma_M^{+}$ and $\gamma_{MR}^{++}$, where the comparison between the other estimators, defined in Section 6.1, may be implemented by using the same procedure. Our procedure of the simulation study, applied in Table 6.2 (cf. Barakat et al., 2018), is as follows:


1. Consider the different values of the EVI, γ = 0.1, 0.3, 0.5, 0.7, 0.9, 1, 1.5, 2.
2. For each of these true values of the EVI, generate two groups of 1000 random samples each from the GEVL $G_\gamma$. Each random sample from the first group (second group) has 100 (300) observations. In view of the max-stable property, we get $G_\gamma \in D_\ell(G_\gamma)$.
Table 6.2 shows that the performance of the estimator $\gamma_M^{+}$ is bad in most cases. On the other hand, the performance of the estimator $\gamma_{MR}^{++}$ is good for a large sample size (i.e., for n = 300) in the range 0.1 ≤ γ ≤ 1.5 but otherwise is bad.

Table 6.2 Simulation output for assessing and comparing the estimators $\gamma_M^{+}$ and $\gamma_{MR}^{++}$

                               Type II (Fréchet), γ > 0, n = 100    Type II (Fréchet), γ > 0, n = 300
True value of γ   Estimator             Estimated values   MSEL       Estimated values   MSEL
0.10              $\gamma_M^{+}$        0.1435             7.60E-02   0.1961             2.96E-01
                  $\gamma_{MR}^{++}$    0.0993             1.50E-05   0.0999             4.74E-04
0.30              $\gamma_M^{+}$        0.0744             6.34E-02   0.0116             2.88E-01
                  $\gamma_{MR}^{++}$    0.2994             8.11E-05   0.2999             6.87E-04
0.50              $\gamma_M^{+}$        0.3980             1.97E-01   0.3780             1.21E-01
                  $\gamma_{MR}^{++}$    0.4999             1.20E-04   0.4999             4.13E-04
0.70              $\gamma_M^{+}$        0.6994             1.60E-03   0.6998             1.18E-03
                  $\gamma_{MR}^{++}$    0.6999             2.83E-04   0.7001             1.35E-04
0.90              $\gamma_M^{+}$        0.8957             2.94E-02   0.8995             6.37E-03
                  $\gamma_{MR}^{++}$    0.8987             7.23E-03   0.8993             6.55E-03
1                 $\gamma_M^{+}$        0.9741             1.48E-01   0.9971             1.45E-01
                  $\gamma_{MR}^{++}$    0.9966             2.56E-02   0.9968             1.42E-01
1.5               $\gamma_M^{+}$        1.8496             2.21E+00   1.2469             3.82E-01
                  $\gamma_{MR}^{++}$    2.0191             3.30E+04   1.4559             2.06E-01
2                 $\gamma_M^{+}$        5.3012             9.03E+04   4.9769             2.37E+00
                  $\gamma_{MR}^{++}$    4.2526             5.52E+02   4.1717             3.20E+00
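A simulation of the kind summarized in Table 6.2 can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the GEVL is taken in the von Mises form $G_\gamma(x) = \exp\{-(1 + \gamma x)^{-1/\gamma}\}$, samples are drawn by inverse transform, and the number of upper order statistics k is held fixed, whereas the rule used for k in the book's study is not stated in this section. The names `rgev`, `moment_ratio` and `simulate` are hypothetical.

```python
import numpy as np

def rgev(gamma, size, rng):
    """Inverse-transform draw from G_gamma(x) = exp(-(1 + gamma*x)^(-1/gamma)), gamma > 0 (assumed form)."""
    u = rng.uniform(size=size)
    return ((-np.log(u)) ** (-gamma) - 1.0) / gamma

def moment_ratio(x, k):
    """Moment ratio estimator (6.8) based on the k largest observations."""
    xs = np.sort(np.asarray(x, dtype=float))
    if xs[-k - 1] <= 0:
        raise ValueError("threshold X_{n-k:n} must be positive to take logarithms")
    d = np.log(xs[-k:]) - np.log(xs[-k - 1])
    return np.sum(d ** 2) / (2.0 * np.sum(d))

def simulate(gamma, n, k, reps, seed=0):
    """Average estimate and empirical mean squared error over `reps` samples of size n."""
    rng = np.random.default_rng(seed)
    est = np.array([moment_ratio(rgev(gamma, n, rng), k) for _ in range(reps)])
    return est.mean(), np.mean((est - gamma) ** 2)
```

A call such as `simulate(0.5, 300, 30, 1000)` returns the pair (average estimate, MSE) for one cell of a table of this type. The numbers will not reproduce Table 6.2, since the threshold rule and the random seed differ.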

7 Estimations under power normalization for the EVI

The main objective of this chapter is to develop the modelling of extreme values via the P-model by suggesting a simple technique, which depends on the exponential link between the affine and power norming detected by Christoph and Falk (1996) (Theorem 2.42), to obtain a parallel estimator of the EVI in the P-model for every known estimator of the corresponding parameter in the L-model. An application of this technique yields eight counterparts of Hill estimators under power normalization (HEPs) according to the type of the EVI, the threshold, and the data itself. Moreover, two classes of harmonic t-Hill, moment, and moment ratio estimators under power normalization are derived via this technique. On the other hand, four HEPs are derived based on the GPDP, which are more compact and adaptive. These estimators cannot be obtained by using the link between the P-model and the L-model. Moreover, these more compact and adaptive estimators based on the GPDP and the MLE technique have no counterparts in the L-model. In addition, a recent criterion, based on the notion of the coefficient of variation, is discussed for selecting the better of the linear and power models. Finally, all the theoretical results in this chapter are accompanied by comprehensive simulation studies, which are performed in R, to check the performance of the suggested estimators, as well as the proposed criterion.

7.1 Counterparts of Hill estimators under power normalization

As we have seen in Chapter 6, there has been considerable interest in the problem of tail estimation. Most of the research in extreme value theory is concentrated on the heavy-tailed distributions, where γ > 0. Moreover, the most popular tail index estimator is the HEL, due to Hill (1975). This estimator is restricted to the Fréchet case (γ > 0). Falk (1995) extended it to the Weibull case (γ < 0). The HEL may be considered as the conditional MLE for the GPDL based on the top order observations, to estimate the shape parameter. However, in order to estimate the EVI, γ, we deal with the right tail $\bar{F}(x) = 1 - F(x)$, for large x, i.e., we shall deal with the upper order statistics. The EVI can be classified into three different types of tail behaviour. Namely,
1. Cases where γ > 0, which correspond to the heavy-tailed DFs (Fréchet domain of attraction), e.g., Pareto and Cauchy DFs.
2. Cases where γ < 0, which correspond to the short-tailed DFs (Weibull domain


of attraction) with finite endpoint, such as uniform and beta DFs.
3. The case when $\gamma \cong 0$, which corresponds to the medium-tailed DFs (i.e., the Gumbel domain of attraction), e.g., the normal and gamma DFs.
Most of the known articles in the literature use the HEL for the positive data case. As we have seen in Section 6.2, by using the well-known relation between the lower and upper order statistics, we could deduce the HEL when the data is negative (the negative HEL estimator (6.2)). In this section, we extend the HELs to the power normalization. Eight counterparts of HEPs for the tail index are deduced, which are applicable for the cases of long and short-tailed DFs, i.e., γ > 0 and γ < 0, respectively. These estimators share many attractive properties of the MLE. In addition, we compare the performance of all these HEPs, within a simulation study, to determine which HEPs are appropriate for the best threshold selection.

7.1.1 Counterparts of HEPs

In this subsection, we propose eight counterparts of HEPs of (6.1)–(6.4) for the tail index. We use a link between the affine and power norming, using exponential transforms, which was obtained by Christoph and Falk (1996) (see Theorem 2.42). As an interesting result of the simple proof that we use, the resulting counterpart estimators possess all the desirable properties of the corresponding HELs, such as simplicity, consistency, and normality. The following theorem gives these counterparts for the shape parameter γ in Pancheva's types corresponding to (2.32) (i.e., for the non-negative data) and also for the shape parameter γ in Pancheva's types corresponding to (2.33) (i.e., for the non-positive data).

Theorem 7.1 (cf. Barakat et al., 2017a) For the order sample data $Y_{1:n} < \dots < Y_{n:n}$, we have

1. Non-negative data case. The HEP in the case γ > 0 and a suitable threshold $Y_{n-k:n} > 1$ is
$$\gamma_p^{++\succ} = \frac{1}{k}\sum_{i=1}^{k} \log(\log Y_{n-i+1:n}) - \log(\log Y_{n-k:n}).$$

2. Non-negative data case. The HEP in the case γ > 0 and a suitable threshold $0 < Y_{k+1:n} < 1$ is
$$\gamma_p^{++\prec} = \frac{1}{k}\sum_{i=1}^{k} \log|\log Y_{i:n}| - \log|\log Y_{k+1:n}|.$$

3. Non-negative data case. The HEP in the case γ < −0.5 and a suitable threshold $Y_{n-k:n} > 1$ is
$$\gamma_p^{+-\succ} = \frac{1}{k}\sum_{i=1}^{k-1} \log(\log Y_{n:n} - \log Y_{n-i:n}) - \log(\log Y_{n:n} - \log Y_{n-k:n}).$$

4. Non-negative data case. The HEP in the case γ < −0.5 and a suitable threshold $0 < Y_{k+1:n} < 1$ is
$$\gamma_p^{+-\prec} = \frac{1}{k}\sum_{i=1}^{k-1} \log(|\log Y_{1:n}| - |\log Y_{i+1:n}|) - \log(|\log Y_{1:n}| - |\log Y_{k+1:n}|).$$

5. Negative data case. The HEP in the case γ > 0 and a suitable threshold $Y_{k+1:n} < -1$ is
$$\gamma_p^{-+\prec} = \frac{1}{k}\sum_{i=1}^{k} \log\log|Y_{i:n}| - \log\log|Y_{k+1:n}|.$$

6. Negative data case. The HEP in the case γ > 0 and a suitable threshold $-1 < Y_{n-k:n} < 0$ is
$$\gamma_p^{-+\succ} = \frac{1}{k}\sum_{i=1}^{k} \log|\log|Y_{n-i+1:n}|| - \log|\log|Y_{n-k:n}||.$$

7. Negative data case. The HEP in the case γ < −0.5 and a suitable threshold $Y_{k+1:n} < -1$ is
$$\gamma_p^{--\prec} = \frac{1}{k}\sum_{i=1}^{k-1} \log(\log|Y_{1:n}| - \log|Y_{i+1:n}|) - \log(\log|Y_{1:n}| - \log|Y_{k+1:n}|).$$

8. Negative data case. The HEP in the case γ < −0.5 and a suitable threshold $-1 < Y_{n-k:n} < 0$ is
$$\gamma_p^{--\succ} = \frac{1}{k}\sum_{i=1}^{k-1} \log(|\log|Y_{n:n}|| - |\log|Y_{n-i:n}||) - \log(|\log|Y_{n:n}|| - |\log|Y_{n-k:n}||),$$

where the first superscript, in all the HEPs, denotes the sign of the data, i.e., “+” and “−” denote the non-negative data and the negative data, respectively. Moreover, the second superscript, in all the HEPs, denotes the sign of the EVI, γ, i.e., “+” and “−” denote the non-negative EVI (γ ≥ 0) and the negative EVI (γ < 0), respectively. Finally, the third superscript, in all the HEPs, denotes the relationship between the threshold and ±1, namely if the third superscript is “≻,” then the threshold is greater than +1, or −1 (this according to the sign of the threshold, i.e., the first superscript), and if the third superscript is “≺,” then the threshold is less than +1, or −1 (this according to the sign of the threshold, i.e., the first superscript).

Proof For the non-negative data case and in view of the result of Christoph and Falk (1996) (see Theorem 2.42), if $F \in D_p(P_{1;\gamma})$, use the transformation $X_{n-i+1:n} = \log Y_{n-i+1:n} > 0$, i = 1, 2, ..., k; then we get $\gamma_p^{++\succ} = \gamma_\ell^{++} \mid X$ and $\gamma_p^{+-\succ} = \gamma_\ell^{+-} \mid X$, where $\gamma \mid X$ means that the Hill estimator γ is based on data from the RV X. Moreover, by using the transformation $X_{i:n} = \log Y_{i:n} < 0$, i = 1, 2, ..., k, we get $\gamma_p^{++\prec} = \gamma_\ell^{-+} \mid X$ and $\gamma_p^{+-\prec} = \gamma_\ell^{--} \mid X$. On the other hand, for the negative data case and in view of the result of Christoph and Falk (1996), if $F \in D_p(P_{2;\gamma})$, use the transformation $X_{n-i+1:n} = \log|Y_{i:n}| > 0$, i = 1, 2, ..., k; then we get $\gamma_p^{-+\prec} = \gamma_\ell^{++} \mid X$ and $\gamma_p^{--\prec} = \gamma_\ell^{+-} \mid X$. Moreover, by using the transformation $X_{i:n} = \log|Y_{n-i+1:n}| < 0$, i = 1, 2, ..., k, we get $\gamma_p^{-+\succ} = \gamma_\ell^{-+} \mid X$ and $\gamma_p^{--\succ} = \gamma_\ell^{--} \mid X$.

Corollary (7.1) As an interesting result of the simple proof method, which is used in Theorem 7.1, we find that most of the resulting estimators will possess all the desirable properties of the corresponding HELs, such as consistency and normality. For example, when k = o(n) → ∞, the estimator $\gamma_\ell^{++}$ is consistent (cf. Theorem 3.2.2 in de Haan and Ferreira, 2006).
Moreover, the asymptotic normality of this estimator was proved under appropriate conditions, which are imposed on F (cf. Theorem 3.2.5 in de Haan and Ferreira, 2006). By virtue of the proof of Theorem 7.1, we can see that both of the estimators $\gamma_p^{++\succ}$ and $\gamma_p^{-+\prec}$ satisfy these desirable properties. Moreover, the same two asymptotic properties were proved, under appropriate conditions, for the estimator $\gamma_\ell^{+-}$ (cf. Theorem 3.6.4 in de Haan and Ferreira, 2006). Therefore, the proof of Theorem 7.1 shows that both of the estimators $\gamma_p^{+-\succ}$ and $\gamma_p^{--\prec}$ satisfy these desirable properties.
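The proof idea, namely that a HEP is a HEL applied to log-transformed data, can be illustrated for the first two estimators of Theorem 7.1. The sketch below assumes NumPy; the function names are hypothetical, and only Parts 1 and 2 of the theorem are covered.

```python
import numpy as np

def hill(x, k):
    """Hill estimator under linear normalization for positive data."""
    xs = np.sort(np.asarray(x, dtype=float))
    return np.mean(np.log(xs[-k:]) - np.log(xs[-k - 1]))

def hep_upper(y, k):
    """Part 1 of Theorem 7.1: threshold Y_{n-k:n} > 1, based on the k largest observations."""
    ys = np.sort(np.asarray(y, dtype=float))
    if ys[-k - 1] <= 1.0:
        raise ValueError("threshold Y_{n-k:n} must exceed 1")
    return np.mean(np.log(np.log(ys[-k:])) - np.log(np.log(ys[-k - 1])))

def hep_lower(y, k):
    """Part 2 of Theorem 7.1: threshold 0 < Y_{k+1:n} < 1, based on the k smallest observations."""
    ys = np.sort(np.asarray(y, dtype=float))
    if not (ys[0] > 0.0 and ys[k] < 1.0):
        raise ValueError("need positive data and threshold Y_{k+1:n} below 1")
    return np.mean(np.log(np.abs(np.log(ys[:k]))) - np.log(np.abs(np.log(ys[k]))))
```

For data of the form Y = exp(X) with X > 0, `hep_upper(y, k)` coincides with `hill(x, k)` up to rounding, and for Y = exp(−X) the same holds for `hep_lower`, which mirrors the transformations used in the proof.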


7.2 Hill plot under power normalization (HPP)

The Hill plot under power normalization is a plot of $\{(k, \gamma_p^{\dots}),\ 3 \le k \le n-2\}$, where $\gamma_p^{\dots}$ is one of the HEPs, which is constructed from the k largest order statistics of a sample of size n. Then, we infer a value in a stable region in the graph. For example, under power normalization and in the positive data case, where γ > 0, let n = L + 2 + l, where L + 2 is the number of observations which are greater than 1 and l is the number of observations which are less than 1. Then, we can define the following two Hill plots.
1. $\{(k, \gamma_p^{++\succ}) : 1 \le k \le L\}$, where k runs over all the rank values among all order observations which are greater than 1, except the last two greatest values, e.g., k = 1 is the rank of the least order value greater than 1.
2. $\{(k, \gamma_p^{++\prec}) : 3 \le k \le l\}$, where k runs over all the rank values among all order observations which are less than 1, e.g., k = 3 is the rank of the third order value and k = l is the rank of the greatest order value less than 1.
As an illustrative example, we select a random sample (of size 300) out of the 1000 generated random samples for the true value γ = 0.5 in Table 7.1, and apply this technique to this sample. The total number of observations that are greater than 1 is 159, while the number of observations that are less than 1 is 141. The left panel in Figure 7.1, for the plot $\{(k, \gamma_p^{++\succ}) : 1 \le k \le 157\}$, shows that the suitable threshold is at k = 112, i.e., the total number of observations included in the estimation is 47 = 159 − 112, the value of this threshold is 4.7194, and the corresponding estimate of γ is 0.5001. The right panel in Figure 7.1, for the plot $\{(k, \gamma_p^{++\prec}) : 3 \le k \le 141\}$, shows that the suitable threshold is at k = 82, i.e., the total number of observations included in the estimation is 82, the value of this threshold is 0.6254, and the corresponding estimate of γ is 0.4998. These results show that the estimate $\gamma_p^{++\succ}$ is a little bit better than $\gamma_p^{++\prec}$, for all values of γ and all different sample sizes.

Figure 7.1 The left panel is the Hill plot of $\gamma_p^{++\succ}$, with k = 112, and the right panel is the Hill plot of $\gamma_p^{++\prec}$, with k = 82

7.3 Simulation study


7.3 Simulation study

In this section, we present a simulation study, which shows that the eight suggested counterparts of HELs provide an efficient technique for estimating the tail index under power normalization. We consider the different values 0.1, 0.3, 0.5, 0.7, 0.9, 1, 1.5, 2, −0.6, −0.7, −0.8, −0.9, −1, −1.5, −2 of the parameter γ. For each of these true values of γ, we generate 1000 random samples of each of the sizes 100 and 300 from each of the two GEVPs (2.32) and (2.33) (with a = b = 1), see Tables 7.1–7.4. For all the HEPs defined in Theorem 7.1, we compute the corresponding estimates using the ordered generated data for all possible different values of thresholds; e.g., for the HEP $\gamma_p^{++\succ}$, we compute the corresponding estimate from the least order value which is greater than 1 to the (n − 2)th order value (n = 100 or n = 300), and for the HEP $\gamma_p^{++\prec}$, we compute the corresponding estimate using the ordered generated data for different values of thresholds starting from the third (lower) order value to the greatest order value which is less than 1. Furthermore, for each true value, we select the corresponding best value of the threshold (that gives the closest HEP value to the true value of γ). Tables 7.1–7.4 summarize the results of the eight suggested HEPs, i.e., the counterparts of HELs, such that for each true value of γ, we report the average of these best values of the HEPs over the 1000 random samples together with the corresponding MSE. Moreover, the ratio of upper (or lower) observations which exceed (or fall below) the best threshold value and are used in the computation of the Hill estimates, i.e., $\frac{k}{n}$, is included in each of the Tables 7.1–7.4. However, in this section we mainly focus our attention on assessing the efficiency of the derived HEPs rather than on the problematic issue of threshold selection. Tables 7.1–7.4 show that all the HEPs give good performance, except $\gamma_p^{+-\prec}$ and $\gamma_p^{--\prec}$.
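The "best threshold" step described above, i.e., choosing the k whose estimate is closest to the known true γ, can be sketched for the estimator of Part 1 of Theorem 7.1. This is an illustration under assumptions: NumPy is used, `hep_path` and `best_k` are hypothetical names, and the sample in the usage note is generated as Y = exp(X) with X exactly Pareto, which is a stand-in and not the GEVP (2.32) used in the book.

```python
import numpy as np

def hep_path(y):
    """Estimates of Part 1 of Theorem 7.1 for k = 2, 3, ..., using only observations greater than 1."""
    y = np.asarray(y, dtype=float)
    x = np.log(np.sort(y[y > 1.0]))[::-1]   # log-data, largest first, all positive
    logs = np.log(x)
    k = np.arange(1, logs.size)
    est = np.cumsum(logs[:-1]) / k - logs[1:]
    return k[1:], est[1:]                   # drop k = 1, i.e., thresholds go up to Y_{n-2:n}

def best_k(y, true_gamma):
    """Oracle choice: the k whose estimate is closest to the known true value."""
    k, est = hep_path(y)
    j = np.argmin(np.abs(est - true_gamma))
    return int(k[j]), float(est[j])
```

For instance, with `rng = np.random.default_rng(4)` and `y = np.exp(rng.uniform(size=300) ** (-0.3))`, the call `best_k(y, 0.3)` returns the selected k and the corresponding estimate. Such an oracle selection is only available in simulations, where the true γ is known, which is why the section treats it as an assessment of the estimators rather than as a threshold selection method.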


Table 7.1 Simulation output for assessing the estimators $\gamma_p^{++\succ}$ and $\gamma_p^{++\prec}$

                                      P1,γ, γ > 0, n = 100                 P1,γ, γ > 0, n = 300
True shape γ   HEP                    k/n    Estimated values   MSE        k/n    Estimated values   MSE
0.10           $\gamma_p^{++\succ}$   0.06   0.1003             8.02E-04   0.06   0.0999             2.09E-04
               $\gamma_p^{++\prec}$   0.04   0.1538             9.27E-03   0.01   0.1295             4.89E-03
0.30           $\gamma_p^{++\succ}$   0.14   0.2994             7.21E-04   0.15   0.2999             4.07E-04
               $\gamma_p^{++\prec}$   0.08   0.2978             1.21E-03   0.04   0.2987             6.85E-04
0.50           $\gamma_p^{++\succ}$   0.22   0.4986             1.40E-03   0.23   0.4997             6.91E-04
               $\gamma_p^{++\prec}$   0.14   0.4958             2.62E-03   0.06   0.4960             1.57E-03
0.70           $\gamma_p^{++\succ}$   0.28   0.6972             2.32E-03   0.28   0.6997             9.85E-04
               $\gamma_p^{++\prec}$   0.18   0.6827             6.78E-03   0.08   0.6924             3.71E-03
0.90           $\gamma_p^{++\succ}$   0.32   0.8964             3.84E-03   0.32   0.8992             1.31E-03
               $\gamma_p^{++\prec}$   0.20   0.8503             1.63E-02   0.09   0.8678             8.64E-03
1              $\gamma_p^{++\succ}$   0.33   0.9945             5.32E-03   0.34   0.9988             1.80E-03
               $\gamma_p^{++\prec}$   0.21   0.9346             2.57E-02   0.09   0.9609             1.28E-02
1.5            $\gamma_p^{++\succ}$   0.40   1.4783             1.30E-02   0.41   1.4957             5.98E-03
               $\gamma_p^{++\prec}$   0.23   1.2627             1.33E-01   0.10   1.3254             8.51E-02
2              $\gamma_p^{++\succ}$   0.44   1.9414             2.71E-02   0.45   1.9780             1.19E-02
               $\gamma_p^{++\prec}$   0.24   1.5677             3.56E-01   0.11   1.6520             2.55E-01

Table 7.2 Simulation output for assessing the estimators $\gamma_p^{+-\succ}$ and $\gamma_p^{+-\prec}$

                                      P1,γ, γ < 0, n = 100                 P1,γ, γ < 0, n = 300
True shape γ   HEP                    k/n    Estimated values   MSE        k/n    Estimated values   MSE
−0.60          $\gamma_p^{+-\succ}$   0.12   -0.6203            9.98E-03   0.03   -0.6137            1.48E-02
               $\gamma_p^{+-\prec}$   0.12   -0.5593            2.59E-02   0.04   -0.5764            2.00E-02
−0.70          $\gamma_p^{+-\succ}$   0.13   -0.7019            4.31E-03   0.04   -0.7028            7.36E-03
               $\gamma_p^{+-\prec}$   0.10   -0.5579            5.51E-02   0.03   -0.5679            4.57E-02
−0.80          $\gamma_p^{+-\succ}$   0.17   -0.8055            2.52E-03   0.05   -0.8007            3.38E-03
               $\gamma_p^{+-\prec}$   0.08   -0.5485            1.01E-01   0.03   -0.5554            8.98E-02
−0.90          $\gamma_p^{+-\succ}$   0.17   -0.8995            1.94E-03   0.06   -0.9011            2.50E-03
               $\gamma_p^{+-\prec}$   0.08   -0.5391            1.66E-01   0.02   -0.5470            1.53E-01
−1             $\gamma_p^{+-\succ}$   0.21   -1.0007            1.60E-03   0.09   -1.0027            7.36E-04
               $\gamma_p^{+-\prec}$   0.07   -0.5266            2.52E-01   0.02   -0.5375            2.37E-01
−1.5           $\gamma_p^{+-\succ}$   0.31   -1.4808            1.83E-02   0.15   -1.4883            1.20E-02
               $\gamma_p^{+-\prec}$   0.06   -0.4673            1.03E+00   0.02   -0.4796            9.90E-01
−2             $\gamma_p^{+-\succ}$   0.33   -1.8103            8.18E-02   0.15   -1.8462            6.33E-02
               $\gamma_p^{+-\prec}$   0.06   -0.4201            2.38E+00   0.02   -0.4482            2.30E+00


Table 7.3 Simulation output for assessing the estimators $\gamma_p^{-+\prec}$ and $\gamma_p^{-+\succ}$

                                      P2,γ, γ > 0, n = 100                 P2,γ, γ > 0, n = 300
True shape γ   HEP                    k/n    Estimated values   MSE        k/n    Estimated values   MSE
0.10           $\gamma_p^{-+\prec}$   0.04   0.1218             3.34E-03   0.02   0.1013             2.34E-04
               $\gamma_p^{-+\succ}$   0.05   0.1011             4.86E-04   0.05   0.0999             4.77E-05
0.30           $\gamma_p^{-+\prec}$   0.13   0.2999             4.87E-04   0.14   0.3001             6.80E-05
               $\gamma_p^{-+\succ}$   0.14   0.2998             4.24E-04   0.15   0.3000             6.14E-05
0.50           $\gamma_p^{-+\prec}$   0.23   0.5001             9.72E-04   0.23   0.5003             1.25E-04
               $\gamma_p^{-+\succ}$   0.22   0.5002             5.53E-04   0.22   0.4998             9.24E-05
0.70           $\gamma_p^{-+\prec}$   0.29   0.6996             2.45E-03   0.29   0.6999             2.58E-04
               $\gamma_p^{-+\succ}$   0.28   0.7001             9.92E-04   0.28   0.7001             1.10E-04
0.90           $\gamma_p^{-+\prec}$   0.32   0.8968             5.50E-03   0.33   0.8998             5.88E-04
               $\gamma_p^{-+\succ}$   0.32   0.8998             1.19E-03   0.33   0.9003             1.47E-04
1              $\gamma_p^{-+\prec}$   0.34   0.9946             7.74E-03   0.34   1.0000             8.10E-04
               $\gamma_p^{-+\succ}$   0.34   0.9992             1.46E-03   0.34   0.9998             1.72E-04
1.5            $\gamma_p^{-+\prec}$   0.38   1.4891             3.84E-02   0.38   1.4997             4.41E-03
               $\gamma_p^{-+\succ}$   0.41   1.4993             2.92E-03   0.41   1.4999             3.69E-04
2              $\gamma_p^{-+\prec}$   0.39   1.9541             1.64E-01   0.40   1.9905             1.79E-02
               $\gamma_p^{-+\succ}$   0.45   1.9980             4.67E-03   0.46   2.0002             5.37E-04

Table 7.4 Simulation output for assessing the estimators $\gamma_p^{--\prec}$ and $\gamma_p^{--\succ}$

                                      P2,γ, γ < 0, n = 100                 P2,γ, γ < 0, n = 300
True shape γ   HEP                    k/n    Estimated values   MSE        k/n    Estimated values   MSE
-0.60          $\gamma_p^{--\prec}$   0.11   -0.5906            1.02E-02   0.04   -0.5911            1.13E-02
               $\gamma_p^{--\succ}$   0.09   -0.6014            2.69E-02   0.03   -0.5983            2.98E-02
-0.70          $\gamma_p^{--\prec}$   0.09   -0.6457            2.34E-02   0.03   -0.6421            2.40E-02
               $\gamma_p^{--\succ}$   0.12   -0.6996            1.54E-02   0.04   -0.7004            2.81E-02
-0.80          $\gamma_p^{--\prec}$   0.08   -0.6595            4.71E-02   0.02   -0.6563            4.75E-02
               $\gamma_p^{--\succ}$   0.15   -0.8004            1.12E-02   0.05   -0.7997            2.10E-02
-0.90          $\gamma_p^{--\prec}$   0.06   -0.6715            8.12E-02   0.02   -0.6684            8.10E-02
               $\gamma_p^{--\succ}$   0.18   -0.8992            6.84E-03   0.07   -0.8987            1.22E-02
-1             $\gamma_p^{--\prec}$   0.05   -0.6796            1.26E-01   0.02   -0.6862            1.25E-01
               $\gamma_p^{--\succ}$   0.21   -0.9977            6.00E-03   0.09   -0.9993            7.73E-03
-1.5           $\gamma_p^{--\prec}$   0.04   -0.7981            4.91E-01   0.01   -0.8024            4.77E-01
               $\gamma_p^{--\succ}$   0.31   -1.4582            2.41E-02   0.14   -1.4801            1.31E-02
-2             $\gamma_p^{--\prec}$   0.03   -0.9529            1.07E+00   0.01   -0.9649            1.02E+00
               $\gamma_p^{--\succ}$   0.33   -1.7897            9.73E-02   0.15   -1.8198            6.73E-02


7.4 Harmonic t-Hill estimator under power normalization

The following theorem gives two counterparts under power normalization (HMEP) of the HMEL defined in (6.9).

Theorem 7.2 (cf. Barakat et al., 2017a) For the order sample data $Y_{1:n} \le \dots \le Y_{n:n}$, we have

1. The HMEP, in the case γ > 0 and a suitable threshold $Y_{n-k:n} > 1$, is
\[
\gamma^{++}_{PH(\beta)} = \frac{1}{\beta-1}\left\{\left[\frac{1}{k}\sum_{i=1}^{k}\left(\frac{\log Y_{n-k:n}}{\log Y_{n-i+1:n}}\right)^{\beta-1}\right]^{-1} - 1\right\}, \quad \beta > 0,\ \beta \neq 1,\ 1 \le k \le n-1. \tag{7.1}
\]

2. The HMEP, in the case γ > 0 and a suitable threshold $0 < Y_{k+1:n} < 1$, is
\[
\gamma^{++\prec}_{PH(\beta)} = \frac{1}{\beta-1}\left\{\left[\frac{1}{k}\sum_{i=1}^{k}\left(\frac{\log Y_{k+1:n}}{\log Y_{i:n}}\right)^{\beta-1}\right]^{-1} - 1\right\}, \quad \beta > 0,\ \beta \neq 1,\ 1 \le k \le n-1, \tag{7.2}
\]

where the superscripts, in the two HMEPs (7.1) and (7.2), are defined exactly as those in Theorem 7.1.

Proof In view of the result of Christoph and Falk (1996), if $F \in D_p(P_{1;\gamma})$, we use the transformation $X_{n-i+1:n} = \log Y_{n-i+1:n} > 0$, i = 1, 2, ..., k, for the estimator $\gamma^{++}_{PH}$ and the transformation $X_{i:n} = \log Y_{i:n} < 0$, i = 1, 2, ..., k, for the estimator $\gamma^{++\prec}_{PH}$. With these transformations, the proof of Theorem 7.2 is similar to the proof of Theorem 7.1 (Parts 1 and 2).

Corollary 7.2 For β → 1, $\gamma^{++}_{PH(\beta)}$ (or $\gamma^{++\prec}_{PH(\beta)}$) is interpreted as the limit when β → 1, i.e., $\gamma^{++}_{PH(1)} = \lim_{\beta\to 1}\gamma^{++}_{PH(\beta)} = \gamma^{++}_{p}$ (or $\gamma^{++\prec}_{PH(1)} = \lim_{\beta\to 1}\gamma^{++\prec}_{PH(\beta)} = \gamma^{++\prec}_{p}$) defined in Theorem 7.2 (Parts 1 and 2). The harmonic t-Hill estimators, which are HMEPs with β = 2, are denoted by $\gamma^{++}_{PH}$ and $\gamma^{++\prec}_{PH}$.

Corollary 7.3 The consistency and normality of the estimator $\gamma^{++}_{H(\beta)}$ were proved by Beran et al. (2014) in Theorems 1 and 2, respectively. Therefore, as an important consequence of the simple proof method used in Theorem 7.2, the estimator $\gamma^{++}_{PH(\beta)}$ satisfies the same two desirable properties.
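The estimator (7.1) can be sketched numerically as follows. This is only an illustrative sketch: the function name `hmep_upper` and its argument layout are hypothetical, and the β = 1 branch uses the limit of (7.1) as β → 1, which works out to the mean of log(log Y_{n−i+1:n} / log Y_{n−k:n}).

```python
import math


def hmep_upper(y, k, beta):
    """Sketch of the HMEP in (7.1); requires the threshold Y_{n-k:n} > 1."""
    ys = sorted(y)
    n = len(ys)
    if not 1 <= k <= n - 1:
        raise ValueError("k must satisfy 1 <= k <= n - 1")
    if beta <= 0:
        raise ValueError("beta must be positive")
    threshold = ys[n - k - 1]            # Y_{n-k:n}
    if threshold <= 1:
        raise ValueError("threshold Y_{n-k:n} must exceed 1")
    log_thr = math.log(threshold)
    top = ys[n - k:]                     # Y_{n-k+1:n}, ..., Y_{n:n}
    if beta == 1:
        # Limiting case beta -> 1 (derived from (7.1), not a separate formula).
        return sum(math.log(math.log(v) / log_thr) for v in top) / k
    s = sum((log_thr / math.log(v)) ** (beta - 1) for v in top) / k
    return (1.0 / s - 1.0) / (beta - 1)
```

Calling `hmep_upper(y, k, 2)` gives the harmonic t-Hill case β = 2; values of β close to 1 should agree closely with the β = 1 branch.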

7.5 Moment and moment-ratio estimators under power normalization

In Section 6.1, we discussed the moment estimator under linear normalization (MEL) for positive data and for $\gamma \in \mathbb{R}$ (see equation (6.6)). The following theorem gives the corresponding moment estimators under power normalization (MEP).

Theorem 7.3 (cf. Barakat et al., 2018) Let $Y_{1:n} \le \dots \le Y_{n:n}$ be the order sample data. Then, we have

1. The MEP, for $\gamma \in \mathbb{R}$ and a suitable threshold $Y_{n-k:n} > 1$, is
\[
\gamma^{+}_{PM} = \hat{M}_{P:1} + 1 - \frac{1}{2}\left(1 - \frac{\hat{M}_{P:1}^{2}}{\hat{M}_{P:2}}\right)^{-1}, \tag{7.3}
\]
where $\hat{M}_{P:j} = \frac{1}{k}\sum_{i=1}^{k}\left(\log(\log Y_{n-i+1:n}) - \log(\log Y_{n-k:n})\right)^{j}$, j = 1, 2.

2. The MEP, for $\gamma \in \mathbb{R}$ and a suitable threshold $0 < Y_{k+1:n} < 1$, where $0 < Y_{1:n}$, is
\[
\gamma^{+\prec}_{PM} = \check{M}_{P:1} + 1 - \frac{1}{2}\left(1 - \frac{\check{M}_{P:1}^{2}}{\check{M}_{P:2}}\right)^{-1}, \tag{7.4}
\]
where $\check{M}_{P:j} = \frac{1}{k}\sum_{i=1}^{k}\left(\log|\log Y_{i:n}| - \log|\log Y_{k+1:n}|\right)^{j}$, j = 1, 2.
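As a hedged illustration of Part 1 of Theorem 7.3, the following sketch evaluates (7.3) from the two empirical moments of the log-log excesses. The function name `mep_upper` is hypothetical, and the sketch assumes that the k largest observations are not all equal (otherwise the second moment equals the squared first moment and the correction term is undefined).

```python
import math


def mep_upper(y, k):
    """Sketch of the MEP in (7.3); requires the threshold Y_{n-k:n} > 1."""
    ys = sorted(y)
    n = len(ys)
    if not 1 <= k <= n - 1:
        raise ValueError("k must satisfy 1 <= k <= n - 1")
    threshold = ys[n - k - 1]                      # Y_{n-k:n}
    if threshold <= 1:
        raise ValueError("threshold Y_{n-k:n} must exceed 1")
    base = math.log(math.log(threshold))
    # Excesses log(log Y_{n-i+1:n}) - log(log Y_{n-k:n}), i = 1, ..., k.
    d = [math.log(math.log(v)) - base for v in ys[n - k:]]
    m1 = sum(d) / k                                # M-hat_{P:1}
    m2 = sum(x * x for x in d) / k                 # M-hat_{P:2}
    if m2 <= m1 * m1:
        raise ValueError("degenerate excesses: correction term undefined")
    return m1 + 1.0 - 0.5 / (1.0 - m1 * m1 / m2)
```

Because only differences of log(log Y) enter the formula, replacing every observation Y by Y^c (c > 0) leaves the result unchanged, which is one way to sanity-check an implementation.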

Proof First, by using the well-known relation between the lower and upper order statistics, we can easily deduce the MEL for $\gamma \in \mathbb{R}$ when the data is negative. In this case the corresponding MEL, $\gamma^{-}_{M}$, is based on the lower threshold and the extreme value theory for the minimum case (the corresponding GEVL is $1 - G_{\gamma}(-x)$). Namely, define for j = 1, 2, ...
\[
\check{M}_{:j} = \frac{1}{k}\sum_{i=1}^{k}\left(\log|X_{i:n}| - \log|X_{k+1:n}|\right)^{j}. \tag{7.5}
\]
Clearly, $\check{M}_{:1}$ is the Hill estimator $\gamma^{-+}$ of the parameter $\gamma_{+} = \max\{0, \gamma\}$, which is defined in Lemma 6.1 (for negative data) and is valid for γ > 0. Then, the MEL for $\gamma = \gamma_{+} + \gamma_{-} \in \mathbb{R}$ is defined by
\[
\gamma^{-}_{M} = \gamma^{-+} + \check{\gamma}_{-}, \quad \text{where } \check{\gamma}_{-} = 1 - \frac{1}{2}\left(1 - \frac{\check{M}_{:1}^{2}}{\check{M}_{:2}}\right)^{-1}. \tag{7.6}
\]
Now, for the positive data case and in view of the result of Christoph and Falk (1996), if $F \in D_P(P_{1;\gamma})$, using the transformation $X_{n-i+1:n} = \log Y_{n-i+1:n} > 0$, i = 1, 2, ..., k, we get $\gamma^{+}_{PM} = \gamma^{++}_{M} \mid X$, where $\gamma^{++}_{M} \mid X$ means that the estimator $\gamma^{++}_{M}$ is based on data from the RV X. Moreover, by using the transformation $X_{i:n} = \log Y_{i:n} < 0$, i = 1, 2, ..., k, we get $\gamma^{+\prec}_{PM} = \gamma^{-}_{M} \mid X$, where $\gamma^{-}_{M} \mid X$ means that the estimator $\gamma^{-}_{M}$ is based on data from the RV X. This completes the proof of the theorem.

Remark We can define the following two Hill plots based on the two estimators (7.3) and (7.4):

1. $\{(k, \gamma^{+}_{PM}) : 1 \le k \le T - 2\}$, where k runs over all the rank values among all order observations that are greater than 1, except the last two greatest values.

2. $\{(k, \gamma^{+\prec}_{PM}) : 3 \le k \le t\}$, where k runs over all the rank values among all order observations that are less than 1.


We have seen that the Hill estimator is a moment estimator, based on the first conditional moment of the highest logarithmically transformed data. The following theorem gives the corresponding moment-ratio estimators under power normalization (MREP) of the MREL defined by the equation (7.5).

Theorem 7.4 (cf. Barakat et al., 2018) Let $Y_{1:n} \le \dots \le Y_{n:n}$ be the order sample data. Then, we have

1. The MREP in the case γ > 0 and a suitable threshold $Y_{n-k:n} > 1$ is
\[
\gamma^{++}_{PMR} = \frac{\hat{M}_{P:2}}{2\hat{M}_{P:1}} = \frac{\sum_{i=1}^{k}\left(\log(\log Y_{n-i+1:n}) - \log(\log Y_{n-k:n})\right)^{2}}{2\sum_{i=1}^{k}\left(\log(\log Y_{n-i+1:n}) - \log(\log Y_{n-k:n})\right)}. \tag{7.7}
\]

2. The MREP in the case γ > 0 and a suitable threshold $0 < Y_{k+1:n} < 1$, where $0 < Y_{1:n}$, is
\[
\gamma^{++\prec}_{PMR} = \frac{\check{M}_{P:2}}{2\check{M}_{P:1}} = \frac{\sum_{i=1}^{k}\left(\log|\log Y_{i:n}| - \log|\log Y_{k+1:n}|\right)^{2}}{2\sum_{i=1}^{k}\left(\log|\log Y_{i:n}| - \log|\log Y_{k+1:n}|\right)}. \tag{7.8}
\]
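A minimal sketch of the moment-ratio estimator (7.7) follows; the function name `mrep_upper` is hypothetical and the code simply evaluates the ratio of the second to twice the first empirical moment of the log-log excesses, under the stated condition Y_{n−k:n} > 1.

```python
import math


def mrep_upper(y, k):
    """Sketch of the MREP in (7.7); requires the threshold Y_{n-k:n} > 1."""
    ys = sorted(y)
    n = len(ys)
    if not 1 <= k <= n - 1:
        raise ValueError("k must satisfy 1 <= k <= n - 1")
    threshold = ys[n - k - 1]                      # Y_{n-k:n}
    if threshold <= 1:
        raise ValueError("threshold Y_{n-k:n} must exceed 1")
    base = math.log(math.log(threshold))
    d = [math.log(math.log(v)) - base for v in ys[n - k:]]
    s1 = sum(d)                                    # k * M-hat_{P:1}
    s2 = sum(x * x for x in d)                     # k * M-hat_{P:2}
    if s1 <= 0:
        raise ValueError("all excesses are zero: ratio undefined")
    return s2 / (2.0 * s1)
```

As with the MEP, the estimate depends on the data only through differences of log(log Y), so it is unchanged when every observation is raised to the same positive power.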

Proof First, by using the well-known relation between the lower and upper order statistics, we can easily deduce the MREL for γ > 0 when the data is negative. In this case the corresponding MREL, $\gamma^{-+}_{MR}$, is based on the lower threshold and the minimum EVT (the corresponding GEVL is $1 - G_{\gamma}(-x)$). Namely, the MREL for γ > 0 is defined by
\[
\gamma^{-+}_{MR} = \frac{\check{M}_{:2}}{2\check{M}_{:1}} = \frac{\sum_{i=1}^{k}\left(\log|X_{i:n}| - \log|X_{k+1:n}|\right)^{2}}{2\sum_{i=1}^{k}\left(\log|X_{i:n}| - \log|X_{k+1:n}|\right)}. \tag{7.9}
\]
Now, for the positive data case and in view of the result of Christoph and Falk (1996), if $F \in D_P(P_{1;\gamma})$, using the transformation $X_{n-i+1:n} = \log Y_{n-i+1:n} > 0$, i = 1, 2, ..., k, we get $\gamma^{++}_{PMR} = \gamma^{++}_{MR} \mid X$, where $\gamma^{++}_{MR} \mid X$ means that the estimator $\gamma^{++}_{MR}$ is based on data from the RV X. Moreover, by using the transformation $X_{i:n} = \log Y_{i:n} < 0$, i = 1, 2, ..., k, we get $\gamma^{++\prec}_{PMR} = \gamma^{-+}_{MR} \mid X$, where $\gamma^{-+}_{MR} \mid X$ means that the estimator $\gamma^{-+}_{MR}$ is based on data from the RV X. This completes the proof of the theorem.

For negative data, we can also obtain the moment and moment-ratio estimators under power normalization by using the result of Christoph and Falk (1996) (see Lemma 1.3.3), when the right endpoint of F is negative (i.e., for a negative data case).

Theorem 7.5 (cf. Barakat et al., 2018) Let $Y_{1:n} \le \dots \le Y_{n:n}$ be the order sample data.

1. The MEP for $\gamma \in \mathbb{R}$ and a suitable threshold $-1 < Y_{n-k:n} < 0$, where $Y_{n:n} < 0$, is
\[
\gamma^{-}_{PM} = \hat{N}_{P:1} + 1 - \frac{1}{2}\left(1 - \frac{\hat{N}_{P:1}^{2}}{\hat{N}_{P:2}}\right)^{-1}, \tag{7.10}
\]
where $\hat{N}_{P:j} = \frac{1}{k}\sum_{i=1}^{k}\left(\log|\log|Y_{n-i+1:n}|| - \log|\log|Y_{n-k:n}||\right)^{j}$, j = 1, 2 (note that $\hat{N}_{P:1} = \gamma^{-+}_{P}$).

2. The MEP for $\gamma \in \mathbb{R}$ and a suitable threshold $Y_{k+1:n} < -1$ is
\[
\gamma^{-\prec}_{PM} = \check{N}_{P:1} + 1 - \frac{1}{2}\left(1 - \frac{\check{N}_{P:1}^{2}}{\check{N}_{P:2}}\right)^{-1}, \tag{7.11}
\]
where $\check{N}_{P:j} = \frac{1}{k}\sum_{i=1}^{k}\left(\log\log|Y_{i:n}| - \log\log|Y_{k+1:n}|\right)^{j}$, j = 1, 2 (note that $\check{N}_{P:1} = \gamma^{-+\prec}_{p}$).

3. The MREP in the case γ > 0 and a suitable threshold $-1 < Y_{n-k:n} < 0$, where $Y_{n:n} < 0$, is
\[
\gamma^{-+}_{PMR} = \frac{\hat{N}_{P:2}}{2\hat{N}_{P:1}} = \frac{\sum_{i=1}^{k}\left(\log|\log|Y_{n-i+1:n}|| - \log|\log|Y_{n-k:n}||\right)^{2}}{2\sum_{i=1}^{k}\left(\log|\log|Y_{n-i+1:n}|| - \log|\log|Y_{n-k:n}||\right)}. \tag{7.12}
\]

4. The MREP in the case γ > 0 and a suitable threshold $Y_{k+1:n} < -1$ is
\[
\gamma^{-+\prec}_{PMR} = \frac{\check{N}_{P:2}}{2\check{N}_{P:1}} = \frac{\sum_{i=1}^{k}\left(\log\log|Y_{i:n}| - \log\log|Y_{k+1:n}|\right)^{2}}{2\sum_{i=1}^{k}\left(\log\log|Y_{i:n}| - \log\log|Y_{k+1:n}|\right)}. \tag{7.13}
\]

Proof For the negative data case and in view of the result of Christoph and Falk (1996), if $F \in D_p(P_{2;\gamma})$, using the transformation $X_{n-i+1:n} = \log|Y_{i:n}| > 0$, i = 1, 2, ..., k, we get $\gamma^{-\prec}_{PM} = \gamma^{++}_{M} \mid X$ and $\gamma^{-+\prec}_{PMR} = \gamma^{++}_{MR} \mid X$. Moreover, by using the transformation $X_{i:n} = \log|Y_{n-i+1:n}| < 0$, i = 1, 2, ..., k, we get $\gamma^{-}_{PM} = \gamma^{-}_{M} \mid X$ and $\gamma^{-+}_{PMR} = \gamma^{-+}_{MR} \mid X$.

As an interesting consequence of the simple proof method used in Theorems 7.3–7.5, we find that most of the proposed estimators (defined in (7.3)–(7.13)) possess all the asymptotic properties of the corresponding estimators under linear normalization. More specifically, Dekkers et al. (1989) proved consistency and asymptotic normality of the MEL $\gamma^{+}_{M}$. By virtue of the proof of Theorems 7.3–7.5, the relations $\gamma^{+}_{PM} = \gamma^{+}_{M} \mid \log X_{n-i+1:n}$, i = 1, 2, ..., k, where $\log X_{n-i+1:n} > 0$, i = 1, 2, ..., k, and $\gamma^{-}_{PM} = \gamma^{+}_{M} \mid \log|X_{i:n}|$, i = 1, 2, ..., k, where $\log|X_{i:n}| > 0$, i = 1, 2, ..., k, immediately yield that the corresponding estimators $\gamma^{+}_{PM}$ and $\gamma^{-}_{PM}$ have the same asymptotic properties as the MEL $\gamma^{+}_{M}$. Moreover, Danielsson et al. (1996) proved that the MREL has a lower asymptotic squared bias in comparison with the Hill estimator $\gamma^{++}$ when the two estimators are evaluated at the same threshold, but the convergence rates are still the same. By virtue of the proof of Theorems 7.3–7.5, the relations $\gamma^{++}_{PMR} = \gamma^{++}_{MR} \mid \log X_{n-i+1:n}$, i = 1, 2, ..., k, where $\log X_{n-i+1:n} > 0$, i = 1, 2, ..., k, and $\gamma^{-+\prec}_{PMR} = \gamma^{++}_{MR} \mid \log|X_{i:n}|$, i = 1, 2, ..., k, where $\log|X_{i:n}| > 0$, i = 1, 2, ..., k, immediately yield that the corresponding estimators $\gamma^{++}_{PMR}$ and $\gamma^{-+\prec}_{PMR}$ have the same asymptotic properties as the MREL $\gamma^{++}_{MR}$.

7.6 Further contemporaneous Hill estimators under power normalization

All the Hill estimators defined in VIII, Section 6.1, can now be easily transformed into the P-model by applying the same method given in Theorems 7.1–7.5 and by considering the following statistics:

1. In the case γ > 0 and a suitable threshold $X_{n-k:n} > 1$, consider the two statistics
\[
H_{P:n}(k, r) = \frac{1}{k}\sum_{i=1}^{k} f_{r}\left(\frac{\log X_{n-i+1:n}}{\log X_{n-k:n}}\right) \quad \text{and} \quad G_{:n}(k, r, u) = \frac{1}{k}\sum_{i=1}^{k} g_{r,u}\left(\frac{\log X_{n-i+1:n}}{\log X_{n-k:n}}\right).
\]

2. In the case γ > 0 and a suitable threshold $0 < X_{k+1:n} < 1$, where $0 < X_{1:n}$, consider the two statistics
\[
H_{P\prec:n}(k, r) = \frac{1}{k}\sum_{i=1}^{k} f_{r}\left(\frac{\log X_{i:n}}{\log X_{k+1:n}}\right) \quad \text{and} \quad G_{\prec:n}(k, r, u) = \frac{1}{k}\sum_{i=1}^{k} g_{r,u}\left(\frac{\log X_{i:n}}{\log X_{k+1:n}}\right).
\]

Remark Clearly, all the Hill estimators defined in Chapter 6 for the L-model are scale-invariant, but not location-invariant. On the other hand, all the Hill estimators for the P-model defined in this chapter are power-invariant, but not scale-invariant.
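The invariance properties in this remark can be checked numerically. The sketch below is illustrative only: `hill_l` is the classical Hill estimator, and `hill_p` assumes that the P-model Hill estimator takes the log-log form obtained as the β → 1 limit of (7.1); both function names and the toy data are hypothetical.

```python
import math


def hill_l(x, k):
    """Classical Hill estimator: mean of log(X_{n-i+1:n} / X_{n-k:n})."""
    xs = sorted(x)
    n = len(xs)
    thr = xs[n - k - 1]
    return sum(math.log(v / thr) for v in xs[n - k:]) / k


def hill_p(y, k):
    """Assumed P-model form: mean of log(log Y_{n-i+1:n} / log Y_{n-k:n})."""
    ys = sorted(y)
    n = len(ys)
    log_thr = math.log(ys[n - k - 1])    # requires Y_{n-k:n} > 1
    return sum(math.log(math.log(v) / log_thr) for v in ys[n - k:]) / k


data = [2.0, 3.0, 5.0, 8.0, 13.0]        # toy data, all greater than 1
scaled = [2.0 * v for v in data]         # change of scale
powered = [v ** 3 for v in data]         # change of power

print("L-model, scale change:", hill_l(data, 3), hill_l(scaled, 3))
print("P-model, power change:", hill_p(data, 3), hill_p(powered, 3))
print("P-model, scale change:", hill_p(data, 3), hill_p(scaled, 3))
```

The first two printed pairs should agree up to rounding, while the third pair should differ, in line with the remark.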

7.7 Four HEPs based on GPDP

In the previous sections, by using the link between affine and power norming, several HEPs under power normalization for the tail index (the non-zero extreme value index) were presented. Each of these estimators has a counterpart in the L-model. In this section, four compact and adaptive HEPs are derived based on the GPDP. None of these estimators has a counterpart in the L-model. Moreover, in Subsection 7.7.2, a simulation study is conducted to assess the performance of these estimators.


7.7.1 Four HEPs that do not have counterparts in the L-model

The next theorem gives four HEPs based on GPDP that have no counterparts in the L-model.

Theorem 7.6 (cf. Barakat et al., 2017a)

Part I, the case γ > 0. For the order sample data $Y_{1:n} \le \dots \le Y_{n:n}$, let $Y_{n-k:n} > 0$ be a suitable threshold. Then, the HEP is given by
\[
\gamma^{++}_{p} = \frac{1}{k}\sum_{i=1}^{k}\log\left(1 + \log X_{n-i+1:n} - \log X_{n-k:n}\right). \tag{7.14}
\]
Moreover, for the order sample data $Y_{1:n} \le \dots \le Y_{n:n} < 0$, let $Y_{n-k:n}$ be a suitable threshold. Then, the HEP is given by
\[
\gamma^{-+}_{p} = \frac{1}{k}\sum_{i=1}^{k}\log\left(1 - \log|X_{n-i+1:n}| + \log|X_{n-k:n}|\right). \tag{7.15}
\]

Part II, the case γ < −0.5. For the order sample data $Y_{1:n} \le \dots \le Y_{n:n}$, let $Y_{n-k:n} > 0$ be a suitable threshold. Then, the HEP is given by
\[
\gamma^{+-}_{p} = -\frac{1}{k}\sum_{i=1}^{k-1}\log\left[1 + (\log X_{n:n} - \log X_{n-i:n})^{-1} - (\log X_{n:n} - \log X_{n-k:n})^{-1}\right]. \tag{7.16}
\]
Moreover, for the order sample data $Y_{1:n} \le \dots \le Y_{n:n} < 0$, let $Y_{n-k:n}$ be a suitable threshold. Then, the HEP is given by
\[
\gamma^{--}_{p} = -\frac{1}{k}\sum_{i=1}^{k-1}\log\left[1 - (\log|X_{n:n}| - \log|X_{n-i:n}|)^{-1} + (\log|X_{n:n}| - \log|X_{n-k:n}|)^{-1}\right], \tag{7.17}
\]
where the first superscript, in all the HEPs, denotes the sign of the data, i.e., “+” and “-” denote non-negative data and negative data, respectively. Moreover, the second superscript, in all the HEPs, denotes the sign of the EVI, γ, i.e., “+” and “-” denote the non-negative EVI (γ ≥ 0) and the negative EVI (γ < 0), respectively.

Proof To obtain the HEP in (7.14), we can assume that the data has the standard left truncated GPDP (with left truncation point $u = Y_{n-k:n} > 0$, for large n and k)
\[
\bar{Q}_{1}(y) = P(Y > y) = 1 - Q_{1}(y) = \left(1 + \log\frac{y}{u}\right)^{-\alpha}, \quad y > u,
\]
where $\alpha = \gamma^{-1}$ and $u = Y_{n-k:n}$. Then, $\eta = \log\left(1 + \log\frac{Y}{u}\right)$ has negative exponential DF, with mean $\alpha^{-1}$. Thus, the MLE for α is $\hat{\alpha} = \bar{\eta}^{-1} = (\gamma^{++}_{p})^{-1}$. Similarly, we can assume that the data considered in (7.15) has the standard left truncated GPDP (with left truncation point $u = Y_{n-k:n} < 0$, for large n and k)
\[
\bar{Q}_{2}(y) = P(Y > y) = 1 - Q_{2}(y) = \left(1 - \log\frac{y}{u}\right)^{-\alpha}, \quad u \le y < 0,
\]
where $\alpha = \gamma^{-1}$ and $u = Y_{n-k:n} < 0$. Then, $\zeta = \log\left(1 - \log\frac{Y}{u}\right)$ has negative exponential DF, with mean $\alpha^{-1}$. Thus, the MLE for α is $\hat{\alpha} = \bar{\zeta}^{-1} = (\gamma^{-+}_{p})^{-1}$.

A direct way to get the HEP (7.16) is by noting that (after some routine calculations) if $F_{X}(x) = P(X \le x) \in D_p(P_{1,\gamma})$, with γ > 0, then the upper endpoint $x_0$ of $F_{\xi}(x)$ is positive finite and $F_{\xi}(x) = P(\xi \le x) \in D_p(G_{1,-\gamma})$, where $\log\xi = \frac{1}{\log x_0 - \log X}$ (cf. Mohan and Ravi, 1993, see also Theorem 2.40 in Section 2). Therefore, by using the HEP in (7.14) for the data of ξ, after replacing $x_0$ by its estimate $X_{n:n}$ provided that γ < −0.5 (see Remark 4.5.5 in de Haan and Ferreira, 2006), we get the HEP in (7.16). Finally, we get the HEP (7.17) by noting that (after some routine calculations) if $F_{X}(x) = P(X \le x) \in D_p(P_{2,\gamma})$, with γ > 0, then the upper endpoint $x_0$ of $F_{\varsigma}(x)$ is negative finite and $F_{\varsigma}(x) = P(\varsigma \le x) \in D_p(G_{2,-\gamma})$, where $\log|\varsigma| = \frac{1}{\log|x_0| - \log|X|}$ (cf. Mohan and Ravi, 1993, see also Theorem 2.40 in Section 2). Therefore, by using the HEP in (7.15) for the data of ς, after replacing $x_0$ by its estimate $X_{n:n}$ provided that γ < −0.5 (see Remark 4.5.5 in de Haan and Ferreira, 2006), we get the HEP in (7.17).

Remark Clearly, all the HEPs defined in Theorem 7.6 are more compact than those defined in Theorem 7.1, in that any HEP defined in Theorem 7.6 comprises the data shape of two HEPs defined in Theorem 7.1. Moreover, the estimators defined in Theorem 7.6 share many attractive properties of the MLE, such as simplicity, consistency, and normality.
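The argument behind (7.14) can be illustrated with a short simulation sketch. The names `hep_gpdp_upper` and `sample_gpdp` are hypothetical; the sampler inverts the survival function (1 + log(y/u))^(−1/γ) used in the proof, and the truncation point u is placed in the sample explicitly so that it plays the role of the threshold. The value γ = 0.2 is chosen only to keep the simulated values within floating-point range.

```python
import math
import random


def hep_gpdp_upper(x, k):
    """Sketch of the HEP in (7.14): mean of log(1 + log X_{n-i+1:n} - log X_{n-k:n})."""
    xs = sorted(x)
    n = len(xs)
    if not 1 <= k <= n - 1:
        raise ValueError("k must satisfy 1 <= k <= n - 1")
    threshold = xs[n - k - 1]                      # X_{n-k:n}
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    log_thr = math.log(threshold)
    return sum(math.log(1.0 + math.log(v) - log_thr) for v in xs[n - k:]) / k


def sample_gpdp(gamma, u, size, rng):
    """Inverse-transform draws with P(Y > y) = (1 + log(y/u))**(-1/gamma), y > u."""
    # 1 - rng.random() lies in (0, 1], which avoids raising 0 to a negative power.
    return [u * math.exp((1.0 - rng.random()) ** (-gamma) - 1.0) for _ in range(size)]


rng = random.Random(2019)                          # arbitrary seed for reproducibility
gamma_true, u = 0.2, 2.0
sample = [u] + sample_gpdp(gamma_true, u, 20000, rng)
print(hep_gpdp_upper(sample, 20000))               # should be close to gamma_true
```

Under these assumptions each term log(1 + log(Y/u)) is exponential with mean γ, so the sample mean estimates γ directly, which is the MLE argument of the proof.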

7.7.2 Simulation study

To examine the properties of the positive and negative HEPs defined in (7.14)–(7.17) for positive and negative extreme data, we conduct a simulation study using samples of sizes n = 100 and n = 300, drawn from P1;γ(x; 1, 1), defined in (2.32) (from which the data is positive), and from P2;γ(x; 1, 1), defined in (2.33) (from which the data is negative), with different positive values of the shape parameter, 0.1 ≤ γ ≤ 2, and different negative values, −2 ≤ γ ≤ −0.6, where the number of replications is 1000. In each of Tables 7.5–7.8, the first three columns give the average values of k (reported as the ratio k/n), the HEPs and the MSE, respectively, when n = 100, with 1000 replications. Moreover, the second three columns give the average values of k, the HEPs and the MSE, respectively, when n = 300, with 1000 replications. In Tables 7.5–7.8, the best performance is recorded when the MSE is minimum, and in this case the corresponding value of k is taken as an appropriate choice to determine the suitable threshold. In other words, the number k of upper order observations used for estimating the shape parameter, or equivalently the threshold u, is selected as the value that minimizes the MSE.

Tables 7.5–7.8 summarize some estimation results of the shape parameter via the four adaptive HEPs for positive and negative data. From the results of Tables 7.5–7.8, one can easily notice that the selections of the best value of the upper (lower) observations (and thus the suitable threshold value) are classified in the following manner: For positive data (i.e., Tables 7.5 and 7.7), the best value of the upper observations, k, is obtained when the ratio k/n (i.e., the ratio between the observations used in the estimation and the total number of observations) belongs to the range [0.04, 0.28]. Moreover, for negative data, the best value of the lower observations is obtained when that ratio belongs to the range [0.04, 0.40]. On the other hand, in both cases given in Tables 7.6 and 7.8, for both positive and negative data, the best value of the upper (lower) observations is obtained when the ratio k/n belongs to the range [0.18, 0.39].
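The selection rule described above (choose the k whose estimates have the smallest MSE over the replications) can be sketched as follows. This is an assumed reading of the procedure, not the authors' code: the function name `select_k_by_mse`, its arguments, and the idea of evaluating every k on the same set of replicated samples are all hypothetical.

```python
def select_k_by_mse(samples, estimator, gamma_true, k_grid):
    """Return (k, mean estimate, MSE) for the k in k_grid with the smallest MSE.

    samples    : list of replicated data sets (each a list of observations)
    estimator  : callable (data, k) -> estimate of the shape parameter
    gamma_true : the true shape parameter used to generate the samples
    k_grid     : iterable of candidate values of k
    """
    best = None
    for k in k_grid:
        estimates = [estimator(s, k) for s in samples]
        mean_est = sum(estimates) / len(estimates)
        mse = sum((e - gamma_true) ** 2 for e in estimates) / len(estimates)
        if best is None or mse < best[2]:
            best = (k, mean_est, mse)
    return best
```

In a study like the one above, `samples` would hold the 1000 replications of size n = 100 or n = 300, `estimator` would be one of the HEPs (7.14)–(7.17), and the reported ratio would be the selected k divided by n.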

Table 7.5 Simulation output for assessing the HEP γp++ , γ > 0

            P1,γ, γ > 0, n = 100              P1,γ, γ > 0, n = 300
True γ   k/n    Estimated   MSE          k/n    Estimated   MSE
0.10     0.04   0.1199      3.03E-03     0.02   0.1001      1.68E-04
0.30     0.07   0.2995      8.53E-04     0.07   0.3002      1.07E-04
0.50     0.14   0.5000      4.90E-04     0.13   0.5000      6.95E-05
0.70     0.39   0.7000      5.31E-05     0.29   0.7000      9.02E-07
0.90     0.28   0.8989      7.97E-04     0.26   0.8947      9.93E-04
1        0.21   0.9962      2.18E-03     0.20   0.9675      3.15E-03
1.5      0.12   1.4832      8.07E-03     0.11   1.4433      9.07E-03
2        0.12   1.9845      7.09E-03     0.09   1.9956      2.34E-03

Table 7.6 Simulation output for assessing the HEP γp+− , γ < 0

            P1,γ, γ < 0, n = 100              P1,γ, γ < 0, n = 300
True γ   k/n    Estimated   MSE          k/n    Estimated   MSE
-0.60    0.28   -0.6002     2.19E-03     0.28   -0.6002     2.62E-04
-0.70    0.27   -0.7005     4.17E-03     0.27   -0.6994     4.22E-04
-0.80    0.26   -0.7999     6.09E-03     0.24   -0.8002     6.33E-04
-0.90    0.24   -0.8991     9.55E-03     0.23   -0.9006     6.77E-04
-1       0.23   -1.0003     1.39E-02     0.22   -0.9995     1.43E-03
-1.5     0.20   -1.4985     3.57E-01     0.19   -1.4997     2.44E-02
-2       0.18   -2.0029     5.06E-01     0.18   -1.9993     3.80E-02


Table 7.7 Simulation output for assessing the HEP γp−+ , γ > 0

            P2,γ, γ > 0, n = 100              P2,γ, γ > 0, n = 300
True γ   k/n    Estimated   MSE          k/n    Estimated   MSE
0.10     0.04   0.1209      3.03E-03     0.02   0.1001      1.67E-04
0.30     0.08   0.3006      7.94E-04     0.07   0.3001      1.05E-04
0.50     0.14   0.5006      3.99E-04     0.13   0.5000      6.84E-05
0.70     0.39   0.7000      5.01E-05     0.29   0.7000      8.97E-06
0.90     0.28   0.8990      7.92E-04     0.29   0.8948      9.94E-04
1        0.21   0.9970      2.17E-03     0.20   0.9675      3.15E-03
1.5      0.12   1.4897      7.78E-03     0.11   1.4445      9.02E-03
2        0.11   1.9967      6.02E-03     0.09   1.9982      2.15E-03

Table 7.8 Simulation output for assessing the HEP γp−− , γ < 0

            P2,γ, γ < 0, n = 100              P2,γ, γ < 0, n = 300
True γ   k/n    Estimated   MSE          k/n    Estimated   MSE
-0.60    0.28   -0.5965     2.20E-03     0.28   -0.5996     2.70E-04
-0.70    0.27   -0.6950     3.32E-03     0.26   -0.6984     4.32E-04
-0.80    0.26   -0.7885     6.12E-03     0.24   -0.7994     6.28E-04
-0.90    0.24   -0.8837     9.88E-03     0.23   -0.8990     7.52E-04
-1       0.23   -0.9866     1.36E-02     0.22   -0.9975     1.64E-03
-1.5     0.21   -1.4481     5.97E-02     0.19   -1.4899     1.52E-02
-2       0.20   -1.8732     2.07E-01     0.17   -1.9663     5.21E-02


7.8 New Hill plot (NHP)

With the Hill plot, we plot the points {(k, γ̂), 3 ≤ k ≤ n − 2}, where γ̂ is a HE, and then infer the value of the threshold from a stable region in the graph, i.e., a range of k over which the graph is an almost horizontal line segment. The traditional Hill plot is most effective only when the underlying distribution is the Pareto distribution or very close to it. For the Pareto distribution, one expects the Hill plot to be close to the EVI γ in the right side of the plot. In order to use the traditional Hill plot, at least two important decisions have to be made. First, a sensible range of k has to be determined. Second, and more difficult, a specific value of k inside this range, which gives the point estimate, has to be chosen. For selecting the optimal threshold value based on the statistical properties of the HEs themselves, we need a quick, automatic, clear-cut choice of k. The proposed criterion for the choice of k is to minimize the standard error (SE) of the HEs in the Hill plot. In the NHP the graph is put in a better perspective over a substantial number of upper order statistics, i.e., the NHP leaves us with less doubt about the estimate of k, which is very useful in practice. The optimal value is then defined as $k^{\mathrm{opt}}_{n,k} = \arg\min_{k} \mathrm{SE(Hill)}$. Figure 7.2 gives the same example of the River Nidd data when we use the New Hill plot to determine the best threshold. In Figure 7.2, the left panel is the Hill plot of γ++, with k(u) = 126(73), and the right panel is the Hill plot of γp++, with k(u) = 6(189). In each graph, the horizontal dashed line represents the true value of the shape parameter and the vertical dashed lines mark the best thresholds with k. By using the criterion of the standard error of the Hill estimator for the choice of k (equivalently, of u), this choice becomes less dependent on a stable region of the Hill plot.
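A minimal sketch of the minimum-SE rule is given below. The text does not state how the SE is computed, so the default used here, SE ≈ |γ̂_k| / √k (the usual asymptotic standard error of the Hill estimator), is an assumption, and the function name `choose_k_min_se` is hypothetical; a different SE can be supplied through the `se` argument.

```python
import math


def choose_k_min_se(ks, estimates, se=None):
    """Return (k, SE) minimizing the standard error over the candidate values ks.

    ks        : candidate numbers of extreme order statistics
    estimates : Hill-type estimates, estimates[j] corresponding to ks[j]
    se        : optional callable (k, estimate) -> standard error;
                defaults to the ASSUMED asymptotic form |estimate| / sqrt(k)
    """
    if len(ks) != len(estimates) or not ks:
        raise ValueError("ks and estimates must be non-empty and of equal length")
    if se is None:
        se = lambda k, g: abs(g) / math.sqrt(k)
    pairs = [(se(k, g), k) for k, g in zip(ks, estimates)]
    best_se, best_k = min(pairs)
    return best_k, best_se
```

With the default SE the criterion naturally favours larger k unless the estimate grows quickly with k, so in practice the candidate range of k would still need to be restricted to a sensible region of the plot.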

Figure 7.2 The threshold selection using the New Hill plot for Nidd data


7.9 Comparison between estimators under power normalization In this section, we carry out two simulation studies, in order to compare all the estimators given in this chapter. These comparisons are based on the MSE. In several cases, which have been studied, the results showed that all the HEPs work well. In Subsection 7.9.1, we study the HEPs defined in Theorems 7.1, 7.2, and 7.6, while in Subsection 7.9.2, the HEPs defined in Theorems 7.1, 7.2, and 7.4 are studied. The material of this section is quoted from Barakat et al. (2017a) and Barakat et al. (2018).

7.9.1 The first simulation study

In this subsection, we assess the performance of the HEPs defined in Theorems 7.1, 7.2, and 7.6 and compare these estimators. We consider the following values of the EVI: γ = 0.1, 0.3, 0.5, 0.7, 0.9, 1, 1.5, 2, −0.6, −0.7, −0.8, −0.9, −1, −1.5, −2. Then, for each of these true values of the EVI, we generate two groups of random samples, each consisting of 1000 samples, from the GEVPs P1;γ and P2;γ. Each random sample from the first group (second group) has 100 (300) observations drawn from P1;γ, for the HEPs γp++, with γp++≺, γp++, γPH++≺, γPH++ (Table 7.9), and γp+−, with γp+−≺, γp+− (Table 7.10); and drawn from P2;γ, for the HEPs γp−+, with γp−+≺, γp−+ (Table 7.11), and γp−−, with γp−−≺, γp−− (Table 7.12), respectively. In view of the p-max-stable property we get Pt;γ ∈ Dp(Pt;γ), t = 1, 2. Therefore, for the HEPs defined in Theorem 7.6, we compute the corresponding HEP using the ordered generated data for different values of the threshold, starting from the third order value to the (n − 2)th order value (n = 100 or n = 300). Moreover, for the HEPs of the form γp..≺ (or γPH++≺), we compute the corresponding HEP using the ordered generated data for different values of the threshold, starting from the third order value to the greatest order value that is less than 1. Finally, for the HEPs of the form γp.. (or γPH++), we compute the corresponding HEP using the ordered generated data for different values of the threshold, starting from the least order value that is greater than 1 to the (n − 2)th order value (n = 100 or n = 300). Furthermore, in each random sample, we select the best threshold value (the one that gives the HEP closest to the true value of γ).

Tables 7.9–7.12 summarize the results of some comparisons between the different estimators under power normalization defined in Theorems 7.1, 7.2 and 7.6: for each of the selected true values of γ, we report the average of these best values of the computed HEs over the 1000 random samples, together with the corresponding MSEs. During the preparation of Tables 7.9 and 7.11, i.e., when γ > 0, it became clear that for the great majority of the selected thresholds, the ratio k/n of upper (or lower) observations which exceed (or fall behind) the best threshold value and are used in the computation of the HEs is around 25%. Tables 7.9 and 7.11 show that the performance of all the HEs is very good and that all these HEs have almost the same accuracy. On the other hand, during the preparation of Tables 7.10 and 7.12, i.e., when γ < 0, we noticed that many of the thresholds are remarkably large; some of them reach a value such that the corresponding ratio k/n is about 39%, although in this case all the HEs perform well, except γp+−≺ and γp−−≺.


Table 7.9 Simulation output for assessing and comparing the HEPs γp++≺, γp++, γp++, γPH++≺, and γPH++

                        P1,γ, γ > 0, n = 100      P1,γ, γ > 0, n = 300
True γ   Estimator   Estimated   MSE           Estimated   MSE
0.10     γp++≺       0.1303      4.35E-03      0.1029      4.22E-04
         γp++        0.1002      4.43E-04      0.0999      4.93E-05
         γp++        0.1216      2.90E-03      0.0999      1.61E-04
         γPH++≺      0.1572      1.11E-02      0.1038      8.42E-04
         γPH++       0.0996      4.45E-04      0.0999      1.08E-04
0.30     γp++≺       0.2999      5.10E-04      0.2999      7.45E-05
         γp++        0.2987      5.05E-04      0.2998      6.55E-05
         γp++        0.3003      8.40E-04      0.3000      1.24E-04
         γPH++≺      0.2974      1.51E-03      0.2993      2.12E-04
         γPH++       0.2992      1.04E-03      0.2997      1.88E-04
0.50     γp++≺       0.4992      1.11E-03      0.4999      1.44E-04
         γp++        0.4982      8.66E-04      0.4999      9.47E-05
         γp++        0.5004      4.24E-04      0.5000      7.07E-05
         γPH++≺      0.4911      4.09E-03      0.4988      5.98E-04
         γPH++       0.4973      1.87E-03      0.4993      1.23E-04
0.70     γp++≺       0.6985      2.38E-03      0.7000      3.00E-04
         γp++        0.6982      9.28E-04      0.6998      1.08E-04
         γp++        0.7001      6.35E-05      0.7000      1.23E-05
         γPH++≺      0.6740      1.08E-03      0.6983      1.50E-03
         γPH++       0.6970      3.31E-03      0.6995      5.16E-04
0.90     γp++≺       0.8958      4.74E-03      0.8996      6.53E-04
         γp++        0.8959      1.71E-03      0.8996      1.54E-04
         γp++        0.8990      6.62E-04      0.8951      9.06E-04
         γPH++≺      0.8449      2.40E-02      0.8895      3.33E-03
         γPH++       0.8934      5.35E-03      0.8990      8.96E-04
1        γp++≺       0.9966      6.83E-03      0.9994      8.56E-04
         γp++        0.9949      2.24E-03      0.9997      1.88E-04
         γp++        0.9970      2.21E-03      0.9684      3.00E-03
         γPH++≺      0.9249      3.50E-02      0.9881      5.10E-03
         γPH++       0.9919      6.91E-03      0.9986      1.05E-03
1.5      γp++≺       1.4790      3.15E-02      1.4992      4.51E-03
         γp++        1.4837      6.02E-03      1.4975      4.80E-04
         γp++        1.4848      8.11E-03      1.4433      8.85E-03
         γPH++≺      1.2413      1.61E-01      1.4195      3.21E-02
         γPH++       1.4670      2.08E-02      1.4965      5.61E-03
2        γp++≺       1.9106      1.23E-01      1.9863      1.78E-02
         γp++        1.9554      1.22E-02      1.9923      1.36E-03
         γp++        1.9844      5.56E-03      1.9960      1.50E-03
         γPH++≺      1.5191      4.23E-01      1.8101      1.05E-01
         γPH++       1.9567      3.53E-02      1.9905      8.66E-03


Table 7.10 Simulation output for assessing and comparing the HEPs γp+−≺, γp+−, and γp+−

                      P1,γ, γ < 0, n = 100      P1,γ, γ < 0, n = 300
True γ   HEP       Estimated   MSE           Estimated   MSE
-0.60    γp+−≺     -0.5525     8.70E-03      -0.5596     5.95E-03
         γp+−      -0.6077     8.57E-03      -0.6060     5.03E-03
         γp+−      -0.5998     1.06E-04      -0.6000     8.51E-06
-0.70    γp+−≺     -0.4331     7.81E-02      -0.6902     1.64E-02
         γp+−      -0.7024     2.51E-03      -0.7006     7.79E-04
         γp+−      -0.6996     9.69E-04      -0.6998     9.12E-06
-0.80    γp+−≺     -0.3912     1.63E-01      -0.4066     1.72E-01
         γp+−      -0.8024     1.77E-03      -0.8096     3.59E-03
         γp+−      -0.8004     1.43E-04      -0.8000     1.60E-05
-0.90    γp+−≺     -0.8612     6.29E-02      -0.5927     9.48E-02
         γp+−      -0.9000     4.42E-04      -0.8995     1.53E-05
         γp+−      -0.8997     9.25E-05      -0.8979     4.91E-06
-1       γp+−≺     -0.2361     4.23E-01      -0.6250     1.65E-01
         γp+−      -1.0011     1.95E-03      -1.0002     3.09E-04
         γp+−      -1.0003     1.32E-04      -0.9998     1.47E-05
-1.5     γp+−≺     -0.3283     1.29E+02      -1.1782     1.10E-01
         γp+−      -1.4896     7.80E-03      -1.4908     1.75E-04
         γp+−      -1.5012     3.39E-04      -1.4996     4.57E-05
-2       γp+−≺     -0.2125     2.86E+02      -1.1110     7.90E-01
         γp+−      -1.6751     1.24E-01      -1.8024     5.99E-02
         γp+−      -1.9997     4.66E-04      -2.0012     3.02E-05


Table 7.11 Simulation output for assessing and comparing the HEPs γp−+≺, γp−+, and γp−+

                      P2,γ, γ > 0, n = 100      P2,γ, γ > 0, n = 300
True γ   HEP       Estimated   MSE           Estimated   MSE
0.10     γp−+≺     0.1127      1.85E-03      0.1002      9.80E-05
         γp−+      0.1010      5.09E-04      0.0999      4.92E-05
         γp−+      0.1228      2.91E-03      0.1001      1.58E-04
0.30     γp−+≺     0.2997      4.74E-04      0.2999      8.09E-05
         γp−+      0.2999      4.27E-04      0.2999      6.35E-05
         γp−+      0.3009      7.39E-04      0.3001      1.41E-04
0.50     γp−+≺     0.5001      1.28E-03      0.5001      1.37E-04
         γp−+      0.4995      6.65E-04      0.4999      7.80E-05
         γp−+      0.5008      3.91E-04      0.4998      7.47E-05
0.70     γp−+≺     0.6998      2.12E-03      0.6996      2.47E-04
         γp−+      0.7001      9.86E-04      0.7002      1.30E-04
         γp−+      0.7000      5.13E-05      0.6999      1.32E-05
0.90     γp−+≺     0.8991      4.61E-03      0.8998      5.83E-04
         γp−+      0.8993      1.24E-03      0.8996      1.37E-04
         γp−+      0.8991      7.39E-04      0.8925      1.10E-03
1        γp−+≺     0.9965      7.67E-03      1.0008      8.18E-04
         γp−+      0.9995      1.32E-03      1.0001      1.66E-04
         γp−+      0.9967      2.04E-03      0.9646      3.46E-03
1.5      γp−+≺     1.4886      3.64E-02      1.4967      3.94E-03
         γp−+      1.4975      2.75E-03      1.5000      2.85E-04
         γp−+      1.4905      7.93E-03      1.4457      8.77E-03
2        γp−+≺     1.9682      1.46E-02      1.9961      1.81E-02
         γp−+      1.9984      5.54E-03      2.0001      5.39E-04
         γp−+      1.9972      5.19E-03      1.9981      1.81E-03


Table 7.12 Simulation output for assessing and comparing the HEPs γp−−≺, γp−−, and γp−−

                      P2,γ, γ < 0, n = 100      P2,γ, γ < 0, n = 300
True γ   HEP       Estimated   MSE           Estimated   MSE
-0.60    γp−−≺     -0.3874     3.69E-02      -0.5629     7.34E-03
         γp−−      -0.5982     2.21E-02      -0.5991     1.56E-03
         γp−−      -0.5997     6.00E-05      -0.5996     8.76E-06
-0.70    γp−−≺     -0.6559     1.68E-02      -0.5643     2.28E-02
         γp−−      -0.7002     8.38E-03      -0.6997     9.34E-04
         γp−−      -0.6995     7.15E-05      -0.6998     9.57E-06
-0.80    γp−−≺     -0.4837     9.72E-02      -0.7181     1.88E-02
         γp−−      -0.8000     1.61E-03      -0.8000     2.24E-04
         γp−−      -0.7988     7.18E-04      -0.7999     7.46E-06
-0.90    γp−−≺     -0.4862     1.28E-01      -0.8056     2.39E-02
         γp−−      -0.9006     1.52E-02      -0.8998     1.02E-03
         γp−−      -0.8992     1.08E-03      -0.8997     1.06E-05
-1       γp−−≺     -0.6500     1.16E-01      -0.6862     1.19E-01
         γp−−      -0.9953     4.89E-03      -1.0000     5.03E-04
         γp−−      -0.9999     3.72E-04      -0.9999     1.56E-05
-1.5     γp−−≺     -0.8177     4.73E-01      -0.7585     5.92E-01
         γp−−      -1.4917     6.67E-03      -1.4993     1.94E-04
         γp−−      -1.4973     3.27E-04      -1.4988     5.62E-05
-2       γp−−≺     -1.0402     1.04E+02      -1.0382     9.53E-01
         γp−−      -1.5862     1.91E-01      -1.8224     3.80E-02
         γp−−      -1.9872     8.99E-04      -1.9956     1.10E-04


7.9.2 The second simulation study

In this subsection, we assess and compare the performance of the HEPs defined in Theorems 7.1, 7.2, and 7.4. Specifically, the performance of the estimator $\gamma_{P}^{++\prec}$ compared to $\gamma_{PM}^{+\prec}$, $\gamma_{PH}^{++\prec}$, and $\gamma_{PMR}^{++\prec}$ is assessed in Table 7.13. Moreover, the performance of the estimator $\gamma_{P}^{++}$ compared to $\gamma_{PM}^{+}$, $\gamma_{PH}^{++}$, and $\gamma_{PMR}^{++}$ is assessed in Table 7.14. The simulation procedure applied in Tables 7.13 and 7.14 is as follows:

1. Consider the following values of the EVI: γ = 0.1, 0.3, 0.5, 0.7, 0.9, 1, 1.5, 2.
2. For each of these true values of the EVI, generate two groups of 1000 random samples each from the GEVP $P_{1;\gamma}$. Each random sample in the first group (second group) has 100 (300) observations. In view of the p-max-stable property, we get $P_{1;\gamma} \in D_P(P_{1;\gamma})$.
3. Compute the estimates in Table 7.13 by using the ordered generated data for different values of the threshold, starting from the third order value and ending at the greatest order value that is less than 1.
4. Compute the estimates in Table 7.14 by using the ordered generated data for different values of the threshold, starting from the least order value that is greater than 1 and ending at the (n − 2)th order value (n = 100 or n = 300).
5. In each random sample, select the best value of the threshold (the one that gives the estimate closest to the true value of γ). Then record the average of these best estimates over the 1000 random samples.
6. For a quantitative comparison of the different estimators, we use a mean squared error criterion, denoted by MSEP, which is inspired by Lei (2008). Namely, for the estimators in Table 7.14, we have

$$\mathrm{MSEP} = \frac{1}{k}\sum_{i=1}^{k}\left(\hat{F}(x_{n-i+1:n}) - E_i\right)^2,$$

where $\hat{F}(x_{n-i+1:n})$ is the value of the GEVP evaluated at the observed order statistic value $x_{n-i+1:n}$ by using the EVI estimated via $\gamma_{P}^{++}$, or $\gamma_{PM}^{+}$, or $\gamma_{PMR}^{++}$, and $E_i = \frac{i}{k+1}$ is the corresponding value of the empirical DF (the scale parameter of the GEVP is estimated by the usual MLE). On the other hand, for the estimators in Table 7.13, we have

$$\mathrm{MSEP} = \frac{1}{k}\sum_{i=1}^{k}\left(\hat{F}(x_{i:n}) - E_i\right)^2.$$
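The MSEP criterion of step 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `msep` and the argument `fitted_cdf` (any callable that returns the fitted GEVP DF at a point) are hypothetical and are not taken from the book, and fitting the GEVP itself is outside the sketch.

```python
from typing import Callable, Sequence


def msep(sample: Sequence[float], fitted_cdf: Callable[[float], float],
         k: int, upper: bool = True) -> float:
    """Mean squared distance between the fitted DF and E_i = i / (k + 1).

    upper=True  uses x_{n-i+1:n}, as stated for Table 7.14;
    upper=False uses x_{i:n},     as stated for Table 7.13.
    `fitted_cdf` is a hypothetical stand-in for the fitted GEVP.
    """
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    total = 0.0
    for i in range(1, k + 1):
        x = xs[n - i] if upper else xs[i - 1]  # x_{n-i+1:n} or x_{i:n}
        e_i = i / (k + 1)                      # empirical DF value
        total += (fitted_cdf(x) - e_i) ** 2
    return total / k
```

In the simulation described above, a function like this would be evaluated once per estimator, with `fitted_cdf` built from that estimator's value of γ and the MLE of the scale parameter.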

First, in Tables 7.13 and 7.14 the great majority of the selected thresholds are such that the ratio of upper (or lower) observations that exceed (or fall below) the best threshold value and are used in the computation of the estimates, i.e., $k/n$, is around 25%. In general, all the estimates are very good (especially for the larger sample size, n = 300) and almost all of them have the same accuracy, except $\gamma_{PMR}^{++\prec}$ and $\gamma_{PMR}^{++}$.

Tables 7.13 and 7.14 show that the performance of $\gamma_{P}^{++\prec}$, $\gamma_{PH}^{++\prec}$, and $\gamma_{PM}^{+\prec}$ is worse than the performance of $\gamma_{P}^{++}$, $\gamma_{PH}^{++}$, and $\gamma_{PM}^{+}$, respectively, while the performance of $\gamma_{PMR}^{++\prec}$ is better than that of $\gamma_{PMR}^{++}$. Moreover, the estimators $\gamma_{PM}^{+}$, $\gamma_{P}^{++}$, and $\gamma_{PH}^{++}$ have the best performance. In addition, the performance of $\gamma_{PM}^{+}$ is the best (especially for n = 300) in the range γ > 0.5, while the performance of $\gamma_{PMR}^{++}$ is the worst. However, within the estimates given in Table 7.13, the estimate $\gamma_{PM}^{+\prec}$ is the best in the range γ > 0.5, followed by the estimate $\gamma_{P}^{++\prec}$, while in the range γ < 0.5 the estimate $\gamma_{P}^{++\prec}$ is the best, followed by the estimate $\gamma_{PM}^{+\prec}$.

On the other hand, by comparing the performance of the estimates in Tables 7.13 and 7.14 with those in Table 6.2, we find that the performance of the estimators of the EVI under power normalization is more stable than the performance of the estimators under linear normalization, in the sense that the performance of the former is not affected by changing the value of the EVI, as it is for the latter.

The simulation study given in this subsection was carried out only to assess the performance of several Hill estimators in the P-model. Therefore, we set aside the effect of the choice of a suitable threshold by repeating the simulation 1000 times and computing the average of the best-estimated values of the EVI. The main reason for avoiding the effect of the threshold is that the choice of a suitable threshold is not always clear, because it is subjective. Indeed, the success of the Hill plot technique depends on the identification of a stable region, where the estimates of the tail index do not change too much over a number of different exceedances.

Now, we apply the Hill plot technique to the estimators $\gamma_{PM}^{+\prec}$, $\gamma_{PM}^{+}$, $\gamma_{PH:2}^{+\prec}$, and $\gamma_{PH:2}^{+}$ and compare the resulting estimates with the corresponding estimates given in Tables 7.13 and 7.14. To do that, we first select a random sample (of size 300) out of the 1000 generated random samples for the true value γ = 0.7 in Tables 7.13 and 7.14, for each couple $(\gamma_{PM}^{+\prec}, \gamma_{PM}^{+})$ and $(\gamma_{PH:2}^{+\prec}, \gamma_{PH}^{+})$, and then apply this technique to these estimators.
For the first couple, the number of observations greater than 1 is 180, while the number of observations less than 1 is 120. The plot $\{(k, \gamma_{PM}^{+\prec}) : 3 \le k \le 120\}$ in Figure 7.3 shows that the suitable threshold is at k = 19; the value of the corresponding threshold is 0.4632 and the corresponding estimate of γ is 0.6976 (the corresponding value in Table 7.13 is 0.6972). The plot $\{(k, \gamma_{PM}^{+}) : 1 \le k \le 178\}$ in Figure 7.4 shows that the suitable threshold is at k = 52; the value of the corresponding threshold is 1.8091 and the corresponding estimate of γ is 0.7022 (the corresponding value in Table 7.14 is 0.6982).

For the second couple, the number of observations greater than 1 is 200, while the number of observations less than 1 is 100. The plot $\{(k, \gamma_{PH}^{+\prec}) : 3 \le k \le 100\}$ in Figure 7.5 shows that the suitable threshold is at k = 94; the value of the corresponding threshold is 0.8454 and the corresponding estimate of γ is 0.7064 (the corresponding value in Table 7.13 is 0.6740). The plot $\{(k, \gamma_{PH:2}^{+}) : 1 \le k \le 198\}$ in Figure 7.6 shows that the suitable threshold is at k = 9; the value of the corresponding threshold is 1.0851 and the corresponding estimate of γ is 0.6942 (the corresponding value in Table 7.14 is 0.6970).
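To make the idea of a Hill plot and its stable region concrete, the following Python sketch computes the points (k, estimate) of a Hill plot and then looks for the window of consecutive k values over which the estimates vary least. Two assumptions should be stressed: the estimator used here is the classical Hill estimator under linear normalization, swapped in for illustration and not one of the HEPs of this chapter, and the window rule in `most_stable_window` is a hypothetical heuristic, not the authors' procedure.

```python
import math
from typing import List, Sequence, Tuple


def hill_plot(sample: Sequence[float]) -> List[Tuple[int, float]]:
    """Points (k, H_k) of the classical Hill plot for a positive sample.

    H_k = (1/k) * sum_{i=1..k} log x_{n-i+1:n} - log x_{n-k:n}.
    """
    xs = sorted(sample)
    n = len(xs)
    logs = [math.log(x) for x in xs]   # requires strictly positive data
    points = []
    running = 0.0
    for k in range(1, n):
        running += logs[n - k]                       # adds log x_{n-k+1:n}
        points.append((k, running / k - logs[n - k - 1]))
    return points


def most_stable_window(points: Sequence[Tuple[int, float]],
                       width: int) -> Tuple[int, int]:
    """Return (k_first, k_last) of the window with the smallest spread.

    Heuristic assumption: 'stable' means the smallest max-min range of
    the estimates over `width` consecutive values of k.
    """
    best_start, best_spread = 0, float("inf")
    for start in range(len(points) - width + 1):
        values = [g for _, g in points[start:start + width]]
        spread = max(values) - min(values)
        if spread < best_spread:
            best_start, best_spread = start, spread
    window = points[best_start:best_start + width]
    return window[0][0], window[-1][0]
```

A threshold would then be read off inside the returned window, which mirrors in a crude way the visual inspection of Figures 7.3 to 7.6.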


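Table 7.13 Simulation output for assessing and comparing the estimators $\gamma_{P}^{++\prec}$, $\gamma_{PH}^{++\prec}$, $\gamma_{PM}^{+\prec}$, and $\gamma_{PMR}^{++\prec}$ (samples from $P_{1,\gamma}$, γ > 0)

| True value of γ | Estimator | Estimated value (n = 100) | MSEP (n = 100) | Estimated value (n = 300) | MSEP (n = 300) |
|---|---|---|---|---|---|
| 0.10 | $\gamma_{P}^{++\prec}$ | 0.1539 | 9.28E-03 | 0.1038 | 7.31E-04 |
| 0.10 | $\gamma_{PH}^{++\prec}$ | 0.1572 | 1.11E-02 | 0.1038 | 8.42E-04 |
| 0.10 | $\gamma_{PM}^{+\prec}$ | 0.4229 | 1.20E-01 | 0.3509 | 6.93E-02 |
| 0.10 | $\gamma_{PMR}^{++\prec}$ | 0.2593 | 5.07E-02 | 0.1513 | 1.01E-02 |
| 0.30 | $\gamma_{P}^{++\prec}$ | 0.2978 | 1.22E-03 | 0.2994 | 1.90E-04 |
| 0.30 | $\gamma_{PH}^{++\prec}$ | 0.2974 | 1.51E-03 | 0.2993 | 2.12E-04 |
| 0.30 | $\gamma_{PM}^{+\prec}$ | 0.4027 | 1.93E-02 | 0.3363 | 3.87E-03 |
| 0.30 | $\gamma_{PMR}^{++\prec}$ | 0.2961 | 9.84E-03 | 0.2920 | 3.16E-03 |
| 0.50 | $\gamma_{P}^{++\prec}$ | 0.4958 | 2.62E-03 | 0.4996 | 2.79E-04 |
| 0.50 | $\gamma_{PH}^{++\prec}$ | 0.4911 | 4.09E-03 | 0.4988 | 5.98E-04 |
| 0.50 | $\gamma_{PM}^{+\prec}$ | 0.4967 | 2.28E-03 | 0.4998 | 3.59E-04 |
| 0.50 | $\gamma_{PMR}^{++\prec}$ | 0.4746 | 1.07E-02 | 0.4898 | 5.14E-03 |
| 0.70 | $\gamma_{P}^{++\prec}$ | 0.6828 | 6.79E-03 | 0.6972 | 7.16E-04 |
| 0.70 | $\gamma_{PH}^{++\prec}$ | 0.6740 | 1.08E-03 | 0.6983 | 1.50E-03 |
| 0.70 | $\gamma_{PM}^{+\prec}$ | 0.6972 | 1.38E-03 | 0.6996 | 1.33E-04 |
| 0.70 | $\gamma_{PMR}^{++\prec}$ | 0.6480 | 2.04E-02 | 0.6863 | 6.40E-03 |
| 0.90 | $\gamma_{P}^{++\prec}$ | 0.8504 | 1.63E-02 | 0.8955 | 1.92E-03 |
| 0.90 | $\gamma_{PH}^{++\prec}$ | 0.8449 | 2.40E-02 | 0.8895 | 3.33E-03 |
| 0.90 | $\gamma_{PM}^{+\prec}$ | 0.8934 | 2.63E-03 | 0.8994 | 2.42E-04 |
| 0.90 | $\gamma_{PMR}^{++\prec}$ | 0.8379 | 3.22E-02 | 0.8705 | 1.01E-02 |
| 1 | $\gamma_{P}^{++\prec}$ | 0.9347 | 2.58E-02 | 0.9901 | 3.18E-03 |
| 1 | $\gamma_{PH}^{++\prec}$ | 0.9249 | 3.50E-02 | 0.9881 | 5.10E-03 |
| 1 | $\gamma_{PM}^{+\prec}$ | 0.9824 | 5.03E-03 | 0.9989 | 4.88E-04 |
| 1 | $\gamma_{PMR}^{++\prec}$ | 0.9235 | 3.88E-02 | 0.9728 | 9.38E-03 |
| 1.50 | $\gamma_{P}^{++\prec}$ | 1.2627 | 1.33E-01 | 1.4369 | 2.20E-02 |
| 1.50 | $\gamma_{PH}^{++\prec}$ | 1.2413 | 1.61E-01 | 1.4195 | 3.21E-02 |
| 1.50 | $\gamma_{PM}^{+\prec}$ | 1.3974 | 3.90E-02 | 1.4790 | 5.32E-03 |
| 1.50 | $\gamma_{PMR}^{++\prec}$ | 1.3227 | 9.69E-02 | 1.4400 | 2.25E-02 |
| 2 | $\gamma_{P}^{++\prec}$ | 1.5678 | 3.56E-01 | 1.8414 | 8.40E-02 |
| 2 | $\gamma_{PH}^{++\prec}$ | 1.5191 | 4.23E-01 | 1.8101 | 1.05E-01 |
| 2 | $\gamma_{PM}^{+\prec}$ | 1.7119 | 1.73E-01 | 1.9117 | 3.13E-02 |
| 2 | $\gamma_{PMR}^{++\prec}$ | 1.7609 | 1.78E-01 | 1.8960 | 4.35E-02 |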


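Table 7.14 Simulation output for assessing and comparing the estimators $\gamma_{P}^{++}$, $\gamma_{PH}^{++}$, $\gamma_{PM}^{+}$, and $\gamma_{PMR}^{++}$ (samples from $P_{1,\gamma}$, γ > 0)

| True value of γ | Estimator | Estimated value (n = 100) | MSEP (n = 100) | Estimated value (n = 300) | MSEP (n = 300) |
|---|---|---|---|---|---|
| 0.10 | $\gamma_{P}^{++}$ | 0.1004 | 8.03E-04 | 0.0999 | 8.85E-05 |
| 0.10 | $\gamma_{PH}^{++}$ | 0.0996 | 4.45E-04 | 0.0999 | 1.08E-04 |
| 0.10 | $\gamma_{PM}^{+}$ | 0.3104 | 5.59E-02 | 0.2714 | 3.30E-02 |
| 0.10 | $\gamma_{PMR}^{++}$ | 0.1080 | 8.67E-03 | 0.0953 | 1.20E-03 |
| 0.30 | $\gamma_{P}^{++}$ | 0.2995 | 7.22E-04 | 0.2998 | 1.31E-04 |
| 0.30 | $\gamma_{PH}^{++}$ | 0.2992 | 1.04E-03 | 0.2997 | 1.88E-04 |
| 0.30 | $\gamma_{PM}^{+}$ | 0.3234 | 7.25E-03 | 0.2781 | 1.21E-03 |
| 0.30 | $\gamma_{PMR}^{++}$ | 0.2721 | 1.18E-02 | 0.2841 | 4.72E-03 |
| 0.50 | $\gamma_{P}^{++}$ | 0.4987 | 1.40E-03 | 0.4998 | 1.87E-04 |
| 0.50 | $\gamma_{PH}^{++}$ | 0.4973 | 1.87E-03 | 0.4993 | 1.23E-04 |
| 0.50 | $\gamma_{PM}^{+}$ | 0.4963 | 3.12E-03 | 0.4998 | 3.74E-04 |
| 0.50 | $\gamma_{PMR}^{++}$ | 0.4314 | 2.71E-02 | 0.4732 | 1.08E-02 |
| 0.70 | $\gamma_{P}^{++}$ | 0.6973 | 2.33E-03 | 0.6999 | 2.60E-04 |
| 0.70 | $\gamma_{PH}^{++}$ | 0.6970 | 3.31E-03 | 0.6995 | 5.16E-04 |
| 0.70 | $\gamma_{PM}^{+}$ | 0.6982 | 2.79E-03 | 0.6998 | 2.53E-04 |
| 0.70 | $\gamma_{PMR}^{++}$ | 0.5964 | 5.32E-02 | 0.6521 | 2.21E-02 |
| 0.90 | $\gamma_{P}^{++}$ | 0.8964 | 3.84E-03 | 0.8994 | 3.58E-04 |
| 0.90 | $\gamma_{PH}^{++}$ | 0.8934 | 5.35E-03 | 0.8990 | 8.96E-04 |
| 0.90 | $\gamma_{PM}^{+}$ | 0.8973 | 2.49E-03 | 0.8997 | 3.09E-04 |
| 0.90 | $\gamma_{PMR}^{++}$ | 0.7501 | 8.75E-02 | 0.8259 | 3.74E-02 |
| 1 | $\gamma_{P}^{++}$ | 0.9946 | 5.33E-03 | 0.9992 | 4.10E-04 |
| 1 | $\gamma_{PH}^{++}$ | 0.9919 | 6.91E-03 | 0.9986 | 1.05E-03 |
| 1 | $\gamma_{PM}^{+}$ | 0.9973 | 3.00E-03 | 0.9998 | 2.54E-04 |
| 1 | $\gamma_{PMR}^{++}$ | 0.8361 | 1.09E-01 | 0.9187 | 4.78E-02 |
| 1.50 | $\gamma_{P}^{++}$ | 1.4783 | 1.31E-02 | 1.4964 | 1.35E-03 |
| 1.50 | $\gamma_{PH}^{++}$ | 1.4670 | 2.08E-02 | 1.4965 | 5.61E-03 |
| 1.50 | $\gamma_{PM}^{+}$ | 1.4887 | 6.68E-03 | 1.4989 | 6.54E-04 |
| 1.50 | $\gamma_{PMR}^{++}$ | 1.1969 | 2.71E-01 | 1.3171 | 1.28E-01 |
| 2 | $\gamma_{P}^{++}$ | 1.9414 | 2.72E-02 | 1.9870 | 3.15E-03 |
| 2 | $\gamma_{PH}^{++}$ | 1.9567 | 3.53E-02 | 1.9905 | 8.66E-03 |
| 2 | $\gamma_{PM}^{+}$ | 1.9760 | 1.45E-02 | 1.9962 | 1.43E-03 |
| 2 | $\gamma_{PMR}^{++}$ | 1.6003 | 4.66E-01 | 1.7203 | 2.40E-01 |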

Figure 7.3 The Hill plot of $\gamma_{PM}^{+\prec}$, with k=19

Figure 7.4 The Hill plot of $\gamma_{PM}^{+}$, with k=51

Figure 7.5 The Hill plot of $\gamma_{PH}^{+\prec}$, with k=94

Figure 7.6 The Hill plot of $\gamma_{PH}^{+}$, with k=9

7.10 The weighting between the linear and power models

The problem of weighting between the linear and power models is itself challenging. For this purpose, the suggested MSEL and MSEP criteria may not be suitable, since we have two different models. Instead, we suggest the coefficient variation criterion (CVC)

$$\mathrm{CVC}(L \text{ or } P) = \frac{\sqrt{\mathrm{MSE}(L \text{ or } P)}}{\frac{1}{k}\sum_{i=1}^{k}\hat{F}(x_{n-i+1:n})} = \frac{\sqrt{\frac{1}{k}\sum_{i=1}^{k}\left(\hat{F}(x_{n-i+1:n}) - E_i\right)^2}}{\frac{1}{k}\sum_{i=1}^{k}\hat{F}(x_{n-i+1:n})}, \qquad (7.18)$$

where $\hat{F}(x_{n-i+1:n})$ is the value of the GEVL (or GEVP) evaluated at the observed order statistic value $x_{n-i+1:n}$ and $E_i = \frac{i}{k+1}$ is the corresponding value of the empirical DF. In order to test the efficiency of this new criterion for weighting between the linear (with EVI estimator $\gamma_{M}^{+}$) and power (with EVI estimator $\gamma_{P}^{++}$) models, we carry out a simulation study (the choice of the two estimators $\gamma_{M}^{+}$ and $\gamma_{P}^{++}$ is based on the results of the next chapter, where we apply this criterion to a real data set for which these two estimators gave the best estimates of the EVI). The idea of this test is to generate a random sample of size n = 100 from the GEVL (2.8) with location and scale parameters μ = 7 and σ = 1, respectively, for each value of γ = 0.10, 0.15, 0.20, ..., 2.55, 2.60 (Table 7.15). We also generate a random sample of size n = 100 from the GEVP (2.32) for each value of γ = 0.10, 0.15, 0.20, ..., 2.55, 2.60 (Table 7.16). For each sample, we calculate the values of CVCL and CVCP. This process is repeated 1000 times, and we record the average values of the computed CVCL and CVCP over these 1000 replications. To compute the HEPs that estimate γ, we choose only the best threshold, i.e., the threshold that gives the estimate closest to the true value of γ. To estimate the parameters μ and σ, we use the usual MLE. The suggested criterion performs well when the two inequalities CVCL < CVCP in Table 7.15 and CVCL > CVCP in Table 7.16 both hold. We chose the maximum range of γ that satisfies these two inequalities as the admissible range, in which the criterion works well. Based on the present simulation study, this admissible range is 0.10 ≤ γ ≤ 0.95. In the next chapter, we will apply this criterion to a real data set that was previously studied (see Embrechts et al., 1997), for which the estimated value of the EVI falls within this admissible range.

It is worth mentioning that the admissible range in which this criterion works well may vary with the estimators considered. For example, if we consider the two estimators $\gamma^{++}$ and $\gamma_{P}^{++}$ instead of $\gamma_{M}^{+}$ and $\gamma_{P}^{++}$, respectively, and apply the same simulation study, we obtain the results presented in Table 7.17 for the data under linear normalization and in Table 7.18 for the data under power normalization. When the data are under linear normalization, the new criterion works very well for 0.30 ≤ γ ≤ 1.35, whereas for γ < 0.30 or γ > 1.35 it does not work. When the data are under power normalization, the criterion works well for 0.30 ≤ γ ≤ 1.10, but not for γ < 0.30 or γ > 1.10. In general, we get a new admissible range 0.30 ≤ γ ≤ 1.10. This means that the criterion needs further practical study.

In order to ease the detection of the admissible range for the suggested criterion, the results of the above tables may be depicted as graphs. For example, Tables 7.15 and 7.16 are summarized in Figures 7.7 and 7.8, where a good performance of the suggested criterion corresponds to the two inequalities CVCL < CVCP in Figure 7.7 and CVCL > CVCP in Figure 7.8 (since CVCL > CVCP only when 0.1 ≤ γ ≤ 0.95, the values of the parameter γ in Figure 7.8 are depicted only up to 1.6).

Remark In this simulation study, we used the MSE, as the most widely used risk function, to construct a criterion for the quantitative comparison between the L-model and the P-model. Although this study shows that the criterion works well and is reliable, it is not robust, in the sense that it depends strongly on the value of the EVI and also on the Hill estimators that we use. Clearly, we can use any other loss function, such as the absolute error loss, but it seems that any such quantitative criterion would not be robust. The main reason is that all the Hill estimators, for either the L-model or the P-model, suffer from the same drawback, i.e., their performance depends on the value of the EVI.
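The criterion (7.18) is simple to compute once the fitted DF values at the k extreme order statistics are available. The Python sketch below assumes that these values are passed in as a list ordered as $\hat{F}(x_{n:n}), \hat{F}(x_{n-1:n}), \ldots$, i.e., index i corresponds to $x_{n-i+1:n}$; the function names `cvc` and `preferred_model` are hypothetical, and the rule "prefer the model with the smaller CVC" is our reading of the two inequalities used in the simulation test above.

```python
import math
from typing import Sequence


def cvc(fitted_values: Sequence[float]) -> float:
    """Coefficient variation criterion (7.18).

    fitted_values[i-1] is assumed to hold F_hat(x_{n-i+1:n}), i = 1..k.
    Returns sqrt(MSE) divided by the mean of the fitted values.
    """
    k = len(fitted_values)
    if k == 0:
        raise ValueError("at least one fitted value is required")
    mse = sum((f - i / (k + 1)) ** 2
              for i, f in enumerate(fitted_values, start=1)) / k
    mean_fitted = sum(fitted_values) / k
    return math.sqrt(mse) / mean_fitted


def preferred_model(cvc_l: float, cvc_p: float) -> str:
    """Return 'L' if the linear model has the smaller CVC, else 'P'."""
    return "L" if cvc_l < cvc_p else "P"
```

For instance, with the averaged values reported in Table 7.15 at γ = 0.95 (CVCL = 3.54E-01, CVCP = 4.67E-01), `preferred_model` returns "L", which agrees with the fact that those samples were generated from the GEVL.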


Table 7.15 Simulation output for assessing the criterion CVC for $\gamma_{M}^{+}$. Samples generated from GEVL (0.10 ≤ γ ≤ 2.60, μ = 7, σ = 1): n = 100, replicates = 1000

| γ | CVCL | CVCP | γ | CVCL | CVCP | γ | CVCL | CVCP |
|---|---|---|---|---|---|---|---|---|
| 0.10 | 5.22E-01 | 5.55E-01 | 0.95 | 3.54E-01 | 4.67E-01 | 1.80 | 8.63E-01 | 4.65E-01 |
| 0.15 | 5.21E-01 | 5.72E-01 | 1.00 | 3.48E-01 | 4.66E-01 | 1.85 | 9.04E-01 | 4.61E-01 |
| 0.20 | 5.19E-01 | 5.71E-01 | 1.05 | 3.36E-01 | 4.66E-01 | 1.90 | 9.51E-01 | 4.60E-01 |
| 0.25 | 5.19E-01 | 5.60E-01 | 1.10 | 3.45E-01 | 4.71E-01 | 1.95 | 1.04E+00 | 4.57E-01 |
| 0.30 | 5.06E-01 | 5.43E-01 | 1.15 | 3.67E-01 | 4.64E-01 | 2.00 | 1.22E+00 | 4.55E-01 |
| 0.35 | 5.04E-01 | 5.35E-01 | 1.20 | 3.76E-01 | 4.66E-01 | 2.05 | 1.22E+00 | 4.55E-01 |
| 0.40 | 4.93E-01 | 5.32E-01 | 1.25 | 3.32E-01 | 4.60E-01 | 2.10 | 1.46E+00 | 4.54E-01 |
| 0.45 | 4.81E-01 | 5.24E-01 | 1.30 | 3.47E-01 | 4.69E-01 | 2.15 | 1.35E+00 | 4.48E-01 |
| 0.50 | 4.71E-01 | 5.21E-01 | 1.35 | 3.61E-01 | 4.65E-01 | 2.20 | 1.87E+00 | 4.48E-01 |
| 0.55 | 4.62E-01 | 5.17E-01 | 1.40 | 3.96E-01 | 4.68E-01 | 2.25 | 1.72E+00 | 4.45E-01 |
| 0.60 | 4.32E-01 | 5.03E-01 | 1.45 | 4.03E-01 | 4.67E-01 | 2.30 | 1.83E+00 | 4.41E-01 |
| 0.65 | 4.24E-01 | 4.96E-01 | 1.50 | 4.58E-01 | 4.61E-01 | 2.35 | 2.09E+00 | 4.37E-01 |
| 0.70 | 4.15E-01 | 4.93E-01 | 1.55 | 4.87E-01 | 4.67E-01 | 2.40 | 2.22E+00 | 4.33E-01 |
| 0.75 | 4.03E-01 | 4.88E-01 | 1.60 | 5.65E-01 | 4.69E-01 | 2.45 | 2.26E+00 | 4.30E-01 |
| 0.80 | 3.88E-01 | 4.77E-01 | 1.65 | 5.40E-01 | 4.66E-01 | 2.50 | 2.56E+00 | 4.28E-01 |
| 0.85 | 3.80E-01 | 4.70E-01 | 1.70 | 7.40E-01 | 4.64E-01 | 2.55 | 2.64E+00 | 4.28E-01 |
| 0.90 | 3.70E-01 | 4.70E-01 | 1.75 | 7.46E-01 | 4.63E-01 | 2.60 | 2.66E+00 | 4.24E-01 |


Table 7.16 Simulation output for assessing the criterion CVC for $\gamma_{P}^{++}$. Samples generated from GEVP (0.10 ≤ γ ≤ 2.60): n = 100, replicates = 1000

| γ | CVCL | CVCP | γ | CVCL | CVCP | γ | CVCL | CVCP |
|---|---|---|---|---|---|---|---|---|
| 0.10 | 5.34E-01 | 5.24E-01 | 0.95 | 5.48E-01 | 4.86E-01 | 1.80 | 6.16E-01 | 2.74E+00 |
| 0.15 | 5.38E-01 | 5.17E-01 | 1.00 | 4.87E-01 | 5.62E-01 | 1.85 | 6.27E-01 | 3.27E+00 |
| 0.20 | 5.41E-01 | 5.11E-01 | 1.05 | 4.91E-01 | 5.81E-01 | 1.90 | 6.31E-01 | 3.95E+00 |
| 0.25 | 5.34E-01 | 5.03E-01 | 1.10 | 4.96E-01 | 6.06E-01 | 1.95 | 6.39E-01 | 4.44E+00 |
| 0.30 | 5.31E-01 | 4.91E-01 | 1.15 | 4.99E-01 | 6.22E-01 | 2.00 | 6.38E-01 | 4.97E+00 |
| 0.35 | 5.29E-01 | 4.84E-01 | 1.20 | 5.07E-01 | 6.36E-01 | 2.05 | 6.47E-01 | 5.38E+00 |
| 0.40 | 5.21E-01 | 4.76E-01 | 1.25 | 5.08E-01 | 6.73E-01 | 2.10 | 6.53E-01 | 6.27E+00 |
| 0.45 | 5.20E-01 | 4.72E-01 | 1.30 | 5.25E-01 | 7.12E-01 | 2.15 | 6.59E-01 | 7.18E+00 |
| 0.50 | 5.11E-01 | 4.74E-01 | 1.35 | 5.33E-01 | 7.51E-01 | 2.20 | 6.70E-01 | 7.82E+00 |
| 0.55 | 5.05E-01 | 4.75E-01 | 1.40 | 5.36E-01 | 8.21E-01 | 2.25 | 6.73E-01 | 9.26E+00 |
| 0.60 | 5.05E-01 | 4.73E-01 | 1.45 | 5.52E-01 | 9.20E-01 | 2.30 | 6.72E-01 | 9.88E+00 |
| 0.65 | 4.99E-01 | 4.79E-01 | 1.50 | 5.59E-01 | 1.01E+00 | 2.35 | 6.81E-01 | 1.10E+01 |
| 0.70 | 4.95E-01 | 4.90E-01 | 1.55 | 5.71E-01 | 1.18E+00 | 2.40 | 6.96E-01 | 1.25E+01 |
| 0.75 | 5.04E-01 | 4.91E-01 | 1.60 | 5.85E-01 | 1.40E+00 | 2.45 | 7.00E-01 | 1.37E+01 |
| 0.80 | 5.06E-01 | 4.88E-01 | 1.65 | 5.94E-01 | 1.71E+00 | 2.50 | 7.06E-01 | 1.53E+01 |
| 0.85 | 5.18E-01 | 4.83E-01 | 1.70 | 6.03E-01 | 2.02E+00 | 2.55 | 7.12E-01 | 1.56E+01 |
| 0.90 | 5.37E-01 | 4.88E-01 | 1.75 | 6.10E-01 | 2.27E+00 | 2.60 | 7.19E-01 | 1.79E+01 |
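The admissible range quoted in Section 7.10 can be re-derived mechanically from Tables 7.15 and 7.16. The Python sketch below scans a grid of γ values and returns the longest run of consecutive grid points at which both inequalities hold (CVCL < CVCP for GEVL-generated data and CVCL > CVCP for GEVP-generated data). The function name `admissible_range` is hypothetical, and the data are only an excerpt of seven rows copied from the two tables, so "consecutive" here refers to this coarse excerpt grid.

```python
from typing import Optional, Sequence, Tuple

Pair = Tuple[float, float]  # (CVCL, CVCP)


def admissible_range(gammas: Sequence[float], l_data: Sequence[Pair],
                     p_data: Sequence[Pair]) -> Optional[Tuple[float, float]]:
    """Longest run of grid points where the criterion picks the true model."""
    best = None      # (start_index, end_index) of the longest run so far
    start = None     # start index of the current run, if any
    last = len(gammas) - 1
    for i in range(len(gammas)):
        ok = l_data[i][0] < l_data[i][1] and p_data[i][0] > p_data[i][1]
        if ok and start is None:
            start = i
        if start is not None and (not ok or i == last):
            end = i if ok else i - 1
            if best is None or end - start > best[1] - best[0]:
                best = (start, end)
            start = None
    return None if best is None else (gammas[best[0]], gammas[best[1]])


# Excerpt of Tables 7.15 (GEVL data) and 7.16 (GEVP data).
GAMMAS = [0.10, 0.50, 0.90, 0.95, 1.00, 1.50, 1.55]
TABLE_7_15 = [(0.522, 0.555), (0.471, 0.521), (0.370, 0.470), (0.354, 0.467),
              (0.348, 0.466), (0.458, 0.461), (0.487, 0.467)]
TABLE_7_16 = [(0.534, 0.524), (0.511, 0.474), (0.537, 0.488), (0.548, 0.486),
              (0.487, 0.562), (0.559, 1.01), (0.571, 1.18)]

RESULT = admissible_range(GAMMAS, TABLE_7_15, TABLE_7_16)
```

On this excerpt the run ends at γ = 0.95, because the GEVP inequality first fails at γ = 1.00, while the GEVL inequality alone would hold up to γ = 1.50.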


Table 7.17 Simulation output for assessing the criterion CVC for $\gamma^{++}$. Samples generated from GEVL (0.10 ≤ γ ≤ 2.60, μ = 7, σ = 1): n = 100, replicates = 5000

| Shape | CVCL | CVCP | Shape | CVCL | CVCP | Shape | CVCL | CVCP |
|---|---|---|---|---|---|---|---|---|
| 0.10 | 5.51E-01 | 5.71E-01 | 0.95 | 3.02E-01 | 4.11E-01 | 1.80 | 7.69E-01 | 3.83E-01 |
| 0.15 | 5.60E-01 | 5.47E-01 | 1.00 | 3.04E-01 | 4.04E-01 | 1.85 | 8.40E-01 | 3.81E-01 |
| 0.20 | 5.54E-01 | 5.39E-01 | 1.05 | 2.92E-01 | 4.01E-01 | 1.90 | 9.28E-01 | 3.77E-01 |
| 0.25 | 5.34E-01 | 5.30E-01 | 1.10 | 2.92E-01 | 4.03E-01 | 1.95 | 1.03E+00 | 3.76E-01 |
| 0.30 | 5.06E-01 | 5.22E-01 | 1.15 | 3.14E-01 | 3.97E-01 | 2.00 | 1.09E+00 | 3.76E-01 |
| 0.35 | 4.82E-01 | 5.09E-01 | 1.20 | 3.13E-01 | 3.94E-01 | 2.05 | 1.19E+00 | 3.74E-01 |
| 0.40 | 4.58E-01 | 4.99E-01 | 1.25 | 3.14E-01 | 3.95E-01 | 2.10 | 1.25E+00 | 3.73E-01 |
| 0.45 | 4.38E-01 | 4.88E-01 | 1.30 | 3.43E-01 | 3.89E-01 | 2.15 | 1.32E+00 | 3.75E-01 |
| 0.50 | 4.17E-01 | 4.75E-01 | 1.35 | 3.69E-01 | 3.88E-01 | 2.20 | 1.45E+00 | 3.71E-01 |
| 0.55 | 3.99E-01 | 4.67E-01 | 1.40 | 4.02E-01 | 3.90E-01 | 2.25 | 1.59E+00 | 3.71E-01 |
| 0.60 | 3.83E-01 | 4.57E-01 | 1.45 | 4.16E-01 | 3.89E-01 | 2.30 | 1.61E+00 | 3.69E-01 |
| 0.65 | 3.65E-01 | 4.45E-01 | 1.50 | 4.85E-01 | 3.84E-01 | 2.35 | 1.76E+00 | 3.68E-01 |
| 0.70 | 3.53E-01 | 4.37E-01 | 1.55 | 4.92E-01 | 3.84E-01 | 2.40 | 1.77E+00 | 3.67E-01 |
| 0.75 | 3.42E-01 | 4.30E-01 | 1.60 | 5.56E-01 | 3.86E-01 | 2.45 | 1.89E+00 | 3.68E-01 |
| 0.80 | 3.28E-01 | 4.24E-01 | 1.65 | 6.00E-01 | 3.85E-01 | 2.50 | 2.06E+00 | 3.64E-01 |
| 0.85 | 3.20E-01 | 4.20E-01 | 1.70 | 6.63E-01 | 3.83E-01 | 2.55 | 2.06E+00 | 3.66E-01 |
| 0.90 | 3.14E-01 | 4.17E-01 | 1.75 | 7.42E-01 | 3.80E-01 | 2.60 | 2.23E+00 | 3.61E-01 |


Table 7.18 Simulation output for assessing the criterion CVC for $\gamma_{P}^{++}$. Samples generated from GEVP (0.10 ≤ γ ≤ 2.60): n = 100, replicates = 5000

| Shape | CVCL | CVCP | Shape | CVCL | CVCP | Shape | CVCL | CVCP |
|---|---|---|---|---|---|---|---|---|
| 0.10 | 5.19E-01 | 5.35E-01 | 0.95 | 3.06E-01 | 2.77E-01 | 1.80 | 2.21E+00 | 3.40E-01 |
| 0.15 | 5.09E-01 | 5.21E-01 | 1.00 | 3.05E-01 | 2.85E-01 | 1.85 | 2.76E+00 | 3.36E-01 |
| 0.20 | 4.92E-01 | 4.99E-01 | 1.05 | 2.98E-01 | 2.89E-01 | 1.90 | 3.21E+00 | 3.32E-01 |
| 0.25 | 4.74E-01 | 4.75E-01 | 1.10 | 2.96E-01 | 2.96E-01 | 1.95 | 3.94E+00 | 3.30E-01 |
| 0.30 | 4.59E-01 | 4.47E-01 | 1.15 | 2.95E-01 | 2.99E-01 | 2.00 | 4.50E+00 | 3.29E-01 |
| 0.35 | 4.44E-01 | 4.20E-01 | 1.20 | 2.93E-01 | 3.09E-01 | 2.05 | 5.34E+00 | 3.23E-01 |
| 0.40 | 4.28E-01 | 3.93E-01 | 1.25 | 2.93E-01 | 3.14E-01 | 2.10 | 5.98E+00 | 3.23E-01 |
| 0.45 | 4.15E-01 | 3.69E-01 | 1.30 | 3.05E-01 | 3.18E-01 | 2.15 | 6.79E+00 | 3.22E-01 |
| 0.50 | 4.01E-01 | 3.47E-01 | 1.35 | 3.21E-01 | 3.27E-01 | 2.20 | 7.89E+00 | 3.17E-01 |
| 0.55 | 3.88E-01 | 3.27E-01 | 1.40 | 3.65E-01 | 3.29E-01 | 2.25 | 8.83E+00 | 3.16E-01 |
| 0.60 | 3.75E-01 | 3.12E-01 | 1.45 | 4.16E-01 | 3.89E-01 | 2.30 | 9.99E+00 | 3.14E-01 |
| 0.65 | 3.66E-01 | 2.97E-01 | 1.50 | 5.37E-01 | 3.37E-01 | 2.35 | 1.12E+01 | 3.13E-01 |
| 0.70 | 3.52E-01 | 2.86E-01 | 1.55 | 6.75E-01 | 3.38E-01 | 2.40 | 1.24E+01 | 3.13E-01 |
| 0.75 | 3.42E-01 | 2.81E-01 | 1.60 | 8.77E-01 | 3.40E-01 | 2.45 | 1.36E+01 | 3.11E-01 |
| 0.80 | 3.35E-01 | 2.76E-01 | 1.65 | 1.11E+00 | 3.42E-01 | 2.50 | 1.48E+01 | 3.11E-01 |
| 0.85 | 3.24E-01 | 2.74E-01 | 1.70 | 6.63E-01 | 3.83E-01 | 2.55 | 1.67E+01 | 3.10E-01 |
| 0.90 | 3.17E-01 | 2.77E-01 | 1.75 | 1.83E+00 | 3.40E-01 | 2.60 | 1.73E+01 | 3.12E-01 |

Figure 7.7 Samples generated from GEVL (0.10 ≤ γ ≤ 2.60, μ = 7, σ = 1): n=100, replicates=1000


Figure 7.8 Samples generated from GEVP(0.10 ≤ γ ≤ 2.60): n=100, replicated=1000

7.11 Summary and conclusion

In extreme value analysis, the main problem is the estimation of the tail index or the extreme value index (EVI), which is the primary parameter of extreme events. One of the foremost aims of this book is to focus on the estimators of the tail index under power normalization. We presented a simulation study, which shows that the counterparts of the Hill estimators under power normalization provide an efficient technique for estimating the tail index under power normalization. The results show that all the Hill estimators have good performance, except γp+−≺ and γp−−≺. Moreover, four more compact and adaptive Hill estimators under the power model were suggested and derived based on the GPDP. A comprehensive simulation study using the R-package shows that all the HEPs work well. In the cases of both positive and negative data, the results are good. In general, the range of upper observations for γ > 0 is 0.04 ≤ γ ≤ 0.28, and for γ < 0 it is 0.18 ≤ γ ≤ 0.39. We also made a comparison between these compact and adaptive Hill estimators under power normalization and the eight counterparts of the Hill estimators under power normalization to provide an adaptive choice of k, which is based on the MSE criterion. Summarizing the results of the study, when γ > 0 we found that the great majority of the selected thresholds are such that the ratio of upper (or lower) observations that exceed (or fall below) the best threshold value and are used in the computation of the HEPs, i.e., $k/n$, is around 25%. Moreover, the results show that all the HEPs have a very good performance and almost the same accuracy. On the other hand, during the preparation of the results when γ < 0 we observed that many of the selected thresholds are remarkably large; some of them reached a value such that the corresponding ratio $k/n$ is about 45%, although all HEPs in this case have a good performance, except γp+−≺ and γp−−≺.

For assessing the performance of the estimators under power normalization, namely the harmonic t-Hill estimators and the moment and moment-ratio estimators, and for comparing them with each other and with other Hill estimators, we considered different values of the EVI in the simulation study to determine the best estimator. In general, all the estimators are very good and almost all of them have the same accuracy, except $\gamma_{PMR}^{++\prec}$. The results of this study showed that the performance of $\gamma_{PM}^{+\prec}$ is the best (especially for the larger sample size, n = 300) in the range γ > 0.5, while in the range γ 0.7 when $\gamma_{PM}^{+\prec}$.

We adapted some graphical methods to be used in the P-model to select the threshold before fitting the GPDP. Furthermore, we also extended the graphical method of the mean excess plot to the P-model (MEPP) to allow threshold choice. It is easy to observe that the estimation of the EVI depends profoundly on the number k of upper order observations (or lower order observations) used in this estimation. No hard and fast rule exists for choosing the optimal value of the threshold. Indeed, the threshold choice involves a balance between bias and variance. The threshold must be sufficiently high to ensure that the GPDP approximation is reliable, thus reducing the bias. However, a high threshold implies a reduction of the sample size, which in turn leads to an increase in the variance of the parameter estimates. Theoretically, k should satisfy the condition $k = o(n) \to \infty$ as $n \to \infty$.

When the extreme value models under linear normalization fail to fit the data, it is natural to switch to testing the extreme value models under power normalization, but when the two models both fit the data, the real challenge is weighting between them. This problem is tackled by suggesting the coefficient variation criterion for the weighting between the linear and power models. A simulation study was conducted to confirm the validity of this criterion and to detect the admissible range of γ in which the suggested criterion works well. Future work will have to focus on further study of this criterion and on the issue of choosing the optimal value of the threshold. In Appendix A, which is mostly quoted from Alaswed (2015), several estimators of the EVI in the L-model and P-model are summarized in Tables A.1–A.5.

8 Some applications to real data examples

In this chapter, we analyze two sets of real data, to which we apply the contemporary methods that were developed in the preceding chapters. The first set is related to air pollution, where we reconsider, from a new perspective, the same data that we analyzed in Chapter 4, now treated with the new techniques given in the preceding chapters. The second set of real data pertains to insurance. In this chapter, we apply two approaches. The first approach is the BM technique for the GEVL model, where we use the MLE for the parameter estimation of the GEVL. Moreover, we use the likelihood ratio test to detect which type of GEVL is appropriate for our data. The second approach is the POT technique for the GPDL and GPDP models, where we use some graphical methods to select suitable threshold values. For testing the goodness of fit of the GPDL and GPDP models, we use the Gomes and van Monfort test (cf. Gomes and van Monfort, 1986). Furthermore, we compare the performance of different methods of threshold selection, which depend on the MSE, to select the best among them. Finally, for weighting between the linear and power models, we apply the suggested coefficient variation criterion that was presented in Section 7.10.

8.1 The first application to real data-related air pollution

In this section, we reconsider two real data sets of two air pollutants, Sulphur Dioxide (SO2) and Particulate Matter (PM10). These data sets were analyzed in Chapter 4 by using only the conventional L-model. We treat these data sets again by applying the L-model as well as the P-model. Initially, it is useful to have an overview of the statistical behaviour of the given data. In particular, before applying Gumbel's approach (the BM technique), it is useful to have an idea about the tail index of the underlying DF F. Table 8.1 provides a summary of some statistical properties of each of the pollutants PM10 and SO2 in the two cities, 10th of Ramadan and Zagazig. Note that in Table 8.1 the number of blocks is sometimes less than 365 (accordingly, the size of the generated samples is less than 365 × 24 = 8760); this is due to the inactivation and maintenance of the monitoring devices in some hours on some days. Table 8.1 shows that the daily pollution data have positive skewness and kurtosis higher than 3, which suggests heavy tails. Therefore, there is strong reason to think that the DFs of these pollutants belong to the max-domain of attraction of the Fréchet distribution (heavy tail, γ > 0).

Table 8.1 Statistics summary of the two pollutants in the two cities

| Pollutant | City | Number of maxima | Min | Max | Median | Mean | STD | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|---|
| PM10 | 10th of Ramadan | 364 | 143 | 1219 | 275.5 | 306.5 | 140.88 | 3.33 | 14.88 |
| PM10 | Zagazig | 361 | 99.57 | 1151 | 222 | 242.4 | 108.71 | 3.22 | 19.44 |
| SO2 | 10th of Ramadan | 356 | 18.89 | 422.9 | 95.49 | 108.6 | 60.22 | 1.97 | 5.93 |
| SO2 | Zagazig | 360 | 4.256 | 218.1 | 28.06 | 30.41 | 20.73 | 3.65 | 23.68 |

A preliminary graphical analysis may help us to guess which type of right tail is probably suitable for our data, or, in other words, which one of the three types of extreme value distributions is more suitable to describe the data. The return level plot (RLP) is used as a diagnostic tool (see Coles, 2001). In all cases, the RLP in Figure 8.1 shows that all the DFs belong to the Fréchet max-domain of attraction, i.e., they have heavy tails. Hence, it may be reasonable to fit a GEVD with Fréchet distribution (γ > 0) for the BM of the two pollutants in the two cities. Moreover, for this proposed probability distribution we apply the MLE to estimate its shape parameter (i.e., to get the point estimates). We summarize these results in Table 8.2. The point estimates of the shape parameter for all pollutants are positive and indicate that in each case, the distribution of the BM follows the Fréchet distribution (right tail).

Figure 8.1 Return level plot of two pollutants in the two cities

Clearly, it is desirable now to use some diagnostic procedures to decide how well the models fit the data, and we consider some of these procedures here. The diagnostic plots for fitting the BM data of PM10 are shown in Figures 8.2 and 8.3, while those for SO2 are shown in Figures 8.4 and 8.5, for the two cities, respectively. We present four different diagnostic plots, the P-P plot, Q-Q plot, RLP, and density plot, for the PM10 and SO2 pollutants in the two cities. From these figures, it is easy to note that the P-P plot indicates that the proposed model fits our data well. Also, the Q-Q plot is roughly a straight line, so the model can be considered a reasonably good fit to the given data. In addition, in all cases, the RLP shows that the Fréchet distribution (i.e., γ > 0) fits the BM data of the two pollutants in the two cities well. Finally, Table 8.2 (the 3rd column) shows that the estimates of the shape parameter are positive, which suggests that the data sets have Fréchet distributions.

In order to check whether the Fréchet DF is a suitable model for our data, we compare the Gumbel and Fréchet families by using the likelihood ratio (LR) statistic (see Section 1.2 and equation (1.3)) to test which of the two distributions is the better suited model for the data. The null hypothesis is “The distribution is Gumbel (i.e., γ = 0),” while the alternative is “The distribution is Fréchet (i.e., γ > 0).” The p-value of the LR statistic, in Table 8.3, is less than 1%, so in all cases we reject the null hypothesis and conclude that the Fréchet distribution is reasonable for these data sets.
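The LR comparison of the Gumbel and Fréchet families can be reproduced from the tabulated quantities L0 and L1 with the Python standard library alone. The sketch below rests on two assumptions that are not stated explicitly in the text: that L0 and L1 in Table 8.3 are negative log-likelihoods of the Gumbel and GEV fits (which is consistent with LRT = 2(L0 − L1) in every column), and that the statistic is referred to a chi-square distribution with one degree of freedom, whose upper tail equals erfc(sqrt(x/2)). The function name is hypothetical.

```python
import math
from typing import Tuple


def lr_test_gumbel_vs_gev(nll_gumbel: float, nll_gev: float) -> Tuple[float, float]:
    """Likelihood ratio statistic and chi-square(1) p-value.

    Both arguments are assumed to be negative log-likelihoods.
    """
    statistic = 2.0 * (nll_gumbel - nll_gev)
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(statistic, 0.0) / 2.0))
    return statistic, p_value


# PM10 in Zagazig, values of L0 and L1 taken from Table 8.3.
STAT, P_VALUE = lr_test_gumbel_vs_gev(2111.801, 2104.995)
```

For this column the sketch gives a statistic of about 13.61 and a p-value of about 2.2E-04, in line with the corresponding entries of Table 8.3, so the Gumbel hypothesis is rejected at the 1% level.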

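Table 8.2 ML estimates for the shape parameter of the two pollutants in the two cities

| City | Pollutant | Parameter | Estimate | SE | The type of GEVL |
|---|---|---|---|---|---|
| 10th of Ramadan | PM10 | μ | 248.21 | 3.89 | Fréchet (γ > 0) |
| 10th of Ramadan | PM10 | σ | 66.91 | 3.08 | |
| 10th of Ramadan | PM10 | γ | 0.2152 | 0.037 | |
| Zagazig | PM10 | μ | 195.73 | 3.90 | Fréchet (γ > 0) |
| Zagazig | PM10 | σ | 66.27 | 2.94 | |
| Zagazig | PM10 | γ | 0.1121 | 0.036 | |
| 10th of Ramadan | SO2 | μ | 81.26 | 2.28 | Fréchet (γ > 0) |
| 10th of Ramadan | SO2 | σ | 38.51 | 1.72 | |
| 10th of Ramadan | SO2 | γ | 0.1187 | 0.038 | |
| Zagazig | SO2 | μ | 21.60 | 0.69 | Fréchet (γ > 0) |
| Zagazig | SO2 | σ | 11.63 | 0.53 | |
| Zagazig | SO2 | γ | 0.1520 | 0.041 | |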
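Given the ML estimates in Table 8.2, the return levels that underlie a return level plot can be computed directly. The Python sketch below assumes that the GEVL uses the standard parameterization $G(z) = \exp\{-[1 + \gamma (z - \mu)/\sigma]^{-1/\gamma}\}$, for which the T-block return level is $z_T = \mu - \frac{\sigma}{\gamma}\left[1 - y^{-\gamma}\right]$ with $y = -\log(1 - 1/T)$ (see Coles, 2001); the function name is hypothetical, and a block here is one day of observations.

```python
import math


def gev_return_level(mu: float, sigma: float, gamma: float, period: float) -> float:
    """Return level exceeded on average once every `period` blocks.

    Assumes the standard GEV parameterization (location mu, scale sigma,
    shape gamma); gamma = 0 is treated as the Gumbel limit.
    """
    if period <= 1:
        raise ValueError("period must be greater than 1")
    y = -math.log(1.0 - 1.0 / period)
    if abs(gamma) < 1e-12:
        return mu - sigma * math.log(y)
    return mu - (sigma / gamma) * (1.0 - y ** (-gamma))


# PM10 in 10th of Ramadan, estimates taken from Table 8.2.
Z_100 = gev_return_level(248.21, 66.91, 0.2152, 100)
```

Under these assumptions the 100-day return level for PM10 in 10th of Ramadan is roughly 774; because the estimated γ is positive, the return level grows without bound as the period increases, which is the heavy-tail behaviour that the return level plots are used to diagnose.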

Table 8.3 The LRT of the two pollutants, p-value, and decision

| | PM10, 10th of Ramadan | PM10, Zagazig | SO2, 10th of Ramadan | SO2, Zagazig |
|---|---|---|---|---|
| L0 | 2176.898 | 2111.801 | 1891.251 | 1492.802 |
| L1 | 2148.688 | 2104.995 | 1885.189 | 1481.304 |
| LRT | 56.420 | 13.611 | 12.124 | 22.996 |
| Hypothesis | H0: γ = 0, H1: γ > 0 | H0: γ = 0, H1: γ > 0 | H0: γ = 0, H1: γ > 0 | H0: γ = 0, H1: γ > 0 |
| p-value | 5.85E-05 | 2.24E-04 | 4.97E-04 | 1.62E-06 |
| Decision | Accept H1 | Accept H1 | Accept H1 | Accept H1 |


Figure 8.2 Four different plots: P-P plot, Q-Q plot, return level plot, and density plot of a daily period (24 hours) of PM10 in 10th of Ramadan

[Figure 8.3 graphic: four diagnostic panels. Probability Plot (axes: Empirical, Model); Quantile Plot (axes: Model, Empirical); Return Level Plot (axes: Return Period, Return Level); Density Plot (axes: z, f(z)).]

Figure 8.3 Four different plots: P-P plot, Q-Q plot, return level plot, and density plot of a daily period (24 hours) PM10 in Zagazig

[Figure 8.4 graphic: four diagnostic panels. Probability Plot (axes: Empirical, Model); Quantile Plot (axes: Model, Empirical); Return Level Plot (axes: Return Period, Return Level); Density Plot (axes: z, f(z)).]

Figure 8.4 Four different plots: P-P plot, Q-Q plot, return level plot, and density plot of a daily period (24 hours) SO2 in 10th of Ramadan

[Figure 8.5 graphic: four diagnostic panels. Probability Plot (axes: Empirical, Model); Quantile Plot (axes: Model, Empirical); Return Level Plot (axes: Return Period, Return Level); Density Plot (axes: z, f(z)).]

Figure 8.5 Four different plots: P-P plot, Q-Q plot, return level plot, and density plot of a daily period (24 hours) SO2 in Zagazig

8.2 Graphical methods to select the threshold of the given application

In this section, we apply two graphical methods to choose an appropriate threshold for the linear and power models. The first method is the Hill plot (HP) and the second is the mean excess plot.

8.2.1 HPL and HPP to select a suitable threshold

Figures 8.6–8.9 show the scatter plots of the HEL γ++ and the HEPs γp++ and γp++ for the two pollutants PM10 and SO2 in the two cities 10th of Ramadan and Zagazig. In all these figures, the red horizontal solid line, which traverses the figure, represents the true value of the shape parameter γ, and the blue horizontal dashed lines, which meet the graph, indicate the values of the Hill estimators of γ. Moreover, the vertical blue dashed line represents the value of k, which determines the upper observations that are used in the calculation of the estimates. Table 8.4 summarizes the results of these estimators. Clearly, in all the cases, the results suggest a heavy-tailed behaviour of the GPDL and GPDP.

Table 8.4 Threshold selection by using HPL and HPP

Pollutant  City             Shape of BM  Type of HP    K     k/n    Threshold  Estimated value
PM10       10th of Ramadan  0.2152       HPL of γ++    200   0.023  412        0.3095
                                         HPP of γp++   8500  0.974  75         0.1749
                                         HPP of γp++   248   0.028  369        0.2803
PM10       Zagazig          0.1121       HPL of γ++    120   0.014  336        0.1776
                                         HPP of γp++   6981  0.807  76         0.1350
                                         HPP of γp++   50    0.006  378        0.1956
SO2        10th of Ramadan  0.1187       HPL of γ++    35    0.004  225.834    0.1881
                                         HPP of γp++   5986  0.701  29.792     0.1693
                                         HPP of γp++   79    0.009  179.550    0.1973
SO2        Zagazig          0.1520       HPL of γ++    7     0.001  121.562    0.1828
                                         HPP of γp++   5383  0.624  12.256     0.1516
                                         HPP of γp++   11    0.001  105.868    0.1954


Figure 8.6 Selection of the threshold of PM10 in 10th of Ramadan. The left, middle, and right panels indicate respectively γ++ , γP++ , and γP++

Figure 8.7 Selection of the threshold of PM10 in Zagazig. The left, middle, and right panels indicate respectively γ++ , γP++ , and γP++


Figure 8.8 Selection of the threshold of SO2 in 10th of Ramadan. The left, middle, and right panels indicate respectively γ++ , γP++ , and γP++

Figure 8.9 Selection of the threshold of SO2 in Zagazig. The left, middle, and right panels indicate respectively γ++ , γP++ , and γP++

8.2.2 MEPL and MEPP to select a suitable threshold

The suitable threshold can be selected by using the MEPL and MEPP, which are plotted by using the mean excess function of the GPDL and GPDP, respectively. Table 8.5 provides the threshold selection by using MEPL and MEPP. Moreover, Figures 8.10–8.13 suggest a stability region to select a suitable threshold for the two pollutants in the two cities by using MEPL and MEPP.
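As a rough illustration of how the points of a mean excess plot are obtained, the following sketch computes the empirical mean excess function e(u), i.e., the average of x − u over the observations x exceeding u. It covers only the linear-model version (MEPL); the power-model counterpart (MEPP) relies on the GPDP mean excess function defined earlier in the book and is not reproduced here. The function names are hypothetical.

```python
def mean_excess(data, u):
    """Empirical mean excess e(u): average of (x - u) over the observations x > u."""
    excesses = [x - u for x in data if x > u]
    if not excesses:
        raise ValueError("no observation exceeds the threshold u")
    return sum(excesses) / len(excesses)


def mean_excess_plot_points(data):
    """Points (u, e(u)), with u running over the distinct sample values except the largest."""
    thresholds = sorted(set(data))[:-1]
    return [(u, mean_excess(data, u)) for u in thresholds]
```

For a generalized Pareto tail with shape below 1, the mean excess function is linear in u, which is why an approximately linear (stable) region of the plot is taken as a guide for the threshold.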

Figure 8.10 Mean excess plot for selecting the threshold of PM10 in 10th of Ramadan. The left and the right panels indicate respectively the MEPL and the MEPP

Figure 8.11 Mean excess plot for selecting the threshold of PM10 in Zagazig. The left and the right panels indicate respectively the MEPL and the MEPP

Table 8.5 MEPL and MEPP for selecting an appropriate threshold

Pollutant  City             Shape of BM  Method  K     k/n    Threshold  Estimated value
PM10       10th of Ramadan  0.2152       MEPL    1497  0.172  297        0.2003
                                         MEPP    400   0.046  342        0.2197
PM10       Zagazig          0.1121       MEPL    250   0.029  295        0.1800
                                         MEPP    297   0.034  284        0.1598
SO2        10th of Ramadan  0.1187       MEPL    400   0.047  120.736    0.2384
                                         MEPP    200   0.023  140.448    0.2506
SO2        Zagazig          0.1520       MEPL    529   0.061  31.92      0.2536
                                         MEPP    100   0.012  47.88      0.2659


Figure 8.12 Mean excess plot for selecting the threshold of SO2 in 10th of Ramadan. The left and the right panels indicate respectively the MEPL and the MEPP

Figure 8.13 Mean excess plot for selecting the threshold of SO2 in Zagazig. The left and the right panels indicate respectively the MEPL and the MEPP


8.3 Test for the choice of EVI in the GPDL and GPDP models

All the results of the numerical and graphical methods show that the underlying DF F belongs to the max-domain of attraction of a heavy-tailed law (γ > 0). We now carry out the following test:

H0: γ = 0 (i.e., Exponential DF) vs H1: γ > 0 (i.e., Pareto DF).

Since we are in a POT context, the statistical test is performed on the exceedances, which are obtained from the available data. This test was proposed by Gomes and van Monfort (1986) by using the test statistic

$$\eta_m = \frac{W_{m:m}}{W_{[m/2]+1:m}},$$

where $W_{1:m}, \ldots, W_{m:m}$ denote the m ordered exceedances over a suitable threshold (note that, for the positive data case, $W_{m:m} = X_{n:n}$). Under the validity of H0, we have

$$\eta_m^{*} = \log 2 \cdot \eta_m - \log m \xrightarrow{d} Z \approx \Lambda,$$

where Z is an RV whose DF is the standard Gumbel DF Λ, and "$\xrightarrow{d}$" means convergence in distribution, i.e., weak convergence. Moreover, H0 is rejected if $\eta_m^{*} \le g_\alpha$, where $g_\alpha$ represents the standard Gumbel α-quantile and α is the chosen significance level. Tables 8.6–8.9 show the results of the statistical test of the two pollutants in the two cities. We note that the distribution suggested by this test for our data is the Pareto DF, or in other words, γ > 0 (i.e., the null hypothesis is rejected).
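The test statistic above is simple to evaluate once the exceedances are available. The sketch below is a minimal illustration, not the authors' implementation: it computes η_m, the normalized statistic η*_m = log 2 · η_m − log m, and the standard Gumbel α-quantile g_α = −log(−log α), and applies the rejection rule as printed in the text. The function name and the default α = 0.05 are assumptions.

```python
import math


def gomes_van_monfort(exceedances, alpha=0.05):
    """Return (eta, eta_star, g_alpha, reject_H0) for positive exceedances."""
    w = sorted(exceedances)
    m = len(w)
    if m < 2 or w[0] <= 0:
        raise ValueError("need at least two positive exceedances")
    # W_{[m/2]+1:m} in 1-based notation is w[m // 2] with 0-based indexing.
    eta = w[-1] / w[m // 2]
    eta_star = math.log(2.0) * eta - math.log(m)
    # alpha-quantile of the standard Gumbel DF exp(-exp(-x)).
    g_alpha = -math.log(-math.log(alpha))
    return eta, eta_star, g_alpha, eta_star <= g_alpha
```

For instance, `gomes_van_monfort([4, 1, 3, 2])` uses W_{4:4} = 4 and W_{3:4} = 3, so that η_m = 4/3.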

Table 8.6 Gomes and van Monfort test for GPDL and GPDP of PM10 in 10th of Ramadan

Pollutant and City: PM10 in 10th of Ramadan

Methods       γ       Hypothesis               ηm     η*m    p-value   Decision
HPL of γ++    0.3095  H0: γ = 0 vs H1: γ > 0   0.701  -4.81  3.73E-05  Accept H1
HPP of γP++   0.1749  H0: γ = 0 vs H1: γ > 0   2.079  -7.60  0.000     Accept H1
HPP of γP++   0.2803  H0: γ = 0 vs H1: γ > 0   1.029  -4.80  1.67E-05  Accept H1
MEPL          0.2003  H0: γ = 0 vs H1: γ > 0   1.035  -6.59  6.43E-03  Accept H1
MEPP          0.2197  H0: γ = 0 vs H1: γ > 0   1.087  -5.23  2.36E-08  Accept H1


Table 8.7 Gomes and van Monfort test for GPDL and GPDP of PM10 in Zagazig

Pollutant and City: PM10 in Zagazig

Methods       γ       Hypothesis               ηm     η*m    p-value   Decision
HPL of γ++    0.1776  H0: γ = 0 vs H1: γ > 0   0.542  -4.41  1.62E-06  Accept H1
HPP of γP++   0.1350  H0: γ = 0 vs H1: γ > 0   1.287  -7.95  0.000     Accept H1
HPP of γP++   0.1956  H0: γ = 0 vs H1: γ > 0   0.822  -3.34  5.23E-03  Accept H1
MEPL          0.1800  H0: γ = 0 vs H1: γ > 0   0.913  -4.88  2.30E-05  Accept H1
MEPP          0.1598  H0: γ = 0 vs H1: γ > 0   1.017  -4.98  1.93E-06  Accept H1

Table 8.8 Gomes and van Monfort test for GPDL and GPDP of SO2 in 10th of Ramadan

Pollutant and City: SO2 in 10th of Ramadan

Methods       γ       Hypothesis               ηm     η*m    p-value   Decision
HPL of γ++    0.1881  H0: γ = 0 vs H1: γ > 0   0.567  -3.16  5.54E-05  Accept H1
HPP of γP++   0.1693  H0: γ = 0 vs H1: γ > 0   0.605  -8.27  0.000     Accept H1
HPP of γP++   0.1973  H0: γ = 0 vs H1: γ > 0   0.849  -3.78  9.02E-02  Accept H1
MEPL          0.2384  H0: γ = 0 vs H1: γ > 0   0.906  -5.21  4.65E-04  Accept H1
MEPP          0.2506  H0: γ = 0 vs H1: γ > 0   1.065  -4.55  3.13E-03  Accept H1


Table 8.9 Gomes and van Monfort test for GPDL and GPDP of SO2 in Zagazig

Pollutant and City: SO2 in Zagazig

Methods       γ       Hypothesis               ηm     η*m     p-value   Decision
HPL of γ++    0.1828  H0: γ = 0 vs H1: γ > 0   1.118  -1.17   3.97E-02  Accept H1
HPP of γP++   0.1886  H0: γ = 0 vs H1: γ > 0   0.610  -8.167  0.000     Accept H1
HPP of γP++   0.1954  H0: γ = 0 vs H1: γ > 0   0.881  -1.786  2.55E-03  Accept H1
MEPL          0.2536  H0: γ = 0 vs H1: γ > 0   1.157  -5.469  9.44E-04  Accept H1
MEPP          0.2659  H0: γ = 0 vs H1: γ > 0   0.801  -4.049  1.21E-05  Accept H1

8.4 Comparison between graphical methods of threshold selection

In this section, we summarize and compare the results of all applied graphical methods under linear and power models to determine an appropriate threshold. We summarize these results in Tables 8.10 and 8.11, for PM10 and SO2, respectively, in the two cities.

Table 8.10 Graphical methods of threshold selection of PM10

Pollutant: PM10

              10th of Ramadan                       Zagazig
Methods       K     Threshold  Shape   MSE          K     Threshold  Shape   MSE
HPL of γ++    200   412        0.3095  3.23E-01     120   336        0.1776  3.26E-01
HPP of γP++   8500  75         0.1749  2.95E-01     6981  76         0.1350  2.76E-01
HPP of γP++   248   369        0.2803  2.98E-01     50    378        0.1956  2.57E-01
MEPL          1497  297        0.2003  3.27E-01     250   295        0.1800  3.19E-01
MEPP          400   392        0.2197  2.56E-01     297   284        0.1598  2.35E-01


Table 8.11 Graphical methods of threshold selection of SO2

Pollutant: SO2

              10th of Ramadan                       Zagazig
Methods       K     Threshold  Shape   MSE          K     Threshold  Shape   MSE
HPL of γ++    35    225.834    0.1881  3.19E-01     7     121.562    0.1828  3.09E-01
HPP of γP++   5986  29.792     0.1693  2.25E-01     5383  12.256     0.1516  2.31E-01
HPP of γP++   79    179.550    0.1973  2.37E-01     11    105.868    0.1954  2.17E-01
MEPL          400   120.736    0.2384  2.90E-01     529   31.92      0.2536  2.95E-01
MEPP          200   140.448    0.2506  2.16E-01     100   47.88      0.2659  1.63E-01

8.5 Fitting of the GPDL and GPDP

Traditionally, the threshold is chosen before fitting the GPDL and GPDP. In the previous section, we noted that all the applied methods determine a suitable threshold based on the asymptotic approximation of the GPDL and GPDP. In other words, we select a threshold for which the asymptotic distributions GPDL and GPDP are a good approximation. On the other hand, we accept the alternative hypothesis that our data follow the GPDL (or GPDP) with γ > 0. In this section, the shape and scale parameters of the GPDL and GPDP are estimated by MLE. The results for PM10 are presented in Table 8.12, while the results for SO2 are presented in Table 8.13, under linear and power normalizing constants.

Table 8.12 Parameter estimates for the GP distribution of PM10, under linear and power normalizing constants

Pollution: PM10

              10th of Ramadan               Zagazig
Methods       Threshold  Shape   Scale      Threshold  Shape   Scale
HPL of γ++    412        0.3095  160.24     336        0.1776  41.37
HPP of γP++   75         0.1749  136.47     76         0.1350  83.16
HPP of γP++   369        0.2803  176.58     378        0.1956  76.43
MEPL          297        0.2003  34.47      295        0.1800  48.58
MEPP          392        0.2197  71.83      284        0.1598  51.15


Table 8.13 Parameter estimates for the GP distribution of SO2, under linear and power normalizing constants

Pollution: SO2

              10th of Ramadan               Zagazig
Methods       Threshold  Shape   Scale      Threshold  Shape   Scale
HPL of γ++    225.834    0.1881  32.50      121.562    0.1828  14.75
HPP of γP++   29.792     0.1693  42.75      12.256     0.1886  10.07
HPP of γP++   179.550    0.1973  49.72      105.868    0.1954  24.88
MEPL          120.736    0.2384  29.16      31.92      0.2536  6.64
MEPP          140.448    0.2506  36.66      47.88      0.2659  17.98

8.6 Comparison between some estimators of the EVI

In this section, for real data of the two considered pollutants, we apply the three different HELs concerning the linear model, γ++, γM++, and γH++. On the other hand, we apply the three HEPs concerning the power model, γp++, γp++, and γPH++. Tables 8.14 and 8.15 give the estimated values of the EVI by using these estimators compared to the estimated values of the EVI by using the BM method. For the threshold selection (i.e., the value of k), we apply the Hill plot technique (see, for example, Drees et al., 2000). For example, the Hill plot for γ++ (or γp++) is defined by the set of points {(k, γ++) : 2 < k < n − 2} (or {(k, γp++) : 2 < k < n − 2}), where for every choice of k we obtain an estimate γ++ (or γp++). The threshold u = X_{n−k:n} is selected from this graph for the stable areas of the tail index. Finally, for a quantitative comparison of the different estimators, we use a mean squared-error criterion (MSEL for the linear model and MSEP for the power model), which is inspired by Lei (2008). The performances of the estimators are arranged according to the rank score (i.e., the best estimate receives rank 1).

According to the suggested MSE criterion under linear and power normalizing constants, Tables 8.14 and 8.15 show that, for the linear model, the best HEL is γM++ for the pollutant SO2 in Zagazig city, while the best HEL is γH++ for all other cases. On the other hand, under the power model the best HEP is γp++ for the pollutant SO2 in Zagazig city, while the best HEP is γPH++ for all other cases (the two estimators γp++ and γPH++ have a close performance). Moreover, Tables 8.14 and 8.15 show that the given estimators have a close performance, although the small values of MSEP compared with MSEL may indicate that the HEPs are more stable than the HELs.

Clearly, the problem of weighting between the linear and power models to describe the given data, when the two models fit the given data, is challenging. Some theoretical results concerning this problem were obtained by Barakat et al. (2010). For this purpose, the suggested MSE criterion may not be suitable, since we have two different models; instead, we suggest the coefficient of variation criterion (CVC), which is defined in (7.18) and may be helpful for this purpose.
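The construction of a Hill plot can be sketched as follows. As an assumption for illustration, the classical Hill estimator is used as a stand-in; the specific estimators compared in this section (summarized in Appendix A) are not reproduced here. For each k, the threshold is u = X_{n−k:n} and the estimate is the average of log(X_{n−i+1:n}/u) over the k upper observations; the plot consists of the pairs (k, estimate) for 2 < k < n − 2.

```python
import math


def hill_estimate(data, k):
    """Classical Hill estimate based on the k upper order statistics (positive data)."""
    xs = sorted(data)
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = xs[n - k - 1]  # X_{n-k:n} in 1-based order-statistic notation
    if threshold <= 0:
        raise ValueError("the threshold X_{n-k:n} must be positive")
    return sum(math.log(x / threshold) for x in xs[n - k:]) / k


def hill_plot_points(data):
    """Pairs (k, Hill estimate) for 2 < k < n - 2."""
    n = len(data)
    return [(k, hill_estimate(data, k)) for k in range(3, n - 2)]
```

A threshold is then read off from a range of k over which the plotted estimates are roughly constant.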

Table 8.14 Different estimates of the EVI for 10th of Ramadan under the linear and power models

City: 10th of Ramadan

                              L-Model                           P-Model
Pollutant  BM estimate of γ   HEL    Shape   MSEL      Ranks    HEP     Shape   MSEP      Ranks
PM10       0.22               γ++    0.2215  3.28E-01  3        γp++    0.2201  2.80E-01  1
                              γM++   0.2202  3.07E-01  2        γp++    0.2184  2.98E-01  3
                              γH++   0.2201  3.02E-01  1        γPH++   0.2186  2.97E-01  2
SO2        0.11               γ++    0.1071  3.00E-01  2        γp++    0.1099  2.21E-01  2
                              γM++   0.1142  3.18E-01  3        γp++    0.1112  2.93E-01  3
                              γH++   0.1098  3.00E-01  1        γPH++   0.1101  2.20E-01  1

Table 8.15 Different estimates of the EVI for Zagazig under the linear and power models

City: Zagazig

                              L-Model                           P-Model
Pollutant  BM estimate of γ   HEL    Shape   MSEL      Ranks    HEP     Shape   MSEP      Ranks
PM10       0.099              γ++    0.1652  3.27E-01  3        γp++    0.0980  2.71E-01  2
                              γM++   0.1581  3.81E-01  2        γp++    0.1400  3.16E-01  3
                              γH++   0.0868  2.30E-01  1        γPH++   0.0980  2.70E-01  1
SO2        0.16               γ++    0.1685  2.95E-01  2        γp++    0.1610  1.18E-01  2
                              γM++   0.1540  1.79E-01  1        γp++    0.1577  1.53E-01  3
                              γH++   0.1656  3.00E-01  3        γPH++   0.1601  1.16E-01  1


If we apply this suggested criterion considering only the best estimators under the linear and power models, we get

CVC(γH++) = √0.302/0.996 = 0.552 < 1.556 = √0.280/0.34 = CVC(γp++), for the pollutant PM10 in 10th of Ramadan city, and

CVC(γH++) = √0.300/1 = 0.548 < 3.825 = √0.220/0.123 = CVC(γPH++), for the pollutant SO2 in 10th of Ramadan city.

Moreover,

CVC(γH++) = √0.230/0.998 = 0.481 < 0.764 = √0.270/0.068 = CVC(γPH++), for the pollutant PM10 in Zagazig city, and

CVC(γM++) = √0.300/0.859 = 0.638 < 1.191 = √0.116/0.286 = CVC(γPH++), for the pollutant SO2 in Zagazig city.

This indicates that the linear model is more favourable for this data. We summarize the results of these various estimators under linear and power normalizing constants in Table 8.16.
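The ratios quoted above can be re-evaluated directly. The sketch below does only that: it takes each printed pair (the number under the square root and the divisor) and recomputes the quotient; it does not implement the CVC definition (7.18), which is not reproduced in this section. The names `CASES` and `cvc_ratio` are hypothetical.

```python
import math

# (case, number under the square root, divisor, value printed in the text)
CASES = [
    ("PM10, 10th of Ramadan, L-model", 0.302, 0.996, 0.552),
    ("PM10, 10th of Ramadan, P-model", 0.280, 0.34, 1.556),
    ("SO2, 10th of Ramadan, L-model", 0.300, 1.0, 0.548),
    ("SO2, 10th of Ramadan, P-model", 0.220, 0.123, 3.825),
    ("PM10, Zagazig, L-model", 0.230, 0.998, 0.481),
    ("PM10, Zagazig, P-model", 0.270, 0.068, 0.764),
    ("SO2, Zagazig, L-model", 0.300, 0.859, 0.638),
    ("SO2, Zagazig, P-model", 0.116, 0.286, 1.191),
]


def cvc_ratio(mse, divisor):
    """Quotient sqrt(mse) / divisor, exactly as the ratios are printed in the text."""
    return math.sqrt(mse) / divisor


for case, mse, divisor, printed in CASES:
    print(f"{case:32s} recomputed = {cvc_ratio(mse, divisor):.3f}  printed = {printed:.3f}")
```

Six of the eight printed values are reproduced to three decimals. The two remaining P-model entries, SO2 in 10th of Ramadan and PM10 in Zagazig, are not reproduced from the printed operands; in each of the four comparisons the recomputed L-model value is still the smaller one.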

Table 8.16 Comparison between the linear and power models for pollutants PM10 and SO2

                             Type: L-Model                Type: P-Model                Comparison
City             Pollution   Estimator  Shape   CVCL      Estimator  Shape   CVCP      Best Fit
10th of Ramadan  PM10        γH++       0.2201  0.552     γP++       0.2201  1.556     L-Model
10th of Ramadan  SO2         γH++       0.1098  0.548     γPH++      0.1101  3.825     L-Model
Zagazig          PM10        γH++       0.0868  0.481     γPH++      0.0980  0.764     L-Model
Zagazig          SO2         γM++       0.1540  0.638     γPH++      0.1601  1.191     L-Model


8.7 The second application to real data

In this section, we apply the suggested estimators as well as the suggested criteria to a real data set of Danish fire insurance claims from 1980–1990, which is taken from the R package evir. The data is contained in a numeric vector of 2167 observations. For this data, we apply three different estimators concerning the linear model, γ++, γM++, and γMR++. On the other hand, we apply three estimators concerning the power model, γP++, γPM+, and γPMR++. Table 8.17 gives the estimated values of the EVI by these estimators. For the threshold selection (i.e., the value of k), we apply the HP technique (see, for example, Drees et al., 2000). For example, the HP for γ++ is defined by the set of points {(k, γ++) : 2 < k < n − 2}, where for every choice of k we obtain an estimate γ++. The threshold u = X_{n−k:n} is selected from this graph for the stable areas of the tail index. Finally, for a quantitative comparison of the different estimators, we use the MSEL for the linear model and the MSEP for the power model. The performances of the estimators are arranged according to the rank score (i.e., the best estimate receives rank 1); see Table 8.17. Table 8.17 shows that under the linear model the best estimator is γM++, while under the power model the best estimator is γP++. For weighting between the linear and power models, we apply the CVC criterion considering only the best estimators under the linear and power models; see Table 8.18. Table 8.18 shows that the value of the CVCL is smaller than the CVCP by a very small amount, which indicates that both models can describe this data. Nevertheless, we can consider the linear model to be more favourable for this data.

Table 8.17 Estimates of the EVI of Danish fire insurance claims under linear and power normalizing constants

Type      Estimator  Estimated value of γ  MSE (L or P)  Ranks
L-Model   γ++        0.6182                1.14E-01      3
          γM++       0.6181                9.33E-02      1
          γMR++      0.6179                9.86E-02      2
P-Model   γP++       0.6164                9.11E-02      1
          γPM+       0.6209                3.21E-01      2
          γPMR++     0.6737                3.61E-01      3

Table 8.18 Comparison between the linear and power models

L-Model                               P-Model
Estimator  Estimated γ  CVCL          Estimator  Estimated γ  CVCP
γM++       0.6181       4.07E-01      γP++       0.6164       4.13E-01

9 Miscellaneous results

The present chapter is devoted to two major topics. The first topic is suggesting and studying a new model, the linear-power model, to overcome some drawbacks of the power model. This suggested model merges the linear and power models into a more capable model for treating the different extreme value data sets. The second topic is considering several generalizations to achieve an improved fit of the GEVL (GEVP) for the block maxima data sets and the GPDL (GPDP) for the peak over threshold data sets. The first topic will be tackled in the first two sections, while the last two sections are devoted to tackling the second topic.

9.1 Extreme value theory under linear-power normalization

As we have seen through the different chapters of this book, the max-stable laws under power normalization attract more distributions than under linear normalization. This fact practically means that the classical linear model may fail to fit the given extreme data, while the power model succeeds in doing that. Nevertheless, the linear model is more flexible than the power model because under the linear model we can manoeuvre around the zero. Moreover, practically, the mathematical modelling of extremes under power normalization fails unless all the observed maximum values have the same sign. In order to overcome these drawbacks of the power model, Barakat et al. (2017b) suggested a new model, the linear-power model. Moreover, Barakat et al. (2017b) studied the statistical inference about the upper tail of a DF by using the suggested model. The linear-power model (denoted by lp−model) is defined via the monotone transformation

$$G_{n,c}(x) = a_n |x - c|^{b_n} S(x - c), \quad a_n, b_n > 0,$$

where c is a parameter (which is assumed to be unknown and independent of n). Clearly, when c = 0, we get the power normalization $G_{n,0}$. Moreover, $G_{n,c}(x) = G_{n,0}(x - c)$ and

$$G_{n,c}^{-1}(x) = \left|\frac{x}{a_n}\right|^{\frac{1}{b_n}} S(x) + c = G_{n,0}^{-1}(x) + c. \tag{9.1}$$

Therefore, our first aim is to determine all the possible limit types of H, for which

$$F^n(G_{n,c}(x)) = P(G_{n,c}^{-1}(X_{n:n}) \le x) \xrightarrow[n]{w} H(x). \tag{9.2}$$

Under the condition (9.2), we write $F \in D_{lp}(H)$.
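The transformation and its inverse in (9.1) can be checked numerically. The sketch below assumes that S(·) is the sign function, as is usual for power normalization; the function names and the parameter values used in the check are illustrative only.

```python
def sign(x):
    """Sign function S(x): -1, 0 or 1 (assumed meaning of S in the text)."""
    return (x > 0) - (x < 0)


def lp_transform(x, a, b, c):
    """Linear-power transformation a * |x - c|**b * S(x - c), with a, b > 0."""
    return a * abs(x - c) ** b * sign(x - c)


def lp_inverse(y, a, b, c):
    """Inverse transformation from (9.1): |y / a|**(1 / b) * S(y) + c."""
    return abs(y / a) ** (1.0 / b) * sign(y) + c


# Round trip for a few points on both sides of c (illustrative parameter values).
a, b, c = 2.0, 1.5, 1.0
for x in (-3.0, 0.5, 1.0, 2.0, 7.0):
    assert abs(lp_inverse(lp_transform(x, a, b, c), a, b, c) - x) < 1e-9
```

Setting c = 0 in `lp_transform` gives the power normalization, and the shift identity stated above corresponds to `lp_transform(x, a, b, c) == lp_transform(x - c, a, b, 0.0)`.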

9.1.1 The class of weak limits of lp−model

As we have seen before, a Khinchin-type convergence theorem is the cornerstone for obtaining the weak limit under any nonlinear normalization. Clearly, this theorem is applicable for linear normalization, power normalization, and linear-power normalization. Namely, for the linear-power normalization, we get the following result:

Lemma 9.1 (cf. Barakat et al., 2017b) Let $F_n$ be a sequence of DFs and $T_1$ be a non-degenerate DF. Furthermore, let $F_n(a_n |x - c|^{b_n} S(x - c)) \xrightarrow[n]{w} T_1(x)$, where $a_n, b_n > 0$, $c \in \mathbb{R}$. Then, for some non-degenerate DF $T_2$ and $a'_n, b'_n > 0$, we get $F_n(a'_n |x - c|^{b'_n} S(x - c)) \xrightarrow[n]{w} T_2(x)$, if and only if $(a'_n / a_n)^{1/b_n} \to A > 0$, $b'_n / b_n \to B > 0$, for some $A, B > 0$, and $T_2(x) = T_1(A |x|^{B} S(x))$.

By using Lemma 9.1, Barakat et al. (2017b) could prove the next theorem that characterizes the class of weak limits of the lp−model.

Theorem 9.2 (cf. Barakat et al., 2017b) With the linear-power transformation $G_{n,c}$, the possible limit laws in (9.2), denoted by lp−max-stable DFs, are

$H_{1;\beta,c}(x) = \exp(-(\log(x - c))^{-\beta})$, $x > c + 1$;
$H_{2;\beta,c}(x) = \exp(-(-\log(x - c))^{\beta})$, $c < x \le c + 1$;
$H_{3;\beta,c}(x) = \exp(-(-\log(-(x - c)))^{-\beta})$, $c - 1 < x \le c$;
$H_{4;\beta,c}(x) = \exp(-(\log(-(x - c)))^{\beta})$, $x \le -1 + c$;
$H_{5;c}(x) = \exp\left(-\frac{1}{x - c}\right)$, $x > c$; and
$H_{6;c}(x) = \exp(-|x - c|)$, $x \le c$.

The lp−max-stable DFs given in Theorem 9.2 can be summarized by the von Mises type representations $P_{1;\gamma}(x; c; 1, 1)$ and $P_{2;\gamma}(x; c; 1, 1)$ (denoted by GEVLPs), where for any a, b > 0,

$$P_{1;\gamma}(x; c; a, b) = \exp\left(-\left(1 + \gamma \log a(x - c)^{b}\right)^{-\frac{1}{\gamma}}\right), \quad x > c, \; 1 + \gamma \log a(x - c)^{b} > 0, \tag{9.3}$$

and

$$P_{2;\gamma}(x; c; a, b) = \exp\left(-\left(1 - \gamma \log a(-(x - c))^{b}\right)^{-\frac{1}{\gamma}}\right), \quad x < c, \; 1 - \gamma \log a(-(x - c))^{b} > 0. \tag{9.4}$$

Remark Clearly, for max(a, b) > 1 none of the types of l−max-stable DFs belongs to the types of lp−max-stable DFs. Moreover, for c ≠ 0 none of the types of p−max-stable DFs belongs to the lp−max-stable DFs. Therefore, the class of lp−max-stable DFs is larger than the class of l−max-stable DFs, as well as the class of p−max-stable DFs.

The next theorem, from Barakat et al. (2017b), in which we adopt the preceding abbreviation $F^{[u]}(x) = P(X < x \mid X \ge u)$, gives the generalized Pareto distributions under linear-power normalization, denoted by GPVLPs. This theorem paves the way for applying the POT approach by using the lp−model.

Theorem 9.3
1. Let (9.2) be satisfied with $H(x) = P_{1;\gamma}(x; c; a, b)$. Then, there exists $\alpha(u) > 0$ such that

$$F^{[u]}\left(u (x - c)^{\alpha(u)}\right) \xrightarrow{w} Q_{1;\gamma}(x - c; \bar{b}),$$

as $u \uparrow x^{o} > 0$, where $Q_{1;\gamma}(x - c; \bar{b}) = 1 + \log P_{1;\gamma}(x; c; 1, \bar{b})$, $\bar{b} = \frac{b}{\zeta}$ and $\zeta = 1 + \gamma \log a$.

2. Let (9.2) be satisfied with $H(x) = P_{2;\gamma}(x; c; a, b)$. Then, there exists $\alpha(u) > 0$ such that

$$F^{[u]}\left(u |x - c|^{\alpha(u)}\right) \xrightarrow{w} Q_{2;\gamma}(x - c; \bar{b}),$$

as $u \uparrow x^{o} \le 0$, where $Q_{2;\gamma}(x - c; \bar{b}) = 1 + \log P_{2;\gamma}(x; c; 1, \bar{b})$, $\bar{b} = \frac{b}{\zeta}$ and $\zeta = 1 - \gamma \log a$.
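To make the representations (9.3) and (9.4) concrete, the following sketch evaluates them for γ ≠ 0. Outside the region where the bracketed term is positive, the functions return 0 or 1 according to the side of the support on which x lies; this convention, the function names, and the default arguments are assumptions made for illustration.

```python
import math


def p1(x, gamma, c=0.0, a=1.0, b=1.0):
    """P_{1;gamma}(x; c; a, b) of (9.3), for gamma != 0."""
    if x <= c:
        return 0.0
    t = 1.0 + gamma * math.log(a * (x - c) ** b)
    if t <= 0.0:
        # Below the support when gamma > 0, above it when gamma < 0.
        return 0.0 if gamma > 0 else 1.0
    return math.exp(-t ** (-1.0 / gamma))


def p2(x, gamma, c=0.0, a=1.0, b=1.0):
    """P_{2;gamma}(x; c; a, b) of (9.4), for gamma != 0."""
    if x >= c:
        return 1.0
    t = 1.0 - gamma * math.log(a * (-(x - c)) ** b)
    if t <= 0.0:
        # Below the support when gamma > 0, above it when gamma < 0.
        return 0.0 if gamma > 0 else 1.0
    return math.exp(-t ** (-1.0 / gamma))
```

Both functions depend on x only through x − c, so, for example, `p1(5.0, 0.5, c=2.0)` equals `p1(3.0, 0.5)`; this mirrors the relation between the lp−model and the power model noted after (9.1).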

Remark One may be curious to know what happens if the location parameter is allowed to vary with n, i.e., $c_n$. In this case, the sequence $c_n$ either converges to a finite constant c, say, or converges to ±∞ (the case when $c_n$ does not converge, i.e., $c_n$ has more than one finite or infinite limit point, can be excluded by replacing the sequence $c_n$ by any of its convergent subsequences). Clearly, when $c_n$ converges to a constant c, we get the same limit laws given in Theorem 9.2 (but in practice, the use of a constant location parameter is obviously more flexible than the use of a variable location parameter). When $c_n$ converges to an infinite limit (±∞), the conditions of Theorem 9.2 are violated (in this case we cannot apply Lemma 9.1). Nevertheless, the continuity of the lp−max-stable limit DFs, defined in Theorem 9.2, allows us to write $\lim_{n\to\infty} F^n(G_{n,c_n}(x)) = \lim_{c\to\pm\infty} H(x)$, which implies that this case cannot result in any limit type except the degenerate type.

9.1.2 Statistical inference using BM method in lp−model

Let $x_{1:n} \le x_{2:n} \le \ldots \le x_{n:n}$ be a given data set of maximum values. The modelling, by using the BM approach, under power normalization can be applied only if these maximum values have the same sign; but the modelling under linear-power normalization can now be applied regardless of the data sign by using either the model $P_{1;\gamma}(x; c; 1, 1)$ or $P_{2;\gamma}(x; c; 1, 1)$. The MLE $(\hat\gamma, \hat{a}, \hat{b}, \hat{c})$ of $(\gamma, a, b, c)$ must be numerically evaluated as a solution to the likelihood equations based on model (9.3) or (9.4). The estimate of the shape parameter γ in the GEVLP model that corresponds to the Dubey estimate is based on the ratio of spacings

$$R_n = \frac{\log |x_{nq_2:n}| - \log |x_{nq_1:n}|}{\log |x_{nq_1:n}| - \log |x_{nq_0:n}|},$$

where $q_0 < q_1 < q_2$ and $q_i = \frac{i}{n}$. Evidently, in view of (9.1), this statistic is invariant under the linear-power transformation. By using the obvious relations:

1. $F_n^{-1}(q_i) = x_{i:n}$, $i = 1, 2, \ldots, n$, where $F_n$ is the sample DF,
2. for large n, we have $F_n(x) \approx F^n(x) \approx P_{t;\gamma}(G_{n,c}^{-1}(x); c; 1, 1)$, where t = 1 or 2,
3. $F_n^{-1}(G_{n,c}(x)) = G_{n,c}^{-1}(F_n^{-1}(x))$, ∀x,

we get

$$R_n = \frac{\log P_{1;\gamma}^{-1}(q_2; c; 1, 1) - \log P_{1;\gamma}^{-1}(q_1; c; 1, 1)}{\log P_{1;\gamma}^{-1}(q_1; c; 1, 1) - \log P_{1;\gamma}^{-1}(q_0; c; 1, 1)},$$

which implies, after routine calculations,

$$R_n = \frac{(-\log q_2)^{-\gamma} - (-\log q_1)^{-\gamma}}{(-\log q_1)^{-\gamma} - (-\log q_0)^{-\gamma}} = \left(\frac{\log q_0}{\log q_2}\right)^{\frac{\gamma}{2}}, \tag{9.5}$$

if $q_0$, $q_1$, and $q_2$ satisfy the equation $(-\log q_1)^2 = (-\log q_2)(-\log q_0)$. In this manner, by taking the logarithm of both sides of (9.5), one obtains the estimate

$$\hat\gamma = \frac{2 \log R_n}{\log(\log q_0 / \log q_2)}.$$

On the other hand, if $q_0 = q$, $q_1 = q^{a}$, $q_2 = q^{a^2}$, for some 0 < q, a < 1, we get the family of estimates $\hat\gamma = -\frac{\log R_n}{\log a}$. By taking $a = \frac{1}{2}$, we get $\hat\gamma = \frac{\log R_n}{\log 2}$.
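The quantile-based estimate can be checked on exact quantiles. The sketch below is illustrative only: `p1_quantile` inverts $P_{1;\gamma}(x; c; 1, 1)$ in closed form, and `gamma_hat` applies the formula $\hat\gamma = -\log R_n / \log a$ to three values playing the role of the order statistics at levels $q$, $q^{a}$, $q^{a^2}$. The check uses c = 0, an assumption under which log|x| coincides with log(x − c); both function names are hypothetical.

```python
import math


def p1_quantile(q, gamma, c=0.0):
    """Solve exp(-(1 + gamma * log(x - c)) ** (-1 / gamma)) = q for x (gamma != 0)."""
    return c + math.exp(((-math.log(q)) ** (-gamma) - 1.0) / gamma)


def gamma_hat(x0, x1, x2, a=0.5):
    """Estimate -log(R_n) / log(a) from values at the levels q, q**a, q**(a*a)."""
    r_n = (math.log(abs(x2)) - math.log(abs(x1))) / (math.log(abs(x1)) - math.log(abs(x0)))
    return -math.log(r_n) / math.log(a)


# Exact quantiles at q, q**a, q**(a*a) recover the shape parameter (here gamma = 0.3).
gamma, q, a = 0.3, 0.9, 0.5
x0, x1, x2 = (p1_quantile(level, gamma) for level in (q, q ** a, q ** (a * a)))
assert abs(gamma_hat(x0, x1, x2, a) - gamma) < 1e-9
```

With real data, x0, x1 and x2 would be replaced by the corresponding sample order statistics, so the estimate inherits their sampling variability.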

9.2 Real data application related to AccuWeather

Moscow has a humid continental climate with warm, sometimes hot, somewhat humid summers and long, cold winters. Typical high temperatures in the warm months of June, July, and August are around 23 °C, but during heat waves, which can occur anytime from May to September, daytime temperature highs often top 30 °C, sometimes for one or two weeks. In the winter, temperatures normally drop to approximately −10 °C, though there can be periods of warmth with temperatures rising above 0 °C. As an example of the research on this subject, see Rahmstorf and Coumou (2011), where the authors discussed a theoretical approach to quantify the effect of long-term trends on the expected number of extremes in generic time series, using analytical solutions and Monte Carlo simulations applied to monthly mean July temperatures for Moscow from 1911 to 2010. In this study, we take the daily maximum temperature during the year 2015; this data set was obtained from the AccuWeather website available at: http://accuweather.com/en/ru/russia-weather/. The summary statistics for the 365 recorded data are given in Table 9.1.

Table 9.1 Summary statistics

Descriptive statistics for maximum temperature
minimum  maximum  median  mean   STD    skewness  kurtosis
-17      31       9       11.09  10.28  0.025     -1.081

We estimate the parameters (γ, c, a, b) of the suggested model $P_{1;\gamma}(x; c; a, b)$ by Mathematica Package, by using the ML method. These estimates $(\hat\gamma, \hat{c}, \hat{a}, \hat{b})$ are summarized in Table 9.2. Moreover, Figure 9.1 gives the graphical representation of the data set and the fitted distribution $P_{1;\hat\gamma}(x; \hat{c}; \hat{a}, \hat{b})$. This figure shows that the suggested model $P_{1;\hat\gamma}(x; \hat{c}; \hat{a}, \hat{b})$ fits the given data. Moreover, the validity of the estimated model is checked by the K-S test. The result of this study shows that H = 0, P = 0.2, KSSTAT = 0.0465, and CV = 0.0636. Therefore, we accept the null hypothesis that the DF $P_{1;\hat\gamma}(x; \hat{c}; \hat{a}, \hat{b})$ fits the given data.

Table 9.2 Parameter estimation for maximum temperature
ML parameters estimation

$\hat{a}$     $\hat{b}$   $\hat{\gamma}$   $\hat{c}$
0.0005176     2.37        -0.55            -19.016

Figure 9.1 Graphical representation of the data set and the fitted distribution $P_{1;\hat{\gamma}}(x; \hat{c}; \hat{a}, \hat{b})$


9.3 Box-Cox transformation to improve the L-model and P-model

It is known that the validity of EVT, in either the L-model or the P-model, is contingent upon the fulfilment of some requirements. Perhaps the most crucial of these is that the RVs whose maximum we study are independent and identically distributed. Clearly, a sharp violation of these requirements means that the GEVL (GEVP) (for block maxima data) and the GPDL (GPDP) (for peak over threshold data) do not fit our extreme data. Therefore, it is not surprising to find that the GEVL does not give adequate fits in many areas. For example, Kharin and Zwiers (2000) showed that the GEVL does not fit the DF of the longest annual dry period well. Caprani et al. (2008) stated that the assumption of convergence to a GEVL is not valid for data on traffic loading. Moreover, Tolikas and Gettinby (2009) found that the popular GEVL is not the best model for either the extreme minima or the extreme maxima of daily returns of the Singapore stock market. Also, Zwiers et al. (2012) found some evidence that the GEVL does not fit well for daily temperature extremes at regional scales. In the next two sections, we discuss the results of Bali (2003) and Khaled and Kamal (2018) for improving the L-model and P-model. In the spirit of the Box-Cox transformation (see Box and Cox, 1964), Bali (2003) combined the GEVL and GPDL into the following distribution, for every $0 \le \lambda \le 1$,

$$W_\gamma(x; \mu, \sigma, \lambda) = \frac{1}{\lambda}\left[\exp\left(-\lambda\left[1 + \gamma\,\frac{x-\mu}{\sigma}\right]^{-\frac{1}{\gamma}}\right) - 1\right] + 1, \tag{9.6}$$

where

$$\mu + \frac{\sigma}{\gamma}\left[\left(\frac{-\lambda}{\log(1-\lambda)}\right)^{\gamma} - 1\right] \le x < \infty, \quad \text{if } \lambda \ne 0 \text{ and } \gamma > 0,$$

while

$$\mu + \frac{\sigma}{\gamma^*}\left[1 - \left(\frac{-\log(1-\lambda)}{\lambda}\right)^{\gamma^*}\right] \le x \le \mu + \frac{\sigma}{\gamma^*}, \quad \gamma^* = -\gamma, \quad \text{when } \lambda \ne 0 \text{ and } \gamma < 0.$$

The distribution introduced in (9.6) is called the Box-Cox-GL; it reduces to the GEVL and the GPDL when $\lambda = 1$ and $\lambda \to 0$, respectively. More precisely, when $\lambda = 1$ we get $G_\gamma(\frac{x-\mu}{\sigma})$ (defined in (2.4)), while when $\lambda \to 0$, by using L'Hôpital's rule, we get $W_\gamma(\frac{x-\mu}{\sigma})$ (defined in (2.8)). Bali (2003) successfully applied the Box-Cox-GL to extreme stock data. The PDF of the DF (9.6) is given by

$$w_\gamma(x; \mu, \sigma, \lambda) = \frac{1}{\sigma}\exp\left(-\lambda\left[1 + \gamma\,\frac{x-\mu}{\sigma}\right]^{-\frac{1}{\gamma}}\right)\left[1 + \gamma\,\frac{x-\mu}{\sigma}\right]^{-\frac{1}{\gamma}-1}.$$

Moreover, the quantile function is

$$W_\gamma^{-1}(q; \mu, \sigma, \lambda) = \mu - \frac{\sigma}{\gamma} + \frac{\sigma}{\gamma}\left(-\frac{1}{\lambda}\log(\lambda(q-1)+1)\right)^{-\gamma}, \quad 0 < q < 1.$$
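The DF (9.6), its PDF, and its quantile function translate directly into code. The following is a minimal sketch in Python, not the authors' implementation: the function names are hypothetical, $\gamma \ne 0$ is assumed, and the case $\lambda = 0$ is handled through the GPDL limit $1 - u(x)$.

```python
import math


def _u(x, mu, sigma, gamma):
    """u(x) = [1 + gamma (x - mu) / sigma]^(-1/gamma); assumes gamma != 0."""
    t = 1.0 + gamma * (x - mu) / sigma
    if t <= 0.0:
        raise ValueError("x is outside the region where u(x) is defined")
    return t ** (-1.0 / gamma)


def boxcox_gl_cdf(x, mu, sigma, gamma, lam):
    """Box-Cox-GL DF (9.6); lam = 1 gives the GEVL, lam = 0 the GPDL limit."""
    u = _u(x, mu, sigma, gamma)
    value = 1.0 - u if lam == 0.0 else (math.exp(-lam * u) - 1.0) / lam + 1.0
    # Below the lower endpoint of the support the formula is negative: clip.
    return min(1.0, max(0.0, value))


def boxcox_gl_pdf(x, mu, sigma, gamma, lam):
    """PDF of (9.6); meaningful only inside the support of the DF."""
    t = 1.0 + gamma * (x - mu) / sigma
    u = _u(x, mu, sigma, gamma)
    return math.exp(-lam * u) * t ** (-1.0 / gamma - 1.0) / sigma


def boxcox_gl_quantile(q, mu, sigma, gamma, lam):
    """Quantile function of (9.6) for 0 < q < 1."""
    s = 1.0 - q if lam == 0.0 else -math.log(lam * (q - 1.0) + 1.0) / lam
    return mu - sigma / gamma + (sigma / gamma) * s ** (-gamma)
```

A quick consistency check is that `boxcox_gl_cdf(boxcox_gl_quantile(q, ...), ...)` returns `q` for any `q` in (0, 1), which is how the quantile formula above was verified against (9.6).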

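For the P-model, Khaled and Kamal (2018) derived the Box-Cox-GEVP, denoted by Box-Cox-GP1, corresponding to the GEVP, defined in (2.32), and the GPDP, defined in (2.33),

$$F_{1,\gamma}(x; a, b, \lambda) = \frac{1}{\lambda}\left[\exp\left(-\lambda\left(1 + \gamma\log ax^{b}\right)^{-\frac{1}{\gamma}}\right) - 1\right] + 1, \quad 0 \le \lambda \le 1, \tag{9.7}$$

where

$$\left(\frac{1}{a}\exp\left[\frac{1}{\gamma}\left(\left(\frac{-\lambda}{\log(1-\lambda)}\right)^{\gamma} - 1\right)\right]\right)^{\frac{1}{b}} \le x < \infty, \quad \text{if } \lambda \ne 0,\ \gamma > 0,$$

and

$$\left(\frac{1}{a}\exp\left[\frac{1}{\gamma^*}\left(1 - \left(\frac{-\log(1-\lambda)}{\lambda}\right)^{\gamma^*}\right)\right]\right)^{\frac{1}{b}} \le x \le \left(\frac{1}{a}\exp\left(\frac{1}{\gamma^*}\right)\right)^{\frac{1}{b}}, \quad \text{if } \lambda \ne 0,\ \gamma < 0.$$

The Box-Cox-GP1, defined in (9.7), reduces to the GEVP (defined in (2.32)) when $\lambda = 1$, and to the GPDP ($Q_{1,\gamma}(a^{\frac{1}{b}}x; b)$, see (2.34)) when $\lambda \to 0$. The PDF and the quantile function of the DF (9.7) are respectively given by

$$f_{1,\gamma}(x; a, b, \lambda) = \frac{b}{x}\exp\left(-\lambda\left(1 + \gamma\log ax^{b}\right)^{-\frac{1}{\gamma}}\right)\left(1 + \gamma\log ax^{b}\right)^{-\frac{1}{\gamma}-1}$$

and

$$F_{1,\gamma}^{-1}(q; a, b, \lambda) = \exp\left[\frac{1}{b\gamma}\left(-\frac{1}{\lambda}\log(\lambda(q-1)+1)\right)^{-\gamma} - \frac{1}{b\gamma} - \frac{\log(a)}{b}\right], \quad 0 < q < 1.$$

Also, for negative data, Khaled and Kamal (2018) derived the Box-Cox-GP2 by combining the GEVP, defined in (2.33), and its corresponding GPDP, as

$$F_{2,\gamma}(y; a, b, \lambda) = \frac{1}{\lambda}\left[\exp\left(-\lambda\left(1 - \gamma\log a(-y)^{b}\right)^{-\frac{1}{\gamma}}\right) - 1\right] + 1.$$

Again, $F_{2,\gamma}(y; a, b, 1) = P_{2,\gamma}(y; a, b)$ and $F_{2,\gamma}(y; a, b, \lambda) \to Q_{2,\gamma}(a^{\frac{1}{b}}y; b)$, as $\lambda \to 0$.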
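For positive data, the Box-Cox-GP1 in (9.7) is the Box-Cox-GL form evaluated at $\log ax^{b}$ in place of $(x-\mu)/\sigma$, so a sketch of its DF and quantile function is short. The code below is an illustrative Python sketch under the assumptions $x > 0$, $a > 0$, $b > 0$, and $\gamma \ne 0$; the function names are hypothetical.

```python
import math


def boxcox_gp1_cdf(x, a, b, gamma, lam):
    """Box-Cox-GP1 DF (9.7) for positive x; lam = 1 gives the GEVP."""
    z = math.log(a) + b * math.log(x)          # log(a x^b)
    t = 1.0 + gamma * z
    if t <= 0.0:
        raise ValueError("x is outside the region where (9.7) is defined")
    u = t ** (-1.0 / gamma)
    value = 1.0 - u if lam == 0.0 else (math.exp(-lam * u) - 1.0) / lam + 1.0
    return min(1.0, max(0.0, value))           # clip outside the support


def boxcox_gp1_quantile(q, a, b, gamma, lam):
    """Quantile function of (9.7) for 0 < q < 1."""
    s = 1.0 - q if lam == 0.0 else -math.log(lam * (q - 1.0) + 1.0) / lam
    log_x = (s ** (-gamma) - 1.0) / (b * gamma) - math.log(a) / b
    return math.exp(log_x)
```

Writing the quantile on the log scale first, and exponentiating once at the end, mirrors the structure of the formula for $F_{1,\gamma}^{-1}$ and avoids forming $x^{b}$ explicitly.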

9.4 Real data application

In this section, we apply the newly suggested models (9.6) and (9.7) to a positive data set on air pollution from the London Air Quality Network (LAQN). This network is a unified resource for air pollution measurements, which are fundamental to supporting air quality management. The LAQN was formed in 1993 to coordinate and improve air pollution monitoring in London. People living in the city of London can check updates on the level of air pollution through the website www.londonair.org.uk. In this study, a data set was taken from the site Barking Dagenham at Rush Green square, which monitors nitrogen oxides, sulphur dioxide, PM10, and climatology data. The levels of nitric oxide (NO), nitrogen dioxide (NO2), and sulphur dioxide (SO2) were recorded every hour, so a total of 53000 readings were obtained in the period from 1-1-2010 to 31-12-2015. These data can be downloaded by any researcher in the form of a report for every half hour, every hour, or every day, according to the type of study, from the following site: www.londonair.org.uk/london/asp/datadownload.asp. The set of daily maxima of these data was used for the application of these models. The summary statistics are given in Table 9.3. In this case study, we deal with the data as follows: first, we estimate the parameters for the block data (daily maxima) using the GEVL model, and for the peaks over the mean of the data using the GPDL model, by the MLE


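method. The log-likelihood of the Box-Cox-GL distribution is

$$\ell_\gamma(x; \mu, \sigma, \lambda) = -k\log\sigma - \lambda\sum_{i=1}^{k}\left[1 + \gamma\,\frac{x_i-\mu}{\sigma}\right]^{-\frac{1}{\gamma}} - \left(1 + \frac{1}{\gamma}\right)\sum_{i=1}^{k}\log\left[1 + \gamma\,\frac{x_i-\mu}{\sigma}\right]. \tag{9.8}$$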

Equation (9.8) gives the log-likelihood for the GEVL distribution if λ = 1 and for the GPDL distribution if λ = 0. The log-likelihood function in Equation (9.8) cannot be maximized analytically, so we minimize the negative log-likelihood numerically, using the FindMinimum function in the Mathematica package.
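As a rough illustration of the objective that is handed to the numerical minimizer, the negative of (9.8) can be coded as follows. This is a sketch in Python rather than the Mathematica code used in the book; the function name `boxcox_gl_negloglik` and the choice to return infinity outside the admissible region are assumptions made here for illustration.

```python
import math


def boxcox_gl_negloglik(params, data):
    """Negative of the log-likelihood (9.8) for the Box-Cox-GL.

    params = (mu, sigma, gamma, lam); data is a sequence of observations.
    Returns +inf when the parameters are not admissible for the data, so
    that a generic minimizer is steered away from that region.
    """
    mu, sigma, gamma, lam = params
    if sigma <= 0.0 or gamma == 0.0:
        return math.inf
    total = len(data) * math.log(sigma)
    for x in data:
        t = 1.0 + gamma * (x - mu) / sigma
        if t <= 0.0:
            return math.inf
        total += lam * t ** (-1.0 / gamma) + (1.0 + 1.0 / gamma) * math.log(t)
    return total
```

Any general-purpose optimizer can then be applied to this function, for instance `scipy.optimize.minimize` with a derivative-free method, playing the role that FindMinimum plays in the text; fixing `lam` at 1 or 0 recovers the GEVL and GPDL fits.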

Table 9.3 Descriptive statistics for maximum data for air pollution

Pollutant   n      min    max     median   mean     STD      skewness   kurtosis
NO          2006   0.2    597.7   10.7     31.29    51.678   3.65       19.570
NO2         2003   2.30   229.1   45.1     47.417   22.18    0.935      2.814
SO2         2077   0.1    67.3    4.1      5.013    4.16     7.57       89.02

Table 9.4 summarizes the MLE results for the GEVL and GPDL models, together with the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) used for comparing the models (note that a smaller value of AIC or BIC indicates a more favorable model). We also used the K-S test; see Table 9.5. Table 9.5 shows that we reject the GEVL model for NO and SO2 and accept this model for the NO2 pollutant. Moreover, we reject the GPDL model for all pollutants. Therefore, we apply the Box-Cox-GL to these data sets. The parameter estimates with the AIC and BIC values, and the K-S test results, for the Box-Cox-GL model are summarized in Tables 9.6 and 9.7, respectively. Table 9.7 shows that this model is accepted for both the BM and POT techniques. In order to apply the P-model to our data sets, we use the models (2.32), $Q_{1,\gamma}(a^{\frac{1}{b}}x; b)$, and (9.7). First, the log-likelihood of the Box-Cox-GP1 is given by

$$\ell^{*}_\gamma(y; a, b, \lambda) = k\log b - \sum_{i=1}^{k}\log y_i - \lambda\sum_{i=1}^{k}\left(1 + \gamma\log ay_i^{b}\right)^{-\frac{1}{\gamma}} - \left(1 + \frac{1}{\gamma}\right)\sum_{i=1}^{k}\log\left(1 + \gamma\log ay_i^{b}\right). \tag{9.9}$$

Equation (9.9) gives the log-likelihood for the GEVP distribution if λ = 1, and for the GPDP distribution ($Q_{1,\gamma}(a^{\frac{1}{b}}x; b)$) if λ → 0. The parameter estimates with the AIC and BIC values, and the K-S test results, are given in Tables 9.8 and 9.9, respectively. Table 9.9 indicates that both the GEVP and GPDP models fail to fit the given data sets. Hence, we turn to the Box-Cox-GP1 model. The AIC and BIC values and the parameter estimates for this model are summarized in Table 9.10. Moreover, the result of


the K-S test is given in Table 9.11, which indicates that the Box-Cox-GP1 model fits the given data, except for NO2 under the BM technique.
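The model comparison above relies on the AIC and BIC. The text does not state the exact formulas used to produce the tabulated values, so the following Python sketch uses the standard textbook definitions, $\mathrm{AIC} = 2p - 2\ell$ and $\mathrm{BIC} = p\log n - 2\ell$, as an assumption; the function names are hypothetical, $\ell$ is the maximized log-likelihood, $p$ the number of fitted parameters, and $n$ the sample size.

```python
import math


def aic(loglik, n_params):
    """Akaike Information Criterion (standard definition): 2p - 2*loglik."""
    return 2.0 * n_params - 2.0 * loglik


def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion (standard definition): p*log(n) - 2*loglik."""
    return n_params * math.log(n_obs) - 2.0 * loglik


# Illustrative values only (not taken from the tables): a three-parameter
# model and a four-parameter model fitted to the same 50 observations.
print(aic(-100.0, 3), bic(-100.0, 3, 50))
print(aic(-98.5, 4), bic(-98.5, 4, 50))
```

Under either criterion the model with the smaller value is preferred; the extra parameter λ of the Box-Cox models is penalized through `n_params`.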

Table 9.4 The MLEs for the GEVL and GPDL models and the application of AIC and BIC

Parameter estimates of the GEVL model by using the BM technique
Pollutant   γ          σ        μ        AIC       BIC
NO          1.075      7.907    6.856    8518.99   17044.8
NO2         -0.0396    18.3     37.53    8944.9    17900
SO2         0.1806     1.6008   3.634    4473.51   8957.94

Parameter estimates of the GPDL model by using the POT technique
Pollutant   γ          σ        μ (threshold)   AIC       BIC
NO          0.1634     50.32    31.227          2761.08   5529.04
NO2         -0.07077   20.34    47.4170         3677.32   7363.14
SO2         0.396      1.996    5.0138          1351.04   2709.48

Table 9.5 K-S test

Fitting of the GEVL model
Pollutant   H   P             KSSTAT   CV       Decision
NO          1   0.06          0.032    0.0272   reject model
NO2         0   0.1662        0.0211   0.0273   accepted model
SO2         1   7.99∗10^−7    0.0245   0.0636   reject model

Fitting of the GPDL model
Pollutant   H   P   KSSTAT   CV       Decision
NO          1   0   0.2916   0.0272   reject model
NO2         1   0   0.5372   0.0273   reject model
SO2         1   0   0.6909   0.0268   reject model


Table 9.6 The MLEs for Box-Cox-GL model and the application of AIC and BIC

Parameter estimates of Box-Cox-GL by using the BM technique
Pollutant   γ        σ       μ       λ        AIC       BIC
NO          0.13     6.6     6.63    0.985    1005.6    20021.92
NO2         -0.04    19      38      0.999    8952.09   17908.6
SO2         0.187    1.79    2.7     0.996    3790.51   8595.58

Parameter estimates of Box-Cox-GL by using the POT technique
Pollutant   γ        σ       μ (threshold)   λ        AIC       BIC
NO          0.19     51.6    31.2            0.0004   2743.74   5516.65
NO2         -0.066   20      47.417          0.01     3584.06   7319.45
SO2         0.1      0.826   5.1             0.0002   1241.53   2692

Table 9.7 K-S test

Fitting of the Box-Cox-GL model by using the BM technique
Pollutant   H   P         KSSTAT   CV       Decision
NO          0   0.7966    0.025    0.0272   accepted model
NO2         0   0.0892    0.0245   0.0273   accepted model
SO2         0   0.977     0.0023   0.0268   accepted model

Fitting of the Box-Cox-GL model by using the POT technique
Pollutant   H   P         KSSTAT   CV       Decision
NO          0   0.2795    0.0340   0.0523   accepted model
NO2         0   0.2746    0.0262   0.0486   accepted model
SO2         0   0.51030   0.0226   0.0480   accepted model


Table 9.8 The MLEs for GEVP and GPDP models and the application of AIC and BIC

Parameter estimates of the GEVP model by using the BM technique
Pollutant   γ         a         b        AIC       BIC
NO          -0.1935   0.2022    0.7993   6150.95   12320.4
NO2         -0.2897   0.001     1.8201   9255.05   18520.4
SO2         -0.1744   0.0985    1.834    4690.51   9404.53

Parameter estimates of the GPDP model by using the POT technique
Pollutant   γ         a         b        AIC       BIC
NO          -0.4453   0.2338    0.5749   3090.73   6188
NO2         -0.2624   0.00027   2.188    3970.95   7820.68
SO2         -0.157    0.0284    2.437    1679.19   3365.77

Table 9.9 K-S test

Fitting of the GEVP model
Pollutant   H   P         KSSTAT   CV       Decision
NO          1   0.00046   0.043    0.0272   reject model
NO2         1   0         0.0656   0.0273   reject model
SO2         1   0         0.0245   0.1063   reject model

Fitting of the GPDP model
Pollutant   H   P   KSSTAT   CV       Decision
NO          1   0   0.5454   0.0272   reject model
NO2         1   0   0.4613   0.0273   reject model
SO2         1   0   0.5505   0.0268   reject model


Table 9.10 The MLEs for the Box-Cox-GP1 model and the application of the AIC and BIC

Parameter estimates of the Box-Cox-GP1 by using the BM technique
Pollutant   γ        a          b      λ          AIC       BIC
NO          -0.142   0.385      1.25   0.999999   5330.96   10676.5
NO2         -0.29    0.001429   1.84   0.999999   9041.58   18094
SO2         -0.14    0.15       1.9    9.9∗10^9   3796.51   78854.2

Parameter estimates of the Box-Cox-GP1 by using the POT technique
Pollutant   γ        a          b      λ          AIC       BIC
NO          -0.4     0.077      0.77   0.001      2609.32   5427
NO2         -0.23    0.00029    2.2    0.012      3026.37   7064.08
SO2         -0.011   0.04       2.11   0.009      1180.55   2570.95

Table 9.11 K-S test

Fitting of the Box-Cox-GP1 by using the BM technique
Pollutant   H   P            KSSTAT   CV       Decision
NO          0   0.9931       0.0012   0.0272   accepted model
NO2         1   2.7∗10^−6    0.0564   0.0273   rejected model
SO2         0   0.9406       0.0038   0.0268   accepted model

Fitting of the Box-Cox-GP1 by using the POT technique
Pollutant   H   P            KSSTAT   CV       Decision
NO          0   0.3349       0.0315   0.0523   accepted model
NO2         0   0.0657       0.0381   0.040    accepted model
SO2         0   0.1136       0.0409   0.0480   accepted model

9.5 The Kumaraswamy GEVL and GEVP DFs and further generalizations

There are several ways to generalize the GEVL distribution. For example, Hosking (1994) suggested the four-parameter kappa DF given by

$$F_{1:\theta}(x) = (1 - \theta u(x))^{\frac{1}{\theta}} = \left(1 - \theta\left[1 + \gamma\,\frac{x-\mu}{\sigma}\right]^{-\frac{1}{\gamma}}\right)^{\frac{1}{\theta}}, \quad -\infty < \theta < \infty. \tag{9.10}$$

The GEVL DF, $G_\gamma(\frac{x-\mu}{\sigma})$, is a particular case of $F_{1:\theta}(x)$, as $\theta \to 0$. Moreover, Adeyemi and Adebanji (2006) suggested the following extended GEVL:

$$F_{2:\kappa}(x) = \exp(-\kappa u(x)) = \exp\left(-\kappa\left[1 + \gamma\,\frac{x-\mu}{\sigma}\right]^{-\frac{1}{\gamma}}\right), \quad \kappa > 0,\ 1 + \gamma\,\frac{x-\mu}{\sigma} > 0. \tag{9.11}$$

The GEVL, $G_\gamma(\frac{x-\mu}{\sigma})$, is a particular case of (9.11) for $\kappa = 1$. In addition, Rulfova et al. (2016) proposed the two-component GEVL specified by the DF

$$F_{1,2}(x) = \exp(-u_1(x) - u_2(x)), \tag{9.12}$$

where $u_i(x) = \left[1 + \gamma_i\,\frac{x-\mu_i}{\sigma_i}\right]^{-\frac{1}{\gamma_i}}$, $-\infty < \mu_i, \gamma_i < \infty$, $\sigma_i > 0$, $i = 1, 2$. However, it is not clear whether the GEVL DF is contained as a particular case of $F_{1,2}(x)$.

Clearly, the extended GEVP (2.32) corresponding to (9.10), (9.11), and (9.12) can easily be obtained by replacing the functions $u(x)$, $u_1(x)$, and $u_2(x)$ by $v^{+}(x) = [1 + \gamma\log ax^{b}]^{-\frac{1}{\gamma}}$, $v_1^{+}(x) = [1 + \gamma_1\log a_1x^{b_1}]^{-\frac{1}{\gamma_1}}$, and $v_2^{+}(x) = [1 + \gamma_2\log a_2x^{b_2}]^{-\frac{1}{\gamma_2}}$, respectively. Moreover, the extended GEVP (2.33) corresponding to (9.10), (9.11), and (9.12) can easily be obtained by replacing the functions $u(x)$, $u_1(x)$, and $u_2(x)$ by $v^{-}(x) = [1 - \gamma\log a(-x)^{b}]^{-\frac{1}{\gamma}}$, $v_1^{-}(x) = [1 - \gamma_1\log a_1(-x)^{b_1}]^{-\frac{1}{\gamma_1}}$, and $v_2^{-}(x) = [1 - \gamma_2\log a_2(-x)^{b_2}]^{-\frac{1}{\gamma_2}}$, respectively.

Recently, Eljabri and Nadarajah (2017) gave a simple generalization of the GEVL distribution and provided a motivation for it based on the definition of the GEVL distribution. Namely, the GEVL distribution arises as the limiting distribution of normalized maxima $X_{n:n}$ such that (2.1) is satisfied with $H(x) = \exp(-u(x))$. In practice, there may be situations where the distribution of $X_{n:n}$ is heterogeneous (see, for example, Caprani et al., 2008). One possible way to describe this situation is to model the distribution of $X_{n:n}$ as a mixture, say

$$F_{n:n}(a_nx + b_n) = P(X_{n:n} < a_nx + b_n) = \sum_{i=1}^{p}\omega_i P(X_{n:n}^{(i)} < a_nx + b_n), \tag{9.13}$$

where $X_{n:n}^{(i)}$ is an RV representing the $i$th component of the mixture and $\omega_1, \omega_2, ..., \omega_p$ are nonnegative weights summing to one. Under suitable conditions, the limiting distribution of (9.13) may be

$$F_{n:n}(a_nx + b_n) = P(X_{n:n} < a_nx + b_n) \xrightarrow[n]{w} \sum_{i=1}^{p}\omega_i G_{\gamma_i}\left(\frac{x-\mu_i}{\sigma_i}\right). \tag{9.14}$$

However, mixtures of the form (9.14) are difficult to treat, and not only because of their complicated mathematical form: inference for and fitting of (9.14) are also difficult.


Eljabri and Nadarajah (2017) wrote (9.14) in a simple mathematical form, motivated by the works of Kumaraswamy (1980) and Cordeiro and de Castro (2011), as

$$F_3(x) = 1 - \left[1 - G_\gamma^{A}\left(\frac{x-\mu}{\sigma}\right)\right]^{B}, \tag{9.15}$$

where $A > 0$, $B > 0$ are two additional parameters. Note that the right-hand side of (9.15) can be expanded as

$$1 - \left[1 - G_\gamma^{A}\left(\frac{x-\mu}{\sigma}\right)\right]^{B} = \sum_{i=1}^{\infty} c_i\,G_\gamma^{iA}\left(\frac{x-\mu}{\sigma}\right),$$

a mixture taking the form of (9.14). The coefficients $c_i$, $i = 1, 2, ...$, are functions of $B$; for instance, $c_1 = B$. The parameter $A$ dictates the tail behaviour of the mixture components, and the parameter $B$ dictates the mixture coefficients. The DF (9.15) is denoted by KGEVL. Clearly, the GEVL distribution is a special case of the KGEVL distribution for $A = B = 1$. The exponentiated GEV distribution, due to Adeyemi and Adebanji (2006), is also a particular case of the KGEVL distribution. Eljabri and Nadarajah (2017) illustrated the flexibility of the KGEVL distribution by using a real data set of the annual rainfall maxima in millimetres from 1938 to 1972 at Uccle, Belgium, over the duration of one day, so the sample size is n = 35. This data set is contained in the evd contributed package for R (R Development Core Team, 2016). The KGEVL distribution provided better fits than the GEVL distribution for this real data set.

It is worth mentioning that a drawback of the KGEVL distribution is the lack of an exact theoretical motivation from an extreme value context. The GEVL distribution is the limiting distribution of a linearly normalized maximum as the sample size approaches infinity. Such a result does not appear to be possible for the KGEVL distribution. Actually, the same drawback exists for all the known generalizations of the GEVL and GPDL (and likewise of the GEVP and GPDP). Clearly, the extended GEVPs (2.32) and (2.33) corresponding to (9.15), denoted by KGEVP1 and KGEVP2, respectively, can be suggested as

$$F_{3:P1}(x) = 1 - \left[1 - P_{1;\gamma}^{A}(x; a, b)\right]^{B}$$

and

$$F_{3:P2}(x) = 1 - \left[1 - P_{2;\gamma}^{A}(x; a, b)\right]^{B}.$$
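The construction in (9.15) wraps any baseline DF $G$ as $1 - (1 - G^{A})^{B}$, which is easy to express in code. The following Python sketch illustrates the KGEVL and its quantile function; it is not the implementation of Eljabri and Nadarajah (2017), the function names are hypothetical, and $\gamma \ne 0$ is assumed.

```python
import math


def gevl_cdf(x, mu, sigma, gamma):
    """GEVL DF exp(-[1 + gamma (x - mu)/sigma]^(-1/gamma)), gamma != 0."""
    t = 1.0 + gamma * (x - mu) / sigma
    if t <= 0.0:
        # Below the lower endpoint (gamma > 0) or above the upper one (gamma < 0).
        return 0.0 if gamma > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / gamma))


def kgevl_cdf(x, mu, sigma, gamma, A, B):
    """KGEVL DF (9.15): 1 - (1 - G^A)^B with A > 0, B > 0."""
    g = gevl_cdf(x, mu, sigma, gamma)
    return 1.0 - (1.0 - g ** A) ** B


def kgevl_quantile(q, mu, sigma, gamma, A, B):
    """Invert (9.15): first recover G, then invert the GEVL DF."""
    g = (1.0 - (1.0 - q) ** (1.0 / B)) ** (1.0 / A)
    return mu + (sigma / gamma) * ((-math.log(g)) ** (-gamma) - 1.0)
```

Replacing `gevl_cdf` by a routine for $P_{1;\gamma}$ or $P_{2;\gamma}$ would give the corresponding KGEVP1 and KGEVP2 sketches, since only the baseline DF changes.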

Appendix A Summary of Hill’s estimators in the L-model and P-model

Table A.1 Four HELs Data type Positive

Type of γ γ>0 γ0 γ

0, Yn−k:n

γ > 0, 0 < Yk+1:n

γ < 0, Yn−k:n > 1 γ < 0, 0 < Yk+1:n < 1 γ > 0, Yk+1:n < −1 γ > 0, −1 < Yn−k:n < 0 γ < 0, Yk+1:n < −1 γ < 0, −1 < Yn−k:n < 0

Negative

γ>0

Positive

γ0

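Data type   Threshold                                  Moment estimator
Positive    $Y_{n-k:n} > 1$                            $\gamma^{+}_{PM} = \hat{M}_{P:1} + 1 - \frac{1}{2}\left(1 - \frac{\hat{M}^{2}_{P:1}}{\hat{M}_{P:2}}\right)^{-1}$
Positive    $0 < Y_{k+1:n} < 1$, $0 < Y_{1:n}$         $\gamma^{+\prec}_{PM} = \check{M}_{P:1} + 1 - \frac{1}{2}\left(1 - \frac{\check{M}^{2}_{P:1}}{\check{M}_{P:2}}\right)^{-1}$
Negative    $-1 < Y_{n-k:n} < 0$, $Y_{n:n} < 0$        $\gamma^{-}_{PM} = \hat{N}_{P:1} + 1 - \frac{1}{2}\left(1 - \frac{\hat{N}^{2}_{P:1}}{\hat{N}_{P:2}}\right)^{-1}$
Negative    $Y_{k+1:n} < -1$                           $\gamma^{-\prec}_{PM} = \check{N}_{P:1} + 1 - \frac{1}{2}\left(1 - \frac{\check{N}^{2}_{P:1}}{\check{N}_{P:2}}\right)^{-1}$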

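Table A.5 Four moment ratio estimators under power normalization

Data type: Positive
Type γ and threshold: $\gamma > 0$, $Y_{n-k:n} > 1$
Moment ratio estimator:
$$\gamma^{++}_{PMR} = \frac{\hat{M}_{P:2}}{2\hat{M}_{P:1}} = \frac{\sum_{i=1}^{k}\left(\log(\log Y_{n-i+1:n}) - \log(\log Y_{n-k:n})\right)^{2}}{2\sum_{i=1}^{k}\left(\log(\log Y_{n-i+1:n}) - \log(\log Y_{n-k:n})\right)}.$$

Data type: Positive
Type γ and threshold: $\gamma > 0$, $0 < Y_{k+1:n} < 1$, $0 < Y_{1:n}$
Moment ratio estimator:
$$\gamma^{++\prec}_{PMR} = \frac{\check{M}_{P:2}}{2\check{M}_{P:1}} = \frac{\sum_{i=1}^{k}\left(\log|\log Y_{i:n}| - \log|\log Y_{k+1:n}|\right)^{2}}{2\sum_{i=1}^{k}\left(\log|\log Y_{i:n}| - \log|\log Y_{k+1:n}|\right)}.$$

Data type: Negative
Type γ and threshold: $\gamma > 0$, $-1 < Y_{n-k:n} < 0$, $Y_{n:n} < 0$
Moment ratio estimator:
$$\gamma^{-+}_{PMR} = \frac{\hat{N}_{P:2}}{2\hat{N}_{P:1}} = \frac{\sum_{i=1}^{k}\left(\log|\log|Y_{n-i+1:n}|| - \log|\log|Y_{n-k:n}||\right)^{2}}{2\sum_{i=1}^{k}\left(\log|\log|Y_{n-i+1:n}|| - \log|\log|Y_{n-k:n}||\right)}.$$

Data type: Negative
Type γ and threshold: $\gamma > 0$, $Y_{k+1:n} < -1$
Moment ratio estimator:
$$\gamma^{-+\prec}_{PMR} = \frac{\check{N}_{P:2}}{2\check{N}_{P:1}} = \frac{\sum_{i=1}^{k}\left(\log\log|Y_{i:n}| - \log\log|Y_{k+1:n}|\right)^{2}}{2\sum_{i=1}^{k}\left(\log\log|Y_{i:n}| - \log\log|Y_{k+1:n}|\right)}.$$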

References

Adeyemi, S., and Adebanji, A. O. 2006. The exponentiated generalized extreme value distribution. J. Appl. Functional Dif. Eq., 1, 41–47. Alaswed, H. A. 2015. Statistical modeling of extremes with some applications. Ph. D. Dissertation, Zagazig University, Egypt. Alves, M. I. F. 1995. Estimation of the tail parameter in the domain of attraction of an extremal distribution. J. Statist. Plan. Inference, 45(1–2), 143–173. Angus, J. 1993. Asymptotic theory for bootstrapping the extremes. Comm. Statist. Theory and Methodology, 22, 15–30. Arcones, M. A., and Gin´e, E. 1989. The bootstrap of the mean with arbitrary bootstrap sample size. Ann. de l’l. H. P., Sec. B, 25(4), 457–481. Arnold, B. C., Balakrishnan, N., and Nagaraja, H. N. 1992. A first course in order statistic. John Wiley & Sons Inc. Athreya, K. B. 1987. Bootstrap of the mean in the infinite variance case. Ann. Statist., 15, 724–731. Athreya, K. B., and Fukuchi, J. 1993. Bootstrapping extremes of i.i.d. random variables. Gaitherburg, Maryland. Bali, T. B. 2003. The generalized extreme value distribution. Econ. Letters, 79, 423–427. Balkema, A. A., and de Haan, L. 1974. Residual life time at great age. Ann. Probab., 2(5), 792–804. Balkema, A. A., and de Haan, L. 1978a. Limit distributions for order statistics. I. J. Statist. Plann. Inference, 23, 77–92. Balkema, A. A., and de Haan, L. 1978b. Limit distributions for order statistics. II. Math. Methods in Statist., 23, 341–358. Barakat, H. M. 1997a. Asymptotic properties of bivariate random extremes. J. Statist. Plann. Inference, 61, 203–217. Barakat, H. M. 1997b. On the continuation of the limit distribution of the extreme and central terms of a sample. Test, 6(2–3), 51–368. Barakat, H. M. 2003. On the restricted convergence of intermediate order statistics. Probab. Math. Statist., 23(2), 229–240. Barakat, H. M. 2007. Limit theory of generalized order statistics. J. Statist. Plann. Inference, 137(1), 1–11. Barakat, H. M. 2010. 
Continuability of local weak limits for record values. Statistics, 44(3), 269–274.


Barakat, H. M. 2013. The use of power normalization as a new trend in the order statistics limit theory. J. Stat. Appl. Probab., 2(3), 251–260. Barakat, H. M., and El-Adll, M. E. 2009. Asymptotic theory of extreme dual generalized order statistics. Statist. & Probab. Letters, 79, 1252–1259. Barakat, H. M., and El-Adll, M. E. 2012. Limit theory of extreme generalized order statistics. Proc. Indian Acad. Sci. (Math. Sci.), 122(2), 297–311. Barakat, H. M., and El-Shandidy, M. A. 2004. On general asymptotic behaviour of order statistics with random index. Bull. Malays. Math. Sci. Soc., 27, 169–183. Barakat, H. M., and Nigm, E. M. 1996. The mixing property of order statistics with some applications. Bull. Malaysian Math. Soc. (Second Series), 19, 39–52. Barakat, H. M., and Nigm, E. M. 2002. Extreme order statistics under power normalization and random sample size. Kuwait J. Sci. Eng., 29(1), 27–41. Barakat, H. M., and Omar, A. R. 2010. Limit theorems for central order statistics under nonlinear normalization. Optimization and Statist. J., 43–52. Barakat, H. M., and Omar, A. R. 2011a. Limit theorems for order statistics under nonlinear normalization. J. Statist. Plann. Inference, 141, 524–535. Barakat, H. M., and Omar, A. R. 2011b. On limit distributions for intermediate order statistics under power normalization. Math. Methods in Statist., 20(4), 365–377. Barakat, H. M., and Omar, A. R. 2016. A note on domains of attraction of the limit laws of intermediate order statistics under power normalization. Statistical Methodology, 31, 1–7. Barakat, H. M., and Omar, A. R. 2019. Limit theorems for order statistics with variable rank under exponential normalization. submitted. Barakat, H. M., and Ramachandran, B. 2001. Continuability/ Identifebility of local weak limits for certain normalized intermediate/central rank sequences of order statistics. J. of Indian Statist. Assoc., 39, 1–31. Barakat, H. M., Nigm, E. M., and El-Adll, M. E. 2010. 
Comparison between the rates of convergence of extremes under linear and under power normalization. Statistical Papers, 51(1), 39–52. Barakat, H. M., Nigm, E. M., Ramadan, A. A., and Khaled, O. M. 2011a. Statistical modeling of extreme values with applications to air pollution. J. Appl. Statist., 18(2), 230–245. Barakat, H. M., Nigm, E. M., and El-Adll, M. E. 2011b. Bootstrap for extreme generalized order statistics. Arab J. Sci. Eng., 36, 1083–1090. Barakat, H. M., Nigm, E. M., and Khaled, O. M. 2012. Statistical Modeling of Extreme Values with Applications to Air Pollution. J. Life Science, 9(1), 124–132. Barakat, H. M., Nigm, E. M., and Khaled, O. M. 2013a. Extreme value modeling under power normalization. Applied Math. Modelling, 37, 10162–10169. Barakat, H. M., Nigm, E. M., and Abo Zaid, E. O. 2013b. On the Continuation of the limit distributions of intermediate order statistics under power normalization. J. Statist. Appl. Probab. (JSAP), 2(2), 83–91. Barakat, H. M., Nigm, E. M., and Khaled, O. M. 2014a. Statistical modeling of extremes under linear and power normalizations with applications to air pollution. Kuwait J. Sci., 41(1), 1–19. Barakat, H. M., Nigm, E. M., and Abd Elgawad, M. A. 2014b. Limit theory for bivariate extreme generalized order statistics and dual generalized order statistics. ALEA, Lat. Am. J. Probab. Math. Statist., 11(1), 331–340.


Barakat, H. M., Nigm, E. M., and Abd Elgawad, M. A. 2014c. Limit theory for joint generalized order statistics. REVSTAT, 12(3), 1–22. Barakat, H. M., Nigm, E. M., Khaled, O. M., and Khan, F. M. 2015a. Bootstrap order statistics and modeling study of the air pollution. Comm. Statist. Theory Methods, 44, 1477–1491. Barakat, H. M., Nigm, E. M., and Abd Elgawad, M. A. 2015b. Limit theory for bivariate central and bivariate intermediate dual generalized order statistics. Probab. Math. Statist., 35(2), 267–284. Barakat, H. M., Nigm, E. M., and Khaled, O. M. 2015c. Bootstrap method for central and intermediate order statistics under power normalization. Kybernetika, 51, 923–932. Barakat, H. M., Nigm, E. M., and Khaled, O. M. 2015d. Application of subsample bootstrap technique for extreme value modeling under linear and power normalization. JASS, 21(2), 189–199. Barakat, H. M., Nigm, E. M., and Abo Zaid, E. O. 2016. On the continuation of the limit distribution of central order statistics under power normalization. Applicatines Mathematicae, 43(2), 145–155. Barakat, H. M., Nigm, E. M., and Alaswed, H. A. 2017a. The Hill estimators under power normalization. Applied Mathematical Modelling, 45, 813–822. Barakat, H. M., Omar, A. R., and Khaled, O. M. 2017b. A new flexible extreme value model for modeling the extreme value data, with an application to environmental data. Statist. & Probab. Letters, 130, 25–31. Barakat, H. M., Nigm, E. M., Khaled, O. M., and Alaswed, H. A. 2018. The estimations under power normalization for the tail index, with comparison. AStA Adv. Stat. Anal., 102(3), 431–454. Beirlant, J., Vynckier, P., and Teugels, J. L. 1996. Excess functions and estimation of the extreme value index. Bernoulli, 2, 293–318. Beirlant, J., Dierckx, G., and Guillou, A. 2005. Estimation of the extreme value index and generalized quantile plots. Bernoulli, 11(6), 949–970. Beran, J., Schell, D., and Stehl´ık, M. 2014. 
The harmonic moment tail index estimator: asymptotic distribution and robustness. Ann. Inst. Statist. Math., 66, 193–220. Bickel, P. J., and Freedman, D. A. 1981. Some asymptotic theory for the bootstrap. Ann. Statist., 9, 1196–1217. Box, G., and Cox, D. 1964. An analysis of transformations. J. R. Stat Soc. B., 211–264. Box, G., Jenkins, G., Reinsel, G., and Ljung, G. M. 2015. Time Series Analysis: Forecasting and Control. Holden Day, San Francisco. Bretagnolle, J. 1983. Lois limites du bootstrap de certaines functionnelles. Ann. de l’l. H. P., Sec. B, 19, 281–296. Burkschat, M., Cramer, E., and Kamps, U. 2003. Dual generalized order statistics. Metron, LXI (1), 13–26. Caprani, C. C., O’Brien, E. J., and McLachlan, G. J. 2008. Characteristic traffic load effects from a mixture of loading events on short to medium span bridges. Structural Safety, 30, 394–404. Carey, J. R. 2003. Longevity: The biology and demography of life span. Princeton University Press, Princeton, New Jersey, USA. Castillo, E., Hadi, A. S., Balakrishnan, N., and Sarabia, J. M. 2014. Extreme value and related models in engineering and science applications. New York: John Wiley & Sons.


Chatterjee, S., and Laudato, M. 1997. Genetic algorithms in statistics: procedures and applications. Comm. Statist.- Sim. Comp., 4(26), 1617–1630. Chatterjee, S., Laudato, M., and Lynch, L. A. 1996. Genetic algorithm and their statistical applications: an introduction. Comp. Statist. & Data An., 22, 633– 51. Chibisov, D. M. 1964. On limit distributions of order statistics. Theory Probab. Appl., 9, 142–148. Christoph, G., and Falk, M. 1996. A note on domains of attraction of p-max stable laws. Statist. & Probab. Letters, 28, 279–284. Claxton, L. D., Matthews, P. P., and Warren, S. H. 2004. The genotoxicity of ambient outdoor air, a review: Salmonella mutagenicity. Mutat. Res., 56, 347–399. Coles, S. 2001. Introduction to statistical modelling of extreme values. Springer. Cordeiro, G. M., and de Castro, M. 2011. A new family of generalized distributions. J. Statist. Comput. Sim., 81, 883–898. Cramer, E. 2003. Contributions to generalized order statistics. Habililationsschrift, Reprint, University of Oldenburg. Danielsson, J., Jansen, D. W., and de Vries, C. G. 1996. The method of moment ratio estimator for the tail shape distribution. Comm. Statist.-Theory and Meth., 25(4), 711–720. David, H. 1981. Order statistics. Wiley, New York, (2nd Ed.). David, H. A., and Nagaraja, H. N. 2003. Order statistics. John Wiley Sons. Inc., (3nd Ed.). Davison, A. C., and Hinkley, D. V. 1997. Bootstrap Methods and their Application. Cambridge: Cambridge University Press. Davison, A. C., and Smith, R. L. 1990. Models for exceedances over high threshold. J. R. Stat. Soc. Ser. B., 52, 393–442. de Haan, L. 1970. On regular variation and its application to the weak convergence of sample extremes. Mathematisch Centrum, Amsterdam. de Haan, L. 1976. Sample extremes: An elementary introduction. Statist. Neerland, 24, 161–172. de Haan, L., and Ferriera, A. 2006. Extreme value theory: An introduction. Springer, Series in Operations Research. Deheuvels, P., Haeusler, E., and Mason, D. M. 1988. 
Almost sure convergence of the Hill estimator. Math. Proc. Cambridge Philos. Soc., 371–381. Deheuvels, P., Mason, D., and Shorack, G. 1993. Some results on the influence of extremes on the bootstrap. Ann. de l’l. H. P., Sec. B, 29, 83–103. Dekkers, A. L. M., Einmahl, J. H. J., and de Haan, L. 1989. A moment estimator for the index of an extreme-value distribution. Ann. Statist., 17(4), 1833–1855. Drees, H. 1995. Refined Pickands estimators of the extreme value index. Ann. Statist., 23(6), 2059–2080. Drees, H., and Kaufmann, E. 1998. Selecting the optimal sample fraction in univariate extreme value estimation. Stoch. Proc. Appl., 75, 149–172. Drees, H., de Haan, L., and Resnick, S. 2000. How to make a Hill plot. Ann. Statist., 28(1), 254–274. Drew, J., Glen, A., and Leemis, L. 2000. Computing the cumulative distribution function of the Kolmogorov Smirnov statistic. Computat. Statist. Data Anal., 34(1), 1–15. Efron, B. 1979. Bootstrap Methods: Another Look at the Jackknife. Ann. of Statist., 7, 1–26.


Efron, B. 1982. The jackknife, the bootstrap and other resampling plans. SIAM: CB38. Efron, B., and Gong, G. 1983. A leisurely look at the bootstrap, the jackknife, and cross-validation. The Amer. Statistician, 37, 36–48. Efron, B., and Tibshirani, R. J. 1993. An Introduction to the Bootstrap. New York: Chapman and Hall. El-Dars, F. M., Mohamed, A. M., and Aly, H. A. 2004. Monitoring ambient sulfur dioxide levels at some residential environments in the greater Cairo urban region- Egypt. Environmental Monitoring and Assessment, 95, 269–282. El-Hussainy, F., and Sharobiem, W. 2002. Studies on atmospheric sulfur dioxide values and trends. Al-Azhar Bull Sci., 13(1), 137–151. Eljabri, S., and Nadarajah, S. 2017. The Kumaraswamy GEV distribution. Comm. Statist.-Theory and Meth., 46(20), 10203–10235. Embrechts, P., Kl¨ uppelberg, C., and Mikosch, T. 1997. Modelling extremal events for insurance and finance. Springer-Verlag, Berlin Heidelberg. Falk, M. 1995. Some best estimators for distributions with finite endpoint. Statistics, 27, 115–125. Feller, W. 1979. An introduction to probability theory and its applications. Vol. 2. John Wiley & Sons, New York. Fisher, R. A., and Tippett, L. H. C. 1928. Limiting forms of frequency distribution of the largest or smallest number of sample. Proc. Camb. Phil. Soc., 24, 180–190. Folland, C. K., and Karl, T. R. 2001. Observed climate variability and change. Pages 99–181 of: J. T. Houghton et al., editors. Climate change 2001: The scientific basis. Cambridge University Press, Cambridge, UK. Gaines, S. D., and Denny, M. W. 1993. The largest, smallest, highest, lowest, longest, and shortest: Extremes in ecology. Ecology, 74, 1677–1692. Galambos, J. 1987. The asymptotic theory of extreme order statistics. Kreiger. FI (2nd Ed.). Gnedenko, B. V. 1943. Sur la distribution limite du terme maximum d’une s´erie al´eatoire. Ann. Math., 44, 423–453. Gnedenko, B. V. 1982. On some stability theorems. Stability Problems for Stochastic Models. Proc. 
6th Seminar, Moscow, ed. V. V. Kalashnikov and V. M. Zolotarev, Lecture Notes in Mathematics, 982, Springer-Verlag, Berlin, 24–31. Gnedenko, B. V., and Senocy Bereksy, L. 1982a. On one characteristic of logistics distribution. Dokl. Akad. Nauk. SSSR., 267(6), 1293–1295. Gnedenko, B. V., and Senocy Bereksy, L. 1982b. limit theorems for the extreme terms of a variational series. Dokl. Akad. Nauk. SSSR., 270(3), 523–523. Gnedenko, B. V., and Senocy Bereksy, L. 1983. On the continuation property of the limit distributions of maxima of variational series-translation: Moscow Univ. Matm. Bull. Moscow University. Mathematics Bulletin, New York. Vestnik. Moskov. Univ. Ser. Mat. Mch., 3, 11–20. Gnedenko, B. V., and Sherif, A. A. 1983. limit theorems for the extreme terms of a variational series. Dokl. Akad. Nauk. SSSR., 270(3), 523–523. Gnedenko, B. V., Barakat, H. M., and Hemeda, S. Z. 1985. On the continuation of the convergence of the joint distribution of members of variational series. Dokl. Akad. Nauk. SSSR., 5, 1039–1040. Gomes, M. I., and Martins, M. J. 2001. Generalizations of the Hill estimatorasymptotic versus finite sample behaviour. J. Statist. Plan. Inference, 93, 161–180.

Gomes, M. I., and van Monfort, M. A. J. 1986. Exponentiality versus generalized Pareto quick tests. Pages 185–195 of: Proc. III Internat. Conf. Statistical Climatology. Jürg Hüsler, Rolf-Dieter Reiss.

Grigelionis, B. 2006. Limit theorems for record values using power normalization. Liet. Mat. Rink. 46, No. 4, 492–500; translation in Lithuanian Math. J., 46, 398–405.

Guillou, A., and Hall, P. 2001. A diagnostic for selecting the threshold in extreme-value analysis. J. R. Statist. Soc. B., 63, 293–305.

Gumbel, E. J. 1958. Statistics of Extremes. Columbia University Press, New York-London.

Gutschick, V. P., and BassiriRad, H. 2003. Extreme events as shaping physiology, ecology, and evolution of plants: Toward a unified definition and evaluation of their consequences. New Phytologist, 160, 21–42.

Guttorp, P., Le, N. D., Sampson, P. D., and Zidek, J. V. 1993. Using Entropy in the Redesign of an Environmental Monitoring Network. Pages 175–202 of: G. P. Patil and C. R. Rao (eds.), Amsterdam: North-Holland. Multivariate Environmental Statist.

Hall, P. 1982. On estimating the endpoint of a distribution. Ann. Statist., 10, 556–568.

Hill, B. M. 1975. A simple general approach to inference about the tail of a distribution. Ann. Statist., 13, 331–341.

Hosking, J. R. M. 1994. The four-parameter kappa distribution. IBM Journal of Research and Development, 38, 251–258.

Hosking, J. R. M., Wallis, J. R., and Wood, E. F. 1985. Estimation of the generalized extreme-value distribution by method of probability-weighted moment. Technometrics, 27, 251–261.

Hurairah, A., Ibrahim, N., Daud, I., and Haron, K. 2003. Maximum likelihood estimation of the three parameters of extreme value distribution. In: papers presented at International Conference on Research and Education in Mathematics (ICREM) Bangi, 2–4 April. ICREM.

Hurairah, A., Ibrahim, N. A., Daud, I. B., and Haron, K. 2005. An application of a new extreme value distribution to air pollution data. Manage. Environ. Qual.: Int. J., 16(1), 17–25.

Jenkinson, A. F. 1955. The frequency distribution of the annual maximum (or minimum) values of meteorological events. Quarterly Journal of the Royal Meteorological Society, 81, 158–172.

Johnson, N. L., Kotz, S., and Balakrishnan, N. 1994. Distributions in Statistics: Continuous Univariate Distributions, Vol. 1, 2nd ed., Wiley.

Kamal, A. 2017. Application of different extreme values models for environmental pollution. Ph.D. Dissertation, Port Said University, Egypt.

Kamps, U. 1995. A Concept of Generalized Order Statistics. Teubner, Stuttgart.

Kamps, U., and Cramer, E. 2001. On distribution of generalized order statistics. Statistics, 35, 269–280.

Kan, H. D., and Chen, B. H. 2004. Statistical distributions of ambient air pollutants in Shanghai. Chin. Biomed. Environ. Sci., 17(3), 366–372.

Katz, R. W., Parlange, M. B., and Naveau, P. 2002. Statistics of extremes in hydrology. Advances in Water Resources, 25, 1287–1304.

Khaled, O. M. 2012. Evaluation of ambient Gamma radiation level and air pollutants using bootstrapping extremes models. Ph.D. Dissertation, Zagazig University, Egypt.

Khaled, O. M., and Kamal, A. 2018. Additional extreme distribution for modeling extreme value data. JOEMS, 26(1), 44–56.

Kharin, V. V., and Zwiers, F. W. 2000. Changes in the extremes in an ensemble of transient climate simulations with a coupled atmosphere-ocean GCM. J. of Climate, 13, 3760–3788.

Kramer, O. 2017. Genetic Algorithm Essentials: Studies in Computational Intelligence Volume 679. Springer.

Krewski, D., Burnett, R. T., Goldberg, M. S., Hoover, K., Siemiatycki, J., Abrahamowicz, M., and White, W. H. 2004. Validation of the Harvard Six Cities Study of particulate air pollution and mortality. N. Engl. J. Med., 350, 198–199.

Kuchenhoff, H., and Thamerus, M. 1996. Extreme value analysis of Munich air pollution data. Environ. Ecol. Stat., 3, 127–141.

Kumaraswamy, P. 1980. Generalized probability density-function for double-bounded random-processes. J. of Hydrology, 46, 79–88.

Leadbetter, M. R., Lindgren, G., and Rootzén, H. 1983. Extremes and related properties of random sequences and processes. Springer-Verlag New York Heidelberg Berlin.

Lei, L. 2008. Evaluation of three methods for estimating the Weibull distribution parameters of Chinese pine (Pinus tabulaeformis). J. Forest Science, 54(12), 566–571.

Ludwig, D. 1996. Uncertainty and the assessment of extinction probabilities. Ecological Appl., 6, 1067–1076.

Mached, A. J. 1989. Comment on maximum-likelihood estimation of the parameters of the generalized extreme-value distribution. Appl. Statist., 38, 198–199.

Marlier, M. E., Amir, S. J., Kinney, P. L., and DeFries, R. S. 2016. Extreme Air Pollution in Global Megacities. Current Climate Change Reports. March 2016, 2(1), 15–27.

Mason, D. M. 1982. Laws of large numbers for sums of extreme values. Ann. Probab., 10, 750–764.

Matérn, B. 1960. Spatial Statistics. Republished in Lecture Notes in Statist., vol. 36, New York, Springer.

Mohan, N. R., and Ravi, S. 1993. Max domains of attraction of univariate and multivariate p-max stable laws. Theory Probab. Appl., 37, 632–643.

Monge-Sanz, B. M., and Medrano-Marqués, N. J. 2004. Total ozone time series analysis: a neural network model approach. Nonlinear Proc. Geophys., 11, 683–689.

Moritz, M. A. 1997. Analyzing extreme disturbance events: Fire in Los Padres National Forest. Ecological Appl., 7, 1252–1262.

Nasri-Roudsari, D. 1996. Extreme value theory of generalized order statistics. J. Statist. Plann. Inference, 28, 281–297.

Nasri-Roudsari, D. 1999. Limit distributions of generalized order statistics under power normalization. Comm. Statist. Theory Methods, 28(6), 1379–1389.

Nigm, E. M. 2006. Bootstrapping extremes of random variables under power normalization. Test, 15, 257–269.

Nigm, E. M. 2009. Limit laws for record values under power normalization. J. App. Statist. Science, 17(2), 1–9.

Pancheva, E. 1984. Limit theorems for extreme order statistics under nonlinear normalization. Lecture Notes in Math., 1155, 284–309.

Pancheva, E. 1988. Max-stability. Theory Probab. Appl., 33, 155–158.

Pancheva, E. 1993. Convergence of types under monotonous mappings. Theory Probab. Appl., 38, 551–556.

Pancheva, E. 1994. Extreme value limit theory with nonlinear normalization. In: Galambos, J., et al. (eds.) Extreme Value Theory and Application. Kluwer, Boston, 305–318.

Pancheva, E. 2010. Max-Semistability: A Survey. ProbStat Forum, 3, 11–24.

Paulauskas, V., and Vaičiulis, M. 2013. On the improvement of Hill and some others estimators. Lithuanian Math. J., 53, 336–355.

Paulauskas, V., and Vaičiulis, M. 2017. A class of new tail index estimators. Ann. Inst. Statist. Math., 69, 461–487.

Pereira, T. T. 1994. Second order behaviour of domains of attraction and the bias of generalized Pickands’ estimator. Pages 165–177 of: Galambos, J., Lechner, L. and Simiu, E. (Eds.), Extreme Value Theory and Applications III, Proc. Gaithersburg Conf. NIST special publication 866, Washington.

Perez, P. 2001. Prediction of sulfur dioxide concentrations at a site near downtown Santiago, Chile. Atmospheric Environment, 35, 4929–4935.

Pfaff, B., McNeil, A., and Stephenson, A. 2012. evir: Extreme Values in R. R package version 1.7-3, URL http://CRAN.R-project.org/package=evir.

Pickands, J. III. 1975. Statistical inference using extreme order statistics. Ann. Statist., 3, 119–131.

Prescott, P., and Walden, A. T. 1980. Maximum likelihood estimation of the parameters of the generalized extreme-value distribution. Biometrika, 67, 723–724.

Prescott, P., and Walden, A. T. 1983. Maximum likelihood estimation of the parameters of the generalized extreme-value distribution from censored samples. J. Statist. Comput. Sim., 16, 241–250.

Quintela-del-Rio, A., and Francisco-Fernández, M. 2011. Nonparametric functional data estimation applied to ozone data: Prediction and extreme value analysis. Chemosphere, 82(6), 800–808.

Rahmstorf, S., and Coumou, D. 2011. Increase of extreme events in a warming world. PNAS, 108(44), 17905–17909.

Ravi, S., and Mavitha, T. S. 2016. New limit distributions for extreme under a nonlinear normalization. ProbStat Forum, 9, 1–20.

Reiss, R. D., and Thomas, M. 2007. Statistical Analysis of Extreme Values from Insurance, Finance, Hydrology and Other Fields, 3rd ed. Berlin: Birkhäuser Verlag.

Rényi, A. 1962. Probability theory. Amsterdam.

Rényi, A. 1970. Foundation of probability. Holden-Day Inc.

Resnick, S., and Stărică, C. 1997. Smoothing the Hill estimator. Adv. Appl. Probab., 29, 271–293.

Reyes, H., Vaquera, H., and Villasenor, J. A. 2009. Estimation of trends in high urban ozone levels using the quantiles of (GEV). Environmetrics, 21(5), 470–481.

Ribatet, M. 2009. Generalized Pareto distribution and peaks over threshold. R package versions. 1:1–2 Ed.

Riedel, M. A. 1977. A new version of the central limit theorem. Teor. Verojatnost. i Primenen, 22(1), 187–193.

Rieder, H., Staehelin, J., Maeder, J., Peter, T., Ribatet, M., Davison, A. C., Stübi, R., Weihs, P., and Holawe, F. 2010. Extreme events in total ozone over Arosa: Application of extreme value theory and fingerprints of atmospheric dynamics and chemistry and their effects on mean values and long-term changes. In: Geophysical Research Abstracts, 12, EGU General Assembly 2010, held 2–7 May 2010 in Vienna, Austria, p. 11525.

Ripley, B. D. 1981. Spatial Statistics. New York: Wiley.

Rossberg, H. J. 1974. On a problem of Kolmogorov concerning the Normal distribution. Teor. Verojatnost. i Primenen, 19(4), 824–828.

Rossberg, H. J. 1995. Limit theorems involving restricted convergence. Theory Probab. Appl., 39(2), 298–314.

Rulfova, Z., Buishand, A., Roth, M., and Kyselya, J. 2016. A two-component generalized extreme value distribution for precipitation frequency analysis. J. of Hydrology, 534, 659–668.

Sarhan, A. E., and Greenberg, B. G. 1962. Contributions to order statistics. John Wiley and Sons, New York.

Schlesinger, R. B., and Cassee, F. 2003. Atmospheric secondary inorganic particulate matter: the toxicological perspective as a basis for health effects risk assessment. Inhal. Toxicol., 15(3), 197–235.

Segers, J. 2005. Generalized Pickand estimators for the extreme value index. J. Statist. Plan. Inference, 28, 381–396.

Sfetsos, A., Zoras, S., Bartzis, J. G., and Triantafyllou, A. G. 2006. Extreme value modeling of daily PM10 concentrations in an industrial area. Fresenius Environ. Bull., 15(8), 841–845.

Sharma, P., Khare, M., and Chakrabarti, S. P. 1999. Application of extreme value theory for predicting violations of air quality standards for an urban road intersection. Transportation Research, Part D, 4, 201–16.

Smirnov, N. V. 1952. Limit distribution for terms of a variational series. Amer. Math. Soc. Trans. Ser., 11, 82–143.

Smirnov, N. V. 1967. Some remarks on limit laws for order statistics. Theory Probab. Appl., 12, 337–339.

Smirnov, N. V. 1970. Selected works in the theory of probability and mathematical statistics. Nauka, Moscow.

Smith, R. L. 1985. Maximum likelihood estimation in a class of nonregular cases. Biometrika, 72, 67–90.

Smith, R. L. 1989. Extreme value analysis of environmental time series: An application to trend detection in ground-level ozone. Statist. Sci., 4, 367–377.

Sreehari, M. 2009. General max-stable laws. Extremes, 12(2), 187–200.

Stehlík, M., Potocký, R., Waldl, H., and Fabián, Z. 2010. On the favorable estimation for fitting heavy tailed data. Comput. Statist., 25, 485–503.

Subramanya, U. R. 1994. On max domains of attraction of univariate p-max stable laws. Statist. & Probab. Letters, 19, 271–279.

Sungpurwalla, N. D. 1972. Extreme values from a lognormal law with application to air pollution problems. Technometrics, 14, 703–711.

Tatsuya, I., and Kanda, J. 2002. Comparison of correlated Gumbel probability models for directional maximum wind speeds. J. Wind Engin. and Ind. Aerodynamics, 90, 1631–1644.

Teugels, J. L. 1981. Limit theorems on order statistics. Ann. Probab., 9, 868–880.

Tolikas, K., and Gettinby, G. D. 2009. Modelling the distribution of the extreme share returns in Singapore. J. of Emp. Finance, 16, 254–263.

von Mises, R. 1936. La distribution de la plus grande de n valeurs. In Selected Papers, Volume II, pages 271–294. American Math. Society, Providence, RI.

Weinstein, S. B. 1973. Theory and application of some classical and generalized asymptotic distributions of extreme values. IEEE Trans. Information Theory, IT-19(2), 148–154.

Wu, C. Y. 1966. The types of limit distributions for terms of variational series. Sci. Sinica, 15, 749–762.

Yun, S. 2002. On a generalized Pickand estimator of the extreme value index. J. Statist. Plann. Inference, 102, 389–409.

Zhou, S., Deng, Q., and Liu, W. 2012. Extreme air pollution events: Modeling and prediction. J. of Central South Un., 19(6), 1668–1672.

Zwiers, F. W., Zhang, X., and Feng, Y. 2012. Anthropogenic influence on long return period daily temperature extremes at regional scales. J. of Climate, 24, 881–892.

Author index

Abd Elgawad, M. A., 246, 247

Carey, J. R., 79, 247

Abo Zaid, E. O., 247

Castillo, E., 28, 247

Abrahamowicz, M., 250

Cassee, F., 114, 253

Adebanji, A. O., 238, 239, 245

Chakrabarti, S. P., 253

Adeyemi, S., 238, 239, 245

Chatterjee, S., 12, 247

Alves, M. I. F., 154, 155, 245

Chen, B. H., 81, 251

Alaswed, H. A., 202, 245, 247

Chibisov, D. M., 28, 30, 31, 98, 251

Aly, H. A., 249

Amir, S. J., 251

Christoph, G., 38, 49, 55, 59, 62, 71, 140, 162, 167–169, 176–179, 251

Angus, J., 84, 245

Claxton, L. D., 115, 248

Arcones, M. A., 83, 245

Coles, S., 9, 159, 161, 165, 248

Arnold, B. C., 17, 245

Cordeiro, G. M., 239, 248

Athreya, B. K., 83–85, 106, 111, 245

Cox, D., 231, 247

Balakrishnan, N., 245

Coumou, D., 229, 252

Bali, T. B., 232, 245

Cramer, E., 69, 71, 76, 106, 247

Balkema, A. A., 24, 25, 28, 35, 125, 245

Danielsson, J., 157, 179, 248

Barakat, H. M., 6, 31, 38, 43, 44, 46, 47, 50, 51, 52, 54, 55, 57, 59, 60–62, 63, 64, 66, 67, 68, 70–79, 82, 83, 89, 92, 93, 98, 100–104, 106–108, 124, 139, 140, 142, 143, 165, 168, 175–178, 180, 185, 221, 227, 228, 245–247, 249

Daud, I., 250

BassiriRad, H., 80, 250

Bartzis, J. G., 253

de Haan, L., 24, 25, 26, 28, 35, 43, 125, 155, 156, 169, 181, 245

Beirlant, J., 155, 156, 160, 162, 247

Deheuvels, P., 86, 155, 248

Beran, J., 157, 176, 247

Dekkers, A. L., 154, 156, 179, 248

Bickel, P. J., 84, 247

Deng, Q., 254

Box, G., 117, 231, 247

Denny, M. W., 80, 249

Bretagnolle, J., 83, 247

de Vries, C. G., 248

Buishand, A., 253

Dierckx, G., 247

Burkschat, M., 68, 70–72, 247

Drees, H., 155, 160, 163, 220, 225, 248

Burnett, R. T., 251

Drew, J., 10, 248

Caprani, C. C., 231, 238, 247

Efron, B., 14, 15, 82, 248, 249

David, H., 18–20, 248

Davison, A. C., 14, 161, 165, 248, 252

de Castro, M., 239, 248

DeFries, R. S., 251

Einmahl, J. H. J., 248

Hosking, J. R. M., 8, 238, 251

El-Adll, M. E., 76, 77, 246

Hurairah, A., 81, 82, 250

El-Dars, F. M., 116, 249

Ibrahim, N., 250

El-Hussainy, F., 116, 249

Johnson, N. L., 141, 250

Eljabri, S., 238, 239, 249

Kamal, A., 152, 231, 232, 250

El-Shandidy, M. A., 92, 246

Kamps, U., 68–71, 247

Embrechts, P., 155, 160, 162, 196, 249

Kan, H. D., 81, 250

Gaines, S. D., 80, 249

Kanda, J., 81, 253

Galambos, J., 20, 27, 47, 129, 249

Karl, T. R., 80, 249

Gettinby, G. D., 231, 253

Katz, R. W., 80, 250

Giné, E., 83, 245

Kaufmann, E., 160, 248

Glen, A., 248

Khaled, O. M., 112, 246, 247

Goldberg, M. S., 251

Khan, F. M., 247

Gomes, M. I., 156, 158, 203, 214–217, 219, 250

Kharin, V. V., 231, 251

Gong, G., 82, 249

Khare, M., 253

Gnedenko, B. V., 21, 59, 78, 249

Kinney, P. L., 251

Greenberg, B. G., 20, 253

Klüppelberg, C., 249

Grigelionis, B., 61, 250

Kotz, S., 250

Guillou, A., 160, 247, 250

Kramer, O., 12, 251

Gumbel, E. J., 17, 21, 22, 80, 250

Krewski, D., 114, 251

Guttorp, P., 113, 250

Kuchenhoff, H., 80, 251

Gutschick, V. P., 80, 250

Kumaraswamy, P., 239, 249, 251

Hall, P., 155, 160, 246

Kyselya, J., 253

Hemeda, S. Z., 249

Laudato, M., 12, 247

Hinkley, D. V., 14, 248

Le, N. D., 250

Hoover, K., 251

Jansen, D. W., 248

Leadbetter, M. R., 21, 23, 27, 50, 58, 59, 92, 251

Jenkins, G., 247

Leemis, L., 248

Jenkinson, A. F., 22, 250

Lei, L., 191, 221, 251

Fabián, Z., 246

Lindgren, G., 251

Falk, M., 38, 49, 55, 59, 62, 71, 140, 162, 167–169, 176–179, 246

Liu, W., 254

Feller, W., 5, 249

Lynch, L. A., 248

Feng, Y., 254

Ludwig, D., 79, 251

Ferriera, A., 155, 156, 169, 181, 245

Mached, A. J., 9, 251

Fisher, R. A., 21, 79, 249

Maeder, J., 252

Folland, C. K., 80, 249

Marlier, M. E., 82, 251

Francisco-Fernández, M., 81, 252

Martins, M. J., 156, 158, 250

Freedman, D. A., 84, 247

Mason, D. M., 20, 154, 247

Fukuchi, J., 83–85, 106, 111, 245

Matérn, B., 113, 251

Hadi, A. S., 247

Matthews, P. P., 248

Haeusler, E., 248

Mavitha, T. S., 64–66, 252

Haron, K., 250

McLachlan, G. J., 247

Hill, B. M., 154, 155, 163, 164, 167, 250

McNeil, A., 252

Holawe, F., 252

Medrano-Marqués, N. J., 117, 251

Ljung, G. M., 247

Mikosch, T., 249

Sarhan, A. E., 20, 253

Mohamed, A. M., 249

Segers, J., 148, 153

Mohan, N. R., 38, 47, 49, 139, 181, 251

Senocy Bereksy, L., 78, 249

Monge-Sanz, B. M., 117, 251

Schell, D., 247

Moritz, M. A., 80, 251

Schlesinger, R. B., 114, 253

Nagaraja, H. N., 19, 245

Sfetsos, A., 82, 253

Nadarajah, S., 239, 239, 249

Sharobiem, W., 116, 249

Nasri-Roudsari, D., 46, 70–72, 251

Sharma, P., 81, 253

Naveau, P., 250

Sherif, A. A., 78, 249

Nigm, E. M., 6, 47, 61, 67, 83, 86–89, 142, 246, 247

Shorack, G., 248

O’Brien, E. J., 247

Smirnov, N. V., 28, 31, 33–37, 50, 51, 59, 72, 92, 107, 253

Omar, A. R., 31, 43, 44, 50–52, 54, 57, 59, 60, 62–64, 66–68, 100–104, 246

Pancheva, E., 4, 37–45, 47, 50, 57, 59, 168, 251, 252

Parlange, M. B., 250

Paulauskas, V., 158, 252

Pereira, T. T., 155, 252

Perez, P., 81, 252

Peter, T., 252

Pfaff, B., 162, 252

Pickands, J. III., 20, 25, 143, 144, 154, 155, 160, 164, 252

Siemiatycki, J., 251

Smith, R. L., 8, 9, 80, 145, 161, 165, 248

Sreehari, M., 43, 44, 58, 253

Staehelin, J., 252

Stübi, R., 252

Stărică, C., 160, 248

Stehlík, D., 157, 247

Stehlík, M., 157, 253

Stephenson, A., 252

Subramanya, U. R., 38, 253

Sungpurwalla, N. D., 81, 254

Potocký, R., 246, 253

Tatsuya, I., 81, 253

Prescott, P., 9, 252

Teugels, J. L., 20, 247

Quintela-del-Rio, A., 81, 252

Thamerus, M., 80, 251

Rahmstorf, S., 229, 252

Tibshirani, R. J., 14, 249

Ramachandran, B., 31, 78, 247

Tippett, L. H. C., 21, 79, 249

Ramadan, A. A., 246

Thomas, M., 125, 130, 141, 250, 252

Ravi, S., 38, 47, 49, 139, 181, 251

Tolikas, K., 231, 253

Reinsel, G., 247

Triantafyllou, A. G., 253

Reiss, R. D., 125, 130, 141, 250, 252

Vaičiulis, M., 158, 252

Resnick, S., 160, 248

Vaquera, H., 252

Reyes, H., 80, 252

van Monfort, M. A. J., 203, 214–217, 219, 250

Rényi, A., 6, 252

Villasenor, J. A., 252

Ribatet, M., 165, 252

von Mises, 22, 26, 46, 253

Riedel, M. A., 78, 252

Vynckier, P., 247

Rieder, H., 117, 252

Walden, A. T., 9, 252

Ripley, B. D., 113, 253

Waldl, H., 246, 253

Rootzén, H., 251

Wallis, J. R., 250

Rossberg, H. J., 78, 253

Warren, S. H., 248

Roth, M., 253

Weihs, P., 252

Rulfova, Z., 238, 253

Weinstein, S. B., 38, 254

Sampson, P. D., 250

White, W. H., 251

Sarabia, J. M., 247

Wood, E. F., 250

Wu, C. Y., 28, 31–33, 35, 37, 59, 62, 254

Yun, S., 155, 254

Zidek, J. V., 250

Zhang, X., 254

Zhou, S., 82, 254

Zoras, S., 253

Zwiers, F. W., 231, 251

Subject index

AccuWeather, 230

Air pollution, 2, 80–82, 114, 115, 118, 124, 203, 206, 208, 232, 233

Akaike information criterion (AIC), 233–238

Almost sure convergence ($\xrightarrow[n]{a.s.}$), 2

Asymptotic bias (AB), 160

Asymptotic mean squared-error (AM), 160, 233

Asymptotic variance (AV), 160

Bayesian Akaike criterion (BAC), 233

Beta distribution, 71, 106

Convergence in distribution ($\xrightarrow[n]{d}$), 2, 3

Domain of attraction, 23, 24, 26, 28, 31, 32, 36, 37, 42, 43, 47, 49, 57, 65, 66, 71, 85, 86, 88, 89, 130, 140, 155, 167, 168, 205 Dual generalized order statistics (dgos), 68–73, 76, 77 Dubey estimate, 46, 141, 228 Exponential normalization, 64–68 Extreme-value index (EVI), 22, 25, 82, 139, 142, 154–156, 158–150, 162, 165, 167, 169, 177, 179–183, 184, 186–191, 193, 196, 197, 199–202, 217, 220, 221, 226, 228, 229, 232

Block maxima (BM), 22, 23, 25, 26, 46, 82, 94, 112, 124–126, 140, 141, 127, 205, 221, 229, 232, 233–238

Extreme value theory, 20, 21, 23, 72, 79–81, 156, 168, 176, 228

Bootstrap technique, 13–17, 89, 94, 96, 97, 104, 112, 124, 126, 128

Bootstrap confidence intervals, 16

Bootstrap hypothesis tests, 16

Fréchet type, 21, 22, 28, 29, 166, 167, 203, 205, 206

Box-Cox-GEVL (Box-Cox-GL), 232, 233, 235

Box-Cox transformation, 231

Central order statistics, 1, 6, 20, 21, 28, 30, 33, 50–57, 66–68, 71, 73, 83, 92, 93, 100, 101

Chibisov’s condition, 30, 31, 67

Class of self-decomposable laws, 41

Class of max-stable laws (MS-laws), 42

Coefficient of variation criterion (CVC), 197–202, 223, 225

evir package in R, 162, 226

Gamma radiation, 117, 119, 122, 123, 129, 131, 134–136, 138, 139 Generalized extreme value distribution in an L-model (GEVL), 7, 8, 22, 23, 25, 28, 46, 81, 89–91, 124–129, 131, 137–139, 141, 143, 152, 165, 176, 178, 196, 197, 199, 206, 227–229, 232, 234, 235, 239, 240 Generalized extreme value distribution in a P-model (GEVP), 46, 89–91, 140, 145, 146, 149–153, 171, 186, 190, 191, 196, 198, 200, 201, 227, 231, 234, 236, 238, 239

Coefficient of variation under linear normalization (CVCL), 197–202, 224, 225

Generalized Hill estimator, 156

Coefficient of variation under power normalization (CVCP), 197–202, 225

Generalized order statistics (gos), 68–72, 75, 76, 77, 79

Continuous mapping theorem, 3

Generalized Pareto distribution in L-model (GPVL), 25, 26, 82, 112, 125, 126, 128, 136, 138, 156, 159–166, 209, 211, 214–220, 231, 233, 234

Convergence in probability ($\xrightarrow[n]{p}$), 2

Convergence in the rth mean ($\xrightarrow[n]{r}$), 2

Generalized inverse, 27, 48, 69

Generalized Pareto distribution in P-model (GPDP), 46, 140, 142–144, 147, 148, 159, 160–162, 167, 180–182, 201, 202, 209, 211, 214, 218–221, 228, 231, 232, 236

Mean excess plot under power normalization (MEPP), 162, 163, 202

Generalized Pareto DF under linear-power normalization (GPVLP), 228

Min-Fréchet type, 22, 28

Genetic algorithms (GA), 11–13, 152, 153

Gomes and van Monfort test, 203, 215, 216, 217, 219

Gumbel type, 21, 22, 28, 29, 205, 206, 215

Harmonic moment estimator under linear normalization (HMEL), 157, 158, 175

Harmonic moment estimator under power normalization (HMEP), 175, 176

m-generalized order statistics (m-gos), 69, 106, 107

Min-Gumbel type, 22, 28, 29

Modification of Khinchin’s Theorem, 39, 45, 60, 66, 67, 142, 143, 228

Moment estimator, 156, 157, 176, 177, 221, 239, 243

Moment estimators under linear normalization (MEL), 157, 158, 160, 175–179, 191, 227, 232

Harmonic t-Hill estimator, 157, 175, 176, 202

Moment estimators under power normalization (MEP), 176

Hill estimator (HE), 82, 155–158, 168, 169, 175–178, 179, 180, 193, 200–202, 209

Moment ratio estimator, 157, 167, 176–178, 202, 241, 244

Hill estimator under linear normalization (HEL), 155–157, 159, 160, 163, 168, 169, 171, 205, 209, 221

Negative Hill estimator, 155

Hill estimator under power normalization (HEP), 165, 168–176, 180–191, 201, 202, 221–223

New Hill plot (NHP), 184, 167 Nitric oxide (NO), 233–236 Nitrogen dioxide (NO2 ), 81, 82, 114, 117, 134, 229, 230–235

Hill plot (HP), 160, 161, 163, 164, 170, 177, 184, 185, 193, 194, 195, 209, 221

Nonlinear normalization, 18, 37–39, 41, 46, 50, 51, 55, 57, 59, 61–63

Hill Plot under linear normalization (HPL), 163, 164, 209–211, 213, 215–217, 219–221

Normal λ−attraction, 34–37, 50, 51, 55, 57

Hill plot under power normalization (HPP), 170, 209–222

Particulate matter (PM), 81, 82, 114, 115, 119, 120, 127–129, 131, 133, 135, 136, 138–139, 205–210, 213, 215–217, 219, 220, 224, 229

Intermediate order statistics, 20, 28, 30–32, 57–59, 61–64, 73–75, 92, 98–100, 102, 103, 154, 159 Khinchin’s type theorem, 5, 23, 93, 101 Kolmogorov-Smirnov (K-S) test, 9, 10, 95, 105, 124, 127, 135, 230, 233–237

Ozone, 80, 81, 114, 116, 117, 122, 130

Peak over threshold (POT), 25, 46, 82, 86, 89, 112, 124–126, 128, 135, 136, 140–143, 145, 159, 165, 206, 214–217, 219, 227, 228, 231, 233–235 Pfeiffer record values, 68

Likelihood ratio test, 9, 203

Pickands estimator, 154, 155

Linear model (L-model), 92, 93, 95, 99, 139, 140, 152, 155, 158, 167, 180, 200, 202, 203, 226, 231, 237, 239, 240, 241–243

Pickands-plot (PE), 160

Linear normalization, 6, 7, 19–21, 25, 28, 31, 37–39, 41, 45, 57, 59, 62–64, 77, 84, 85, 92, 98, 100, 111, 140, 155–164, 176, 191, 196, 228

Linear-power normalization, 140, 227–229

Location-invariant (LI), 156–159, 180

London air quality network (LAQN), 232

Max-automorphism mapping, 39

Maximum likelihood method, 6

Max-Weibull type, 21, 22, 28, 29

Mean excess plot under linear normalization (MEPL), 161, 162, 211, 212, 215–221

POT stability property, 46, 142, 143

Power max stable DFs, 38

Power model (P-model), 100, 101, 103, 139, 140, 142, 145, 152, 155, 167, 179, 180, 193, 200, 202, 203, 226, 231, 233, 238, 239, 242–243

Power normalization, 38, 39, 45–47, 49, 50, 54, 55, 57, 59, 61, 62, 64, 67, 71, 79, 85–87, 92, 100, 102, 118, 140, 141, 161–163, 168, 170–173, 175–194, 196, 219–222

Progressive type II censored order statistics, 69

P-P-plot, 205, 207–209

Q-Q plot, 165, 205, 207–209

Record value, 61, 68, 71, 72

Re-sampling, 13, 14, 124

Restricted convergence, 77, 78

Return level plot (RLP), 205

Scale-invariant (SI), 156–158, 180

Slutzky’s theorem, 3

Stability plot, 161, 164

Standard error (SE), 185

Stochastically bound, 5

Strong consistence, 158

Sub-sample bootstrap method, 83, 86, 88, 89, 90–92, 94, 96, 98, 102, 104, 106, 108, 110

Sulphur dioxide (SO2), 81, 82, 114, 115, 116, 119, 121, 122, 127, 128, 129, 131, 132, 135, 136–139, 203, 205, 206, 208, 209, 214, 215, 218, 219–224, 229–234

Uniform assumption (UA), 39

Threshold choice plot (TCP), 161, 164, 165

Weak convergence ($\xrightarrow[n]{w}$), 2, 3

Weibull type, 22, 28, 29, 141

World Health Organization (WHO), 115