Handbook of Hydrometeorological Ensemble Forecasting 9783642404573


Attributes of Forecast Quality A. Allen Bradley, Julie Demargne, and Kristie J. Franz

Contents

1 Introduction
2 Ensemble Verification Process
3 Mathematical Formulation of the Verification Process
4 Aspects of Forecast Quality
5 Common Measures of Forecast Quality with Distribution Moments
  5.1 Calibration-Refinement Decomposition
  5.2 Likelihood-Base Rate Decomposition
  5.3 Skill Score and Its Decompositions
6 Forecast Quality of Hypothetical Ensemble Forecasts
  6.1 Illustration of Forecast Quality for a Single-Valued Forecast
  6.2 Illustration of Forecast Quality for a Probability Forecast for a Discrete Event
  6.3 Illustration of Forecast Quality for an Ensemble Probability Distribution Forecast
7 Practical Considerations
  7.1 Diagnostic Verification and Ensemble Forecasts
  7.2 Absolute Versus Relative Measures
  7.3 Choosing a Verification Data Sample
  7.4 Sampling Uncertainties for Small Data Samples
8 Conclusions
References

A.A. Bradley (*)
IIHR–Hydroscience & Engineering, The University of Iowa, Iowa City, IA, USA
e-mail: [email protected]

J. Demargne (*)
HYDRIS Hydrologie, Saint Mathieu de Tréviers, France
e-mail: [email protected]; [email protected]

K.J. Franz (*)
Department of Geological and Atmospheric Sciences, Iowa State University, Ames, IA, USA
e-mail: [email protected]

© Springer-Verlag Berlin Heidelberg 2016
Q. Duan et al. (eds.), Handbook of Hydrometeorological Ensemble Forecasting,
DOI 10.1007/978-3-642-40457-3_2-1


Abstract

Forecast verification is a process used to assess the quality of hydrometeorological ensemble forecasts. This chapter describes the many aspects of forecast quality using a distributions-oriented approach. Using the joint distribution of forecasts and observations, or one of its factorizations into a conditional and marginal distribution, the aspects of forecast quality are defined. Hypothetical ensemble forecasts are then used to illustrate aspects of forecast quality. The hypothetical ensemble forecasts are used to construct single-valued forecasts, probability forecasts for an event, and ensemble probability distribution forecasts. Their forecast quality is then diagnosed using visual comparisons and numerical comparisons of forecast quality measures. The examples illustrate that a single aspect of forecast quality is insufficient and that many aspects are needed to understand the nature of the forecasts. Some practical considerations in the application of the framework to ensemble forecast verification are discussed.

Keywords

Forecast verification • Ensemble forecasts • Deterministic forecasts • Probabilistic forecasts • Distributions-oriented approach

1

Introduction

Everyone wants forecasts to be good – from the forecasters who issue them to those who use them to make decisions. In part, what makes a forecast “good” depends on one’s point of view and purpose with respect to the forecast (Murphy 1993). Murphy (1993) describes three types of forecast “goodness”: consistency (Type 1), quality (Type 2), and value (Type 3). Consistency means that the forecast corresponds to a forecaster’s best judgment and true state of knowledge. Consistency is often taken for granted, since a forecast system should provide the best and most appropriate information to forecast users. Quality is the ability of the forecast to predict an event well according to some objective criteria (Murphy 1993). Value refers to the benefits realized by the forecast users when the forecast is used to inform their decision making. Evaluating consistency and value inherently requires consideration of the forecaster and user, and is dependent upon the situation under which the forecast is made or used. Quality, in contrast, is assessed objectively by comparing predicted events with the corresponding observations (Brier and Allen 1951; Murphy and Winkler 1987; Wilks 2011; Jolliffe and Stephenson 2011).

Forecast verification is the process of assessing the quality of forecasts (Murphy and Winkler 1987). Fundamentally, forecast verification compares forecasts with observations. The comparison can be as simple as a plot of a sample of forecasts and observations. But more often it produces one or more indices or scores from which the forecast-observation pair or pairs can be interpreted (Brier and Allen 1951). Forecast verification is essential to any forecasting system (Murphy and Winkler 1987). It is the means by which the performance of forecasting methods can be documented and compared to each other or against standards. Forecast verification provides guidance for improving forecast systems. It is a tool for analyzing forecasts under different conditions (e.g., low vs. high flows, specific seasons) and assessing how suitable forecasts might be for different applications (e.g., droughts or floods) (Brier and Allen 1951; Murphy and Winkler 1987; Murphy 1993; Welles et al. 2007).

Assessing forecast quality involves the statistical examination of the joint distribution of forecasts and observations, with higher quality forecasts displaying a greater degree of correspondence with the observations (Murphy and Winkler 1987). There are many scores that can be used to quantify different aspects of forecast quality. A variety of different scores should be used because a single score cannot describe all the properties of the relationship between forecasts and observations (Murphy 1993, 1997). Additionally, no one set of scores is suitable for all applications. Depending on the forecast type, the forecast application, and the expertise of the user, a set of scores can be chosen to answer questions of interest and give the user information upon which he/she can make decisions (Jolliffe and Stephenson 2011).

The chapter is organized as follows. Section 2 describes the ensemble forecast verification process and defines the common forms of forecasts (deterministic and probabilistic) and observations (discrete and continuous). Section 3 reviews the distributions-oriented approach for forecast verification, and the role of the joint distribution of forecasts and observations in the verification process. Section 4 defines the various aspects of forecast quality using the distributions-oriented framework. Section 5 defines common measures of forecast quality using the moments of the joint distribution. Section 6 illustrates aspects of forecast quality for alternative hypothetical ensemble forecasts.
The ensemble forecasts are evaluated in three ways: as single-valued forecasts using the ensemble mean, as probability forecasts for a discrete event occurrence, and as ensemble probability distribution forecasts of the outcome. The forecast quality for each is diagnosed using visual comparisons and numerical comparisons of forecast quality measures. Section 7 discusses some practical considerations in the application of the framework to ensemble forecast verification. Section 8 concludes the chapter.

2

Ensemble Verification Process

Forecasts come in many different forms. In a broad sense, a forecast can be either deterministic or probabilistic. Deterministic forecasts consist of unqualified statements that a single outcome will occur (single-valued forecast with no statement of uncertainty) (Wilks 2011). Probabilistic forecasts express the degree of uncertainty that an outcome or set of possible outcomes will occur. The observation corresponding to the forecast can be either discrete or continuous (Potts 2011). An observation is discrete when the outcome being predicted can have only a limited set of possible values (e.g., a flood/no flood outcome has a discrete observation). An observation is continuous when the outcome can have an infinite number of possible values (e.g., a spring flood volume outcome can have a continuous observation).


Ensemble forecasts consist of multiple estimates of some future event. Most commonly, hydrologic ensembles are a set of streamflow time series. A forecast is derived from the ensemble by taking a summary measure, subsampling, thresholding, categorizing, or assigning a probability distribution to the forecast data (Potts 2011). One of the most common approaches is to choose the mean or median of the ensemble, thereby converting an ensemble prediction into a deterministic forecast. However, this removes all measures of uncertainty from the forecast and discards potentially useful information contained in the ensemble (Franz et al. 2003; Demargne et al. 2010). Dichotomous forecasts are another type of deterministic forecast that make statements about whether an event will or will not happen. Dichotomous forecasts can be derived from the ensemble through thresholding, such as flood or no flood. Another approach to thresholding ensembles is to make statements about the possibility of a flood occurring or not occurring by determining the frequency with which the ensemble members predict either category, i.e., a probabilistic forecast. Probabilistic forecasts provide explicit statements of uncertainty regarding how well the future hydrology is known (Krzysztofowicz 2001; Wilks 2011). Probability distribution forecasts are generated by assuming an appropriate distribution for the ensemble members. A forecast based on the empirical cumulative distribution is a direct measurement of the sample, or ensemble. A forecast based on a parametric distribution uses a characteristic shape with specified distribution parameters that describe the magnitude and variation of the variable of interest (Wilks 2011). In this case, the distribution parameters represent properties of the population, although they are estimated from the sample. There are a variety of ways to express the continuous probability distribution function of the outcome being predicted.
Probability distribution forecasts are often displayed as cumulative frequency distribution functions or exceedance probability forecasts, which describe the probability that the event will be less than a given value. Although hydrologic ensembles and observations are in essence discrete variables with a finite number of significant figures, in practice, these data are often treated as continuous variables and the forecasts are displayed as a smooth cumulative distribution function. An empirical distribution, which is a step function with probability jumps of 1/n at each of the n data points, can be transformed to a smooth line through kernel smoothing (Wilks 2011). Often it is necessary or convenient to convert the ensemble traces to a set of discrete variables through categorizing or defining event thresholds. Categorical predictands are discrete predictands that can take on only one of a finite set of values (Potts 2011). Multicategory probabilistic forecasts can be created by approximating the continuous probability distribution function with quantiles, probability categories, or event thresholds (Krzysztofowicz 2001; Potts 2011). For example, Franz et al. (2003) evaluated seasonal water supply ensemble hindcasts using five forecast categories with threshold values based on the historical streamflow record, specifically 0–10 %, >10–30 %, >30–70 %, >70–90 %, and > 90 % of the climatology. Ensemble members are assigned to bins or categories, and the frequency of the ensemble members in each bin is used to compute a probabilistic forecast.
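The conversions described above (ensemble mean or median, thresholding to an event probability, and binning into climatological categories) can be sketched as follows. This is a minimal illustration with invented ensemble values, flood threshold, and category edges; it is not code from the chapter, though the five-category scheme mirrors the one cited from Franz et al. (2003):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical ensemble: 50 members' seasonal flow volumes (invented values)
ensemble = rng.lognormal(mean=6.0, sigma=0.5, size=50)

# 1) Single-valued (deterministic) forecast: ensemble mean or median
det_mean = ensemble.mean()
det_median = np.median(ensemble)

# 2) Probability forecast for a discrete event (e.g., flow above a flood
#    threshold): the fraction of members exceeding the threshold
flood_threshold = 600.0  # assumed value for illustration
p_flood = (ensemble > flood_threshold).mean()

# 3) Multicategory probability forecast: bin members by quantiles of a
#    stand-in historical record (the "climatology")
climatology = rng.lognormal(mean=6.0, sigma=0.5, size=1000)
inner_edges = np.quantile(climatology, [0.10, 0.30, 0.70, 0.90])
categories = np.digitize(ensemble, inner_edges)   # member -> category 0..4
category_probs = np.bincount(categories, minlength=5) / ensemble.size
```

Each derived forecast discards some ensemble information: the category probabilities, for instance, retain the spread but not the individual traces.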


The forecast verification process starts with the collection of well-defined forecasts and observations. As the preceding discussion illustrates, ensemble forecast information can be used in different ways and by different users, so one could choose alternative forms of a forecast for verification (Potts 2011). The simplest types of forecasts constructed from ensemble predictions are ones where the forecast is a single number (either deterministic or probabilistic) and the observation is a single number (either discrete or continuous). One example of an elemental forecast would be a deterministic flow volume forecast; the mean ensemble flow volume is used as a single-valued forecast, and the observation is the future flow volume. Another example would be a probabilistic flood event forecast; the forecast is the probability of a flood occurrence (a number from 0 to 1), and the observation is whether the flood occurred or not (represented by a discrete value of 1 if a flood occurs, or 0 if it does not). Verification of elemental forecasts like these is straightforward. However, given the information available in an ensemble forecast, other (more complex) forms are frequently constructed as well. For instance, for probability forecasts for multiple categories constructed from ensembles, the forecast is in the form of a probability distribution function over the discrete category outcomes. For probability forecasts of a continuous outcome, the forecast is in the form of a probability distribution of a continuous variable. Verification of forecasts in these forms (where the forecast itself is no longer a single number) is much more challenging. Regardless of the form of the ensemble forecast chosen for verification, once a sufficiently large sample of well-defined forecasts and observations has been collected, the verification sample can be used to explore the relationship between the forecast-observation pairs.
Attributes of this relationship describe different aspects of the quality of the forecasts. In the next section, a general framework is presented for characterizing forecast quality based on the statistical relationship between forecasts and observation. The framework was developed for the elemental ensemble forecast verification problem – the case where the forecast is a single number (either deterministic or probabilistic) and the observation is a single number (either discrete or continuous). However, the aspects of forecast quality developed for the elemental case also have meaning for more complex probability distribution forecasts. As will be seen in the ensemble forecasting examples shown in Sect. 6, the framework can be extended to describe aspects of forecast quality for probability distribution forecasts from ensemble predictions.

3

Mathematical Formulation of the Verification Process

Murphy and Winkler (1987) introduced the distributions-oriented (DO) approach – also called diagnostic verification (Potts 2011) – as a general framework for forecast verification. The approach explicitly defines a stochastic process in which: (1) the forecast and observation are treated as random variables, and (2) each forecast-observation pair is assumed to be independent of all other pairs and identically distributed.


Consider the case where the forecast is a single number (either deterministic or probabilistic) and the observation is a single number (either discrete or continuous). Let F be the random variable denoting the forecast and X the random variable denoting the observation of the underlying variable of interest. Let f and x denote the numerical values of these two variables. The relationship between the forecast and the observation is thus defined by the joint probability distribution function (pdf) of the forecast variable and the corresponding observed variable:

pFX(f, x) = Pr(F = f, X = x)  (1)

In other words, for any forecast of this type – a single-valued deterministic forecast of a continuous flow variable, or a probability forecast (between 0 and 1) of a discrete event outcome (either 1 or 0) – the joint pdf contains all the relevant information about their relationship. The distributions-oriented approach focuses on describing the characteristics of the joint distribution of forecasts and observations; it has been developed and used extensively for both probabilistic and deterministic meteorological forecast verification (see Bradley et al. 2003 for references). The measures-oriented verification approach, in contrast, does not focus explicitly on the joint distribution. It reduces the verification information to a small number of performance measures that describe specific aspects of the overall quality of forecasts (such as accuracy or skill). In the DO verification approach, the different aspects of forecast quality can be defined using the joint probability distribution or its two factorizations into marginal and conditional distributions (Murphy and Winkler 1987). By conditioning on the forecast f, the calibration-refinement factorization of the joint distribution is:

pFX(f, x) = pX|F(x | f) · pF(f)  (2)

By conditioning on the observation x, the likelihood-base rate factorization of the joint distribution is:

pFX(f, x) = pF|X(f | x) · pX(x)  (3)

The distributions pX|F(x | f) and pF|X(f | x) are the conditional distribution of the observations given the forecast f and the conditional distribution of the forecasts given the observation x, respectively. The distributions pF(f) and pX(x) are the marginal (unconditional) probability distributions of the forecasts and observations, respectively. The marginal distribution pF(f) is sometimes called the predictive distribution or refinement distribution (Wilks 2011). Forecast refinement refers to the dispersion of the forecast distribution pF(f). A forecast that only takes on a small range of values (e.g., it does not stray far from climatology) has a small spread of its refinement distribution pF(f); conversely, a forecast that takes on a wide range of values (e.g., it can stray far from climatology) has a large spread, and the forecast F has the potential to discern a broad range of conditions. The marginal distribution pX(x) is sometimes called the base rate or uncertainty distribution and characterizes the forecasting situation (i.e., the observation X of the quantity to be forecast), not the forecast itself. Figure 1 illustrates the joint distribution and its marginal and conditional distributions for a case where the forecast and observation are discrete random variables.

Fig. 1 Conceptual illustration of the joint distribution and its marginal and conditional distributions for discrete forecasts and observations. The joint distribution pFX(f, x) is represented by the box; each square represents the density for a specific discrete forecast f and observation x. The histogram above the box represents the marginal (unconditional) distribution of the forecast pF(f). The histogram beside the box represents the marginal (unconditional) distribution of the observation pX(x). The highlighted column within the joint distribution represents the product of the conditional distribution pX|F(x | f) · pF(f); hence, each column is related to the conditional distribution of a discrete forecast. The highlighted row represents the product of the conditional distribution pF|X(f | x) · pX(x); hence, each row is related to the conditional distribution of a discrete observation
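For a small discrete example, the marginal distributions and both factorizations (Eqs. 2 and 3) can be computed and checked directly. The joint probabilities below are invented purely for illustration:

```python
import numpy as np

# Toy joint distribution p_FX(f, x): rows = 3 discrete forecast values,
# columns = 2 discrete outcomes (e.g., no flood / flood); invented numbers
p_fx = np.array([[0.30, 0.05],
                 [0.10, 0.10],
                 [0.05, 0.40]])

# Marginal (refinement) distribution of the forecasts, p_F(f)
p_f = p_fx.sum(axis=1)
# Marginal (base rate / uncertainty) distribution of the observations, p_X(x)
p_x = p_fx.sum(axis=0)

# Conditional distributions from each factorization
p_x_given_f = p_fx / p_f[:, None]   # p_X|F(x | f)
p_f_given_x = p_fx / p_x[None, :]   # p_F|X(f | x)

# Both factorizations reconstruct the same joint distribution
assert np.allclose(p_x_given_f * p_f[:, None], p_fx)   # Eq. 2
assert np.allclose(p_f_given_x * p_x[None, :], p_fx)   # Eq. 3
```

The rows of p_x_given_f and the columns of p_f_given_x each sum to one, which is the discrete analogue of the highlighted column and row in Fig. 1.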

4

Aspects of Forecast Quality

The different aspects of forecast quality are defined as follows (Wilks 2011; Jolliffe and Stephenson 2011; Murphy 1997):

Bias (or first-order bias, overall bias, unconditional bias) describes the difference between the average forecast and the average observation. For example, if flow forecasts are consistently larger than observed flows, the deterministic forecast has a positive bias. If the forecast probabilities for a flood event are consistently less than the flood event’s occurrence frequency, the probability forecast has a negative bias. Bias describes differences between the marginal (unconditional) distribution of forecasts pF(f) and observations pX(x); it is a measure of their central tendency (such as the mean, median, or mode).

Accuracy describes the average difference between the individual forecasts and observations. Forecasts that consistently agree closely with the observed outcomes are accurate. It is common to hear an individual forecast described as “accurate” when its forecast error is small and “inaccurate” when its forecast error is large. But in the context of DO verification, accuracy refers to average characteristics of forecast errors (over many forecasts). Accuracy is a property defined by the entire joint distribution pFX(f, x).

Skill describes the accuracy of a forecast relative to a reference forecast or benchmark. Forecasts that are more accurate than the reference have skill (or are “skillful”). Generally the reference forecast corresponds to a more “naïve” forecast, such as (observed) climatology, persistence, or random chance. In other situations, the output from a baseline forecasting system serves as the reference. For example, at short lead times persistence (a forecast that the current condition will continue) can be accurate and is an important reference for comparison; forecasts must accurately predict short-range changes to be skillful (more accurate than persistence). At longer lead times, climatology (a forecast of the climatological outcome) is an important reference; forecasts must accurately predict long-range deviations from climatology to be skillful. (Note that climatology should be defined from a long record of observations, as it is a more stable estimate than one based on the sample climatology from the verification dataset.)

Association (or correlation) describes the strength of the linear relationship between the forecasts and observations. A forecast has good association if the observations are highly correlated with the forecasts. The correlation coefficient is a common measure of association. Correlation is a property defined by the entire joint distribution pFX(f, x).

Reliability or type-1 conditional bias describes how well the forecast agrees with the observed outcome on average when a specific forecast is issued. For a flood probability forecast to be reliable (or conditionally unbiased), a flood should be observed 20 % of the time when a forecast probability of 0.2 is issued. A flow forecast is reliable if the average observed flow is 1000 m³/s when a forecast flow of 1000 m³/s has been issued. Measures of reliability evaluate this conditional bias for all possible forecasts issued; forecasts that are reliable for all possible forecasts are said to be well calibrated. Reliability is a property of the conditional distribution pX|F(x | f) and the marginal distribution pF(f) from the calibration-refinement (CR) factorization.

Resolution describes whether the outcomes are different for different forecasts issued (independent of whether or not the forecasts are reliable). For example, flood events should occur more often when a forecast probability of 0.8 is issued than when a forecast probability of 0.2 is issued. The average observed flow when a forecast of 1000 m³/s is issued should be higher than that when a forecast of 250 m³/s is issued. Measures of resolution evaluate this difference for all possible forecasts issued; when the forecasts can resolve the different outcomes, they are said to have
resolution. Resolution is also a property of the conditional distribution pX|F(x | f) and the marginal distribution pF(f) from the calibration-refinement (CR) factorization.

Type-2 conditional bias describes how well the observation agrees with the forecast on average when a specific outcome is observed. For all the cases when a flood event occurs, if the average forecast flood probability issued is 75 %, the probability forecasts have type-2 conditional bias. For all the cases when a flow of 1000 m³/s is observed, if the average flow forecast issued is 1200 m³/s, the deterministic forecast has type-2 conditional bias. Measures of type-2 conditional bias evaluate this conditional bias for all possible outcomes. Type-2 conditional bias is a property of the conditional distribution pF|X(f | x) and the marginal distribution pX(x) from the likelihood-base rate (LBR) factorization.

Discrimination describes whether the forecasts are different for different outcomes. If forecast probabilities issued when a flood occurs tend to be higher than those issued when a flood does not occur, the probability forecasts have discrimination. If the flow forecasts issued when high flows occur tend to be higher than those issued when low flows occur, the deterministic forecasts have discrimination. Measures of discrimination evaluate this difference for all possible outcomes. Discrimination is also a property of the conditional distribution pF|X(f | x) and the marginal distribution pX(x) from the likelihood-base rate (LBR) factorization.

Sharpness describes the degree of variability of the forecasts. For a probability forecast, it indicates the tendency to predict with extreme probabilities (0 or 1). Probability forecasts are said to be sharp if they issue probabilities close to 0 or 1; in contrast, a climatology forecast (which issues only the climatological event probability) is “unsharp.” For deterministic forecasts, the forecasts must deviate significantly from the mean forecast to be sharp. Sharpness is closely related to the notion of resolution, although it only concerns the forecast (some authors have used the two terms, sharpness and resolution, as synonymous – see the discussion in Jolliffe and Stephenson 2011). A high degree of sharpness is only desirable in the context of other measures. Without reliability, a sharp forecast is misleading; however, for a given level of reliability, a sharp forecast is preferred over an “unsharp” one since it contributes less uncertainty to decision making (Gneiting et al. 2007). Sharpness is a property of the marginal (unconditional) distribution of forecasts pF(f) alone.

Uncertainty describes the degree of variability in the observations. It is an important aspect in the performance of a forecasting system, but is independent of the forecast. Uncertainty is most simply measured by the variance of the observations. Uncertainty is a property of the marginal (unconditional) distribution of observations pX(x) alone.

For any given aspect of forecast quality, there are many possible ways to assess it. Some involve graphical representation of statistics from the verification data sample (such as a reliability diagram or a rank histogram). Others involve metrics or measures of forecast quality. In the next section, measures based on the joint distribution of forecasts and observations are presented for the different aspects of forecast quality.
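As one example of the graphical tools mentioned above, the counts for a rank histogram can be computed as follows. The synthetic ensembles and observations are assumptions for illustration, and ties between members and observations are ignored for simplicity:

```python
import numpy as np

rng = np.random.default_rng(7)
n_forecasts, n_members = 500, 10

# Hypothetical verification sample: each row is one ensemble forecast; each
# observation is drawn from the same distribution as the members, so the
# ensembles are statistically consistent and the ranks should be ~uniform.
ensembles = rng.normal(size=(n_forecasts, n_members))
obs = rng.normal(size=n_forecasts)

# Rank of each observation within its ensemble (0 .. n_members): the number
# of members falling below the observation.
ranks = (ensembles < obs[:, None]).sum(axis=1)

# Rank histogram: flat ~ consistent ensemble spread; U-shaped ~ under-
# dispersive; dome-shaped ~ over-dispersive.
hist = np.bincount(ranks, minlength=n_members + 1)
```

A flat histogram here indicates only that the observation is statistically indistinguishable from the ensemble members, not that the forecasts are skillful.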

5

Common Measures of Forecast Quality with Distribution Moments

When the forecast F is a random variable (e.g., a single-valued deterministic flow or an event probability between 0 and 1) and the observation X is a random variable (a continuous flow variable or a discrete event outcome of either 1 or 0), then the moments of X and F may be used to characterize the different aspects of the joint distribution. The first moments of X and F are the expected value (generally referred to as the mean) of the observations, E[X] = μX, and the expected value of the forecasts, E[F] = μF. One can also characterize the conditional moments. For example, the first moment of F conditioned on the observation x is denoted E[F | X = x] = μF|X. The first moment of X conditioned on the forecast f is denoted E[X | F = f] = μX|F. To characterize the deviation of a random variable from its expected value, one can use the variance. The variance of the observations is defined as σX² = E[(X − μX)²]. The variance of the forecasts is defined as σF² = E[(F − μF)²]. A measure of bias is the mean error (ME), which is characterized by the first moments of the joint distribution as:

ME = μF − μX  (4)

For unbiased forecasts, ME = 0. A positive mean error denotes over-forecasting, whereas a negative mean error denotes under-forecasting. A measure of accuracy is the mean square error (MSE), defined as the expected value of the squared differences between the individual pairs of forecasts and observations:

MSE = E[(F − X)²]  (5)
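Sample estimates of the ME and MSE can be computed directly from a verification sample. The synthetic forecasts, observations, and event definition below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic verification sample: observed flows and forecasts built with a
# deliberate +20 unconditional bias (invented numbers)
obs = rng.gamma(shape=2.0, scale=100.0, size=1000)
fcst = obs + rng.normal(loc=20.0, scale=50.0, size=1000)

# Mean error (Eq. 4): sample estimate of mu_F - mu_X; positive => over-forecasting
me = fcst.mean() - obs.mean()

# Mean square error (Eq. 5): sample estimate of E[(F - X)^2]
mse = np.mean((fcst - obs) ** 2)

# For probability forecasts of a binary event, the same formula gives the
# Brier score; the event threshold and probability mapping are illustrative
event = (obs > 300.0).astype(float)               # observed event indicator
p_event = np.clip((fcst - 200.0) / 400.0, 0, 1)   # crude probability forecast
brier = np.mean((p_event - event) ** 2)
```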

Note that, being a quadratic measure, the MSE tends to penalize large differences. For probability forecasts, the MSE is equivalent to the Brier score (Brier 1950). A measure of association is the correlation (ρ), defined by the joint distribution as:

ρFX = E[(F − μF)(X − μX)] / (σF σX)  (6)

5.1

Calibration-Refinement Decomposition

The accuracy of the forecasts depends on other aspects of forecast quality. Viewed in one way, accuracy depends on the reliability and resolution of the forecasts. By conditioning on the forecast, the calibration-refinement (CR) decomposition of the MSE can be written as:

Attributes of Forecast Quality

11

Fig. 2 Conceptual illustration of the calibration-refinement (CR) decomposition and its related aspects of forecast quality. The box represents all combinations of discrete forecasts and discrete observations. The circles (connected by the line) represent the expected value of the observation given the forecast μX|F for all possible forecasts. The horizontal line indicates the mean of the observations μX. The sloping line represents x = f. The histogram above the box represents the marginal distribution of the forecasts pF( f ) (known as the predictive or refinement distribution). The forecast has perfect reliability if all the μX|F points fall on the sloping line. The reliability component REL measures the squared deviations from the sloping line weighted by the marginal distribution pF( f ) (indicated by the red shaded area). The forecast has resolution since the μX|F points increase as the forecast f increases, indicating that the observations are different for different forecasts. The resolution component RES measures the squared deviations from the horizontal line μX weighted by the marginal distribution pF( f ) (indicated by the blue shaded area). The uncertainty is the variability of the observations. The uncertainty component UNC is represented by the standard deviation of the observations σ X about the mean

$$\mathrm{MSE} = E\big[(F - X)^2\big] = \sigma_X^2 + E_F\big[(\mu_{X|F} - F)^2\big] - E_F\big[(\mu_{X|F} - \mu_X)^2\big] = \mathrm{UNC} + \mathrm{REL} - \mathrm{RES} \qquad (7)$$

where E_F is the expected value with respect to the forecast distribution and μX|F is the expected value of the observations conditioned on the forecast. Figure 2 illustrates these aspects of forecast quality for discrete forecasts and observations. The reliability component (REL), or type-1 conditional bias, measures the conditional bias of the forecasts:

$$\mathrm{REL} = E_F\big[(\mu_{X|F} - F)^2\big] \qquad (8)$$

12

A.A. Bradley et al.

For perfectly reliable forecasts, the expected value of the observations conditioned on the forecast equals f (i.e., μX|F = f) and REL is 0. The resolution component (RES) measures the degree to which the average observation given a specific forecast f differs from the unconditional mean (or climatology):

$$\mathrm{RES} = E_F\big[(\mu_{X|F} - \mu_X)^2\big] \qquad (9)$$

For high-resolution probability forecasts, when the forecast probability is higher (lower) than the climatological event frequency, the event should be observed more (less) frequently than climatology. For high-resolution deterministic forecasts, when the forecast is higher (lower) than the mean observation, the average outcome should be higher (lower) than climatology. Forecasts with larger differences have higher resolution. If the conditional expected value of the observations μX|F is the same regardless of the forecast f, the forecasts have no resolution, and RES is 0. The maximum resolution is obtained for a perfectly reliable forecast (i.e., μX|F = f), where RES is equal to σX². The (inherent) uncertainty component (UNC) of the observations is measured by the variance of the observations:

$$\mathrm{UNC} = \sigma_X^2 \qquad (10)$$

Therefore, Eq. 7 shows the relationship between accuracy (as measured by the MSE) and the reliability and resolution of the forecasts. If the forecasts are perfectly reliable (REL = 0), then biases (conditional and unconditional) do not inflate the forecast errors; whenever there are biases (REL > 0), the MSE increases (lower accuracy). The forecasts must also have good resolution to be accurate: as the resolution (RES) increases, the MSE decreases (higher accuracy). Note that a climatology forecast is perfectly reliable (REL = 0) but has no resolution (RES = 0), so its MSE equals the inherent uncertainty (MSE = σX²).
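As a concrete illustration, the CR decomposition can be estimated from a verification data sample by conditioning on binned forecast values. The sketch below is illustrative only: the function name and the quantile-binning choice are assumptions, not part of the text, and binning introduces some approximation error for continuous forecasts.

```python
import numpy as np

def cr_decomposition(f, x, n_bins=20):
    """Estimate UNC, REL, and RES of Eq. 7 from forecast-observation
    pairs by conditioning on quantile bins of the forecast (a sketch)."""
    f, x = np.asarray(f, float), np.asarray(x, float)
    unc = np.var(x)                          # UNC: variance of the observations
    edges = np.quantile(f, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.clip(np.digitize(f, edges[1:-1]), 0, n_bins - 1)
    rel = res = 0.0
    for k in range(n_bins):
        in_bin = idx == k
        if not in_bin.any():
            continue
        w = in_bin.mean()                    # weight from the marginal p_F(f)
        mu_x_f = x[in_bin].mean()            # conditional mean of obs given forecast
        rel += w * (mu_x_f - f[in_bin].mean()) ** 2   # deviation from the x = f line
        res += w * (mu_x_f - x.mean()) ** 2           # deviation from mu_X
    return unc, rel, res                     # MSE ~= UNC + REL - RES
```

Applied to forecasts and observations like the high quality case of Sect. 6 (jointly normal with zero means, unit variances, and ρ = 0.9), the estimate recovers UNC ≈ 1, REL ≈ 0, and RES ≈ 0.81, up to binning and sampling error.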

5.2 Likelihood-Base Rate Decomposition

Viewed in another way, forecast accuracy depends on the sharpness, type-2 conditional bias, and discrimination of the forecasts. By conditioning on the observation, the likelihood-base rate (LBR) decomposition of the MSE can be written as:

$$\mathrm{MSE} = E\big[(F - X)^2\big] = \sigma_F^2 + E_X\big[(\mu_{F|X} - X)^2\big] - E_X\big[(\mu_{F|X} - \mu_F)^2\big] = \mathrm{SHA} + \mathrm{T2B} - \mathrm{DIS} \qquad (11)$$

where E_X is the expected value with respect to the distribution of the observations and μF|X is the expected value of the forecasts conditioned on the observation.


Fig. 3 Conceptual illustration of the likelihood-base rate (LBR) decomposition and its related aspects of forecast quality. The box represents all combinations of discrete forecasts and discrete observations. The circles (connected by the line) represent the expected value of the forecast given the observation μF|X for all possible outcomes. The vertical line indicates the mean of the forecasts μF. The sloping line represents x = f. The histogram to the side of the box represents the marginal distribution of the observations pX(x) (known as the base rate or uncertainty distribution). The forecast has no type-2 conditional bias if all the μF|X points fall on the sloping line. The type-2 conditional bias component T2B measures the squared deviations from the sloping line, weighted by the marginal distribution pX(x) (indicated by the red shaded area). The forecast has discrimination since the μF|X points increase as the observation x increases, indicating that the forecasts are different for different outcomes. The discrimination component DIS measures the squared deviations from the vertical line μF, weighted by the marginal distribution pX(x) (indicated by the blue shaded area). The sharpness is the variability of the forecasts. The sharpness component SHA is represented by the standard deviation of the forecasts σF about the mean.

Figure 3 illustrates these aspects of forecast quality for discrete forecasts and observations. The first term in the LBR decomposition is the sharpness component (SHA), which measures the variability of the forecasts:

$$\mathrm{SHA} = \sigma_F^2 \qquad (12)$$

Forecasts with high sharpness are desirable because a large variance in forecast values has the potential to discern a broad range of outcomes. For probability forecasts, sharpness measures the degree to which forecast probabilities approach 0 and 1. For a climatology forecast (i.e., f = μX in all cases), the sharpness SHA is equal to 0. The type-2 conditional bias component (T2B) (Murphy 1997) measures how closely the forecasts agree with the observation when a specific outcome is observed. It describes the bias conditioned on the observation:


$$\mathrm{T2B} = E_X\big[(\mu_{F|X} - X)^2\big] \qquad (13)$$

Only a perfect forecast has no type-2 conditional bias. For a climatology forecast, T2B is equal to σX² (by Eq. 11, since its MSE is σX² and its SHA and DIS are 0). The discrimination component (DIS) measures how different the forecasts are for different outcomes:

$$\mathrm{DIS} = E_X\big[(\mu_{F|X} - \mu_F)^2\big] \qquad (14)$$

For a flood probability forecast, the forecasts have discrimination if the average forecast probability μF|X when the event occurs (x = 1) differs from the average forecast probability when the event does not occur (x = 0). For a deterministic flow forecast, the forecasts have discrimination if the average forecast flow μF|X differs when a high or a low flow is observed. If forecasts have no discrimination, then μF|X is the same (and equal to μF) regardless of the outcome, and DIS is 0; a climatology forecast is one example of a forecast with no discrimination. Therefore, Eq. 11 shows the relationship between accuracy (as measured by the MSE) and the sharpness, type-2 conditional bias, and discrimination of the forecasts. There is a seeming contradiction in Eq. 11, which implies that sharper forecasts would increase the MSE (lower accuracy). However, there is a trade-off between sharpness, type-2 conditional bias, and discrimination. At one extreme, if the same forecast is always issued, the forecasts have no sharpness (SHA = 0) and no discrimination (DIS = 0), and the type-2 conditional bias is maximized. Hence, forecasts must be sharp (SHA > 0) in order to have discrimination (DIS > 0) and low type-2 conditional bias.
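For a binary event such as the flood example above, conditioning on the observation is exact (x is either 0 or 1), so the LBR terms of Eq. 11 can be computed directly from a verification sample. A minimal sketch (the function name is an assumption):

```python
import numpy as np

def lbr_decomposition(f, x):
    """Compute SHA, T2B, and DIS of Eq. 11 for probability forecasts f
    of a binary event x (0 = non-occurrence, 1 = occurrence)."""
    f, x = np.asarray(f, float), np.asarray(x, float)
    sha = np.var(f)                           # SHA: variance of the forecasts
    t2b = dis = 0.0
    for outcome in (0.0, 1.0):
        sel = x == outcome
        if not sel.any():
            continue
        w = sel.mean()                        # base rate p_X(x)
        mu_f_x = f[sel].mean()                # conditional mean forecast given outcome
        t2b += w * (mu_f_x - outcome) ** 2    # type-2 conditional bias term
        dis += w * (mu_f_x - f.mean()) ** 2   # discrimination term
    return sha, t2b, dis                      # in-sample, MSE = SHA + T2B - DIS exactly
```

For a constant (climatology) forecast, `sha` and `dis` are both 0 and `t2b` equals the variance of the binary observations, consistent with the discussion above.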

5.3 Skill Score and Its Decompositions

The forecast skill is a measure of the accuracy of the forecast relative to a reference forecast. A skill score shows the improvement of the forecast accuracy relative to the reference (Murphy 1997). Using the MSE as a measure of accuracy, and climatology as a reference forecast, the MSE skill score is:

$$\mathrm{SS}_{\mathrm{MSE}} = 1 - \frac{\mathrm{MSE}}{\sigma_X^2} \qquad (15)$$

where σX² is the MSE of a climatology forecast. Perfect forecasts have a skill score of 1. Forecasts that are more accurate than a climatology forecast have a skill score greater than 0; forecasts that are less accurate than a climatology forecast have a skill score less than 0. The aspects of forecast quality that enhance or degrade forecast skill can be diagnosed using one or more decompositions of the skill score. Substituting the


decomposition in Eq. 7 for MSE, the calibration-refinement (CR) decomposition of the MSE skill score is:

$$\mathrm{SS}_{\mathrm{CR}} = \frac{E_F\big[(\mu_{X|F} - \mu_X)^2\big]}{\sigma_X^2} - \frac{E_F\big[(\mu_{X|F} - F)^2\big]}{\sigma_X^2} = \frac{\mathrm{RES}}{\sigma_X^2} - \frac{\mathrm{REL}}{\sigma_X^2} = \mathrm{R}_{\mathrm{RES}} - \mathrm{R}_{\mathrm{REL}} \qquad (16)$$

The CR decomposition implies that forecasts are skillful (or more accurate than a climatology forecast) when their resolution is greater than their conditional bias (the reliability). In other words, forecasts with good resolution can be skillful, but poor reliability (conditional bias) will degrade their skill. Similar to the CR decomposition, substituting the decomposition in Eq. 11 for MSE leads to the likelihood-base rate (LBR) decomposition of the MSE skill score:

$$\mathrm{SS}_{\mathrm{LBR}} = 1 - \frac{\sigma_F^2}{\sigma_X^2} + \frac{E_X\big[(\mu_{F|X} - \mu_F)^2\big]}{\sigma_X^2} - \frac{E_X\big[(\mu_{F|X} - X)^2\big]}{\sigma_X^2} = 1 - \frac{\mathrm{SHA}}{\sigma_X^2} + \frac{\mathrm{DIS}}{\sigma_X^2} - \frac{\mathrm{T2B}}{\sigma_X^2} = 1 - \mathrm{R}_{\mathrm{SHA}} + \mathrm{R}_{\mathrm{DIS}} - \mathrm{R}_{\mathrm{T2B}} \qquad (17)$$

The LBR decomposition relates the forecast skill to relative measures of the sharpness, discrimination, and type-2 conditional bias. As noted above, forecasts must be sharp in order to have high discrimination and low type-2 conditional bias (see also Bradley et al. 2004). Another useful MSE skill score decomposition, which will be used extensively in the following sections, was derived by Murphy (1988). A basic decomposition of the MSE is:

$$\mathrm{MSE} = E\big[(F - X)^2\big] = (\mu_F - \mu_X)^2 + \sigma_F^2 + \sigma_X^2 - 2\rho_{FX}\sigma_F\sigma_X \qquad (18)$$

Substituting the decomposition in Eq. 18 for the MSE in Eq. 15, and rearranging the terms, yields:

$$\mathrm{SS} = \rho_{FX}^2 - \left(\rho_{FX} - \frac{\sigma_F}{\sigma_X}\right)^2 - \left(\frac{\mu_F - \mu_X}{\sigma_X}\right)^2 = \mathrm{PS} - \mathrm{SREL} - \mathrm{SME} \qquad (19)$$

Murphy (1988) interpreted this MSE skill score decomposition with the aid of a linear regression analogy for the conditional mean μX|F as a function of f. The first term is the potential skill (PS), a measure of association; it can be interpreted as the forecast skill for perfectly calibrated forecasts. The second term is the slope reliability (SREL), a linear regression-based measure of the reliability (or miscalibration). The third term is the standardized mean error (SME), a measure of the bias. One use of the decomposition is to diagnose the potential skill of the forecast and to distinguish between the conditional bias (SREL) and the unconditional bias (SME) that degrade the


forecast skill (Murphy and Winkler 1992; Hashino et al. 2007). In essence, this decomposition is similar to the CR decomposition shown in Eq. 16. With the linear regression analogy, the potential skill (PS) is equivalent to the relative resolution, where the slope reliability (SREL) and standardized mean error (SME) are the conditional and unconditional biases that represent the relative reliability. But because this decomposition is based on the moments of forecasts and observations (and not their conditional moments), it is often easier to compute from a verification data sample.
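Because it needs only the first two moments of the forecasts and observations, the Eq. 19 decomposition is straightforward to compute from a verification sample. A sketch (the function name is an assumption; moments are population moments, i.e., ddof = 0):

```python
import numpy as np

def murphy_decomposition(f, x):
    """MSE skill score decomposition of Eq. 19, SS = PS - SREL - SME,
    computed from sample moments of forecasts f and observations x."""
    f, x = np.asarray(f, float), np.asarray(x, float)
    rho = np.corrcoef(f, x)[0, 1]                        # correlation (Eq. 6)
    ps = rho ** 2                                        # potential skill
    srel = (rho - np.std(f) / np.std(x)) ** 2            # slope reliability
    sme = ((np.mean(f) - np.mean(x)) / np.std(x)) ** 2   # standardized mean error
    ss = 1.0 - np.mean((f - x) ** 2) / np.var(x)         # MSE skill score (Eq. 15)
    return ss, ps, srel, sme                             # ss equals ps - srel - sme
```

With sample moments computed consistently (ddof = 0 throughout), the identity SS = PS − SREL − SME holds exactly, not just in expectation.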

6 Forecast Quality of Hypothetical Ensemble Forecasts

To illustrate aspects of forecast quality for ensemble forecasts, we generated hypothetical ensemble forecasts for four forecasting systems. Each forecasting system makes an ensemble forecast for the same outcome, a continuous forecast variable. We will use a set of 100 forecast-observation pairs for each system for illustration purposes (a visual comparison); a much larger set will be used to derive moments (by Monte Carlo simulation) for a comparison with verification measures.

Figure 4 shows a subset of the hypothetical forecasts for six forecast periods, along with the corresponding observed outcomes. For comparison, a climatological ensemble forecast (corresponding to the unconditional distribution of the observations) is also shown. To create these forecasts, the hypothetical relationship between the forecasts and observations is represented by a statistical model (a bivariate normal distribution). The ensemble forecasting examples were constructed by randomly generating forecast-observation pairs from the model (the same observations were used in all four cases). By changing certain parameters of the model, we can represent forecasts of differing quality. For the high quality forecasts, the forecasts and observations both have zero mean and unit variance, with a correlation of 0.9. For the unconditional bias forecasts, the forecast mean was changed to 0.632. For the conditional bias forecasts, the forecast standard deviation was changed to 0.260. For the poor association forecasts, the correlation was changed to 0.7.

From Fig. 4 it is clear that all four hypothetical forecasts have some properties one would expect from good quality forecasts. All the ensembles have less spread than the climatology forecast. Also, all the ensembles tend to shift away from the climatology forecast and towards the observed outcomes. As we shall see, the first of the four forecasts shown for all six forecast periods (red box plots) is a high quality forecast.
The second forecast (blue box plots) has the same ensemble spread as the high quality forecast but is systematically higher. In contrast, the third forecast (green box plots) has much less ensemble spread, but the observations rarely fall within the range of the forecasts. The fourth forecast (light blue box plots) resembles the high quality forecasts but has a larger ensemble spread. In the following sections, we evaluate the quality of the four hypothetical ensemble forecasts. First we evaluate their forecast quality as single-valued forecasts, using the forecast ensemble mean as a single-valued forecast of the outcome.
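The statistical model behind the four hypothetical systems can be sketched as follows. This is a plausible reconstruction, not the authors' code: single-valued (ensemble-mean-like) forecasts are drawn jointly normal with the observations, using the parameters given above.

```python
import numpy as np

def make_forecasts(x, mu_f, sigma_f, rho, rng):
    """Draw forecasts that are bivariate normal with the observations x
    (assumed standard normal), with the given mean, spread, and correlation."""
    noise = rng.normal(size=x.size)
    return mu_f + sigma_f * (rho * x + np.sqrt(1.0 - rho ** 2) * noise)

rng = np.random.default_rng(42)
x = rng.normal(size=100_000)            # observations: zero mean, unit variance

cases = {
    "high quality":       make_forecasts(x, 0.0,   1.0,   0.9, rng),
    "unconditional bias": make_forecasts(x, 0.632, 1.0,   0.9, rng),
    "conditional bias":   make_forecasts(x, 0.0,   0.260, 0.9, rng),
    "poor association":   make_forecasts(x, 0.0,   1.0,   0.7, rng),
}

for name, f in cases.items():
    print(name, round(float(np.mean((f - x) ** 2)), 2))   # MSE of each case
```

Up to sampling error, this reproduces the MSE values of the verification tables below: about 0.2 for the high quality case and about 0.6 for the three degraded cases.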


Fig. 4 Hypothetical ensemble forecasts for six forecast periods. Each forecast is represented by a box plot. The box contains 25–75 % of the ensemble members; the whiskers extend to the 5 % and 95 % levels. The median is indicated by the bar within the box. The outcome for each forecast period (the observation) is indicated by the x-symbol. A climatology ensemble forecast, corresponding to the unconditional distribution of the observations, is shown on the left-hand side for comparison. The dashed horizontal line shows the upper tercile of the observation distribution; in a later section, the ensemble forecasts will be used to make probability forecasts for outcomes above the threshold. For each forecast period, hypothetical ensemble forecasts are issued by the four forecasting systems, representing high quality forecasts (red), forecasts with unconditional biases (blue), forecasts with conditional biases (green), and forecasts with poor association (sky blue)

Next we evaluate their forecast quality as probability forecasts for a discrete event. Finally, we evaluate the forecast quality of the complete ensemble, treating it as a probability distribution forecast of the continuous outcome.

6.1 Illustration of Forecast Quality for a Single-Valued Forecast

First we will examine the use of ensemble forecasts as a single-valued forecast of a continuous forecast variable (e.g., streamflow). To accomplish this, the ensemble forecast must be transformed into a single value. As is often done in ensemble forecasting, the ensemble mean will be used to represent a single-valued forecast of the outcome. To characterize the forecast quality of single-valued forecasts, we examine the joint distribution of the forecast-observation pairs pFX(f, x). In this case, the forecast f is the ensemble mean, and the observation x is the outcome variable. We will first make a visual comparison of the joint distribution, as represented by scatter plots of forecast-observation pairs from a verification data sample; many aspects of forecast quality are readily interpreted from such information. Then we compare the forecasts using verification measures, which quantify different aspects of forecast quality.


6.1.1 Visual Comparison

Figure 5a illustrates single-valued forecasts with high forecast quality. The forecasts are unbiased (the mean of forecasts and observations falls on the one-to-one line) and have strong association (their correlation with the observations is high). Since the observations are closely scattered around the one-to-one line, the forecasts are reliable (the expected outcome is close to the forecast outcome) and have good resolution (when forecasts are higher than average, their observations tend to be higher than average, and vice versa). Furthermore, the forecasts have low type-2 conditional bias (the expected forecast is close to the observation) and high discrimination (when observations are higher than average, their forecasts tend to be higher than average, and vice versa). All in all, the forecasts for this case are very accurate (the forecasts and observations correspond quite well).

Figure 5b illustrates single-valued forecasts that are identical to the high quality case, except that their accuracy is degraded by unconditional bias. The forecasts are systematically higher than the observations, resulting in a shift from the one-to-one line to the right. As a result, the mean of forecasts and observations no longer falls on the one-to-one line. The forecasts still have strong association, good resolution, and high discrimination. However, they are no longer reliable (the expected outcome is lower than the forecast outcome) and have significant type-2 conditional bias (the expected forecast is higher than the observations).

Figure 5c illustrates single-valued forecasts that are identical to the high quality case, except that their accuracy is degraded by conditional bias; the forecasts are much less variable than in the high quality case, resulting in a steeper sloping relationship that no longer follows the one-to-one line. Overall, the forecasts are unbiased; the mean of the forecasts and observations falls on the one-to-one line. Also, they still have strong association and good resolution. However, as in the unconditional bias case, the forecasts are no longer reliable; the expected outcome is higher (lower) than the forecast at higher (lower) values. Furthermore, since the forecasts are less variable, their discrimination is poor (different observations are no longer associated with substantially different forecasts) and they have significant type-2 conditional bias, since the expected forecast is lower (higher) than the observations at higher (lower) values.

Figure 5d illustrates single-valued forecasts that are identical to the high quality case, except that their accuracy is degraded by poor association. The correlation between the forecasts and observations is much lower, resulting in more scatter about the one-to-one line. The forecasts remain unbiased. But because of the greater scatter in the relationship, the resolution and discrimination of the forecasts are weaker.

6.1.2 Comparison with Verification Measures

A variety of measures can be used to quantify aspects of forecast quality. Here we use moments of the joint distribution, including the mean squared error (MSE) as a measure of accuracy and its various decompositions, to compare the four sets of single-valued forecasts shown in Fig. 5. Table 1 shows elements of the basic measures of forecast quality. The high forecast quality case is the most accurate; it has the lowest MSE. In contrast, the


Fig. 5 Comparison of single-valued forecasts and observations for the four hypothetical cases: (a) high quality forecasts, (b) forecasts degraded by unconditional bias, (c) forecasts degraded by conditional biases, and (d) forecasts degraded by poor association. The panels show 100 forecast-observation pairs; note that the observations are the same for all four cases. The sloping line is the one-to-one line. The mean of the forecasts is indicated by the vertical dashed line; the mean of the observations is indicated by the horizontal dashed line. The intersection of the means is indicated by the box. The intersection falls on the one-to-one line for all but one case, indicating the forecasts are unbiased. The exception is the unconditional bias case (b), where the mean of the forecasts is much larger than the mean of the observations

other three cases are much less accurate; their MSE is three times higher. For the unconditional bias case, the reason for lower accuracy is the bias; the mean error (ME) is high for this case but equal to 0 for all the others (indicating unbiased forecasts). For the poor association case, the reason for lower accuracy is the association; the correlation is 0.7 for this case, but for all the others the correlation


Table 1 Measures of accuracy, bias, and association of single-valued forecasts for the four cases

Case                 Accuracy (MSE)   Association (ρFX)   Bias (ME)
High quality         0.2              0.9                 0
Unconditional bias   0.6              0.9                 0.632
Conditional bias     0.6              0.9                 0
Poor association     0.6              0.7                 0

Table 2 Elements of the calibration-refinement MSE decomposition showing measures of accuracy, uncertainty, reliability, and resolution for the four single-valued forecast cases

Case                 Accuracy (MSE)   Uncertainty (σX²)   Reliability E_F[(μX|F - F)²]   Resolution E_F[(μX|F - μX)²]
High quality         0.2              1                   0.01                           0.81
Unconditional bias   0.6              1                   0.41                           0.81
Conditional bias     0.6              1                   0.41                           0.81
Poor association     0.6              1                   0.09                           0.49

is much higher (0.9). For the conditional bias case, the bias and association are the same as in the high quality case; to understand the reason for lower accuracy, we must examine other aspects of the forecast quality. Table 2 shows elements of the calibration-refinement MSE decomposition from Eq. 7. The high quality case is the most accurate because the forecasts are reliable and have good resolution. For the unconditional bias and conditional bias cases, the forecasts still have good resolution, but are no longer reliable. Note that both these cases have identical results for the calibration-refinement MSE decomposition, indicating that the two types of bias affect the reliability measure by the same magnitude; however, the ME of 0 for the conditional bias case (see Table 1) indicates that its poor reliability is due solely to conditional bias, while the large ME for the unconditional bias case indicates that its poor reliability is due to (unconditional) bias. For the poor association case, the forecasts are less accurate primarily because they have lower resolution. Table 3 shows elements of the likelihood-base rate MSE decomposition from Eq. 11. The high quality case is the most accurate because the forecasts have low type-2 conditional bias and high discrimination. Although the unconditional bias does not degrade the discrimination, it does contribute to a larger type-2 conditional bias. On the other hand, the conditional bias case makes the forecasts much less variable; its sharpness is 0.067, compared to 1 for the other three cases. Less variable forecasts have very low discrimination and high type-2 conditional bias. For the poor association case, the forecasts are less accurate primarily because their discrimination is lower.
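The Table 2 entries can also be checked analytically: for a bivariate normal model, μX|F = μX + ρ(σX/σF)(f − μF), so the reliability and resolution components of Eqs. 8 and 9 have closed forms. The sketch below rests on that assumption; the function name is illustrative, not from the text.

```python
def cr_terms_bivariate_normal(mu_f, sigma_f, rho, mu_x=0.0, sigma_x=1.0):
    """Closed-form REL (Eq. 8) and RES (Eq. 9) when forecasts and
    observations are bivariate normal."""
    slope = rho * sigma_x / sigma_f          # regression slope of mu_{X|F} on f
    # mu_{X|F} - f = (slope - 1)(f - mu_f) + (mu_x - mu_f); average its square over F
    rel = (mu_x - mu_f) ** 2 + (slope - 1.0) ** 2 * sigma_f ** 2
    res = (slope * sigma_f) ** 2             # equals rho**2 * sigma_x**2
    return rel, res
```

Evaluated at the four parameter sets of Sect. 6 (means 0 or 0.632, standard deviations 1 or 0.260, correlations 0.9 or 0.7), this reproduces the Table 2 reliability and resolution values to two decimal places, including the equal REL of 0.41 for the two bias cases.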


Table 3 Elements of the likelihood-base rate MSE decomposition showing measures of accuracy, sharpness, type-2 conditional bias, and discrimination for the four single-valued forecast cases

Case                 Accuracy (MSE)   Sharpness (σF²)   Type-2 bias E_X[(μF|X - X)²]   Discrimination E_X[(μF|X - μF)²]
High quality         0.2              1                 0.010                          0.81
Unconditional bias   0.6              1                 0.410                          0.81
Conditional bias     0.6              0.067             0.587                          0.05
Poor association     0.6              1                 0.090                          0.49

Table 4 Elements of the MSE skill score decomposition showing measures of association (potential skill PS), reliability (slope reliability SREL), and unconditional bias (standardized mean error SME) for the four single-valued forecast cases. Climatology (μX) is the reference forecast

Case                 Skill (SS)   Association (PS)   Reliability (SREL)   Bias (SME)
High quality         0.8          0.81               0.01                 0
Unconditional bias   0.4          0.81               0.01                 0.40
Conditional bias     0.4          0.81               0.41                 0
Poor association     0.4          0.49               0.09                 0

Table 4 shows elements of the MSE skill score decomposition from Eq. 19, using climatology (μX) as the reference forecast. The high quality case has high skill. In contrast, the skill for the other three cases is half that of the high quality case. The decomposition shows that bias (SME) degrades the forecasts for the unconditional bias case, poor reliability (SREL) degrades the forecasts for the conditional bias case, and lowered potential skill (PS) degrades the forecasts for the poor association case. By the design of these hypothetical ensemble forecasts, the three lower quality cases have exactly the same accuracy and skill, as indicated by the MSE and the skill score. However, the nature of the forecasts themselves is quite different, as seen in Fig. 5. Clearly, a single measure (accuracy or skill) is insufficient to characterize the nature of the forecasts. Indeed, one needs to fully consider the various aspects of forecast quality, as described by the joint distribution and its factorizations (calibration-refinement and likelihood-base rate), to distinguish between the nature of the different forecasts.

6.2 Illustration of Forecast Quality for a Probability Forecast for a Discrete Event

Although single-valued forecasts derived from the ensembles provide insights on the quality of the forecasts, they are an incomplete description of the ensembles. By transforming ensemble forecasts into single-valued forecasts, much of the


information contained in the ensemble is lost. In particular, the ensemble distribution provides information on the probability of different outcomes. In this section, we will use the hypothetical ensemble forecasts to examine probability forecasts for a discrete (event) outcome. Consider a situation where one would like to forecast the probability of observing a relatively high outcome (e.g., high streamflow). To define a "high outcome," we use a threshold. If the observed outcome z exceeds the threshold z*, we say the event occurs; if it does not exceed the threshold, the event does not occur. Therefore, the observation x is now a binary event. Mathematically, the binary observation x is simply a transformation of the continuous outcome variable z:

$$x = \begin{cases} 1 & \text{if } z > z^* \\ 0 & \text{if } z \le z^* \end{cases} \qquad (20)$$

The forecast f for this event is the probability that the event occurs. In the case of ensemble forecasts, one can use the ensemble forecast probability distribution to define the probability forecast f of the event occurrence. This task is accomplished by estimating the relative frequency of the event occurrence using the ensemble members. In this section, we will use the hypothetical ensemble forecasts to create probability forecasts for an outcome in the upper tercile of the climatological distribution of observed outcomes. (Note that this upper tercile threshold is illustrated in Fig. 4 by the horizontal dashed line.) By definition, the climatological probability of exceeding the threshold is one third (μX = 1/3); the event occurs one third of the time and does not occur two thirds of the time. The hypothetical ensemble forecasts for all four cases are then transformed into probability forecasts f of the event occurrence. Using the probability forecasts f and the binary observations x, the joint distribution pFX(f, x) of the forecast-observation pairs can be examined.
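A minimal sketch of this transformation: the probability forecast is the fraction of ensemble members above the threshold. The function name, the synthetic climatological sample, and the example ensemble are all illustrative assumptions.

```python
import numpy as np

def event_probability(ensemble, z_star):
    """Probability forecast f for the event of Eq. 20: the relative
    frequency of ensemble members exceeding the threshold z*."""
    return float(np.mean(np.asarray(ensemble, float) > z_star))

# Upper-tercile threshold estimated from a hypothetical climatological sample
# (here standard normal, as in the examples of Sect. 6).
rng = np.random.default_rng(0)
climatology = rng.normal(size=100_000)
z_star = float(np.quantile(climatology, 2.0 / 3.0))

ensemble = rng.normal(loc=1.0, scale=0.4, size=50)   # one hypothetical forecast ensemble
f = event_probability(ensemble, z_star)               # forecast probability of exceedance
```

An ensemble entirely above the threshold yields f = 1; one entirely below yields f = 0, matching the extremes discussed in the visual comparison below.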

6.2.1 Visual Comparison

Figure 6 illustrates the probability forecasts for the discrete events for the four examples. The plots show the continuous observations z on the y-axis. Whenever an observation exceeds the upper tercile (the horizontal line in the plots), the exceedance event occurs. The forecasts shown on the x-axis are the probability f that an exceedance event occurs. A climatology forecast corresponds to a forecast probability of 1/3 (the vertical line in the plots). Whenever the forecast probability is greater than 1/3, the event is predicted to be more likely to occur. If an exceedance event does occur, the forecast-observation pair plots in the upper right shaded area, indicating that the forecast is better than a climatology forecast. Likewise, whenever the forecast is less than 1/3 and the exceedance event does not occur, the forecast-observation pair plots in the lower left shaded area, again indicating that the forecast is better than a climatology forecast. Individual forecasts that are worse than the climatology forecast plot in the two unshaded areas.


Fig. 6 Comparison of probability forecasts and continuous observations for an upper tercile event for the four hypothetical cases: (a) high quality forecasts, (b) forecasts degraded by unconditional bias, (c) forecasts degraded by conditional biases, and (d) forecasts degraded by poor association. The panels show 100 forecast-observation pairs; note that the observations are the same for all four cases and are the same as for the single-valued forecasts (see Fig. 5). The horizontal line shows the upper tercile of the continuous observations. If the observation z is above the line, an exceedance event occurs; if z is below the line, the event does not occur. The forecast f is the forecast probability that the event occurs. The vertical line shows the climatological probability of an event occurrence (1/3). Forecast-observation pairs that fall in the shaded areas are better than a climatology forecast (less error); the event occurs (does not occur) and the forecast probability is higher (lower) than the climatological probability. Forecast-observation pairs that fall outside the shaded areas are worse than a climatology forecast (greater error); the event occurs (does not occur) and the forecast probability is lower (higher) than the climatological probability


Figure 6a illustrates probability forecasts with high forecast quality. Several aspects of forecast quality are apparent. First, the forecasts are sharp; often a probability forecast of 0 or 1 is issued. Furthermore, the forecasts are reliable (the expected outcome is close to the forecast outcome). When the probability forecast is near zero, the event rarely occurs; when the forecast is near one, the event usually occurs. The reliability is also seen for other forecast probabilities. For example, for a probability forecast near 1/3, the event occurs (the observation exceeds the threshold) about one third of the time, as expected for reliable forecasts. The probability forecasts have good discrimination; when the event occurs, the probability forecast tends to be higher than the climatological probability, and vice versa. All these aspects contribute to the accuracy of the forecasts (the forecasts and event observations correspond quite well).

Figure 6b illustrates probability forecasts with unconditional bias. This is the case where the corresponding single-valued forecast (the ensemble mean) is systematically higher than the observations (see Fig. 5b). As with the single-valued forecasts, the probability forecasts are no longer reliable. For example, for a probability forecast near 0.2, one would still expect the event to occur about 20 % of the time. However, no events occurred when the probability forecast issued was less than the climatological probability of 1/3. Likewise, even when the probability forecast is greater than the climatological probability, the event does not occur as often as its predicted probability. Overall, the accuracy of the forecasts is degraded by unconditional bias; the forecasts systematically overpredict the probability of the event occurrence.

Figure 6c illustrates probability forecasts with conditional bias. This is the case where the corresponding single-valued forecasts have much less variability than in the high quality case (see Fig. 5c). As with the single-valued forecasts, the probability forecasts are much less variable; the forecasts are much less sharp than in the high quality case, with forecast probabilities ranging only from about 0 to 0.7. Even though the event occurs more often as the probability forecast increases, the forecasts are not reliable and have less discrimination than the high quality forecasts (different observations are not associated with as strongly different forecast probabilities). In this case, the forecasts systematically underpredict the probability of the event occurrence.

Figure 6d illustrates probability forecasts with poor association. This is the case where the corresponding single-valued forecasts have lower correlation than in the high quality case (see Fig. 5d). As with the single-valued forecasts, the probability forecasts are more scattered than in the high quality case. The forecasts appear to be reliable but are less sharp (e.g., fewer forecast probabilities near zero and one are issued) and have less discrimination.

A more common way to make a visual comparison of probability forecasts is by plotting elements of the joint distribution. For example, a reliability diagram plots elements of the calibration-refinement factorization. Figure 7 shows the reliability diagrams for the four hypothetical probability forecasts for an upper tercile event. The main plot shows the observed relative frequency of the event occurrence μX|F as a function of the forecast f issued. For perfectly reliable forecasts, the observed

Attributes of Forecast Quality


Fig. 7 Reliability diagram for probability forecasts for an upper tercile event for the four cases: (a) high quality forecasts, (b) forecasts degraded by unconditional bias, (c) forecasts degraded by conditional bias, and (d) forecasts degraded by poor association. The vertical line shows the mean forecast μF. The horizontal line shows the mean observation μX. A forecast is perfectly reliable if μX|F plots on the one-to-one line. A forecast has no resolution if μX|F plots on the horizontal μX line. The insert plot is the sharpness diagram showing the marginal distribution pF( f )

relative frequency of the event μX|F is the same as its forecast probability (i.e., μX|F plots on the one-to-one line). As is seen in Fig. 7, all four forecasts are reliable when a forecast probability f of 0 or 1 is issued; the observed relative frequency is 0 or 1 in all four cases. However, only the high quality forecasts (see Fig. 7a) and the poor association forecasts (see Fig. 7d) are fairly reliable when other forecast probabilities are issued. For the unconditional bias case (see Fig. 7b), the forecasts systematically overpredict the probability of the event occurrence (the observed event occurrence frequency is less than forecast). In contrast, for the conditional bias case (see


A.A. Bradley et al.

Fig. 7c), the forecasts systematically overpredict event occurrence for forecast probabilities less than about 0.17 and underpredict the event occurrence for forecast probabilities greater than 0.17.

The insert plot on each reliability diagram shows the marginal distribution of the forecasts pF( f ), or how likely each forecast probability f is to be issued. Note that most of the forecasts are sharp; forecast probabilities of 0 and 1 are issued with very high frequency. The one exception is the conditional bias case (see Fig. 7c). In this case, high forecast probabilities for the event occurrence are never issued. Furthermore, forecast probabilities of 0 are issued much less frequently than those slightly greater than 0; the most commonly issued forecast probability (the mode of the distribution) is 0.06. As noted above, the ensemble forecasts for the conditional bias case have a much narrower ensemble spread, resulting in probability forecasts that are much less variable (or less sharp) (see again Fig. 6c).

Figure 8 shows a discrimination diagram, which plots elements of the likelihood-base rate factorization. The plots show the conditional distribution of the forecast pF|X( f |x) for the two outcomes – when the event occurs (x = 1) because the observation is in the upper tercile, and when it does not occur (x = 0). For most cases, the forecasts are strikingly different for the two outcomes. When the event occurs, a forecast probability of 1 is often issued; when the event does not occur, a forecast probability of 0 is often issued. Hence, the forecasts have significant discrimination. The one exception is the conditional bias case (see Fig. 8c); when the event occurs, the conditional mean forecast is 0.12, and when the event does not occur, the conditional mean forecast is 0.35. In this case, the discrimination is low.
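The binned quantities behind a reliability diagram can be estimated directly from a verification sample. The sketch below is illustrative rather than taken from the chapter: `reliability_diagram` is a hypothetical helper name, and the toy forecast and observation arrays are invented for the example.

```python
import numpy as np

def reliability_diagram(f, x, n_bins=10):
    """Sample estimates of the calibration-refinement elements shown in a
    reliability diagram: the observed relative frequency of the event given
    the forecast (mu_{X|F}) and the sharpness distribution p_F."""
    f = np.asarray(f, float)
    x = np.asarray(x, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # assign forecasts to bins; clip so f = 1.0 falls in the last bin
    idx = np.clip(np.digitize(f, edges) - 1, 0, n_bins - 1)
    p_f = np.zeros(n_bins)              # how often each forecast bin is used
    mu_x_f = np.full(n_bins, np.nan)    # event frequency given the forecast
    for b in range(n_bins):
        mask = idx == b
        p_f[b] = mask.mean()
        if mask.any():
            mu_x_f[b] = x[mask].mean()
    return edges, p_f, mu_x_f

# toy sample: reliable forecasts give mu_x_f close to the bin's probability
f = np.array([0.05, 0.05, 0.95, 0.95, 0.35, 0.35, 0.35])
x = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0])
edges, p_f, mu_x_f = reliability_diagram(f, x)
```

Plotting `mu_x_f` against the bin centers gives the main reliability curve, and `p_f` gives the inset sharpness diagram.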

6.2.2 Comparison with Verification Measures
Table 5 shows some basic measures of forecast quality. The probability forecasts of an upper tercile event are the most accurate for the high forecast quality case, which has the lowest MSE. In contrast, the other three cases have lower accuracy; their MSE is about twice as high. For the unconditional bias and the conditional bias cases, the reason for lower accuracy is the bias; the mean error (ME) is significant for these two cases (indicating biased forecasts). For the poor association case, the bias is low but the association is much less than for the high quality case.

Table 6 shows elements of the calibration-refinement MSE decomposition from Eq. 7. The high quality case is the most accurate because its forecasts are reliable and have good resolution. For the unconditional bias and conditional bias cases, the forecasts still have good resolution, but are no longer reliable. Note that both these cases have similar results for the calibration-refinement MSE decomposition, indicating that the two types of biases affect the reliability measure in a similar way. For the poor association case, the forecasts are less accurate primarily because they have lower resolution.

Table 7 shows elements of the likelihood-base rate MSE decomposition from Eq. 11. The high quality case is accurate because its forecasts have low type-2 conditional bias and high discrimination. Although unconditional bias does not significantly degrade the discrimination, it does contribute to a larger type-2 conditional bias. On the other hand, conditional bias makes the forecasts much less



Fig. 8 Discrimination diagram for probability forecasts for an upper tercile event for the four cases: (a) high quality forecasts, (b) forecasts degraded by unconditional bias, (c) forecasts degraded by conditional bias, and (d) forecasts degraded by poor association. The conditional distributions pF|X( f |x) are shown for the two possible outcomes. The solid line shows the conditional forecast distribution when the event does not occur (x = 0). The dashed line shows the conditional forecast distribution when the event does occur (x = 1). A forecast has discrimination when the distributions are different for different outcomes. Note that the marginal distribution of the observation pX (x) is the same for all four cases

variable; their sharpness is 0.022, compared to values over 0.1 for the other three cases. Because the forecasts are less variable, they have very low discrimination and high type-2 conditional bias. For the poor association case, the forecasts are less accurate than the high quality case because their sharpness and discrimination are lower, and their type-2 conditional bias is higher.
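The calibration-refinement and likelihood-base rate terms reported in Tables 6 and 7 can be estimated from a verification sample by conditioning on the distinct forecast values and on the two outcomes. A minimal sketch, assuming the standard Murphy-Winkler forms of the two decompositions (the function name and toy data are invented for illustration):

```python
import numpy as np

def brier_decompositions(f, x):
    """Sample MSE (Brier score) of probability forecasts with its
    calibration-refinement (CR) and likelihood-base rate (LBR) terms,
    conditioning on distinct forecast values and on the binary outcome."""
    f = np.asarray(f, float)
    x = np.asarray(x, float)
    mse = np.mean((f - x) ** 2)
    mu_x, mu_f = x.mean(), f.mean()
    unc = np.mean((x - mu_x) ** 2)      # uncertainty, sigma_X^2
    shp = np.mean((f - mu_f) ** 2)      # sharpness, sigma_F^2
    rel = res = 0.0                     # CR: condition on the forecast
    for fv in np.unique(f):
        m = f == fv
        w, mu_xf = m.mean(), x[m].mean()
        rel += w * (mu_xf - fv) ** 2    # reliability
        res += w * (mu_xf - mu_x) ** 2  # resolution
    t2b = dis = 0.0                     # LBR: condition on the outcome
    for xv in np.unique(x):
        m = x == xv
        w, mu_fx = m.mean(), f[m].mean()
        t2b += w * (mu_fx - xv) ** 2    # type-2 conditional bias
        dis += w * (mu_fx - mu_f) ** 2  # discrimination
    return mse, (unc, rel, res), (shp, t2b, dis)

f = np.array([0.0, 0.2, 0.2, 0.8, 1.0, 1.0])
x = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
mse, (unc, rel, res), (shp, t2b, dis) = brier_decompositions(f, x)
```

With these sample moments the identities MSE = uncertainty + reliability − resolution and MSE = sharpness + type-2 bias − discrimination hold exactly.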



Table 5 Measures of accuracy, bias, and association of probability forecasts for the four cases

Case                 Accuracy MSE   Association ρFX   Bias ME
High quality         0.091          0.769             0.013
Unconditional bias   0.188          0.665             0.240
Conditional bias     0.160          0.730             0.135
Poor association     0.157          0.559             0.030

Table 6 Elements of the calibration-refinement MSE decomposition showing measures of accuracy, uncertainty, reliability, and resolution for the four probability forecast cases

Case                 Accuracy MSE   Uncertainty σ²X   Reliability EF[(μX|F − F)²]   Resolution EF[(μX|F − μX)²]
High quality         0.091          0.222             0.001                         0.132
Unconditional bias   0.188          0.222             0.096                         0.130
Conditional bias     0.160          0.222             0.069                         0.132
Poor association     0.157          0.222             0.005                         0.070

Table 7 Elements of the likelihood-base rate MSE decomposition showing measures of accuracy, sharpness, type-2 conditional bias, and discrimination for the four probability forecast cases

Case                 Accuracy MSE   Sharpness σ²F   Type-2 bias EX[(μF|X − X)²]   Discrimination EX[(μF|X − μF)²]
High quality         0.091          0.141           0.034                         0.084
Unconditional bias   0.188          0.154           0.102                         0.068
Conditional bias     0.160          0.022           0.150                         0.012
Poor association     0.157          0.105           0.085                         0.033

Table 8 Elements of the MSE skill score decomposition showing measures of association (potential skill PS), reliability (slope reliability SREL), and unconditional bias (standardized mean error SME) for the four probability forecast cases. Climatology (μX) is the reference forecast

Case                 Skill SS   Association PS   Reliability SREL   Bias SME
High quality         0.589      0.591            0.001              0.001
Unconditional bias   0.156      0.442            0.028              0.259
Conditional bias     0.280      0.533            0.171              0.082
Poor association     0.292      0.312            0.016              0.004

Table 8 shows elements of the MSE skill score decomposition from Eq. 19, using climatology (μX) as the reference forecast. The high quality case has the highest skill;



the skill for the conditional bias and poor association cases is about half that of the high quality case, and for the unconditional bias case it is about one fourth. The decomposition shows that bias (SME) degrades the forecasts for the unconditional bias case, poor reliability (SREL) degrades the forecasts for the conditional bias case, and low potential skill (PS) degrades the forecasts for the poor association case.
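The skill score decomposition SS = PS − SREL − SME can be computed directly from sample moments. The sketch below assumes the usual Murphy-style definitions (PS = ρ², SREL the squared difference between ρ and the ratio of forecast to observation standard deviations, SME the squared standardized mean error); the function name and synthetic data are invented for illustration.

```python
import numpy as np

def skill_score_decomposition(f, x):
    """Murphy-style decomposition of the MSE skill score with climatology as
    the reference: SS = PS - SREL - SME, where PS = rho^2 is the potential
    skill, SREL the slope reliability (conditional bias), and SME the squared
    standardized mean error (unconditional bias)."""
    f = np.asarray(f, float)
    x = np.asarray(x, float)
    rho = np.corrcoef(f, x)[0, 1]
    sf, sx = f.std(), x.std()                    # population (1/n) moments
    ps = rho ** 2                                # potential skill
    srel = (rho - sf / sx) ** 2                  # slope reliability
    sme = ((f.mean() - x.mean()) / sx) ** 2      # standardized mean error
    return ps - srel - sme, ps, srel, sme

# synthetic single-valued forecasts with moderate association
rng = np.random.default_rng(0)
x = rng.normal(size=500)
f = 0.8 * x + 0.3 * rng.normal(size=500)
ss, ps, srel, sme = skill_score_decomposition(f, x)
# the decomposition reproduces SS = 1 - MSE / sigma_X^2 exactly
mse = np.mean((f - x) ** 2)
```

The final comparison with 1 − MSE/σ²X is a useful consistency check when implementing the decomposition.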

6.3 Illustration of Forecast Quality for an Ensemble Probability Distribution Forecast

In the previous section we examined the quality of the ensemble forecast as a probability forecast for a discrete event. In this section, we evaluate the forecast quality of the complete ensemble, treating it as a probability distribution forecast of the continuous outcome.

6.3.1 Visual Comparison
Figure 9 illustrates the ensemble forecasts for the four hypothetical cases. The plots are similar to a scatter plot, but with the ensemble forecasts plotted as box plots (y-axis) versus their corresponding observations (x-axis). The ensemble forecasts for the high forecast quality case (see Fig. 9a) appear to be reliable (the box plots are scattered about the one-to-one line), have good discrimination (a high observation tends to have a high ensemble forecast and vice versa), and have strong association.

The ensemble forecasts for the unconditional bias case (see Fig. 9b) are identical to those for the high quality case, except that they are shifted upward from the one-to-one line. With forecasts systematically higher than the observations, the accuracy is degraded by unconditional bias. The ensemble forecasts for the conditional bias case (see Fig. 9c) are not reliable; although the ensemble spread is narrower, the slope of the box plots no longer follows the one-to-one line. The ensemble forecasts for the poor association case (see Fig. 9d) appear to be reliable (follow the one-to-one line) but have a larger ensemble spread than the high quality case. As a result, the discrimination of the forecasts is also lower.

If one compares the plots of single-valued forecasts (Fig. 5) and probability forecasts for an event (Fig. 6) with those for the ensemble forecasts (Fig. 9), it is evident that ensemble forecasts are of a different nature. In the other two plots, the forecast itself is a number – the ensemble mean for the single-valued forecast, or a probability of an event for the probability forecast. In the case of the complete ensemble, the forecast is instead a probability distribution, as defined empirically by the ensemble members. Unfortunately, one cannot directly apply the distributions-oriented verification approach (as outlined in Sect. 3) to ensemble forecasts, since the forecast is in the form of a function and not a random variable. However, building from the example of the probability forecast of an event, one can construct and evaluate the quality of probability forecasts for any arbitrary binary event, as defined by an event threshold. Conceptually then, one can think of the forecast quality of the ensemble forecast as a continuous function of the threshold (see Bradley et al. 2004; Bradley and Schwartz 2011).
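Deriving probability forecasts for threshold-defined binary events from an ensemble can be sketched as follows; the exceedance probability is simply the fraction of members above the threshold. Function names and synthetic data are invented for illustration.

```python
import numpy as np

def ensemble_event_probabilities(ens, threshold):
    """Probability forecast for the event {observation > threshold}:
    the fraction of ensemble members exceeding the threshold."""
    return np.mean(np.asarray(ens, float) > threshold, axis=1)

def mse_by_threshold(ens, obs, thresholds):
    """Brier score (MSE) of the derived probability forecasts for each
    event threshold, i.e., forecast quality as a function of the threshold."""
    obs = np.asarray(obs, float)
    out = []
    for z in thresholds:
        p = ensemble_event_probabilities(ens, z)
        y = (obs > z).astype(float)
        out.append(np.mean((p - y) ** 2))
    return np.array(out)

# synthetic example: 200 forecasts, 50 members each, centered near the truth
rng = np.random.default_rng(1)
obs = rng.normal(size=200)
ens = obs[:, None] + rng.normal(scale=0.5, size=(200, 50))
thresholds = np.quantile(obs, [0.1, 1/3, 0.5, 2/3, 0.9])
mse_z = mse_by_threshold(ens, obs, thresholds)
```

Evaluating `mse_by_threshold` on a dense grid of thresholds produces a forecast quality function of the kind discussed below.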



Fig. 9 Comparison of ensemble forecasts and observations for the four cases: (a) high quality forecasts, (b) forecasts degraded by unconditional bias, (c) forecasts degraded by conditional bias, and (d) forecasts degraded by poor association. Each ensemble forecast is represented by a box plot. The box contains 25–75 % of the ensemble members; the whiskers extend to the 5 % and 95 % levels. The median is indicated by the bar within the box. The panels show 100 forecast-observation pairs; note that the observations are the same for all four cases and are the same as for the single-valued and probability forecasts (see Figs. 5 and 6). The solid line shows the 1:1 relationship between the ensemble forecasts and the observations

Figure 10 shows an example of a forecast quality function for accuracy. The MSE of a probability forecast is plotted as a continuous function of the threshold. Rather than using the numerical value of the threshold itself, MSE is plotted against the climatological nonexceedance probability of the threshold value. A climatological probability close to zero defines a low outcome event (e.g., the nonexceedance probability of a low flow threshold), whereas a probability close to one defines a



Fig. 10 MSE forecast quality functions for the four ensemble forecast cases: (a) high quality forecasts, (b) forecasts degraded by unconditional bias, (c) forecasts degraded by conditional bias, and (d) forecasts degraded by poor association

high outcome event (e.g., the nonexceedance probability of a high flow threshold). Note that an upper tercile event, as indicated by the vertical line, was previously evaluated in Sect. 6.2.

As can be seen in Fig. 10, the high quality ensemble forecasts are more accurate than the three others; their MSE is lower for all event thresholds. The unconditional bias forecasts, because they are systematically higher, yield more accurate probability forecasts for lower event thresholds than the other two cases but are the least accurate for thresholds greater than 0.42. The conditional bias forecasts have a narrow ensemble spread and are more accurate than the poor association forecasts near the middle tercile (between the 0.34 and 0.66 thresholds). However, the poor association forecasts make better probability forecasts for events in the lower and upper terciles than the conditional bias forecasts.

For comparison, the MSE of climatology forecasts is shown for all threshold levels. By definition, the climatology forecast probability is the same as the threshold probability p. Therefore, its MSE is simply the variance p(1 – p), which ranges from 0 at the two extremes to a maximum of 0.25 for the median ( p = 0.5). Note that for some thresholds, the probability forecasts are not skillful, since they are less accurate than climatology forecasts. For example, the unconditional bias forecasts are not skillful for high event thresholds, whereas the conditional bias and poor association forecasts are not skillful for low and high event extremes.

This comparison illustrates the drawback of plotting an absolute measure like MSE as a continuous function. MSE is a dimensional measure, with units of probability squared (for probability forecasts). It is valid to compare MSE as a measure of accuracy for a fixed threshold; however, it is not valid to compare MSE at different thresholds. In other words, the fact that the MSE for low and high



thresholds is quite small does not mean the forecasts for these thresholds are more accurate; indeed, the MSE for climatology forecasts is also quite small for these thresholds. A better way to plot forecast quality as a continuous function is to use a relative measure.

Figure 11 illustrates this using the MSE skill score and its decomposition. Recall that the MSE skill score is the accuracy of a forecast relative to a reference forecast. In Fig. 11, a climatology forecast is used as the reference. The skill score and its decomposition are dimensionless measures and can be compared across threshold values. It is clear that the high quality ensemble forecasts (see Fig. 11a) make very good probability forecasts. They have high skill (in terms of MSE), high association (potential skill PS), and virtually no conditional bias (slope reliability SREL) or unconditional bias (standardized mean error SME), except perhaps at extreme low and high event thresholds.

For the unconditional bias ensemble forecasts (see Fig. 11b), the probability forecasts for nearly all thresholds have high association (or high potential skill). The biases are low at low event thresholds, making them quite skillful forecasts. However, the unconditional bias (SME) increases steadily with threshold, which degrades the skill. For thresholds in the upper tercile, the conditional bias (SREL) becomes quite large as well. As a result, the ensemble forecasts for the unconditional bias case are not skillful for high event thresholds (0.79 and greater).

For the conditional bias ensemble forecasts (see Fig. 11c), the probability forecasts also have high association (high potential skill). However, their narrow ensemble spread results in significant conditional bias (SREL) at all thresholds, and unconditional biases (SME) that peak at the 0.25 and 0.75 thresholds. As a result, the ensemble forecasts for the conditional bias case are not skillful for low and high event thresholds (below 0.16 or above 0.84).
For the poor association ensemble forecasts (see Fig. 11d), the symmetrical shape of the forecast quality functions resembles that of the high quality forecasts. However, the forecasts have lower overall skill. They have lower association and thus lower potential skill (PS). Furthermore, the conditional bias (SREL) is much larger at very low and high event thresholds, as is the unconditional bias (SME). As a result of low association and significant biases, the ensemble forecasts for the poor association case are not skillful for low and high event thresholds (below 0.04 or above 0.96).

Other MSE skill score decompositions may be used to plot a relative forecast quality measure as a continuous function. Figure 12 shows the likelihood-base rate (LBR) decomposition of the MSE skill score, with terms based on Eq. 18, using climatology as the reference forecast. This decomposition uses measures that describe forecast quality conditioned on the observations. The high quality forecasts (see Fig. 12a) are sharp, have good discrimination, and have low type-2 conditional bias. The poor association forecasts (see Fig. 12d) resemble the high quality forecasts, but with generally lower sharpness, lower discrimination, and higher type-2 conditional bias, resulting in lower overall skill. The conditional bias forecasts (see Fig. 12c), with their narrow ensemble spread, have very low sharpness (near zero for



Fig. 11 MSE skill score and its decomposition forecast quality functions for the four ensemble forecast cases: (a) high quality forecasts, (b) forecasts degraded by unconditional bias, (c) forecasts degraded by conditional bias, and (d) forecasts degraded by poor association. Climatology is the reference forecast. Each plot shows measures of skill (skill score SS), association (potential skill PS), reliability (slope reliability SREL), and unconditional bias (standardized mean error SME). The measures are based on probability forecasts for binary events, defined by the climatological nonexceedance probability threshold p. The vertical line defines the upper tercile event ( p = 0.667); forecast quality for probability forecasts of this upper tercile event was examined in detail (see Figs. 6, 7, and 8)

low and high event thresholds). Since forecasts cannot have discrimination without sharpness, the discrimination is also quite low. Unlike the other cases, the skill function for the unconditional bias forecasts (see Fig. 12b) is not symmetrical; the forecasts are skillful for low and medium event thresholds, but not for high



Fig. 12 MSE skill score and its likelihood-base rate (LBR) decomposition forecast quality functions for the four ensemble forecast cases: (a) high quality forecasts, (b) forecasts degraded by unconditional bias, (c) forecasts degraded by conditional bias, and (d) forecasts degraded by poor association. Climatology is the reference forecast. Each plot shows measures of skill (skill score SS), relative sharpness (RSHA), relative discrimination (RDIS), and relative type-2 conditional bias (RT2B), as defined in Eq. 17. The measures are based on probability forecasts for binary events, defined by the climatological nonexceedance probability threshold p. The vertical line defines the upper tercile event ( p = 0.667); forecast quality for probability forecasts of this upper tercile event was examined in detail (see Figs. 6, 7, and 8)

thresholds. Interestingly, the sharpness and discrimination are actually lower at low event thresholds and increase dramatically moving to higher thresholds. However, the discrimination increases more slowly than the sharpness, resulting in lower skill at higher thresholds.
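Skill functions of this kind can be approximated on a grid of climatological probabilities, using p(1 − p) as the MSE of the climatology reference. A hedged sketch with synthetic "high quality" and "unconditionally biased" ensembles (names and data invented for illustration):

```python
import numpy as np

def brier_skill_by_quantile(ens, obs, probs):
    """Brier skill score SS(p) = 1 - MSE(p) / (p (1 - p)) for events defined
    by the climatological nonexceedance probability p, with climatology
    (whose MSE is p (1 - p)) as the reference forecast."""
    obs = np.asarray(obs, float)
    ss = []
    for p in probs:
        z = np.quantile(obs, p)            # threshold with P(X <= z) = p
        f = np.mean(np.asarray(ens, float) > z, axis=1)
        y = (obs > z).astype(float)
        ss.append(1.0 - np.mean((f - y) ** 2) / (p * (1.0 - p)))
    return np.array(ss)

rng = np.random.default_rng(2)
obs = rng.normal(size=300)
good = obs[:, None] + rng.normal(scale=0.4, size=(300, 40))  # sharp, reliable
biased = good + 0.8                                          # shifted upward
probs = np.array([0.1, 1/3, 0.5, 2/3, 0.9])
ss_good = brier_skill_by_quantile(good, obs, probs)
ss_biased = brier_skill_by_quantile(biased, obs, probs)
```

As in Fig. 11b, the upward-shifted ensemble loses skill at high event thresholds relative to the unbiased one.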



6.3.2 Comparison with Verification Measures
Unlike the previous cases of single-valued forecasts and probability forecasts for a discrete event, verification measures derived from the joint distribution of forecasts and observations are not applicable to ensemble forecasts in the form of a probability distribution. Instead, we plotted verification measures for probability forecasts – defined by the ensemble forecasts for different threshold levels – as a continuous function of the thresholds, to illustrate the forecast quality of ensemble forecasts. However, measures that summarize aspects of these forecast quality functions are often used in ensemble forecast verification. The most common example is the continuous ranked probability score (Matheson and Winkler 1976; Unger 1985).

The continuous ranked probability score (CRPS) is a measure of the overall accuracy of the ensemble forecast. It is related to the ranked probability score and the Brier score. The ranked probability score (RPS) extends the concept of the MSE for a binary event (also known as the Brier score) to multicategory events. The CRPS extends the RPS to the limit of an infinite number of categories (Hersbach 2000). Mathematically, the CRPS is related to the MSE as:

$$ \mathrm{CRPS} = \int_{-\infty}^{\infty} \mathrm{MSE}(z)\, dz \qquad (21) $$

where MSE(z) is the MSE for probability forecasts for an event defined by a threshold outcome z. Conceptually, the CRPS summarizes the MSE function illustrated in Fig. 10.

Table 9 shows the CRPS for the four sets of ensemble forecasts. Results for climatology ensemble forecasts are also shown for comparison. Like the MSE, the CRPS is a measure of accuracy for the ensemble forecasts. As expected, the high quality ensemble forecasts are the most accurate (lowest CRPS), while the climatology ensemble forecasts are the least accurate (highest CRPS). The remaining three ensemble forecasts have more similar accuracy. It is clear that all four sets of ensemble forecasts are skillful compared to the climatology reference.

Similar to the CRPS, which summarizes the MSE function (Fig. 10) over the entire range of thresholds, relative measures like the skill score and its decomposition can be evaluated. For example, Bradley and Schwartz (2011) showed that the continuous ranked probability skill score (CRPSS) is mathematically equivalent to:

Table 9 The continuous ranked probability score (CRPS) for the four sets of ensemble forecasts and climatology forecasts

Case                 Accuracy CRPS
High quality         0.252
Unconditional bias   0.489
Conditional bias     0.463
Poor association     0.438
Climatology          0.564

$$ \mathrm{CRPSS} = \int_{-\infty}^{\infty} w(z)\, \mathrm{SS}(z)\, dz \qquad (22) $$

where SS(z) is the Brier skill score for probability forecasts for an event defined by a threshold outcome z, and w(z) is a weighting function that depends on the inherent uncertainty for threshold z (see Eq. 10). Therefore, the CRPSS is the weighted-average skill of the function SS(z) over all possible outcomes z. If instead one indexes the skill score function by the climatological probability p (as was done in Figs. 10 and 11), an analogous summary measure of the skill of the ensemble forecast is:

$$ \overline{\mathrm{SS}} = \int_{0}^{1} w(p)\, \mathrm{SS}(p)\, dp \qquad (23) $$

where w( p) is a weighting function proportional to the inherent uncertainty p(1 – p) for threshold p:

$$ w(p) = \frac{p(1-p)}{\int_{0}^{1} p(1-p)\, dp} = 6\, p(1-p) \qquad (24) $$

Here, $\overline{\mathrm{SS}}$ represents the weighted-average skill of the skill function shown in Fig. 11. Using a similar approach for all elements of the decomposition yields the summary measures shown in Table 10. As seen in Fig. 11, the high quality ensemble forecasts have high potential skill (PS) and low conditional (SREL) and unconditional biases (SME). Both the unconditional bias and conditional bias cases have reasonably high potential skill. However, a large SME degrades the skill for the unconditional bias ensemble forecasts, and a large SREL degrades the skill for the conditional bias forecasts. For the poor association ensemble forecasts, the conditional (SREL) and unconditional biases (SME) are slightly higher than for the high quality forecasts; however, their skill is much lower because their potential skill is much lower than for all the other cases.

Table 10 Summary verification measures for ensemble forecast skill and its decomposition. The summary measures indicate the weighted-average MSE skill SS, potential skill PS (association), slope reliability SREL (conditional bias), and standardized mean error SME (unconditional bias)

Case                 Skill SS   Association PS   Reliability SREL   Bias SME
High quality         0.572      0.576            0.003              0.001
Unconditional bias   0.200      0.441            0.048              0.193
Conditional bias     0.234      0.507            0.199              0.074
Poor association     0.260      0.296            0.028              0.008
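Equations 23 and 24 can be evaluated numerically once SS(p) is available on a grid of climatological probabilities. Below is a simple trapezoidal-rule sketch (the function name is invented); the sanity check relies on the fact that w(p) = 6p(1 − p) integrates to one over (0, 1), so a constant skill function is returned essentially unchanged.

```python
import numpy as np

def weighted_average_skill(probs, ss):
    """Weighted-average skill per Eqs. 23-24: integrate w(p) * SS(p) over
    (0, 1) with w(p) = 6 p (1 - p), using the trapezoidal rule on a grid."""
    probs = np.asarray(probs, float)
    y = 6.0 * probs * (1.0 - probs) * np.asarray(ss, float)
    # trapezoidal rule: sum of panel areas between successive grid points
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(probs)))

# sanity check: the weight integrates to one, so constant skill is preserved
p = np.linspace(0.0, 1.0, 1001)
ss_bar = weighted_average_skill(p, np.full_like(p, 0.5))
```

In practice, `ss` would be the SS(p) function estimated from the ensemble verification sample, as in Fig. 11.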



Table 11 Summary verification measures for ensemble forecast MSE skill and its likelihood-base rate (LBR) decomposition. The summary measures indicate the weighted-average skill SS, the relative sharpness RSHA, relative discrimination RDIS, and relative type-2 conditional bias RT2B

Case                 Skill SS   Sharpness RSHA   Discrimination RDIS   Type-2 bias RT2B
High quality         0.572      0.645            0.371                 0.154
Unconditional bias   0.200      0.578            0.247                 0.469
Conditional bias     0.234      0.090            0.048                 0.725
Poor association     0.260      0.489            0.143                 0.394

Table 11 shows summary measures based on the MSE skill likelihood-base rate decomposition from Eq. 17, using climatology as the reference forecast. As seen in Fig. 12, the high quality forecasts have high sharpness, low type-2 conditional bias, and high discrimination. The unconditional bias forecasts are also sharp but have lower discrimination and larger type-2 conditional bias. The conditional bias forecasts are not sharp; poor discrimination and high type-2 conditional bias degrade their skill. The poor association forecasts still have good sharpness but low discrimination and significant type-2 conditional bias.
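For an empirical ensemble, the CRPS of Eq. 21 can also be computed without explicit numerical integration, using the well-known identity CRPS = E|M − y| − ½ E|M − M′| over ensemble members M, M′. This is a generic sketch (names and synthetic data invented), not the chapter's own implementation.

```python
import numpy as np

def crps_ensemble(members, y):
    """Sample CRPS of one ensemble forecast against a scalar observation,
    via CRPS = E|M - y| - 0.5 E|M - M'| over ensemble members M, M'
    (equivalent to integrating the threshold Brier score, cf. Eq. 21)."""
    m = np.asarray(members, float)
    term1 = np.mean(np.abs(m - y))
    term2 = 0.5 * np.mean(np.abs(m[:, None] - m[None, :]))
    return term1 - term2

# degenerate check: a one-member "ensemble" reduces to the absolute error
crps_point = crps_ensemble([2.0], 3.5)

# average CRPS over synthetic climatology-like forecasts
rng = np.random.default_rng(3)
forecasts = rng.normal(size=(100, 30))    # 100 forecasts, 30 members each
observations = rng.normal(size=100)
mean_crps = float(np.mean([crps_ensemble(f, o)
                           for f, o in zip(forecasts, observations)]))
```

Averaging `crps_ensemble` over all forecast-observation pairs gives the summary accuracy measure reported in Table 9.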

7 Practical Considerations

7.1 Diagnostic Verification and Ensemble Forecasts

For those new to forecast verification, it is common to ask which single metric is best for assessing forecasts or forecast systems. One important lesson of the hypothetical ensemble forecast examples shown in this chapter is that a single measure of forecast quality is insufficient (Murphy 1993). By design, the hypothetical examples had one high quality forecast case and three lower quality forecast cases; the lower quality forecasts were created by degrading the high quality forecasts in a specific manner. Although the three lower quality ensemble forecasts were dramatically different, their forecast accuracies (or skills) were roughly the same (whether treated as single-valued forecasts, as probability forecasts, or as probability distribution forecasts). It should be obvious then that forecasts with similar accuracy are not necessarily of comparable quality. Examining elements of the joint distribution of forecasts and observations – both visually for a qualitative assessment, and numerically with forecast verification measures for a quantitative assessment – was needed to distinguish the nature of the different forecasts. The distributions-oriented verification framework provides guidance on what aspects and measures of forecast quality are relevant to such a diagnostic assessment.

Different aspects of forecast quality have significant diagnostic and practical implications in ensemble forecasting. For example, when a forecast has poor association, its potential skill and accuracy are limited. In contrast, when forecast accuracy is degraded by bias, accuracy can be improved by reducing or eliminating diagnosed biases through ensemble postprocessing. Unconditional bias in ensemble



forecasts can easily be removed with simple bias-correction methods (Seo et al. 2006; Hashino et al. 2007). Conditional bias can be reduced by calibration methods (Atger 2003; Gneiting et al. 2005; Brown and Seo 2010; among others), but the task is much more complicated. Therefore, it is important to diagnose the reasons for low accuracy and to distinguish between unconditional and conditional biases.
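A minimal additive bias-correction sketch (not one of the specific methods cited above; names and synthetic data are illustrative) shows how easily unconditional bias can be removed, while conditional biases such as a too-narrow spread are left untouched:

```python
import numpy as np

def remove_unconditional_bias(ens, obs):
    """Additive correction: subtract the mean error of the hindcast ensemble
    means from every member, removing the unconditional bias. Conditional
    biases (e.g., a too-narrow spread) are deliberately left untouched."""
    ens = np.asarray(ens, float)
    bias = ens.mean(axis=1).mean() - np.asarray(obs, float).mean()
    return ens - bias, bias

rng = np.random.default_rng(4)
obs = rng.normal(size=250)
raw = obs[:, None] + 0.6 + rng.normal(scale=0.3, size=(250, 20))  # high bias
corrected, bias = remove_unconditional_bias(raw, obs)
```

After correction, the mean of the ensemble means matches the mean observation by construction; full calibration of conditional biases requires the more elaborate methods referenced above.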

7.2 Absolute Versus Relative Measures

Common absolute measures of forecast quality – like MSE and its decompositions – are often used in forecast verification. Such measures take on the dimensions of the forecast variable. For instance, for a single-valued forecast of flow, MSE has dimensions of flow squared. For a probability forecast of an event, MSE has dimensions of probability squared. The dimensionality of these measures can limit their applications in hydrologic ensemble forecast comparisons.

For example, consider a situation where one wants to compare the quality of forecasts issued at different locations in a watershed to find where they are most accurate. Using the MSE as a measure of accuracy, one would naturally find that smaller rivers (with lower flows) tend to have lower MSE than bigger rivers (with higher flows). To make the comparison possible, one needs a nondimensional measure of accuracy. A common approach is to take the root mean square error (RMSE) and divide by the observation mean, so that differences in the relative accuracy can be assessed.

Likewise, consider a situation where one wants to determine the quality of probability forecasts at a single location for an ensemble prediction system. This is the same situation encountered in the verification example for the ensemble probability distribution forecasts. Here, we first plotted MSE (see Fig. 10) as a function of the defining event threshold (its climatological probability). But forecast probabilities for commonly occurring events (e.g., flow above an upper tercile) tend to be much higher than for rarely occurring events (e.g., an extreme high flow that occurs only 5 % of the time). As a result, MSE tends to be higher for commonly occurring events than for rarely occurring events. A smaller MSE is not synonymous with a more accurate forecast; instead, one needs to compare results with the MSE of a climatology forecast (a reference forecast) to make any inferences on their accuracy.
For any comparison where the different forecast variables have very different climatologies, the use of absolute measures is problematic. Instead, relative measures of forecast quality – like the MSE skill score and its decompositions shown in Eqs. 15, 16, and 18 – are better suited for forecast comparisons. As the verification example showed, plotting relative measures of forecast quality (see Fig. 11) as a function of threshold allowed an assessment of when probability forecasts for an event were of higher quality, and when they were of lower quality. The results showed that even with the same ensemble forecast, the probability statements issued by ensemble probability distribution forecasts are not of the same quality; probability statements for certain thresholds were of higher quality than for others. Clearly, the use of relative forecast measures requires an appropriate



reference forecast. By evaluating forecast quality relative to a reference, relative forecast measures allow an “apples-to-apples” comparison of results required for many forecast verification applications.

7.3 Choosing a Verification Data Sample

Although the distributions-oriented forecast verification measures presented in Sect. 5 are defined by the moments of the joint distribution of forecasts and observations, in practice the moments must be estimated with a verification data sample. That is, a sample made up of forecasts and their corresponding observations is chosen, and then sample estimates of moments are computed. Because of the assumptions made, the verification data sample must be chosen with some care.

For example, one assumption of the joint-distribution model is that forecast-observation pairs are identically distributed. Hence, one must choose a verification data sample that is a random sample drawn from this joint distribution. Consider the case of forecasts of the flow volume for the upcoming season for a basin in the Rocky Mountains of the western United States. Because of climatological variations in river flows, the distribution of observed seasonal flow volumes for April forecasts (which are a result of snow accumulation over the winter months and their melting rate during the spring season) can be very different than that for August forecasts (which are dominated by short-term meteorological events). Yet in practice, in an effort to create a larger sample size, verification data samples are often formed by pooling forecasts issued at different times of the year. In this case, pooling of April and August forecasts would violate the fundamental assumption of stationarity; the joint distribution for April forecasts is not the same as that for August forecasts. Hamill and Juras (2006) show examples where violating the stationarity assumption when choosing a verification data sample can lead to false inferences (unreliable or misleading verification measures). Therefore, selection of a verification data sample must be done with care.
Pooling of forecasts issued at different times, forecasts at various lead times, or forecasts issued for different locations should be avoided, unless additional analysis shows that the stationarity assumption is reasonable. The above example illustrates one case where stratifying the entire forecast-observation data set should be employed. However, data stratification may also be used to better assess how forecast quality depends on the forecast situation or condition. For instance, it may be useful to assess forecast quality by region (stratify data spatially) and/or by season (stratify data temporally). In other cases, one may want to assess forecast quality for particular atmospheric or hydrologic outcomes, such as the occurrence of precipitation or flooding (stratify data by condition). Note that data stratification should involve categories with reasonable sample sizes to obtain reliable verification statistics for each category. This can be a significant obstacle in hydrometeorological ensemble forecast verification, where data samples tend to be small. Consider again the case of seasonal flow volume forecasts. If hindcasting is used to create forecasts for the past 30 years, the data sample for April (or August) forecasts at a particular location contains only 30 pairs. This sample size

40

A.A. Bradley et al.

is already quite small. However, if an adequate sample size can be maintained, such as when forecasts are issued more frequently, the hindcast period is long, or it is reasonable to pool forecasts issued at different times, then additional data stratification could provide useful insights.

The second assumption of the joint-distribution model is that forecast-observation pairs are independent. The independence assumption can be violated if forecasts or observations are significantly correlated in time or space. For instance, a short-range forecast that is updated frequently would have forecasts and observations that are correlated. Such violations do not necessarily lead to unreliable or misleading verification measures (as long as the stationarity assumption is still valid). However, violation of the independence assumption reduces the effective size of the verification data sample. As a result, the sampling uncertainties associated with the estimated verification measures are higher than for an independent sample.

It is also important to recognize the limitations of observational data sets used in forecast verification. Some observations are not direct measurements of the forecast variable. Consider a quantitative precipitation forecast for a specific duration and areal domain; its corresponding observation might be estimated by areal interpolation from point precipitation gauge measurements and/or weather radar. Note also that even direct measurements of a forecast variable (e.g., observations of streamflow at a stream gage) have measurement errors. Changes over time in data collection and quality control, data processing, and data interpolation methods add uncertainty regarding the structure of observational errors and their effect on the verification process. However, these observational errors are generally neglected during the verification process, under the assumption that they are insignificant compared to the forecast error.
Methods to account for the observational error are under investigation (e.g., Gorgas and Dorninger 2012). Note that sometimes another reference observation is used to verify forecasts, such as simulated flow values (produced by the same forecasting system from observed atmospheric inputs), to help isolate and analyze a specific source of error (e.g., Jaun and Ahrens 2009; Demargne et al. 2010; Brown et al. 2014).
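The reduction in effective sample size under serial correlation noted above can be approximated with a first-order autoregressive rule of thumb, n_eff ≈ n(1 − r1)/(1 + r1), where r1 is the lag-1 autocorrelation (see, e.g., Wilks 2011). A minimal sketch with hypothetical data:

```python
# Sketch: first-order estimate of the effective verification sample size
# when forecast-observation pairs are serially correlated. The AR(1)
# approximation n_eff = n * (1 - r1) / (1 + r1) is a common rule of
# thumb; the error series below is hypothetical.

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a series."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    return cov / var

def effective_sample_size(x):
    r1 = max(0.0, lag1_autocorr(x))  # ignore negative correlation here
    return len(x) * (1 - r1) / (1 + r1)

# A persistent (positively autocorrelated) forecast error series
errors = [1.0, 1.2, 1.1, 0.9, 0.8, 0.9, 1.1, 1.3, 1.2, 1.0,
          0.7, 0.6, 0.8, 1.0, 1.1, 1.2, 1.0, 0.9, 0.8, 0.9]
print(f"nominal n = {len(errors)}, effective n ≈ {effective_sample_size(errors):.1f}")
```

For this persistent series the effective sample size is only a fraction of the nominal 20 pairs, so confidence intervals based on the nominal size would be too narrow.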

7.4 Sampling Uncertainties for Small Data Samples

As with any estimate based on a data sample, forecast verification measures estimated with a verification data sample have sampling uncertainties. Obviously, estimates should be based on the largest verification data sample possible. However, as noted above, pooling forecast-observation pairs to create a larger sample is counterproductive if pooling violates the assumptions of the distributions-oriented approach. Therefore, one of the key requirements in forecast verification is to archive forecasts from a fixed forecast system for multiple years, or to include a hindcasting capability to retrospectively produce forecasts from a fixed version of the forecast system. Ideally, evaluating the sampling uncertainties of verification measures should be a routine part of forecast verification to account for the uncertainty associated with a

Attributes of Forecast Quality

41

verification metric, as well as to help determine whether one forecast system differs significantly from another for a given metric, given the sampling uncertainty. Unfortunately, assessing sampling uncertainties is not yet commonplace. Yet for certain verification measures, simple analytical expressions for sampling uncertainty have been derived (Kane and Brown 2000; Stephenson 2000; Thornes and Stephenson 2001; Mason and Graham 2002; Ferro 2007; Jolliffe 2007). For example, Bradley et al. (2008) derived expressions for the sampling uncertainty of the MSE and skill scores for probability forecasts; Wilks (2010) extended this approach to evaluate sampling uncertainty for correlated forecast-observation pairs. In the absence of analytical uncertainty expressions, others have used a more computationally intensive resampling method to assess sampling uncertainties (Efron 1981; Wilks 2011). This method involves randomly resampling forecast-observation pairs to compute verification measures and define their empirical distribution. Examples of the resampling method in the forecast verification literature include Mason and Mimmack (1992), Politis and Romano (1994), Wilks (1996), Hamill (1999), Zhang and Casey (2000), Doblas-Reyes et al. (2003), Accadia et al. (2003, 2005), Ebert et al. (2004), Jolliffe (2007), and Ferro (2007), among others.
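The resampling approach can be sketched as follows for the Brier score of probability forecasts; the forecast-observation pairs and the replicate count are hypothetical:

```python
# Sketch: bootstrap the forecast-observation pairs to obtain an
# empirical sampling distribution of the Brier score, then read off a
# percentile confidence interval. Data are hypothetical.
import random

def brier_score(pairs):
    """Mean squared error of probability forecasts of a binary event."""
    return sum((f - o) ** 2 for f, o in pairs) / len(pairs)

random.seed(1)
# (forecast probability, observed event indicator) pairs
pairs = [(0.8, 1), (0.6, 1), (0.2, 0), (0.4, 0), (0.9, 1),
         (0.1, 0), (0.7, 0), (0.3, 1), (0.5, 1), (0.2, 0)]

scores = []
for _ in range(2000):
    resample = [random.choice(pairs) for _ in pairs]  # sample with replacement
    scores.append(brier_score(resample))
scores.sort()

lo, hi = scores[int(0.025 * len(scores))], scores[int(0.975 * len(scores))]
print(f"Brier score = {brier_score(pairs):.3f}, 95% interval ≈ [{lo:.3f}, {hi:.3f}]")
```

For the small sample used here the interval is wide, which is exactly the point: a single score value without its sampling uncertainty can be misleading.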

8 Conclusions

Forecast verification is the process of assessing the quality of forecasts. Using a distributions-oriented approach to forecast verification, aspects of forecast quality are defined based on the joint distribution of forecasts and observations. Hydrometeorological ensemble forecasts can be used to represent many types of forecasts, such as single-valued deterministic forecasts of a continuous outcome, probability forecasts for event occurrences (e.g., a flood event), or ensemble probability distribution forecasts of a continuous outcome. Regardless of the form of the forecast (deterministic or probabilistic) or the observations (discrete or continuous), the joint distribution of forecasts and observations completely describes their relationship. The many aspects of forecast quality can be examined using the joint distribution or one of its factorizations into a conditional and marginal distribution. Basic aspects like bias, accuracy, skill, and association are defined by the joint distribution. By examining the distribution of observations for specific forecasts, aspects like reliability (type-1 conditional bias) and resolution are defined. By examining the distribution of forecasts for specific observations, aspects like type-2 conditional bias and discrimination are defined. Other aspects such as sharpness depend only on the distribution of forecasts, while uncertainty depends only on the distribution of observations. Common measures of these aspects of forecast quality are defined based on the moments of the joint distribution or its conditional and marginal distributions. Hypothetical ensemble forecasts were used to illustrate aspects of forecast quality as single-valued forecasts (ensemble mean), probability forecasts for an event, and as ensemble probability distribution forecasts. One of the ensemble forecasts was of high quality, while three others were lower quality – degraded by either unconditional bias,


conditional bias, or poor association. Although all the lower-quality forecasts had similar accuracy and skill (accuracy relative to a climatology reference forecast), the nature of the forecasts themselves was very different. By examining other aspects of forecast quality – either visually for a qualitative assessment or numerically with forecast verification measures for a quantitative assessment – one could easily diagnose the attributes that were degraded. Some practical considerations of forecast quality for ensemble forecast verification were also discussed. For example, diagnosing the factors that contribute to or degrade forecast quality is important and can help determine which ensemble postprocessing methods could be employed to improve the forecasts. Making inferences about forecast quality often requires the selection of appropriate measures (either absolute or relative measures). In practice, forecast verification is carried out with a verification data sample; given the assumptions of the distributions-oriented approach, care must be taken in selecting an appropriate data sample to avoid misleading results. Finally, given the typically small hydrometeorological ensemble forecast verification data samples, the sampling uncertainty associated with forecast verification measures should be assessed as part of the verification process.

References

C. Accadia, S. Mariani, M. Casaioli, A. Lavagnini, A. Speranza, Sensitivity of precipitation forecast skill scores to bilinear interpolation and a simple nearest-neighbor average method on high-resolution verification grids. Weather Forecast. 18, 918–932 (2003)
C. Accadia, S. Mariani, M. Casaioli, A. Lavagnini, A. Speranza, Verification of precipitation forecasts from two limited-area models over Italy and comparison with ECMWF forecasts using a resampling technique. Weather Forecast. 20, 276–300 (2005)
F. Atger, Spatial and interannual variability of the reliability of ensemble-based probabilistic forecasts: consequences for calibration. Mon. Weather Rev. 131, 1509–1523 (2003)
A.A. Bradley, S.S. Schwartz, Summary verification measures and their interpretation for ensemble forecasts. Mon. Weather Rev. 139, 3075–3089 (2011)
A.A. Bradley, T. Hashino, S.S. Schwartz, Distributions-oriented verification of probability forecasts for small data samples. Weather Forecast. 18, 903–917 (2003)
A.A. Bradley, S.S. Schwartz, T. Hashino, Distributions-oriented verification of ensemble streamflow predictions. J. Hydrometeorol. 5, 532–545 (2004)
A.A. Bradley, S.S. Schwartz, T. Hashino, Sampling uncertainty and confidence intervals for the Brier score and Brier skill score. Weather Forecast. 23, 992–1006 (2008)
G.W. Brier, Verification of forecasts expressed in terms of probability. Mon. Weather Rev. 78, 1–3 (1950)
G.W. Brier, R.A. Allen, Verification of weather forecasts, in Compendium of Meteorology, ed. by T.F. Malone (American Meteorological Society, Boston, 1951), pp. 841–848
J.D. Brown, D.J. Seo, A nonparametric postprocessor for bias correction of hydrometeorological and hydrologic ensemble forecasts. J. Hydrometeorol. 11, 642–665 (2010)
J.D. Brown, L. Wu, M. He, S. Regonda, H. Lee, D.-J. Seo, Verification of temperature, precipitation, and streamflow forecasts from the NOAA/NWS Hydrologic Ensemble Forecast Service (HEFS): 1. Experimental design and forcing verification. J. Hydrol. 519, 2869–2889 (2014)
J. Demargne, J. Brown, Y.Q. Liu, D.J. Seo, L.M. Wu, Z. Toth, Y.J. Zhu, Diagnostic verification of hydrometeorological and hydrologic ensembles. Atmos. Sci. Lett. 11, 114–122 (2010)


F.J. Doblas-Reyes, V. Pavan, D.B. Stephenson, The skill of multi-model seasonal forecasts of the wintertime North Atlantic Oscillation. Climate Dynam. 21, 501–514 (2003)
E.E. Ebert, L.J. Wilson, B.G. Brown, P. Nurmi, H.E. Brooks, J. Bally, M. Jaeneke, Verification of nowcasts from the WWRP Sydney 2000 forecast demonstration project. Weather Forecast. 19, 73–96 (2004)
B. Efron, Nonparametric estimates of standard error: the jackknife, the bootstrap and other methods. Biometrika 68, 589–599 (1981)
C.A. Ferro, Comparing probabilistic forecasting systems with the Brier score. Weather Forecast. 22, 1076–1088 (2007)
K.J. Franz, H.C. Hartmann, S. Sorooshian, R. Bales, Verification of National Weather Service ensemble streamflow predictions for water supply forecasting in the Colorado River basin. J. Hydrometeorol. 4, 1105–1118 (2003)
T. Gneiting, A.E. Raftery, A.H. Westveld, T. Goldman, Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Weather Rev. 133, 1098–1118 (2005)
T. Gneiting, F. Balabdaoui, A.E. Raftery, Probabilistic forecasts, calibration and sharpness. J. R. Stat. Soc. Ser. B Stat. Methodol. 69, 243–268 (2007)
T. Gorgas, M. Dorninger, Quantifying verification uncertainty by reference data variation. Meteorol. Z. 21, 259–277 (2012)
T.M. Hamill, Hypothesis tests for evaluating numerical precipitation forecasts. Weather Forecast. 14, 155–167 (1999)
T.M. Hamill, J. Juras, Measuring forecast skill: is it real skill or is it the varying climatology? Q. J. Roy. Meteorol. Soc. 132, 2905–2923 (2006)
T. Hashino, A.A. Bradley, S.S. Schwartz, Evaluation of bias-correction methods for ensemble streamflow volume forecasts. Hydrol. Earth Syst. Sci. 11, 939–950 (2007)
H. Hersbach, Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15, 559–570 (2000)
S. Jaun, B. Ahrens, Evaluation of a probabilistic hydrometeorological forecast system. Hydrol. Earth Syst. Sci. 13, 1031–1043 (2009)
I. Jolliffe, Uncertainty and inference for verification measures. Weather Forecast. 22, 637–650 (2007)
I.T. Jolliffe, D.B. Stephenson, Introduction, in Forecast Verification, ed. by I.T. Jolliffe, D.B. Stephenson (Wiley, Chichester, 2011), pp. 1–9
T.L. Kane, B.G. Brown, Confidence intervals for some verification measures – a survey of several methods, in Preprints, 15th Conference on Probability and Statistics in the Atmospheric Sciences (American Meteorological Society, Asheville, 2000)
R. Krzysztofowicz, The case for probabilistic forecasting in hydrology. J. Hydrol. 249, 2–9 (2001)
S.J. Mason, N.E. Graham, Areas beneath the relative operating characteristics (ROC) and relative operating levels (ROL) curves: statistical significance and interpretation. Q. J. Roy. Meteorol. Soc. 128, 2145–2166 (2002)
S.J. Mason, G.M. Mimmack, The use of bootstrap confidence intervals for the correlation coefficient in climatology. Theor. Appl. Climatol. 45, 229–233 (1992)
J.E. Matheson, R.L. Winkler, Scoring rules for continuous probability distributions. Manag. Sci. 22, 1087–1095 (1976)
A.H. Murphy, Skill scores based on the mean-square error and their relationships to the correlation coefficient. Mon. Weather Rev. 116, 2417–2425 (1988)
A.H. Murphy, What is a good forecast? An essay on the nature of goodness in weather forecasting. Weather Forecast. 8, 281–293 (1993)
A.H. Murphy, Forecast verification, in Economic Value of Weather and Climate Forecasts, ed. by R.W. Katz, A.H. Murphy (Cambridge University Press, Cambridge, 1997), pp. 19–74
A.H. Murphy, R.L. Winkler, A general framework for forecast verification. Mon. Weather Rev. 115, 1330–1338 (1987)


A.H. Murphy, R.L. Winkler, Diagnostic verification of probability forecasts. Int. J. Forecast. 7, 435–455 (1992)
D.N. Politis, J.P. Romano, The stationary bootstrap. J. Am. Stat. Assoc. 89, 1303–1313 (1994)
J.M. Potts, Basic concepts, in Forecast Verification, ed. by I.T. Jolliffe, D.B. Stephenson (Wiley, Chichester, 2011), pp. 11–29
D.J. Seo, H.D. Herr, J.C. Schaake, A statistical post-processor for accounting of hydrologic uncertainty in short-range ensemble streamflow prediction. Hydrol. Earth Syst. Sci. Discuss. 3, 1987–2035 (2006)
D.B. Stephenson, Use of the “odds ratio” for diagnosing forecast skill. Weather Forecast. 15, 221–232 (2000)
J.E. Thornes, D.B. Stephenson, How to judge the quality and value of weather forecast products. Meteorol. Appl. 8, 307–314 (2001)
D.A. Unger, A method to estimate the continuous ranked probability score, in Ninth Conference on Probability and Statistics in Atmospheric Sciences (American Meteorological Society, Virginia Beach, 1985), pp. 206–213
E. Welles, S. Sorooshian, G. Carter, B. Olsen, Hydrologic verification – a call for action and collaboration. Bull. Am. Meteorol. Soc. 88, 503 (2007)
D.S. Wilks, Statistical significance of long-range “optimal climate normal” temperature and precipitation forecasts. J. Climate 9, 827–839 (1996)
D.S. Wilks, Sampling distributions of the Brier score and Brier skill score under serial dependence. Q. J. Roy. Meteorol. Soc. 136, 2109–2118 (2010)
D.S. Wilks, Statistical Methods in the Atmospheric Sciences, 3rd edn. (Academic, Amsterdam, 2011), p. 704
H. Zhang, T. Casey, Verification of categorical probability forecasts. Weather Forecast. 15, 80–89 (2000)

Verification of Meteorological Forecasts for Hydrological Applications
Eric Gilleland, Florian Pappenberger, Barbara Brown, Elizabeth Ebert, and David Richardson

Contents
1 Introduction . . . . . . . . . . 2
2 Meteorological Forecast Verification Aspects of Interest to Hydrology . . . . . . . . . . 3
2.1 Variables and Verifying Data . . . . . . . . . . 3
2.2 Benchmark and Skill . . . . . . . . . . 4
2.3 Threshold-Based Evaluation . . . . . . . . . . 4
2.4 Verification Area . . . . . . . . . . 5
2.5 Averaging, Smoothing, and Accumulation . . . . . . . . . . 5
3 Issues for the Verification of Point Forecasts in Meteorology . . . . . . . . . . 6
3.1 Matching Forecasts and Observations . . . . . . . . . . 6
3.2 Dealing with Inhomogeneity . . . . . . . . . . 8
3.3 Verifying Ensemble Forecasts for Point Locations . . . . . . . . . . 9
4 Issues for the Verification of Gridded Forecasts in Meteorology . . . . . . . . . . 9
4.1 Sources of Gridded Observations . . . . . . . . . . 9
4.2 Choosing and Preprocessing Gridded Observations for Verification Analysis . . . . . . . . . . 11
4.3 Double-Penalty and Small-Scale Errors . . . . . . . . . . 12
5 Methods for Verifying High-Resolution Gridded Forecasts . . . . . . . . . . 13
5.1 Overview . . . . . . . . . . 13
5.2 Filter Methods . . . . . . . . . . 14
5.3 Displacement Methods . . . . . . . . . . 15
6 Application of Spatial Techniques to Meteorological Ensemble Forecasts . . . . . . . . . . 16

E. Gilleland (*) • B. Brown
Research Applications Laboratory, Weather Systems and Assessment Program, National Center for Atmospheric Research (NCAR), Boulder, CO, USA
e-mail: [email protected]; [email protected]
F. Pappenberger • D. Richardson
European Centre for Medium-Range Weather Forecasts (ECMWF), Reading, UK
e-mail: fl[email protected]; [email protected]
E. Ebert
Research and Development Branch, Bureau of Meteorology (BoM), Melbourne, Australia
e-mail: [email protected]
© Springer-Verlag GmbH Germany 2016
Q. Duan et al. (eds.), Handbook of Hydrometeorological Ensemble Forecasting, DOI 10.1007/978-3-642-40457-3_4-2


2

E. Gilleland et al.

7 Verification of Extreme Events . . . . . . . . . . 20
7.1 New Scores for Deterministic Forecasts . . . . . . . . . . 20
7.2 Verifying Extreme Events for Ensembles . . . . . . . . . . 20
7.3 Extreme Forecast Index (EFI) . . . . . . . . . . 21
8 Inference in the Presence of Correlation . . . . . . . . . . 23
9 Conclusions . . . . . . . . . . 25
References . . . . . . . . . . 25

Abstract

This chapter illustrates how verification is conducted with operational meteorological ensemble forecasts. It focuses on the main aspects of importance to hydrological applications, such as verification of point and spatial precipitation forecasts, verification of temperature forecasts, verification of extreme meteorological events, and feature-based verification.

Keywords

Forecast verification • Model evaluation • Spatial statistics • Spatial forecast verification • Extreme-value verification

1 Introduction

Operational meteorological ensembles from numerical weather prediction (NWP) are regularly evaluated and verified to ensure that the forecasts are useful, to understand and monitor their quality, and to identify deficiencies that can guide improvements to the forecasts. For example, Fig. 1 shows a verification of ensemble

Fig. 1 Continuous ranked probability skill score of ECMWF, JMA, UKMO, and NCEP for precipitation accumulated over 24 h and at a 3-day lead time (extratropics – poleward of 20°). The number next to each center is the average score over the 5-year period. Skill is measured against climatology


forecasts from various weather forecasting centers for precipitation over the extratropics. Such forecasts are then used to drive hydrological models and produce forecasts for flood, drought, water resources, and other hydrologically relevant variables. In this chapter, some standard and more cutting-edge verification methods from meteorology that are relevant to hydrology are described. In particular, it is illustrated how verification is conducted with operational meteorological ensemble forecasts, with a focus on the main aspects of importance to hydrological applications, such as verification of point and spatial precipitation forecasts, verification of temperature forecasts, verification of extreme meteorological events, and feature-based verification.
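The score in Fig. 1 can be estimated directly from the ensemble members, with skill formed against a climatological reference. The sketch below (hypothetical data, not the operational implementation) uses the standard ensemble estimator CRPS = E|X − y| − ½ E|X − X′|:

```python
# Illustrative sketch: ensemble CRPS estimated from the members, and a
# skill score CRPSS = 1 - CRPS/CRPS_clim against a climatological
# ensemble. All data below are hypothetical.

def crps(ensemble, obs):
    """Ensemble CRPS estimator: E|X - y| - 0.5 * E|X - X'|."""
    m = len(ensemble)
    term1 = sum(abs(x - obs) for x in ensemble) / m
    term2 = sum(abs(xi - xj) for xi in ensemble for xj in ensemble) / (2 * m * m)
    return term1 - term2

# Hypothetical 24-h precipitation (mm): forecast ensemble vs climatology
forecast_members = [8.0, 10.0, 12.0, 9.0, 11.0]
climatology_members = [0.0, 2.0, 5.0, 15.0, 30.0]
observed = 10.5

crpss = 1.0 - crps(forecast_members, observed) / crps(climatology_members, observed)
print(f"CRPSS vs climatology: {crpss:.2f}")  # positive values indicate skill
```

The sharp, well-centered forecast ensemble scores a much lower (better) CRPS than the broad climatological one, yielding a strongly positive skill score.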

2 Meteorological Forecast Verification Aspects of Interest to Hydrology

2.1 Variables and Verifying Data

Meteorological forecasts produce a large number of variables, both at single locations and on spatial fields. Some of them are used directly in hydrological models (e.g., precipitation or 2-m temperature), while others, such as geopotential height at 500 hPa or tropospheric moisture transports, influence hydrological predictability indirectly. For example, the flooding in Central Europe in 2013 was caused by a particular synoptic weather pattern which typically leads to heavy precipitation (Haiden et al. 2014). Atmospheric moisture transport is a prerequisite for heavy precipitation and, although the two are correlated, they are not identical. Indeed, predictions of atmospheric moisture transports have higher skill than precipitation forecasts. Therefore, verification of synoptic patterns is important, but it cannot stand on its own if the forecast verification is to be useful for hydrology. The evaluation of surface variables is essential.

The type and property of surface variable that is evaluated may differ significantly according to the hydrological application. A flash flood forecasting system, for instance, is concerned with peak precipitation rates, whereas a hydrological forecast used for navigation may be more focused on daily average precipitation. In a tropical forecasting system, evaporation plays a major role, and the quality of a forecast of evaporation is of high importance. Forecasts in snow-dominated catchments are more focused on forecasting snowmelt, temperatures, and rain or snow events. This means that it is difficult for any meteorological forecasting center or provider to produce evaluation metrics that are relevant to all users; rather, many of them will be of general interest and some designed for specific end users.
Hydrologists also need to bear in mind that meteorological forecasts are sometimes verified against spatial “analyses.” An analysis field is often derived from many sources of observations (e.g., ground based or satellite) merged in a data assimilation framework with a meteorological model. Hence, its error structures will depend on the quality of the individual components. However, it is very often the


only way to evaluate forecasts in a data-sparse environment on a larger scale (e.g., globally, as is done for the Global Flood Awareness System; Alfieri et al. 2012). Such an analysis is not equivalent to observations: the two can have different error structures and dependencies. Therefore, verification against analyses can lead to different results than verification against observations. This issue is considered further in Sect. 4. In addition, most hydrological models are calibrated using observations. Therefore, forecast quality derived from an analysis field needs to be treated with due caution. It may be necessary for hydrologists to conduct their own forecast verification, based on the same type of observations used for the calibration of the hydrological model.

2.2 Benchmark and Skill

To fully understand the performance and value of a forecast, it is necessary to evaluate it against a benchmark. The most widely used hydrological score for benchmarking is the one introduced by Nash and Sutcliffe (1970), in which the hydrological model is compared against the climatological mean flow. A climatological reference is also widely used in meteorological verification. The choice of the benchmark strongly influences whether a model has skill or not (Pappenberger et al. 2015). For example, if one chooses to evaluate hydrological forecasts against a reference that always predicts zero discharge, then it is very likely that the forecast will be classified as skillful. A more difficult benchmark, e.g., discharge persistence (the last observed flow), may lead to lower skill. Hydrologists who use meteorological verification need to be aware that a skillful precipitation forecast does not necessarily mean that the discharge forecast will be skillful (and vice versa!). The hydrological model is a nonlinear transformation of the meteorological forecasts. For example, a climatological precipitation distribution (e.g., mean precipitation) will not lead to a climatological discharge distribution (e.g., mean discharge).
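The influence of the benchmark can be sketched with a Nash-Sutcliffe-style skill score, 1 − MSE_forecast/MSE_benchmark, computed against three references; the discharge series below are hypothetical:

```python
# Sketch: the same forecast series scored against three benchmarks.
# Skill = 1 - MSE_forecast / MSE_benchmark (Nash-Sutcliffe style).
# Data are hypothetical daily discharges (m^3/s).

def mse(pred, obs):
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)

def skill(pred, obs, benchmark):
    return 1.0 - mse(pred, obs) / mse(benchmark, obs)

obs = [50.0, 52.0, 55.0, 60.0, 58.0, 56.0, 54.0, 53.0]
fcst = [51.0, 54.0, 54.0, 58.0, 60.0, 57.0, 53.0, 52.0]

clim_mean = sum(obs) / len(obs)
climatology = [clim_mean] * len(obs)   # Nash-Sutcliffe-style benchmark
persistence = [obs[0]] + obs[:-1]      # last observed flow
zero_flow = [0.0] * len(obs)           # trivially easy benchmark

for name, ref in [("zero flow", zero_flow),
                  ("climatology", climatology),
                  ("persistence", persistence)]:
    print(f"skill vs {name:12s}: {skill(fcst, obs, ref):+.2f}")
```

The same forecast looks nearly perfect against the zero-discharge reference, respectable against climatology, and noticeably weaker against persistence, which is exactly the sensitivity to the benchmark described above.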

2.3 Threshold-Based Evaluation

The nonlinear transformation of a meteorological forecast through the hydrological model is important to consider when applying and interpreting verification metrics based on threshold exceedances. Flooding, for instance, is usually caused by intense precipitation exceeding a certain threshold for a certain time period. The exact characteristics of a meteorological event that may cause a flood depend strongly on the particular characteristics of a catchment and its initial conditions (Merz and Bloeschl 2003). Standard threshold-based meteorological evaluation approaches (e.g., precipitation exceeding 50 mm in 24 h) are therefore only relevant if the threshold considered is also relevant in the context of the hydrological application envisaged.
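A threshold-based evaluation typically reduces the paired forecasts and observations to a 2 × 2 contingency table of hits (a), false alarms (b), misses (c), and correct negatives (d). A minimal sketch of standard scores built from such a table, using hypothetical counts:

```python
# Sketch: common contingency-table scores for a threshold exceedance
# (e.g., precipitation > 50 mm in 24 h). The counts below are
# hypothetical; a = hits, b = false alarms, c = misses, d = correct
# negatives.

def contingency_scores(a, b, c, d):
    n = a + b + c + d
    fbi = (a + b) / (a + c)                        # frequency bias index
    a_random = (a + b) * (a + c) / n               # hits expected by chance
    ets = (a - a_random) / (a + b + c - a_random)  # equitable threat score
    pss = a / (a + c) - b / (b + d)                # Peirce skill score
    return fbi, ets, pss

fbi, ets, pss = contingency_scores(a=30, b=20, c=10, d=240)
print(f"FBI = {fbi:.2f}  ETS = {ets:.2f}  PSS = {pss:.2f}")
```

Here the event is overforecast (FBI > 1) yet still usefully discriminated (positive ETS and PSS); whether that trade-off is acceptable depends on the hydrological application behind the threshold.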

2.4 Verification Area

Meteorological verification often aggregates scores over latitude-longitude boxes corresponding to the grid of the model or the verifying analysis. Catchments are irregularly shaped, and a heavy precipitation event predicted in a rectangular box may actually fall in a neighboring catchment and hence produce a poor hydrological prediction. In addition, mean skill scores over the entire continent will only partially represent the true skill of even the largest catchments in the continent. Therefore, it is important to ensure that meteorological verification is done on an appropriate scale for the hydrological application in which it is used. Hydrologists may also need to assess the meteorological forecasts at the basin or subbasin scale to better understand the quality of the hydrological forecasts.

2.5 Averaging, Smoothing, and Accumulation

Hydrological models integrate components of a meteorological forecast over space, time, and multiple variables in a nonlinear way. For example, water from different parts of the Rhine catchment takes different amounts of time to reach the outlet. A drop of precipitation takes 7 days to travel from the upper part of the catchment to the outlet, whereas precipitation falling close to the outlet needs only 1 day. Therefore, a discharge prediction at the outlet will incorporate forecasts over a range of days as well as observations from multiple days.

As an idealized example, assume a catchment has an upper, a medium, and a lower section. The travel time to the outlet is 3 days from the upper section, 2 days from the medium section, and 1 day from the lower section. In this imaginary catchment, all precipitation becomes discharge and no water is lost. A discharge forecast issued today for tomorrow will only contain past precipitation observations (observations from 2 days ago for the upper part, yesterday’s precipitation for the medium part, and today’s precipitation for the lower part). The 2-day forecast will incorporate the precipitation from the 1-day forecast in the lower area of the catchment, today’s observations in the medium part, and yesterday’s observations in the upper part. The 3-day discharge forecast still carries today’s observed precipitation in the upper part, as well as 2 days and 1 day of precipitation forecast for the lower and medium parts, respectively. Only a 4-day discharge forecast will depend purely on meteorological precipitation forecasts at lead times of 1, 2, and 3 days. Therefore, a hydrological forecast not only integrates spatially but also mixes lead times and observations in a physically coherent manner, which would be difficult to assess in a precipitation-only verification study.
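The lead-time bookkeeping in the idealized three-section catchment above can be made explicit in code; the section names and day convention below are illustrative:

```python
# Sketch of the idealized three-section catchment described above:
# discharge at the outlet on day t is fed by precipitation that fell on
# each section (travel time) days earlier. Day 0 is the issue day;
# precipitation forecasts cover days 1, 2, 3, ...

TRAVEL_DAYS = {"upper": 3, "medium": 2, "lower": 1}

def sources_for_discharge(lead_time_days):
    """For a discharge forecast at the given lead time, report whether
    each section's contribution comes from observed or forecast
    precipitation."""
    sources = {}
    for section, travel in TRAVEL_DAYS.items():
        precip_day = lead_time_days - travel  # day the water fell as rain
        sources[section] = "forecast" if precip_day >= 1 else "observed"
    return sources

for lead in (1, 2, 3, 4):
    print(f"{lead}-day discharge forecast:", sources_for_discharge(lead))
```

Running the sketch reproduces the bookkeeping in the text: the 1-day discharge forecast relies entirely on observed precipitation, the 2- and 3-day forecasts mix observations and forecasts, and only the 4-day forecast depends purely on precipitation forecasts.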
In addition, as already mentioned, a discharge forecast is also influenced by forecast variables other than precipitation (e.g., evaporation, temperature), which dictates a physically coherent correlation structure between the variables. The size of the catchment is therefore clearly of importance. It has been shown that forecasts for larger catchments tend to have greater skill simply due to the larger

6

E. Gilleland et al.

averaging effect. However, as the previous example illustrates, an incorrect spatial covariance structure can lead to an amplified loss in skill in the discharge predictions of a catchment, similar to the double-penalty effect (see Sect. 4.2). Not all hydrological models have a grid-type structure, some of them are lumped. Calculating verification scores for a spatial field, a spatial average, or a single point in a catchment can lead to significantly different results; hence an evaluation of meteorological forecasts for hydrological applications has to be informed by the choice of the hydrological model. In addition, not all parts of the catchment will contribute equally to the average error of the discharge forecast. Imagine a catchment in which one part has very high soil moisture, for example, due to specific geology, and another part does not. In the first part of the catchment, high soil moisture will lead to more discharge, whereas in the part with low soil moisture, most precipitation will become part of a groundwater store. This result means that any forecast errors in the high soil moisture area will be amplified, whereas in the other part, they will be damped. A simple spatial averaging of errors may not give appropriate information for a hydrological forecaster. In summary, most meteorological forecast evaluations will have to be performed in conjunction with hydrological skill evaluations to allow for a full understanding of the forecast performance and allow for efficient diagnostics to improve the coupled hydrometeorological forecasting system.

3 Issues for the Verification of Point Forecasts in Meteorology

3.1 Matching Forecasts and Observations

While NWP forecasts are often verified against spatial analyses, measurements made at point locations remain the "gold standard" for small-scale accuracy. NWP forecasts of surface weather variables, such as temperature and precipitation, are routinely verified against observations from the global network of surface reporting stations coordinated by the WMO (2014). These synoptic observations (often referred to as SYNOPs) are made using a common reporting procedure and measurement standards. Typically, the gridded NWP data are interpolated to each observation location, often using the nearest model grid-point value for precipitation and a bilinear interpolation from the four nearest model grid points for temperature. Scores such as bias, root-mean-square error (RMSE), standard deviation of error, and mean absolute error (MAE) for deterministic forecasts, and the Brier score and continuous ranked probability score (CRPS) for probability forecasts, are then computed and aggregated over geographical regions and over a large set of cases (months or seasons). For precipitation, scores are commonly calculated for a number of defined thresholds (e.g., 10 mm in 24 h), using a variety of contingency table-based measures, including the Peirce skill score (PSS, sometimes called the "true skill score"), the equitable threat score (ETS), and the frequency bias index (FBI). Definitions of these and other commonly used verification scores can be found in Wilks (2011), and examples of their application to NWP forecast verification are given by McBride and Ebert (2000) and Wolff et al. (2014).

A number of issues with this basic approach are discussed below. Nevertheless, it can provide useful information about the overall performance of an NWP system, highlighting important biases and illustrating changes in performance over time. For example, Fig. 2 shows the trend in mean error and standard deviation of error over the last 10 years for 2-m temperature forecasts over Europe from the ECMWF high-resolution model (HRES). Verification is against synoptic observations available on the Global Telecommunication System (GTS). A correction for the difference between model orography and station height was applied to the temperature forecasts, but no other post-processing was applied to the model output.

Fig. 2 Verification of 2-m temperature forecasts against European SYNOP data on the GTS for 60-h (nighttime, blue) and 72-h (daytime, red) forecasts. Lower pair of curves shows bias, and upper curves are standard deviation of error

There is a clear seasonal cycle in the scores, as well as a general improvement over time in the standard deviation of error. Differences between nighttime and daytime errors are most apparent in the biases. A recurring feature of the 2-m temperature forecasts in recent years is a negative nighttime temperature bias in winter and early spring. Although Fig. 2 provides a useful summary, it inevitably hides much valuable information. For example, there is significant geographical variation in bias due to different local climatological conditions, and a substantial part of the bias in some locations is related to particular meteorological situations. Too-rapid afternoon cooling in snow-covered forested areas in Scandinavia is one example that contributes to the overall cold bias.

A number of issues need to be addressed when verifying NWP forecasts against point observations. A fundamental one is that the NWP model generates output as an area average over a grid box, with a grid length that can be a few kilometers for a high-resolution limited-area model or a few tens of kilometers for a global ensemble forecast system.
If an EPS runs on a 30-km grid, the precipitation at each
grid point should be seen as representing the average value for a 1,000-km2 area. In a situation with a small-scale severe weather event, such as an intense summer storm, there can be very large differences between the rainfall measured at an individual station and the average value over a larger region representative of a model grid box. There also can be significant discrepancies between model and observed temperatures: stations located within the same model grid box may have significantly different climatologies due to local variations, especially in areas of complex orography, coastal areas, or urban locations (urban heat island effects are generally not represented in NWP models). Such basic differences between the modeled variable and the observations are often referred to as representativeness issues. It is important to take account of these in forecast verification, especially in the verification of extreme events (Ebert et al. 2003; Goeber et al. 2008).
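The contingency table-based scores named in this subsection (PSS, ETS, FBI) can be sketched from the four cells of a 2 x 2 table, using the standard convention of hits (a), false alarms (b), misses (c), and correct negatives (d); the counts and the function name below are invented.

```python
# Hedged sketch of standard contingency-table scores. The counts are invented.

def scores(a, b, c, d):
    """a = hits, b = false alarms, c = misses, d = correct negatives."""
    n = a + b + c + d
    pss = a / (a + c) - b / (b + d)            # Peirce skill score
    a_ref = (a + b) * (a + c) / n              # hits expected by chance
    ets = (a - a_ref) / (a + b + c - a_ref)    # equitable threat score
    fbi = (a + b) / (a + c)                    # frequency bias index
    return pss, ets, fbi

pss, ets, fbi = scores(a=30, b=10, c=20, d=140)
print(round(pss, 3), round(ets, 3), round(fbi, 2))  # 0.533 0.4 0.8
```

An FBI below 1 (here 0.8) indicates that the event is forecast less often than it is observed, independently of whether the forecasts are placed correctly.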

3.2 Dealing with Inhomogeneity

Observation sites are distributed very inhomogeneously across the globe. Even in Europe, where the meteorological ground station density is relatively high, less than 10 % of land grid boxes (at 16-km resolution) contain one or more SYNOP stations providing real-time data for general verification use. There is thus considerable undersampling, which makes verification results for quantities such as precipitation less robust. This becomes a particular issue when the verification sample is relatively short, as is generally the case for studies comparing current and future operational NWP model configurations, when conditional verification is performed, or in the verification of extremes. To generate robust results, it is often necessary to aggregate scores over a wide geographical area as well as over a significant period of time (typically a 3-month season). Differences in climate between stations can affect some skill scores (Hamill and Juras 2006). One way to address this issue is to scale the quantity to be verified by the local climate of the station, which allows homogeneous samples to be accumulated over larger areas and in mountainous domains. For example, choosing a precipitation threshold as a quantile of the climate distribution may be more appropriate than selecting an absolute threshold such as 5 mm/day, which may occur much more frequently at one location than at another (Jenkner et al. 2008). The Stable Equitable Error in Probability Space (SEEPS) score (Rodwell et al. 2010; Haiden et al. 2012) was developed to mitigate many of these issues as far as possible for precipitation forecasts. SEEPS assesses rainfall forecasts in three categories (no rain, light rain, and heavy rain), where the boundary between "light" and "heavy" rain marks the highest one third of the climatological cumulative rainfall distribution.
The representativeness is further accounted for by applying the verification to precipitation accumulated over 24-h periods, which provides more relevant results than verification of instantaneous fields, such as precipitation rates. North et al. (2013) showed that SEEPS can also be used successfully for 6-h accumulations.
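The quantile-based thresholding suggested above can be sketched as follows; the daily totals are invented, and a real application would use a long station climatology.

```python
# Sketch: a station-relative "heavy rain" threshold chosen as a climate
# quantile rather than a fixed absolute value such as 5 mm/day.

def quantile(values, q):
    """Empirical quantile by linear interpolation between sorted values."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    frac = pos - lo
    return s[lo] if frac == 0 else s[lo] * (1 - frac) + s[lo + 1] * frac

wet_days = [0.2, 0.5, 1.0, 2.0, 3.0, 5.0, 8.0, 12.0, 20.0, 35.0]  # mm/day, invented
threshold = quantile(wet_days, 0.9)   # e.g., top 10 % of wet days at this station
print(threshold)
```

The same 0.9 quantile evaluated at a drier station would yield a lower threshold, so the resulting event has a comparable climatological frequency everywhere, which is the point of the scaling.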

3.3 Verifying Ensemble Forecasts for Point Locations

The scores described in the previous two subsections can all be applied both to the individual members of an ensemble forecast and to the ensemble mean, verified as a deterministic forecast (e.g., Duda et al. 2014). One needs, however, to be aware of the intrinsic smoothing of the ensemble mean field in the interpretation of the verification results. The information on distributions contained in an ensemble forecast also needs to be assessed. Different scores are used to assess the probabilistic aspects of ensemble forecasts. Ensemble forecasts for temperature and precipitation are both often verified using the continuous ranked probability score (CRPS), which assesses the probability distribution derived from the ensemble. All the issues of representativeness discussed above apply equally to the ensemble verification. An additional issue for the verification of ensemble forecasts in meteorology is in the interpretation of the spread and reliability of an ensemble prediction system (EPS). In a well-tuned ensemble system, the RMS error of the ensemble mean (EM) forecast should, on average, match the ensemble standard deviation (spread) (e.g., Molteni et al. 1996); note, however, that care must be used in selecting the method to estimate the standard deviation (Fortin et al. 2016). Typically, an EPS is underdispersive for surface weather elements, indicating that the EPS does not fully capture all the uncertainty in the forecasting system. However, it is important to account for errors, and particularly the representativeness, of the observations when comparing the EM error (computed against observations) with the spread (computed purely between model fields on the model grid). The representativeness can account for a substantial proportion of the apparent underdispersion of the ensemble forecast. 
A number of standard measures that assess the calibration or reliability of an EPS, such as rank histograms (PIT diagrams) and reliability diagrams, are sensitive to representativeness issues, and care needs to be taken in interpreting the results if the observation uncertainty is not properly taken into account (Saetra et al. 2004).
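The spread-error comparison discussed above, including an allowance for observation and representativeness error, can be sketched as follows; all numbers, including the assumed observation error, are invented, and the simple variance estimator below ignores the sample-size corrections discussed by Fortin et al. (2016).

```python
# Sketch: RMS error of the ensemble mean vs. ensemble spread, and the effect
# of adding an (assumed) observation error variance to the ensemble variance.
import math

def mean(xs):
    return sum(xs) / len(xs)

# Toy sample: each case is (ensemble members, verifying observation).
cases = [
    ([1.0, 2.0, 3.0], 2.5),
    ([0.0, 1.0, 2.0], -0.5),
    ([2.0, 3.0, 4.0], 3.0),
]

sq_err, variances = [], []
for members, obs in cases:
    em = mean(members)
    sq_err.append((em - obs) ** 2)
    variances.append(mean([(m - em) ** 2 for m in members]))

rmse_em = math.sqrt(mean(sq_err))      # RMS error of the ensemble mean
spread = math.sqrt(mean(variances))    # mean ensemble standard deviation
sigma_obs = 0.8                        # assumed observation error (invented)
spread_adj = math.sqrt(spread ** 2 + sigma_obs ** 2)
print(rmse_em, spread, spread_adj)
```

In this toy sample the raw spread is smaller than the EM error (apparent underdispersion), but the observation-error-augmented spread is not, illustrating how representativeness can account for part of the apparent underdispersion.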

4

Issues for the Verification of Gridded Forecasts in Meteorology

4.1

Sources of Gridded Observations

Verification of gridded forecasts from NWP models against gridded observations addresses issues of spatial completeness and avoids the issue of different scales being represented by the model grid and point observations. Gridded observations, especially if available at high resolution, are particularly useful in estimating rainfall over catchments (which are on the mesoscale) in order to simulate and predict streamflows. Mesoscale surface analyses, in particular, are useful for verifying model predictions of hydrometeorological variables such as temperature, wind, and humidity. Their higher resolution (1–10 km, typically) represents topographical effects fairly
well and offers the possibility to verify forecasts from both high- and low-resolution models. A disadvantage of many, if not most, mesoscale analyses is the use of a model short-range forecast as the first-guess field for the analysis of the surface-level observations. In this case, model biases are present in the final product to a greater or lesser degree, depending on the density of surface observations and how strongly the individual observations are weighted in the analysis scheme (Lazarus et al. 2002). The effect on the verification, particularly in model intercomparisons, is that the model being verified may be incorrectly assessed as more accurate than it actually is if it is the one supplying the first guess. If the observations are sufficiently dense in space, a model-independent mesoscale analysis (e.g., the Vienna Enhanced Resolution Analysis; Steinacker et al. 2000) provides a better gridded dataset for verification. Information on the data density used in the analysis should be reported. Rain gauge analyses have long been used to verify gridded precipitation forecasts. Most nations collect daily rainfall data at manned and automatic stations, and these can be mapped onto a grid using kriging or other objective analysis techniques. The number of gauges measuring rainfall at sub-daily time scales is much lower, so it is less common to produce gauge-based rainfall analyses for hourly or 3-hourly accumulations. Some nations have volunteer observing networks that report daily rainfall outside of real time. These additional observations can substantially boost the overall number of gauges, warranting a second, non-real-time rainfall analysis that is more accurate than the real-time analysis. The accuracy of the gauge-based rainfall analysis depends on a number of factors. The most important is the station density, or the number of stations per analysis grid cell.
Studies have shown that no improvement in the objective analysis scheme can adequately make up for a lack of sufficient observations (e.g., Bussieres and Hogg 1989). A related factor is the spatial and temporal resolution of the analysis – finer resolution is prone to greater errors due, again, to sampling considerations. The nature of the rainfall itself also impacts on the accuracy, with convective rainfall more difficult to estimate accurately than large-scale rainfall because its spatiotemporal coherence is lower. Gauge measurements can be affected by wind-induced undercatch; this is particularly important in storm situations or when the precipitation falls as snow. When using gauge analyses for verification, it is important to take these factors into account. Radar is another excellent source of gridded precipitation estimates in many parts of the world. The spatial resolution of radar data is typically 1–2 km, and the temporal resolution is 5–15 min. Radar quantitative precipitation estimates (QPE) are based on the reflection of microwave pulses by suspended hydrometeors, with the reflectivity transformed to rain rate through statistical fits to the data. Radar data first undergo a complex quality control process to remove artifacts from ground clutter, beam blockage, attenuation, range effects, biological targets, and anomalous propagation before they can be converted from reflectivity to rain rate. The traditional Z-R relationships that have been used for several decades are making way for more accurate QPE techniques based on data from newer dual-polarization radars
that transmit horizontal and vertical pulses to collect more information on droplet size, shape, and composition (Cifelli and Chandrasekar 2010). Even with the newer technology, there remain large uncertainties in the radar rainfall estimates. Radar QPE is more accurate within about 100 km from the radar, and bias correction using underlying rain gauge data is always recommended (Seo and Breidenbach 2002). QPE mosaics from multiple radars in a network may provide high-resolution rainfall estimates over large domains that can be used for weather monitoring and nowcasting, rainfall estimation, assimilation into NWP models, and model verification. These mosaics are more spatially complete, although there may still be gaps between the radar rainfall fields. Often radar and gauge data are blended to produce a multisensor analysis that takes advantage of the greater accuracy of gauge measurements and the better spatial and temporal coverage of the radar observations, giving spatially complete high-resolution fields of hourly precipitation (e.g., NWS 2014; EUMETNET 2014). Satellite observations offer an alternative source of gridded precipitation estimates that can be used for model verification, especially where gauge networks are inadequate and radar QPE are unavailable. Microwave rainfall estimates from several low earth-orbiting satellites, sometimes augmented with IR rainfall or cloud motion information from geostationary satellites, are combined into quasiglobal rainfall fields at a number of centers. Because they are not limited to land areas, as is the case with gauge and radar, they can observe rainfall from weather systems approaching the land, which is very important in the case of tropical cyclones. Satellite rainfall can also be used to fill gaps in radar mosaics over land. Kidd and Levizzani (2011) and IPWG (2016) provide a useful review of the status of satellite precipitation estimation. 
Temporal resolutions range from 30 min to 3 h, and spatial resolutions from 8 km to 0.25° latitude/longitude. Most operational products are underpinned by radar and radiometer precipitation estimates from the TRMM (Tropical Rainfall Measuring Mission) satellite or, more recently, the GPM (Global Precipitation Measurement) core satellite launched in 2014, which has advanced detection capabilities enabling measurement of light rain and snow. Nevertheless, satellite precipitation retrievals based on passive microwave or IR data struggle to detect cool-season rainfall from low clouds over land, leading to severe underestimates during mid-latitude winter. Bias correction and/or blending with gauge data can greatly improve the accuracy by reducing the effects of retrieval and sampling errors, as done with the TRMM 3B42 and the Integrated Multi-satellitE Retrievals for GPM (IMERG) products (Huffman et al. 2007, 2013).
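The final reflectivity-to-rain-rate step of the traditional Z-R relationships mentioned above can be sketched as follows, using the classic Marshall-Palmer coefficients Z = 200 R^1.6; operational QPE involves far more elaborate quality control and calibration than this illustration.

```python
# Sketch: convert radar reflectivity (dBZ) to rain rate (mm/h) via the
# Marshall-Palmer relationship Z = 200 * R**1.6 (Z in mm^6 m^-3).

def rain_rate(dbz, a=200.0, b=1.6):
    """Invert Z = a * R**b after converting dBZ to linear reflectivity."""
    z = 10.0 ** (dbz / 10.0)        # dBZ -> linear Z
    return (z / a) ** (1.0 / b)

for dbz in (20, 30, 40, 50):
    print(dbz, round(rain_rate(dbz), 1))  # roughly 0.6, 2.7, 11.5, 48.6 mm/h
```

The strong nonlinearity (a 10-dBZ increase multiplies the rain rate by about 4) is one reason small reflectivity biases translate into large rainfall errors.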

4.2 Choosing and Preprocessing Gridded Observations for Verification Analysis

In many cases, for hydrological purposes, the most important aspect of the forecast to get right is the location of the rain (it must be over the catchment of interest), followed by the quantitative amount. Spatial verification, discussed in the next section, is ideally suited for assessing the predicted location of rain, but requires
high-quality spatial observations. Gauge analyses, radar, and satellite all provide gridded rainfall observations with varying degrees of detail, accuracy, and coverage. The choice of data for verification will depend on the availability of observations and the spatial and temporal characteristics of the forecast. For verification of daily rainfall from a global NWP model, a rain gauge analysis may be adequate, provided the surface network is sufficiently dense to justify a grid similar in scale to that of the model. Where available, radar data that have been bias-corrected and blended with gauge data provide additional spatial and temporal detail and are well suited for verifying shorter-period accumulations, for example, from higher-resolution model forecasts. Satellite rainfall is the least accurate of the three data sources discussed above and should be used with caution when verifying model QPFs. After identifying a gridded observation dataset, if the forecasts and observations are not at the same grid scale, it is necessary to map one or both onto a common grid for verification. The fairest approach is to map the finer-scale product onto the grid of the coarser product, because this does not make unrealistic demands on the resolution of the coarser product. However, for some types of evaluation, such as verification of catchment-average rain, it may make more sense to remap onto a finer grid in order to "cookie cut" the forecast and observations to fit the irregular area. Methods used for grid remapping include interpolation, nearest-neighbor sampling, and area-weighted averaging. The choice of remapping method affects the verification results and should be appropriate for the variable being verified. For rainfall, area-weighted averaging is recommended in preference to interpolation because it preserves the total rainfall (important for hydrological applications) and does not smooth the intensity distribution (Accadia et al. 2003).
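A minimal sketch of area-weighted remapping, here the equal-area special case of block averaging, shows that the domain total rainfall is preserved; the 4 x 4 fine-grid field is invented.

```python
# Sketch: coarsen a fine grid by averaging f x f blocks of equal-area cells.
# Unlike point-wise interpolation, this preserves the domain total rainfall.

def block_average(field, f):
    """Average f x f blocks of a fine grid (cells assumed to have equal area)."""
    ny, nx = len(field), len(field[0])
    return [[sum(field[j * f + dj][i * f + di]
                 for dj in range(f) for di in range(f)) / f ** 2
             for i in range(nx // f)]
            for j in range(ny // f)]

fine = [[0.0, 0.0, 2.0, 8.0],
        [0.0, 4.0, 6.0, 2.0],
        [1.0, 3.0, 0.0, 0.0],
        [5.0, 7.0, 0.0, 0.0]]       # rainfall (mm) on an invented 4x4 grid
coarse = block_average(fine, 2)      # -> 2x2 grid

total_fine = sum(sum(row) for row in fine)
total_coarse = sum(sum(row) for row in coarse) * 4  # each coarse cell = 4 fine cells
print(coarse, total_fine, total_coarse)  # totals agree
```

For real latitude-longitude grids, the blocks would be weighted by the true cell areas, but the conservation property is the same.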
For temperature verification on a grid, bilinear interpolation is appropriate in the horizontal, but it is also necessary to adjust the temperature for any difference in altitude between the forecast and observed grid points. This adjustment can be done using a standard lapse rate or a lapse rate derived from the model forecast or analysis (Minder et al. 2010).
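Both steps, horizontal bilinear interpolation and a standard-lapse-rate height adjustment, can be sketched as follows; grid values, coordinates, and heights are invented, and 6.5 K/km is the standard-atmosphere lapse rate.

```python
# Sketch (not an operational scheme): bilinear interpolation of a gridded
# temperature field, followed by a standard-lapse-rate height adjustment.

LAPSE_RATE = 0.0065  # K per meter (standard-atmosphere value)

def bilinear(field, x, y):
    """Interpolate field[j][i] (unit grid spacing) to the fractional point (x, y)."""
    i, j = int(x), int(y)
    dx, dy = x - i, y - j
    return ((1 - dx) * (1 - dy) * field[j][i] + dx * (1 - dy) * field[j][i + 1]
            + (1 - dx) * dy * field[j + 1][i] + dx * dy * field[j + 1][i + 1])

def height_adjust(t, z_from, z_to):
    """Move a temperature (K) from height z_from to height z_to (meters)."""
    return t + LAPSE_RATE * (z_from - z_to)

t2m = [[280.0, 281.0],
       [282.0, 283.0]]                 # invented 2x2 grid of temperatures (K)
t_interp = bilinear(t2m, 0.5, 0.5)     # horizontal step: 281.5
t_final = height_adjust(t_interp, z_from=800.0, z_to=500.0)  # grid point 300 m higher
print(t_interp, t_final)
```

A model-derived lapse rate would simply replace the constant `LAPSE_RATE` with a locally estimated value.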

4.3 Double-Penalty and Small-Scale Errors

The inherent predictability of weather is usually lower at fine scales where local convective processes and topographic effects have a strong influence than at synoptic scales where larger-scale dynamical processes control the evolution of the fields. When verification is conducted at fine scales, forecast errors tend to be more pronounced because correctly predicting the fine detail is more difficult than predicting the larger-scale structure. For example, a model may predict precipitation at a certain location and time, but the observed precipitation occurs at a slightly different location and time. This displacement leads to the “double penalty” in verification, where the forecast is penalized for predicting rain when/where it did not occur and again for failing to predict rain when/where it did occur. The forecast may still provide very useful information on structure and intensity to a forecaster
(who does not interpret the forecast at face value), but it typically scores worse than a coarser-resolution forecast using standard grid-point-to-grid-point verification metrics, such as the RMSE (cf. Mass et al. 2002; Gilleland et al. 2009, 2010a). Moreover, forecast verification at finer resolution puts greater demands on the observations. The difficulty in measuring precipitation accurately was discussed earlier. Spatial and temporal aggregation dampens the influence of random errors in the observations, increasing the accuracy of the analysis, but, at the same time, it removes useful information on location, timing, and intensity of observed events that may be relevant to the hydrological response. Small-scale errors in the observed rainfall can lead to poorer verification scores by introducing an additional error term (e.g., Ciach and Krajewski 1999), leading the user to believe that the forecasts are poorer than they truly are (Mass et al. 2002; Gilleland et al. 2009, 2010a).
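A toy example makes the double penalty concrete: a feature forecast one cell away from where it was observed is penalized as both a miss and a false alarm, while a smoother forecast covering both cells records a hit. The tiny one-dimensional fields below are invented.

```python
# Toy demonstration of the double penalty on a 4-cell domain.

observed  = [0, 0, 1, 0]   # 1 = rain observed in that cell
sharp_fc  = [0, 1, 0, 0]   # high-resolution forecast, displaced by one cell
smooth_fc = [0, 1, 1, 0]   # coarser forecast covering both cells

def table(fc, ob):
    """Return (hits, misses, false alarms) for binary forecast/observed fields."""
    hits = sum(f and o for f, o in zip(fc, ob))
    misses = sum((not f) and o for f, o in zip(fc, ob))
    false_alarms = sum(f and (not o) for f, o in zip(fc, ob))
    return hits, misses, false_alarms

print(table(sharp_fc, observed))   # (0, 1, 1): penalized twice
print(table(smooth_fc, observed))  # (1, 0, 1): the hit survives
```

The sharper forecast arguably contains more useful information about the event, yet any grid-point-to-grid-point score built on this table will rank it below the smoother forecast.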

5 Methods for Verifying High-Resolution Gridded Forecasts

5.1 Overview

To address the challenges to verification presented by high-resolution forecasts, numerous new verification methods have been proposed in order to inform users about forecast performance in a spatially meaningful way (Gilleland et al. 2009, 2010a; Brown et al. 2011; Gilleland 2013). The location and amplitude errors diagnosed by these schemes give hydrological users an indication of how much the meteorological forecasts can be trusted to predict the right amount of rainfall in the right place. The methods generally fall into two main categories: filter and displacement. The category of filter methods (Ebert 2008) includes neighborhood methods, which utilize smoothing filters, and scale separation methods, which are based on band-pass filters. Smoothing filters can be applied to one or both of the forecast and observed raw or thresholded (binary) fields. The aim is to assess the performance at different scales and diagnose at which scale the forecast has sufficient accuracy to be useful. Band-pass filters quantify forecast performance at distinctly separate levels of detail that are considered to represent physically meaningful scales of interest (e.g., synoptic versus convective scale). Filter methods can be used to verify all types of precipitation fields and are particularly well suited for "messy" situations that may have no clearly identifiable rain features. Displacement, or location, methods are primarily aimed at informing about positional errors and are especially well suited for situations when clearly identifiable rain features are present. In most cases, they include metrics for verifying intensity as well. This category is further divided into feature-based methods, which first identify features within the field (e.g., attempting to identify individual storms), and field deformation techniques, which are applied to the entire field. However, the latter techniques are often applied to specific features in addition to the entire field (e.g., Gilleland et al. 2008).
Field deformation techniques attempt to
morph the natural grid of the forecast field into one that better aligns with the observation field before applying the traditional grid-point-to-grid-point verification.

5.2 Filter Methods

The first class of filter methods is often called neighborhood methods. Here a smoothing filter replaces each grid-point value with an average or other statistical quantity based on the distribution of values from nearby grid points. The nearby grid points are determined by two main factors: the desired shape of the neighborhood (e.g., square or disk) and the size of the neighborhood. The simplest neighborhood method is the upscaling method, which simply smooths the forecast so that small-scale and location errors have a much-reduced impact on the results. This method is often used to compare models with different spatial resolutions (WMO 2008). When a smoothing filter is applied to a binary field (e.g., rain exceeding a given threshold), the value at a given grid point is interpreted as the event frequency within the neighborhood centered on that grid point. The fractions skill score (FSS) is one of the more commonly used smoothing filter methods at operational centers, including the UK Met Office and MeteoSwiss (Roberts and Lean 2008; Weusthoff et al. 2010; Duc et al. 2013; Mittermaier et al. 2013). For each neighborhood size, the FSS is calculated as

$$\mathrm{FSS} = 1 - \frac{\sum_{s=1}^{n} \left( \hat{p}_s - p_s \right)^2}{\sum_{s=1}^{n} p_s^2 + \sum_{s=1}^{n} \hat{p}_s^2}$$

where $p_s$ and $\hat{p}_s$ represent the frequency of observed and forecast events, respectively, in the neighborhood centered on point $s$, and $n$ is the number of grid points in the field. A perfect forecast at a given scale achieves FSS = 1. One useful application is to identify the scale (neighborhood size) at which the forecast has better skill (in terms of FSS) than a randomly distributed forecast with the mean observed frequency (Mittermaier and Roberts 2010). Scale separation methods utilize a spectral representation of a function, whereby the function is written as a sum of coefficients multiplied by basis functions. For example, in Fourier decomposition, the function is represented by sine and cosine basis functions, whereas a wavelet decomposition utilizes locally supported basis functions from a relatively wide class of possibilities. Another smoothing filter approach borders on this band-pass category in that it applies a wavelet decomposition to one or both fields, sets the smallest wavelet coefficients to zero, and recomposes the field in question (Briggs and Levine 1997). The result is a smoother field, so the technique is analogous to the upscaling method. Briggs and Levine (1997) also apply traditional verification measures like anomaly correlation and RMSE to the component fields of the wavelet decomposition, which analyzes forecast performance in a similar way as applying verification
measures to a series at each frequency of the spectral decomposition of a time series. This latter technique is a scale separation, or band-pass filter, technique. Casati et al. (2004) and Casati (2010) apply a similar technique, but to the differences in binary fields of the forecast minus the observations. One note when applying wavelet decomposition techniques, especially for precipitation fields, is that care must be taken in choosing the specific wavelet basis function. For example, the Daubechies wavelet used by Briggs and Levine (1997) will generally result in wavelet artifacts that could lead to spurious results; for precipitation, a Haar wavelet as used by Casati et al. (2004) might be better suited.
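Returning to the neighborhood approach, the FSS defined above can be sketched in a few lines; the tiny binary fields are invented, and a square moving window (clipped at the domain edges) plays the role of the neighborhood.

```python
# Sketch of the FSS: binary event fields are converted to neighborhood event
# fractions with a square moving window, then compared as in the formula above.

def fractions(binary, half_width):
    """Neighborhood event frequency with a (2*half_width+1)^2 square window."""
    ny, nx = len(binary), len(binary[0])
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            cells = [binary[jj][ii]
                     for jj in range(max(0, j - half_width), min(ny, j + half_width + 1))
                     for ii in range(max(0, i - half_width), min(nx, i + half_width + 1))]
            out[j][i] = sum(cells) / len(cells)
    return out

def fss(fc_binary, ob_binary, half_width):
    pf = fractions(fc_binary, half_width)
    po = fractions(ob_binary, half_width)
    num = sum((f - o) ** 2 for rf, ro in zip(pf, po) for f, o in zip(rf, ro))
    den = sum(f ** 2 for row in pf for f in row) + sum(o ** 2 for row in po for o in row)
    return 1.0 - num / den

obs = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
fc  = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]   # event displaced by one cell
print(fss(fc, obs, 0), fss(fc, obs, 1))   # 0.0 at grid scale; higher at the 3x3 scale
```

At grid scale the displaced forecast scores FSS = 0 (the double penalty again), while at the 3 x 3 neighborhood scale it scores well, which is exactly the scale-dependent skill the FSS is designed to expose.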

5.3 Displacement Methods

The two primary categories of displacement methods are the field deformation approaches, which are applied to entire fields, and the feature-based approaches, which are applied to individual features (such as precipitation areas) within the fields. The field deformation methods are all applicable to the features in the latter case. The feature-based approaches allow for additional analyses, such as a spatially meaningful contingency table and substantially greater diagnostic information about feature attributes, which can be aggregated over time to give highly informative results about the character of forecast errors. Some of the earliest field deformation approaches attempted to realign the forecast grid so that the intensity values better match those of the observation field. The magnitude of the shifting, distortion, and amplification required to match the forecast with the observations is a measure of the forecast error. The simplest field deformation methods move the entire grid around rigidly until some objective function is minimized (e.g., Hoffman et al. 1995; Ebert and McBride 2000). Other techniques similarly minimize an objective function, but individual grid points can move independently (usually after performing a rigid or affine transformation) in order to allow for nonlinear deformations of the grid (e.g., Alexander et al. 1998; Keil and Craig 2009; Gilleland et al. 2010c). Once a deformation has been found, numerous possibilities for summarizing information result. Direct inspection of vector fields showing the optimal deformation (possibly aggregated over time or space) allows for a visual display of region-specific location errors. Rigid transformations give readily interpretable information about the large-scale location errors in specific directions.
Simpler than warping a field is to apply a binary image summary measure (e.g., Hausdorff distance, partial Hausdorff distance, Baddeley’s delta metric, mean error distance, etc.) to the forecast and observed binary fields (such as the event field determined by where the value exceeds a given threshold; see e.g., Gilleland 2011; Schwedler and Baldwin 2011). Such methods, generally, all give the same information, are quick to compute, and give useful summaries of forecast performance in terms of location/pattern errors. AghaKouchak et al. (2010) introduced three geometrical binary field summary measures that can be used to compare the geometrical properties of predicted and
observed spatial fields. The connectivity index measures the connectivity and organization of the field, the shape index measures its roundness, and the area index measures the dispersiveness of the pattern. This last measure was also introduced as a property of individual features in the technique of Davis et al. (2006a, b, 2009) described below. Feature-based methods, sometimes referred to as object-based methods, give information about forecast performance for specific features within a field. For example, the contiguous rain area (CRA) method of Ebert and McBride (2000), used at the Australian Bureau of Meteorology, uses a rigid transformation to estimate the displacement error. The size, intensity, and pattern of the matched features in the forecasts and observations can be directly compared, and the fractional contributions from each type of error can be estimated. Various summary measures are included in the feature-based technique referred to as the method for object-based diagnostic evaluation (MODE, Davis et al. 2006a, b, 2009), along with other feature-specific information such as feature centroid, major axis angle, and intersection area. The MODE technique is used at the NWS Weather Prediction Center to operationally verify precipitation forecasts over the United States from a number of NWP models (http://www.hpc.ncep.noaa.gov/verification/mode/mode.php). As an extension of the feature-based methods, the structure-amplitude-location (SAL) method of Wernli et al. (2008) is used at the Finnish Meteorological Institute and other operational centers. This method evaluates the realism of model precipitation features in a spatial region such as a hydrological basin, without attempting to match individual forecast and observed features. The location error is a function of the difference between the forecast and observed centers of mass of the rainfall fields.
The differences between the forecast and observed structure (“peakiness”) and area-averaged amplitude reflect errors in the rainfall intensity distribution, which is important for runoff calculations.

Many of the above methods are clearly aimed at handling spatial displacement errors in order to minimize the impact of double penalties. It is possible, and perhaps worthwhile, to determine whether an error is truly a spatial displacement or merely a timing error that subsequently manifests as a spatial displacement. Some work has been done to incorporate both space and time into verification analyses, but as these methods are newer, they are not covered in full here (e.g., Gilleland et al. 2010b; Bullock 2011; Zimmer and Wernli 2011; Ebert et al. 2013).
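The binary image summary measures mentioned at the start of this section can be sketched compactly as well. Below is a pure-Python, brute-force version of one member of that family, the mean error distance; operational implementations use efficient distance transforms, and the grids here are made up for illustration.

```python
# Sketch: mean error distance between forecast and observed binary event
# fields (e.g., precipitation exceeding a threshold). Brute force for
# clarity; real applications would use numpy/scipy distance transforms.
import math

def event_points(field, threshold):
    """Grid cells (i, j) where the field exceeds the threshold."""
    return [(i, j)
            for i, row in enumerate(field)
            for j, v in enumerate(row) if v > threshold]

def mean_error_distance(fcst_pts, obs_pts):
    """Average distance from each observed event cell to the nearest
    forecast event cell (one direction of the Hausdorff family)."""
    if not obs_pts or not fcst_pts:
        return float("inf")
    total = 0.0
    for (oi, oj) in obs_pts:
        total += min(math.hypot(oi - fi, oj - fj) for (fi, fj) in fcst_pts)
    return total / len(obs_pts)

# Toy case: forecast feature displaced two columns east of the observed one.
obs  = [[0, 1, 0, 0],
        [0, 1, 0, 0]]
fcst = [[0, 0, 0, 1],
        [0, 0, 0, 1]]
med = mean_error_distance(event_points(fcst, 0), event_points(obs, 0))
print(med)  # 2.0 — a pure displacement error, with no double penalty
```

Related measures (partial Hausdorff, Baddeley’s delta) differ mainly in how the pointwise distances are aggregated.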

6 Application of Spatial Techniques to Meteorological Ensemble Forecasts

When considering the application of spatial verification methods to meteorological ensemble predictions on a spatial grid, the standard approaches for ensemble verification can be applied. In particular, ensemble predictions can be treated in three standard ways: (i) as a set of individual deterministic forecasts, (ii) as a probabilistic forecast, and (iii) as a distribution of forecasts.

Verification of Meteorological Forecasts for Hydrological Applications


With regard to (i), the individual ensemble member forecasts can be evaluated using any of the spatial methods described in Sect. 5, leading to a set (or distribution) of verification results; or the ensemble mean (or some other representative forecast) can be evaluated, which would provide a measure of the “average” spatial forecast performance (although losing some potentially useful information about the forecast uncertainty).

With regard to approach (ii), the ensemble predictions on a grid can be transformed to a grid of probability values for a specific event (e.g., the exceedance of a threshold value of precipitation), either directly by counting the members exceeding the threshold or by application of a more advanced post-processing method. This approach leads to a spatial probabilistic forecast field for a specific event. The probability field itself often has spatially coherent structures that lend themselves well to application of spatial verification methods. These probability fields can then be compared to the observed binary fields associated with the same specified threshold (used to define the probabilities). Spatial verification methods can be applied to these derived probability fields in the same way that they are applied to the raw forecast fields. Although approach (ii) has not been applied frequently in practice, it appears that all of the spatial methods described in Sect. 4 could be useful in such evaluations.

The feature-based approaches, in general, are well suited to alternative (iii), in which the characteristics of the features and their differences can be examined and summarized. For example, it can be highly informative to know if a forecast projects the right number, size, shape, location, and orientation of features within a study domain, regardless of whether or not it matches the observations at every time step.
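The direct counting step in approach (ii) can be sketched as follows. The member grids and threshold are invented for illustration, and an operational system might instead derive the probabilities from a post-processed predictive distribution at each grid point.

```python
# Sketch of approach (ii): convert a stack of ensemble member grids into a
# field of exceedance probabilities by counting members above a threshold.
# Pure Python for self-containment; gridded work would normally use numpy.

def exceedance_probability_field(members, threshold):
    """members: list of 2-D grids (one per ensemble member).
    Returns a grid of P(value > threshold) estimated by member counts."""
    n_members = len(members)
    n_rows, n_cols = len(members[0]), len(members[0][0])
    prob = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            hits = sum(1 for m in members if m[i][j] > threshold)
            prob[i][j] = hits / n_members
    return prob

# Three members on a 1x3 grid; event: precipitation > 1.0 mm.
members = [[[0.2, 1.5, 3.0]],
           [[0.0, 0.8, 2.1]],
           [[0.4, 1.2, 0.9]]]
print(exceedance_probability_field(members, 1.0))
# [[0.0, 0.6666666666666666, 0.6666666666666666]]
```

The resulting probability field can then be compared, with any spatial method, against the observed binary field defined by the same threshold.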
Treating an ensemble of predictions as a set of individual deterministic forecasts [approach (i)] is perhaps the most straightforward option for evaluation of ensembles using the various spatial methods. In particular, any spatial method can be applied to the individual members. Then the performance of the members can be evaluated by ranking them according to a particular spatial verification measure (e.g., Keil and Craig 2007; Micheas et al. 2007). Alternatively, the distribution of a particular spatial statistic for one ensemble forecasting system can be summarized and compared to the distributions for competing systems. Overall performance of an ensemble can be summarized using information from the distribution of performance of the individual members. For example, Atger (2001) and Marsigli et al. (2008) applied neighborhood approaches to examine the distributions of characteristics of ensemble member performance, and Zacharov and Rezacova (2009) used an average FSS value (based on the FSS values for individual ensemble members) to summarize the spread-skill relationship for high-resolution ensemble predictions of precipitation.

Application of feature-based verification approaches to ensemble forecast evaluation provides a straightforward approach toward merging spatial and ensemble forecast evaluation methods (Wernli et al. 2008). The diagram in Fig. 3 demonstrates three different approaches that might be taken to examine an ensemble feature prediction. The figure shows an ensemble forecast of a precipitation area (features F1, F2, and F3) compared to a single observed precipitation feature (note that, in this example, the intensity of precipitation is not considered; we are only concerned with the shape and location of the precipitation areas).

Fig. 3 Schematic of ensemble features and their properties. Features are defined by, for example, forecast and observed precipitation exceeding a threshold, T (Figure courtesy of Chris Davis (NCAR))

If the ensemble predictions were converted to a probabilistic forecast, four probability values would be possible: 0, 0.33, 0.67, and 1 (depending on the overlap of the three forecast areas). Another approach for examining these forecasts would be to compute the average, as depicted in the bottom diagram in Fig. 3. Each of these two approaches has advantages and disadvantages in terms of the kinds of information that can be gleaned from the comparison. In the case of conversion to a probabilistic forecast, the resulting features have little in common with the shape of the original forecast features. This result also characterizes the mean feature (note that this is a general flaw in the evaluation of ensemble means rather than of the whole ensemble). Therefore, neither of these approaches ideally represents the information content of the whole ensemble forecast. An alternative approach is to examine the distributions of attributes (e.g., areas, intensities, locations, displacements) to understand how the overall characteristics of the ensemble features differ from those of the observed field.

Gallus (2009) undertook one of the first applications of the feature-based approach to examine and compare ensembles of attributes of precipitation features. Specifically, he applied the CRA and MODE techniques to rain rate, volume, areal coverage, and displacement errors in order to examine spread-skill relationships and other attributes of ensemble performance. A more recent example based on the application of the MODE technique is shown in Fig. 4. In this example, the feature areas associated with high-resolution ensemble forecasts of precipitation are compared to the matched areas of the observed features. A great deal can be learned about the areas of the ensemble-based features compared to their matching observed feature.
Fig. 4 Example of feature areas defined by application of the MODE technique to a high-resolution (9-km) ensemble forecast of precipitation, plotted as matched forecast area versus observed area (in grid squares). Feature areas predicted by individual ensemble members are shown as blue dots; ensemble mean areas are depicted by star symbols in red. Grid squares have approximate dimensions of 9 × 9 km (Figure courtesy of Tara Jensen (NCAR))

For example, many of the mean area values do not correspond well to the observed feature area, but, in many cases, one or more of the forecast areas are similar to the observed area. In addition, many of the mean forecast values are dominated by a single member. Similar analyses could be applied to other forecast attributes (e.g., intensity summaries within the features, orientation, shape, and location of the feature; cf. Micheas et al. 2007; Davis et al. 2006a, b, 2009; Lack et al. 2010). Such analyses are amenable to a variety of summaries that provide physically meaningful information to forecast developers, hydrologists, and other users, including more traditional verification measures, such as rank histograms and Brier scores.

The development of approaches to utilize spatial methods to evaluate ensemble predictions is still in its infancy. While efforts toward implementation of these new methods have been underway for a number of years, they have yet to realize full acceptance and application. Ongoing and future efforts will demonstrate the full benefits of application of spatial methods for evaluation of ensemble predictions.


7 Verification of Extreme Events

7.1 New Scores for Deterministic Forecasts

The ability of a forecast model to capture extreme meteorological events is a priority for hydrological concerns. However, the rarity of such events poses difficulties for their verification, both in terms of gaining meaningful information from summary scores, which are mostly aimed at characterizing average performance, and in handling the associated estimation uncertainty, which is often very high because of the small number of occurrences. Stephenson et al. (2008) summarize the behavior of several typical categorical verification scores as the base rate (i.e., the rate of occurrence of the observed event) tends to zero. They proposed new summary statistics that better inform about a forecast model’s ability to predict rare events, which Ferro and Stephenson (2011) subsequently modified to address issues that had been raised with the originally proposed summaries. These new measures are called the extreme dependency index (EDI) and the symmetric extreme dependency index (SEDI). If F is the false alarm rate (number of false alarms divided by the number of nonobserved events) and H is the hit rate (number of hits divided by the number of observed events), the EDI and SEDI are given by

$$\mathrm{EDI} = \frac{\log F - \log H}{\log F + \log H}$$

$$\mathrm{SEDI} = \frac{\log F - \log H - \log(1 - F) + \log(1 - H)}{\log F + \log H + \log(1 - F) + \log(1 - H)}$$

Both of these scores have values of 1 for perfect forecasts and 0 for forecasts with no skill. The motivation for these new statistics comes from extreme-value theory. Specifically, a relationship exists between these statistics and an extreme (or tail) dependence index (cf. Coles 2001, Sect. 8.4; Reiss and Thomas 2007).
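A minimal sketch of the EDI and SEDI computed from a 2 × 2 contingency table, following the definitions above. The counts are made up, and the degenerate cases where H or F equals 0 or 1 (for which the logarithms are undefined) are deliberately not handled.

```python
# Sketch: EDI and SEDI from a 2x2 contingency table
# (hits, false alarms, misses, correct negatives). Illustrative counts;
# assumes 0 < H < 1 and 0 < F < 1 so all logarithms exist.
import math

def edi_sedi(hits, false_alarms, misses, correct_negatives):
    H = hits / (hits + misses)                             # hit rate
    F = false_alarms / (false_alarms + correct_negatives)  # false alarm rate
    lF, lH = math.log(F), math.log(H)
    edi = (lF - lH) / (lF + lH)
    sedi = ((lF - lH - math.log(1 - F) + math.log(1 - H)) /
            (lF + lH + math.log(1 - F) + math.log(1 - H)))
    return edi, sedi

# Rare event: base rate 50/1000 = 0.05
edi, sedi = edi_sedi(hits=40, false_alarms=20, misses=10,
                     correct_negatives=930)
print(round(edi, 3), round(sedi, 3))
```

For a random forecast (H = F), both numerators vanish and the scores are 0, consistent with the no-skill value quoted above.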

7.2 Verifying Extreme Events for Ensembles

One way to focus verification of probabilistic forecasts on specific subgroups of observations, such as extremes, is to use the threshold-weighted continuous ranked probability score ($\mathrm{CRPS}^t$) (Gneiting and Ranjan 2011; Lerch 2012). This score can be conditioned on different discharge signatures (similar to weather regimes; see Lerch and Thorarinsdottir 2013). It is defined by

$$\mathrm{CRPS}^t(f, y) = \int \left( F(z) - \mathbb{1}_A(y - z) \right)^2 u(z)\, dz$$

with

$$\mathbb{1}_A(y - z) := \begin{cases} 1 & \text{if } y - z \in A \\ 0 & \text{if } y - z \notin A \end{cases}$$

where F is the predictive cumulative distribution function corresponding to the probability density function f of the forecast and y is the observation. The function u is a nonnegative weight function on the forecast values z; with $u(z) = \mathbb{1}(z \ge r)$, it is equal to 1 for values z greater than or equal to a threshold r ∈ ℝ (and 0 otherwise). The score was used by Pappenberger et al. (2015) to evaluate river discharge forecasts for high and low flows, comparing different benchmarks and their fitness for purpose in forecasting floods and droughts.
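A hedged sketch of the threshold-weighted CRPS for an ensemble forecast follows. It uses the common formulation with the Heaviside indicator 1{y ≤ z} and weight u(z) = 1{z ≥ r} (notationally simpler than, but in the same spirit as, the set-indicator form above); the empirical ensemble CDF stands in for F, and the integration limits, step size, and toy ensemble are illustrative assumptions.

```python
# Sketch: threshold-weighted CRPS via numerical integration on a fixed grid.
# twCRPS(F, y) = integral of (F(z) - 1{y <= z})^2 * u(z) dz, u(z) = 1{z >= r}.
# Grid limits z_lo/z_hi and step dz are illustrative choices, not part of
# the score's definition.

def empirical_cdf(members, z):
    return sum(1 for m in members if m <= z) / len(members)

def threshold_weighted_crps(members, y, r, z_lo=0.0, z_hi=50.0, dz=0.01):
    total = 0.0
    z = z_lo
    while z < z_hi:
        if z >= r:  # weight u(z) = 1{z >= r} focuses the score on extremes
            heaviside = 1.0 if y <= z else 0.0
            total += (empirical_cdf(members, z) - heaviside) ** 2 * dz
        z += dz
    return total

ens = [8.0, 10.0, 12.0, 15.0]          # toy ensemble member discharges
score_all = threshold_weighted_crps(ens, y=11.0, r=0.0)    # plain CRPS
score_ext = threshold_weighted_crps(ens, y=11.0, r=10.0)   # extremes only
assert score_ext < score_all  # restricting the weight can only reduce it
```

Setting r = 0 (no restriction for nonnegative discharge) recovers the ordinary CRPS, so the two scores can be compared directly.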

7.3 Extreme Forecast Index (EFI)

The extreme forecast index (EFI) was developed at ECMWF as a tool to provide forecasters with guidance on potential extreme weather events (Lalaurette 2003; Zsótér 2006). The EFI is a summary measure of the difference between the ensemble forecast distribution and the model climate distribution for a given weather element. It is specifically designed to highlight occasions when there is a significant shift in the ensemble forecast toward the climate extreme. The methodology of comparing a model forecast to an underlying climatology has been applied in both meteorology and hydrology (Lalaurette 2003; Thielen-del Pozo et al. 2009; Bartholmes et al. 2009; Cloke et al. 2010; Alfieri et al. 2011). This approach can be applied in areas where there are no or very few observations, a major advantage over many other approaches. In addition, it can be seen as a post-processing or calibration method, as it is computed entirely in the model space. One of the main assumptions associated with the EFI is that the behavior in terms of occurrence of anomalies in the model space is similar to that in the observation space. The EFI is defined as

$$\mathrm{EFI} = \frac{2}{\pi} \int_0^1 \frac{p - F(p)}{\sqrt{p(1-p)}}\, dp$$

where F(p) is a function denoting the proportion of ensemble members lying below the p-th quantile of the climate record. The denominator $\sqrt{p(1-p)}$, which takes its maximum at p = 0.5 and vanishes at both ends of the probability range, gives more weight to the tails of the distribution. This weighting can also be interpreted as using the Anderson–Darling statistic (Anderson and Darling 1952) as a modification of the well-known Kolmogorov–Smirnov test. Given that 0 ≤ F(p) ≤ 1, EFI values lie in the interval [−1, 1], with a value of +1.0 (−1.0) obtained when all the ensemble members are above (below) the climate distribution. Dutra et al. (2013) present a semi-analytical technique to calculate the EFI.
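The EFI integral above can be approximated numerically as sketched below. F(p) is estimated as the fraction of ensemble members below a crude empirical p-th quantile of a model-climate sample; the midpoint rule avoids the integrable singularities at p = 0 and p = 1, and all sample values are invented.

```python
# Illustrative numerical approximation of the EFI integral. The quantile
# lookup (index into the sorted climate sample) is deliberately crude;
# a semi-analytical treatment is given by Dutra et al. (2013).
import math

def efi(ensemble, climate, n_steps=1000):
    clim = sorted(climate)
    total = 0.0
    dp = 1.0 / n_steps
    for k in range(n_steps):
        p = (k + 0.5) * dp                  # midpoint rule, avoids p = 0, 1
        q = clim[min(int(p * len(clim)), len(clim) - 1)]  # p-th climate quantile
        f_p = sum(1 for m in ensemble if m < q) / len(ensemble)
        total += (p - f_p) / math.sqrt(p * (1.0 - p)) * dp
    return (2.0 / math.pi) * total

climate = [float(v) for v in range(100)]    # toy model-climate record: 0..99
extreme_ens = [150.0] * 10                  # every member above the climate
print(round(efi(extreme_ens, climate), 3))  # close to +1.0
```

With all members above the climate record, F(p) = 0 everywhere and the discretized integral approaches the theoretical maximum of +1; an ensemble entirely below the climate gives a value near −1.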


Fig. 5 Performance of the EFI for precipitation (EFI skill vs. year, 2004–2013) at forecast day 4 (24-h accumulation over the period 72–96 h ahead); an extreme event is taken as an observation exceeding the 95th percentile of the station climate, and curves show a four-season running mean of relative operating characteristic (ROC) area skill scores (final point includes spring March–May 2014)

The EFI is naturally very sensitive to uncertainties in the underlying climatic distribution, a sensitivity that has been explored in detail by Zsótér et al. (2014). The climatic distribution needs to be derived from a large sample, in particular if certainty in the tails of the distribution is required.

The EFI is routinely applied to variables such as temperature (Ghelli et al. 2010), precipitation (Haiden et al. 2014), snowfall (Tsonevsky and Richardson 2012), and wind (Petroliagis and Pinson 2012). It can be extended to many other areas, for example, to verify the amount of energy available for convection (CAPE, convective available potential energy).

Meta-verification of the EFI has been performed using synoptic observations over Europe. An extreme event is judged to have occurred if the observation exceeds the 95th percentile of the observed climate for that station (calculated from a 15-year sample, 1993–2007). The ability of the EFI to detect extreme events is assessed using the relative operating characteristic (ROC) score. Results show that the EFI has substantial ability to provide early warnings of extreme events, confirming the subjective experience of forecasters, and that its performance has improved over recent years (Fig. 5).

Other indices for extreme weather forecasting can also be useful. For example, the Shift in Probability Space index (SPS) provides information on how extreme the tail of the distribution is, and, to measure the abnormality of such extreme situations, the Shift of Tails index (SOT) should also be considered (Zsótér 2006; Tsonevsky and Richardson 2012). Using both of these parameters together can reveal the level of extremity.

8 Inference in the Presence of Correlation

Comparing one forecasting system or model against another is a frequent application of forecast verification. In the context of ensemble prediction, when verifying ensembles of forecasts composed of multiple models, the question often arises as to which multi-model ensemble member or members are better than others. For example, in aggregating forecasts over the members (e.g., when finding the ensemble mean), one might want to weight better-performing models more heavily than models that do not perform as well. In order to make such determinations, one might consider comparing summary statistics for the two models against the same observations. For example, one might test the hypothesis that the difference in mean square error between two forecast members is zero (i.e., H0: E[MSE1] − E[MSE2] = 0) against the two-sided alternative that the difference is nonzero (Diebold and Mariano 1995; Hering and Genton 2011; Gilleland 2013).

Correlation becomes a concern for conducting such a test because it will generally lead to overconfidence (an underestimate of the variability) in the estimation of uncertainty concerning the difference. Correlation can come from dependence in time and space, as well as contemporaneous correlation between the two forecasts. Even if the two forecasts are completely independent of each other, contemporaneous correlation may still exist because of the use of the same set of observations in calculating the statistic of interest.

The econometrics literature is relatively rich in terms of investigating this particular issue. In particular, the seminal work by Diebold and Mariano (1995) introduced the idea of comparing the accuracy of time series forecasts with a test for the null hypothesis of equal forecast accuracy between two competing models while accounting for temporal correlation (henceforth, the DM test).
Giacomini and White (2006) unified much of this theory into a generalized framework that includes the evaluation of point, interval, probability, and density forecasts. Hering and Genton (2011) further refined the DM test procedure, extended it to the spatial domain, and applied the test to a wind speed data set. Importantly, they also showed that their modifications result in a test that has proper size and power even in the face of contemporaneous correlation. Moreover, the test works for any loss function and any form of correlation, and it does not require any distributional assumption for the underlying data, only that the resulting test statistic is approximately standard normal.

Gilleland (2013) applied the test of Hering and Genton (2011) to quantitative precipitation forecasts and further incorporated methods to account for phase, or location, errors in addition to spatial correlation. The procedure provides a test for competing forecast models that accounts for spatial correlation and evaluates errors in location, which might be caused by timing errors or small-scale errors. The author found that accounting for such errors simultaneously with intensity errors is important in terms of the power of the tests. The additional deformation component in the testing procedure allows the test to better detect differences in skill when they actually exist.


The DM test for two time series forecasts, F1 and F2, against the same set of observations, O, is carried out by first finding the loss, g(F1, O) and g(F2, O), at each time point (e.g., the loss might be MSE, absolute error, correlation, straight differences, etc.). The null hypothesis is that the expected difference between these two loss functions is zero, i.e.,

$$H_0: \mathrm{E}[g(F_1, O)] - \mathrm{E}[g(F_2, O)] = 0.$$

The difference in loss functions is called the loss differential, say d. As the sample size goes to infinity, the sample mean loss differential is asymptotically normal with (population) mean $\mu$ and variance $2\pi s_d(0)/n$, where $s_d(0)$ is the spectral density of the loss differential at frequency zero, obtained in such a way as to account for temporal correlation. The large-sample standard normal test statistic for equal forecast accuracy is therefore given by

$$S = \frac{\bar{d}}{\sqrt{2\pi \hat{s}_d(0)/n}}$$

where n is the length of the time series and $\bar{d}$ is the sample mean loss differential. The key to incorporating correlation information is in estimating the spectral density at frequency zero, which is accomplished through a weighted sum of the sample autocovariances for a k-step forecast (which could be either a single forecast or a time series of forecasts with a given lead time). The result is a direct consequence of the fact that the variance of a sum equals the sum of the covariances of the lagged (in time, space, or both) terms. That is,

$$2\pi \hat{s}_d(0) = \frac{1}{n} \sum_{\tau=-(k-1)}^{k-1} \; \sum_{t=|\tau|+1}^{n} (d_t - \bar{d})(d_{t-|\tau|} - \bar{d}).$$

A major drawback of the empirical estimate above is that the sum over all lagged empirical autocovariance terms is identically zero, whereas the variance is clearly not generally zero. Hering and Genton (2011) therefore suggest replacing the empirical autocovariances with a fitted parametric covariance model, which, by definition, is nonnegative definite, so that the terms cannot cancel each other out. For example, using the exponential covariance model gives

$$2\pi \hat{s}_d(0) = \sigma^2 + 2\sigma^2 \sum_{\tau=1}^{n-1} e^{-3\tau/\theta}$$

where $\theta$ is the distance beyond which the correlation is less than about 0.05 (i.e., the practical range) and $\sigma^2$ is the lag-zero variance. The extension of the test to the spatial setting is analogous, replacing the autocovariance function with a parametric spatial covariance function of similar form.
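The test statistic and the exponential-model variance estimate above can be sketched as follows. The loss series are synthetic, and the practical range θ is fixed by hand for illustration, whereas Hering and Genton (2011) would fit the covariance model to the loss-differential series.

```python
# Sketch of a DM-type test statistic with the exponential-model variance
# estimate: 2*pi*s_d(0) = sigma^2 + 2*sigma^2 * sum_tau exp(-3*tau/theta).
# Assumption: theta is supplied rather than fitted; loss series are synthetic.
import math

def dm_statistic(loss1, loss2, theta):
    """Test H0: E[g(F1, O)] - E[g(F2, O)] = 0 for two loss series."""
    n = len(loss1)
    d = [a - b for a, b in zip(loss1, loss2)]        # loss differential
    d_bar = sum(d) / n
    sigma2 = sum((x - d_bar) ** 2 for x in d) / n    # lag-zero variance
    spec = sigma2 * (1.0 + 2.0 * sum(math.exp(-3.0 * t / theta)
                                     for t in range(1, n)))
    return d_bar / math.sqrt(spec / n)

# Synthetic example: forecast 2 has systematically larger loss.
loss1 = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.1, 1.0, 0.9]
loss2 = [1.6, 1.5, 1.5, 1.7, 1.4, 1.9, 1.2, 1.7, 1.4, 1.5]
s = dm_statistic(loss1, loss2, theta=2.0)
# |s| > 1.96 would reject equal accuracy at the 5% level (two-sided normal)
```

In this synthetic case |s| far exceeds 1.96, so the hypothesis of equal accuracy would be rejected in favor of the first forecast.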

9 Conclusions

Weather forecast verification is a relatively mature field, but one that continues to adapt to new demands brought on by changing technologies and model improvements, and one that continues to develop and adopt new methods, sometimes from other fields such as economics. Because hydrologic forecasts use many of the same variables that are used in meteorology, many of these methods can be useful in a hydrological context. Even in the realm of meteorology, specific verification methods and analyses are highly dependent on specific user requirements.

In this chapter, we have described some of the specific requirements for hydrology, such as the often complicated meteorological conditions that affect discharge (e.g., direct and indirect connections with meteorology), and some of the newly developed methods and scores that better address verification challenges common to meteorological and hydrological applications. Challenges addressed in this chapter include handling double-penalty issues and small-scale error accumulation when verifying high-resolution forecasts, properly handling important rare events, and addressing spatial correlation when conducting statistical tests.

Additional challenges not addressed herein include multidimensional verification, which is particularly important for hydrologic applications, where it is likely that a combination of more than one variable can result in important-to-predict events, as well as the ability to verify across multiple temporal scales in order to seamlessly account for both short- and long-term forecasting (Ebert et al. 2013).

References

C. Accadia, S. Mariani, M. Casaioli, A. Lavagnini, Sensitivity of precipitation forecast skill scores to bilinear interpolation and a simple nearest-neighbor average method on high-resolution verification grids. Weather Forecast. 18, 918–932 (2003)
A. AghaKouchak, N. Nasrollahi, J. Li, B. Imam, S. Sorooshian, Geometrical characterization of precipitation patterns. J. Hydrometeorol. 12(2), 274–285 (2010). doi:10.1175/2010JHM1298.1
G.D. Alexander, J.A. Weinman, J.L. Schols, The use of digital warping of microwave integrated vapor imagery to improve forecasts of marine extratropical cyclones. Mon. Weather Rev. 126, 1469–1496 (1998)
L. Alfieri, D. Velasco, J. Thielen, Flash flood detection through a multi-stage probabilistic warning system for heavy precipitation events. Adv. Geosci. 29, 69–75 (2011). doi:10.5194/adgeo-29-69-2011. www.adv-geosci.net/29/69/2011/
L. Alfieri, P. Burek, E. Dutra, B. Krzeminski, D. Muraro, J. Thielen, F. Pappenberger, GloFAS – global ensemble streamflow forecasting and flood early warning. Hydrol. Earth Syst. Sci. 13, 141–153 (2012)
T. Anderson, D. Darling, Asymptotic theory of certain goodness of fit criteria based on stochastic processes. Ann. Math. Stat. 23, 193–212 (1952)
F. Atger, Verification of intense precipitation forecasts from single models and ensemble prediction systems. Nonlinear Processes Geophys. 8, 401–417 (2001)
J.C. Bartholmes, J. Thielen, M.H. Ramos, S. Gentilini, The European Flood Alert System EFAS – part 2: statistical skill assessment of probabilistic and deterministic operational forecasts. Hydrol. Earth Syst. Sci. 13, 141–153 (2009)


W.M. Briggs, R.A. Levine, Wavelets and field forecast verification. Mon. Weather Rev. 125, 1329–1341 (1997)
B.G. Brown, E. Gilleland, E.E. Ebert, Forecasts of spatial fields, in Forecast Verification, ed. by I. Jolliffe, D.B. Stephenson (Wiley, Chichester, UK, 2011), pp. 95–117
R. Bullock, Development and implementation of MODE time domain object-based verification, in 24th Conference on Weather and Forecasting, Seattle, 24–27 Jan 2011 (American Meteorological Society)
N. Bussieres, W. Hogg, The objective analysis of daily rainfall by distance weighting schemes on a mesoscale grid. Atmosphere–Ocean 27, 521–541 (1989)
B. Casati, New developments of the intensity-scale technique within the spatial verification methods inter-comparison project. Weather Forecast. 25(1), 113–143 (2010). doi:10.1175/2009WAF2222257.1
B. Casati, G. Ross, D.B. Stephenson, A new intensity-scale approach for the verification of spatial precipitation forecasts. Meteorol. Appl. 11, 141–154 (2004)
G.J. Ciach, W.F. Krajewski, On the estimation of radar rainfall error variance. Adv. Water Resour. 22, 585–595 (1999)
R. Cifelli, V. Chandrasekar, Dual-polarization radar rainfall estimation, in Rainfall: State of the Science, ed. by F.Y. Testik, M. Gebremichael (American Geophysical Union, Washington, DC, 2010), pp. 105–125. doi:10.1029/2010GM000930
H.L. Cloke, C. Jeffers, F. Wetterhall, T. Byrne, J. Lowe, F. Pappenberger, Climate impacts on river flow: projections for the Medway catchment, UK, with UKCP09 and CATCHMOD. Hydrol. Process. (2010). doi:10.1002/hyp.776
S.G. Coles, An Introduction to Statistical Modeling of Extreme Values (Springer, London, 2001)
C. Davis, B. Brown, R.G. Bullock, Object-based verification of precipitation forecasts. Part I: methodology and application to mesoscale rain areas. Mon. Weather Rev. 134(7), 1772–1784 (2006a). doi:10.1175/MWR3145.1
C. Davis, B. Brown, R.G. Bullock, Object-based verification of precipitation forecasts. Part II: application to convective rain systems. Mon. Weather Rev. 134(7), 1785–1795 (2006b). doi:10.1175/MWR3146.1
C. Davis, B. Brown, R.G. Bullock, J. Halley Gotway, The method for object-based diagnostic evaluation (MODE) applied to numerical forecasts from the 2005 NSSL/SPC spring program. Weather Forecast. 24(5), 1252–1267 (2009). doi:10.1175/2009WAF2222241.1
F.X. Diebold, R.S. Mariano, Comparing predictive accuracy. J. Bus. Econ. Stat. 13, 253–263 (1995)
L. Duc, K. Saito, H. Seko, Spatial-temporal fractions verification for high-resolution ensemble forecasts. Tellus A 65, 18171 (2013). http://dx.doi.org/10.3402/tellusa.v65i0.18171
J.D. Duda, X. Wang, F. Kong, M. Xue, Using varied microphysics to account for uncertainty in warm-season QPF in a convection-allowing ensemble. Mon. Weather Rev. 142, 2198–2219 (2014)
E. Dutra, M. Diamantakis, I. Tsonevsky, E. Zsoter, F. Wetterhall, T. Stockdale, D. Richardson, F. Pappenberger, The extreme forecast index at the seasonal scale. Atmos. Sci. Lett. 14(4), 256–262 (2013)
E.E. Ebert, Fuzzy verification of high resolution gridded forecasts: a review and proposed framework. Meteorol. Appl. 15, 51–64 (2008). doi:10.1002/met.25. Available at http://www.ecmwf.int/newsevents/meetings/workshops/2007/jwgv/METspecialissueemail.pdf
E.E. Ebert, J.L. McBride, Verification of precipitation in weather systems: determination of systematic errors. J. Hydrol. 239, 179–202 (2000)
E.E. Ebert, U. Damrath, W. Wergen, M.E. Baldwin, The WGNE assessment of short-term quantitative precipitation forecasts. Bull. Am. Meteorol. Soc. 84, 481–492 (2003)
E. Ebert, L. Wilson, A. Weigel, M. Mittermaier, P. Nurmi, P. Gill, M. Göber, S. Joslyn, B. Brown, T. Fowler, A. Watkins, Progress and challenges in forecast verification. Meteorol. Appl. 20, 130–139 (2013). doi:10.1002/met.1392
EUMETNET, ODYSSEY, the OPERA Data Centre (2014), http://www.eumetnet.eu/odysseyopera-data-centre. Viewed 10 Apr 2014


C.A.T. Ferro, D.B. Stephenson, Extremal dependence: improved verification measures for deterministic forecasts of rare binary events. Weather Forecast. 26, 699–713 (2011). doi:10.1175/WAF-D-10-05030.1
V. Fortin, M. Abaza, F. Anctil, R. Turcotte, Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15, 1708–1713 (2016)
W.A. Gallus Jr., Application of object-based verification techniques to ensemble precipitation forecasts. Weather Forecast. 25, 144–158 (2009)
A. Ghelli, A. Garcia-Mendez, F. Prates, M. Dahoui, Extreme weather events in summer 2010: how did the ECMWF forecasting systems perform? ECMWF Newsl. 125, 7–11 (2010)
R. Giacomini, H. White, Tests of conditional predictive ability. Econometrica 74, 1545–1578 (2006)
E. Gilleland, Spatial forecast verification: Baddeley’s delta metric applied to the ICP test cases. Weather Forecast. 26, 409–415 (2011). doi:10.1175/WAF-D-10-05061.1
E. Gilleland, Testing competing precipitation forecasts accurately and efficiently: the spatial prediction comparison test. Mon. Weather Rev. 141(1), 340–355 (2013). doi:10.1175/MWR-D-12-00155.1
E. Gilleland, T.C.M. Lee, J. Halley Gotway, R.G. Bullock, B.G. Brown, Computationally efficient spatial forecast verification using Baddeley’s delta image metric. Mon. Weather Rev. 136(5), 1747–1757 (2008). doi:10.1175/2007MWR2274.1
E. Gilleland, D. Ahijevych, B.G. Brown, B. Casati, E.E. Ebert, Intercomparison of spatial forecast verification methods. Weather Forecast. 24(5), 1416–1430 (2009). doi:10.1175/2009WAF2222269.1
E. Gilleland, D.A. Ahijevych, B.G. Brown, E.E. Ebert, Verifying forecasts spatially. Bull. Am. Meteorol. Soc. 91(10), 1365–1373 (2010a). doi:10.1175/2010BAMS2819.1
E. Gilleland, L. Chen, M. DePersio, G. Do, K. Eilertson, Y. Jin, E.K. Lang, F. Lindgren, J. Lindström, R.L. Smith, C. Xia, Spatial forecast verification: image warping. NCAR Tech. Note NCAR/TN-482+STR (2010b). doi:10.5065/D62805JJ
E. Gilleland, J. Lindström, F. Lindgren, Analyzing the image warp forecast verification method on precipitation fields from the ICP. Weather Forecast. 25(4), 1249–1262 (2010c). doi:10.1175/2010WAF2222365.1
T. Gneiting, R. Ranjan, Comparing density forecasts using threshold- and quantile-weighted scoring rules. J. Bus. Econ. Stat. 29(3), 411–422 (2011)
M. Goeber, E. Zsoter, D.S. Richardson, Could a perfect model ever satisfy the forecaster? On grid box mean versus point verification. Meteorol. Appl. 15, 359–365 (2008)
T. Haiden, M. Rodwell, D. Richardson, A. Okagaki, T. Robinson, T. Hewson, Intercomparison of global model precipitation forecast skill in 2010/11 using the SEEPS score. Mon. Weather Rev. 140, 2720–2733 (2012)
T. Haiden, L. Magnusson, I. Tsonevsky, F. Wetterhall, L. Alfieri, F. Pappenberger, P. de Rosnay, J. Muñoz-Sabater, G. Balsamo, C. Albergel, R. Forbes, T. Hewson, S. Malardel, D. Richardson, ECMWF forecast performance during the June 2013 flood in Central Europe. Technical Memorandum No. 723 (European Centre for Medium-Range Weather Forecasts, Reading, England, 2014)
T.M. Hamill, J. Juras, Measuring forecast skill: is it real skill or is it the varying climatology? Q. J. Roy. Meteorol. Soc. 132, 2905–2923 (2006)
A.S. Hering, M.G. Genton, Comparing spatial predictions. Technometrics 53, 414–425 (2011)
R.N. Hoffman, Z. Liu, J.-F. Louis, C. Grassotti, Distortion representation of forecast errors. Mon. Weather Rev. 123, 2758–2770 (1995)
G.J. Huffman, R.F. Adler, D.T. Bolvin, G. Gu, E.J. Nelkin, K.P. Bowman, Y. Hong, E.F. Stocker, D.B. Wolff, The TRMM multi-satellite precipitation analysis: quasi-global, multi-year, combined-sensor precipitation estimates at fine scale. J. Hydrometeorol. 8, 38–55 (2007)


E. Gilleland et al.

G.J. Huffman, D.T. Bolvin, D. Braithwaite, K. Hsu, R. Joyce, P. Xie, S.H. Yoo, NASA global precipitation measurement (GPM) integrated multi-satellite retrievals for GPM (IMERG), in Algorithm Theoretical Basis Document, Version 4.1. NASA, Greenbelt, MD (2013)
IPWG, International Precipitation Working Group: Data and Products (2016), http://www.isac.cnr.it/~ipwg/data.html. Viewed 18 June 2016
J. Jenkner, C. Frei, C. Schwierz, Quantile-based short-range QPF evaluation over Switzerland. Meteorol. Z. 17, 827–848 (2008)
C. Keil, G.C. Craig, A displacement-based error measure applied in a regional ensemble forecasting system. Mon. Weather Rev. 135, 3248–3259 (2007). doi:10.1175/MWR3457.1
C. Keil, G.C. Craig, A displacement and amplitude score employing an optical flow technique. Weather Forecast. 24(5), 1297–1308 (2009). doi:10.1175/2009WAF2222247.1
C. Kidd, V. Levizzani, Status of satellite precipitation retrievals. Hydrol. Earth Syst. Sci. 15, 1109–1116 (2011)
S.A. Lack, G.L. Limpert, N.I. Fox, An object-oriented multiscale verification scheme. Weather Forecast. 25(1), 79–92 (2010). doi:10.1175/2009WAF2222245.1
F. Lalaurette, Early detection of abnormal weather conditions using a probabilistic extreme forecast index. Q. J. Roy. Meteorol. Soc. 129(594), 3037–3057 (2003)
S.M. Lazarus, C.M. Ciliberti, J.D. Horel, K.A. Brewster, Near-real-time applications of a mesoscale analysis system to complex terrain. Weather Forecast. 17, 971–1000 (2002)
S. Lerch, Verification of Probabilistic Forecasts for Rare and Extreme Events (Ruprecht-Karls-Universität Heidelberg, Heidelberg, 2012)
S. Lerch, T.L. Thorarinsdottir, Comparison of nonhomogeneous regression models for probabilistic wind speed forecasting. Tellus A 65, 21206 (2013)
C. Marsigli, A. Montani, T. Paccagnella, A spatial verification method applied to the evaluation of high-resolution ensemble forecasts. Meteorol. Appl. 15, 125–143 (2008)
C.F. Mass, D. Ovens, K. Westrick, B.A. Colle, Does increasing horizontal resolution produce more skillful forecasts? Bull. Am. Meteorol. Soc. 83, 407–430 (2002)
J.L. McBride, E.E. Ebert, Verification of quantitative precipitation forecasts from operational numerical weather prediction models over Australia. Weather Forecast. 15, 103–121 (2000)
R. Merz, G. Blöschl, A process typology of regional floods. Water Resour. Res. 39, 1340 (2003). doi:10.1029/2002WR001952
A. Micheas, N.I. Fox, S.A. Lack, C.K. Wikle, Cell identification and verification of QPF ensembles using shape analysis techniques. J. Hydrol. 344, 105–116 (2007)
J.R. Minder, P.W. Mote, J.D. Lundquist, Surface temperature lapse rates over complex terrain: lessons from the Cascade Mountains. J. Geophys. Res. Atmos. 115(D14) (2010). doi:10.1029/2009JD013493
M. Mittermaier, N. Roberts, Intercomparison of spatial forecast verification methods: identifying skillful spatial scales using the fractions skill score. Weather Forecast. 25, 343–354 (2010)
M. Mittermaier, N. Roberts, S.A. Thompson, A long-term assessment of precipitation forecast skill using the fractions skill score. Meteorol. Appl. 20, 176–186 (2013)
F. Molteni, R. Buizza, T.N. Palmer, T. Petroliagis, The ECMWF ensemble prediction system: methodology and validation. Q. J. Roy. Meteorol. Soc. 122, 73–119 (1996)
J.E. Nash, J.V. Sutcliffe, River flow forecasting through conceptual models. Part I: a discussion of principles. J. Hydrol. 10, 282–290 (1970)
R. North, M. Trueman, M. Mittermaier, M.J. Rodwell, An assessment of the SEEPS and SEDI metrics for the verification of 6 h forecast precipitation accumulations. Meteorol. Appl. 20, 164–175 (2013)
NWS, Advanced Hydrologic Prediction Service (2014), http://water.weather.gov/precip/about.php. Viewed 10 Apr 2014
F. Pappenberger, M.H. Ramos, H.L. Cloke, F. Wetterhall, L. Alfieri, K. Bogner, A. Mueller, P. Salamon, How do I know if my forecasts are better? Using benchmarks in hydrological ensemble prediction. J. Hydrol. 522, 697–713 (2015)

Verification of Meteorological Forecasts for Hydrological Applications


T. Petroliagis, P. Pinson, Early indication of extreme winds utilising the Extreme Forecast Index. ECMWF Newsl. 132, Summer 2012 (2012)
R.-D. Reiss, M. Thomas, Statistical Analysis of Extreme Values: With Applications to Insurance, Finance, Hydrology and Other Fields, 3rd edn. (Birkhäuser, Basel, 2007), 530 pp
N.M. Roberts, H.W. Lean, Scale-selective verification of rainfall accumulations from high-resolution forecasts of convective events. Mon. Weather Rev. 136, 78–96 (2008)
M.J. Rodwell, D.S. Richardson, T.D. Hewson, T. Haiden, A new equitable score suitable for verifying precipitation in numerical weather prediction. Q. J. Roy. Meteorol. Soc. 136, 1344–1363 (2010)
Ø. Saetra, H. Hersbach, J.-R. Bidlot, D.S. Richardson, Effects of observation errors on the statistics for ensemble spread and reliability. Mon. Weather Rev. 132, 1487–1501 (2004)
B.R.J. Schwedler, M.E. Baldwin, Diagnosing the sensitivity of binary image measures to bias, location, and event frequency within a forecast verification framework. Weather Forecast. 26, 1032–1044 (2011). doi:10.1175/WAF-D-11-00032.1
D.J. Seo, J.P. Breidenbach, Real-time correction of spatially nonuniform bias in radar rainfall data using rain gauge measurements. J. Hydrometeorol. 3, 93–111 (2002)
R. Steinacker, C. Häberli, W. Pöttschacher, A transparent method for the analysis and quality evaluation of irregularly distributed and noisy observational data. Mon. Weather Rev. 128, 2303–2316 (2000)
D.B. Stephenson, B. Casati, C.A.T. Ferro, C.A. Wilson, The extreme dependency score: a non-vanishing measure for forecasts of rare events. Meteorol. Appl. 15, 41–50 (2008). doi:10.1002/met.53
J. Thielen-del Pozo, J. Bartholmes, M.-H. Ramos, A. de Roo, The European Flood Alert System. Part 1: concept and development. Hydrol. Earth Syst. Sci. 13, 125–140 (2009)
I. Tsonevsky, D.S. Richardson, Application of the new EFI products to a case of early snowfall in Central Europe. ECMWF Newsl. 133, Autumn 2012, p. 4 (2012)
H. Wernli, M. Paulat, M. Hagen, C. Frei, SAL – a novel quality measure for the verification of quantitative precipitation forecasts. Mon. Weather Rev. 136, 4470–4487 (2008)
T. Weusthoff, F. Ament, M. Arpagaus, M.W. Rotach, Assessing the benefits of convection-permitting models by neighborhood verification: examples from MAP D-PHASE. Mon. Weather Rev. 138, 3418–3433 (2010)
D.S. Wilks, Statistical Methods in the Atmospheric Sciences, 3rd edn. (Academic, Amsterdam, 2011), 704 pp
WMO, Recommendations for the Verification and Intercomparison of QPFs and PQPFs from Operational NWP Models – Revision 2. WMO/TD-No. 1485, WWRP 2009-1 (2008)
WMO, Manual on the Global Data-Processing and Forecasting System (GDPFS). WMO-No. 485, World Meteorological Organization, Geneva (2014)
J. Wolff, M. Harrold, T.A. Fowler, J. Halley Gotway, L. Nance, B.G. Brown, Beyond the basics: evaluating model-based precipitation forecasts using traditional, spatial, and object-based methods. Weather Forecast. 29, 1451–1472 (2014)
P. Zacharov, D. Rezacova, Using the fractions skill score to assess the relationship between an ensemble QPF spread and skill. Atmos. Res. 94, 684–693 (2009)
M. Zimmer, H. Wernli, Verification of quantitative precipitation forecasts on short time-scales: a fuzzy approach to handle timing errors with SAL. Meteorol. Z. 20, 95–105 (2011)
E. Zsótér, Recent developments in extreme weather forecasting. ECMWF Newsl. 107, 8–17 (2006)
E. Zsótér, F. Pappenberger, D. Richardson, Sensitivity of model climate to sampling configurations and the impact on the Extreme Forecast Index. Meteorol. Appl. (2014). doi:10.1002/met.1447

Verification of Short-Range Hydrological Forecasts Katharina Liechti and Massimiliano Zappa

Contents
1 Introduction
2 Verification of Ensemble Streamflow Nowcasts
2.1 Varying Meteorological Input: Taking into Account the Input Uncertainty
2.2 Varying Parameterization: Taking into Account the Parameter Uncertainty
3 Operational Hydrological Ensemble Prediction System (HEPS)
3.1 Correlation
3.2 Frequency Bias
3.3 Value Score
3.4 Brier Skill Score
3.5 Rank Histograms
3.6 Conditional Bias: Reliability Plots
3.7 Performance Overview
4 Flood Peak and Flood Timing Verification
5 Conclusions
References

Abstract

For the mitigation of floods and flash floods, operational nowcast and forecast systems are crucial. This chapter provides practical illustrations of the verification of hydrological ensemble prediction systems with a temporal horizon of up to 5 days. Section 2 shows the application of two ensemble approaches for discharge nowcasts. The results show that both ensemble approaches have added value compared to deterministic nowcasts. Section 3 presents the evaluation of an operational flood forecasting system. The system is run with the two deterministic COSMO-2 and COSMO-7 weather forecasts and with the probabilistic COSMO-LEPS weather forecast. The evaluation with several skill scores suggests that decisions that need to be taken with a lead time of 1 day or more should be based on the ensemble forecast. Ensemble forecasts can be difficult to interpret; Section 4 provides a helpful tool for the estimation of flood peak timing and magnitude based on probabilistic forecasts.

K. Liechti (*) • M. Zappa
Swiss Federal Research Institute WSL, Birmensdorf, Switzerland
e-mail: [email protected]; [email protected]

© Springer-Verlag GmbH Germany 2016
Q. Duan et al. (eds.), Handbook of Hydrometeorological Ensemble Forecasting,
DOI 10.1007/978-3-642-40457-3_5-1

Keywords

Real-time experiment • Hydrological forecast • Short range • Skill score • Forecast verification

1 Introduction

In contrast to deterministic predictions, probabilistic ensemble predictions contain information on the uncertainty of the prediction. The uncertainties that are represented by the ensemble can stem from the model parameterization, the model structure, and/or the input data, depending on the setup of the ensemble prediction system. To assess whether this information on uncertainty has added value for forecasters and users, the ensemble predictions have to be compared to the deterministic predictions. As a measure for direct comparison of deterministic and ensemble predictions, the Brier skill score (BSS) is used here. The BSS normalizes the mean squared error of probability forecasts of a predefined threshold exceedance (the Brier score) by that of a reference forecast such as climatology (Wilks 2006). Other metrics like the correlation coefficient, the frequency bias, and the value score are deterministic measures, which require the ensemble to first be reduced to, for example, its median or mean.

For a more detailed evaluation of probabilistic predictions, three additional metrics are used here. The rank histogram shows whether the ensemble has an appropriate degree of dispersion. The reliability diagram (conditional bias) is a powerful visualization that sets the observed relative frequency of an event in relation to the forecast probability of that event; it thus shows whether conditional biases exist in the prediction. The value score gives information on the economic value of a prediction system for a user with a specific cost/loss ratio. Detailed descriptions of the performance measures used in this chapter can be found in Wilks (2006).

For all examples in this chapter, the hydrological model PREVAH was used. PREVAH is semi-distributed, working with hydrological response units. For the presented examples, the model is run at hourly time steps and on a 500 m grid.
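In symbols, the Brier score and the Brier skill score used throughout this chapter follow the standard definitions (consistent with Wilks 2006):

```latex
\[
\mathrm{BS} = \frac{1}{N}\sum_{i=1}^{N}\left(p_i - o_i\right)^2,
\qquad
\mathrm{BSS} = 1 - \frac{\mathrm{BS}}{\mathrm{BS}_{\mathrm{ref}}},
\]
```

where \(p_i\) is the forecast probability that the threshold is exceeded in case \(i\), \(o_i \in \{0,1\}\) the corresponding binary observation, and \(\mathrm{BS}_{\mathrm{ref}}\) the Brier score of the reference forecast (e.g., climatology). A BSS of 1 indicates a perfect forecast, 0 no improvement over the reference, and negative values a forecast worse than the reference.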
The meteorological variables needed to drive the model are air temperature, wind speed, water vapor pressure, global radiation, sunshine duration, and precipitation. Detailed descriptions of the model can be found in Viviroli et al. (2009a, b, and c). This chapter is divided into three main parts. Sections 2 and 3 will assess the added value of the ensemble approach versus the deterministic approach. The focus of Sect. 2 is on nowcasts. The objective here is to assess the added value for peak flow predictions when including the uncertainties stemming from the model


parameterization and from the meteorological input. Thus, two different kinds of ensemble nowcasts are compared to deterministic nowcasts. Section 3 presents an operational hydrological forecast system with a lead time of up to 5 days. The system includes two deterministic model chains as well as a probabilistic model chain, driven by the COSMO-LEPS ensemble weather forecast, which consists of 16 members (Marsigli et al. 2005). The performance of the three forecast chains is compared with different skill scores. Section 4 presents the Peak-Box, a tool that helps to interpret ensemble forecasts (spaghetti plots) for predicted flood situations, in terms of both flood peak magnitude and timing. In the presented examples, the performance of each forecast product is computed by comparing it to the discharge observation. However, if no discharge observation is available, the simulated flow, driven by precipitation measurements or estimates, can be used as a reference instead of the observation. If effects from systematic bias of the hydrological model are to be excluded from the analysis, it is recommended to use the simulated flow as reference (Jaun et al. 2008).

2 Verification of Ensemble Streamflow Nowcasts

The objective of this section is to assess the added value for peak flow predictions when including the uncertainties stemming from the meteorological input and from the model parameterization. These uncertainties are included using ensembles. When working with one specific hydrological model, as in the presented examples (Sects. 2.1 and 2.2), there are basically two approaches to produce ensembles: the meteorological input to the model and the model parameterization can be varied (Liechti et al. 2013b). The focus here lies on very short-term forecasts, which are generally called nowcasts (Mandapaka et al. 2012). In this chapter, streamflow nowcasts are forced by operationally available precipitation measurements or estimates; their maximum lead time therefore equals the response time of the considered catchment.

It is natural to question the purpose of ensemble streamflow nowcasts with a lead time that does not exceed the response time of the catchment. At first glance, running an ensemble streamflow nowcast may seem excessive. For smaller catchments with short response times, streamflow nowcasts do not help much for flood preparedness. However, continuous ensemble streamflow nowcasts can be a valuable source of ensembles of initial conditions of the hydrological model for the initialization of NWP-driven forecasts (Liechti et al. 2013a; Zappa et al. 2011). And for catchments with a response time of a few hours or more, even streamflow nowcasts can provide information that may help to take decisions on mitigation measures. Streamflow nowcasts are thus particularly interesting in flood situations.

The following experiment is set up for the Verzasca catchment in southern Switzerland. The Verzasca is a mountainous catchment with steep slopes and shallow soils. The valley is only sparsely populated and affected by human impact;


thus, damage potential is low. Frequent convective and orographic precipitation events often lead to flash floods (Panziera and Germann 2010). In the case of very local thunderstorms, the response time of the basin in terms of peak discharge is between 2 and 3 h. The catchment area at the gauging station is 186 km2. Shortly downstream of the gauge, the river flows into a retention lake which is used for hydropower production. The available meteorological information from ground-based stations operated by MeteoSwiss and other providers is rather dense, with an estimated density of one rain gauge per 100 km2 in southern Switzerland (e.g., Andres et al. 2016). The data used for this analysis is taken from continuous hourly streamflow nowcasts generated between May 2007 and November 2010. To avoid the snow and snowmelt season, only the months May to November were considered; this is also the season in which the relevant thunderstorms occur. All streamflow nowcasts are compared to the observed discharge, and their performance is compared to each other. The focus of this analysis is on peak discharge; thus, the actual data used for the performance analysis are the daily maxima of the observation, of each ensemble member, and of the deterministic streamflow nowcasts. This means that the analysis includes 856 days.
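The daily-maxima preparation step described above can be sketched as follows; the synthetic gamma-distributed discharge, the two-day window, and the 20 m3/s threshold are illustrative assumptions, not the Verzasca data:

```python
import numpy as np

def daily_maxima(hourly, hours_per_day=24):
    """Reshape an hourly series into full days and take each day's maximum."""
    n_days = hourly.shape[-1] // hours_per_day
    trimmed = hourly[..., :n_days * hours_per_day]
    return trimmed.reshape(*trimmed.shape[:-1], n_days, hours_per_day).max(axis=-1)

# Hypothetical data: 25 ensemble members and one observation over 2 days (48 h)
rng = np.random.default_rng(0)
members = rng.gamma(2.0, 5.0, size=(25, 48))  # synthetic discharge in m3/s
obs = rng.gamma(2.0, 5.0, size=48)

member_max = daily_maxima(members)  # shape (25, 2): one maximum per member and day
obs_max = daily_maxima(obs)         # shape (2,)

threshold = 20.0  # illustrative; the study uses climatological quantiles
# Nowcast exceedance probability: fraction of members above the threshold
p_exceed = (member_max > threshold).mean(axis=0)
o_exceed = (obs_max > threshold).astype(float)  # binary observed outcome
```

These daily exceedance probabilities and binary observations are exactly the pairs that enter the Brier skill score analysis of this section.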

2.1 Varying Meteorological Input: Taking into Account the Input Uncertainty

In this first part of the experiment, a deterministic streamflow nowcast driven by a radar Quantitative Precipitation Estimate (QPE) is compared to a probabilistic streamflow nowcast driven by a radar ensemble. Both products are used at a temporal resolution of 1 h. The deterministic radar QPE (referred to as RAD) has a spatial resolution of 1 km. The radar ensemble introduced by Germann et al. (2009) consists of 25 members and has a spatial resolution of 2 km; in the following, it is referred to as REAL (radar ensemble generator designed for usage in the Alps using LU decomposition). The members of the REAL ensemble result from the sum of the current weather radar image and a stochastic perturbation field. This methodology makes it possible to account for the residual space-time uncertainties of the atmosphere (Germann et al. 2009). The performance of the streamflow nowcasts in predicting the exceedance of a predefined threshold can be calculated using various skill scores. Here, the Brier skill score (BSS) is used (Wilks 2006). The BSS is a widely used summary skill score that allows direct comparison of deterministic and ensemble predictions. For every day, the maximum discharge value of the observation, of each ensemble member, and of the deterministic nowcast is taken. Then it is checked for each of these 27 values whether it exceeds a predefined threshold value or not. For the observation and the deterministic forecast, the answer will be yes or no. For the radar ensemble nowcast, the answer will be the percentage of the ensemble members exceeding the threshold, i.e., the nowcast probability to exceed the threshold. The thresholds tested


Fig. 1 Brier skill score for the deterministic streamflow nowcast driven by the radar QPE (RAD) and for the ensemble streamflow nowcast driven by the radar ensemble (REAL). The uncertainty estimation of the BSS values results from a bootstrap with 500 random samples. Horizontal numbers below the graph show the number of days in which an exceedance of the threshold quantile was observed. Vertical numbers show for each nowcast type the number of days that was simulated to be above the threshold quantile (for the ensemble nowcasts, the number of ensemble members exceeding the threshold was divided by the ensemble size)

here correspond to the 60 %, 70 %, 80 %, 90 %, and 95 % quantiles of the observed daily maximum discharge during the months May to November from the years 1989 to 2010. The performance analysis with the Brier skill score shows that the peak discharge nowcasts benefit from using the radar ensemble REAL (Fig. 1): the BSS values for the ensemble are higher than those for the deterministic streamflow nowcasts using the normal radar QPE as input. This superiority gets more pronounced for higher threshold quantiles. Values exceeding high quantiles are by definition scarce; it is therefore recommended to apply a bootstrap method to obtain confidence bounds and thus a sense of the accuracy of the estimated skill score (Efron 1979; Wilks 2006). In the presented examples, 500 random samples of nowcast-observation pairs of daily maxima were drawn with replacement from the 856 days included in the study period. For each of these 500 samples, the BSS was calculated, and from the resulting 500 BSS values, the confidence bounds were derived. Here, the number of bootstrap samples is set to 500 because more samples did not result in a significant difference in the score values. However, the number of bootstrap samples should be chosen for each study separately; generally, the more the better. In Figs. 1 and 2, it can be seen that the confidence intervals for the BSS values are larger for the highest quantile, indicating that BSS values estimated from fewer data points are more uncertain.
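A minimal sketch of this BSS-plus-bootstrap procedure; the synthetic probabilities and the roughly 20 % event frequency are assumptions for illustration, not the Verzasca results:

```python
import numpy as np

def brier_score(p, o):
    """Mean squared error of probability forecasts p against binary outcomes o."""
    return np.mean((p - o) ** 2)

def brier_skill_score(p, o, p_ref):
    """BSS = 1 - BS / BS_ref; 1 is perfect, 0 matches the reference."""
    return 1.0 - brier_score(p, o) / brier_score(p_ref, o)

def bootstrap_bss(p, o, p_ref, n_samples=500, seed=42):
    """Resample forecast-observation pairs with replacement (as in the text)
    and return the resulting distribution of BSS values."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(o), size=(n_samples, len(o)))
    return np.array([brier_skill_score(p[i], o[i], p_ref[i]) for i in idx])

# Synthetic example standing in for the 856 daily nowcast-observation pairs
rng = np.random.default_rng(1)
o = (rng.random(856) < 0.2).astype(float)           # ~20 % observed exceedances
p = np.clip(0.7 * o + 0.3 * rng.random(856), 0, 1)  # a skillful probability forecast
p_ref = np.full(856, o.mean())                      # climatological reference

bss = brier_skill_score(p, o, p_ref)
ci = np.percentile(bootstrap_bss(p, o, p_ref), [5, 95])  # 90 % confidence bounds
```

The bootstrap here resamples whole days, so it preserves the pairing of forecast and observation, mirroring the procedure described in the text.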


Fig. 2 Brier skill score for the deterministic nowcast driven by the interpolated rain gauge data (PLU) and driven by radar QPE (RAD) and for the ensemble nowcasts resulting from using 26 different parameter sets in combination with the interpolated rain gauge data (PPE) and in combination with the radar QPE data (RPE). The uncertainty estimation of the BSS values results from a bootstrap with 500 random samples. Horizontal numbers below the graph show the number of days in which an exceedance of the threshold quantile was observed. Vertical numbers show for each nowcast type the number of days that was simulated to be above the threshold quantile (for the ensemble nowcasts, the number of ensemble members exceeding the threshold was divided by the ensemble size)

2.2 Varying Parameterization: Taking into Account the Parameter Uncertainty

Usually, hydrological models are calibrated and then used with the parameter set that performed best during the verification period. However, as no model structure perfectly represents the processes to be modeled, it can be argued that no such best parameter set exists. This is captured by the concept of equifinality (Beven and Freer 2001): several equally likely parameter sets can be identified with a Monte Carlo (MC) experiment (Kuczera and Parent 1998; Vrugt et al. 2008). This approach was followed in this second part of the example. The seven most important parameters for conditioning the precipitation input and the surface runoff generation were allowed to vary randomly during an MC experiment. All MC runs were then ranked according to an objective function consisting of the Nash-Sutcliffe efficiency (Nash and Sutcliffe 1970) and the sum of weighted absolute errors (Lamb 1999; Viviroli et al. 2009a). Finally, 26 parameter sets were chosen randomly around the 95 % rank. A more detailed description of this approach can be found in Zappa et al. (2011). These 26 parameter sets were then used to run the hydrological model with deterministic precipitation data from radar QPE and from interpolated rain gauge data, respectively. This resulted in two ensemble streamflow nowcasts that both account for the parameterization uncertainty of the hydrological model. The ensemble driven by radar QPE data is labeled RPE, and the ensemble driven by


interpolated rain gauge data is labeled PPE hereafter. The performance of these parameter ensemble nowcasts is then compared to the performance of their deterministic counterparts. These deterministic streamflow nowcasts are the result of running the hydrological model with the default parameterization usually used for the Verzasca catchment, in combination with the radar QPE (RAD) and the interpolated rain gauge data (PLU). The performance analysis based on the Brier skill score shows here, too, that ensemble nowcasts generally outperform the deterministic nowcasts (Fig. 2). PPE outperforms PLU, and interestingly, the nowcasts driven by interpolated rain gauge data outperform the nowcasts driven by radar data. The two main reasons for this are that the hydrological model was calibrated using interpolated rain gauge data and that there is a rain gauge within the catchment. If the hydrological model had been calibrated with radar data, or if there were no rain gauges within the catchment, this result might have been different (Liechti et al. 2013b). RPE is generally better than RAD but does not reach the performance of REAL (Fig. 1). In mountainous regions, radar QPEs involve many uncertainties, and it is thus not surprising that REAL, which attempts to account for these uncertainties, performs better than RAD and RPE. If the rank histograms of the two radar-based ensemble streamflow nowcasts REAL and RPE are compared (Fig. 3), it can be seen that both are underdispersed.

Fig. 3 Rank histograms for nowcasts driven by REAL and by the radar QPE in combination with the 26 different parameter sets (RPE). The number of days the ensemble and the observation exceeded the threshold quantile (0.8 and 0.9) are given in the subtitles


However, this is less pronounced for REAL, especially for the high quantile. Underdispersed forecasts are typical in such basins with a quick response to precipitation; postprocessing would need to be applied to overcome this (e.g., Bogner et al. 2016; Hemri et al. 2013). Radar ensembles like REAL are not widely available yet, and thus it is still difficult to account for the impact of the radar QPE uncertainty on the hydrological prediction. Producing a hydrological ensemble using several parameter sets in combination with the deterministic radar QPE therefore seems to be a reasonable alternative to account for at least some of the hydrologic uncertainty sources, namely, the hydrological parameter uncertainty.
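Rank histograms like those in Fig. 3 take only a few lines to compute; the synthetic normal ensemble below is an illustrative assumption, chosen so that its spread is too small and the underdispersed (U-shaped) case is reproduced:

```python
import numpy as np

def rank_histogram(members, obs):
    """For each forecast case, rank the observation within the pooled set of
    m members plus observation (ranks 1..m+1); ties are ignored for brevity."""
    m = members.shape[0]
    ranks = 1 + (members < obs[None, :]).sum(axis=0)
    return np.bincount(ranks, minlength=m + 2)[1:]  # counts for ranks 1..m+1

# Synthetic underdispersed ensemble: member spread smaller than the truth's
rng = np.random.default_rng(2)
members = rng.normal(0.0, 0.5, size=(25, 1000))
obs = rng.normal(0.0, 1.0, size=1000)

hist = rank_histogram(members, obs)
# The observation falls outside the ensemble too often, so the outermost
# ranks are overpopulated: the U shape typical of underdispersion.
```

A flat histogram would indicate appropriate dispersion; the overpopulated outer bins here mimic what Fig. 3 shows for REAL and RPE.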

3 Operational Hydrological Ensemble Prediction System (HEPS)

The verification of operational HEPS is shown here using the hydrological forecasting system for the Sihl river in Zurich, Switzerland (Addor et al. 2011; Badoux et al. 2010; Zappa et al. 2010). This system was set up to help protect the city of Zurich from flood damage caused by the Sihl river. The flood damage potential of the city is high, as a lot of infrastructure was built on the alluvial fan of the Sihl river during the last century; it is estimated at about five billion euros. This information may seem irrelevant at first, but it becomes relevant when the economic value of the system is assessed. The pre-alpine Sihl catchment has an area of 336 km2, of which 46 % first drains into a reservoir lake that is used for hydropower production but can also act as a retention basin. It is possible to draw down the reservoir lake as a preventive measure if a major event is expected (Zappa et al. 2015). However, this action needs a lead time of 1 to 3 days, and if the predicted event does not occur, the authorities have to pay for the spilled water to the operator of the hydropower plant. For this operational forecast system, the hydrological model PREVAH is driven by three different numerical weather prediction (NWP) models: the two deterministic models COSMO-2 and COSMO-7 and the probabilistic model COSMO-LEPS, which has 16 members (Fundel et al. 2010; Marsigli et al. 2005). The models differ in their spatial resolution, lead time, and update cycle (Table 1).

Table 1 Properties of numerical weather prediction (NWP) systems used to drive the PREVAH model. Initialization times refer to the runs of the NWP used in the hydrological forecast chain

Model        Spatial resolution   Initialization          Lead time   Members
COSMO-2      2.2 km               00, 03, 06, . . . UTC   24 h        1
COSMO-7      6.6 km               00, 06, 12 UTC          72 h        1
COSMO-LEPS   7 km                 12 UTC                  132 h       16

The system has been operational since 2007. A major change in the system in late 2009 was the increase in spatial resolution of the meteorological models. The critical period for the Sihl river lasts from March to October, including the snowmelt season,

the thunderstorms in summer, as well as long-lasting precipitation events in summer and autumn. During these months, the emergency response team of the authorities is on duty. So, even though the forecast system runs continuously over the whole year, only the relevant months (March to October) from each year (2010 to 2014) are considered for the analysis presented here. A period of 2 weeks from summer 2014 that includes an event resulting from an artificial drawdown of the reservoir lake was excluded from the sample; this is justified, as the forecasts would otherwise be wrongly penalized. The observation data is available in hourly time steps for the whole period. The system is evaluated here with several scores, from simple to more complex. This is one possible set of scores and can be extended or changed if desired (Brown et al. 2010). The correlation and the frequency bias give a broad first impression of the behavior of the system in a deterministic way. Such an approach has been particularly useful for slowly introducing end users to more complex skill scores. The Brier skill score is used to directly compare the performance of deterministic and ensemble forecasts in predicting the exceedance of a predefined threshold. The value score gives information about the economic usefulness of the forecast system for users with a specific cost/loss ratio. Finally, the rank histogram and the conditional bias (reliability plot) are used to get more insight into the characteristics and performance of the ensemble forecast.
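As background for the value score mentioned above, a minimal cost-loss sketch in the spirit of the standard relative-value formulation; the contingency-table counts and the cost/loss ratio of 0.1 are invented for illustration and are not the Sihl verification results:

```python
def relative_value(hits, misses, false_alarms, correct_neg, cost_loss):
    """Relative economic value of binary warnings for a user who pays cost C
    to protect and suffers loss L if an unprotected event occurs (alpha = C/L).
    0 means no better than climatology, 1 as good as a perfect forecast."""
    n = hits + misses + false_alarms + correct_neg
    s = (hits + misses) / n                          # base rate of the event
    hit_rate = hits / (hits + misses)
    false_alarm_rate = false_alarms / (false_alarms + correct_neg)
    a = cost_loss
    e_clim = min(a, s)                # cheaper of always / never protecting
    e_perfect = s * a                 # protect exactly when the event occurs
    e_forecast = (false_alarm_rate * (1 - s) * a    # needless protection
                  + hit_rate * s * a                # useful protection
                  + (1 - hit_rate) * s)             # unprotected losses
    return (e_clim - e_forecast) / (e_clim - e_perfect)

# Hypothetical verification sample of 856 days, cost/loss ratio of 0.1
v = relative_value(hits=30, misses=10, false_alarms=20, correct_neg=796,
                   cost_loss=0.1)
```

Sweeping `cost_loss` over (0, 1) yields a value curve per user type, which is how such scores are typically presented.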

3.1 Correlation

For a first overview of the data, one can look at the correlation of the daily mean values from the observed and forecast time series. For COSMO-LEPS, only the ensemble median is considered here. From Fig. 4 (middle and right panels), it can be seen that the data scatter more with increasing lead time. This is to be expected, as the forecasts get more uncertain with lead time. With an R2 of 0.86, COSMO-2 and COSMO-7 outperform the COSMO-LEPS ensemble median on the first forecast day. Thereafter, the COSMO-LEPS ensemble median outperforms COSMO-7 on forecast days two and three.
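This comparison reduces to a correlation of daily means; the gamma-distributed discharge and the synthetic 16-member ensemble below are assumptions for illustration, not the Sihl data:

```python
import numpy as np

# Hypothetical daily mean discharge: observations and a 16-member ensemble
rng = np.random.default_rng(3)
obs = rng.gamma(3.0, 4.0, size=200)                        # observed daily means
ens = obs[None, :] + rng.normal(0.0, 3.0, size=(16, 200))  # noisy forecast members

median = np.median(ens, axis=0)   # reduce the ensemble to a single-valued forecast
r = np.corrcoef(median, obs)[0, 1]
r2 = r ** 2                       # coefficient of determination, as quoted in the text
```

In an operational evaluation, this would be repeated per lead day, producing the lead-time-dependent scatter seen in Fig. 4.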

3.2 Frequency Bias

The frequency bias sets the predicted frequency of a binary event (here the exceedance of a predefined discharge threshold) in relation to the observed frequency of an event of the same kind (number of predicted threshold exceedances divided by number of observed threshold exceedances). A bias of one means that the event was predicted as often as it was observed. A bias higher or lower than one indicates relative over- or underforecasting, respectively. The frequency bias, however, does not say anything about the temporal coincidence of observed and predicted events.
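The definition above takes one line to compute; the synthetic series, the 0.8 scaling, and the 90 % threshold quantile are illustrative assumptions:

```python
import numpy as np

def frequency_bias(forecast, observed, threshold):
    """Number of forecast exceedances divided by number of observed exceedances."""
    return np.sum(forecast > threshold) / np.sum(observed > threshold)

# Synthetic hourly discharge; scaling by 0.8 makes the forecast underpredict peaks
rng = np.random.default_rng(4)
observed = rng.gamma(2.0, 10.0, size=5000)
forecast = 0.8 * observed + rng.normal(0.0, 2.0, size=5000)

fb = frequency_bias(forecast, observed, threshold=np.quantile(observed, 0.9))
# fb < 1 here: the forecast exceeds the threshold less often than observed
```

Evaluating this per lead time and per threshold quantile yields a matrix of bias values, which is how Fig. 5 is constructed.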


K. Liechti and M. Zappa

Fig. 4 Correlation between forecast and observed mean daily discharge. Dots are colored by lead time. For the COSMO-LEPS model chain, the forecast mean daily discharge was calculated on the basis of the ensemble mean

The frequency bias for the deterministic COSMO-2- and COSMO-7-based forecasts and for the median of the COSMO-LEPS-based ensemble forecast is presented here as a function of lead time and for different flood quantiles (50 %–98 %). For the Sihl forecasting system, the peak flows are the most relevant. Therefore, only thresholds higher than the observed median hourly discharge of the forecast period were considered for the analysis. However, these thresholds are still very low compared to the discharge values that are of interest for the purpose of this flood forecasting system. The 98 % quantile of the observed hourly discharge is around 40 m3/s (the maximum is around 190 m3/s), whereas the first warning level defined by the stakeholders is at 100 m3/s. The results for the frequency bias presented here are therefore useful information about the general tendency of the system, not about its behavior during flood conditions. Also, the daily cycle that is visible in the bias over the 132 h of lead time (Fig. 5) results from small-scale daily fluctuations in the discharge that are superposed in periods with high discharge; these fluctuations are irrelevant with respect to flood forecasting. So the information about the frequency bias presented here, as a function of lead time and for different flood quantiles, is a general one. Figure 5 is rather blue, corresponding to a frequency bias below 1, which indicates that the forecasts more often underpredict the discharge at the tested thresholds.

Verification of Short-Range Hydrological Forecasts

Fig. 5 Frequency bias for the COSMO-LEPS median (top), COSMO-7 (center), and COSMO-2 (bottom). Values below 1 indicate underforecasting; 1 means no bias. The frequency bias for the COSMO-LEPS median at high lead times and high quantiles is below 0.3 and not colored

3.3

Value Score

The value score is a measure of the economic value of a forecast system. It depends on the cost/loss ratio of the specific user. The cost is the amount of money the user has to pay for the prevention of damages from an event; here these are the costs for the hydrological forecast system. The loss is the amount of money the end user loses if an event takes place without any preventive measures having been taken. Meaningful cost/loss ratios vary between zero and one; for cost/loss ratios higher than one, a preventive measure does not pay off. End users with a cost/loss ratio close to zero will suffer big damages in case of an event. An end user should take preventive measures as soon as the forecast probability of an event is higher than the end user's cost/loss ratio. If the forecast system reaches a value score higher than zero, the end user can profit from the system. A value score of one means that the events were perfectly forecast and the end user only has to pay the cost for the preventive measures. The Sihl forecast system was analyzed for the 80 %, 90 %, and 95 % discharge quantiles for different lead times and cost/loss ratios ranging from 0 to 1 (Fig. 6). The value score for the Sihl forecast system is particularly good for users with a low cost/loss ratio, and also for long lead times. As mentioned in the introduction of Sect. 3, this is the case for the authorities of the city of Zurich, and thus they can profit from the HEPS.
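One common formalization of this score is the relative economic value in the standard cost-loss model (see, e.g., Wilks 2006): the user acts whenever the forecast probability reaches the cost/loss ratio, and the resulting expense is compared with the best fixed climatological strategy and with a perfect forecast. The sketch below follows that standard model with invented events and probabilities, not the Sihl data.

```python
# Sketch of the value score as relative economic value in the standard
# cost-loss model (illustrative data): 1 = perfect forecast, 0 = no better
# than climatology, <= 0 = no benefit for this user.

def value_score(event, prob, cost, loss):
    """Relative economic value for a user with ratio cost/loss."""
    n = len(event)
    # Act (pay `cost`) whenever the forecast probability reaches cost/loss;
    # otherwise pay `loss` if the event occurs.
    expense = sum(cost if p >= cost / loss else (loss if e else 0.0)
                  for p, e in zip(prob, event))
    n_events = sum(event)
    e_climate = min(n * cost, n_events * loss)  # best fixed strategy
    e_perfect = n_events * cost                 # act only on actual events
    return (e_climate - expense) / (e_climate - e_perfect)

event = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]  # observed threshold exceedances
prob = [0.9, 0.2, 0.1, 0.6, 0.3, 0.0, 0.1, 0.8, 0.2, 0.1]
print(round(value_score(event, prob, cost=1.0, loss=10.0), 3))
```

A user with a low cost/loss ratio acts often and rarely misses an event, which is why such users profit most, as seen in Fig. 6.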

3.4

Brier Skill Score

The Brier skill score has proven to be a suitable measure for comparing deterministic and ensemble forecasts. Unlike with deterministic measures, it is not necessary to reduce the ensemble to its median. However, it is recommended to use the ensemble median as an additional deterministic forecast in order to assess the amount of additional skill that results from the ensemble spread alone. The comparison of the score for the ensemble median with the scores for other deterministic forecasts, here COSMO-2 and COSMO-7, is straightforward and shows the minimal gain that can be achieved by the ensemble forecast. This may also be used to communicate with end users who are still skeptical of ensemble forecasts.

For flood forecasts, the magnitude and timing of peak flows are most important. Depending on the available data, one can work with the daily maxima of forecasts and observations (or a reference simulation), or, if data are available at, for example, hourly time steps, it is possible to choose a moving window for which the maximum values are extracted. In this way, an appropriate tolerance in timing for the peak flow forecasts can be chosen. However, it has to be considered that in most cases new model runs are not available at every time step, which means that forecasts from different lead times are mixed with the moving-window method. The advantage of the moving-window method over the daily-maxima method is, however, that it avoids classifying an event as missed in cases where the forecast and the observed peak are close together but on different calendar days. For the analysis of the Sihl forecasting system, a moving window of 13 h was chosen. The tested thresholds are equal to the 80 %, 90 %, 95 %, and 99 % quantiles of the climatology, where the climatology here corresponds to the observed 13 h maxima from the months March to October of the years 2007–2014. To estimate the uncertainty of the BSS values, a bootstrap was applied, as described in Sect. 2.1.

Fig. 6 Value score of forecasts based on COSMO-LEPS, COSMO-7, and COSMO-2 for different lead times (y-axes) and cost/loss ratios (x-axes). Displayed are the results for events with a 20 % (left), 10 % (center), and 5 % (right) probability of occurrence. The panel titles give the verification periods: COSMO-2, 2010-03-01 03:00 to 2014-11-01 21:00; COSMO-7, 2010-03-01 00:00 to 2014-11-03 12:00; COSMO-LEPS, 2010-03-01 13:00 to 2014-11-06 00:00


Because of the 13 h moving window, there are no data points for the first 6 h of the forecast. For a better comparison of the BSS of the three forecast chains (COSMO-2, COSMO-7, and COSMO-LEPS), the first BSS value of each chain corresponds to the BSS value for the seventh hour of available hydrological forecast (Fig. 7). Due to the short computation time, for COSMO-2 and COSMO-7 this is exactly the seventh forecast hour. For COSMO-LEPS, however, it is the 17th forecast hour, because the first 10 h of the forecast (12–21 UTC) have already passed by the time the hydrological forecast is completed. So the underlying forecast of the COSMO-LEPS forecast chain is considerably older once the hydrological forecast is completed. Nonetheless, the COSMO-LEPS forecast chain performs remarkably well compared to the deterministic forecasts (Fig. 7). The BSS of all forecast chains decays with increasing lead time. However, the decay is slower for the ensemble forecast chain. The BSS of the ensemble forecast is positive over the whole forecast period of 5 days, even for very high thresholds (99 % quantile). The performance of the ensemble median is mostly as good as or better than that of the COSMO-7 forecast chain and additionally provides longer lead times. Also, the BSS values for the COSMO-2 forecast chain are as good as or better than those of the COSMO-7 chain. This means that for the forecast system presented here, the COSMO-7 chain is redundant, and the forecasters and users should base their decisions on COSMO-2 and COSMO-LEPS. The 95 % confidence bounds resulting from the bootstrap show that the uncertainty in the estimated BSS values increases with increasing threshold (Fig. 8). However, for the ensemble forecast, the confidence bounds do not cross the zero BSS line for tested thresholds up to the 95 % quantile. This shows that, in contrast to the deterministic forecasts, the ensemble forecasts have skill (in terms of the Brier score) over the whole forecast horizon.
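The moving-window preprocessing and the Brier skill score can be sketched as follows. This is not the operational evaluation code (the chapter used a bootstrap with 500 iterations for the uncertainty estimates); the 13 h window follows the text, while the hourly discharge series and the exceedance probabilities are invented.

```python
# Minimal sketch (invented data) of the 13 h moving-window maximum and the
# Brier (skill) score used to verify threshold exceedances.

def window_max(series, w=13):
    """Maximum over a centered moving window of width w (w odd)."""
    h = w // 2
    return [max(series[max(0, i - h):i + h + 1]) for i in range(len(series))]

def brier_score(event, prob):
    """Mean squared difference between forecast probability and outcome."""
    return sum((p - e) ** 2 for p, e in zip(prob, event)) / len(event)

def brier_skill_score(event, prob, clim):
    """BSS relative to a constant climatological probability `clim`."""
    return 1.0 - brier_score(event, prob) / brier_score(event, [clim] * len(event))

# Invented hourly discharge (m3/s) with one flood peak at hour 10.
obs = [5.0] * 10 + [30.0] + [5.0] * 19
event = [int(q > 20.0) for q in window_max(obs)]  # 13 h max exceeds 20 m3/s?
# Invented exceedance probabilities, e.g., the fraction of the 16 members
# whose 13 h maximum exceeds the threshold.
prob = [0.8 if e else 0.1 for e in event]
bss = brier_skill_score(event, prob, clim=sum(event) / len(event))
print(round(bss, 3))
```

The window turns a single peak hour into a 13 h block of exceedance events, which is exactly the timing tolerance described in the text.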

3.5

Rank Histograms

The COSMO-LEPS ensemble consists of 16 members, so ideally each rank of the rank histogram would be populated with a relative frequency of 1/17. This is not the case for the ensemble forecasts of the Sihl forecast system: the first and last ranks are populated much more often, indicating underdispersion of the ensemble (Fig. 9). Also, the histogram is skewed toward the highest rank, indicating an underforecasting bias. This is an overall analysis; thus, the results are dominated by low-flow periods. In periods with little or no precipitation, the ensemble members rapidly converge, and thus often all ensemble members are above or, as seen in Fig. 9, even more often below the observation.
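The construction of a rank histogram can be sketched with a synthetic toy example. The Gaussian ensembles below are invented and deliberately underdispersive (the "truth" is drawn from a wider distribution than the members), so the outer ranks fill up in the same way as in Fig. 9.

```python
# Synthetic sketch of a rank histogram: for each forecast case, record the
# rank of the observation among the 16 ensemble members. A flat histogram at
# a relative frequency of 1/17 would be ideal.
import random

random.seed(1)
N_MEMBERS = 16

def obs_rank(members, obs):
    """Rank of the observation among the members (1 .. N_MEMBERS + 1)."""
    return 1 + sum(m < obs for m in members)

counts = [0] * (N_MEMBERS + 1)
for _ in range(1000):
    members = [random.gauss(10.0, 2.0) for _ in range(N_MEMBERS)]
    observation = random.gauss(10.0, 4.0)  # truth more variable than ensemble
    counts[obs_rank(members, observation) - 1] += 1

# Underdispersion shows up as heavily populated first and last ranks,
# well above the ideal count of 1000/17 (about 59).
print(counts[0], counts[-1])
```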

3.6

Conditional Bias: Reliability Plots

Reliability plots, also called attribute diagrams, are used to analyze ensemble forecasts. Compared to scalar summary scores like the BSS, reliability plots are much more comprehensive, as they show the full joint distribution of forecasts and observations for events of a specific magnitude. They are created by binning all forecast-observation pairs according to the forecast probability (Wilks 2006). The basic features of the reliability plot are the diagonal 1:1 line, the no-resolution line, the no-skill line, and of course the reliability curve.

Fig. 7 Brier skill score for the four tested thresholds (80 %: 11.16 m3/s; 90 %: 19.56 m3/s; 95 %: 34.75 m3/s; 99 %: 76.89 m3/s) and the four forecasts (c-leps, c-leps median, c-7, c-2). Values were estimated by applying a bootstrap with 500 iterations. Points displayed refer to the median of the bootstrap. Thresholds relate to 13 h max values from the months March to October of the period 2007 to 2014. Labels on x-axes refer to the center of the 13 h running window

Fig. 8 Same as Fig. 7, additionally showing the 95 % confidence intervals of the estimated BSS, based on a bootstrap with 500 iterations

Fig. 9 Rank histograms for forecast days 1–5 (corresponding to lead times 36, 60, 84, 108, and 132 h)

The diagonal 1:1 line is the line on which the forecast probability and the observed frequency are equal. The no-resolution line marks the climatological frequency (the observed mean frequency) of the event in question. The reliability curve, the red line in Fig. 10, consists of the points resulting from plotting the forecast probabilities of the bins against the corresponding observed frequencies. From the position of these points, information about forecast reliability and resolution can be gained. The reliability is the mean squared vertical distance of the points on the reliability curve to the 1:1 line. The resolution is the mean squared vertical distance of the points on the reliability curve to the no-resolution line. As long as the resolution term is larger than the reliability term, the forecast contributes positive skill. The no-skill line marks the line on which reliability and resolution are equal; all points that fall into the shaded area contribute positively to the forecast skill. The position of the reliability curve also conveys information about conditional and unconditional biases of the forecast. If the overall slope of the reliability curve differs from the slope of the 1:1 line, a conditional bias exists. If the reliability curve is parallel to but below or above the 1:1 line, an unconditional (systematic) bias exists, and the forecast is said to be overforecasting or underforecasting, respectively.
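The two distances just described are the reliability and resolution terms of the familiar Brier score decomposition, REL = Σ n_k (y_k − ō_k)² / N and RES = Σ n_k (ō_k − ō)² / N. A minimal sketch with invented forecast-observation pairs and three probability bins:

```python
# Sketch (invented pairs) of the reliability and resolution terms of the
# Brier score decomposition:
#   REL = sum_k n_k * (y_k - obar_k)^2 / N   (distance to the 1:1 line)
#   RES = sum_k n_k * (obar_k - obar)^2 / N  (distance to the no-resolution line)

def reliability_resolution(pairs, bins):
    """pairs: (forecast probability, observed 0/1); bins: ascending bin edges."""
    n = len(pairs)
    obar = sum(o for _, o in pairs) / n  # climatological frequency
    rel = res = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        group = [(p, o) for p, o in pairs
                 if lo <= p < hi or (hi == bins[-1] and p == hi)]
        if not group:
            continue
        nk = len(group)
        yk = sum(p for p, _ in group) / nk  # mean forecast probability in bin
        ok = sum(o for _, o in group) / nk  # observed frequency in bin
        rel += nk * (yk - ok) ** 2 / n
        res += nk * (ok - obar) ** 2 / n
    return rel, res

pairs = ([(0.1, 0)] * 3 + [(0.1, 1)]) + ([(0.9, 1)] * 3 + [(0.9, 0)])
rel, res = reliability_resolution(pairs, bins=[0.0, 1 / 3, 2 / 3, 1.0])
print(round(rel, 4), round(res, 4))  # here resolution exceeds reliability
```

Because all forecasts within each bin are identical here, the decomposition is exact: the Brier score equals REL − RES + ō(1 − ō), so the forecast is skillful relative to climatology exactly when RES exceeds REL.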

Fig. 10 Forecast probability (x-axis) and observed frequency (y-axis) of the COSMO-LEPS-based forecasts for seven different probability classes. The numbers next to the estimated frequencies correspond to the relative frequency of events in the classes. The tiles are sorted by bins of lead times (columns: 13–36, 37–60, 61–84, 85–108, and 109–132 h) and discharge thresholds (rows). The thresholds correspond to the 80 % (ca. 8.7 m3/s), 90 % (ca. 14 m3/s), and 95 % (ca. 21.5 m3/s) quantiles


To complete the reliability plot, the relative frequencies of the forecasts in the forecast bins (the refinement distribution) are written beside the points on the reliability curve. Alternatively, this information can be plotted in a histogram. It gives information about the forecast's sharpness and about the confidence the forecaster can have in the estimated reliability curve, and it illustrates how well the forecast can distinguish situations that differ from the climatology. The reliability plots for the Sihl forecast system (Fig. 10) indicate that the higher the tested discharge threshold and the higher the forecast probability of exceeding this threshold, the more pronounced the overforecasting bias. However, this result has to be taken with caution, because the bins at high thresholds and high forecast probabilities are very sparsely populated. Nonetheless, the reliability plots show that the forecasting system has skill for the tested thresholds.

3.7

Performance Overview

When it comes to the interpretation of the case study presented in this chapter, the following main points can be listed:
• Correlation: On the first forecast day, the daily mean runoff forecast is better when derived from the COSMO-2 and COSMO-7 weather forecasts. COSMO-LEPS outperforms COSMO-7 on forecast days two and three.
• Frequency bias: For the tested discharge thresholds, the forecasts rather underpredict the discharge.
• Brier skill score: The BSS values achieved by the ensemble forecast are positive even for long lead times and high thresholds. The skill of the ensemble forecast in terms of the Brier score is generally better than that of the deterministic forecasts.
• Rank histogram: The rank histograms for the ensemble forecast of the Sihl forecast system indicate an underdispersion of the ensemble and an underforecasting bias.
• Value score: End users with a low cost/loss ratio would benefit from the information provided by the ensemble forecasts.
• Reliability plots: The higher the tested threshold and the higher the forecast probability of exceeding this threshold, the more an overforecasting bias can be seen in the reliability plot.

For the lead times relevant for the hydrological forecast system for the Sihl river in Zurich, the COSMO-LEPS-based ensemble forecasts are clearly preferred over the deterministic forecasts. Of course, there are other measures that can be applied to evaluate a HEPS (Brown et al. 2010; Wilks 2006), and these might be more appropriate for evaluating aspects other than the question addressed here on the usefulness of ensemble predictions. Prior to each verification effort, the forecaster needs to carefully select verification metrics that are meaningful to the current application and not redundant in describing a given aspect of forecast quality. However, if the main findings do not change, it is recommended to keep things as simple as possible. This can also ease the communication of the results to the end users.

4

Flood Peak and Flood Timing Verification

In Sects. 2 and 3, it was shown that it pays off to work with the ensemble approach. Ensemble forecasts provide valuable information about the uncertainty of the forecast. However, the interpretation of ensemble forecasts can be somewhat difficult, especially when the ensemble spread is large, as is often the case when forecasting flood events. Therefore, Zappa et al. (2013) developed the "Peak-Box," a simple but efficient tool to ease the interpretation of ensemble forecasts and to support decision-making when facing a forecast flood event. For decision makers, the crucial questions are how high the flood will be and when it will occur. The Peak-Box extracts summary information from the ensemble to best address these questions. It is defined as the best estimate of the timing and magnitude of a forecast flood event: it frames the discharge peaks of all members of the ensemble forecast and then takes the median in timing and magnitude of these peaks. The steps to construct the Peak-Box are the following (Fig. 11):
• Mark the highest peak of each member.
• Draw boxes limited by quantiles in timing and magnitude of the ensemble member peaks.
• "Best estimate": intersection of the medians in peak timing and peak magnitude (x).

Fig. 11 Illustration of how to draw the Peak-Box. Small circles indicate the peaks of each ensemble member. The crossline indicates the median in peak timing and magnitude. The inner rectangle frames the interquartile of peak timing and magnitude, and the outer rectangle frames the whole range of peak timing and magnitude
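The steps in Fig. 11 can be sketched as follows. The member hydrographs are invented, and the quantile helper is a simplified linear interpolation, not necessarily the exact estimator used by Zappa et al. (2013).

```python
# Sketch of the Peak-Box construction (invented member hydrographs).

def quantile(xs, q):
    """Linear-interpolation quantile of a small sample."""
    s = sorted(xs)
    pos = q * (len(s) - 1)
    i, frac = int(pos), pos - int(pos)
    return s[i] if frac == 0 else s[i] * (1 - frac) + s[i + 1] * frac

def peak_box(ensemble):
    """ensemble: equal-length member hydrographs (discharge per time step).
    Returns the best-estimate (time, magnitude) of the flood peak plus the
    interquartile box and the full-range box in (time, magnitude) space."""
    # Step 1: mark the highest peak of each member.
    peaks = [(max(range(len(m)), key=lambda t: m[t]), max(m)) for m in ensemble]
    times = [t for t, _ in peaks]
    mags = [q for _, q in peaks]
    # Steps 2-3: quantile boxes and the median intersection ("x").
    best = (quantile(times, 0.5), quantile(mags, 0.5))
    iqr_box = ((quantile(times, 0.25), quantile(times, 0.75)),
               (quantile(mags, 0.25), quantile(mags, 0.75)))
    full_box = ((min(times), max(times)), (min(mags), max(mags)))
    return best, iqr_box, full_box

members = [
    [2, 5, 9, 6, 3],   # peak 9 m3/s at t = 2
    [2, 4, 8, 12, 5],  # peak 12 m3/s at t = 3
    [3, 7, 11, 6, 4],  # peak 11 m3/s at t = 2
    [2, 3, 6, 10, 7],  # peak 10 m3/s at t = 3
]
best, iqr_box, full_box = peak_box(members)
print(best)  # median peak time and magnitude ("x" in Fig. 11)
```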


This tool has proven to be useful for flood peak estimation in the operational use of HEPS (Zappa et al. 2013), with some obvious limitations in the assessment of peak flow timing in the case of multi-peak forecast events. It is also a good tool to show people the advantage of ensemble forecasts, and it can be related to the work of Francis Galton, who established the concept of the wisdom of the crowds (Galton 1907): for a specific question, the collective answer of a group of individuals is trusted more than the answer of a single expert. Translated to hydrological forecasting, and referring back to the leading question of this chapter, this would mean that ensemble forecasts (the group of individuals) should be preferred over deterministic forecasts (the single expert) because they should be more trustworthy.

To test this, a game was played with 162 persons, all experts in the field of flood forecasting. They were given spaghetti plots of four ensemble hindcasts, each consisting of 16 members, for a flood event. Their task was to estimate the timing and magnitude of the peak that was observed for that event. For each expert's estimate, the distance in timing and magnitude to the observed peak was then calculated; the same was done for the best estimate from the Peak-Box. The result was that none of the experts made a peak estimate better than the Peak-Box, and only four of the experts scored equal to it. So the peak estimate of the ensemble was better than those of 158 out of 162 experts. However, if the median peak estimate of all experts was considered, this expert median was only slightly inferior to the Peak-Box estimate; only nine of the experts made a better estimate than the expert median. This example therefore supports the concept of the wisdom of the crowds and shows the value of working with ensembles instead of deterministic forecasts. Even the median of an ensemble should generally be preferred over a deterministic forecast.
This game is freely available from the HEPEX resources web page (http://hepex.irstea.fr/resources/). An analysis of the game is presented as a blogpost at http://hepex.irstea.fr/hepschallenges-the-wisdom-of-the-crowds/. The Peak-Box is a good tool to summarize the information in an ensemble when it comes to flood forecasts.

5

Conclusions

The examples in Sect. 2 showed that ensembles generated with different parameter sets, as well as ensembles generated from varying meteorological input, have added value compared to deterministic nowcasts when it comes to peak flows. The ensembles resulting from applying different model parameterizations showed a significant underdispersion. This was somewhat less pronounced for the ensembles generated from varying input, i.e., the radar ensemble. The combination of the two approaches would certainly mitigate the underdispersion (Zappa et al. 2011). A general recommendation cannot be given; for every system, the preferred approach has to be found based on the available data and computational resources.

In Sect. 3, the three model chains COSMO-2, COSMO-7, and COSMO-LEPS of the HEPS for the Sihl river were analyzed and compared. A basic comparison between forecast and observed daily means showed that the ensemble forecast scores better than the deterministic forecasts beyond day one of lead time. A general investigation of the frequency bias indicates that the system rather underforecasts the discharge. However, a more detailed analysis of the ensemble forecast based on the conditional bias showed that the higher the tested threshold of peak discharge and the higher the forecast probability of exceeding this threshold, the more an overforecasting bias can be seen. Even though the COSMO-LEPS ensemble forecasts are 10 h older than the deterministic COSMO-2 and COSMO-7 forecasts when they become available to the end user, they perform remarkably well in comparison to the deterministic forecasts. The BSS of the COSMO-LEPS ensemble forecast is positive over the whole forecast period of 5 days, even for very high thresholds (99 % quantile). For the 80 %, 90 %, and 95 % thresholds, the ensemble forecast outperforms the deterministic forecasts over the whole horizon, while for the 99 % threshold this is the case after about 18 h. Because flood mitigation for the Sihl river in Zurich needs lead times of 1 day and more, the COSMO-LEPS-based ensemble discharge forecasts are much more valuable than the deterministic forecasts.

Section 4 finally presented the Peak-Box, a tool for the interpretation of ensemble flood forecasts. It summarizes the information about the timing and magnitude of the different ensemble members and provides a "best guess" of the forecast event's peak flow timing and magnitude. Additionally, it can be used as a verification measure with respect to flood peak timing and magnitude predictions (Zappa et al. 2013). Generally, it was shown that there is added value in the application of ensemble predictions. The extent to which this applies certainly depends on the problem at hand, but for flood-related problems it is highly recommended to work with ensemble forecasting systems.
Nevertheless, this chapter demonstrated how systematic and robust verification can help forecasters and users analyze the performance of (ensemble) predictions in specific situations and thereby maximize the value of a particular forecast system.

References

N. Addor, S. Jaun, M. Zappa, An operational hydrological ensemble prediction system for the city of Zurich (Switzerland): skill, case studies and scenarios. Hydrol. Earth Syst. Sci. 15(7), 2327–2347 (2011)
N. Andres, G. Lieberherr, I.V. Sideris, F. Jordan, M. Zappa, From calibration to real-time operations: an assessment of three precipitation benchmarks for a Swiss river system. Meteorol. Appl. 23(3), 448–461 (2016)
A. Badoux et al., IFKIS-Hydro Sihl: Beratung und Alarmorganisation während des Baus der Durchmesserlinie beim Hauptbahnhof Zürich. Wasser Energ. Luft 102(4), 309–320 (2010)
K. Beven, J. Freer, Equifinality, data assimilation, and uncertainty estimation in mechanistic modelling of complex environmental systems using the GLUE methodology. J. Hydrol. 249(1–4), 11–29 (2001)


K. Bogner, K. Liechti, M. Zappa, Post-processing of stream flows in Switzerland with an emphasis on low flows and floods. Water 8(4), 115 (2016)
J.D. Brown, J. Demargne, D.-J. Seo, Y. Liu, The Ensemble Verification System (EVS): a software tool for verifying ensemble forecasts of hydrometeorological and hydrologic variables at discrete locations. Environ. Model. Software 25(7), 854–872 (2010)
B. Efron, 1977 Rietz lecture – bootstrap methods – another look at the jackknife. Ann. Stat. 7(1), 1–26 (1979)
F. Fundel, A. Walser, M.A. Liniger, C. Frei, C. Appenzeller, Calibrated precipitation forecasts for a limited area ensemble forecast system using reforecasts. Mon. Weather Rev. 138(1), 176–189 (2010)
F. Galton, The wisdom of crowds. Nature 75(1949), 450–451 (1907)
U. Germann, M. Berenguer, D. Sempere-Torres, M. Zappa, REAL – Ensemble radar precipitation estimation for hydrology in a mountainous region. Q. J. Roy. Meteorol. Soc. 135(639), 445–456 (2009)
S. Hemri, F. Fundel, M. Zappa, Simultaneous calibration of ensemble river flow predictions over an entire range of lead times. Water Resour. Res. 49(10), 6744–6755 (2013)
S. Jaun, B. Ahrens, A. Walser, T. Ewen, C. Schär, A probabilistic view on the August 2005 floods in the upper Rhine catchment. Nat. Hazards Earth Syst. Sci. 8(2), 281–291 (2008)
G. Kuczera, E. Parent, Monte Carlo assessment of parameter uncertainty in conceptual catchment models: the Metropolis algorithm. J. Hydrol. 211(1–4), 69–85 (1998)
R. Lamb, Calibration of a conceptual rainfall-runoff model for flood frequency estimation by continuous simulation. Water Resour. Res. 35(10), 3103–3114 (1999)
K. Liechti, L. Panziera, U. Germann, M. Zappa, The potential of radar-based ensemble forecasts for flash-flood early warning in the southern Swiss Alps. Hydrol. Earth Syst. Sci. 17, 3853–3869 (2013a)
K. Liechti, M. Zappa, F. Fundel, U. Germann, Probabilistic evaluation of ensemble discharge nowcasts in two nested Alpine basins prone to flash floods. Hydrol. Process. 27(1), 5–17 (2013b)
P.V. Mandapaka, U. Germann, L. Panziera, A. Hering, Can Lagrangian extrapolation of radar fields be used for precipitation nowcasting over complex Alpine orography? Weather Forecast. 27(1), 28–49 (2012)
C. Marsigli, F. Boccanera, A. Montani, T. Paccagnella, The COSMO-LEPS mesoscale ensemble system: validation of the methodology and verification. Nonlinear Processes Geophys. 12(4), 527–536 (2005)
J.E. Nash, J.V. Sutcliffe, River flow forecasting through conceptual models (1), a discussion of principles. J. Hydrol. 10, 282–290 (1970)
L. Panziera, U. Germann, The relation between airflow and orographic precipitation on the southern side of the Alps as revealed by weather radar. Q. J. Roy. Meteorol. Soc. 136(646), 222–238 (2010)
D. Viviroli, H. Mittelbach, J. Gurtz, R. Weingartner, Continuous simulation for flood estimation in ungauged mesoscale catchments of Switzerland – Part II: Parameter regionalisation and flood estimation results. J. Hydrol. 377(1–2), 208–225 (2009a)
D. Viviroli, M. Zappa, J. Gurtz, R. Weingartner, An introduction to the hydrological modelling system PREVAH and its pre- and post-processing-tools. Environ. Model. Software 24(10), 1209–1222 (2009b)
D. Viviroli, M. Zappa, J. Schwanbeck, J. Gurtz, R. Weingartner, Continuous simulation for flood estimation in ungauged mesoscale catchments of Switzerland – Part I: Modelling framework and calibration results. J. Hydrol. 377(1–2), 191–207 (2009c)
J.A. Vrugt, C.J.F. ter Braak, M.P. Clark, J.M. Hyman, B.A. Robinson, Treatment of input uncertainty in hydrologic modeling: Doing hydrology backward with Markov chain Monte Carlo simulation. Water Resour. Res. 44, W00B09 (2008)
D.S. Wilks, Statistical Methods in the Atmospheric Sciences (Elsevier, Amsterdam, 2006), p. 627

24

K. Liechti and M. Zappa

M. Zappa et al., IFKIS-Hydro Sihl: Ein operationelles Hochwasservorhersagesystem für die Stadt Zürich und das Sihltal. Wasser Energ. Luft 102(3), 238–248 (2010) M. Zappa, S. Jaun, U. Germann, A. Walser, F. Fundel, Superposition of three sources of uncertainties in operational flood forecasting chains. Atmos. Res. 100(2–3), 246–262 (2011) M. Zappa, F. Fundel, S. Jaun, A ‘Peak-Box’ approach for supporting interpretation and verification of operational ensemble peak-flow forecasts. Hydrol. Process. 27(1), 117–131 (2013) M. Zappa et al., Crash tests for forward-looking flood control in the city of Zürich (Switzerland). Proc. IAHS 370, 235–242 (2015)

Verification of Medium- to Long-Range Hydrological Forecasts Luc Perreault, Jocelyn Gaudet, Louis Delorme, and Simon Chatelain

Contents 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 Probabilistic Hydrological Forecasts at Hydro-Québec (HQ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1 Midterm Hydrological Forecast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Long-Term Hydrological Forecast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Assessing Hydrological Probabilistic Forecasts: Statistical Framework . . . . . . . . . . . . . . . . . . . . 4 Estimation of Scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Medium-Term Hydrological Forecasts Verification: Application to Real Data . . . . . . . . . . . . . 6 Using a Production Planning Model to Evaluate Long-Term Hydrological Forecast Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .


Abstract

Hydrological forecasting is crucial for hydropower production and risk management related to extreme events. Since uncertainty cannot be eliminated from such a process, forecasts should be probabilistic in nature, taking the form of probability distributions over future events. However, verification tools adapted to probabilistic hydrological forecasting have only been recently considered. How can such forecasts be verified accurately? In this chapter a simple theoretical framework proposed by Gneiting et al. (2007) is employed to provide a formal guidance to verify probabilistic forecasts. Some strategies and scoring rules used

L. Perreault (*) • J. Gaudet (*) • L. Delorme (*) IREQ, Hydro-Québec Research Institute, Varennes, QC, Canada e-mail: [email protected]; [email protected]; [email protected] S. Chatelain (*) McGill University, Montreal, QC, Canada e-mail: [email protected] © Her Majesty the Queen in Right of Canada 2016 Q. Duan et al. (eds.), Handbook of Hydrometeorological Ensemble Forecasting, DOI 10.1007/978-3-642-40457-3_6-1


to measure the performance of hydrological forecasting systems, namely at Hydro-Québec, are presented. Monte Carlo simulation experiments and applications to a real archive of operational medium-range forecasts are also presented. Finally, an experiment is performed to evaluate long-range hydrological forecasts from a decisional perspective, by employing hydrological forecasts in a stochastic midterm planning model designed for optimizing electricity production. Future research perspectives and operational challenges on diagnostic approaches for hydrological probabilistic forecasts are given.
Keywords

Probabilistic forecasting • Hydrological forecasts • Proper scoring rules • Skill scores • Estimation • Multivariate verification • Energy score • Economic value of forecasts

1 Introduction

The purpose of forecasting is to support informed decision-making. Inflow forecasts are issued to help operate reservoirs, assess resource capabilities, evaluate risks, and determine pricing of hydro-energy. Hydrological forecasts are typically provided using hydrological models driven by some estimates of future weather; flows observed during the previous time period, a.k.a. persistence forecasts; or average flows, a.k.a. climatology forecasts. Hydropower companies and agencies make considerable efforts to provide accurate hydrological forecasts for various lead times, using physically based conceptual and statistical hydrological models, weather forecasts, and both deterministic and probabilistic techniques. The ability to provide reliable and accurate medium- and long-range hydrological forecasts is fundamental for the effective operation and management of water resource systems. For instance, at Hydro-Québec, daily hydrological forecasts serve as input to stochastic decision-making models designed for optimizing electricity production while avoiding spillage and inundation. Very important decisions therefore depend upon these forecasts. Advanced forecast verification measures are critical to the assessment of operational impacts of inflow events (Weber et al. 2006; Perreault 2013; Alfieri et al. 2014). In fact, verification gives objective information on various aspects of forecast quality, which should benefit forecasters and users in their decision-making process. Insufficient knowledge of forecast skill eventually translates into uncertainty about the level of risk adopted in operations and may lead to a lack of confidence in the forecasts and suboptimal decisions. To communicate forecast skill to decision-makers and forecasters, but also to establish benchmarks of forecasting systems, a variety of performance measures have been developed.
The evaluation of probabilistic forecasts is a challenging task since it involves the comparison of two different quantities: functions (the probability distributions of the forecasts) and real values (the observations), which are not directly comparable. This situation complicates the verification framework and the interpretation of verification results and is a


contributing factor to the wide variety of performance measures that exist. In this chapter, we address both the assessment of forecast quality and the assessment of forecast value. We first focus on the use of statistical verification measures and skill scores to assess the quality of midterm probabilistic hydrological forecasts. Then, we go a step further by presenting an experiment to evaluate the economic value of long-term hydrological forecasts using a stochastic planning model specifically designed for the optimization of Hydro-Québec's (HQ) electricity production. HQ ranks among the world's largest electric companies. For more than 50 years, Hydro-Québec has generated, transmitted, and distributed nearly all the electricity consumed in Québec, now totaling more than 176 TWh a year. It has an installed capacity of 36,912 MW. It also sells power on wholesale markets in northeastern North America. Hydrological forecasting is thus a central activity for this public company. Section 2 briefly describes Hydro-Québec's hydrological forecasting system and the case studies considered herein. In Sects. 3 and 4, we present the statistical verification framework adopted at Hydro-Québec and the metrics in use, and address the important problem of estimating scoring rules. Section 5 focuses on the application of verification tools to actual midterm forecasts, while Sect. 6 is dedicated to the economic value of long-term forecasts. We end the chapter with some conclusions and a number of questions about the use and development of forecast verification.

2 Probabilistic Hydrological Forecasts at Hydro-Québec (HQ)

One of the practical purposes of hydrology is to make inflow forecasts for the future. Hydrological processes, models, and variables associated with forecasting future hydrological states all involve a large degree of uncertainty. Therefore, hydrological forecasts should be probabilistic and expressed in the form of a probability distribution, the predictive distribution (Dawid 1984), in order to properly represent the phenomenon and what we know about it. Cloke and Pappenberger (2009) and Krzysztofowicz (2001) eloquently presented the need, the significance, and the unavoidable nature of probabilistic hydrological forecasts in today's world. A wide variety of methodologies are used to produce probabilistic hydrological forecasts. The most common approaches are based either on hydrological models and a number of explanatory variables for the future states of the hydrometeorological system or on statistical models. Hydro-Québec's current inflow forecasting system relies on both types of methodology. On the one hand, daily midterm hydrological forecasts are produced for more than 90 basins by using a conceptual hydrological model and an extended streamflow prediction approach similar to the one proposed by Day (1985). On the other hand, once a year in December, a statistical model is employed to generate long-term spring volume forecasts for some major watersheds. Here, midterm hydrological forecasts


refer to a period ranging from 2 to 30 days; long-term hydrological forecasts refer to a period exceeding 30 days.

2.1 Midterm Hydrological Forecast

HQ's daily hydrological forecasts use as inputs a pseudo-probabilistic weather forecast derived from a perturbed deterministic weather forecast for the first 9 days. The historical error of the deterministic meteorological prediction (integrated at the basin scale) is used to create several scenarios from the available deterministic forecast. To extend the hydrological forecast beyond this meteorological forecast lead time, historical weather series are used as multiple possible inputs in the hydrological model. The end result is a hydrological forecast that extends at least 30 days into the future and possesses 189 flow series or members as of 2014 (major changes are currently being made to HQ's hydrological forecast system; in particular, work is under way to integrate ensemble weather forecasts for midterm hydrological predictions). Those scenarios are viewed as ensemble hydrological forecasts. While they do provide a distribution of forecasts, the distributional hypotheses are often hard to define and variable in time. This adds some challenges to the problem of evaluating the performance of hydrological probabilistic forecasts using statistical diagnostic tools. From these members, distribution information, such as quantiles with certain probabilities of exceedance, can be estimated on the basis of some probability distributions. However, the assumption on the form of the probability distribution is, more often than not, wrong. By construction and hypothesis, we would expect a Pearson type III distribution. The last several years of our archives of daily inflow forecasts rather show that for a lead time of 1–5 days, our forecasts usually can be approximated by standard symmetric (60% of the time) and asymmetric distributions (30%) or a mixture of distributions (10%). There is a shift for longer lead times, where we have a roughly equal amount of standard asymmetric and mixtures of distributions and very few symmetric distributions.
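The construction described above, perturbing a deterministic forecast with archived errors to obtain a pseudo-probabilistic ensemble, can be sketched in a few lines. All numbers below are made up for illustration only; the operational perturbation and extension scheme at HQ is of course more elaborate:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical inputs: a 9-day deterministic precipitation forecast (mm/day)
# and an archive of historical forecast-error traces at the basin scale.
det_forecast = np.array([2.0, 5.0, 12.0, 8.0, 3.0, 0.0, 1.0, 4.0, 6.0])
hist_errors = rng.normal(0.0, 2.0, size=(50, 9))  # 50 archived error traces

# Perturb the deterministic forecast with resampled historical error traces,
# clipping at zero since precipitation cannot be negative.
idx = rng.integers(0, 50, size=189)               # 189 members, as at HQ
ensemble = np.clip(det_forecast + hist_errors[idx], 0.0, None)

print(ensemble.shape)  # (189, 9): one perturbed trace per member
```

Each row is then fed through the hydrological model to obtain one flow-series member of the ensemble.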
Figure 1 presents typical forecast distributions, for a 7-day flow volume, for one of HQ's large watersheds. The wide variety of distributions is immediately visible upon examination of the figure. Bimodal (and sometimes trimodal) distributions, at first thought to be the result of a system malfunction, correspond to meteorological forecasts where there is a probability of heavy precipitation (or warm temperatures causing snowmelt) and a probability of light precipitation (or cold temperatures). They truly represent the expected future state of our hydrological system and should show a high level of forecasting skill, if the probabilities are properly represented. What is of concern for us, in terms of verification, is that we do not know what the true distribution is, and we do not know ahead of time what the distribution will look like. Two challenges emerge from that situation for forecast verification. First, in the context of hydropower production, verification diagnostic tools must correctly reward a hydrological forecasting system which produces such detailed predictive densities, for instance multimodal predictive distributions. In our view, their

Fig. 1 Some typical forecast predictive distributions on a large watershed for the total flow volume of the next 7 days


sensitivity to the shape of the distribution is clearly an important property. We will expand on that problem in Sects. 3 and 5. Second, for a given scoring rule, we must make assumptions about the predictive distributions when calculating verification scores and metrics if a parametric estimation method is used. The choice of the estimation method is then an important issue. We will come back to that problem in Sect. 4.

2.2 Long-Term Hydrological Forecast

Other forecasting approaches to produce inflow forecasts are truly probabilistic and rely on a variety of statistical tools to issue a forecast. The most common approaches are based upon a probabilistic model such as a linear regression model, ARIMA, etc. With such approaches, predictive distributions are generally known probability laws, as they follow from the fundamental hypotheses of the model. In this context, the verification problem is generally easier to tackle, as one works with known families of distributions and has to make fewer hypotheses. At HQ, a statistical approach is used to produce in December a probabilistic forecast of the aggregated May–July (MJJ) spring streamflow for several basins. In fact, the ability to issue MJJ streamflow forecasts 5 months in advance is of particular interest to water planners and managers at HQ with respect to hydropower generation. The May–July aggregated flows account for about 50% of the total annual flow volume in the basins and are composed of melted winter snowpack and spring precipitation, the prior winter months December–April being low-flow months. Thus knowledge in December of the future MJJ reservoir inflows can be used for making decisions about future releases during the winter, contributing to more proactive water management that may prove useful in extremely dry or wet years. The approach is based upon a linear regression model using the principal components of exogenous measures of atmospheric circulation inferred from the National Centers for Environmental Prediction/National Center for Atmospheric Research reanalysis project (see Sveinsson et al. 2008; Perreault et al. 2007). In the following, we will refer to this statistical approach as the ATM model.

2.3 Case Studies

The case study developed in this chapter to illustrate the use of verification tools concerns recent medium-term hydrological forecasts produced daily at Hydro-Québec for the Manicouagan watershed, a major hydropower system. As illustrated in Fig. 2, the Manicouagan watershed is subdivided into five subcatchments for which forecasts are produced every day: Petit Lac Manicouagan, Manic-5, Manic-3, Toulnustouc, and Manic-2. The Manicouagan watershed is located in the northeast of the province of Quebec, Canada. This water resource system consists of two hydropower plants with reservoirs in parallel (Manic-5 and Toulnustouc) and three downstream run-of-river hydropower plants (Manic-3, Manic-2, and Manic-1). The


Fig. 2 Manicouagan watersheds

total installed capacity is 6202 MW, which is about 17% of HQ's total capacity. In operating the system, generation planners face a variety of decisional problems. Two of those, common to every installation, are safety and compliance with environmental laws and regulations. For the two upstream watersheds, with large reservoirs, the other main concerns are those of long-term energy planning and optimization and


efficient releases for the operation of the run-of-river plants, given the inflows on those subbasins. For the three run-of-river plants, the issue is efficient scheduling, given the inflows on the watersheds and the upstream releases. It is quite clear that the future state of inflows plays a major role in the decisions that will be made and in the efficiency of the operations. The need for probabilistic hydrological forecasts is thus self-evident. Our illustration of the assessment of hydrological forecast value is specific to the statistical long-term MJJ spring prediction for the Churchill Falls basin, where a single power plant, not owned by Hydro-Québec, is installed (Fig. 3). The total capacity is 5,428 MW (15% of HQ's total capacity). This watershed served as a test basin to explore the potential use of atmospheric circulation variables for making seasonal streamflow forecasts for high-latitude basins on the Québec-Labrador peninsula. Given the nature of the forecasts presented in this section, how can we adequately evaluate probabilistic medium- to long-range hydrological forecasts? How can we compare and rank competing hydrological forecasting approaches? In the following sections, we propose a framework that tackles the operational challenges raised here, which ensue from the way the forecasts are constructed and used to optimize electricity production. We will highlight issues such as unknown forecast and observation distributions, metric choice, uncertainty in score estimation, multivariate verification, and the communication of verification information. Note that probabilistic forecasts based on historical hydrographs, denoted by HIST, are used as baseline forecasts to evaluate the performance of HQ medium- to long-range hydrological predictions, in particular when using skill scores.

3 Assessing Hydrological Probabilistic Forecasts: Statistical Framework

We employ the simple theoretical framework proposed by Gneiting et al. (2007) to formalize the problem of assessing hydrological probabilistic forecast quality. At time t, nature chooses a probability distribution Gt, which is considered as the true inflow-generating process. On the other hand, the "forecaster" produces a hydrological forecast in the form of a predictive distribution denoted Ft or a sample from Ft (ensemble forecasts). Of course, the "true" distribution Gt is in practice never known, but an observation xt drawn from it is available. Figure 4 illustrates the framework proposed by Gneiting et al. (2007). As mentioned by the authors, the diagnostic approach for such a forecast faces a challenge, in that the forecasts take the form of probability distributions, whereas the observations are real valued. The evaluation of probabilistic hydrological forecasts thus needs particular types of diagnostic tools. Of course, it is impossible to evaluate a single probabilistic forecast in isolation, and the size T of the forecast archive plays a major role in determining the significance of the result. Before considering verification tools such as scoring rules, we must formally define what can be considered a good probabilistic forecast. Gneiting et al. (2007)


Fig. 3 Churchill Falls watershed

assert that the goal of probabilistic forecasting is to “maximize the sharpness of the predictive distributions subject to calibration.” Here, calibration refers to the statistical consistency between the predictive distribution and the observation, while sharpness refers to the concentration of the predictive distribution. The sharper the

Fig. 4 Framework proposed by Gneiting et al. (2007): at each time t, nature draws the observation xt from the true distribution Gt, while the forecaster issues the predictive distribution Ft

forecast, the higher the information value it will provide if, of course, it is reliable, i.e., well calibrated. Scoring rules, denoted here by S(F, x), address calibration and sharpness simultaneously. Therefore, they are interesting summary measures of predictive performance. Of course, only proper scores must be considered to evaluate probabilistic forecasts. To avoid hedging, the forecaster's probability assignment to a future inflow event should be completely independent of the particular verification system. This is guaranteed if the scoring rules in use are proper. In short, a proper score will always prefer a probabilistic forecast if it is, in fact, more accurate (Bröcker and Smith 2007). In decision theory, finance, and meteorology, scoring rules have been used intensively for many years to evaluate probabilistic forecasts. A number of scores have been proposed in these domains. However, verification of probabilistic forecasts using formal scoring rules and sophisticated diagnostic tools has drawn the attention of hydrologists only quite recently. Nevertheless, the use of standard scoring rules is becoming widespread in hydrology, for instance, in hydropower companies where daily inflow forecasts are crucial for optimizing the planning of electricity production capacity (Électricité de France, BC-Hydro, Hydro-Québec, etc.; see Weber et al. 2006; Perreault 2013). Besides deterministic scores such as the Nash-Sutcliffe coefficient and graphical diagnostic tools such as the reliability diagram, the CRPS is probably the most used scoring rule. It is often the only score suited to evaluate

Table 1 Mathematical definition of scoring rules with their respective loss functions (all negatively oriented: smaller is better). In the original, each rule is accompanied by a graph of its behavior as a function of the observation x_t for a normal predictive density; the quantile-score panel shows the curves for q_0.15, q_0.50, and q_0.85 together with the global score.

Logarithmic: S(F_t, x_t) = -log[ f_t(x_t) ]

Quadratic: S(F_t, x_t) = \int [f_t(y)]^2 dy - 2 f_t(x_t)

Spherical: S(F_t, x_t) = - f_t(x_t) / ( \int [f_t(y)]^2 dy )^{1/2}

CRPS: S(F_t, x_t) = \int ( F_t(y) - 1_{[x_t, +\infty)}(y) )^2 dy

Quantile: S(q_t, x_t) = (x_t - q_t) ( 1_{(-\infty, x_t]}(q_t) - \alpha )

probabilistic forecasts that is considered in a number of verification experiments (see, for instance, Alfieri et al. 2014; Brown et al. 2014). The fact that the CRPS is reported in the units of the observations and that it reduces to the mean absolute error if F is a deterministic forecast, thereby allowing a direct comparison between probabilistic and deterministic forecasts, may explain its popularity. However, hydrologists should not limit their analysis to the use of a single score. A verification analysis could benefit from additional scoring rules. In what follows, we briefly present some scores that, to our knowledge, are rarely considered in the evaluation of medium- to long-range hydrological forecasts. Mathematical definitions of several scoring rules in use at Hydro-Québec for univariate continuous predictive distributions are presented in Table 1: the logarithmic (ignorance score), quadratic, spherical, and CRPS scores. The quadratic (Selten


1998) and spherical (Jose 2009) scoring rules are currently used in finance and economics. Quantile scoring rules, which are rarely considered in hydrology, are also presented in Table 1. We find this type of scoring rule very useful since some hydraulic planning models use as input not the full predictive distribution but predictive quantile estimates. The expression of the simplest quantile scoring rule is given. As mentioned by Gneiting et al. (2007), the econometric literature refers to this scoring rule as the tick or check loss function. This score is also well known for its application in quantile regression. Note that f_t stands here for the predictive density function and 1_A(y) is the well-known indicator function defined as

1_A(y) = 0 if y \notin A, and 1 if y \in A.   (1)
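As a toy illustration of the check loss of Table 1 (with hypothetical values, and reading alpha as an exceedance probability, consistent with the quantiles discussed in Sect. 2.1), its empirical mean over many observations is minimized near the corresponding quantile of the true distribution:

```python
import numpy as np

def quantile_score(q, x, alpha):
    # Check (tick) loss of Table 1: S(q, x) = (x - q) * (1_(-inf, x](q) - alpha).
    # Reading alpha as an exceedance probability, the expected score is
    # minimized when q is the quantile exceeded with probability alpha.
    return (x - q) * ((q <= x).astype(float) - alpha)

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)            # observations drawn from N(0, 1)

alpha = 0.15                            # 15% exceedance probability
candidates = np.linspace(-1.0, 2.5, 71)
mean_scores = [quantile_score(q, x, alpha).mean() for q in candidates]
best = candidates[int(np.argmin(mean_scores))]
print(best)                             # near the true quantile, about 1.04
```

This property (the true quantile minimizes the expected score) is exactly what makes the rule proper for quantile forecasts.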

To illustrate how these scoring rules differ, we present in Table 1 a graph which shows their behavior for different possible values of the observation x_t, assuming a normal predictive density. The logarithmic score penalizes the "forecaster" very severely when the observation x_t falls in the extremities of the predictive density: the value of the score quickly tends to infinity as the observation moves far from the center of the predictive distribution. The quadratic and spherical score functions are bounded. Selten (1998) recommends that the quadratic score be favored over the logarithmic score for that reason; according to this author, the latter is too sensitive to situations where the observation falls far in the tails of the predictive density. However, that remains debatable. In fact, the rejection of a scoring rule should take into account not only its mathematical behavior but also the decisional contexts for which forecasts are produced. In some critical situations, namely when facing the possible occurrence of harmful extreme events, we believe that severe scoring rules which are sensitive to the shape of the predictive distribution must be included in a verification system. In a recent comparative study with the quadratic and spherical scores, Bickel (2007) finds several advantages in using the logarithmic score and other sensitive scoring rules. Finally, the CRPS is known to be less sensitive to the shape of predictive densities than many other scoring rules. Its expression is in fact related to the so-called robust M-estimators (Huber 1981). This "robustness" property is often viewed as an advantage; in our view, this may be questionable in our decisional context. As mentioned, the scoring rules presented in Table 1 are designed to evaluate a univariate predictive distribution.
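To make the contrast concrete, the three density-based scores of Table 1 can be evaluated in closed form for a normal predictive density. This is a sketch with assumed parameters; it uses the standard fact that for N(mu, sigma^2) the integral of f^2 equals 1/(2*sigma*sqrt(pi)):

```python
import numpy as np

# Scores of Table 1 for a normal predictive density f = N(mu, sigma^2),
# evaluated at observation x (all negatively oriented: smaller is better).
def scores_normal(x, mu=0.0, sigma=1.0):
    f_x = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    l2 = 1.0 / (2 * sigma * np.sqrt(np.pi))   # integral of f^2 for a normal
    return {
        "logarithmic": -np.log(f_x),          # unbounded in the tails
        "quadratic": l2 - 2.0 * f_x,          # bounded
        "spherical": -f_x / np.sqrt(l2),      # bounded
    }

for x in (0.0, 2.0, 4.0):
    print(x, scores_normal(x))
```

Moving the observation from the center (x = 0) to the far tail (x = 4) makes the logarithmic score grow without bound, while the quadratic and spherical scores stay within fixed limits, which is precisely the behavior debated by Selten (1998) and Bickel (2007).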
However, good hydrological forecasts should reproduce spatial coherence for basins within a given region. Preserving the spatial dependence structure of inflows becomes crucial when the hydrological forecasts are used to plan the hydroelectric production of a large set of sites, such as Hydro-Québec's fleet of generating sites (more than 90 sites). Verifying the performance marginally (i.e., only the forecasts issued for each individual site) is then clearly not enough. This issue becomes very important since statistical post-processing of


hydrometeorological probabilistic forecasts is gaining popularity. Do these statistical approaches preserve the spatial coherence and the dependence structure between variables? How can we adequately evaluate a multivariate probabilistic forecast (here a multisite hydrological forecast)? One option is to use a multidimensional generalization of the CRPS, the energy score (Gneiting et al. 2008). Given an ensemble forecast y_1, ..., y_n for a vector-valued quantity that takes values in R^d and the corresponding observation x (for instance, ensemble hydrological forecasts for d basins), the nonparametric estimator of the energy score proposed by Gneiting et al. (2008) is given by

S(F, x) = \frac{1}{n} \sum_{j=1}^{n} \lVert y_j - x \rVert - \frac{1}{2n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \lVert y_i - y_j \rVert,   (2)

where ||•|| denotes the Euclidean norm. The energy score is proper. In a multivariate setting, this property of a scoring rule gains extra meaning. Indeed, a multivariate distribution is composed of marginal distributions tied together by a dependence structure. This means that a proper multivariate score, such as the energy score, will reward a forecast in which the marginal distributions are well predicted individually as well as in relation to each other. To illustrate this, we conduct a Monte Carlo experiment with the energy score on copulas, which describe the dependence structure exactly with marginal effects eliminated (Genest and Favre 2007). We use two different predictions and compare the distributions of the average scores for bivariate Clayton copula observations. One prediction takes into account the dependence structure (DEP), while the other treats the margins as independent via the independence copula (IND). This way both forecasts perfectly predict the marginal distributions, which are uniform, but one wrongly assumes that the two variables are independent, while the other uses the correct copula. As we can see in Fig. 5, the distribution of the average energy score is lower for the forecasts that picked the right dependence structure. The difference becomes more significant as the Kendall rank correlation coefficient (commonly referred to as Kendall's tau) increases. Note that Pinson and Tastu (2013) recently pointed out that the ability of the energy score to detect incorrectly specified correlations can be limited. Scheuerer and Hamill (2015) have therefore proposed an alternative multivariate family of scoring rules, based upon the concept of the variogram, which are "more discriminative with respect to the correlation structure." This simple experiment demonstrates that a multivariate forecast is not simply the sum of its univariate parts.
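A direct NumPy transcription of the estimator in Eq. (2) is short. The sketch below uses Gaussian margins and a correlation value chosen arbitrarily (rather than the Clayton copula of the original experiment), and in one dimension the same function reduces to the CRPS of the empirical CDF:

```python
import numpy as np

def energy_score(ens, x):
    # Nonparametric energy-score estimator of Eq. (2):
    # mean ||y_j - x|| minus half the mean pairwise distance ||y_i - y_j||.
    ens = np.atleast_2d(ens)
    n = ens.shape[0]
    term1 = np.linalg.norm(ens - x, axis=1).mean()
    pair = np.linalg.norm(ens[:, None, :] - ens[None, :, :], axis=2).sum()
    return term1 - pair / (2 * n * n)

rng = np.random.default_rng(1)
cov_dep = [[1.0, 0.8], [0.8, 1.0]]       # assumed "true" dependence
obs = rng.multivariate_normal([0, 0], cov_dep, size=400)
dep = rng.multivariate_normal([0, 0], cov_dep, size=300)  # correct dependence
ind = rng.standard_normal(size=(300, 2))                  # independent margins

es_dep = np.mean([energy_score(dep, o) for o in obs])
es_ind = np.mean([energy_score(ind, o) for o in obs])
print(es_dep, es_ind)  # averaged over many cases, es_dep tends to be lower
```

The one-dimensional reduction makes the function easy to sanity-check: for the two-member ensemble {0, 1} and observation 0.5, the estimator equals 0.25 exactly.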
In meteorology and hydrology, different physical measures have strong dependence structures which a forecast should accurately recreate. The use of proper multivariate scores such as the energy score or the more recent variogram-based scoring rules is recommended, as they penalize incoherent predictions in the marginal distributions as well as misrepresentations of the dependence structure. As mentioned earlier, such scoring rules can also be used to verify that statistical


Fig. 5 Distributions of average energy scores obtained with an independence assumption (IND) and with the correct dependence structure (DEP) of a bivariate Clayton copula, as Kendall’s tau varies. Kendall’s tau is a measure of association between variables

post-processing of hydrometeorological forecasts does not alter the relationships between the variables it affects.

4 Estimation of Scores

Very seldom do hydrological forecasts actually consist of an explicitly defined predictive probability distribution function. Instead, we might produce different scenarios loosely based on quantiles or on deterministic forecasts whose initial values are perturbed with random noise. This leaves us with several options to estimate a given score of a forecast. For instance, shall we consider a parametric approach or a more robust nonparametric method to estimate the CRPS? These approaches can lead to different estimated score values and therefore may have an impact on the conclusions subsequently drawn about the forecast system. This issue becomes very important when the forecasting process produces predictive distributions with various shapes such as those presented in Fig. 1. To illustrate this problem, we performed a simulation study aimed at comparing different parametric and nonparametric estimation methods for the CRPS. We pick nature's probability distribution G and produce a sample y1, . . ., yn as well as an observation x from that distribution. The true score, which is the best one on average, can be calculated and compared to an estimator based on the ensemble y1, . . ., yn. For the parametric methods, we study the effect of misjudging the real probability distribution G in different cases such as asymmetry or overdispersion. The

Verification of Medium- to Long-Range Hydrological Forecasts


nonparametric estimators do not suffer from that issue but might not be appropriate in certain situations, notably for small sample sizes. In this study we focus on comparing different nonparametric estimators S_{np,k} of the CRPS as well as a parametric estimator S_p numerically computed using a Monte Carlo approximation (Robert and Casella 2000). The first nonparametric estimator is based on the representation of the CRPS with expectations (Gneiting and Raftery 2007) and has the following simple expression:

S_{np,1}(F, x) = \frac{1}{m} \sum_{i=1}^{m} |y_i - x| - \frac{1}{2m} \sum_{i=1}^{m} |y_i - y'_i|, \qquad (3)

where y = (y_1, \ldots, y_m) and y' = (y'_1, \ldots, y'_m) denote two independent samples of size m from the predictive distribution F. In practice, we have to split the ensemble y_1, \ldots, y_n into two subsamples of equal length. This estimator was used, notably, by Friederichs and Thorarinsdottir (2012). We also propose to use another nonparametric estimator which appears natural when considering the integral representation of the CRPS, where we swap the predictive distribution function F for the empirical distribution function F_n:

S_{np,2}(F, x) = \int \left( F_n(z) - \mathbf{1}_{[x, +\infty)}(z) \right)^2 dz, \qquad (4)

where

F_n(z) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{(-\infty, z]}(y_i). \qquad (5)

The third nonparametric estimator from Naveau et al. (2014) is based on U-statistics and has the following form:

S_{np,3}(F, x) = x + \frac{1}{n} \sum_{i=1}^{2n} (y_i - x)\, \mathbf{1}_{(-\infty, y_i)}(x) - 2\hat{\mu}_n, \qquad (6)

where

\hat{\mu}_n = \frac{1}{2n(2n-1)} \sum_{1 \le i < j \le 2n} \max(y_i, y_j). \qquad (7)

… 12 h to a few days) for flash floods and medium range (3–15 days) for flooding to monthly to seasonal for groundwater and water management predictions. Alfieri et al. (2012) listed the operational short- to medium-range meteorological forecasting methods in increasing temporal (and spatial) resolution as radar and ensemble radar nowcasting, radar-NWP blending, deterministic and ensemble limited-area NWP, and deterministic and ensemble global NWP (Fig. 2). Radar precipitation forecasts can be very useful as an estimate of high-intensity precipitation, as was shown in the project HAREN (http://haren-project.eu/) and

Fig. 2 Typical lead times for hazardous events and the lead time and skill horizons of operational forecast systems (Alfieri et al. 2012)

Hydrological Challenges in Meteorological Post-processing


follow-up project EDHIT (http://edhit.eu/), and in combination with ensemble forecasts, they can be a powerful nowcasting tool, given that the limitations and uncertainties are sufficiently taken into account (Smith et al. 2014). The radar data also have to be available in real time and easily accessible to forecasting systems. Flood forecasting for rivers, lakes, and reservoirs at longer lead times, ranging from 1 day up to weeks in advance, depends upon NWP models. The sensitivity of the hydrological forecasts to the NWP output depends on the physical properties of the watershed. The world-leading NWPs are typically skilful up to 15 days. However, the need for post-processing is still substantial due to the shortcomings of model output in capturing all aspects of the most important weather variables, such as precipitation, temperature, and evapotranspiration. The most important aspect for flood forecasting is the prediction of the magnitude, timing, and location of extreme precipitation events. Similar requirements, but potentially with less focus on the extremes, apply to the real-time management of water infrastructure, for example, the dam management involved in the generation of hydroelectric power. On the sub-seasonal to seasonal scale, it becomes more important to correctly model the water balance, as these forecasts are potentially useful to assess, for example, the inflow into reservoirs from snowmelt or the risk of droughts or low flows. These forecasts are useful for seasonal planning, and the post-processing needs to be able to correct systematic errors rather than extreme values. On the longer forecast time scales, there is also the possibility of mixing dynamical climate models with statistical tools, for example, using known correlations of large-scale phenomena such as the NAO with local predictors to correct the model output.

2.3

Spatial Relevance

As already noted, in hydrological applications it is important that the NWP output exhibits the same spatial correlation and variability as the observational data; otherwise, the assumptions made during calibration are violated. An example is the correct representation of orographic lifting, which has a large impact on both the location and the magnitude of precipitation and in turn affects the modelled flow (He et al. 2009). The spatial correlation between modelled NWP output and observations is inherently linked to the spatial resolution of the forecast. Computational constraints limit the current resolution of operational NWPs, which in turn affects the output in terms of spatial smoothing. Further, the “effective resolution,” which is the resolution at which the NWP produces useful information, is often coarser than the formal output resolution. It is to be hoped that these issues will become less prominent as models progress with higher resolution and improved description of the sub-grid physical processes (Haiden et al. 2014). Currently, however, post-processing is required to relate the meteorological forecasts at the “effective resolution” to the observed variability. This may involve elements of downscaling, the degree of which depends upon the hydrological model, to capture features at a scale smaller than the “effective resolution.” One aspect of this is discussed in the following section.


F. Wetterhall and P. Smith

However, post-processing could also be used to account for uncertainty in the spatial displacement of, for example, convective cells, which may be significant if it results in precipitation falling in the wrong catchment.

2.4

Event Magnitude

Even a timely, spatially correct forecast may not be of use if the event magnitude is poorly predicted. Event magnitude is particularly important for pluvial flood events, which are caused entirely by intense precipitation. The difficulties in predicting event magnitude are exemplified by two typical problems with general circulation models (GCMs), regardless of whether they are applied as NWP or climate models. These are the overestimation of the number of rainy days (the “drizzle effect”; Murphy 1999) and the underestimation of extreme precipitation. These two contradicting errors (overestimation of low-intensity events and underestimation of the magnitude of high-intensity events) can still result in a good estimation of the mean precipitation over time (Wetterhall et al. 2012). The “drizzle” problem manifests itself as too few days with zero precipitation and will cause problems for applications that depend on the succession of dry days to define, for example, dry spells (Wetterhall et al. 2015). The inability to correctly model heavy precipitation events limits a model’s usability in flood forecasting, since precipitation intensity is a flood-generating mechanism. Both these deficiencies are to some extent related to the resolution, since the model describes the weather over each grid area; sub-grid extremes will therefore not be represented correctly in the model. But there are other factors that contribute to the underestimation of both extremes. Precipitation is in itself an intermittent and nonlinear process, which makes it very difficult to model correctly. Important processes are parameterized and simplified, which has a negative effect on representing the observed precipitation distribution. The correction of such deficiencies is one of the most common uses of statistical post-processing. Care should be taken, though, that the post-processing can deal with the event sizes that are most crucial for a particular location (Wetterhall et al. 2012).
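One common remedy for the drizzle error is a wet-day threshold: choose a cutoff so that the forecast climatology reproduces the observed wet-day fraction, and set forecast values below it to zero. The sketch below is a generic illustration, not a method from the cited studies; the function names and the 0.1 mm wet-day definition are assumptions:

```python
import numpy as np

def drizzle_cutoff(fcst_clim, obs_clim, wet_mm=0.1):
    """Cutoff such that the fraction of forecast values at or above it
    equals the observed wet-day fraction."""
    obs_wet_frac = np.mean(obs_clim >= wet_mm)
    return np.quantile(fcst_clim, 1.0 - obs_wet_frac)

def apply_cutoff(fcst, cutoff):
    """Zero out forecast precipitation below the cutoff."""
    fcst = np.asarray(fcst, dtype=float)
    return np.where(fcst < cutoff, 0.0, fcst)
```

Note that thresholding alone removes water from the budget; in practice it is usually paired with an intensity correction of the remaining wet-day amounts (e.g., quantile mapping) so that the mean is preserved.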

2.5

Consistency Between Variables

Hydrological models are sensitive to the consistency between the driving variables, such as precipitation, temperature, and evapotranspiration, which must be preserved in order to correctly model important processes. This is, for example, particularly important during snowmelt periods, which are to a large extent driven by temperature. The diurnal cycle of temperature heavily affects the melting; a cold night will slow down any melting, whereas a warm cloudy night during winter will heavily accelerate it. Likewise, the evapotranspiration during warm, dry summer days may be misrepresented if the variables are individually post-processed. Post-processing needs to preserve the consistency between variables to prevent unphysical behavior in the hydrological modelling. Roulin and Vannitsem (2015) showed that post-processing the precipitation alone did not improve the resolution of the hydrological ensembles.

3

How Meteorological Post-processing Affects Hydrological Prediction

There are a number of post-processing methods used to calibrate meteorological fields in hydrological applications; for an overview of the methods used, we refer to other sections in this handbook or to the vast literature on this topic (Glahn and Lowry 1972; Wilks 2011; Gneiting 2014). This section describes these methods, including their advantages and disadvantages from a hydrological perspective.

3.1

Spatial and Temporal Interpolation

In most forecasting chains, there is a mismatch between the resolution of the driving NWP and that of the hydrological model, and the driving data consequently have to be interpolated. The choice of interpolation method will depend on the variable; for example, bilinear interpolation can be useful for temperature and evapotranspiration, whereas a mass-conservative method might be more appropriate for precipitation. The interpolation can include a spatial correction, for example, using a high-resolution DEM to take local effects into account. He et al. (2009) used the previous months’ precipitation to estimate a mass-conservative spatial correction over a small catchment. The method was basically a reshuffling of the precipitation over the catchment without preserving the predominant pattern of the observed precipitation. Balsamo et al. (2010) used the opposite approach by preserving the spatial fields generated by the model, in that case ERA-Interim (Dee et al. 2011), and correcting the monthly mean to correspond to the global GPCC dataset. This method has the advantage that the modelled consistency between the output parameters is preserved and that the water balance over time is comparable to observations. Correlation between variables needs to be addressed in any application, especially for spatially varying variables such as precipitation, and a rule of thumb is to coarsen the output spatially by taking the mean over the surrounding grid points rather than using individual points. Another technique is spatial pooling, where the neighboring points are included in the analysis. However, the model output is often taken at face value, without consideration of the spatial scale at which predictability can be expected. This creates a false sense of accuracy of the model system. Reducing the spatial resolution, either through averaging or pooling, can have large impacts in areas where there are sharp borders in the landscape, such as coastal zones and mountainous regions.
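The coarsening rule of thumb can be sketched as a moving-window mean over the surrounding grid points. The helper below is a hypothetical illustration (operational systems would more likely use a proper conservative regridding tool):

```python
import numpy as np

def neighborhood_mean(field, radius=1):
    """Coarsen a 2-D forecast field by replacing each grid point with the
    mean over a (2*radius+1)^2 neighbourhood; edge points use only the
    available neighbours."""
    ny, nx = field.shape
    out = np.empty((ny, nx), dtype=float)
    for i in range(ny):
        for j in range(nx):
            window = field[max(0, i - radius):i + radius + 1,
                           max(0, j - radius):j + radius + 1]
            out[i, j] = window.mean()
    return out
```

A larger `radius` smooths more aggressively; near coastlines or mountain ridges this is exactly where, as noted above, the averaging can do damage by mixing dissimilar grid points.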


3.2


The Analogue Approach and Poor Man’s Ensemble

Instead of using modelled output, another approach is to use historical observations as forecasts. In its simplest form, previous years are randomly selected and used as an ensemble forecast, the “poor man’s ensemble.” An attractive feature of this approach is that it is, by construction, perfectly reliable. The analogue approach also reuses historical observations, but it makes the selection based on similarities between the observed weather and the historical catalogue (Obled et al. 2002; Radanovics et al. 2013; Wetterhall 2005). The analogues are usually selected based on large-scale parameters, such as pressure fields, winds, and atmospheric humidity. Analogue methods work best over smaller areas where there is a clear connection between the large-scale circulation and local variables, but they have also been used in country-wide applications. The main drawback is the assumed stationarity of the relationship between predictor and predictand, and the methods need long time series for calibration.
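At its core, analogue selection is a nearest-neighbour search in the space of large-scale predictors: rank the historical catalogue by similarity to today's fields and use the local observations of the k closest days as the ensemble. The sketch below uses a plain Euclidean distance and hypothetical array shapes; operational schemes use carefully optimized predictor domains and similarity scores (Radanovics et al. 2013):

```python
import numpy as np

def analogue_forecast(today_fields, catalogue_fields, catalogue_obs, k=10):
    """Return the local observations of the k historical days whose
    large-scale predictor fields are closest to today's.
    today_fields: (p,) flattened predictor vector;
    catalogue_fields: (days, p); catalogue_obs: (days,) local predictand."""
    dist = np.linalg.norm(catalogue_fields - today_fields, axis=1)
    nearest = np.argsort(dist)[:k]   # indices of the k best analogues
    return catalogue_obs[nearest]
```

The same routine with `catalogue_obs` drawn at random instead of rank-ordered by distance reduces to the poor man's ensemble, i.e., climatology.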

3.3

Model Output Statistics

Model output statistics (MOS; Wilks 2011) has traditionally been applied in NWP through regression techniques in order to correct systematic errors in the output. The methods have developed over the years, from simpler mean bias correction techniques to fully probabilistic post-processing techniques (Gneiting 2014). One limitation of MOS is that it requires long time series of consistent model output. This is often not the case for hydrometeorological forecasts, where model development is continual, with the effect that the operational output is nonstationary (i.e., not constant over time). These developments come, for example, through upgrades in the model physics, in the resolution, or in the initial conditions. This can be overcome if the NWP produces hindcasts which can be used for calibration (Hagedorn et al. 2008; Hamill et al. 2008), or by using the latest operational runs. The latter approach has the disadvantage of using only a small subsample of the historical period for the calibration, whereas hindcasts are usually run over a larger sample of the historical period. In the case of ECMWF, the hindcasts are run twice a week for the forecasts starting on a particular date over the previous 20 years. The most common correction is to remove the mean bias from the forecast in comparison with the observations. This is straightforward if the variables can be assumed to be normally distributed and the model error can be assumed linear. Quantile mapping (QM) is a technique where the cumulative distribution of the forecasted variable is adjusted to match that of the observed counterpart (Wood and Lettenmaier 2006; Wetterhall et al. 2012; Yang et al. 2010; Themeßl et al. 2011). A disadvantage of QM for hydrological applications is that, in a multivariate system, the pairing between the variables is disrupted (Madadgar et al. 2014).
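An empirical quantile mapping can be sketched in a few lines: look up the forecast value's nonexceedance probability in the model climatology and return the corresponding quantile of the observed climatology. This is a minimal illustration only; the function name, the plotting-position formula, and the absence of tail extrapolation are simplifying assumptions:

```python
import numpy as np

def quantile_map(fcst, fcst_clim, obs_clim):
    """Map forecast values onto the observed distribution: estimate each
    value's nonexceedance probability in the forecast climatology, then
    read off the same quantile of the observed climatology."""
    fcst = np.atleast_1d(np.asarray(fcst, dtype=float))
    sorted_fc = np.sort(fcst_clim)
    # Weibull-type plotting position; p stays strictly inside (0, 1)
    p = np.searchsorted(sorted_fc, fcst, side="right") / (len(sorted_fc) + 1)
    return np.quantile(np.sort(obs_clim), p)
```

For precipitation, QM is typically applied to wet days only, with the wet-day frequency handled separately, and, as noted above, applying it variable by variable breaks the inter-variable pairing.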


3.4


Bayesian Post-processing Methods

The most widely used methods in hydrological uncertainty analysis are Bayesian frameworks, which are used to derive the joint distribution of forecasts and observations. New data are used to update prior knowledge and likelihoods to provide conditional posterior distributions which estimate the predictive uncertainty of the predictands. Hydrological forecasting has used Bayesian uncertainty processors for a long time, and the methods cover a wide range of applications, such as Bayesian model averaging (BMA; Raftery et al. 2005), the ensemble dressing technique (Roulston and Smith 2003; Wang and Bishop 2005), and ensemble kernel density MOS. Many BMA methods post-process one variable at one location for a specific time. However, as discussed earlier, consistency between variables is important, and it is necessary to account for the joint probability distribution of the corrected variables. There are methods for addressing this, for example, the “Schaake shuffle,” which uses the rank order structure from past observations (Clark et al. 2004). Another approach is ensemble copula coupling (Schefzik et al. 2013), which uses the multivariate rank dependence structure of the raw ensemble to produce corrected ensembles without destroying the correlation between the variables. Madadgar et al. (2014) suggested using a multivariate copula approach to account for the joint behavior of the included variables.
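The Schaake shuffle itself is simple to sketch: for each variable, sort the corrected ensemble members and reorder them according to the ranks of a matched set of historical observations, so that the historical rank-dependence structure is imprinted on the corrected values. The code below is a minimal illustration of the idea in Clark et al. (2004), not their operational implementation (ties are ignored and the array layout is assumed):

```python
import numpy as np

def schaake_shuffle(corrected, historical):
    """Reorder each column (variable) of the corrected ensemble so that
    its ranks match the ranks of the historical observations.
    corrected, historical: (members, variables) arrays."""
    out = np.empty_like(corrected, dtype=float)
    for j in range(corrected.shape[1]):
        ranks = np.argsort(np.argsort(historical[:, j]))  # rank of each member
        out[:, j] = np.sort(corrected[:, j])[ranks]
    return out
```

The marginal values of each variable are untouched (only their pairing across variables changes), which is why the shuffle can be applied after any univariate correction such as quantile mapping.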

4

Future Challenges in Post-processing

Post-processing of NWP output has come a long way in theory and practice, and post-processing methods are now routinely used in many hydrological and meteorological institutes (Gneiting 2014). However, there are areas that still need to be addressed, and in this section the most important of them are highlighted:

In the case of flood forecasting, predictive uncertainty can be defined as the uncertainty that a decision maker has on the future evolution of a predictand that he uses to make a specific decision. (Coccia and Todini 2011)

4.1

Extreme Values

One of the most important aspects of hydrological impact modelling is extreme events, in terms of flash floods as well as riverine and groundwater floods. These events occur on different time scales and therefore naturally pose different demands on the post-processing. Flash floods caused by intense precipitation are mainly event driven, and the magnitude and location of convective storms are particularly challenging to predict, both because of the NWP’s inability to model convection and because of the rapid development of these thunderstorms. In this context, blending forecast model



output with, for example, remote sensing techniques (satellite, weather radar) has become increasingly common (Tafferner et al. 2008; Velasco-Forero et al. 2009). Modelling of extreme precipitation will most certainly improve in the future, since processes that previously were parameterized will, with higher resolution and increasing computer power, be resolved explicitly, thus enabling more realistic model output (Haiden et al. 2014). This is, however, not trivial, and it will still take many years before this is possible on a global scale. Statistical and dynamical downscaling are techniques that can bridge this gap in the meantime (Maraun et al. 2010).

4.2

Combining with Hydrological Post-processing

Studies have shown that post-processing the forecasted discharge is more efficient in terms of improving scores than post-processing the driving meteorological variables (Arheimer et al. 2011; Roulin and Vannitsem 2015). However, Roulin and Vannitsem (2015) did find that pre-processing the precipitation alone did not improve the resolution of the hydrological ensembles; good initial conditions were necessary to improve resolution, but post-processing could compensate for this. Post-processing of meteorological forecasts can be combined with hydrologic post-processing through data assimilation techniques applied for model state variable updating or output error correction. In those situations it is important to separate the improvements from the pre- and post-processing, for example, through multivariate analysis, in order to correctly attribute them to the individual treatments.

4.3

Toward a Seamless Forecasting System

The holy grail of meteo-hydrological forecasting is the idea of seamless prediction, with the basic concept of using one model system over all forecast horizons, in theory from nowcasting to multi-decadal forecasting (Pappenberger et al. 2013). The term “seamless” is misleading, since it assumes a forecast without “seams” between the modelling components. This idea is appealing but unrealistic and unnecessary, since the demands on complexity, dissemination, and temporal and spatial resolution of the forecast change with lead time. Therefore, “seemingly seamless,” which implies an unnoticeable transition between forecasts over time, more accurately describes the desired attributes of the forecast. The simplest form of seamless forecast concatenates different models of different spatial and temporal resolution. The resulting forecast will not be homogeneous, and biases will not be constant in time, which puts hard demands on the post-processing. MOS for different lead times (i.e., model versions) might overcome some of these problems; distribution mapping (Wetterhall et al. 2012; Yang et al. 2010) and quantile transformation are methods to address the biases. Di Giuseppe et al. (2013) proposed a mapping of the observed principal components on the modelled eigenvectors, which results in spatially unbiased forecasts. Temporal inconsistencies are more difficult to address in such a system, especially at the shorter lead times. The forecast models will inevitably have different initial conditions and weather developments, which can potentially lead to forecast jumps in the transition from one model to the other.

5

Summary

This chapter summarizes the hydrological needs of meteorological post-processing in order to make NWP model output useful for hydrological applications. It is important that post-processing not only removes structural errors but also preserves the temporal and spatial correlation between variables and produces reliable forecasts. The most important post-processing techniques and how they affect the NWP output are discussed. Finally, the research directions where more work is needed are highlighted.

References

L. Alfieri, P. Salamon, F. Pappenberger, F. Wetterhall, J. Thielen, Operational early warning systems for water-related hazards in Europe. Environ. Sci. Policy 21, 35–49 (2012)
B. Arheimer, G. Lindström, J. Olsson, A systematic review of sensitivities in the Swedish flood-forecasting system. Atmos. Res. 100, 275–284 (2011)
G. Balsamo, S. Boussetta, P. Lopez, L. Ferranti, Evaluation of ERA-Interim and ERA-Interim-GPCP-rescaled Precipitation Over the U.S.A. ERA Report Series (European Centre for Medium-Range Weather Forecasts, Reading, 2010)
A. Bárdossy, T. Das, Influence of rainfall observation network on model calibration and application. Hydrol. Earth Syst. Sci. 12, 77–89 (2008)
K. Beven, P. Smith, Concepts of information content and likelihood in parameter calibration for hydrological simulation models. J. Hydrol. Eng. 20(1) (2015). doi:10.1061/(ASCE)HE.1943-5584.0000991
M. Clark, S. Gangopadhyay, L. Hay, B. Rajagopalan, R. Wilby, The Schaake Shuffle: a method for reconstructing space–time variability in forecasted precipitation and temperature fields. J. Hydrometeorol. 5, 243–262 (2004)
H.L. Cloke, F. Pappenberger, Ensemble flood forecasting: a review. J. Hydrol. 375, 613–626 (2009)
G. Coccia, E. Todini, Recent developments in predictive uncertainty assessment based on the model conditional processor approach. Hydrol. Earth Syst. Sci. 15, 3253–3274 (2011)
D.P. Dee, S.M. Uppala, A.J. Simmons, P. Berrisford, P. Poli, S. Kobayashi, U. Andrae, M.A. Balmaseda, G. Balsamo, P. Bauer, P. Bechtold, A.C.M. Beljaars, L. van de Berg, J. Bidlot, N. Bormann, C. Delsol, R. Dragani, M. Fuentes, A.J. Geer, L. Haimberger, S.B. Healy, H. Hersbach, E.V. Hólm, L. Isaksen, P. Kållberg, M. Köhler, M. Matricardi, A.P. McNally, B.M. Monge-Sanz, J.J. Morcrette, B.K. Park, C. Peubey, P. de Rosnay, C. Tavolato, J.N. Thépaut, F. Vitart, The ERA-Interim reanalysis: configuration and performance of the data assimilation system. Q. J. Roy. Meteorol. Soc. 137, 553–597 (2011)
F. Di Giuseppe, F. Molteni, A.M. Tompkins, A rainfall calibration methodology for impacts modelling based on spatial mapping. Q. J. Roy. Meteorol. Soc. 139, 1389–1401 (2013)
H.R. Glahn, D.A. Lowry, The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteorol. 11, 1203–1211 (1972)
T. Gneiting, Calibration of Medium-Range Weather Forecasts. Technical Memorandum (ECMWF, Reading, 2014)
R. Hagedorn, T.M. Hamill, J.S. Whitaker, Probabilistic forecast calibration using ECMWF and GFS ensemble reforecasts. Part I: two-meter temperatures. Mon. Weather Rev. 136, 2608–2619 (2008)
T. Haiden, L. Magnusson, I. Tsonevsky, F. Wetterhall, L. Alfieri, F. Pappenberger, P. de Rosnay, J. Muñoz-Sabater, G. Balsamo, C. Albergel, R. Forbes, T. Hewson, S. Malardel, D. Richardson, ECMWF Forecast Performance During the June 2013 Flood in Central Europe (European Centre for Medium-Range Weather Forecasts, Reading, 2014)
T.M. Hamill, R. Hagedorn, J.S. Whitaker, Probabilistic forecast calibration using ECMWF and GFS ensemble reforecasts. Part II: precipitation. Mon. Weather Rev. 136, 2620–2632 (2008)
Y. He, F. Wetterhall, H.L. Cloke, F. Pappenberger, M. Wilson, J. Freer, G. McGregor, Tracking the uncertainty in flood alerts driven by grand ensemble weather predictions. Meteorol. Appl. 16, 91–101 (2009)
HEPEX, HEPEX-SIP Topic: Post-processing (1/3) (2015) [Online]. Available: http://hepex.irstea.fr/hepex-sip-topic-post-processing-13. Accessed 10 June 2015
V. Klemeš, Operational testing of hydrological simulation models. Hydrol. Sci. J. 31, 13–24 (1986)
F. Lalaurette, Early detection of abnormal weather conditions using a probabilistic extreme forecast index. Q. J. Roy. Meteorol. Soc. 129, 3037–3057 (2003)
K. Liechti, L. Panziera, U. Germann, M. Zappa, The potential of radar-based ensemble forecasts for flash-flood early warning in the southern Swiss Alps. Hydrol. Earth Syst. Sci. 17, 3853–3869 (2013)
Y. Liu, A.H. Weerts, M. Clark, H.J. Hendricks Franssen, S. Kumar, H. Moradkhani, D.J. Seo, D. Schwanenberg, P. Smith, A.I.J.M. van Dijk, N. van Velzen, M. He, H. Lee, S.J. Noh, O. Rakovec, P. Restrepo, Advancing data assimilation in operational hydrologic forecasting: progresses, challenges, and emerging opportunities. Hydrol. Earth Syst. Sci. 16, 3863–3887 (2012)
S. Madadgar, H. Moradkhani, D. Garen, Towards improved post-processing of hydrologic forecast ensembles. Hydrol. Process. 28, 104–122 (2014)
D. Maraun, F. Wetterhall, A.M. Ireson, R.E. Chandler, E.J. Kendon, M. Widmann, S. Brienen, H.W. Rust, T. Sauter, M. Themeßl, V.K.C. Venema, K.P. Chun, C.M. Goodess, R.G. Jones, C. Onof, M. Vrac, I. Thiele-Eich, Precipitation downscaling under climate change: recent developments to bridge the gap between dynamical models and the end user. Rev. Geophys. 48, RG3003 (2010)
J. Murphy, An evaluation of statistical and dynamical techniques for downscaling local climate. J. Climate 12, 2256–2284 (1999)
C. Obled, G. Bontron, R. Garçon, Quantitative precipitation forecasts: a statistical adaptation of model outputs through an analogues sorting approach. Atmos. Res. 63, 303–324 (2002)
F. Pappenberger, F. Wetterhall, E. Dutra, F. Di Giuseppe, K. Bogner, L. Alfieri, H.L. Cloke, Seamless Forecasting of Extreme Events on a Global Scale (IAHS-IAPSO-IASPEI Assembly, Gothenburg, 2013)
S. Radanovics, J.P. Vidal, E. Sauquet, A. Ben Daoud, G. Bontron, Optimising predictor domains for spatially coherent precipitation downscaling. Hydrol. Earth Syst. Sci. 17, 4189–4208 (2013)
A.E. Raftery, T. Gneiting, F. Balabdaoui, M. Polakowski, Using Bayesian model averaging to calibrate forecast ensembles. Mon. Weather Rev. 133, 1155–1174 (2005)
E. Roulin, S. Vannitsem, Post-processing of medium-range probabilistic hydrological forecasting: impact of forcing, initial conditions and model errors. Hydrol. Process. 29, 1434–1449 (2015)
M.S. Roulston, L.A. Smith, Combining dynamical and statistical ensembles. Tellus A 55, 16–30 (2003)
P. Salamon, L. Feyen, Disentangling uncertainties in distributed hydrological modeling using multiplicative error models and sequential data assimilation. Water Resour. Res. 46 (2010)
J. Schaake, J. Pailleux, J. Thielen, R. Arritt, T. Hamill, L. Luo, E. Martin, D. McCollor, F. Pappenberger, Summary of recommendations of the first workshop on postprocessing and downscaling atmospheric forecasts for hydrologic applications held at Météo-France, Toulouse, France, 15–18 June 2009. Atmos. Sci. Lett. 11, 59–63 (2010)
R. Schefzik, T.L. Thorarinsdottir, T. Gneiting, Uncertainty quantification in complex simulation models using ensemble copula coupling. Stat. Sci. 28, 616–640 (2013)
J.O. Skøien, K. Bogner, P. Salamon, P. Smith, F. Pappenberger, Regionalization of post-processed ensemble runoff forecasts. Proc. IAHS 373, 109–114 (2016)
P.J. Smith, L. Panziera, K.J. Beven, Forecasting flash floods using data-based mechanistic models and NORA radar rainfall forecasts. Hydrol. Sci. J. 59, 1403–1417 (2014)
A. Tafferner, C. Forster, M. Hagen, C. Keil, T. Zinner, H. Volkert, Development and propagation of severe thunderstorms in the upper Danube catchment area: towards an integrated nowcasting and forecasting system using real-time data and high-resolution simulations. Meteorol. Atmos. Phys. 101, 211–227 (2008)
M.J. Themeßl, A. Gobiet, A. Leuprecht, Empirical-statistical downscaling and error correction of daily precipitation from regional climate models. Int. J. Climatol. 31, 1530–1544 (2011)
C.A. Velasco-Forero, D. Sempere-Torres, E.F. Cassiraga, J. Jaime Gómez-Hernández, A non-parametric automatic blending methodology to estimate rainfall fields from rain gauge and radar data. Adv. Water Resour. 32, 986–1002 (2009)
X. Wang, C.H. Bishop, Improvement of ensemble reliability with a new dressing kernel. Q. J. Roy. Meteorol. Soc. 131, 965–986 (2005)
F. Wetterhall, Statistical Downscaling of Precipitation from Large-Scale Atmospheric Circulation. PhD thesis (Uppsala University, Uppsala, 2005)
F. Wetterhall, F. Pappenberger, Y. He, J. Freer, H.L. Cloke, Conditioning model output statistics of regional climate model precipitation on circulation patterns. Nonlinear Processes Geophys. 19, 623–633 (2012)
F. Wetterhall, H.C. Winsemius, E. Dutra, M. Werner, E. Pappenberger, Seasonal predictions of agrometeorological drought indicators for the Limpopo basin. Hydrol. Earth Syst. Sci. 19, 2577–2586 (2015)
D.S. Wilks, Statistical Methods in the Atmospheric Sciences, 3rd edn. (Elsevier, Burlington, 2011)
A.W. Wood, D.P. Lettenmaier, A test bed for new seasonal hydrologic forecasting approaches in the western United States. Bull. Am. Meteorol. Soc. 87, 1699–1712 (2006)
W. Yang, J. Andreasson, P. Graham, J. Olsson, J. Rosberg, F. Wetterhall, Distribution-based scaling to improve usability of regional climate model projections for hydrological climate change impacts studies. Hydrol. Res. 41, 211–229 (2010)
X. Yuan, E.F. Wood, Z. Ma, A review on climate-model-based seasonal hydrologic forecasting: physical understanding and system development. Wiley Interdiscip. Rev. Water 2, 523–536 (2015)
E. Zsoter, Recent developments in extreme weather forecasting. ECMWF Newsletter (Reading, 2006)

Application to Post-processing of Meteorological Seasonal Forecasting Andrew Schepen, Q. J. Wang, and David E. Robertson

Contents

1 Introduction
2 Methods
  2.1 An Overview of CBaM
  2.2 The Bayesian Joint Probability Modeling Approach
  2.3 Model Parameter Inference
  2.4 Forecasting
  2.5 Forecast Merging Through Bayesian Model Averaging
  2.6 Connecting Ensemble Members
3 Application of CBaM for Forecasting Catchment Rainfall
  3.1 Overview
  3.2 The GCM
  3.3 The Catchment
  3.4 Predictor Variables
  3.5 Predictand Variable
  3.6 Extending Forecasts to Longer Lead Times
  3.7 Verification
4 Application Results
  4.1 Overview
  4.2 Overall Bias
  4.3 Skill of Monthly Forecasts
  4.4 Skill of Three-Month Accumulation Forecasts
  4.5 Statistical Reliability
  4.6 Discussion of Results
References

A. Schepen, CSIRO Land and Water, Dutton Park, Queensland, Australia
Q.J. Wang (*), CSIRO Land and Water, Highett, VIC, Australia, e-mail: [email protected]
D.E. Robertson, CSIRO Land and Water, Clayton, Victoria, Australia
© Springer-Verlag Berlin Heidelberg 2016
Q. Duan et al. (eds.), Handbook of Hydrometeorological Ensemble Forecasting, DOI 10.1007/978-3-642-40457-3_18-1


A. Schepen et al.

Abstract

Seasonal hydrological forecasting relies on accurate and reliable ensemble climate forecasts. A calibration, bridging, and merging (CBaM) method has been developed to statistically postprocess seasonal climate forecasts from general circulation models (GCMs). Postprocessing corrects conditional biases in raw GCM outputs and produces forecasts that are reliable in ensemble spread. The CBaM method is designed to extract as much skill as possible from the GCM. This is achieved by firstly producing multiple forecasts using different GCM output fields, such as rainfall, temperature, and sea surface temperatures, as predictors. These forecasts are then combined based on evidence of skill in hindcasts. Calibration refers to direct postprocessing of the target variable – rainfall for example. Bridging refers to indirect forecasting of the target variable – forecasting rainfall with the GCM’s Nino3.4 forecast for example. Merging is designed to optimally combine calibration and bridging forecasts. Merging includes connecting forecast ensemble members across forecast time periods by using the “Schaake Shuffle,” which creates time series forecasts with appropriate temporal correlation structure. CBaM incorporates parameter and model uncertainty, leading to reliable forecasts in most applications. Here, CBaM is applied to produce monthly catchment rainfall forecasts out to 12 months for a catchment in northeastern Australia. Bridging is shown to improve forecast skill in several seasons, and the ensemble time series forecasts are shown to be reliable for both monthly and seasonal totals.

Keywords

Seasonal forecasting • Post-processing • Bayesian joint probability • Bayesian model averaging • Precipitation • Temperature • Forecast verification

1 Introduction

Seasonal climate forecasting centers around the world now routinely run coupled ocean–atmosphere general circulation models (GCMs). GCMs are similar to numerical weather prediction (NWP) models, in that they solve physical equations numerically to produce forecasts of meteorological variables. GCMs have simplified or parameterized physics and are run on coarse grids but include coupling of ocean and atmosphere modules. The main aim of running GCMs is to produce intra- to inter-seasonal forecasts driven by slowly evolving boundary conditions such as sea surface temperatures. The development of GCMs has interested many in the streamflow forecasting community. A number of organizations around the world now officially produce seasonal streamflow forecasts. For example, the Australian Bureau of Meteorology releases probabilistic forecasts of the next 3 months’ streamflow at the beginning of each month (Tuteja et al. 2012). The forecasts released in Australia are based on a statistical model; however, plans are in place to also deliver dynamic forecasts based on hydrological models and recent advances in GCMs.

Application to Post-processing of Meteorological Seasonal Forecasting


In an ideal world, it would be convenient to take forecast variables from GCMs and input them directly into hydrological models. A problem with GCM forecast variables is that they are often biased relative to observations. This is largely due to simplifications and parameterizations that are unavoidable elements of a GCM. It is also due to the coarse scale of the GCMs. For example, GCMs can sometimes have difficulty simulating rainfall effects associated with locally complex or steep terrain. Furthermore, beyond about 10 days from initialization, deterministic forecasts from a GCM are not necessarily realistic, due to the chaotic nature of the climate system. GCMs are therefore almost always run in ensemble mode. That is, multiple simulations are obtained by perturbing the initial conditions, using modified physics, or using initial conditions from different points in time. Often, the ensemble forecasts produced using any of these methods are incorrectly dispersed and therefore convey forecast uncertainty incorrectly. For an example see Lim et al. (2011). Most frequently, the ensembles are underdispersed, although it is quite possible for ensembles to be correctly dispersed or even overdispersed. Incorrectly dispersed ensembles fail to give reasonable estimates of forecast uncertainty. If GCM ensembles are to be used in hydrological models, it is necessary to postprocess the output to overcome biases and incorrectly dispersed ensembles. The severity of the problems is often a function of forecast lead time. For example, the longer a GCM run is, the more likely the ensemble mean of a forecast variable is to drift from the mean of observed values. There are many approaches to post-processing in the literature.
One such example is an analog downscaling method (Shao and Li 2013; Timbal and Jones 2008) that analyzes spatial patterns of the GCM forecasts of meteorological variables and then resamples the variable of interest, daily, from the historical record, based on a Euclidean distance metric. However, seasonal forecasting is usually concerned with time scales of weeks to months. It is therefore worthwhile to ask the question – is it better to post-process daily or monthly variables? There are a number of reasons in favor of postprocessing monthly variables. Predictability at the seasonal time scale arises due to the slowly evolving (low-frequency) climate signals. Monthly variables will therefore be less influenced by high-frequency weather noise and exhibit a stronger relationship with the source of predictability. After capturing the low-frequency climate signals in monthly post-processing, variables can be disaggregated to daily by, for example, using stochastic weather generators. In addition, monthly postprocessing requires significantly fewer computational resources. Monthly approaches to post-processing GCM outputs have been demonstrated by Schepen and Wang (2014) and Hawthorne et al. (2013). From a streamflow forecasting point of view, monthly rainfall forecasts can be directly used as input to a monthly hydrological model. Wang et al. (2011b) found that simulations of monthly streamflow volumes from the monthly water partition and balance (WAPABA) model are as skillful as those produced by two of the most widely used daily hydrological models in Australia for intra- to inter-seasonal forecasting of streamflows; it may therefore be sufficient to model at the monthly time step.


In this section, the Bayesian method of Schepen and Wang (2014) for postprocessing monthly GCM outputs will be presented. The method is referred to as CBaM, which stands for calibration, bridging, and merging. Calibration is focused on correcting GCM forecasts of variables that may be biased and unreliable in conveying uncertainty through ensemble spread. Bridging forecasts are produced by using GCM forecasts of climate indices, such as sea surface temperature anomalies, to indirectly forecast variables. Sometimes, the GCM is unable to translate skillful forecasts of climate indices into skillful forecasts of regional variables. Bridging can overcome this problem. Merging is designed to optimally combine calibration and bridging forecasts. Merging includes connecting forecast ensemble members across forecast time periods by using the “Schaake Shuffle,” which creates time series forecasts with appropriate temporal correlation structure (Clark et al. 2004). The method builds on the earlier Bayesian joint probability modeling approach of Wang et al. (2009) and Wang and Robertson (2011), and the Bayesian model averaging (BMA) approach of Wang et al. (2012b). The method will be outlined in detail to support implementation. An example application of CBaM to postprocess catchment rainfall in northeastern Australia will highlight the suitability of the approach for producing time series forecasts in the form of ensembles. The example will focus on 12-month forecasts of monthly rainfall and seasonal accumulations.

2 Methods

2.1 An Overview of CBaM

The calibration, bridging, and merging method is underpinned by a number of separate steps. The first step in CBaM is to establish multiple statistical models that relate predictor variables derived from coupled GCM output fields to observed variables. A Bayesian joint probability modeling approach (Wang and Robertson 2011; Wang et al. 2009) is applied for this purpose. The model is referred to as either a calibration model or a bridging model depending on the predictor variable in the model. This distinction will be expanded upon in the next section. The second step in CBaM is to reforecast the variable of interest (the predictand) using each of the calibration and bridging models. This is done by casting the calibration and bridging models as forecasting models by conditioning the joint probability models on predictor values. The third step in CBaM is to infer a weight for each calibration and bridging model based on historical performance. Historically better performing models receive higher weight. Bayesian model averaging (Wang et al. 2012b) is applied for this purpose. The fourth step in CBaM is to merge the forecasts from multiple calibration and bridging models based on the weights. Bayesian model averaging is applied to merge the ensemble forecasts.


The fifth and final step in CBaM is to connect the ensemble members of forecasts for individual months to form ensemble time series forecasts. The Schaake Shuffle (Clark et al. 2004) is applied so that the ensemble time series have a realistic temporal correlation structure. This final step is important for variables that have a strong temporal correlation structure.
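The five steps above can be sketched end to end. The following is a deliberately simplified stand-in, not the CBaM implementation: linear-Gaussian fits replace the BJP models, a simple likelihood-based weighting replaces full Bayesian model averaging, and all function names and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def fit_model(x, y):
    # Step 1 stand-in: a linear-Gaussian fit in place of a BJP model.
    a, b = np.polyfit(x, y, 1)
    sd = np.std(y - (a * x + b)) + 1e-9
    return a, b, sd

def loglik(model, x, y):
    a, b, sd = model
    resid = y - (a * x + b)
    return np.sum(-0.5 * (resid / sd) ** 2 - np.log(sd))

def forecast(model, x_new, n):
    # Step 2: reforecast the predictand conditional on a predictor value.
    a, b, sd = model
    return a * x_new + b + rng.normal(0.0, sd, n)

# Synthetic "hindcast" data: obs depend on the calibration predictor only.
x_cal = rng.normal(size=30)                  # e.g., GCM rainfall predictor
x_brg = rng.normal(size=30)                  # e.g., GCM SST-index predictor
obs = 2.0 * x_cal + rng.normal(0.0, 0.5, 30)
models = {"calibration": (fit_model(x_cal, obs), x_cal),
          "bridging": (fit_model(x_brg, obs), x_brg)}

# Step 3 stand-in: weight models by historical (in-sample) likelihood.
ll = np.array([loglik(m, x, obs) for m, x in models.values()])
w = np.exp(ll - ll.max())
w /= w.sum()

# Step 4: merge by pooling members in proportion to the weights.
n_members = 1000
merged = np.concatenate([forecast(m, 0.3, int(round(wk * n_members)))
                         for (m, _), wk in zip(models.values(), w)])
# Step 5, the Schaake Shuffle, is described in Sect. 2.6.
```

Because the synthetic observations are driven by the calibration predictor, the calibration model receives nearly all of the weight, illustrating how merging concentrates on the historically better model.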

2.2 The Bayesian Joint Probability Modeling Approach

2.2.1 Overview
Wang et al. (2009) and Wang and Robertson (2011) developed a sophisticated Bayesian joint probability (BJP) modeling approach to address many of the vexatious problems that arise in the statistical modeling of environmental data. The BJP modeling approach is able to handle data with missing records, data records of different lengths, skewed data, heterogeneous error structures, and bounded variables. It is thus ideal for post-processing hydroclimatological data. At the core of the BJP modeling approach is a multivariate normal distribution. Use of a symmetric continuous multivariate distribution eases the statistical modeling and overcomes the problems listed above, as will be described below. To keep the presentation as simple as possible, the formulation of BJP models given here is for a single predictor and a single predictand, i.e., a bivariate formulation. The full multivariate formulation of BJP models is given by Wang et al. (2009) and Wang and Robertson (2011).

2.2.2 Model Setup
Consider a predictor variable $y_1$ and a predictand variable $y_2$, such that

$$y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}. \tag{1}$$

The relationship of the variables can be modeled as a bivariate distribution. Meteorological and hydrological variables can sometimes have a skewed distribution. For example, while temperature typically follows a normal distribution, rainfall and streamflow often have right-skewed distributions. The BJP modeling approach allows for this by incorporating data transformations. It is thus the distribution of the transformed variables that is assumed to follow a continuous bivariate normal distribution. The transformed predictor and predictand variables are $\hat{y}_1$ and $\hat{y}_2$, such that

$$\hat{y} = \begin{pmatrix} \hat{y}_1 \\ \hat{y}_2 \end{pmatrix} \sim N(\mu, \Sigma) \tag{2}$$

where $\mu$ is the vector of variable means and $\Sigma$ is the covariance matrix. That is,


$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} \tag{3}$$

and

$$\Sigma = \begin{pmatrix} \sigma_1^2 & r\sigma_1\sigma_2 \\ r\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} \tag{4}$$

where $\sigma_i$ is a standard deviation and $r$ is the correlation coefficient.

In CBaM, if y1 and y2 refer to the same variable (e.g., y1 is a temperature forecast and y2 is observed temperature), then the model is a calibration model. If y1 and y2 refer to different variables (e.g., y1 is a sea surface temperature index forecast and y2 is observed temperature), then the model is a bridging model. Apart from the conceptual difference, the distinction is merely for convenience, as the BJP models are otherwise formulated identically.

There are two transformations commonly used in the BJP modeling approach. For variables like rainfall and streamflow that can be quite skewed and which are bounded below by zero, a two-parameter log-sinh transformation is preferred (Wang et al. 2012a). More specifically,

$$\hat{y} = \frac{1}{\beta} \ln\left[\sinh(\alpha + \beta y)\right]. \tag{5}$$

For other variables the log-sinh transformation is not suitable. For example, the log-sinh transformation cannot be used with sea surface temperature (SST) climate indices due to the presence of negative values in the data. A single-parameter Yeo–Johnson transformation (Yeo and Johnson 2000) can be used instead. In this case,

$$\hat{y} = \begin{cases} \left[(y+1)^{\lambda} - 1\right]/\lambda & \lambda \neq 0,\; y \geq 0 \\ \log(y+1) & \lambda = 0,\; y \geq 0 \\ -\left[(-y+1)^{2-\lambda} - 1\right]/(2-\lambda) & \lambda \neq 2,\; y < 0 \\ -\log(-y+1) & \lambda = 2,\; y < 0. \end{cases} \tag{6}$$
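The two transformations (Eqs. 5 and 6) can be sketched directly. The parameter values below are illustrative only; in practice $\alpha$, $\beta$, and $\lambda$ are inferred together with the other model parameters.

```python
import numpy as np

def log_sinh(y, alpha, beta):
    # Eq. 5: two-parameter log-sinh transformation for skewed,
    # zero-bounded variables such as rainfall and streamflow.
    return np.log(np.sinh(alpha + beta * np.asarray(y, dtype=float))) / beta

def yeo_johnson(y, lam):
    # Eq. 6: one-parameter Yeo-Johnson transformation; unlike log-sinh,
    # it accommodates negative values (e.g., SST climate indices).
    y = np.asarray(y, dtype=float)
    pos = y >= 0
    out = np.empty_like(y)
    out[pos] = (((y[pos] + 1.0) ** lam - 1.0) / lam
                if lam != 0 else np.log(y[pos] + 1.0))
    out[~pos] = (-((-y[~pos] + 1.0) ** (2.0 - lam) - 1.0) / (2.0 - lam)
                 if lam != 2 else -np.log(-y[~pos] + 1.0))
    return out
```

Both transformations are monotonic, which is what allows the censoring and back-transformation steps described later to be applied cleanly.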

In the BJP modeling approach, the predictor and predictand variables do not have to follow the same transformation. In CBaM this most commonly occurs in bridging models. For example, an SST predictor variable will follow a Yeo–Johnson transformation, whereas a rainfall predictand variable will follow a log-sinh transformation.

Meteorological and hydrological variables sometimes have a lower bound (e.g., rainfall and streamflow have a natural lower bound of zero). This is problematic, as a probability mass at a lower bound implies a mixed discrete–continuous distribution, which is not compatible with the use of the bivariate normal distribution. The solution to this problem is to treat the occurrences of values equal to the lower bound as censored data. The lower bounds corresponding to the variables y1 and y2 are denoted c1 and c2, respectively, such that

$$c = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}. \tag{7}$$

When a value of $y_i$ is equal to $c_i$, the assignment $y_i \leq c_i$ is made. In other words, the variable is hypothetically allowed to be below the actual lower bound, but its precise value is unknown.

2.3 Model Parameter Inference

2.3.1 Posterior Distribution of the Parameters and Numerical Sampling
The model parameters, including the bivariate normal distribution parameters and the transformation parameters for each variable, are collected in $\theta$. A Bayesian inference of all the parameters in $\theta$ is made with historical data records for events in $y_D = \left\{ y_D^1, y_D^2, \ldots, y_D^t, \ldots, y_D^T \right\}$. This inference accounts for parameter uncertainty. In application of CBaM to the problem of monthly and seasonal forecasting, parameter uncertainty is considerable due to the limited data available for use in model parameter inference. A typical number for T is about 30. According to Bayes’ theorem, the posterior distribution of the model parameters is defined by

$$p(\theta \mid y_D) \propto p(\theta)\, p(y_D \mid \theta) = p(\theta) \prod_{t=1}^{T} p\!\left(y_D^t \mid \theta\right) \tag{8}$$

where $p(\theta)$ is a prior distribution of the parameters and $\prod_{t=1}^{T} p\!\left(y_D^t \mid \theta\right)$ is the likelihood function defining the probability of observing the historical data $y_D$ given the model and its parameter set. A numerical solution to Eq. 8 is required. A full Bayesian inference of the joint distribution of the model parameters is performed using Markov chain Monte Carlo (MCMC) sampling. More specifically, the Metropolis algorithm with a symmetric, Gaussian proposal distribution is applied to draw samples from the posterior distribution. In CBaM it is typical to retain 1000 sets of parameters from MCMC sampling. Inferences about the parameters can then be made from the drawn samples. Further details of the MCMC sampling used in CBaM are given in the subsequent sections.
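The sampler just described can be sketched as a plain random-walk Metropolis routine. This is an illustrative stand-in, not the operational CBaM code: the target below is a toy log-posterior, and the function name and arguments are assumptions.

```python
import numpy as np

def metropolis(log_post, theta0, step_sd, n_iter, rng):
    # Random-walk Metropolis with a symmetric Gaussian proposal,
    # as used (with problem-specific detail) for the BJP inference.
    theta = np.array(theta0, dtype=float)
    lp = log_post(theta)
    samples = np.empty((n_iter, theta.size))
    for i in range(n_iter):
        proposal = theta + rng.normal(0.0, step_sd, theta.shape)
        lp_prop = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
        samples[i] = theta
    return samples

# Toy target: a standard bivariate normal log-posterior.
rng = np.random.default_rng(1)
draws = metropolis(lambda th: -0.5 * np.sum(th ** 2), np.zeros(2), 0.5, 4000, rng)
```

In CBaM, roughly 1000 retained parameter sets would be obtained from such a chain after discarding an initial burn-in period.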

2.3.2 Prior Distribution of the Parameters
The prior distribution of the parameters is given by

$$p(\theta) = p(r) \prod_{i=1}^{2} p(\Delta_i)\, p\!\left(\mu_i, \sigma_i^2\right) \tag{9}$$

where $\Delta_i$ is the set of model transformation parameters. If a variable is under a log-sinh transformation, then the set of transformation parameters is

$$\Delta_i = \{\alpha_i, \beta_i\}. \tag{10}$$

If a variable is under a Yeo–Johnson transformation, then the set of transformation parameters is

$$\Delta_i = \{\lambda_i\}. \tag{11}$$

A uniform prior is specified for each of the transformation parameters. That is,

$$p(\Delta_i) \propto 1. \tag{12}$$

The prior for the model parameters $\mu_i$ and $\sigma_i^2$ takes the simplest form commonly used for mean and variance (Gelman et al. 1995):

$$p\!\left(\mu_i, \sigma_i^2\right) = 1/\sigma_i^2. \tag{13}$$

The prior for $r$ is the marginally uniform prior (Barnard et al. 2000). In the bivariate case, it reduces simply to

$$p(r) \propto 1. \tag{14}$$

2.3.3 Likelihood Function
The calculation of the likelihood function differs for combinations of variables above and below their respective censor thresholds. For each event time period t, the calculation of the likelihood function is given by (the superscript t is dropped for simplicity)

$$p(y \mid \theta) = \begin{cases} p(y_1, y_2 \mid \theta) = J_{\hat{y}_1 \to y_1} J_{\hat{y}_2 \to y_2}\, p(\hat{y}_1, \hat{y}_2 \mid \theta) & y_1 > c_1,\; y_2 > c_2 \\ p(y_1 \leq c_1, y_2 \mid \theta) = J_{\hat{y}_2 \to y_2}\, p(\hat{y}_1 \leq \hat{c}_1, \hat{y}_2 \mid \theta) & y_1 \leq c_1,\; y_2 > c_2 \\ p(y_1, y_2 \leq c_2 \mid \theta) = J_{\hat{y}_1 \to y_1}\, p(\hat{y}_1, \hat{y}_2 \leq \hat{c}_2 \mid \theta) & y_1 > c_1,\; y_2 \leq c_2 \\ p(y_1 \leq c_1, y_2 \leq c_2 \mid \theta) = p(\hat{y}_1 \leq \hat{c}_1, \hat{y}_2 \leq \hat{c}_2 \mid \theta) & y_1 \leq c_1,\; y_2 \leq c_2 \end{cases} \tag{15}$$

where $J_{\hat{y}_i \to y_i}$ is the Jacobian determinant for the transformation from $\hat{y}_i$ to $y_i$. If the log-sinh transformation is applied to $y_i$, then

$$J_{\hat{y}_i \to y_i} = \coth(\alpha_i + \beta_i y_i). \tag{16}$$

If the Yeo–Johnson transformation is applied to $y_i$, then

$$J_{\hat{y}_i \to y_i} = \begin{cases} (y_i + 1)^{\lambda_i - 1} & y_i \geq 0 \\ (-y_i + 1)^{1 - \lambda_i} & y_i < 0. \end{cases} \tag{17}$$

The remaining terms in Eq. 15 are given by

$$p(\hat{y}_1 \leq \hat{c}_1, \hat{y}_2 \mid \theta) = p(\hat{y}_2 \mid \theta) \int_{-\infty}^{\hat{c}_1} p(\hat{y}_1 \mid \hat{y}_2, \theta)\, d\hat{y}_1; \tag{18}$$

$$p(\hat{y}_1, \hat{y}_2 \leq \hat{c}_2 \mid \theta) = p(\hat{y}_1 \mid \theta) \int_{-\infty}^{\hat{c}_2} p(\hat{y}_2 \mid \hat{y}_1, \theta)\, d\hat{y}_2; \tag{19}$$

and

$$p(\hat{y}_1 \leq \hat{c}_1, \hat{y}_2 \leq \hat{c}_2 \mid \theta) = \int_{-\infty}^{\hat{c}_1} \int_{-\infty}^{\hat{c}_2} p(\hat{y}_1, \hat{y}_2 \mid \theta)\, d\hat{y}_1\, d\hat{y}_2. \tag{20}$$

The standard bivariate conditional distribution is applied to obtain $p(\hat{y}_1 \mid \hat{y}_2, \theta)$ and $p(\hat{y}_2 \mid \hat{y}_1, \theta)$. For example,

$$p(\hat{y}_1 \mid \hat{y}_2, \theta) \sim N\!\left(\mu_{\hat{y}_1} + r \frac{\sigma_{\hat{y}_1}}{\sigma_{\hat{y}_2}}\left(\hat{y}_2 - \mu_{\hat{y}_2}\right),\; \left(1 - r^2\right)\sigma_{\hat{y}_1}^2\right). \tag{21}$$
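The four censoring cases of Eq. 15, together with Eqs. 18, 19, 20, and 21, can be sketched with SciPy's normal distributions. This is a simplified illustration, not the authors' code: the Jacobian factors of Eqs. 16 and 17 are omitted (the inputs are assumed already transformed), and the function name and argument layout are assumptions.

```python
import numpy as np
from scipy import stats

def censored_bivariate_loglik(y1, y2, c1, c2, mu, sd, r):
    # Log-likelihood of one transformed-space data pair under Eq. 15.
    m1, m2 = mu
    s1, s2 = sd
    cov = np.array([[s1**2, r * s1 * s2], [r * s1 * s2, s2**2]])
    if y1 > c1 and y2 > c2:
        # Both uncensored: bivariate normal density.
        return stats.multivariate_normal(mean=[m1, m2], cov=cov).logpdf([y1, y2])
    if y1 <= c1 and y2 > c2:
        # Eq. 18: marginal of y2 times P(y1 <= c1 | y2), via Eq. 21.
        cm = m1 + r * (s1 / s2) * (y2 - m2)
        cs = np.sqrt(1.0 - r**2) * s1
        return stats.norm(m2, s2).logpdf(y2) + stats.norm(cm, cs).logcdf(c1)
    if y1 > c1 and y2 <= c2:
        # Eq. 19: marginal of y1 times P(y2 <= c2 | y1).
        cm = m2 + r * (s2 / s1) * (y1 - m1)
        cs = np.sqrt(1.0 - r**2) * s2
        return stats.norm(m1, s1).logpdf(y1) + stats.norm(cm, cs).logcdf(c2)
    # Eq. 20: both censored - bivariate normal CDF at (c1, c2).
    return stats.multivariate_normal(mean=[m1, m2], cov=cov).logcdf([c1, c2])
```

With $r = 0$ the doubly censored case factorizes into the product of two univariate tail probabilities, which provides a convenient sanity check.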

2.3.4 Reparameterization of Model Parameters
It is well documented that attempts to sample model parameters using the Metropolis algorithm with a Gaussian proposal can be problematic if some of the parameters are nonlinearly related. The problem manifests as low Metropolis acceptance rates (Thyer et al. 2002; Wang and Robertson 2011) and thus inefficient sampling. Therefore, a number of parameters are reparameterized to ease the MCMC inference and make for more efficient sampling. The parameters $\mu$ and $\sigma$ are reparameterized to $m$ and $s$ through first-order Taylor series approximations, where $m$ and $s$ are approximations of the mean and standard deviation of $y$, respectively. Where the log-sinh transformation is applied to a variable, the reparameterization of $\mu$ and $\sigma$ is as follows:

$$\mu = \frac{1}{\beta} \ln\left[\sinh(\alpha + \beta m)\right] \tag{22}$$

and

$$\sigma = \coth(\alpha + \beta m)\, s. \tag{23}$$

The Jacobian of the reparameterization is

$$J_{\mu, \sigma^2 \to m, s^2} = \left[\coth(\alpha + \beta m)\right]^3. \tag{24}$$

Where the Yeo–Johnson transformation is applied to a variable, the reparameterization of $\mu$ and $\sigma$ is as follows:

$$\mu = \begin{cases} \left[(m+1)^{\lambda} - 1\right]/\lambda & \lambda \neq 0,\; m \geq 0 \\ \log(m+1) & \lambda = 0,\; m \geq 0 \\ -\left[(-m+1)^{2-\lambda} - 1\right]/(2-\lambda) & \lambda \neq 2,\; m < 0 \\ -\log(-m+1) & \lambda = 2,\; m < 0 \end{cases} \tag{25}$$

and

$$\sigma = \begin{cases} (m+1)^{\lambda-1}\, s & m \geq 0 \\ (-m+1)^{1-\lambda}\, s & m < 0. \end{cases} \tag{26}$$

The Jacobian of the reparameterization is

$$J_{\mu, \sigma^2 \to m, s^2} = \begin{cases} (m+1)^{3\lambda-3} & m \geq 0 \\ (-m+1)^{3-3\lambda} & m < 0. \end{cases} \tag{27}$$

2.5 Forecast Merging Through Bayesian Model Averaging

The prior on the model weights is specified with a concentration parameter $a > 1$. In CBaM, $a$ is a function of the number of models; more specifically, $a = 1 + a_0/K$. A typical value for $a_0$ is 0.5 or 1.0. Such a prior helps stabilize the weights in the presence of significant sampling variability. In the calculation of the likelihood function, the set of time periods $\mathcal{T} = \{1, 2, \ldots, t, \ldots, T\}$ is partitioned into two sets. One set, $\mathcal{T}_1$, contains time periods for which $y_{D,2}^t > c_2$. The second set, $\mathcal{T}_2$, contains time periods for which $y_{D,2}^t \leq c_2$. The calculation of the likelihood function is given by

$$p\!\left(y_{D,2}^t,\, t = 1, 2, \ldots, T \,\middle|\, w_k,\, f_k^t(y_2),\, k = 1, 2, \ldots, K\right) = \prod_{t \in \mathcal{T}_1} f_{\mathrm{BMA}}^t\!\left(y_{D,2}^t\right) \prod_{t \in \mathcal{T}_2} F_{\mathrm{BMA}}^t\!\left(y_{D,2}^t = c_2\right) \tag{42}$$

where

$$f_{\mathrm{BMA}}^t\!\left(y_{D,2}^t\right) = \sum_{k=1}^{K} w_k\, f_k^t\!\left(y_{D,2}^t\right) \tag{43}$$

and

$$F_{\mathrm{BMA}}^t\!\left(y_{D,2}^t = c_2\right) = \sum_{k=1}^{K} w_k\, F_k^t\!\left(y_{D,2}^t = c_2\right) = \sum_{k=1}^{K} w_k \int_{-\infty}^{c_2} f_k^t(y_2)\, dy_2. \tag{44}$$

A point estimate of the weights is obtained by maximizing the posterior distribution of the weights. This is the maximum a posteriori (MAP) solution, which can be found by using an iterative expectation–maximization (EM) algorithm (Cheng et al. 2006; Zivkovic and van der Heijden 2004). The algorithm is as follows.


Given weights $w_k^{(j)}$, $k = 1, \ldots, K$ at iteration $j$, first, $O_k^{t,(j+1)}$ is calculated for all $t$ and $k$, where

$$O_k^{t,(j+1)} = \begin{cases} \dfrac{w_k^{(j)}\, f_k^t\!\left(y_{D,2}^t\right)}{\sum_{m=1}^{K} w_m^{(j)}\, f_m^t\!\left(y_{D,2}^t\right)} & y_{D,2}^t > c_2 \\[3ex] \dfrac{w_k^{(j)}\, F_k^t\!\left(y_{D,2}^t = c_2\right)}{\sum_{m=1}^{K} w_m^{(j)}\, F_m^t\!\left(y_{D,2}^t = c_2\right)} & y_{D,2}^t \leq c_2. \end{cases} \tag{45}$$

Then, new weights are calculated by

$$w_k^{(j+1)} = \frac{\dfrac{1}{T} \sum_{t=1}^{T} O_k^{t,(j+1)} + (a-1)/T}{1 + K(a-1)/T}. \tag{46}$$

Equations 45 and 46 are iteratively applied until the posterior probability of the weights converges. The EM algorithm is only guaranteed to converge to a local optimum. It is therefore wise to choose reasonable starting values for the weights $w_k^{(0)}$. A conservative choice in this regard is to use equal weights to start the first iteration.

An alternative to using the model-fitted predictive densities $f_k^t(y_2 \mid y_1)$ is to replace them with cross-validation predictive densities $f_k^{(t)}(y_2 \mid y_1)$. This is the approach of Shinozaki et al. (2010). The likelihood function in Eq. 41 is subsequently replaced with a cross-validation likelihood function (Rust and Schmittlein 1985; Shinozaki et al. 2010; Smyth 1996, 2000; Stone 1977). By using a cross-validation likelihood function instead of the classical likelihood function, the weights are assigned according to the models’ predictive abilities rather than their fitting abilities. Indeed, there is much literature in support of using predictive performance measures for model choice and combination, based on the idea that a model is only as good as its predictions (e.g., Eklund and Karlsson 2007; Geweke and Whiteman 2006; Jackson et al. 2009).

BMA applies averaging to forecast probability densities (and thus cumulative probabilities) for a given forecast variable value. An alternative is to average variable values (quantiles) for a given cumulative probability (quantile fraction). This is the method of quantile model averaging (QMA; Schepen and Wang 2015). Merging unimodal forecasts, such as those from CBaM, with QMA yields a unimodal merged forecast. BMA can sometimes produce multimodal forecasts with excessively wide uncertainty bands. QMA may therefore be more practical for some applications; however, it is not expanded upon here.
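The EM iteration of Eqs. 45 and 46 can be sketched, for the uncensored case, as follows. The function name and array layout are assumptions, not part of the published method.

```python
import numpy as np

def bma_weights_em(f, a=1.0, tol=1e-8, max_iter=1000):
    # EM iteration of Eqs. 45-46 (uncensored case only).
    # f[t, k]: predictive density of model k for observation t.
    # a: prior concentration parameter (a = 1 gives maximum-likelihood weights).
    T, K = f.shape
    w = np.full(K, 1.0 / K)                 # conservative equal-weight start
    for _ in range(max_iter):
        # E-step (Eq. 45): fractional "ownership" of each observation.
        O = w * f
        O /= O.sum(axis=1, keepdims=True)
        # M-step (Eq. 46).
        w_new = (O.mean(axis=0) + (a - 1.0) / T) / (1.0 + K * (a - 1.0) / T)
        if np.max(np.abs(w_new - w)) < tol:
            return w_new
        w = w_new
    return w
```

With $a > 1$ the update shrinks the weights toward equality, which illustrates how the prior stabilizes them when the hindcast record is short.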

2.6 Connecting Ensemble Members

In CBaM, ensemble forecasts for different time periods and lead times are postprocessed independently. Due to the inherent randomness in sampling the forecasts, the forecasts do not automatically have an appropriate temporal correlation structure. If the forecasts are to be used as time series, for example, in hydrological modeling, it is necessary to establish ensembles with realistic temporal correlation structure. By way of example, if forecasting monthly temperature, a time series constructed by taking equally positioned ensemble members may indiscriminately swing between relatively hot and cold states. This behavior does not seem reasonable, as an ensemble member forecasting hot for one time period would be expected to forecast warm or hot for the next time period (or more) and so on. A variable can have weak or strong correlation from one forecast period to the next. One pragmatic way to achieve the appropriate behavior is to reorder ensemble members by using a procedure known as the Schaake Shuffle (Clark et al. 2004). The Schaake Shuffle is performed as follows:

• Obtain historical data for a time period similar to that being forecast (e.g., month).
• For each historical event time period t = 1, 2, ..., T, assign a rank to each corresponding observation by magnitude.
• For each forecast ensemble member in an ensemble of size T, also assign a rank by magnitude.
• Reorder the forecast ensemble so that the pairs of forecast ensemble members and historical observations have equal ranks. For example, if the 5th year in the historical record has the largest magnitude, the forecast ensemble member with the largest magnitude will be put in 5th position.

The Schaake Shuffle is performed separately on ensemble forecasts for different time periods and lead times. However, once it is done, the temporal rank-correlation structure across forecast lead times will implicitly be established.
Ensemble members with the same position can therefore be used as time series forecasts. More detailed descriptions of the Schaake Shuffle can be found elsewhere in this handbook, and tutorial-style examples are given by Clark et al. (2004). It is noted here, however, that the Schaake Shuffle is also useful in restoring the appropriate spatial correlation structure among ensemble forecasts for different locations. One restriction of the Schaake Shuffle is that the number of ensemble members that can be reordered at once has to be equal to the number of historical observations. In CBaM, the number of forecast ensemble members greatly exceeds the number of historical observations. The Schaake Shuffle is therefore performed in batches by breaking the forecast into chunks of size T and connecting each chunk separately.
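The rank-matching procedure described above can be sketched as follows. This is an illustrative implementation under the stated restriction that the ensemble size equals the number of historical years; the function name is hypothetical.

```python
import numpy as np

def schaake_shuffle(forecast, historical):
    # Reorder the ensemble members at each time step so that their ranks
    # match the ranks of the historical observations, restoring a realistic
    # temporal correlation structure across lead times.
    # forecast, historical: arrays of shape (n_times, n_members); the number
    # of members must equal the number of historical years.
    forecast = np.asarray(forecast, dtype=float)
    historical = np.asarray(historical, dtype=float)
    out = np.empty_like(forecast)
    for t in range(forecast.shape[0]):
        # Rank of each historical year at this time step (0 = smallest).
        hist_rank = np.argsort(np.argsort(historical[t]))
        # Member with rank k goes into the slot of the year with rank k.
        out[t] = np.sort(forecast[t])[hist_rank]
    return out
```

For the batched use described above, the forecast ensemble would be split into chunks of T members and each chunk passed through this routine separately.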


3 Application of CBaM for Forecasting Catchment Rainfall

3.1 Overview

CBaM is applied to post-process GCM rainfall forecasts for the Barron River catchment in the northeast of Australia. The predictors are derived from outputs from a coupled GCM with atmospheric horizontal resolution of 250 km. The predictands are derived from 5 km gridded observed rainfall. This application of CBaM involves establishing multiple Bayesian joint probability (BJP) models, each with a single predictor and a single predictand. The forecasts from calibration and bridging models are weighted and merged with BMA. Ensemble members are connected by using the Schaake Shuffle to create time series ensemble forecasts. Finally, the monthly forecasts are aggregated to seasonal (3-month) forecasts.

3.2 The GCM

The GCM adopted for this application is the Predictive Ocean Atmosphere Model for Australia (POAMA) version M2.4 (Wang et al. 2011a). POAMA is a coupled ocean–atmosphere GCM operated by the Australian Bureau of Meteorology specifically for intra- to inter-seasonal climate forecasting. In hindcast mode, M2.4 is initialized on day 1 of each month and run for 9 months. M2.4 is set up to provide daily and monthly forecasts of many atmospheric and oceanic climate variables, including pressure, temperature, and rainfall. The horizontal resolution of the atmospheric grids is coarse at approximately 250 km. There are three variants of the model (simply named M24A, M24B, and M24C), with only minor differences in the model structure. All three variants are initialized with identical, jointly perturbed atmospheric and ocean initial conditions (Marshall et al. 2012). Each variant outputs an 11-member ensemble. In this application, all ensemble members are treated equally and pooled together to form a 33-member ensemble. Monthly data are used in this application.

Fig. 1 Map showing the location of the upper Barron catchment in Queensland, Australia. The boundaries of POAMA M2.4 grid cells are also plotted

Fig. 2 Box plots describing the distribution of observed monthly rainfall for the upper Barron catchment. Observed data for 1982–2010 is used. The markers show the overall median of merged (CBaM) forecasts for the same period

Fig. 3 As Fig. 2, but for 3-month accumulation (seasonal) rainfall


Table 1 Definitions of climate indices based on sea surface temperature anomalies

Index | Definition
NINO3 | Average over 150°W–90°W, 5°S–5°N
NINO3.4 | Average over 120°W–170°W, 5°S–5°N
NINO4 | Average over 160°E–150°W, 5°S–5°N
EMI (El Niño Modoki) | C − 0.5(E + W), where C = average over 165°E–140°W, 10°S–10°N; E = average over 110°W–70°W, 15°S–5°N; W = average over 125°E–145°E, 10°S–20°N
DMI (Indian Ocean Dipole) | WPI − EPI, where EPI = average over 90°E–110°E, 10°S–0°S; WPI = average over 50°E–70°E, 10°S–10°N
II (Indonesian Index) | Average over 120°E–130°E, 10°S–0°S

3.3 The Catchment

The upper Barron river catchment is located in a tropical climatic zone. A map displaying the location of the catchment together with the boundary of M2.4 grid cells is shown in Fig. 1. The catchment experiences heavy rainfall during the so-called wet season from October through to April. The monthly and seasonal rainfall totals can be dominated by several rainfall events associated with intense active phases of the monsoon and the passing of tropical cyclones. Throughout the remainder of the year, rainfall is relatively low and infrequent. The distributions of observed monthly and seasonal (3-month accumulation) rainfall are plotted later in the results section (Figs. 2 and 3).

3.4 Predictor Variables

In a calibration model, y1 is the GCM forecast of catchment rainfall. Since rainfall has a lower bound of zero, the censor threshold of zero is imposed and the log-sinh transformation is applied. If a catchment sits within one POAMA grid cell, the calibration model predictor variable is simply taken as the ensemble mean for that grid cell. However, as shown by Fig. 1, the upper Barron River catchment boundary intersects with two or more GCM grid cells. A straightforward way to derive the calibration predictor variable is to take an area-weighted average of the ensemble means of the grid cells. In a bridging model, y1 is the GCM forecast of a different variable. In this application, the bridging variables are derived from sea surface temperature forecasts. More specifically, the bridging variables are six indices that capture fluctuations in tropical Indian and Pacific Ocean sea surface temperatures (SSTs). The indices are calculated from the M2.4 ensemble mean SST anomaly forecast. The SST indices are NINO3, NINO3.4, NINO4, El Niño Modoki (Ashok et al. 2007), the Indonesian Index (Verdon and Franks 2005), and the Indian Ocean Dipole (Saji et al. 1999). Since the variables take negative values, the variables are transformed

Application to Post-processing of Meteorological Seasonal Forecasting


using Yeo–Johnson transformations. The variables are not censored. In other applications, bridging variables could be derived from, for example, atmospheric pressure, winds, or subsurface sea temperatures. These indices are suitable for application of CBaM to Australia as it has been shown that fluctuations in the El Niño–Southern Oscillation (ENSO) and Indian Ocean SSTs are linked to fluctuations in Australian rainfall (Risbey et al. 2009; Schepen et al. 2012). Refer to Table 1 for locations of the SST regions used in the calculation of the indices.
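The two transformations used here can be sketched as follows. The log-sinh form follows Wang et al. (2012a) and the Yeo–Johnson form follows Yeo and Johnson (2000); the function names and parameter values are illustrative only, and parameter estimation (as well as the censoring at zero) is omitted.

```python
import numpy as np

def log_sinh(y, a, b):
    """Log-sinh transform of Wang et al. (2012a): z = log(sinh(a + b*y)) / b.
    Used to normalize skewed, non-negative variables such as rainfall."""
    return np.log(np.sinh(a + b * np.asarray(y, dtype=float))) / b

def yeo_johnson(y, lam):
    """Yeo-Johnson transform (Yeo and Johnson 2000), defined for all real y,
    which suits SST indices that can take negative values."""
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    pos, neg = y >= 0.0, y < 0.0
    if abs(lam) > 1e-12:
        out[pos] = ((y[pos] + 1.0) ** lam - 1.0) / lam
    else:
        out[pos] = np.log1p(y[pos])
    if abs(lam - 2.0) > 1e-12:
        out[neg] = -(((-y[neg] + 1.0) ** (2.0 - lam)) - 1.0) / (2.0 - lam)
    else:
        out[neg] = -np.log1p(-y[neg])
    return out
```

Both transforms are monotonic, so quantiles of the forecast distribution can be back-transformed directly.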

3.5 Predictand Variable

In a CBaM model, y2 is an observed variable. In this application, the observed variable is average catchment rainfall. Similarly to the rainfall predictor variable, the rainfall predictand variable is transformed using a log-sinh transformation, and a lower bound of zero is imposed. Observed rainfall data is derived from the Australian Water Availability Project (AWAP) 5 km gridded dataset of monthly rainfall (Jones et al. 2009).

3.6 Extending Forecasts to Longer Lead Times

Ensemble rainfall forecasts produced using CBaM are intended for use in hydrological models to produce streamflow forecasts. For water management planning, streamflow forecasts are desirable for up to 12 months in advance. POAMA produces only 9 months of forecasts; however, CBaM can be applied to produce forecasts for a number of months beyond the end of the GCM run, thus enabling longer lead-time streamflow forecasts. CBaM is applied here to produce forecasts for 12 months. The forecasts are referred to as having lead times from 0 to 11 months. The lead time refers to the number of months between the initialization day of the GCM and the first day of the forecast period. For example, a 0-month lead-time forecast for January is made using forecast data from a GCM model run that is initialized on the 1st of January. Forecasts can be made out to 12 months by establishing the calibration and bridging models with lagged predictor–predictand relationships. That is, the predictor values from the 9th month of the POAMA run are used to forecast the 10th–12th months.
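The mapping from forecast lead time to GCM forecast month described above can be sketched as a small helper (the function name is ours):

```python
def gcm_month_for_lead(lead_time):
    """Return which month (1-9) of the 9-month POAMA run supplies the
    predictor for a forecast at the given lead time (0-11 months).
    Lead times 0-8 use GCM forecast months 1-9 directly; lead times 9-11
    reuse month 9 through lagged predictor-predictand relationships."""
    if not 0 <= lead_time <= 11:
        raise ValueError("lead time must be 0-11 months")
    return min(lead_time + 1, 9)
```

For a run initialized on 1 January, a lead time of 10 months (target month November) is therefore modeled from the September (9th) forecast month.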

3.7 Verification

3.7.1 Cross Validation

In this application, forecasts for the period 1982–2010 are verified. That is, forecasts are produced for each month in 1982–2010 with 0–11 months lead time. Although very long records (100 years+) of observed rainfall data are available for the upper


A. Schepen et al.

Barron catchment, M2.4 hindcasts are first initialized for the year 1981. Separate CBaM models are established for each month and lead time independently. With the short period of available data, it is necessary to establish CBaM models using data for the same events that are being forecast. It is therefore necessary to assess forecasting performance through cross validation to obtain realistic estimates of forecast quality. Cross validation involves hiding, when establishing the CBaM models, any predictor and predictand data that is deemed not to be independent of the verifying event. This is repeated for each event in the dataset. In this application, leave-3-years-out cross validation is used. For example, to establish the calibration and bridging models for March 1983, all predictor and predictand data for March 1982–1984 is removed. Cross validation predictive densities are used to develop weights for the merging BMA models. Continuing the previous example, for a March 1983 forecast, the cross validation predictive densities for the March 1982–1984 events are omitted in the calculation of the weights. For the ensemble connection step (the Schaake Shuffle), only the data for the year of the event to be forecast is left out. To continue the previous example once more, for a March 1983 forecast, observed data for 1983 is removed for the Schaake Shuffle.
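The leave-3-years-out scheme can be sketched as follows (the function name is ours; it assumes a simple list of calendar years for one target month and lead time):

```python
def leave_3_years_out_folds(years):
    """Leave-3-years-out cross validation: for each target year, withhold
    the target year and its two neighbouring years when establishing the
    models, and train on all remaining years."""
    folds = []
    for y in years:
        held_out = {y - 1, y, y + 1}
        folds.append(([t for t in years if t not in held_out], y))
    return folds
```

For the 1983 fold, the years 1982–1984 are withheld, matching the March 1983 example above.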

3.7.2 Tools for Assessing Skill and Reliability

As a first check of the forecasts, a visual assessment is made of the overall bias of the forecasts. More detailed checks are then undertaken. Forecast accuracy and sharpness are two desirable attributes of probabilistic forecasts. If forecasts are sharp, then there is a strong tendency for forecast probabilities to deviate from climatological probabilities. A score that is commonly applied to jointly assess the accuracy and sharpness of probabilistic forecasts is the continuous ranked probability score (CRPS; Matheson and Winkler 1976). The average CRPS for event time periods $t = 1, 2, \ldots, T$ is given by

$$\overline{\mathrm{CRPS}} = \frac{1}{T} \sum_{t=1}^{T} \int \left[ F_t(y) - H\left(y - y_t^D\right) \right]^2 \, dy \qquad (47)$$

where $F_t(y)$ is the forecast cumulative distribution function and $H$ is the Heaviside step function defined as

$$H\left(y - y_t^D\right) = \begin{cases} 0 & y < y_t^D \\ 1 & y \ge y_t^D \end{cases} \qquad (48)$$

CRPS skill scores are calculated to measure the relative improvement in the average CRPS of the CBaM forecasts over the average CRPS of simple reference forecasts. The formulation for CRPS skill scores therefore follows the formulation for a generalized skill score:

$$\mathrm{CRPSSS} = \frac{\overline{\mathrm{CRPS}}_{\mathrm{ref}} - \overline{\mathrm{CRPS}}}{\overline{\mathrm{CRPS}}_{\mathrm{ref}}} \times 100 \qquad (49)$$
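For an ensemble forecast, the integral in Eq. 47 can be evaluated through the equivalent kernel representation CRPS = E|X − y| − 0.5 E|X − X′| of the empirical CDF. A minimal sketch of the average CRPS and the skill score of Eq. 49 (function names are ours):

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of a single ensemble forecast against one observation, using the
    kernel form CRPS = E|X - y| - 0.5 * E|X - X'| of the empirical CDF."""
    x = np.asarray(members, dtype=float)
    return np.mean(np.abs(x - obs)) - 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))

def crps_skill_score(forecasts, references, observations):
    """CRPS skill score (Eq. 49): percentage improvement of the average CRPS
    of the forecasts over the average CRPS of reference forecasts, e.g.,
    cross validation climatology ensembles."""
    crps = np.mean([crps_ensemble(f, o) for f, o in zip(forecasts, observations)])
    crps_ref = np.mean([crps_ensemble(r, o) for r, o in zip(references, observations)])
    return (crps_ref - crps) / crps_ref * 100.0
```

A perfect, point-mass forecast at the observed value scores a CRPS of zero and hence a skill score of 100 against any imperfect reference.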


A CRPS skill score of 100 is the maximum score and implies forecasts that are both perfectly sharp and accurate. A skill score of 0 implies that model forecasts are on average no better than the reference forecasts. The generalized skill score is negatively unbounded. A negative skill score implies that the model forecasts are on average worse than the reference forecasts. In this application, the reference forecasts are taken as cross validation climatology.

Reliability of a set of probabilistic forecasts refers to the statistical consistency of forecast probabilities with observed values. For instance, an outcome with forecast probability of 20 % should be observed in 20 % of cases. Statistical reliability of the probabilistic forecasts is assessed by plotting observed values transformed through probability integral transforms (PITs) in a uniform probability plot. Applying the forecast distribution $F_t(y)$ for the event at time period $t$ to the observed value for the same event gives the PIT value

$$\pi_t = F_t\left(y_t^D\right) \qquad (50)$$

If the forecasts are statistically reliable, then the PIT values should theoretically follow a standard uniform distribution. One way to check for the expected uniformity is to pool together all the values of $\pi_t$ for $t = 1, 2, \ldots, T$ and plot the ranked values in a uniform probability plot. This PIT uniform probability plot (termed predictive QQ plot by Thyer et al. (2009)) is useful for indicating whether the forecasts are systematically biased and whether the forecast uncertainty is too narrow or too wide (Laio and Tamea 2007). The plots can include the 95 % Kolmogorov significance bands for a more formal test of uniformity. The width of the band is calculated as $1.358/\sqrt{T}$ (Laio and Tamea 2007). The forecasts are deemed to be statistically reliable if all the PIT values fall within the significance bands. It is noted that the PIT histogram is an alternative method for displaying PIT uniformity but is more suited to large samples.

An alternative graphical tool for assessing probabilistic reliability is the attributes or reliability diagram for binary expressions of the probabilistic forecasts (Hsu and Murphy 1986). In addition to reliability assessment, attributes diagrams permit visual checking of forecast sharpness and resolution. The attributes diagram is more suited to large sample sizes and is therefore not considered here.
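The band check can be sketched numerically as follows; the plotting position i/(T+1) for the theoretical uniform quantiles is our choice, and the function name is illustrative.

```python
import numpy as np

def pit_band_check(pit_values):
    """Check whether the ranked PIT values of Eq. 50 stay within the 95%
    Kolmogorov band of width 1.358/sqrt(T) around the 1:1 line of the
    uniform probability plot (Laio and Tamea 2007)."""
    pit = np.sort(np.asarray(pit_values, dtype=float))
    T = pit.size
    band = 1.358 / np.sqrt(T)
    # Theoretical standard uniform quantiles at plotting positions i/(T+1).
    theoretical = np.arange(1, T + 1) / (T + 1.0)
    return bool(np.all(np.abs(pit - theoretical) <= band))
```

A set of PIT values clustered near one (forecasts systematically too low) falls outside the band, whereas evenly spread PIT values pass the check.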

4 Application Results

4.1 Overview

For the upper Barron catchment, we assess the skill and reliability of the CBaM rainfall forecasts for the period 1982–2010. Monthly forecasts are assessed for lead times of 0–11 months, and 3-month accumulation forecasts are assessed for lead times of 0–9 months. The CRPS skill scores of the calibration, bridging, and merged forecasts are compared. As with any post-processing method, the skill that is evident
in the CBaM forecasts stems predominantly from the intrinsic skill in the GCM. CBaM is designed to capitalize on the intrinsic skill in the GCM by correcting biases, adjusting forecast uncertainty, and merging forecasts based separately on the atmospheric and oceanic modules. Although the focus is on monthly rainfall forecasts, reliability of the 3-month accumulation forecasts is also a critical concern. Failure to properly connect the ensemble members using the Schaake Shuffle can theoretically lead to accumulated forecasts that are too narrow in ensemble spread and therefore not reliable. This is unlikely to be a major issue with precipitation forecasts as monthly rainfall exhibits low autocorrelation.

4.2 Overall Bias

One of the aims of CBaM is to produce forecasts that are overall unbiased relative to observed catchment rainfall. It is important that the bias be minimized so that the ensemble forecasts are suitable as inputs to hydrological models. To check that the CBaM forecasts do not suffer from overall biases, the overall medians of the CBaM forecasts are plotted against the distributions of observed data for each month and season. The plot for monthly rainfall is shown in Fig. 2, and the plot for 3-month accumulation rainfall in Fig. 3. The distribution of observed catchment rainfall is summarized by a box plot. The box spans the interquartile range (IQR) with a line at the median. The whiskers extend to the most extreme point within 1.5 IQR, and the plus symbols represent the remaining data points. It is clear from Figs. 2 and 3 that the observed rainfall data follows a skewed distribution. The diamond markers are the overall medians of the 1-month lead-time merged forecasts considering all forecasts for that period in 1982–2010. Similarly, the square markers are the overall medians of the 9-month lead-time merged forecasts. The medians of the merged forecasts show little deviation from the observed medians, confirming that CBaM generally produces monthly and 3-month accumulation forecasts that are overall unbiased. The monthly merged forecasts do show a slight positive bias in the median for February and March, the months with the most intense rainfall.
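The box-plot summary used in Figs. 2 and 3 can be computed as follows (a sketch; the 1.5 IQR whisker rule is as described above, and the function name is ours):

```python
import numpy as np

def boxplot_stats(sample):
    """Box-plot summary: median, interquartile range (IQR), and whiskers
    extending to the most extreme data point within 1.5 IQR of the box.
    Points beyond the whiskers would be plotted individually."""
    x = np.sort(np.asarray(sample, dtype=float))
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lo = x[x >= q1 - 1.5 * iqr].min()
    hi = x[x <= q3 + 1.5 * iqr].max()
    return {"median": med, "iqr": iqr, "whisker_low": lo, "whisker_high": hi}
```

Comparing the overall forecast median against `boxplot_stats(observed)["median"]` for each month gives the bias check described above.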

4.3 Skill of Monthly Forecasts

Fig. 4 Comparison of CRPS skill scores for monthly calibration, bridging, and merged forecasts. Forecasts are for the period 1982–2010 and based on leave-3-years-out cross validation. For each month, there is one bar for each forecast lead time of 0–11 months, in order

The CRPS skill scores for each month and lead time are shown in Fig. 4. There are 12 bars plotted for each month corresponding to lead times of 0–11 months. The top panel shows skill scores for the calibration forecasts, the middle panel shows skill scores for the bridging forecasts, and the bottom panel shows skill scores for the merged forecasts. It is evident that both calibration and bridging forecasts for the upper Barron catchment are skillful for some months and lead times. However, it is also evident that the forecasts are not skillful for many other months and lead times. The monthly calibration forecasts are mostly skillful for November for lead times of up to 8 months. The monthly bridging forecasts exhibit similar patterns of skill to the monthly calibration forecasts, except skill scores for October and March are higher for the bridging forecasts. In this application, the bridging forecast skill improves upon the calibration forecast skill, and it is evident that merging the calibration and bridging forecasts takes advantage of the skill of both.

Because the skill scores for monthly rainfall forecasting are typically low, it is difficult to discern real positive skill scores from random positive skill scores related to sampling variability. If in a particular application of CBaM it is deemed necessary to quantify the significance of skill scores, then a pragmatic bootstrap procedure could be used to determine a skill score threshold above which positive skill is judged to be discernible. Such a procedure is described by Schepen and Wang (2014). In their study, which used the root mean square error in probability (RMSEP) score, a skill score of 5 was established as a bare minimum threshold above which skill becomes increasingly discernible and significant (i.e., skill scores less than 5 were to be disregarded). Higher skill scores and/or a pattern of skill across multiple target months and/or lead times are seen to increase the credibility of positive skill score results. In this application the aim is simply to extract as much skill as possible from the forecasts through the use of calibration, bridging, and merging.
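A pragmatic bootstrap of the kind described above can be sketched as follows. This resampling scheme is a simplified stand-in of our own, not the exact procedure of Schepen and Wang (2014), and all settings are illustrative.

```python
import numpy as np

def bootstrap_skill_threshold(event_scores_ref, n_boot=10000, q=95, seed=0):
    """Sketch of a no-skill threshold: under the null hypothesis, the
    forecast is statistically no better than the reference, so pairs of
    bootstrap resamples of the reference's per-event scores yield skill
    scores that deviate from zero only through sampling variability. The
    q-th percentile of those null skill scores is taken as the threshold
    above which positive skill is judged discernible."""
    rng = np.random.default_rng(seed)
    s = np.asarray(event_scores_ref, dtype=float)
    null_scores = []
    for _ in range(n_boot):
        a = rng.choice(s, size=s.size, replace=True)  # pseudo "forecast" scores
        b = rng.choice(s, size=s.size, replace=True)  # pseudo "reference" scores
        null_scores.append((b.mean() - a.mean()) / b.mean() * 100.0)
    return float(np.percentile(null_scores, q))
```

Skill scores below the returned threshold would then be disregarded as indistinguishable from sampling noise.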


Fig. 5 As Fig. 4, but for 3-month accumulation forecasts

4.4 Skill of Three-Month Accumulation Forecasts

The ensemble connection step of CBaM connects ensemble members of forecasts for different lead times to form time series forecasts that have a realistic correlation structure across lead times. This allows aggregation of rainfall ensemble members to obtain accumulation forecasts. Any number of months can be accumulated. The most common number of months to accumulate is three, corresponding to seasonal forecasts. Hence 3-month accumulations are the focus here. The CRPS skill scores for each 3-month accumulation period and lead time are shown in Fig. 5. The top panel shows skill scores for the calibration forecasts, the middle panel shows skill scores for the bridging forecasts, and the bottom panel shows skill scores for the merged forecasts. It is evident that the 3-month accumulation forecasts are skillful for more periods and lead times than the monthly forecasts. The calibration forecasts are predominantly skillful from SON to NDJ. The bridging forecasts show a similar pattern of skill across lead times; however the skill scores are generally higher than those for the calibration forecasts. Skill is evident for the bridging forecasts from SON to FMA. Although skill is seen to mostly decrease with lead time, the bridging forecasts are evidently skillful in some cases for very long lead times. For example, the bridging forecasts for OND are skillful for all 12 lead times. Similar to the monthly forecasts, the merged 3-month accumulation forecasts take advantage of the skill in both the calibration and bridging forecasts.
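The ensemble connection and accumulation steps can be sketched as follows. This is a simplified sketch of the Schaake shuffle of Clark et al. (2004); the array shapes and function names are ours.

```python
import numpy as np

def schaake_shuffle(fcst, hist_obs):
    """Reorder ensemble members so that, at each lead time, the sorted
    forecast values take the rank order of a matched set of historical
    observations (Clark et al. 2004). fcst and hist_obs are
    (n_members, n_lead_times) arrays, with each member slot corresponding
    to one historical year; the shuffled ensemble inherits the observed
    temporal correlation structure across lead times."""
    fcst = np.asarray(fcst, dtype=float)
    hist = np.asarray(hist_obs, dtype=float)
    out = np.empty_like(fcst)
    for j in range(fcst.shape[1]):
        ranks = np.argsort(np.argsort(hist[:, j]))  # rank of each historical value
        out[:, j] = np.sort(fcst[:, j])[ranks]
    return out

def accumulate(members, n_months=3):
    """Sum connected member time series over the first n_months lead times."""
    return np.asarray(members, dtype=float)[:, :n_months].sum(axis=1)
```

Because the marginal values at each lead time are only reordered, never changed, the shuffle leaves the per-lead-time skill untouched while making the accumulations meaningful.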


Fig. 6 PIT uniform probability plots for the monthly merged forecasts for the period 1982–2010. Left panel is for forecasts at a lead time of 1 month, and right panel is for forecasts at a lead time of 9 months

4.5 Statistical Reliability

The statistical reliability of the merged forecasts is visually assessed using PIT uniform probability plots. If the forecasts are found to be statistically reliable, it means that the forecast ensembles typically have appropriate spread and therefore correctly characterize forecast uncertainty. PIT uniform probability plots are presented for monthly and 3-month accumulation forecasts, and for lead times of 1 month and 9 months (Figs. 6 and 7). There are 12 panels in each figure, each corresponding to a month or 3-month period. For each month and 3-month period, and for both lead times, every PIT value falls within the 95 % Kolmogorov significance bands. It can be concluded that the merged forecasts are statistically reliable. That the 3-month accumulation forecasts are reliable implies that when the ensemble members are taken as time series, the ensembles have an appropriate temporal correlation structure. It is noted that the PIT uniform probability plot will not reveal some types of systematic problems with the forecasts. For example, Wang et al. (2009) also plot the PIT values against the event magnitude as well as the forecast year. If such plots reveal patterns or trends in the PIT values, then the forecasts could be conditionally biased.


Fig. 7 As Fig. 6 but for 3-month accumulation forecasts

4.6 Discussion and Conclusion

CBaM is designed to harness the skill in the raw CGCM forecasts and produce reliable ensemble time series of monthly meteorological variables. The application demonstrates that CBaM produces useful ensembles of rainfall that can be used to force a hydrological model calibrated against observed data. CBaM forecasts exhibit low overall bias and extract skill from both the atmospheric and oceanic modules of the GCM. The variation in skill between catchments and months and seasons is related to a combination of the natural predictability of regional and remote influences on the catchment climate and the quality of the raw CGCM forecasts. In some months or 3-month periods, it is clear that the raw CGCM forecasts cannot provide particularly skillful forecasts of inter-annual rainfall variability. The skill of monthly rainfall forecasts is generally much lower than the skill of 3-month accumulation forecasts. Monthly rainfall is more influenced by climate features that occur on shorter time scales and that are inherently less predictable. For example, monthly rainfall in the upper Barron during the skillful period (November–March) is likely to be more strongly influenced by phases of the monsoon than is the 3-month accumulated rainfall. The 3-month accumulated rainfall in the upper Barron is also likely to be influenced by the phases of the monsoon, but is additionally likely to be influenced by ENSO, which is inherently more predictable.


The application here did not consider a catchment with zero rainfall. Monthly or seasonal rainfall totals of zero are not common. The CBaM methodology is applicable at shorter time scales (e.g., weeks), and in such applications the requirement to properly handle zero values would become more evident. In the merging step of CBaM, it is not assumed that the predictors provide independent information. Some approaches (e.g., Luo et al. 2007) assume independent information in the predictors. As a result, merging identical forecasts reduces the forecast uncertainty. In contrast, if identical forecasts are merged using Bayesian model averaging in CBaM, the forecast uncertainty will remain unchanged. Explicit model selection is not a part of CBaM. The weight given to each model is determined in the merging step. For streamflow forecasting, the dominant factors in determining forecast skill are the initial catchment conditions and accurate representation of the soil moisture states in the model. Even if the CBaM forecasts are climatological in nature, the incorporation of forecast uncertainty through MCMC sampling may mean the ensemble members are more suitable for forcing the hydrological model than are historical observations. The results show there is a strong need to improve the skill of monthly and 3-monthly accumulated rainfall forecasts. Such improvements are seen as coming through advances in GCM modeling rather than in statistical post-processing techniques. It is noted that other GCMs may produce different results, although the broad patterns of predictability are expected to be similar given that it is large-scale climate features that give rise to monthly and 3-monthly forecasting skill.
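The point about merging identical forecasts can be illustrated with a toy Gaussian mixture (the normal components are an illustrative assumption; CBaM itself works with transformed variables):

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of a normal distribution at y."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def bma_mixture_pdf(y, weights, mus, sigmas):
    """BMA predictive density: a weighted mixture of component densities.
    If all components are identical, the mixture equals the component, so
    merging identical forecasts leaves the forecast uncertainty unchanged.
    Approaches that multiply densities under an independence assumption
    would instead narrow the spread."""
    return sum(w * normal_pdf(y, m, s) for w, m, s in zip(weights, mus, sigmas))
```

Merging two identical N(0, 1) forecasts with weights 0.5 and 0.5 reproduces the N(0, 1) density exactly.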

References

K. Ashok, S. Behera, S. Rao, H. Weng, T. Yamagata, El Niño Modoki and its possible teleconnection. J. Geophys. Res. 112, C11007 (2007)
J. Barnard, R. McCulloch, X.-L. Meng, Modeling covariance matrices in terms of standard deviations and correlations, with application to shrinkage. Stat. Sin. 10, 1281–1312 (2000)
J. Cheng, J. Yang, Y. Zhou, Y. Cui, Flexible background mixture models for foreground segmentation. Image Vis. Comput. 24, 473–482 (2006)
M. Clark, S. Gangopadhyay, L. Hay, B. Rajagopalan, R. Wilby, The Schaake shuffle: a method for reconstructing space–time variability in forecasted precipitation and temperature fields. J. Hydrometeorol. 5, 243–262 (2004)
J. Eklund, S. Karlsson, Forecast combination and model averaging using predictive measures. Econ. Rev. 26, 329–363 (2007)
A. Gelman, J.B. Carlin, H.S. Stern, D.B. Rubin, Bayesian Data Analysis (Chapman and Hall, New York, 1995), 526 pp
J. Geweke, C. Whiteman, Bayesian forecasting, in Handbook of Economic Forecasting, ed. by G. Elliott, C.W.J. Granger, A. Timmermann, vol. 1 (Elsevier B.V., Amsterdam, 2006), pp. 3–80. http://dx.doi.org/10.1016/S1574-0706(05)01001-3
S. Hawthorne, Q.J. Wang, A. Schepen, D. Robertson, Effective use of general circulation model outputs for forecasting monthly rainfalls to long lead times. Water Resour. Res. 49(9), 5427–5436 (2013)


W.-R. Hsu, A.H. Murphy, The attributes diagram: a geometrical framework for assessing the quality of probability forecasts. Int. J. Forecast. 2, 285–293 (1986)
C.H. Jackson, S.G. Thompson, L.D. Sharples, Accounting for uncertainty in health economic decision models by using model averaging. J. R. Stat. Soc. A. Stat. Soc. 172, 383–404 (2009)
D.A. Jones, W. Wang, R. Fawcett, High-quality spatial climate data-sets for Australia. Aust. Meteorol. Oceanogr. J. 58, 233–248 (2009)
F. Laio, S. Tamea, Verification tools for probabilistic forecasts of continuous hydrological variables. Hydrol. Earth Syst. Sci. 11, 1267–1277 (2007)
E.-P. Lim, H.H. Hendon, D.L.T. Anderson, A. Charles, O. Alves, Dynamical, statistical-dynamical, and multimodel ensemble forecasts of Australian spring season rainfall. Mon. Weather Rev. 139, 958–975 (2011)
L. Luo, E.F. Wood, M. Pan, Bayesian merging of multiple climate model forecasts for seasonal hydrological predictions. J. Geophys. Res. 112, D10102 (2007)
A.G. Marshall, D. Hudson, M.C. Wheeler, H.H. Hendon, O. Alves, Evaluating key drivers of Australian intra-seasonal climate variability in POAMA-2: a progress report. CAWCR Res. Lett. 7, 10–16 (2012)
J.E. Matheson, R.L. Winkler, Scoring rules for continuous probability distributions. Manag. Sci. 22, 1087–1096 (1976)
J.S. Risbey, M.J. Pook, P.C. McIntosh, M.C. Wheeler, H.H. Hendon, On the remote drivers of rainfall variability in Australia. Mon. Weather Rev. 137, 3233–3253 (2009)
D. Robertson, D. Shrestha, Q. Wang, Post-processing rainfall forecasts from numerical weather prediction models for short-term streamflow forecasting. Hydrol. Earth Syst. Sci. Discuss. 10, 6765–6806 (2013)
R.T. Rust, D.C. Schmittlein, A Bayesian cross-validated likelihood method for comparing alternative specifications of quantitative models. Market. Sci. 4(1), 20–40 (1985)
N.H. Saji, B.N. Goswami, P.N. Vinayachandran, T. Yamagata, A dipole mode in the tropical Indian Ocean. Nature 401, 360–363 (1999)
A. Schepen, Q.J. Wang, Ensemble forecasts of monthly catchment rainfall out to long lead times by post-processing coupled general circulation model output. J. Hydrol. 519, 2920–2931 (2014)
A. Schepen, Q.J. Wang, Model averaging methods to merge operational statistical and dynamic seasonal streamflow forecasts in Australia. Water Resour. Res. 51(3), 1797–1812 (2015)
A. Schepen, Q.J. Wang, D. Robertson, Evidence for using lagged climate indices to forecast Australian seasonal rainfall. J. Climate 25, 1230–1246 (2012)
Q. Shao, M. Li, An improved statistical analogue downscaling procedure for seasonal precipitation forecast. Stoch. Env. Res. Risk Assess. 27(4), 819–830 (2013)
T. Shinozaki, S. Furui, T. Kawahara, Gaussian mixture optimization based on efficient cross-validation. IEEE J. Sel. Top. Sign. Process 4, 540–547 (2010)
P. Smyth, Clustering using Monte Carlo cross-validation, in KDD (1996), pp. 126–133. http://www.aaai.org/Papers/KDD/1996/KDD96-021.pdf
P. Smyth, Model selection for probabilistic clustering using cross-validated likelihood. Stat. Comput. 10(1), 63–72 (2000)
M. Stone, An asymptotic equivalence of choice of model by cross-validation and Akaike's criterion. J. R. Stat. Soc. Ser. B Methodol. 44–47 (1977)
M. Thyer, G. Kuczera, Q.J. Wang, Quantifying parameter uncertainty in stochastic models using the Box-Cox transformation. J. Hydrol. 265, 246–257 (2002)
M. Thyer, B. Renard, D. Kavetski, G. Kuczera, S.W. Franks, S. Srikanthan, Critical evaluation of parameter consistency and predictive uncertainty in hydrological modeling: a case study using Bayesian total error analysis. Water Resour. Res. 45, W00B14 (2009)
B. Timbal, D. Jones, Future projections of winter rainfall in southeast Australia using a statistical downscaling technique. Clim. Change 86, 165–187 (2008)
N.K. Tuteja, D. Shin, R. Laugesen, U. Khan, Q. Shao, E. Wang, M. Li et al., Experimental evaluation of the dynamic seasonal streamflow forecasting approach (2012). Report published by the Bureau of Meteorology, Melbourne. Available online: http://www.bom.gov.au/water/about/publications/document/dynamic_seasonal_streamflow_forecasting.pdf
D.C. Verdon, S.W. Franks, Indian Ocean sea surface temperature variability and winter rainfall: Eastern Australia. Water Resour. Res. 41, W09413 (2005). doi:10.1029/2004WR003845
Q.J. Wang, D.E. Robertson, Multisite probabilistic forecasting of seasonal flows for streams with zero value occurrences. Water Resour. Res. 47, W02546 (2011)
Q.J. Wang, D.E. Robertson, F.H.S. Chiew, A Bayesian joint probability modeling approach for seasonal forecasting of streamflows at multiple sites. Water Resour. Res. 45, W05407 (2009). doi:10.1029/2008WR007355
G. Wang et al., POAMA-2 SST skill assessment and beyond. CAWCR Res. Lett. 6, 40–46 (2011a)
Q. Wang, T. Pagano, S. Zhou, H. Hapuarachchi, L. Zhang, D. Robertson, Monthly versus daily water balance models in simulating monthly runoff. J. Hydrol. 404, 166–175 (2011b)
Q. Wang, D. Shrestha, D. Robertson, P. Pokhrel, A log-sinh transformation for data normalization and variance stabilization. Water Resour. Res. 48, W05514 (2012a)
Q.J. Wang, A. Schepen, D.E. Robertson, Merging seasonal rainfall forecasts from multiple statistical models through Bayesian model averaging. J. Climate 25, 5524–5537 (2012b)
I.K. Yeo, R.A. Johnson, A new family of power transformations to improve normality or symmetry. Biometrika 87, 954–959 (2000)
D. Zhu, A.O. Hero, Bayesian hierarchical model for large-scale covariance matrix estimation. J. Comput. Biol. 14, 1311–1326 (2007)
Z. Zivkovic, F. van der Heijden, Recursive unsupervised learning of finite mixture models. IEEE Trans. Pattern Anal. Mach. Intell. 26, 651–656 (2004)

Multi-model Combination and Seamless Prediction

Stephan Hemri

Contents

1 Introduction
2 Raw and Post-processed Ensemble Forecasts
2.1 Raw Ensemble Forecasts
2.2 Post-processing of Multi-model Ensemble Forecasts
3 Seamless Prediction Methods
3.1 Generation of Seamless Hydrologic Forecasts
3.2 How to Account for Spatiotemporal Dependence Structures?
4 Probabilistic Single-Model Forecasts
4.1 Ensemble Kalman Filter
4.2 Bayesian Forecasting System
5 Verification
5.1 Univariate Verification
5.2 Multivariate Verification
6 Summary
References

Abstract

(Hydro-) Meteorological predictions are inherently uncertain. Forecasters are trying to estimate and to ultimately also reduce predictive uncertainty. Atmospheric ensemble prediction systems (EPS) provide forecast ensembles that give a first idea of forecast uncertainty. Combining EPS forecasts, issued by different weather services, to multi-model ensembles gives an even better understanding of forecast uncertainty. This article reviews state-of-the-art statistical post-processing methods that allow for sound combinations of multi-model ensemble forecasts. The aim of statistical post-processing is to maximize the sharpness of the predictive distribution subject to calibration. This article focuses on the well-established parametric approaches: Bayesian model averaging (BMA) and ensemble model output statistics (EMOS). Both are readily available and can be used for straightforward implementation of methods for multi-model ensemble combination. Furthermore, methods to ensure seamless predictions in the context of statistical post-processing are summarized. These methods are mainly based on different types of copula approaches. Since skill of (statistically post-processed) ensemble forecasts is generally assessed using particular verification methods, an overview of such methods to verify probabilistic forecasts is provided as well.

S. Hemri (*), Department of Computational Statistics, HITS gGmbH, Heidelberg, Germany, e-mail: [email protected]

© Springer-Verlag Berlin Heidelberg 2015. Q. Duan et al. (eds.), Handbook of Hydrometeorological Ensemble Forecasting, DOI 10.1007/978-3-642-40457-3_19-1

Keywords

Bayesian forecasting system (BFS) • Bayesian model averaging (BMA) • Box-Cox transformation • Brier score (BS) • Consortium for Small-Scale Modeling (COSMO) ensemble • Development of the European Multi-model Ensemble system for seasonal-to-interannual prediction (DEMETER) • DiffeRential Evolution Adaptive Metropolis (DREAM) algorithm • Ensemble Kalman filter (EnKF) • Ensemble model output statistics (EMOS) • Ensemble prediction systems (EPS) • European Centre for Medium-Range Weather Forecasts (ECMWF) • European Flood Alert System (EFAS) • Multivariate verification • Numerical weather prediction (NWP) model • Schaake shuffle • Seamless prediction methods • Spatiotemporal dependence structures • U.S. National Centers for Environmental Prediction (NCEP) • Univariate verification • Verification

1 Introduction

Even though (hydro-) meteorological forecasts are inherently uncertain, deterministic forecasts, which completely neglect uncertainty, have been the state of the art over many decades. However, it is crucial to assess predictive uncertainty, i.e., the uncertainty conditional on the available information set and the forecaster’s expertise (Krzysztofowicz 1999; Todini 2008). Accordingly, over the last 20 years the paradigm in weather forecasting has shifted to probabilistic forecasting (see e.g., Palmer 2000; Hamill et al. 2000). This led to the development of ensemble prediction systems (EPS) in the early 1990s. An EPS consists of multiple runs of the same numerical weather prediction (NWP) model with different initial conditions and/or model variants. An EPS forecast provides an estimate of the predictive distribution of atmospheric variables like air pressure, temperature, precipitation, or wind speed. Such an ensemble forecast can ideally be interpreted as a random sample from this distribution. Based on the availability of meteorological EPSs, hydrologists started to develop hydrologic ensemble forecasts (Cloke and Pappenberger 2009). Hydrologic ensemble forecasts are often driven by input from several atmospheric models, issued by different weather centers, of which each is either deterministic or probabilistic (Thielen et al. 2009; Bartholmes et al. 2009).

Multi-model Combination and Seamless Prediction


EPS forecasts are affected by systematic biases and dispersion errors (i.e., wrong forecast spread). Hence, they typically benefit from (a) the combination of predictions from several independent weather centers, as in the poor person’s approach (Arribas et al. 2005; Atger 1999; Ebert 2001; Ziehmann 2000), and (b) statistical post-processing (Glahn and Lowry 1972; Gneiting et al. 2005). The former approach explicitly accounts for model uncertainty, while the latter infers predictive uncertainty using statistical methods. The goal of statistical post-processing is to obtain well-calibrated and yet sharp forecasts (Gneiting et al. 2007). Calibration is a joint property of the forecasts and the observations. In a nutshell, in the case of a well-calibrated forecast the observations look like random samples from the predictive distributions. Sharpness refers to the spread of the forecasts. Subject to calibration, minimizing forecast spread, i.e., maximizing sharpness, maximizes forecast skill. This chapter discusses methods for (hydro-) meteorological multi-model combination with a particular focus on statistical post-processing and seamless prediction. For a broader overview of the hydrological challenges in meteorological post-processing, refer to the chapter on “▶ Hydrological Challenges in Meteorological Post-processing” by F. Wetterhall in this handbook. A more detailed discussion of the general aspects of meteorological post-processing and forecast verification is given in the chapter on “▶ Methods for Meteorological Post-processing” by T. Thorarinsdottir. The first goal of the chapter at hand is to summarize methods for multi-model combination. These methods comprise primarily poor person’s approaches and statistical post-processing methods based on parametric distributions like ensemble model output statistics (EMOS; Gneiting et al. 2005) and Bayesian model averaging (BMA; Raftery et al. 2005).
As stated above, poor person’s approaches refer to the combination of NWP forecasts from several operational centers in order to obtain an ensemble forecast. Since multi-model combination aims at quantifying the uncertainty of the forecasts, alternative reference methods for uncertainty quantification in hydrology like the ensemble Kalman filter (EnKF, Evensen 1994) and the Bayesian forecasting system (BFS) are summarized as well. The second goal of this chapter is to summarize statistical methods that ensure seamless predictions. Given that the different forecast models cover the same range of lead times, discontinuities in the marginal predictive distributions can be reduced by parameter smoothing. The more challenging task is to ensure that correlations among different lead times are well represented by the forecasts. Several different methods have been proposed for this purpose. Most of them are related to the concept of copulas (Nelsen 2006; Sklar 1959). The most prominent method to incorporate correlation structures into (hydro-) meteorological forecasts is the Schaake shuffle (Clark et al. 2004). The remainder of this chapter is organized as follows. In Sect. 2 a summary on uncertainty quantification by ensemble forecasting is given. Within that context, both the combination of multi-model raw ensembles and post-processing of ensemble forecasts are discussed. Here, a raw ensemble denotes the forecast ensemble prior to any statistical post-processing. Section 3 focuses on seamless prediction with main


S. Hemri

focus on spatiotemporal dependence structures. In Sect. 4 alternative uncertainty quantification methods based on deterministic forecasts are discussed. As probabilistic forecasting is inherently connected to probabilistic verification, some verification methods are summarized in Sect. 5. This is followed by a summary in Sect. 6.

2 Raw and Post-processed Ensemble Forecasts

2.1 Raw Ensemble Forecasts

2.1.1 Early Techniques

In the 1990s, EPS methods like the ones implemented by the US National Centers for Environmental Prediction (NCEP) and the European Centre for Medium-Range Weather Forecasts (ECMWF) took only the uncertainty about initial states into consideration, while neglecting model uncertainty (Ziehmann 2000). However, the poor person’s ensemble implicitly takes model uncertainty into account. It consists of a set of deterministic NWP forecasts from different forecast centers. In this virtually cost-free approach, the ensemble is constructed from independent operational deterministic forecasts. As there are only a few operational centers that run global atmospheric models, the ensemble size of poor person’s ensembles constructed from deterministic NWP forecasts is usually limited to a few members. The combination of EPS forecasts from independent operational centers, which leads to very large ensemble sizes, is discussed in Sect. 2.1.2. Poor person’s ensemble approaches proved to be competitive with the NCEP and ECMWF EPS ensemble forecasts (Arribas et al. 2005; Atger 1999; Ebert 2001; Ziehmann 2000). While poor person’s ensembles often lack a correct representation of spread, they perform particularly well in terms of resolution. Resolution is a measure of the ability to issue case-dependent forecasts that differ from the climatological forecasts. For details on resolution, refer to Hersbach (2000). The multi-model superensemble (Krishnamurti et al. 1999, 2000) is closely related to the poor person’s ensemble. In a nutshell, the superensemble technique fits a multiple regression model with the members of a poor person’s ensemble as predictors and the observations as dependent variable. Since this is a statistical approach, parameters have to be estimated over a training period.
The coefficients of the model, which can also be understood as weights assigned to the ensemble members, are estimated for each location (geographical and vertical) and each variable separately. In Krishnamurti et al. (1999, 2000) predictions from such a model proved to outperform any of the poor person’s ensemble members in terms of root mean squared error. They performed even better than the ensemble mean and the mean of an ensemble consisting of individually bias-corrected ensemble members. The superensemble method is a deterministic approach in that it uses the output from an ensemble forecast, predominantly from a poor person’s ensemble, to generate a deterministic forecast with increased skill.
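The superensemble regression described above can be sketched in a few lines. The following is an illustration only: the training data, member biases, and the new forecast case are synthetic stand-ins, not values from Krishnamurti et al.

```python
import numpy as np

# Synthetic "poor person's ensemble": four deterministic members, each equal
# to the truth plus its own bias and noise. All numbers are illustrative.
rng = np.random.default_rng(42)
n_train, n_members = 200, 4
truth = rng.normal(15.0, 5.0, n_train)
biases = np.array([1.5, -0.8, 0.4, 2.1])
members = truth[:, None] + biases + rng.normal(0.0, 1.5, (n_train, n_members))

# Multiple regression of the observations on the members (with intercept),
# fitted over the training period -- the core of the superensemble technique.
X = np.column_stack([np.ones(n_train), members])
coef, *_ = np.linalg.lstsq(X, truth, rcond=None)

# Superensemble forecast for a new case
new_members = np.array([16.2, 13.9, 15.1, 17.0])
forecast = coef[0] + new_members @ coef[1:]
```

By construction, the least-squares fit cannot perform worse on the training data than the plain ensemble mean, since the mean is one of the linear combinations the regression searches over.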


2.1.2 More Recent Developments

As already stated above, the most commonly used technique to account for uncertainty in meteorological forecasting is ensemble prediction. Such ensembles of parallel deterministic forecast runs may ideally be understood as probabilistic forecasts through their empirical cumulative distribution function. The combination of different model runs that have been generated by different uncertainty algorithms (i.e., perturbations in initial states and/or model parameters) typically accounts for the corresponding sources of uncertainty. Additionally, ensemble forecasts issued by different weather services are combined into multi-model ensembles to account for model formulation uncertainty. This is a further development of the poor person’s ensemble approach. In other words, a multi-model raw ensemble is a physically based approach to quantify multifaceted sources of uncertainty. Hereby, the uncertainties in initial conditions, parameterizations, model structure, and data assimilation methods of the different meteorological ensembles are considered as well. The most prominent example of such a multi-model ensemble is the THORPEX Interactive Grand Global Ensemble (TIGGE) project ensemble (Bougeault et al. 2010; Park et al. 2008; Richardson et al. 2005). It currently comprises the EPS forecasts from ten different operational centers. The TIGGE ensemble exhibits high forecast skill. Hagedorn et al. (2012) analyzed its performance for 850 hPa temperature and 2-m temperature. A reduced TIGGE ensemble consisting only of the four best EPSs (provided by Canada, the US, the UK, and ECMWF) showed an improved performance on the global domain compared to the best single-model EPS, the ECMWF EPS. In the case of precipitation, the multi-model ensemble consisting of the four best TIGGE EPSs performed even better than the reforecast-calibrated ECMWF EPS (Hamill 2012).
Furthermore, the results indicated that statistical post-processing of the multi-model ensemble did not provide as much improvement as post-processing of the ECMWF EPS did. From this, Hamill (2012) concluded that “all operational centers, even ECMWF, would benefit from the open, real-time sharing of precipitation data and the use of reforecasts.” Multi-model approaches are not restricted to the global domain. For instance, the Grand Limited Area Model Ensemble Prediction System (GLAMEPS) combines four regional EPS forecasts over the European domain (Iversen et al. 2011). Turning now to river runoff, Cloke and Pappenberger (2009) provide a review of ensemble flood forecasting. For forecast horizons up to 2–3 days, most forecast uncertainty is related to uncertainty in the meteorological inputs. This source of uncertainty can be addressed in a straightforward manner by using a meteorological ensemble forecast as input to the hydrologic model. This approach can be improved by using a multi-model ensemble of meteorological EPS forecasts from different forecast centers. However, (multi-model) meteorological raw ensembles are prone to systematic biases and spread errors (Park et al. 2008). Furthermore, catchment-related hydrologic sources of uncertainty affect river runoff forecasts as well. Palmer et al. (2004) summarize the development of the European Multi-model Ensemble system for seasonal-to-interannual prediction (DEMETER). As part of the DEMETER project, a multi-model ensemble seasonal forecasting system based on seven global atmospheric models has been tested. The results indicate that such a multi-


model combination approach leads to more reliable seasonal-to-interannual predictions. Detailed analyses of the performance of the DEMETER multi-model ensemble can be found in Hagedorn et al. (2005), Doblas-Reyes et al. (2005), and Weinheimer et al. (2005). Such multi-model raw ensemble forecasts are used operationally by different flood warning services. The European Flood Alert System (EFAS) incorporates two deterministic models, i.e., the high-resolution run of the ECMWF and the deterministic model of the German weather service (DWD), and two ensemble models, i.e., the 51-member ECMWF ensemble and the 16-member Consortium for Small-Scale Modeling (COSMO) ensemble (Bartholmes et al. 2009; Thielen et al. 2009; European Flood Awareness System 2014). On a regional scale, there are many similar flood alert systems. For instance, an operational hydrologic ensemble prediction system based on the COSMO ensemble and the COSMO-7 deterministic model by MeteoSwiss is run routinely for the river Sihl (Addor et al. 2011). The first study that quantifies the uncertainty in hydrologic model structure by multi-model combination was performed by Georgakakos et al. (2004). They constructed a multi-model ensemble consisting of ensemble members stemming from both calibrated and uncalibrated deterministic hydrologic models. That ensemble performed quite well in terms of reliability, and its mean performed better than the best single model in terms of quadratic error, which is in line with the results from the studies on the poor person’s ensemble. Zappa et al. (2011) present a framework to investigate the relative contributions of meteorological inputs, initial conditions, and hydrologic model parameter estimates to the total predictive uncertainty. To this end, a large multi-model ensemble is constructed.
It consists of any permutation of the meteorological raw ensemble members, of the weather radar precipitation field ensemble members, which account for uncertainty in initial conditions, and of an ensemble of equifinal parameter sets. The above-mentioned multi-model approaches constitute a major improvement in quantifying the predictive uncertainty of hydrologic forecasts. Nevertheless, forecasts based on raw ensembles may still tend to be biased and exhibit spread errors (mostly underdispersion). These problems demand more detailed assessments of reliability and, if indicated, statistical post-processing. Studies addressing reliability benefit strongly from the availability of reforecasts and adequate statistical post-processing methods (Schaake et al. 2010; Thielen et al. 2008). Such post-processing methods, which are applicable to ensemble forecasts, are discussed next.

2.2 Post-processing of Multi-model Ensemble Forecasts

State-of-the-art techniques for univariate (i.e., each lead time and each location is considered independently) probabilistic multi-model combination and simultaneous statistical post-processing include Bayesian model averaging (BMA), developed by Raftery et al. (2005), and the ensemble model output statistics (EMOS) method, introduced by Gneiting et al. (2005). An illustrative example of both methods is given in Fig. 1. Alongside a general introduction to BMA and EMOS, some


Fig. 1 Examples of BMA (upper panels) and EMOS (lower panels) predictive probability density functions (PDFs). The panels on the left show the PDFs in the Box-Cox transformed space, those in the panels on the right are in the original space. FC1 to FC4 refer to different forecast models, of which FC1 is an ensemble of size 16 and the others are deterministic. The horizontally aligned dots show the values of the raw ensemble members

particularities of hydrologic applications are discussed in the following. Note that the description of univariate post-processing methods closely follows Gneiting (2014).

2.2.1 Bayesian Model Averaging

For a variable of interest, y, the BMA approach employs a mixture distribution of the general form

$$y \mid x_1, \ldots, x_M \sim \sum_{m=1}^{M} w_m \, g(y \mid x_m); \qquad (1)$$

where $y \mid x_1, \ldots, x_M$ denotes the distribution of the variable y conditional on the ensemble forecast members $x_1, \ldots, x_M$. Here, $g(y \mid x_m)$ is a parametric density function that depends on the specific ensemble member forecast $x_m$ in suitable ways, and the mixture weights $w_1, \ldots, w_M$ are nonnegative and sum to 1. The mixture weights $w_1, \ldots, w_M$ reflect the corresponding members’ relative contributions to predictive skill over the training period. As river runoff generally follows a highly skewed distribution, it is recommended to either transform the variables such that they are approximately normal or to model the distributions $g(\cdot)$ using more appropriate parametric (or nonparametric) density


functions. The former approach (see also Duan et al. 2007) may be employed using the Box-Cox transformation (Box and Cox 1964)

$$z = \begin{cases} \dfrac{x^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0, \\ \log(x) & \text{if } \lambda = 0; \end{cases} \qquad (2)$$

where z is the Box-Cox transformed value of the original value x > 0 and $\lambda$ is the parameter of the Box-Cox transformation. This approach works well in practice, though it does not ensure normality. Alternatively, the quantile-quantile transformation

$$z = \Phi^{-1}(F(x)); \qquad (3)$$

can be used to ensure normality (see e.g., Bogner et al. 2012). F(x) is the value of an empirical cumulative distribution function (CDF) F at the point x and $\Phi^{-1}$ is the inverse CDF of the standard normal distribution. Assuming that $g(y \mid x_m)$ is a normal density, where the mean is a bias-corrected affine function of $x_m$ and the variance is fixed, is reasonable in the case of temperature and pressure (see e.g., Gneiting et al. 2005). For Box-Cox transformed runoff it is reasonable as well (Duan et al. 2007). For normally distributed variables, Raftery et al. (2005) propose the BMA specification

$$y \mid x_1, \ldots, x_M \sim \sum_{m=1}^{M} w_m \, \mathcal{N}\!\left(a_{0m} + a_{1m} x_m, \; \sigma_m^2\right); \qquad (4)$$

so that the kernel densities $g(y \mid x_m)$ are Gaussian with mean $a_{0m} + a_{1m} x_m$ and variance $\sigma_m^2$. The BMA weights $w_1, \ldots, w_M$, the bias parameters $a_{01}, \ldots, a_{0M}$ and $a_{11}, \ldots, a_{1M} \geq 0$, and the variance parameters $\sigma_1^2, \ldots, \sigma_M^2$ are fitted on training data in ways described by Raftery et al. (2005). This mainly involves maximum-likelihood estimation. Note that functions for the estimation of BMA models with Gaussian- and Gamma-distributed kernel distributions are readily available in the R package ensembleBMA (Fraley et al. 2015). As stated in Sect. 2.1.2, probabilistic river runoff forecasts are usually based on multi-model atmospheric input ensembles. The members of a particular meteorological center’s ensemble are in general exchangeable, that is, they are physically indistinguishable. Hence, the corresponding hydrologic ensemble members are exchangeable as well. Fraley et al. (2010) discuss the adaptation of the basic Gaussian BMA specification (Eq. 4) to ensembles with exchangeable members. Their model is given by

$$y \mid x_{11}, \ldots, x_{1K_1}, \ldots, x_{M1}, \ldots, x_{MK_M} \sim \sum_{m=1}^{M} w_m \sum_{k=1}^{K_m} \mathcal{N}\!\left(a_{0m} + a_{1m} x_{mk}, \; \sigma_m^2\right); \qquad (5)$$


where the members $x_{m1}, \ldots, x_{mK_m}$ are the exchangeable members of model m. $K_m$ is the ensemble size of model m and equals one for a deterministic model, while, for instance, for the 50 member ECMWF ensemble, it would equal 50. In cases with only deterministic models, the model sizes are $K_m = 1$ for all $m = 1, \ldots, M$, and, hence, $K_m$ is omitted. More recently, flexible BMA approaches that do not depend on parametric kernel distributions have been proposed. Parrish et al. (2012) introduced a BMA algorithm that is based on a particle filter approach. The kernel distributions are obtained by a sequential Monte Carlo simulation method. This allows for multimodal kernels, which may reflect the true uncertainty better than any parametric distribution. Rings et al. (2012) provide an alternative flexible BMA method that is also based on a Monte Carlo method, in which the kernel densities are estimated using the DiffeRential Evolution Adaptive Metropolis (DREAM) algorithm (Vrugt et al. 2008, 2009).
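As a minimal illustration of the Gaussian BMA mixture of Eq. 4, the following sketch evaluates the predictive density and CDF and inverts the CDF numerically. The weights and bias/variance parameters are assumed to have been fitted already (e.g., by maximum likelihood, or with the R package ensembleBMA); all numbers are invented for the example.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical fitted BMA parameters for M = 3 members (Eq. 4)
w = np.array([0.5, 0.3, 0.2])          # BMA weights (nonnegative, sum to 1)
a0 = np.array([0.1, -0.2, 0.0])        # additive bias parameters
a1 = np.array([1.0, 0.95, 1.1])        # multiplicative bias parameters
sigma = np.array([1.2, 1.5, 1.0])      # kernel standard deviations
x = np.array([10.0, 11.0, 9.5])        # current member forecasts
mu = a0 + a1 * x                       # bias-corrected kernel means

def bma_pdf(y):
    """Mixture predictive density at y."""
    return float(np.sum(w * norm.pdf(y, loc=mu, scale=sigma)))

def bma_cdf(y):
    """Mixture predictive CDF at y."""
    return float(np.sum(w * norm.cdf(y, loc=mu, scale=sigma)))

def bma_quantile(p, lo=-100.0, hi=200.0, tol=1e-9):
    """Invert the mixture CDF by bisection (no closed form exists)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bma_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

median = bma_quantile(0.5)   # predictive median
```

Any predictive quantile can be extracted the same way, which is how ensemble trajectories or interval forecasts would be derived from the fitted mixture.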

2.2.2 Ensemble Model Output Statistics

Gneiting et al. (2005) introduced ensemble model output statistics (EMOS), also referred to as nonhomogeneous regression (NR), which models the predictive distribution as a single parametric distribution of the general form

$$y \mid x_1, \ldots, x_M \sim g(y \mid x_1, \ldots, x_M); \qquad (6)$$

where the variables are defined as above for BMA. Assuming a Gaussian density to be appropriate for the variable to be forecast, the EMOS predictive distribution is

$$y \mid x_1, \ldots, x_M \sim \mathcal{N}\!\left(\mu, \sigma^2\right); \qquad (7)$$

where $\mu = a_0 + a_1 x_1 + \cdots + a_M x_M$ and $\sigma^2 = b_0 + b_1 s^2$, with

$$s^2 = \frac{1}{M} \sum_{m=1}^{M} \left( x_m - \frac{1}{M} \sum_{m=1}^{M} x_m \right)^2 \qquad (8)$$

denoting the variance of the raw ensemble. In case of exchangeable members, the mean parameter $\mu$ is given by $\mu = a_0 + a_1 \bar{x}_1 + \cdots + a_M \bar{x}_M$, where $\bar{x}_1, \ldots, \bar{x}_M$ are the means of each set of exchangeable ensemble members (i.e., the members stemming from the same EPS), with $\bar{x}_m = \frac{1}{K_m} \sum_{k=1}^{K_m} x_{mk}$. The ensemble variance is then

$$s^2 = \frac{1}{\sum_{m=1}^{M} K_m} \sum_{m=1}^{M} \sum_{k=1}^{K_m} \left( x_{mk} - \bar{x}_m \right)^2. \qquad (9)$$


The coefficients $a_0, a_1, \ldots, a_M, b_0, b_1$ are estimated by numerical optimization over a training period. That is, a scoring function, which depends on the model coefficients and the observations, is minimized. Usually, the continuous ranked probability score (CRPS; Matheson and Winkler 1976; Hersbach 2000) is well suited for that purpose. More details on the CRPS can be found in Sect. 5 about verification. For EMOS based on a Gaussian distribution, functions are available in the R package ensembleMOS (Yuen et al. 2013). Depending on the variable to be forecast, other, non-Gaussian distribution functions may be used instead. The EMOS model can alternatively be formulated as an extended logistic regression (ExtLR) model (Wilks 2009). In order to obtain a complete predictive distribution, ExtLR uses the threshold to be forecast, y, as an additional predictor. The ExtLR model can then be written as

$$F(y) = \frac{\exp\!\left(a_0 + a_1 x_1^{\alpha} + \cdots + a_K x_K^{\alpha} + b y^{\beta}\right)}{1 + \exp\!\left(a_0 + a_1 x_1^{\alpha} + \cdots + a_K x_K^{\alpha} + b y^{\beta}\right)} \quad \text{for } y \geq 0; \qquad (10)$$

where $\alpha > 0$ and $\beta > 0$ are fixed coefficients and $x \in \mathbb{R}^K$ is the vector of predictors. Fundel and Zappa (2011) apply ExtLR to hydrological reforecasts following Wilks (2011), who uses the ensemble mean $\bar{x}$ and spread s as predictors. This leads to the ExtLR model

$$F(y) = \frac{\exp\!\left(a_0 + a_1 \bar{x}^{\alpha_1} + a_2 s^{\alpha_2} + b y^{\beta}\right)}{1 + \exp\!\left(a_0 + a_1 \bar{x}^{\alpha_1} + a_2 s^{\alpha_2} + b y^{\beta}\right)}; \qquad (11)$$

where $\alpha_1, \alpha_2, \beta$ have to be determined based on the forecaster’s knowledge, and $a_0, a_1, a_2, b$ are estimated, for instance, by maximum-likelihood estimation. A more recent development of ExtLR allows for interactions between the vector of predictors, x, and the threshold, y (Ben Bouallègue 2013).
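The CRPS-based estimation of the Gaussian EMOS coefficients can be sketched as follows, using the closed-form CRPS of a normal distribution (Gneiting et al. 2005). The synthetic training data and the choice of a Nelder-Mead optimizer are illustrative only, not the ensembleMOS implementation.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

# Synthetic training data: truth plus biased, noisy ensemble members
rng = np.random.default_rng(1)
n, M = 300, 5
truth = rng.normal(10.0, 3.0, n)
ens = truth[:, None] + 1.0 + rng.normal(0.0, 2.0, (n, M))

def crps_normal(mu, sigma, y):
    """Closed-form CRPS of a normal predictive distribution."""
    z = (y - mu) / sigma
    return sigma * (z * (2.0 * norm.cdf(z) - 1.0)
                    + 2.0 * norm.pdf(z) - 1.0 / np.sqrt(np.pi))

def mean_crps(params):
    """Mean training CRPS for mu = a0 + a1 x1 + ... + aM xM and
    sigma^2 = b0 + b1 s^2 (Eqs. 7-8)."""
    a = params[: M + 1]
    b0, b1 = params[M + 1] ** 2, params[M + 2] ** 2  # squared: keeps variance positive
    mu = a[0] + ens @ a[1:]
    sigma = np.sqrt(b0 + b1 * ens.var(axis=1) + 1e-9)
    return float(crps_normal(mu, sigma, truth).mean())

# Numerical optimization of the mean CRPS over the training period
x0 = np.concatenate([[0.0], np.full(M, 1.0 / M), [1.0, 1.0]])
res = minimize(mean_crps, x0, method="Nelder-Mead",
               options={"maxiter": 5000, "maxfev": 10000})
```

Squaring the raw optimization variables for $b_0$ and $b_1$ is one simple way to enforce a positive variance without a constrained optimizer.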

3 Seamless Prediction Methods

3.1 Generation of Seamless Hydrologic Forecasts

Seamless predictions, i.e., consistent predictions over successive lead times, are of increasing importance in the field of (hydro-) meteorological forecasting. For instance, Palmer et al. (2008) motivate the use of seamless predictions by the verification of climate models. Based on the premise that the fundamental physical processes of seasonal forecasts and decadal climate projections are similar, probabilistic climate forecasts can be calibrated according to the validation results of the seasonal predictions of the corresponding models. In meteorology, a seamless prediction system is designed to cover the time span from weather to climate predictions. However, in hydrology, seamless predictions span a somewhat shorter time horizon from nowcasting flash floods to seasonal drought predictions


(Yuan et al. 2014). Short-range hydrologic forecasts may benefit from a blending of precipitation nowcasts and forecasts. Kober et al. (2012, 2014) and Scheufele et al. (2014) propose and apply such a blending method using a weighting function that depends on lead time and the conditional square root of the ranked probability score. Hydrologic model runs based on seamless meteorological predictions can be expected to be seamless as well. If hydrologic ensemble forecast trajectories, which have been obtained by statistical post-processing, are used as inputs to a hydrodynamic model or for river routing, it is crucial to avoid discontinuities in the marginal predictive distributions. Naïve approaches smooth the parameter estimates of the univariate model fits. For instance, in the case of EMOS, the estimates of the parameters (a0, a1, . . ., aM, b0, b1) can be smoothed using cubic smoothing splines as implemented in the R function smooth.spline, where the smoothing parameter is estimated using leave-one-out cross-validation. For instance, Hemri et al. (2015) apply such a smoothing method. More sophisticated approaches would be based on simultaneous parameter estimation over the entire range of lead times. To the author’s knowledge, there are no studies addressing this in the context of hydrological post-processing, though several methods used for spatially adaptive post-processing of meteorological forecasts have been developed. Such methods can often be transferred to temporal problems. For instance, the locally adaptive EMOS method (Feldmann et al. 2015; Scheuerer and Büermann 2014) could probably be modified in such a way that simultaneous parameter estimation over the entire range of lead times becomes feasible.

3.2 How to Account for Spatiotemporal Dependence Structures?

Recently, different methods to incorporate multivariate dependence structures into the post-processing of ensemble forecasts have been proposed. Let us first have a look at nonparametric reordering approaches, which comprise mainly the Schaake shuffle (Clark et al. 2004) and ensemble copula coupling (ECC; Schefzik et al. 2013). Both approaches implicitly rely on empirical copulas. The notion of a copula is defined by Sklar’s theorem (Sklar 1959), which states that any multivariate CDF F with margins $F_1, \ldots, F_L$ can be represented by

$$F(y_1, \ldots, y_L) = C(F_1(y_1), \ldots, F_L(y_L)); \qquad (12)$$

where $y_1, \ldots, y_L \in \mathbb{R}$ and $C : [0, 1]^L \to [0, 1]$. The copula C can be understood as a multivariate CDF with standard uniform marginal distributions. In case of ensemble forecasts, the rank order structure of the ensemble members defines an empirical copula over the forecast margins (i.e., particular lead times and locations). The Schaake shuffle transfers historical spatiotemporal dependence structures to the forecasts of interest. For simplicity, it is assumed here that only one variable is of interest (e.g., runoff, precipitation, or temperature), though the method could easily be used to model dependence structures between different variables. Additionally,


the difference between single model and multi-model ensembles is ignored here. Let $X_{m,t,s}$ be the ensemble forecast array at a specific day. The index $m = 1, \ldots, M$ refers to the ensemble members, $t = 1, \ldots, T$ to the lead times, and $s = 1, \ldots, S$ to the locations. Then a corresponding observation array $Y_{m,t,s}$ of equal size is selected. $Y_{m,t,s}$ is constructed by selecting the same number of historical observations as there are ensemble members. Times of day (to reflect lead times correctly) and locations have to be equal in $X_{m,t,s}$ and in $Y_{m,t,s}$. In Clark et al. (2004) the dates of the historical observations are selected such that they match the calendar day of the forecast of interest to within 7 days, regardless of the year. The multi-index $\ell = (s, t)$ defines the margins at which univariate, statistically post-processed ensemble forecasts are available. For a given location s and lead time t, i.e., margin $\ell$, the Schaake shuffle can be summarized as follows:

1. Sort the forecast vector $X^{\ell} = \left(x_1^{\ell}, \ldots, x_M^{\ell}\right)$ such that $\tilde{X}^{\ell} = \left(\tilde{x}_1^{\ell}, \ldots, \tilde{x}_M^{\ell}\right) = \left(x_{(1)}^{\ell}, \ldots, x_{(M)}^{\ell}\right)$, with $x_{(1)}^{\ell} \leq x_{(2)}^{\ell} \leq \cdots \leq x_{(M)}^{\ell}$.
2. Sort the observation vector $Y^{\ell} = \left(y_1^{\ell}, \ldots, y_M^{\ell}\right)$ such that $\tilde{Y}^{\ell} = \left(\tilde{y}_1^{\ell}, \ldots, \tilde{y}_M^{\ell}\right) = \left(y_{(1)}^{\ell}, \ldots, y_{(M)}^{\ell}\right)$, with $y_{(1)}^{\ell} \leq y_{(2)}^{\ell} \leq \cdots \leq y_{(M)}^{\ell}$, and denote the corresponding ranks by $rk_m^{\ell}$.
3. Construct the reordered forecast vector $X_{ss} = \left(\tilde{x}_{rk_1^{\ell}}^{\ell}, \ldots, \tilde{x}_{rk_M^{\ell}}^{\ell}\right)$.
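The three steps above can be sketched in numpy for a forecast array and a matching observation array over several margins; the toy data below are synthetic.

```python
import numpy as np

def schaake_shuffle(fc, obs):
    """fc, obs: arrays of shape (M, L) -- M ensemble members / historical
    dates, L margins (location-lead-time pairs). At each margin, the sorted
    forecast values are reassigned to the positions given by the rank order
    of the historical observations (steps 1-3 above)."""
    fc, obs = np.asarray(fc, float), np.asarray(obs, float)
    out = np.empty_like(fc)
    for l in range(fc.shape[1]):
        # the member whose observation has rank i receives the i-th smallest forecast
        out[np.argsort(obs[:, l]), l] = np.sort(fc[:, l])
    return out

# Toy example with two positively correlated margins
rng = np.random.default_rng(7)
base = rng.normal(size=20)
obs = np.column_stack([base, base + 0.2 * rng.normal(size=20)])
fc = rng.normal(size=(20, 2))
shuffled = schaake_shuffle(fc, obs)
```

The marginal values of the forecast are untouched; only their assignment to members changes, so the output inherits the observations' rank correlation across margins.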

The above reordering procedure is applied to all margins $\ell$. With this, the Schaake shuffle preserves the Spearman rank correlation structure between the margins $\ell$. For instance, two positively correlated stations probably have similar historical rank orders and hence also similar forecast rank orders after applying the Schaake shuffle. For hydrologic ensemble forecasts of runoff from a larger catchment, it is crucial that the meteorological input variables show an appropriate correlation structure among its subcatchments, because this ensures that forecasts of extremes are not leveled out by averaging over the subcatchments. The ECC method works similarly to the Schaake shuffle, but the ranks for reordering of the forecasts are derived from the corresponding raw ensemble instead of historical observations. For ECC, one first determines the rank order structure of the raw ensemble of size M at each margin $\ell$. Then one samples M times from each post-processed marginal predictive CDF $\hat{F}_{\ell}$. These samples are then reordered using the rank order structure of the raw ensemble, which leads to reordered post-processed ensemble trajectories. ECC distinguishes between three different variants. The first variant, ECC-Q, uses equidistant quantiles as the samples from $\hat{F}_{\ell}$. The second variant, ECC-R, uses random samples from $\hat{F}_{\ell}$. The third variant, ECC-T, is a quantile mapping or transformation approach, where the quantiles to be calculated from $\hat{F}_{\ell}$ are determined by first fitting a parametric distribution to the raw ensemble and then examining to which quantiles of this distribution the raw ensemble members correspond. Thus, ECC ensures that the Spearman rank correlation coefficient of the raw ensemble is retained (Schefzik et al. 2013). Denoting the raw ensemble at margin $\ell$ by $R^{\ell} = \left(r_1^{\ell}, \ldots, r_M^{\ell}\right)$, ECC can be summarized as follows:

1. Obtain a marginal forecast CDF $\hat{F}_{\ell}$ by univariate post-processing.
2. Sort the raw ensemble members $r_{(1)}^{\ell} \leq \cdots \leq r_{(M)}^{\ell}$ at each margin $\ell$. Denote the ranks by $rk_m^{\ell}$.
3. Select one of the following three methods, which have been proposed by Schefzik et al. (2013), to obtain the ECC ensemble members $\hat{y}_m^{\ell}$ at margin $\ell$:
• ECC-Q: $\hat{y}_m^{\ell} = \hat{F}_{\ell}^{-1}\!\left(x_{rk_m^{\ell}}\right)$, where $x_{rk_m^{\ell}}$ are elements of the vector of equidistant probabilities $\left(\frac{1}{M+1}, \frac{2}{M+1}, \ldots, \frac{M}{M+1}\right)$ or $\left(\frac{1/2}{M}, \frac{3/2}{M}, \ldots, \frac{M-1/2}{M}\right)$ reordered according to the ranks $rk_m^{\ell}$.
• ECC-R: $\hat{y}_m^{\ell} = \hat{F}_{\ell}^{-1}\!\left(u_{rk_m^{\ell}}\right)$, where $u_{rk_m^{\ell}}$ are elements of the vector $\left(u_{(1)}, \ldots, u_{(M)}\right)$ of sorted random samples from a standard uniform random variable reordered according to the ranks $rk_m^{\ell}$.
• ECC-T: $\hat{y}_m^{\ell} = \hat{F}_{\ell}^{-1}\!\left(\hat{S}_{\ell}\!\left(r_m^{\ell}\right)\right)$, where $\hat{S}_{\ell}$ is the fitted CDF of a suitable parametric distribution to the raw ensemble $R^{\ell}$.
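A sketch of the ECC-Q variant, assuming (hypothetically) that the post-processed marginal CDFs are Gaussians with already-fitted parameters:

```python
import numpy as np
from scipy.stats import norm

def ecc_q(raw, mu, sigma):
    """ECC-Q: raw is the (M, L) raw ensemble; mu, sigma are length-L arrays
    of assumed Gaussian post-processed marginal parameters. Equidistant
    quantiles are reordered according to the raw ensemble's rank order at
    each margin."""
    M, L = raw.shape
    probs = np.arange(1, M + 1) / (M + 1.0)   # equidistant probabilities m/(M+1)
    out = np.empty_like(raw, dtype=float)
    for l in range(L):
        q = norm.ppf(probs, loc=mu[l], scale=sigma[l])  # sorted marginal quantiles
        out[np.argsort(raw[:, l]), l] = q               # impose the raw rank order
    return out

rng = np.random.default_rng(3)
raw = rng.normal(size=(10, 3))
ecc_ens = ecc_q(raw, mu=np.array([1.0, 2.0, 3.0]), sigma=np.array([0.5, 0.5, 1.0]))
```

ECC-R and ECC-T differ only in how the M values per margin are drawn from $\hat{F}_{\ell}$; the reordering step is identical.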

In contrast to the Schaake shuffle, ECC intrinsically adapts to different weather patterns by relying on the current raw ensemble forecast and not on historical data. Obviously, the performance of ECC depends on how well the true dependence structures are represented by the raw ensemble. Spatiotemporal dependence structures can also be modeled using parametric copulas. Many different copula approaches have been proposed to model such dependence structures in a (hydro-) meteorological context. The review article by Schölzel and Friederichs (2008) provides an introduction to this topic. As most of the applied studies rely on Gaussian copulas, the idea of using parametric copulas is illustrated using Gaussian copulas in the following. Following Schölzel and Friederichs (2008), let $Y = (Y_1, \ldots, Y_L)$ denote the random vector of interest. As before, the marginal forecast CDFs, $\hat{F}_{\ell}$, are known from prior post-processing. Reconsidering Eq. 12 and assuming that $\hat{F}_{\ell}$ is univariately calibrated, we set $U_{\ell} = \hat{F}_{\ell}(y_{\ell}) \sim U(0, 1)$, following a standard uniform distribution. By applying the inverse of the CDF of the standard normal distribution, $\Phi$, one obtains

$$Z_{\ell} = \Phi^{-1}(U_{\ell}); \qquad (13)$$

which follows a standard normal distribution. Additionally, it is assumed that $Z = (Z_1, \ldots, Z_L)$ follows a multivariate normal distribution with covariance matrix $\Sigma$, i.e., $Z \sim \mathcal{N}(0, \Sigma)$, and let the CDF of Z be denoted by $G(\cdot)$. Then, the CDF of Y is given by

$$C(u_1, \ldots, u_L) = G(Z_1, \ldots, Z_L) = G\!\left(\Phi^{-1}(U_1), \ldots, \Phi^{-1}(U_L)\right). \qquad (14)$$

As an example of the usage of parametric copulas, the Gaussian copula approach by Pinson and Girard (2012) is mentioned here. For this example, only temporal correlations are considered, i.e., the margins $\ell$ correspond to the lead times $\ell = 1, \ldots, L$. This approach follows the procedure described above, with $\hat{F}_{\ell}$ being the predictive CDF obtained by univariate post-processing of the raw ensemble at lead time $\ell$. The covariance matrix $\Sigma$, which is equal to a correlation matrix here, is then estimated by fitting a parametric correlation function to the correlations between different lead times in the training period. Using, for instance, an exponential correlation function, $\Sigma$ is given by

$$\mathrm{cov}\!\left(Y_{t,\ell_1}, Y_{t,\ell_2}\right) = \exp\!\left(-\frac{|\ell_1 - \ell_2|}{v}\right), \quad 0 < \ell_1, \ell_2 \leq L; \qquad (15)$$

(16)

where 1 < ‘1 < ‘2 <    < ‘n < L . For instance, Hemri et al. (2013) use this approach to construct seamless hydrologic BMA predictive densities from a raw ensemble, of which ensemble members drop out at particular lead times. Engeland and Steinsland (2014) apply a similar method to model the dependence between neighboring catchments and lead times in the framework of an EMOS forecast based on a raw ensemble of size three, which is obtained by combining the predictions from a hydrologic model, a sliding window climatology, and the persistence forecast.

4

Probabilistic Single-Model Forecasts

In parallel to the development of ensemble forecasting and multi-model combination methods, several approaches that quantify uncertainty on the basis of deterministic hydrologic models have been proposed. These methods are appealing, in that they do not rely on computationally expensive ensemble forecasts.

4.1

Ensemble Kalman Filter

As stated by Vrugt et al. (2005), most classical hydrologic models ignore most of the different sources of uncertainty and represent them by the uncertainty in the parameter estimates only. However, the ensemble Kalman filter (EnKF) provides a framework for the treatment of the different sources of uncertainty in deterministic hydrological models (Evensen 1994). The EnKF is an extension of the Kalman filter (KF). Both are described in detail in Vrugt et al. (2005) and in Vrugt and Robinson (2007). Here, the basic principle of sequential data assimilation via Kalman filtering approaches is presented using the example of the KF. The evolution of the state vector Ψ_t ∈ ℝ^m (the index t denotes time) of a nonlinear hydrologic model is given by

Ψ_{t+1} = η(Ψ_t, X̃_t, θ) + q_t;    (17)

where X̃_t denotes the input vector, θ is the parameter vector, the function η(·) stands for the nonlinear hydrologic model, and q_t denotes a noise term that simulates the errors in the hydrological model formulation. The prediction of the model at time t is then mapped from the model state space to the model output space via

y_t = H(Ψ_t) + ε_t,    ε_t ~ N(0, σ_t^o);    (18)

where H(·) is the measurement operator, y_t denotes the model forecast at time t, and ε_t represents the measurement error. In the case of a scalar output, such as runoff, H(·) simplifies to the transpose of a vector of size m. At each time step, when an observation ỹ_t becomes available, the forecast states Ψ_t^f are updated using

Ψ_t^u = Ψ_t^f + K_t (ỹ_t − H(Ψ_t^f)).    (19)

In Eq. 19, K_t denotes the Kalman gain matrix, which is computed as

K_t = Σ_t^m H^T (H Σ_t^m H^T + Σ_t^o)^{-1};    (20)

where Σ_t^m is the covariance matrix of q_t and Σ_t^o is the covariance matrix of the observations (which simplifies to σ_t^o in the case of scalar observations). The linear model structure of the KF cannot account for the nonlinear, mainly threshold-triggered, hydrologic responses. The EnKF resolves this issue by propagating an ensemble of model states A_t = (Ψ_t^1, …, Ψ_t^N) ∈ ℝ^{m×N} through the KF. A detailed description can be found in Vrugt et al. (2005).
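A minimal sketch of a stochastic EnKF analysis step is shown below, corresponding to Eqs. 19 and 20 with the state covariance replaced by the ensemble sample covariance and with perturbed observations. The two-state toy system, the linear H, and all settings are illustrative assumptions, not part of the methods cited above:

```python
import numpy as np

def enkf_update(ens, H, y_obs, obs_var, rng):
    """One stochastic EnKF analysis step (cf. Eqs. 19 and 20): the state
    covariance is estimated from the ensemble, and perturbed observations
    keep the analysis ensemble spread statistically consistent."""
    m, N = ens.shape
    P = np.atleast_2d(np.cov(ens))               # sample covariance of the states
    S = np.atleast_2d(H @ P @ H.T + obs_var)     # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)               # Kalman gain (Eq. 20)
    y_pert = y_obs + rng.normal(0.0, np.sqrt(obs_var), size=N)
    innov = y_pert - (H @ ens).ravel()           # ensemble of innovations
    return ens + K @ innov.reshape(1, N)         # update (Eq. 19)

# toy example: two states, 50 members, a single observation of the first state
rng = np.random.default_rng(0)
ens = rng.normal([[10.0], [5.0]], 2.0, size=(2, 50))
H = np.array([[1.0, 0.0]])
upd = enkf_update(ens, H, y_obs=12.0, obs_var=0.25, rng=rng)
print(np.round(upd.mean(axis=1), 2))  # analysis mean pulled toward the observation
```

Note that the unobserved second state is also updated, via the sample cross-covariance contained in P.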

4.2

Bayesian Forecasting System

The Bayesian forecasting system (BFS) has been suggested as a method to obtain probabilistic forecasts from any deterministic hydrologic model (Krzysztofowicz 1999). It decomposes uncertainty into the dominant uncertainty (i.e., input uncertainty, IU) and all other sources of uncertainty (hydrologic uncertainty, HU). Subsequently, input and hydrologic uncertainty are integrated into a probabilistic forecast. Following the BFS approach, the forecast density of river stages H = [(H_1^1, …, H_T^1), …, (H_1^L, …, H_T^L)]^T (lower case h denotes a realization of the random vector H, n = 1, …, T is an index for forecast time, and ℓ = 1, …, L for forecast location) can be written as the integral of the hydrologic-uncertainty density (HU, first factor) against the input-uncertainty density (IU, second factor):

ψ(h | h_0, y, u, v) = ∫_{−∞}^{∞} φ(h | s, h_0, y) π(s | u, v) ds;    (21)

where s denotes hydrologic model output, h_0 is the series of observations up to the time of forecast issue t_0, y denotes a realization of a state vector Y that is available at the time of forecast issue, V is the random vector of precipitation inputs, and U is the random vector of inputs, whose realization u comprises all deterministic inputs to the hydrologic model (exogenous variables and internal states, but not model parameters, which are assumed to be fixed here). Here, φ and π denote probability density functions. Equation 21 can be rewritten as

ψ(h | h_0, y, u, v) = γ(h; h_0, y, u, v) g(h | h_0);    (22)

where g(h | h_0) denotes the prior density conditional on h_0, which is typically modeled by a time series model. The weighting function γ(h; h_0, y, u, v) is given by

γ(h; h_0, y, u, v) = ∫_{−∞}^{∞} f(s | h, y) π(s | u, v) / κ(s | h_0, y) ds;    (23)

where κ(s | h_0, y) denotes the expected density of model output S for any fixed h_0 and hydrologic model state y, that is,

κ(s | h_0, y) = ∫_{−∞}^{∞} f(s | h, y) g(h | h_0) dh;    (24)

and f(s | h, y) is the likelihood of s conditional on observation h and state y. The BFS provides a rather general Bayesian framework for handling uncertainty when using deterministic hydrologic models. For numerical applications, π(· | u, v) has to be estimated by simulation, models for g and f have to be chosen, and the state vector y has to be determined by experimentation. The vector y may include some elements of u and h_0. Besides augmenting deterministic forecasts with a statement on uncertainty and providing an uncertainty decomposition that is soundly based on Bayesian statistics, the BFS has two additional favorable properties. First, it has a self-calibration property: provided the inputs u and v are calibrated, the BFS outputs are also calibrated. Second, the BFS predictive distribution converges automatically to the prior distribution if the hydrologic model has no predictive capability or the input density is noninformative with regard to the variable to be forecast. This constitutes an effective barrier against systematically poor predictions. A discussion of the usage of such Bayesian frameworks for hydrological uncertainty assessment can be found in the chapter on "Hydrological Challenges in Meteorological Postprocessing" by F. Wetterhall in this handbook.
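Equations 21, 22, 23, and 24 can be checked numerically by grid quadrature. The sketch below uses entirely artificial Gaussian choices for the prior g, the likelihood f, and the input density π (the BFS itself prescribes none of these) and verifies that the resulting forecast density ψ integrates to one:

```python
import numpy as np
from scipy import stats

# artificial Gaussian choices for prior, likelihood, and input density (toy values)
h = np.linspace(-10.0, 20.0, 600)      # grid for river stage h
s = np.linspace(-10.0, 25.0, 600)      # grid for model output s
dh, ds = h[1] - h[0], s[1] - s[0]

g = stats.norm.pdf(h, loc=4.0, scale=2.0)                         # prior g(h | h0)
f = stats.norm.pdf(s[:, None], loc=1.1 * h[None, :], scale=1.5)   # likelihood f(s | h, y)
pi = stats.norm.pdf(s, loc=8.0, scale=2.5)                        # input density pi(s | u, v)

kappa = (f * g[None, :]).sum(axis=1) * dh             # Eq. 24: expected density of S
gamma = (f * (pi / kappa)[:, None]).sum(axis=0) * ds  # Eq. 23: weighting function
psi = gamma * g                                       # Eq. 22: BFS forecast density

print((psi * dh).sum())  # close to 1.0: psi is a proper density
```

The normalization holds by construction: integrating ψ over h and exchanging the order of integration recovers κ in the numerator, which cancels, leaving the integral of π.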

5

Verification

5.1

Univariate Verification

As probabilistic forecasting is inherently connected to forecast verification, a short summary of probabilistic verification methods is given here. As stated in Gneiting et al. (2007), probabilistic forecasts should be (statistically) well calibrated and yet sharp. A probabilistic forecast is well calibrated if the predicted probability of any forecast interval and the relative frequency of the realizations falling into that interval are similar. More technically, a forecast is well calibrated if its probability integral transform (PIT, Rosenblatt 1952) follows a standard uniform distribution. The PIT, z_t, for time t is given by

z_t = ∫_{−∞}^{y_t} f(y | r_t) dy;    (25)

where y denotes the variable of interest (for instance, runoff or precipitation), f is the predictive density, r_t denotes the information set available for time t, e.g., the raw ensemble forecast, and y_t is the value that materializes, i.e., the observation. Using the PIT values z_t for many realizations over a verification period, a histogram-like PIT plot can be drawn (Hamill 2001). As shown in Fig. 2, underdispersion, overdispersion, and biases are detected by such PIT histograms. Sharpness relates to the concentration of the predictive distributions and is evaluated by assessing the distribution of the widths of particular prediction intervals, e.g., the centered 90 % interval, of the forecasts over the verification period. The narrower these intervals, the sharper the forecast. An example of box-plot-like diagrams that assess sharpness can be found in Gneiting et al. (2007). For the verification of probabilistic forecasts, proper scoring rules should be applied as they “encourage the forecaster to make careful assessments and to be honest” (Gneiting and Raftery 2007). The continuous ranked probability score (CRPS; Matheson and Winkler 1976; Hersbach 2000) is such a proper score. It assesses both calibration and sharpness of probabilistic forecasts and hence gives an estimate of predictive skill. The CRPS for a single forecast-observation pair is defined as


Fig. 2 Left to right: PIT histograms for a well calibrated, an underdispersed, an overdispersed, and a biased forecast (Figures are taken from Hemri et al. (2014))

crps(F, y) = ∫_{−∞}^{∞} (F(x) − 𝟙_{[y,∞)}(x))² dx;    (26)

where F is the predictive CDF, y the verifying observation, and 𝟙_A the indicator function of the set A. The CRPS is negatively oriented, i.e., the smaller the score, the more skillful the forecast. In applications, one usually reports the average CRPS over a verification period T, i.e., CRPS = (1/|T|) Σ_{t=1}^{|T|} crps(F_t, y_t), where |T| denotes the length of the period T. The associated skill score can be written as

CRPSS = (CRPS_ref − CRPS) / CRPS_ref;    (27)

where CRPS_ref is the CRPS of a reference forecast, for instance a climatological or a persistence forecast. The CRPSS attains values in (−∞, 1] and is positively oriented. Forecasts with skill equal to the skill of the reference forecast lead to a CRPSS of zero. If the main interest is the forecast of a dichotomous event (e.g., rain/no rain or exceeding/not exceeding a threshold), the Brier score (BS) is an appropriate verification method (Brier 1950). The BS is given by

bs(F_t, y_t) = (F_t(d) − 𝟙_{[y_t,∞)}(d))²;    (28)

where F_t is the predictive distribution at verification time t and d is the threshold of interest (e.g., an amount of precipitation or a flood alert level for river runoff). Note that the CRPS is the integral of the BS over the support of F_t.
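For a forecast given as a discrete ensemble, the CRPS is commonly estimated in its empirical form, which is the one-dimensional special case of the ensemble energy score of Eq. 30. A minimal sketch (function name illustrative):

```python
import numpy as np

def crps_ensemble(members, y):
    """CRPS of an ensemble forecast against a scalar observation y
    (empirical-CDF form; one-dimensional special case of Eq. 30)."""
    x = np.asarray(members, dtype=float)
    term1 = np.abs(x - y).mean()                          # E|X - y|
    term2 = 0.5 * np.abs(x[:, None] - x[None, :]).mean()  # 0.5 E|X - X'|
    return term1 - term2

print(crps_ensemble([2.0, 2.0, 2.0], 2.0))  # 0.0: sharp and accurate
print(crps_ensemble([2.0, 2.0, 2.0], 4.0))  # 2.0: sharp but biased
```

The score is negatively oriented, and the second term rewards sharpness: a wide ensemble is penalized even when it is centered on the observation.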

5.2

Multivariate Verification

When issuing multivariate predictions, it is crucial to ensure a realistic correlation structure among the forecast margins (see also Sect. 3.2). To this end, Gneiting et al. (2008) proposed the multivariate rank histogram, and Thorarinsdottir et al. (2015) introduced the average rank and the band depth rank histogram. For forecast vectors of low dimension, the multivariate rank histogram works well, whereas the average rank or the band depth rank histogram is recommended for forecasts of higher dimension, as is typically the case in the field of (hydro-)meteorological forecasting. Given univariate calibration, these histograms can be used to detect unrealistic correlation structures among lead times and/or locations of the forecast distribution. Refer to the abovementioned literature for more details on these verification tools. For the multivariate assessment of forecast skill, Gneiting and Raftery (2007) proposed the energy score (ES), given by

es(F, x) = E_F ‖X − x‖ − (1/2) E_F ‖X − X′‖;    (29)

where ‖·‖ denotes the Euclidean norm, E_F denotes expectation, X and X′ are independent copies of a random vector following the distribution F, and x is the corresponding vector of observations. The ES generalizes the CRPS to higher dimensions. Accordingly, ES and CRPS are equal for univariate forecasts. If the forecast is available as an ensemble of size M with ensemble member trajectories f_1, …, f_M ∈ ℝ^L, the ES can, according to Gneiting et al. (2008), be calculated as

es(F, x) = (1/M) Σ_{j=1}^{M} ‖f_j − x‖ − (1/(2M²)) Σ_{i=1}^{M} Σ_{j=1}^{M} ‖f_i − f_j‖.    (30)

In the case of a deterministic forecast trajectory f, the ES reduces to the Euclidean norm:

es(f, x) = ‖f − x‖.    (31)

Hence, the ES may be used to compare multivariate density forecasts, discrete ensemble forecasts, and deterministic forecasts (Gneiting et al. 2008).
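A direct transcription of Eq. 30 might look as follows; for a single member (M = 1) it reduces to the Euclidean distance of Eq. 31. The function name and toy numbers are illustrative:

```python
import numpy as np

def energy_score(members, x):
    """Energy score (Eq. 30) of an ensemble of M member trajectories in R^L
    against the observed trajectory x; reduces to Eq. 31 for M = 1."""
    f = np.atleast_2d(np.asarray(members, dtype=float))   # shape (M, L)
    x = np.asarray(x, dtype=float)
    term1 = np.linalg.norm(f - x, axis=1).mean()
    term2 = 0.5 * np.linalg.norm(f[:, None, :] - f[None, :, :], axis=2).mean()
    return term1 - term2

# deterministic forecast: the ES equals the Euclidean distance (Eq. 31)
print(energy_score([[4.0, 6.0]], [1.0, 2.0]))  # 5.0
```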

6

Summary

The most natural combination of multi-model (hydro-)meteorological forecasts is to merge them into a multi-model ensemble consisting of all members of all models. Such multi-model ensembles generally outperform single-model ensemble predictions. In order to benefit even more from the concept of multi-model combination, the models should be combined and weighted using state-of-the-art statistical post-processing methods like BMA or EMOS. At present, forecasts for meteorological variables and river runoff benefit strongly from statistical post-processing. Keeping in mind that the methods presented in this chapter are rather simple, more sophisticated post-processing methods may increase this benefit even further.

Seamless predictions in a (hydro-)meteorological context should, on the one hand, avoid discontinuities between the marginal forecast distributions from lead time to lead time and, on the other hand, consider the correlation structure among lead times. While, in general, univariate statistical post-processing can take account of the former by straightforward smoothing of parameters, the latter needs more sophisticated approaches. Methods based on empirical copulas, like the Schaake shuffle or ECC, are very flexible and thus well suited for a wealth of different forecasting problems; they can be applied to both spatial and temporal problems. Alternative methods like the Gaussian copula approach are less flexible, but they in turn provide complete multivariate predictive distributions. This allows, for instance, for straightforward calculation of conditional forecast distributions. Furthermore, the EnKF and the BFS, which are alternative uncertainty quantification methods, have been discussed briefly in this chapter. As probabilistic forecasting and verification are inherently connected to each other, some methods for probabilistic verification have been presented from an applied perspective. For a more detailed discussion of verification methods, we refer to the chapter on “Methods for Meteorological Post-processing” by T. Thorarinsdottir in this handbook.

7

Conclusion

Probabilistic (hydro-)meteorological forecasting benefits from combining EPS forecasts into multi-model ensembles. In order to further improve forecast performance, EPS forecasts can be combined and weighted using statistical post-processing methods. With some modifications and additions, such statistical methods can be used to obtain seamless probabilistic predictions.

References N. Addor, S. Jaun, F. Fundel, M. Zappa, An operational hydrological ensemble prediction system for the city of Zurich (Switzerland): skill, case studies and scenarios. Hydrol. Earth Syst. Sci. 15, 2327–2347 (2011) A. Arribas, K.B. Robertson, K.R. Mylne, Test of a poor man’s ensemble prediction system for shortrange probability forecasting. Mon. Weather Rev. 133, 1825–1839 (2005) F. Atger, The skill of ensemble prediction systems. Mon. Weather Rev. 127, 1941–1953 (1999) J.C. Bartholmes, J. Thielen, M.-H. Ramos, S. Gentilini, The European Flood Alert System EFAS – part 2: statistical skill assessment of probabilistic and deterministic operational forecasts. Hydrol. Earth Syst. Sci. 13, 141–153 (2009) Z. Ben Bouallègue, Calibrated short-range ensemble precipitation forecasts using extended logistic regression with interaction terms. Weather Forecast. 28, 515–524 (2013) K. Bogner, F. Pappenberger, H.L. Cloke, Technical note: the normal quantile transformation and its application in a flood forecasting system. Hydrol. Earth Syst. Sci. 16, 1085–1094 (2012) P. Bougeault, Z. Toth, C. Bishop, B. Brown, D. Burridge, D.H. Chen, B. Ebert, M. Fuentes, T.M. Hamill, K. Mylne, J. Nicolau, T. Paccagnella, Y.-Y. Park, D. Parsons, B. Raoult, D. Schuster, P.S. Dias, R. Swinbank, Y. Takeuchi, W. Tennant, L. Wilson, S. Worley, The THORPEX Interactive Grand Global Ensemble. Bull. Am. Meteorol. Soc. 91, 1059–1072 (2010)


G. Box, D. Cox, An analysis of transformations. J. R. Stat. Soc. Ser. B Stat. Methodol. 26, 211–252 (1964) G.W. Brier, Verification of forecasts expressed in terms of probability. Mon. Weather Rev. 78, 1–3 (1950) M. Clark, S. Gangopadhyay, L. Rajagalopalan, R. Wilby, The Schaake shuffle: a method for reconstructing space-time variability in forecasted precipitation and temperature fields. J. Hydrometeorol. 5, 243–262 (2004) H.L. Cloke, F. Pappenberger, Ensemble flood forecasting: a review. J. Hydrol. 375, 613–626 (2009) F.J. Doblas-Reyes, R. Hagedorn, T.N. Palmer, The rationale behind the success of multi-model ensembles in seasonal forecasting – II. Calibration and combination. Tellus A 57, 234–252 (2005) Q. Duan, N.K. Ajami, X. Gao, S. Sorooshian, Multi-model ensemble hydrologic prediction using Bayesian model averaging. Adv. Water Resour. 30(5), 1371–1386 (2007) E.E. Ebert, Ability of a poor man’s ensemble to predict the probability and distribution of precipitation. Mon. Weather Rev. 129, 2461–2480 (2001) K. Engeland, I. Steinsland, Probabilistic postprocessing models for flow forecasts for a system of catchments and several lead times. Water Resour. Res. 50, 182–197 (2014) European Flood Awareness System, EFAS concepts and tools (2014), https://www.efas.eu/aboutefas.html. Accessed 24 Apr 2015 G. Evensen, Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res. 99(10), 143–162 (1994) K. Feldmann, M. Scheuerer, T.L. Thorarinsdottir, Spatial postprocessing of ensemble forecasts for temperature using nonhomogeneous Gaussian regression. Mon. Weather Rev. 143, 955–971 (2015) C. Fraley, A.E. Raftery, T. Gneiting, Calibrating multimodel forecast ensembles with exchangeable and missing members using Bayesian model averaging. Mon. Weather Rev. 138(1), 190–202 (2010) C. Fraley, A.E. Raftery, J.M. Sloughter, T. 
Gneiting, University of Washington, ensembleBMA: probabilistic forecasting using ensembles and Bayesian model averaging. R package version 5.1.1 (2015), http://CRAN.R-project.org/package=ensembleBMA. Accessed 23 Apr 2015 F. Fundel, M. Zappa, Hydrological ensemble forecasting in mesoscale catchments: sensitivity to initial conditions and value of reforecasts. Water Resour. Res. 47, W09520 (2011) K.P. Georgakakos, D.-J. Seo, H. Gupta, J. Schaake, M.B. Butts, Towards the characterization of streamflow simulation uncertainty through multimodel ensembles. J. Hydrol. 298, 222–241 (2004) H.R. Glahn, D.A. Lowry, The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteorol. 11, 1203–1211 (1972) T. Gneiting, Calibration of medium-range weather forecasts. ECMWF Technical Memorandum, No. 720 (2014), 28p T. Gneiting, A.E. Raftery, Strictly proper scoring rules, prediction, and estimation. J. Am. Stat. Assoc. 102, 359–378 (2007) T. Gneiting, A.E. Raftery, A.H. Westveld, T. Goldman, Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Weather Rev. 133(5), 1098–1118 (2005) T. Gneiting, F. Balabdoui, A.E. Raftery, Probabilistic forecasts, calibration and sharpness. J. R. Stat. Soc. Ser. B Stat. Methodol. 69, 243–268 (2007) T. Gneiting, L.I. Stanberry, E.P. Grimit, L. Held, N.A. Johnson, Assessing probabilistic forecasts of multivariate quantities, with an application to ensemble predictions of surface winds. Test 7(2), 211–235 (2008) R. Hagedorn, F.J. Doblas-Reyes, T.N. Palmer, The rationale behind the success of multi-model ensembles in seasonal forecasting – I. Basic concept. Tellus A 57, 219–233 (2005)


R. Hagedorn, R. Buizza, T.M. Hamill, M. Leutbecher, T.N. Palmer, Comparing TIGGE multimodel forecasts with reforecast-calibrated ECMWF ensemble forecasts. Q. J. Roy. Meteorol. Soc. 138 (668), 1814–1827 (2012) T.M. Hamill, Interpretation of rank histograms for verifying ensemble forecasts. Mon. Weather Rev. 129, 550–560 (2001) T.M. Hamill, Verification of TIGGE multimodel and ECMWF reforecast-calibrated probabilistic precipitation forecasts over the contiguous United States. Mon. Weather Rev. 140, 2232–2252 (2012) T.M. Hamill, C. Snyder, R.E. Morss, A comparison of probabilistic forecasts from bred, singularvector, and perturbed observation ensembles. Mon. Weather Rev. 128(6), 1835–1851 (2000) S. Hemri, F. Fundel, M. Zappa, Simultaneous calibration of ensemble river flow predictions over an entire range of lead-times. Water Resour. Res. 49 (2013). doi:10.1002/wrcr.20542 S. Hemri, D. Lisniak, B. Klein, Ascertainment of probabilistic runoff forecasts considering censored data (in German). Hydrol. Wasserbewirtsch. 58(2), 84–94 (2014). doi:10.5675/HyWa_2014,2_4 S. Hemri, D. Lisniak, B. Klein, Multivariate post-processing techniques for probabilistic hydrological forecasting. Water Resour. Res. 51(9), 7436–7451 (2015) H. Hersbach, Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000) T. Iversen, A. Deckmyn, C. Santos, K. Sattler, J.B. Bremnes, H. Feddersen, I.-L. Frogner, Evaluation of ‘GLAMEPS’ – a proposed multimodel EPS for short range forecasting. Tellus A 63, 513–530 (2011) K. Kober, G.C. Craig, C. Keil, A. Dörnbrack, Blending a probabilistic nowcasting method with a high-resolution numerical weather prediction ensemble for convective precipitation forecasts. Q. J. Roy. Meteorol. Soc. 138, 755–768 (2012) K. Kober, G.C. Craig, C. Keil, Aspects of short-term probabilistic blending in different weather regimes. Q. J. Roy. Meteorol. Soc. 140, 1179–1188 (2014) T.N. Krishnamurti, C.M. 
Kishtawal, Z. Zhang, T. LaRow, D. Bachiochi, E. Williford, S. Gadgil, S. Surendran, Improved weather and seasonal climate forecasts from multimodel superensemble. Science 285(5433), 1548–1550 (1999) T.N. Krishnamurti, C.M. Kishtawal, Z. Zhang, T. LaRow, D. Bachiochi, E. Williford, S. Gadgil, S. Surendran, Multimodel ensemble forecasts for weather and seasonal climate. J. Climate 13 (23), 4196–4216 (2000) R. Krzysztofowicz, Bayesian theory of probabilistic forecasting via deterministic hydrologic model. Water Resour. Res. 35(9), 2739–2750 (1999) J.E. Matheson, R.L. Winkler, Scoring rules for continuous probability distributions. Manag. Sci. 22, 1087–1096 (1976) R. Nelsen, An Introduction to Copulas (Springer, New York, 2006) T. Palmer, Predicting uncertainty in forecasts of weather and climate. Rep. Prog. Phys. 63, 71–116 (2000) T.N. Palmer, F.J. Doblas-Reyes, R. Hagedorn, A. Alessandri, S. Gualdi, U. Andersen, H. Feddersen, P. Cantelaube, J.-M. Terres, M. Davey, R. Graham, P. Délécluse, A. Lazar, M. Déqué, J.-F. Guérémy, E. Díez, B. Orfila, M. Hoshen, A.P. Morse, N. Keenlyside, M. Latif, E. Maisonnave, P. Rogel, V. Marletto, M.C. Thomson, Development of a European multimodel ensemble system for seasonal-to-interannual prediction (DEMETER). Bull. Am. Meteorol. Soc. 85, 853–872 (2004) T.N. Palmer, F.J. Doblas-Reyes, A. Weisheimer, M.J. Rodwell, Toward seamless prediction: calibration of climate change projections using seasonal forecasts. Bull. Am. Meteorol. Soc. 89, 459–470 (2008) Y.-Y. Park, R. Buizza, M. Leutbecher, TIGGE: preliminary results on comparing and combining ensembles. Q. J. Roy. Meteorol. Soc. 134, 2029–2050 (2008) M.A. Parrish, H. Moradkhani, C.M. DeChant, Toward reduction of model uncertainty: integration of Bayesian model averaging and data assimilation. Water Resour. Res. 48, 1–18 (2012)


P. Pinson, R. Girard, Evaluating the quality of scenarios of short-term wind power generation. Appl. Energy 96, 12–20 (2012) A.E. Raftery, T. Gneiting, F. Balabdaoui, M. Polakowski, Using Bayesian model averaging to calibrate forecast ensembles. Mon. Weather Rev. 133(2), 1155–1174 (2005) D.S. Richardson, R. Buizza, R. Hagedorn, TIGGE – first final report. WMO/TD-No. 1273 (2005), http://www.wmo.int/pages/prog/arep/wwrp/new/thorpex_publications.html. Accessed 24 Apr 2015. J. Rings, J.A. Vrugt, G. Schoups, J.A. Husman, H. Vereecken, Bayesian model averaging using particle filtering and Gaussian mixture modeling: theory, concepts, and simulation experiment. Water Resour. Res. 48, W0552 (2012) M. Rosenblatt, Remarks on a multivariate transformation. Ann. Math. Stat. 23, 470–472 (1952) J. Schaake, J. Pailleux, J. Thielen, R. Arritt, T. Hamill, L. Luo, E. Martin, D. McCollor, F. Pappenberger, Summary of recommendations of the first workshop on postprocessing and downscaling atmospheric forecasts for hydrologic applications held at Météo-France, Toulouse, France, 15–18 June 2009. Atmos. Sci. Lett. 11, 59–63 (2010) R. Schefzik, T.L. Thorarinsdottir, T. Gneiting, Uncertainty quantification in complex simulation models using ensemble copula coupling. Stat. Sci. 28, 616–640 (2013) M. Scheuerer, L. B€uermann, Spatially adaptive post-processing of ensemble forecasts for temperature. J. R. Stat. Soc. Ser. C 63(3), 405–422 (2014) K. Scheufele, K. Kober, G.C. Craig, C. Keil, Combining probabilistic precipitation forecasts from a nowcasting technique with a time-lagged ensemble. Meteorol. Appl. 21, 230–240 (2014) C. Schölzel, P. Friederichs, Multivariate non-normally distributed random variables in climate research – introduction to the copula approach. Nonlinear Processes Geophys. 15, 761–772 (2008) A. Sklar, Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Stat. Univ. Paris 8, 229–231 (1959) J. Thielen, J. Schaake, R. Hartman, R. 
Buizza, Aims, challenges and progress of the Hydrological Ensemble Prediction Experiment (HEPEX) following the third HEPEX workshop held in Stresa 27 to 29 June 2007. Atmos. Sci. Lett. 9, 29–35 (2008) J. Thielen, J.C. Bartholmes, M.-H. Ramos, A. de Roo, The European Flood Alert System – part 1: concept and development. Hydrol. Earth Syst. Sci. 13, 125–140 (2009) T.L. Thorarinsdottir, M. Scheuerer, C. Heinz, Assessing the calibration of high-dimensional ensemble forecasts using rank histograms. J. Comput. Graph. Stat. 25, 105–122 (2016) E. Todini, A model conditional processor to assess predictive uncertainty in flood forecasting. Int. J. River Basin Manag. 6, 123–137 (2008) J.A. Vrugt, B.A. Robinson, Treatment of uncertainty using ensemble methods: comparison of sequential data assimilation and Bayesian model averaging. Water Resour. Res. 43, 1–18 (2007) J.A. Vrugt, C.G.H. Diks, H.V. Gupta, W. Bouten, J.M. Verstraten, Improved treatment of uncertainty in hydrologic modeling: combining the strengths of global optimization and data assimilation. Water Resour. Res. 41, W01017 (2005). doi:10.1029/2004WR003059 J.A. Vrugt, C.J.F. ter Braak, M.P. Clark, J.M. Hyman, B.A. Robinson, Treatment of input uncertainty in hydrologic modeling: doing hydrology backward with Markov chain Monte Carlo simulation. Water Resour. Res. 44, W00B09 (2008). doi:10.1029/2007WR006720 J.A. Vrugt, C.J.F. ter Braak, C.G.H. Diks, B.A. Robinson, J.M. Hyman, D. Higdon, Accelerating Markov chain Monte Carlo simulations by differential evolution with self-adaptive randomized subspace sampling. Int. J. Nonlinear Sci. Numer. Simul. 10(3), 271–288 (2009) A. Weinheimer, L.A. Smith, K. Judd, A new view of seasonal forecast skill: bounding boxes from the DEMETER ensemble forecasts. Tellus A 57, 265–279 (2005) D.S. Wilks, Extending logistic regression to provide full-probability-distribution MOS forecasts. Meteorol. Appl. 16(3), 361–368 (2009). doi:10.1002/met.134 D. 
Wilks, Statistical Methods in the Atmospheric Sciences (Academic Press, Oxford, 2011) X. Yuan, E. Wood, M. Liang, Developing a seamless hydrologic forecast system: integrating weather and climate prediction. Geophysical Research Abstracts 16(EGU2014-2268) (2014)


R. Yuen, T. Gneiting, T. Thorarinsdottir, C. Fraley, ensembleMOS: ensemble model output statistics. R package version 0.7 (2013), http://CRAN.R-project.org/package=ensembleMOS. Accessed 23 Apr 2015 M. Zappa, S. Jaun, U. Germann, A. Walser, F. Fundel, Superposition of three sources of uncertainties in operational flood forecasting chains. Atmos. Res. 100, 246–262 (2011) C. Ziehmann, Comparison of a single-model EPS with a multi-model ensemble consisting of a few operational models. Tellus A 52, 280–299 (2000)

Soil Moisture Data Assimilation

Gabrielle Jacinthe Maria De Lannoy, Patricia de Rosnay, and Rolf Helmut Reichle

Contents
1 Introduction
2 Components of a Soil Moisture Data Assimilation System
3 Observations Related to Soil Moisture
4 Land Surface Modeling
5 Observation Operator
5.1 Screen-Level Observation Predictions
5.2 Microwave Observation Predictions
5.3 Terrestrial Water Storage Predictions
6 Assimilation of Observations Related to Soil Moisture
6.1 Sequential State Updating
6.2 Smoothing
6.3 Joint State and Parameter Updating
7 Random Errors and Biases
7.1 Bias, Autocorrelated Error
7.2 Random Error
8 Evaluation of Soil Moisture Estimates from Data Assimilation
8.1 In Situ Soil Moisture
8.2 Validation Metrics
8.3 Example
9 Toward Operational Soil Moisture Data Assimilation

G.J.M. De Lannoy (*)
NASA Goddard Space Flight Center, Code 610.1, Greenbelt, MD, USA
KU Leuven, Department of Earth and Environmental Sciences, Leuven, Belgium
e-mail: [email protected]

P. de Rosnay (*)
Data Assimilation Section, European Centre for Medium-Range Weather Forecasts, Reading, Berkshire, UK
e-mail: [email protected]

R.H. Reichle (*)
NASA Goddard Space Flight Center, Code 610.1, Greenbelt, MD, USA
e-mail: [email protected]

© Springer-Verlag Berlin Heidelberg (outside the USA) 2015
Q. Duan et al. (eds.), Handbook of Hydrometeorological Ensemble Forecasting, DOI 10.1007/978-3-642-40457-3_32-1


9.1 ECMWF Soil Moisture Data Assimilation for NWP
9.2 NASA SMAP Surface and Root-Zone Soil Moisture Product
10 Conclusions
References


Abstract

Accurate knowledge of soil moisture at the continental scale is important for improving predictions of weather, agricultural productivity, and natural hazards, but observations of soil moisture at such scales are limited to indirect measurements, either obtained through satellite remote sensing or from meteorological networks. Land surface models simulate soil moisture processes, using observation-based meteorological forcing data, and auxiliary information about soil, terrain, and vegetation characteristics. Enhanced estimates of soil moisture and other land surface variables, along with their uncertainty, can be obtained by assimilating observations of soil moisture into land surface models. These assimilation results are of direct relevance for the initialization of hydrometeorological ensemble forecasting systems. The success of the assimilation depends on the choice of the assimilation technique, the nature of the model and the assimilated observations, and, most importantly, the characterization of model and observation error. Systematic differences between satellite-based microwave observations or satellite-retrieved soil moisture and their simulated counterparts require special attention. Other challenges include inferring root-zone soil moisture information from observations that pertain to a shallow surface soil layer, propagating information to unobserved areas and downscaling of coarse information to finer-scale soil moisture estimates. This chapter summarizes state-of-the-art solutions to these issues with conceptual data assimilation examples, using techniques ranging from simplified optimal interpolation to spatial ensemble Kalman filtering. In addition, operational soil moisture assimilation systems are discussed that support numerical weather prediction at ECMWF and provide value-added soil moisture products for the NASA Soil Moisture Active Passive mission.

Keywords

Soil moisture retrieval • Microwave brightness temperature • Radar backscatter • Terrestrial water storage • Analysis • Innovation • Increment • Kalman filter • Observation operator • Numerical weather prediction • Initialization • State update • Calibration • Radiative transfer model • Land surface model • Screen-level observations • ASCAT • AMSR2 • SMOS • SMAP • GRACE

1 Introduction

Soil moisture is the quantity of water contained in the upper layers of the soil, water that directly interacts with the atmosphere through evapotranspiration and partitions rainfall into infiltration and runoff. While the exact definition of soil moisture is
slightly different for weather forecasters, farmers, water managers, construction engineers, and ecologists, in hydrological and Earth system applications soil moisture usually includes the water in the upper ~1–2 m of the soil. Even though soil moisture represents only 0.0012 % of all water available on Earth (~1.4 billion km3), and only 0.05 % of all freshwater (~35,000 km3) (Gleick 1996), it is of primary importance because it links the water, energy, and carbon cycles. It therefore has a significant impact on weather and global climate (Dirmeyer 2000; Koster et al. 2004), and it controls droughts and floods, agricultural yield, diseases, and other socioeconomic phenomena.

Soil moisture is monitored with modeling and observing systems. Numerical models provide spatially and temporally continuous estimates of soil moisture at customized space and time resolutions, and at various soil depths down to the water table. However, models are always simplified and prone to errors. Observations are usually obtained from sparse in situ networks or satellite swaths, and they are therefore limited in their spatial and temporal coverage. One particularly important limitation of current remote-sensing observations is that they only provide soil moisture in a shallow surface layer. By using models and observations synergistically, observational gaps can be filled and superior estimates of soil moisture can be constructed. This process, known as data assimilation, merges the observations into soil moisture modeling systems, either to improve model simulations or to add value to observations. The focus of this chapter is on updating the soil moisture state in dynamic land surface models for continental and global applications. At these scales, observations related to soil moisture are mainly provided by global surface observational networks or through remote sensing, in the form of satellite retrievals, microwave radiances, or radar backscatter values.
Data assimilation interpolates and extrapolates the observations in space and time and updates the entire simulated soil moisture column, while ensuring consistency with all other geophysical variables in the model. This process leads to improved initial conditions for subsequent hydrometeorological forecasts across a range of applications such as weather and drought forecasts. In addition, meaningful estimates of the uncertainty in soil moisture estimates can be obtained, provided the characteristics of observation and forecast errors are well described in the assimilation system. These uncertainty estimates can be used to quantify the uncertainty in hydrometeorological ensemble forecasts (Chapters ▶ Ensemble methods for meteorological predictions and ▶ Hydrological Ensemble Prediction Systems around the Globe).

A broader definition of “data assimilation” methods aimed at improving soil moisture estimates might include the construction of improved forcing information, the revision of modeling systems, or the estimation of model parameters (Chapter ▶ Deterministic parameter estimation methods) using in situ or satellite observations. Parameter estimation is crucial to limit biases in the modeling part of the assimilation system. This mostly involves the calibration of static model parameters against historical datasets. The dynamic estimation of evolving parameters, possibly along with soil moisture updates, has also been explored, but not for large-scale soil moisture modeling systems. The use of observational information to construct superior forcing information and improve modeling systems is at the core of land surface reanalysis products generated by various operational groups (Sect. 9).

The objective of this chapter is to educate students and scientists new to the discipline about the current, well-established practices in large-scale soil moisture data assimilation, without providing an in-depth literature review. The chapter first provides an overview of the components of a soil moisture data assimilation system (Sect. 2). Next, the assimilated observations (Sect. 3), the soil moisture modeling (Sect. 4), and the modeling of observation predictions (Sect. 5) are discussed. State-of-the-art assimilation techniques for soil moisture state updating are presented in Sect. 6, with attention to an optimal characterization of random and systematic errors (Sect. 7). The implementation of advanced ensemble Kalman filter systems is highlighted for its direct relevance to hydrometeorological ensemble forecasting. Section 8 discusses the evaluation of large-scale assimilation results with in situ soil moisture observations. Section 9 provides examples of cutting-edge, preoperational soil moisture assimilation systems. The chapter is concluded with a summary (Sect. 10).

2 Components of a Soil Moisture Data Assimilation System

The basic components of a data assimilation system include observations (Sect. 3), modeling (Sects. 4 and 5), and the analysis update (Sect. 6). The observations used for large-scale soil moisture data assimilation systems include screen-level observations or remote-sensing observations. The modeling comprises two parts: (i) the prognostic land surface model to dynamically propagate the state and (ii) diagnostic modeling to translate land surface variables (e.g., soil moisture) into observed quantities such as remotely sensed radiances. The analysis update optimally combines the information from the observations and the model.

The land surface system (Sect. 4) dynamically propagates the prognostic land surface variables in time. The prognostic variables are the key variables that are needed to initialize or restart a model simulation. Depending on the formulation of the land surface model, the prognostic variables could consist of soil moisture, soil temperature, and snow in all its layers, as well as vegetation variables. These model variables have a memory of the past and evolve in time governed by physical laws, such as energy and mass conservation and gravity, and in response to external meteorological forcings, such as precipitation and evapotranspiration. The land surface model $f_{i,i-1}(\cdot)$ uses the estimate of the state $\hat{x}^{+}_{i-1}$ at the previous time $i-1$ (which is an a posteriori or analysis estimate, if observations were used to update it; see Sect. 6) together with external forcing information $u_i$ to predict the state $\hat{x}^{-}_{i}$ at the


current time $i$, also called the a priori or forecast state estimate. For simplicity, the land surface parameters $\alpha$ are assumed constant. The state trajectory can be written as

$$\hat{x}^{-}_{i} = f_{i,i-1}\left(\hat{x}^{+}_{i-1}, u_i, \alpha\right) \qquad (1)$$

which approximates the true system

$$x_i = f_{i,i-1}\left(x_{i-1}, u_i, \alpha, w_i\right) \qquad (2)$$

where the model errors $w_i$ are due to errors in the external forcings $u_i$ and in the structure and parameters $\alpha$ of the model $f_{i,i-1}(\cdot)$. Errors in the analysis state $\hat{x}^{+}_{i-1}$ and model errors $w_i$ will introduce errors in the forecasted state $\hat{x}^{-}_{i}$. The uncertainties in $\hat{x}^{+}_{i-1}$, $w_i$, and $\hat{x}^{-}_{i}$ are described by the analysis error covariance $P^{+}_{i-1}$, the model error covariance $Q_i$, and the forecast error covariance matrix $P^{-}_{i}$, respectively. One way to estimate the uncertainties is to generate an ensemble of trajectories $\hat{x}^{-}_{i,j}$ by perturbing forcings, state variables, or parameters (further discussed in Sect. 7.2.1).

The surface soil moisture that is simulated with a land surface model is often directly comparable to satellite-based soil moisture retrievals. If the land surface model output does not directly correspond to the assimilated observations, such as for brightness temperature or radar backscatter observations, a second modeling step is needed to transform the land model output into observation predictions $\hat{y}^{-}_{i}$ (Sect. 5):

$$\hat{y}^{-}_{i} = h_i\left(\hat{x}^{-}_{i}, \beta\right) \qquad (3)$$

This observation model, or observation operator, $h_i(\cdot)$ maps state variables from state space to observation space. This step may also include any spatial or temporal aggregation of land surface variables to satellite-scale observation predictions. Here again, for simplicity, the parameters $\beta$ of the observation operator are assumed constant, and uncertainties in the observation predictions could be estimated through ensemble methods. The assimilated observations $y_{obs,i}$ (Sect. 3) can be written as a function of the true state $x_i$ and the observation operator $h_i(\cdot)$:

$$y_{obs,i} = h_i\left(x_i, \beta, v_i\right) \qquad (4)$$

The observation error term $v_i$ includes measurement errors as well as representativeness errors and is assumed additive in most data assimilation systems. The corresponding observation error covariance matrix will be denoted as $R_i$ (discussed in Sect. 7.2.2). The difference between the observations and observation predictions is used to update the state in the analysis step (Sect. 6), for example, as follows:


$$\hat{x}^{+}_{i} = \hat{x}^{-}_{i} + K_i\left[y_{obs,i} - \hat{y}^{-}_{i}\right] \qquad (5)$$

where $K_i$ is a gain matrix. In most soil moisture data assimilation systems, the state $\hat{x}_i$ used in the updating procedure is a limited subset of the land model prognostic variables that render the system “observable.” (For simplicity, the same notation $\hat{x}_i$ is used for either the full set or a subset of prognostic variables.) A system is observable if the state $\hat{x}_i$ is sufficiently connected to the assimilated observations $y_{obs}$. In practice, the state in Eq. 5 contains “observed” state variables that directly contribute to observation predictions, plus “unobserved” variables that are correlated (in the errors) with the “observed” state variables and that will be updated along with the observed state variables during the data assimilation. For example, for the assimilation of surface soil moisture retrievals, the state vector will contain both (observed) surface and (unobserved) deeper-layer soil moisture (Sect. 6.1). Furthermore, the state vector can contain state variables for multiple fine-scale grid cells needed to generate coarser-scale observation predictions, “unobserved” neighboring grid cells (Sect. 6.1.4), and state variables at different time steps (Sect. 6.2).
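The forecast-update cycle of Eqs. 1, 3, and 5 can be sketched in a few lines. Everything below is invented for illustration: the toy two-layer water balance stands in for a real land surface model $f_{i,i-1}(\cdot)$, the operator that picks out the surface layer stands in for $h_i(\cdot)$, and the gain $K_i$ is simply fixed rather than derived from error covariances.

```python
import numpy as np

def propagate(x, precip, alpha=0.2):
    """Toy stand-in for the land surface model f_{i,i-1}(.): a two-layer
    soil moisture state [sfmc, rzmc] with infiltration, drainage to the
    deeper layer, and a loss term (purely illustrative, not a real LSM)."""
    sfmc, rzmc = x
    sfmc = sfmc + precip - alpha * sfmc            # gain rain, drain downward
    rzmc = rzmc + 0.5 * alpha * sfmc - 0.05 * rzmc
    return np.clip(np.array([sfmc, rzmc]), 0.0, 0.5)

def h(x):
    """Observation operator h_i(.): a surface soil moisture retrieval
    observes only the first (surface) state component."""
    return x[0]

# One forecast-analysis cycle (Eq. 5) with a fixed, illustrative gain K.
x_prior = propagate(np.array([0.20, 0.30]), precip=0.02)  # a priori state
y_pred = h(x_prior)                                       # observation prediction
y_obs = 0.25                                              # synthetic retrieval
K = np.array([0.6, 0.2])   # updates the observed AND the unobserved layer
x_post = x_prior + K * (y_obs - y_pred)                   # a posteriori state
```

The nonzero second element of `K` is what lets the surface observation correct the unobserved root-zone component, mirroring the observed/unobserved state variable discussion above.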

3 Observations Related to Soil Moisture

Global-scale soil moisture can be inferred from global surface observational networks or from satellite-based observations. Near-surface meteorological observations of two-meter air temperature and relative humidity are measured routinely by the land surface synoptic report (SYNOP) operational network at the global scale and, under certain conditions, are related to surface and root-zone soil moisture. The coverage of these station observations varies greatly between dense measurements in Europe, North America, and parts of Asia and significantly sparser coverage elsewhere. Screen-level variables are assimilated operationally at major numerical weather prediction centers, because their assimilation improves medium-range forecasts of surface meteorological conditions, albeit with limited improvement to the soil moisture estimates themselves (Sect. 6.1.1).

Satellite remote sensing offers a way to observe the surface soil water content at continental to global scales. Most existing remote-sensing devices use the distinct physical properties of soil water in interaction with electromagnetic signals at specific wavelengths. Microwave radiometry (wavelength 1–20 cm, frequency 10–1.4 GHz) has been used for several decades to estimate the soil’s dielectric constant, and hence soil moisture. Examples of radiometers on board satellite platforms that passively measure microwave emission from the land surface are the Advanced Microwave Scanning Radiometer 2 (AMSR2), the Microwave Imaging Radiometer using Aperture Synthesis on board the Soil Moisture Ocean Salinity (SMOS) mission, the Aquarius radiometer, and the radiometer on board the Soil Moisture Active Passive (SMAP) mission, among others. Active microwave sensors send and receive microwave signals. Examples include the radars on board the two European remote-sensing satellites (ERS-1 and ERS-2), the Advanced


SCATterometer (ASCAT) on board the Meteorological Operational (MetOp) series of satellites of the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT), and also the radar on board SMAP. All the above-mentioned satellites are polar orbiting, and each provides global coverage approximately every 2–3 days. Microwave remote sensing is attractive because the signals at L-band wavelengths (1.4 GHz) used for SMOS and SMAP are most sensitive to soil moisture, less impacted by vegetation, and not impacted by clouds or light rain. There are, however, some obvious limitations: (i) microwave instruments only sense the soil moisture in a thin (1–5 cm) surface layer, with the penetration depth depending on the microwave wavelength and soil moisture content; (ii) the spatial coverage is limited to swaths of ~250–1000 km, and the revisit time is once every couple of days; (iii) passive microwave data have coarse spatial resolutions (~10–100 km); and (iv) active microwave data are typically very noisy. Through data assimilation, some of these limitations can be overcome, as will be illustrated in Sect. 6.

Microwave observations can be assimilated directly as brightness temperatures or backscatter values, or after inversion to soil moisture retrievals. Various methods exist to infer soil moisture retrievals from active or passive microwave signals. For example, AMSR2, SMOS, and SMAP retrievals are obtained by explicitly inverting the relationship between soil moisture and passive microwave emission (e.g., Wigneron et al. 2007), and ASCAT retrievals employ a change detection technique (e.g., Bartalis et al. 2007). It is important to note that the climatologies of various retrieval products can be very different and require careful attention when comparing or merging various retrieval products (Dorigo et al. 2015; Reichle and Koster 2004). The soil moisture retrieval process is very sensitive to radiative transfer or backscatter model parameters (Sect. 5.2) and to auxiliary information (e.g., about soil temperature and vegetation) provided by land surface modeling systems. For example, the SMOS soil moisture retrievals use ECMWF’s simulated surface soil temperature and moisture as prior information in the retrieval, and SMAP retrievals use prior and auxiliary information from the NASA Goddard Earth Observing System Model, version 5 (GEOS-5) (Sect. 4). In addition, retrievals can technically be calculated under any conditions, but the soil moisture estimates are only meaningful in areas with moderate topographic complexity, under nonfrozen and snow-free conditions, with sparse vegetation, and at times with limited precipitation.

An alternative to using microwave signals is to use gravity measurements to determine changes in the mass of water: the Gravity Recovery and Climate Experiment (GRACE) mission consists of two satellites whose relative distance and velocity can be related to anomalies in water amounts at and near the land surface, including soil moisture, snow, and surface water. GRACE observations thus provide information on deeper soil moisture under any weather conditions, but their resolution is very coarse in space (~250 km) and time (monthly).

4 Land Surface Modeling

A wide variety of models has been used to simulate the dynamic evolution of soil moisture, ranging from simple solutions of equations that represent the movement of water in unsaturated soils to full land surface models (LSMs, Chapter ▶ Land Surface Hydrological Models) that simulate the soil-vegetation-atmosphere interactions. The strength of models is in their reliance on physical laws known to operate in nature and in their ability to provide a consistent and balanced distribution of water and heat. Historically, some LSMs have been optimized to simulate select land surface variables for specific regions, whereas others are tuned to provide good boundary information in general circulation models, sometimes with little attention paid to the physical realism of the simulated soil moisture. Examples of LSMs that are part of operational integrated modeling systems are the Hydrology-Tiled ECMWF Scheme for Surface Exchanges over Land (HTESSEL, Balsamo et al. 2009) in ECMWF’s operational Integrated Forecasting System (IFS) and the Catchment land surface model (Koster et al. 2000) in NASA’s GEOS-5 system. The prognostic variables related to soil moisture (i.e., part of $\hat{x}_i$ in Eq. 1) are very different in these models. In HTESSEL the soil moisture is calculated in four layers with thicknesses of 0.07, 0.21, 0.72, and 1.89 m from top to bottom. The Catchment model defines three prognostic variables that describe the equilibrium soil moisture profile and deviations from the equilibrium across the entire watershed (or modeling unit). Specifically, the catchment deficit (catdef), root-zone excess (rzexc), and surface excess (srfexc) prognostic variables are used together to diagnose soil moisture in a surface layer (sfmc, 0–0.05 m), a root-zone layer (rzmc, 0–1 m), and the entire profile (prmc). The latter extends from the surface to the bedrock at a variable depth between 1.3 and 10 m.
The temperature (tsoil) of the topmost soil layer (of thickness 0.1 m) is diagnosed from the corresponding ground heat content prognostic variable (ght).

The LSM structure (denoted $f_{i,i-1}(\cdot)$ in Sect. 2), parameters ($\alpha$), and forcing inputs ($u_i$) determine the climatology of the simulated soil moisture, that is, its long-term average and seasonal variation. For example, the porosity parameter determines the maximum amount of water that can be contained in a soil layer, and the hydraulic conductivity determines how fast soil moisture moves across soil layers. Locally, optimal parameters for soil moisture modeling could be found through calibration against historical data records of observations (Chapter ▶ Deterministic parameter estimation methods). However, the calibration of LSMs is a nontrivial task, because of multiparameter interactions, equifinality, and the scale dependency of the land model parameters. Global LSMs are usually not calibrated and rely solely on auxiliary static information about soil and vegetation properties. Soil physical parameters, for example, are typically inferred from global soil texture maps using static lookup tables or pedotransfer functions. These parameters are not perfect, yet they determine the average (climatological) level of soil moisture and may thus be responsible for persistent biases. LSM parameters also impact random errors, because they affect the nature of shorter-term soil moisture dynamics.
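As a concrete, hypothetical illustration of how layered model states map to a root-zone diagnostic, the snippet below depth-weights layer values over the top meter of an HTESSEL-like column. Only the layer thicknesses come from the text; the per-layer moisture values and the simple depth-weighting scheme are invented for illustration.

```python
# Depth-weighted root-zone (0-1 m) soil moisture from a layered column.
layer_thickness = [0.07, 0.21, 0.72, 1.89]  # m, HTESSEL-like layers, top to bottom
theta = [0.25, 0.28, 0.30, 0.32]            # m3/m3 per layer (synthetic values)

def root_zone_moisture(thickness, theta, depth=1.0):
    """Average volumetric moisture over the top `depth` meters, weighting
    each layer by the part of it that overlaps the root zone."""
    top, total = 0.0, 0.0
    for dz, th in zip(thickness, theta):
        if top >= depth:
            break
        overlap = min(top + dz, depth) - top  # layer fraction inside 0-depth
        total += overlap * th
        top += dz
    return total / depth                       # m3/m3 over the root zone

rzmc = root_zone_moisture(layer_thickness, theta)
```

With these layer thicknesses, the top three layers (0.07 + 0.21 + 0.72 m) exactly fill the 0–1 m root zone, so the fourth layer does not contribute.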


The most important forcing input to soil moisture simulations is precipitation, but other surface meteorological fields are also required, including radiation, air temperature and humidity, and wind speed. For small-scale applications, forcing data are usually collected from meteorological towers. Forcing data for large-scale applications rely on a merger of surface observations and atmospheric reanalysis fields. Examples include the recent MERRA-Land (Reichle et al. 2011) and ERA-Land (Balsamo et al. 2013) products. The underlying atmospheric reanalysis products assimilate a very large number of conventional and satellite-based observations of the atmosphere into a global atmospheric model and provide self-consistent meteorological fields with complete spatial and temporal coverage. However, these products have a relatively coarse resolution and are subject to errors in the reanalysis systems. Errors in the long-term average precipitation amounts or intensity result in biased simulations. At shorter time scales, missed or excessive precipitation events cause random errors in simulated soil moisture. Merging the reanalysis data with satellite- and gauge-based precipitation data products mitigates some, but not all, of these errors.

Another determining factor for soil moisture simulations is the land model initialization. A so-called cold model start (using an arbitrary state initialization) will generally cause a drift in soil moisture until a steady soil moisture level is reached, which typically requires simulating several years. It is therefore important to spin up the model until its prognostic variables are in agreement with the climatological boundaries determined by the forcings and model parameters.
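The spin-up behavior described above can be demonstrated with a deliberately minimal water bucket, which is not a real LSM: starting from an arbitrarily dry cold start, the storage drifts toward the equilibrium level set entirely by the forcing (precipitation) and the parameters (loss rate). All numbers are invented.

```python
# Minimal spin-up illustration: a single bucket drifts from an arbitrary
# cold-start value toward the climatological level set by forcing/parameters.
def step(w, precip=2.0, loss_rate=0.01, capacity=400.0):
    """One daily step of a toy water bucket (mm): add rain, lose a
    fraction of storage, keep the result physically bounded."""
    w = w + precip - loss_rate * w
    return min(max(w, 0.0), capacity)

w = 0.0                          # cold start: arbitrarily dry
for day in range(5 * 365):       # spin up over several simulated years
    w = step(w)
# Steady state where precip == loss_rate * w, i.e., w* = 2.0 / 0.01 = 200 mm
```

The equilibrium value depends only on `precip` and `loss_rate`, not on the cold-start value, which is exactly why biased forcings or parameters produce biased soil moisture climatologies.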

5 Observation Operator

If satellite-based soil moisture retrievals are assimilated, the output of the land surface model can be directly compared to the assimilated observations. However, the assimilation of screen-level observations, microwave observations, and terrestrial water storage anomalies requires a diagnostic modeling step with an observation operator that facilitates the direct comparison between the land model output and the observed variables (Reichle et al. 2014).

5.1 Screen-Level Observation Predictions

Screen-level observations are in situ measurements of temperature and relative humidity ($T_{2m,obs}$, $RH_{2m,obs}$) at 2 m above the land surface. In global atmospheric models, estimates of $\hat{T}_{2m}$ and $\widehat{RH}_{2m}$ are typically obtained by interpolating the model variables from the surface (as computed by the land surface model component) to the atmospheric conditions at the height of the lowest atmospheric model level (Mahfouf et al. 2009), following the Monin-Obukhov similarity theory. The latter describes flow and turbulence properties in the lowest 10 % of the atmospheric boundary


layer. This assumes that the first atmospheric level is not at 2 m, but is instead imposed higher up. Formally, this can be written as

$$\left[\hat{T}_{2m}, \widehat{RH}_{2m}\right]^{T} = h_{2m}\left(\hat{x}, u, \beta_{2m}\right) \qquad (6)$$

where $\hat{x} = \left[\widehat{sfmc}, \widehat{rzmc}, \widehat{tsurf}, \widehat{tsoil}\right]^{T}$ are the land surface state variables, including surface and root-zone soil moisture, and surface and soil temperature, respectively. The vector $u$ contains the imposed atmospheric variables, with air temperature, humidity, and wind speed at the lowest atmospheric level, and $h_{2m}(\cdot)$ represents the vertical interpolation, with $\beta_{2m}$ the interpolation parameters.
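The vertical interpolation $h_{2m}(\cdot)$ can be sketched under a strong simplification: the snippet below uses a neutral-stability logarithmic profile between the surface and the lowest atmospheric model level, whereas operational operators apply full Monin-Obukhov stability corrections. The level height, roughness length, and temperatures are all assumed values for illustration.

```python
import math

def t2m_neutral(t_surf, t_lowest, z_lowest=10.0, z0h=0.01):
    """Interpolate 2-m temperature between the surface (t_surf) and the
    lowest atmospheric model level (t_lowest at z_lowest meters) with a
    neutral-stability log profile over roughness length z0h. This is a
    simplification of the Monin-Obukhov-based operator h_2m(.)."""
    # weight is 0 at the surface (z = z0h) and 1 at the lowest model level
    w = math.log(2.0 / z0h) / math.log(z_lowest / z0h)
    return t_surf + w * (t_lowest - t_surf)

t2m = t2m_neutral(t_surf=290.0, t_lowest=285.0)  # K, synthetic inputs
```

Because 2 m sits high in the logarithmic profile, the result is weighted toward the lowest-level air temperature rather than the surface temperature.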

5.2 Microwave Observation Predictions

Passive microwave emission from the land surface is often referred to as brightness temperature (Tb) and is measured for certain polarizations, wavelengths, and incidence angles, collectively denoted as the instrument configuration. The brightness temperature for a given configuration ($c$) can be simulated as a function of surface soil temperature, attenuated by soil and vegetation characteristics:

$$\widehat{Tb}_c = h_p\left(\hat{x}, \beta_p, c_p\right) \qquad (7)$$

with $h_p(\cdot)$ a radiative transfer model for passive (p) microwave emission; $\hat{x} = \left[\widehat{sfmc}, \widehat{tsurf}, \widehat{tsoil}\right]^{T}$ a set of dynamic land surface variables such as surface soil moisture and surface and soil temperature; $\beta_p$ a vector with land surface-specific parameters, including the microwave soil roughness length, scattering albedo, and vegetation parameters; and $c_p$ a set of radiometer configuration constants. Active radar backscattering coefficients ($\sigma^0$) measured for a given sensor configuration $c$ can be similarly related to land surface variables as follows:

$$\hat{\sigma}^0_c = h_a\left(\hat{x}, \beta_a, c_a\right) \qquad (8)$$

with $h_a(\cdot)$ the backscattering model for active (a) microwave signals; $\hat{x}$ again a set of dynamic land surface variables including soil moisture; $\beta_a$ a vector with land surface-specific parameters, such as the root-mean-square (rms) surface height and correlation length that quantify the roughness; and $c_a$ a set of radar properties.

Figure 1 illustrates the nonlinear relationship between the surface soil moisture content sfmc and $Tb_c$ or $\sigma^0_c$ as simulated by physically based radiative transfer and backscattering models over bare soil. In these models, soil moisture is first converted to a soil dielectric constant using a dielectric mixing model. The dielectric constant then determines the surface reflectivity and thus the emission, reflection, and scattering of waves. Figures 1a–b use a zero-order radiative transfer model (Mo et al. 1982) as $h_p(\cdot)$ for the $Tb_c$ simulation. Figures 1c–d use the Integral Equation Model (Fung et al. 1992) as $h_a(\cdot)$ for the $\sigma^0_c$ simulation. For the illustrations in the figure, the sensor configurations $c$ are chosen according to the design of the radiometer and radar sensors on board SMAP, i.e., with an incidence angle of 40°, either horizontal (H) or vertical (V) polarization (co-polarization for the radar), and a frequency of 1.41 GHz for the radiometer and 1.26 GHz for the radar.

Fig. 1 (Top) Relationship between soil moisture (sfmc) and brightness temperature (Tb) at (a) horizontal and (b) vertical polarization. (Bottom) Same, but for radar backscatter (σ0) at (c) horizontal and (d) vertical co-polarization. The soil contains 43 % sand and 23 % clay and has a bulk density of 1.24 g/cm3. The weighted surface/soil temperature (Ts) is set to 283 K. The sensor-specific constants follow the SMAP instrument details (40° incidence angle, radar at 1.29 GHz, radiometer at 1.41 GHz)

Figures 1a–b illustrate that the brightness temperature is close to the surface soil temperature (Ts) under dry conditions and that it decreases with increasing soil moisture. As a rule of thumb, a 2–3 K increase in Tb is associated with a 0.01 m3/m3 decrease in soil moisture for incidence angles around 40° and for low-vegetation regions (vegetation water content of less than about 5 kg/m2). Figures 1c–d show that a 1–5 dB decrease in $\sigma^0_c$ can be expected for a 0.1 m3/m3 decrease in soil moisture below 0.35 m3/m3, whereas little sensitivity is found for wetter soils. The figure further highlights the sensitivity of the relationships to a change in only one select parameter ($\beta$), i.e., the microwave roughness HR [–] for the $Tb_c$ simulation, or the rms surface height [cm] for the $\sigma^0_c$ simulation. A realistic range of HR (Fig. 1a–b) easily introduces differences of


50 K or more in Tb for a moderate level of soil moisture. Figures 1c–d show that a 1 cm increase in the rms roughness parameter could increase the backscattering by 5 dB. In reality, vegetation further complicates the picture. To summarize, the relationship between the soil moisture content and $Tb_c$ or $\sigma^0_c$ is highly dependent on a set of very uncertain parameters.
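The bare-soil emission side of this relationship can be sketched numerically. The snippet below combines a Fresnel H-polarization reflectivity at 40° incidence with a deliberately crude linear dielectric mixing rule (an assumption made here for brevity; real retrieval systems use calibrated dielectric mixing models and account for roughness, vegetation, and polarization details):

```python
import math

def tb_h(sfmc, ts=283.0, theta_deg=40.0):
    """Bare-soil, smooth-surface H-pol brightness temperature:
    Tb = (1 - r_H) * Ts. The dielectric mixing model is a crude linear
    stand-in (eps ~ 3 + 25 * sfmc), chosen only to reproduce the
    qualitative wet-soil behavior, not any published model."""
    eps = 3.0 + 25.0 * sfmc                          # toy dielectric constant
    mu = math.cos(math.radians(theta_deg))
    root = math.sqrt(eps - math.sin(math.radians(theta_deg)) ** 2)
    r_h = ((mu - root) / (mu + root)) ** 2           # Fresnel H-pol reflectivity
    return (1.0 - r_h) * ts                          # emissivity times temperature

# Wetter soil -> higher dielectric constant -> more reflective -> colder Tb
tb_dry, tb_wet = tb_h(0.05), tb_h(0.35)
```

Even this toy operator reproduces the qualitative behavior in Fig. 1a: Tb sits closest to Ts for dry soil and drops substantially as the soil wets up.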

5.3 Terrestrial Water Storage Predictions

Changes in terrestrial water storage (TWS) are among the dominant mass variations that can be detected in the GRACE signal (Rodell et al. 2007). The simulated monthly TWS represents the vertically integrated water amount as

$$\widehat{TWS} = h_{TWS}\left(\hat{x}, \beta_{TWS}\right) \qquad (9)$$

with $h_{TWS}(\cdot)$ the vertical integration of the individual components of $\hat{x} = \left[\widehat{sfmc}, \widehat{rzmc}, \widehat{gwt}, \widehat{snow}, \widehat{ice}, \ldots\right]^{T}$, including surface soil moisture, root-zone soil moisture, depth to the groundwater table (gwt), snow, ice, and possibly water stored in or on vegetation, and with $\beta_{TWS}$ referring to any parameter needed to compute the TWS. The corresponding observations are mostly provided in terms of anomalies, i.e., as deviations from a long-term (multi-month, multiyear) average, which is not necessarily known. For assimilation into a model, the observed TWS anomalies are typically converted to actual $TWS_{obs}$ by adding a long-term model estimate $\langle\widehat{TWS}\rangle$.
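The anomaly-to-absolute conversion described above amounts to one line of arithmetic per observation; the storage values below are synthetic and in millimeters of water equivalent.

```python
# Convert observed TWS anomalies to absolute TWS for assimilation by adding
# the model's own long-term mean <TWS_hat> (all numbers are synthetic, mm).
model_tws_history = [520.0, 540.0, 510.0, 530.0]       # long model record
model_mean = sum(model_tws_history) / len(model_tws_history)

observed_anomalies = [-15.0, 5.0, 20.0]                # GRACE-style anomalies
tws_obs = [model_mean + a for a in observed_anomalies]  # absolute TWS to assimilate
```

Note that anchoring the observations to the model's own mean deliberately removes the long-term bias between model and observations, so the assimilation only corrects the anomalies.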

6 Assimilation of Observations Related to Soil Moisture

6.1 Sequential State Updating

Popular methods for the sequential assimilation of soil moisture observations at large scales include direct insertion, nudging and statistical correction (Dharssi et al. 2011), optimal interpolation (Mahfouf et al. 2009), extended Kalman filtering (Sabater et al. 2007), variational assimilation (Reichle et al. 2001; Hess et al. 2008), ensemble Kalman filtering (Reichle et al. 2002), and particle filtering (Pan et al. 2008). Details of these techniques are given in Chapter ▶ Fundamentals of data assimilation and theoretical advances. The ensemble Kalman filter and particle filtering variants are the methods that lend themselves most directly to integration in hydrometeorological ensemble forecast systems.

Sequential filtering involves cycling through two steps, as summarized in Table 1. In the first step, a background, a priori, or forecast state estimate is generated with a dynamic model $f_{i,i-1}(\cdot)$. This forecast can either be deterministic, i.e., $\hat{x}^{-}_{i}$, or


Table 1 Summary of (1) forecast and (2) update equations for optimal interpolation (OI), (extended) Kalman filtering ((E)KF), and ensemble Kalman filtering (EnKF). The OI and (E)KF use a single state trajectory and a predefined or linearly evolving $P^{-}_{i}$, respectively, whereas the EnKF uses $N$ ensemble members, perturbed with model error $w_{i,j}$, to diagnose $P^{-}_{i}$

(1) A priori state and uncertainty

OI, (E)KF:
$\hat{x}^{-}_{i} = f_{i,i-1}\left(\hat{x}^{+}_{i-1}, u_i, \alpha\right)$
$P^{-}_{i} = B$ (OI)
$P^{-}_{i} = F_{i,i-1} P^{+}_{i-1} F^{T}_{i,i-1} + Q_i$ ((E)KF)
Observation predictions: $\hat{y}^{-}_{i} = h_i\left(\hat{x}^{-}_{i}, \beta\right)$

EnKF:
$\hat{x}^{-}_{i,j} = f_{i,i-1}\left(\hat{x}^{+}_{i-1,j}, u_i, \alpha, w_{i,j}\right)$, with $j = 1, \ldots, N$
$P^{-}_{i} = \mathrm{Cov}\left(\hat{x}^{-}_{i}, \hat{x}^{-}_{i}\right)$
Observation predictions: $\hat{y}^{-}_{i,j} = h_i\left(\hat{x}^{-}_{i,j}, \beta\right)$

(2) A posteriori state and uncertainty

OI, (E)KF:
$K_i = P^{-}_{i} H^{T}_{i} \left(H_i P^{-}_{i} H^{T}_{i} + R_i\right)^{-1}$
$\hat{x}^{+}_{i} = \hat{x}^{-}_{i} + K_i\left[y_{obs,i} - \hat{y}^{-}_{i}\right]$
$P^{+}_{i} = \left[I - K_i H_i\right] P^{-}_{i}$

EnKF:
$K_i = \mathrm{Cov}\left(\hat{x}^{-}_{i}, \hat{y}^{-}_{i}\right)\left(\mathrm{Cov}\left(\hat{y}^{-}_{i}, \hat{y}^{-}_{i}\right) + R_i\right)^{-1}$
$\hat{x}^{+}_{i,j} = \hat{x}^{-}_{i,j} + K_i\left[y_{obs,i,j} - \hat{y}^{-}_{i,j}\right]$
$\hat{x}^{+}_{i} = \frac{1}{N}\sum_{j=1}^{N} \hat{x}^{+}_{i,j}$
$P^{+}_{i} = \mathrm{Cov}\left(\hat{x}^{+}_{i}, \hat{x}^{+}_{i}\right)$

consist of an ensemble $\hat{x}^{-}_{i,j}$, where forecast perturbations ($w_{i,j}$) are applied to generate each ensemble member $j$ ($j = 1, \ldots, N$). The second step generates an a posteriori or analysis state by correcting the state with observations, as in Eq. 5. Two update variants have been classic for the assimilation of soil moisture observations. The most commonly used variant adds an increment to the forecasted state. This increment is determined using the difference between observations and observation predictions $\left[y_{obs,i} - \hat{y}^{-}_{i}\right]$ and a blending matrix, or gain matrix $K_i$. In “optimal” (minimum analysis error variance) assimilation schemes, the gain $K_i$ is found by weighting the uncertainty in the forecast state $P^{-}_{i}$ against that in the observations $R_i$. If the forecast error covariance $P^{-}_{i}$ is dynamically propagated in time, then $K_i$ is called a Kalman gain. If $P^{-}_{i}$ is diagnosed from ensemble forecasts, then the filter is called an ensemble Kalman filter. Another update variant preferentially weighs a set of possible forecasts (particles, ensemble members) so that the resulting observation prediction is closest to the observations. This approach is used in particle filters, which have great potential for soil moisture assimilation problems and subsequent ensemble forecasting, but which have not yet been explored thoroughly for large-scale or multi-scale land surface data assimilation.

The following examples conceptually describe the assimilation of various soil moisture observations using filtering techniques of increasing complexity. Each example can be seen as a variant of the basic sequential update Eq. 5. It is important to note that the combinations of the selected observation types and assimilation techniques are not exclusive and are chosen for illustrative purposes only.


6.1.1 Screen-Level Data Assimilation Illustrated with Optimal Interpolation

Screen-level observations ($T_{2m,obs}$, $RH_{2m,obs}$) have been assimilated operationally for numerical weather prediction (NWP) (Giard and Bazile 2000; Bélair et al. 2003). To limit the computational effort, the state vector is often limited to a few prognostic variables. Consider a state consisting of surface moisture content (sfmc), root-zone soil moisture (rzmc), surface temperature (tsurf), and soil temperature (tsoil). The typical update equation can be written as:

$$\begin{bmatrix} \widehat{sfmc} \\ \widehat{rzmc} \\ \widehat{tsurf} \\ \widehat{tsoil} \end{bmatrix}_i^{+} = \begin{bmatrix} \widehat{sfmc} \\ \widehat{rzmc} \\ \widehat{tsurf} \\ \widehat{tsoil} \end{bmatrix}_i^{-} + K_i \left( \begin{bmatrix} T_{2m} \\ RH_{2m} \end{bmatrix}_{obs,i} - \begin{bmatrix} \hat{T}_{2m} \\ \widehat{RH}_{2m} \end{bmatrix}_i \right) \qquad (10)$$

The initial implementations of this analysis scheme at operational centers use a priori defined constants for the $K_i$ matrix ($4 \times 2$). These constants are calibrated a priori and do not use any explicit observation operator or any statistical information about background or observation errors. Theoretically, this approach cannot be classified as "optimal interpolation," but it has been commonly referred to as such in the literature. Optimal interpolation uses explicit expressions for the a priori and observation error covariance matrices to determine the $K_i$ matrix. At Météo-France and at ECMWF, the blending matrix $K_i$ for the assimilation of screen-level observations uses statistical error information and is further advanced by the use of analytical Jacobians of the land surface model (Mahfouf et al. 2009; Drusch et al. 2009; de Rosnay et al. 2013), i.e.,

$$K_i = B H_i^T \left( H_i B H_i^T + R \right)^{-1} \quad (11)$$

with $B$ ($4 \times 4$) a time-invariant background error covariance matrix, $R$ ($2 \times 2$) a time-invariant diagonal observation error covariance matrix, and $H_i = \left. \partial h / \partial x \right|_{\hat{x}_i^-}$ the linearized observation operator of dimension ($2 \times 4$). The observation operator $h_i(.)$ is a physically based model, and the Jacobian elements are computed with finite differences, i.e., the change in $\hat{T}_{2m}^-$ and $\widehat{RH}_{2m}^-$ is computed for a small perturbation in the individual state components (sfmc, rzmc, etc.). The $H_i$ matrix maps differences between simulated and observed screen-level variables (innovations) to updates (increments) in soil moisture and temperature. In general, the limited sensitivity of $\hat{T}_{2m}^-$ or $\widehat{RH}_{2m}^-$ to root-zone soil moisture leads to small updates (Drusch et al. 2009).

Soil Moisture Data Assimilation

The above approach of using a dynamic Jacobian for the observation operator in the blending matrix $K_i$ has been referred to as "simplified extended Kalman filtering" by the land surface community involved in NWP, because of its close ties with the extended Kalman filter. However, the use of a fixed background error covariance matrix $B$ by definition means that no Kalman filtering is involved, and the assimilation technique is theoretically an "optimal interpolation."
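The Jacobian-based gain of Eq. 11 can be sketched as follows. This is a minimal illustration only: the observation operator, its coefficients, and the error covariances are invented for demonstration and are not the operational Météo-France or ECMWF formulations.

```python
import numpy as np

def h(x):
    """Toy screen-level observation operator: maps the state
    (sfmc, rzmc, tsurf, tsoil) to predictions (T2m, RH2m).
    The coefficients are illustrative assumptions only."""
    sfmc, rzmc, tsurf, tsoil = x
    t2m = 0.7 * tsurf + 0.3 * tsoil - 2.0 * sfmc
    rh2m = 0.4 + 0.8 * sfmc + 0.3 * rzmc - 0.001 * (tsurf - 290.0)
    return np.array([t2m, rh2m])

def jacobian_fd(h, x, eps=1e-4):
    """Jacobian H_i = dh/dx at the background state, by finite differences."""
    y0 = h(x)
    H = np.zeros((y0.size, x.size))
    for n in range(x.size):
        xp = x.copy()
        xp[n] += eps
        H[:, n] = (h(xp) - y0) / eps
    return H

xb = np.array([0.20, 0.25, 290.0, 288.0])        # background state
B = np.diag([0.02**2, 0.01**2, 1.0**2, 0.5**2])  # background error covariance (4x4)
R = np.diag([1.0**2, 0.05**2])                   # observation error covariance (2x2)

H = jacobian_fd(h, xb)                           # linearized observation operator (2x4)
K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)     # blending matrix, Eq. 11 (4x2)
y_obs = np.array([288.0, 0.60])                  # screen-level observations
xa = xb + K @ (y_obs - h(xb))                    # incremental state update (Eq. 10)
```

Because $B$ is fixed in time, this is optimal interpolation rather than Kalman filtering, even though the Jacobian is re-evaluated around each background state.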

6.1.2 Soil Moisture (Retrieval) Data Assimilation Illustrated with (Extended) Kalman Filtering

The (extended) Kalman filter ((E)KF) is similar to the optimal interpolation method in its incremental update equation. The difference is that the Kalman filter dynamically propagates the a priori error covariance matrix, using a linear dynamic model. The (E)KF or its close variants (Sabater et al. 2007; Hess et al. 2008) have not been widely used for operational soil moisture data assimilation, because land surface models typically require a rather complex linearization. However, the (E)KF provides the fundamentals to all other Kalman filter variants. Assume that observations (e.g., retrievals) of soil moisture content ($sfmc_{obs}$) are assimilated into a model, with surface and root-zone soil moisture as the state variables. The update equation is

$$\begin{bmatrix} \widehat{sfmc} \\ \widehat{rzmc} \end{bmatrix}_i^+ = \begin{bmatrix} \widehat{sfmc} \\ \widehat{rzmc} \end{bmatrix}_i^- + K_i \left[ sfmc_{obs,i} - \widehat{sfmc}_i^- \right] \quad (12)$$

with

$$K_i = P_i^- H_i^T \left[ H_i P_i^- H_i^T + R_i \right]^{-1} = \begin{bmatrix} \sigma^2_{\widehat{sfmc}} \\ \rho \, \sigma_{\widehat{sfmc}} \, \sigma_{\widehat{rzmc}} \end{bmatrix}_i \left[ \sigma^2_{\widehat{sfmc},i} + \sigma^2_{sfmc,obs,i} \right]^{-1} \quad (13)$$

$$P_i^+ = \left[ I - K_i H_i \right] P_i^- \quad (14)$$

where $H = [1 \; 0]$ in this case, but the linearized observation operator could be written more generally as $H_i = \left. \partial h / \partial x \right|_{\hat{x}_i^-}$. The observation error variance is $\sigma^2_{sfmc,obs,i}$, the observation prediction error variance is $\sigma^2_{\widehat{sfmc},i}$, and $P_i^-$ ($2 \times 2$) contains the time-variable a priori surface and root-zone error variances ($\sigma^2_{\widehat{sfmc},i}$, $\sigma^2_{\widehat{rzmc},i}$) on the diagonal and the covariances $\rho_i \, \sigma_{\widehat{sfmc},i} \, \sigma_{\widehat{rzmc},i}$ as off-diagonal elements:

$$P_i^- = \begin{bmatrix} \sigma^2_{\widehat{sfmc}} & \rho \, \sigma_{\widehat{sfmc}} \, \sigma_{\widehat{rzmc}} \\ \rho \, \sigma_{\widehat{sfmc}} \, \sigma_{\widehat{rzmc}} & \sigma^2_{\widehat{rzmc}} \end{bmatrix}_i \quad (15)$$

It is through the error correlations ($\rho_i$) in $P_i^-$ that surface soil moisture innovations $\left[ sfmc_{obs} - \widehat{sfmc}^- \right]_i$ are propagated to both surface and root-zone increments $K_i \left[ sfmc_{obs} - \widehat{sfmc}^- \right]_i$. The a priori $P_i^-$ is reduced to $P_i^+$ after each assimilation update (Eq. 14).


The a priori $P_i^-$ is determined dynamically as a function of the modeling system. For additive Gaussian model error $w_i$ with an error covariance matrix $Q_i$ (Eq. 2), the forecast error covariance $P_i^-$ can be approximated by

$$P_i^- = F_{i,i-1} P_{i-1}^+ F_{i,i-1}^T + Q_i \quad (16)$$

where $F_{i,i-1}$ ($2 \times 2$) is a linearized version of $f_{i,i-1}(.)$ and $F_{i,i-1} P_{i-1}^+ F_{i,i-1}^T$ is the propagated analysis error covariance. When using a tangent linear $F_{i,i-1}$ operator, the method is referred to as an "extended" Kalman filter. If $F_{i,i-1}$ is a linearized model version, then Eq. 16 is known to suffer from unlimited error variance growth, because the third- and higher-order moments of the Taylor expansion are discarded in Eq. 16 (closure problem). It is possible to avoid these problems with other Kalman filter variants, as discussed in the next sections.
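A minimal numerical sketch of one (E)KF forecast-update cycle (Eqs. 12-16) follows; the linearized model $F$, the error covariances, and all numerical values are illustrative assumptions, not taken from a real land surface model.

```python
import numpy as np

# Illustrative 2-state (sfmc, rzmc) EKF cycle; F, Q, and all numbers
# are assumptions for demonstration only.
F = np.array([[0.90, 0.02],
              [0.05, 0.98]])          # linearized propagation model F_{i,i-1}
Q = np.diag([0.02**2, 0.005**2])      # model error covariance (Eq. 16)
H = np.array([[1.0, 0.0]])            # only surface soil moisture is observed
R = np.array([[0.04**2]])             # retrieval error variance

x = np.array([0.20, 0.30])            # a posteriori state from the previous cycle
P = np.array([[0.03**2, 0.5 * 0.03 * 0.02],
              [0.5 * 0.03 * 0.02, 0.02**2]])   # rho = 0.5 surface/root-zone

# Forecast: propagate the state and its error covariance (Eq. 16)
x = F @ x
P = F @ P @ F.T + Q
P_forecast = P.copy()

# Update with a surface retrieval (Eqs. 12-14)
sfmc_obs = 0.17
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # 2x1 gain (Eq. 13)
x = x + K @ (np.array([sfmc_obs]) - H @ x)     # Eq. 12
P = (np.eye(2) - K @ H) @ P                    # Eq. 14
```

Note that the root-zone state receives an increment only through the off-diagonal term of $P_i^-$ (Eq. 15), exactly as described in the text.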

6.1.3 Soil Moisture (Retrieval) Data Assimilation Illustrated with 1D Ensemble Kalman Filtering

The ensemble Kalman filter (EnKF; Reichle et al. 2002) circumvents the need for a linear(ized) state propagation model and observation operator by diagnosing error covariance matrices from ensemble information. Examples of one-dimensional ("1D") EnKF studies using soil moisture retrievals from various microwave sensors include Liu et al. (2011) and Draper et al. (2012). A "1D" EnKF updates the state at the locations that coincide with the assimilated observations, and it is assumed that the observations and the model have the same spatial resolution. In Sect. 6.1.4, a spatially distributed or three-dimensional ("3D") expansion of the EnKF will be presented, with inclusion of horizontal information propagation and with the ability to deal with multiple scales. Assume again that satellite-based surface soil moisture retrievals $sfmc_{obs}$ are assimilated and that the model operates at the same spatial resolution as the observations. An ensemble of states $\hat{x}_{i,j}^-$ ($j = 1, \ldots, N$) is generated by perturbing the model simulations (discussed in Sect. 7.2.1). The observations are also perturbed to ensure consistency in the EnKF formulation used here. Note, though, that some variants of the EnKF exist that avoid such perturbations. The observation predictions are given by $\hat{y}_{i,j}^- = \widehat{sfmc}_{i,j}^- = h(\hat{x}_{i,j}^-)$, i.e., accounting for a possibly nonlinear mapping between the observed $sfmc_{obs,i,j}$ and the state variables, even though in this example the observation operator is linear, $H = [1 \; 0]$. The update equation for each state member can be written as:

$$\begin{bmatrix} \widehat{sfmc} \\ \widehat{rzmc} \end{bmatrix}_{i,j}^+ = \begin{bmatrix} \widehat{sfmc} \\ \widehat{rzmc} \end{bmatrix}_{i,j}^- + K_i \left[ sfmc_{obs,i,j} - \widehat{sfmc}_{i,j}^- \right] \quad (17)$$

with

$$K_i = \mathrm{Cov}\left(\hat{x}_i^-, \hat{y}_i^-\right) \left[ \mathrm{Cov}\left(\hat{y}_i^-, \hat{y}_i^-\right) + R_i \right]^{-1} = \mathrm{Cov}\left(\hat{x}_i^-, \hat{y}_i^-\right) \left[ \sigma^2_{\widehat{sfmc},i} + \sigma^2_{sfmc,obs} \right]^{-1} \quad (18)$$

Unlike Eq. 16, the error covariances used in the Kalman gain are now dynamically diagnosed from the dispersion in the ensemble forecasts and observation predictions. Specifically, $\mathrm{Cov}(\hat{x}_i^-, \hat{y}_i^-)$ is found by correlating the ensemble departures in the state variables with those in the observation predictions, and $\mathrm{Cov}(\hat{y}_i^-, \hat{y}_i^-)$ is the error covariance of the observation predictions. Note that $P_i^- = \mathrm{Cov}(\hat{x}_i^-, \hat{x}_i^-)$, but the computation of this matrix is not required for the Kalman gain (Eq. 18). The gain factor $K_i$ maps the surface soil moisture innovation $[sfmc_{obs} - \widehat{sfmc}^-]_i$ to increments $K_i [sfmc_{obs} - \widehat{sfmc}^-]_i$ in all prognostic state variables, using the diagnosed error covariances between these variables. The final a posteriori state estimate $\hat{x}_i^+$ and its uncertainty are given by

$$\begin{bmatrix} \widehat{sfmc} \\ \widehat{rzmc} \end{bmatrix}_i^+ = \frac{1}{N} \sum_{j=1}^{N} \begin{bmatrix} \widehat{sfmc} \\ \widehat{rzmc} \end{bmatrix}_{i,j}^+ \quad \text{with} \quad P_i^+ = \mathrm{Cov}\left(\hat{x}_i^+, \hat{x}_i^+\right) \quad (19)$$

The a posteriori uncertainty in the state estimate $P_i^+$ is diagnosed from the analysis ensemble, which typically contracts during the assimilation.
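The ensemble diagnosis of the gain (Eq. 18) and the perturbed-observation member update (Eq. 17) can be sketched as follows; a synthetic Gaussian ensemble stands in here for perturbed land model simulations, and all numbers are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500                                    # ensemble size (synthetic example)

# Ensemble of (sfmc, rzmc) states with correlated surface/root-zone errors
X = np.column_stack([rng.normal(0.20, 0.03, N),
                     rng.normal(0.30, 0.02, N)])
X[:, 1] += 0.5 * (X[:, 0] - 0.20)          # induce sfmc-rzmc error correlation

Y = X[:, 0].copy()                         # observation predictions (H = [1 0])
y_obs, r = 0.16, 0.03**2                   # retrieval and its error variance

# Diagnose the gain from ensemble departures (Eq. 18)
Xd = X - X.mean(axis=0)
Yd = Y - Y.mean()
cov_xy = Xd.T @ Yd / (N - 1)               # Cov(x, y), one entry per state variable
var_y = Yd @ Yd / (N - 1)                  # Cov(y, y)
K = cov_xy / (var_y + r)                   # Kalman gain, shape (2,)

# Perturbed-observation update of every member (Eq. 17)
y_pert = y_obs + rng.normal(0.0, np.sqrt(r), N)
Xa = X + np.outer(y_pert - Y, K)

x_analysis = Xa.mean(axis=0)               # a posteriori state estimate (Eq. 19)
```

The root-zone component is updated purely through the diagnosed sfmc-rzmc error covariance, and the analysis ensemble spread of the observed variable contracts, as the text describes.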

6.1.4 Brightness Temperature Data Assimilation Illustrated with 3D Ensemble Kalman Filtering

Classical retrieval assimilation is appealing because of its relatively straightforward implementation, but there is a serious concern about observation biases. The inversion process from brightness temperatures to soil moisture retrievals relies on parameters and ancillary information that may be inconsistent with those used in the land surface model within the assimilation system. It is thus more natural to couple a radiative transfer or backscatter model to a land surface model, to forecast $\widehat{Tb}_c$ or $\hat{\sigma}^0_c$ along with soil moisture, and then assimilate $Tb_{c,obs}$ or $\sigma^0_{c,obs}$ (rather than the soil moisture retrievals), as in Entekhabi et al. (1994), Reichle et al. (2001), and Balsamo et al. (2006), among others. In the following example, the above EnKF equations are further illustrated for spatial (or "3D") filtering (Reichle and Koster 2003), using brightness temperature observations $Tb_{obs}$ at a coarser resolution than the fine-scale model simulations. This will highlight that (i) brightness temperature information can be translated into soil moisture updates, (ii) soil moisture can be updated in unobserved areas, and (iii) fine-scale soil moisture estimates can be obtained through dynamic disaggregation of coarse-scale observations.


Fig. 2 Schematic of a multi-scale 3D filter. (a) State variables in fine-scale model grid cells are aggregated to coarse-scale observation predictions through the observation operator h(.). (b) Coarse-scale innovations (observation-minus-forecasts, $y_{obs,i} - h(\hat{x}_i^-)$) located within the influence radius (curved shaded area) around a fine-scale model grid cell are indicated by circles and will contribute to the fine-scale state update

The observation predictions $\widehat{Tb}_i^-$ require model information about soil moisture, temperature, and vegetation. The state vector therefore contains soil moisture, surface and soil temperature, and possibly vegetation, especially if the latter is dynamically evolving in the model; yet, here vegetation is excluded for simplicity. The state update is presented at a single fine-scale location k and for a single ensemble member j, using (possibly multiple) coarse-scale observations $Tb_{obs,\kappa}$ ($\kappa = 1, \ldots, m$) that are within a chosen influence area around the fine-scale location, as illustrated in Fig. 2:

$$\begin{bmatrix} \widehat{sfmc}_{i,j} \\ \widehat{rzmc}_{i,j} \\ \widehat{tsurf}_{i,j} \\ \widehat{tsoil}_{i,j} \end{bmatrix}_k^+ = \begin{bmatrix} \widehat{sfmc}_{i,j} \\ \widehat{rzmc}_{i,j} \\ \widehat{tsurf}_{i,j} \\ \widehat{tsoil}_{i,j} \end{bmatrix}_k^- + [K_i]_k \left( \begin{bmatrix} Tb_1 \\ \ldots \\ Tb_\kappa \\ \ldots \\ Tb_m \end{bmatrix}_{obs} - \begin{bmatrix} \widehat{Tb}_1 \\ \ldots \\ \widehat{Tb}_\kappa \\ \ldots \\ \widehat{Tb}_m \end{bmatrix}_{i,j}^- \right) \quad (20)$$

The coarse-scale observation predictions are calculated by (i) transforming the fine-scale model state variables into fine-scale $\widehat{Tb}_{i,j,k}^- = h_p\left(\left[\hat{x}_{i,j}^-\right]_k\right)$ using a radiative transfer model $h_p(.)$ and (ii) aggregating the fine-scale $\widehat{Tb}_{i,j,k}^-$ ($k = 1, \ldots, N_\kappa$) to a coarse-scale $\widehat{Tb}_{i,j,\kappa}^-$ (Fig. 2a). The observation operator h(.) combines these two operations, so that


$$\hat{y}_{i,j}^- = h\left(\hat{x}_{i,j}^-\right) = \begin{bmatrix} \widehat{Tb}_1 \\ \ldots \\ \widehat{Tb}_\kappa \\ \ldots \\ \widehat{Tb}_m \end{bmatrix}_{i,j}^- = \begin{bmatrix} \frac{1}{N_1} \sum_{k=1}^{N_1} h_p\left(\left[\hat{x}_{i,j}^-\right]_k\right) \\ \ldots \\ \frac{1}{N_\kappa} \sum_{k=1}^{N_\kappa} h_p\left(\left[\hat{x}_{i,j}^-\right]_k\right) \\ \ldots \\ \frac{1}{N_m} \sum_{k=1}^{N_m} h_p\left(\left[\hat{x}_{i,j}^-\right]_k\right) \end{bmatrix} \quad \text{with} \quad \left[\hat{x}_{i,j}^-\right]_k = \begin{bmatrix} \widehat{sfmc}_{i,j} \\ \widehat{rzmc}_{i,j} \\ \widehat{tsurf}_{i,j} \\ \widehat{tsoil}_{i,j} \end{bmatrix}_k \quad (21)$$

The Kalman gain to update each fine-scale $\left[\hat{x}_{i,j}^-\right]_k$ is found as

$$[K_i]_k = \mathrm{Cov}\left(\left[\hat{x}_i^-\right]_k, \hat{y}_i^-\right) \left[ \mathrm{Cov}\left(\hat{y}_i^-, \hat{y}_i^-\right) + R_i \right]^{-1} \quad (22)$$

where $\mathrm{Cov}\left(\left[\hat{x}_i^-\right]_k, \hat{y}_i^-\right)$ is the error covariance between the fine-scale state variables and the coarse-scale observation predictions. The Kalman gain thus effectively partitions (Fig. 2b) the information in the coarse-scale $Tb_{obs}$ observations into fine-scale increments in soil moisture and temperature, using the error cross-correlations between fine-scale state variables, such as soil moisture, and coarse-scale $\widehat{Tb}_\kappa^-$ observation predictions. The satellite-observed $Tb_{obs}$ only provides information about the top (~5 cm) layer soil moisture, but data assimilation propagates this surface information to deeper soil moisture layers through the model soil profile dynamics and the vertical error correlations between state variables. The added advantage of 3D filtering is primarily in data-sparse regions, because information is horizontally propagated through the spatial forecast error structure. The spatial error correlations that are expressed in the off-diagonal elements of the cross-covariance matrix $\mathrm{Cov}\left(\left[\hat{x}_i^-\right]_k, \hat{y}_i^-\right)$ allow updating the state $\left[\hat{x}_{i,j}^-\right]_k$ at the fine-scale location k with multiple surrounding coarse-scale innovations, as long as the latter are within the influence area around the fine-scale location (even if the state variable $\left[\hat{x}_{i,j}^-\right]_k$ is "unobserved" and not part of any coarse-scale observation prediction). The influence area is typically obtained by localizing the spatial error correlations (discussed in Sect. 7.2.1) to limit the impact of distant observations. Finally, it should be noted that the (spatial) observation vector can be further expanded by assimilating multiple types of observations simultaneously. For example, brightness temperatures are typically observed at two polarizations (H and V) and possibly at multiple incidence angles (e.g., SMOS). A 3D EnKF using both H- and V-polarized brightness temperatures is used for an operational SMAP data assimilation product, as discussed in Sect. 9.2.
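The dynamic disaggregation of a coarse-scale brightness temperature innovation into fine-scale soil moisture increments (Eqs. 20-22) can be sketched as follows. The linear "radiative transfer" relation and all numbers are invented for illustration; a real system uses a physically based model $h_p(.)$ and a full multivariate state.

```python
import numpy as np

rng = np.random.default_rng(1)
N, nk = 500, 4                         # ensemble size; fine cells per coarse cell

# Fine-scale surface soil moisture ensemble (synthetic), with a shared error
# component so that the fine cells are spatially correlated
sfmc = rng.normal(0.22, 0.03, (N, nk)) + 0.03 * rng.normal(size=(N, 1))

# Hypothetical linear radiative transfer h_p: wetter soil -> colder Tb
tb_fine = 280.0 - 150.0 * sfmc
tb_coarse = tb_fine.mean(axis=1)       # aggregation to the coarse scale (Eq. 21)

tb_obs, r = 248.0, 1.3**2              # coarse-scale Tb observation, error variance

# Fine-state / coarse-prediction covariances diagnosed from the ensemble (Eq. 22)
Xd = sfmc - sfmc.mean(axis=0)
yd = tb_coarse - tb_coarse.mean()
cov_xy = Xd.T @ yd / (N - 1)           # one entry per fine-scale cell
K = cov_xy / (yd @ yd / (N - 1) + r)   # fine-scale gains

# The coarse innovation is disaggregated into fine-scale increments (Eq. 20)
tb_pert = tb_obs + rng.normal(0.0, np.sqrt(r), N)
sfmc_a = sfmc + np.outer(tb_pert - tb_coarse, K)
```

Each fine cell receives its own increment in proportion to its diagnosed covariance with the coarse observation prediction, which is exactly the partitioning role of $[K_i]_k$ in Fig. 2b.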


6.2 Smoothing

Smoothers update state vectors that are distributed in time. Smoothers have the potential to improve soil moisture reanalyses by assimilating multiple observations in time or time-integrated observations, such as TWS or river discharge. Here, a smoother is illustrated for the assimilation of monthly coarse-scale TWS as an extension of a spatially distributed ("3D") ensemble Kalman filter in which the fine-scale state is distributed in time. The relevant model prognostic variables included in the state vector are the depth to the groundwater table (gwt) and root-zone soil moisture (rzmc). The concept is first introduced by updating each member j of the time-augmented ($i = 1, \ldots, T$) state vector at a fine-scale location k using the traditional ensemble Kalman filter equations:

$$\begin{bmatrix} \widehat{gwt}_{1,j} \\ \widehat{rzmc}_{1,j} \\ \ldots \\ \widehat{gwt}_{i,j} \\ \widehat{rzmc}_{i,j} \\ \ldots \\ \widehat{gwt}_{T,j} \\ \widehat{rzmc}_{T,j} \end{bmatrix}_k^+ = \begin{bmatrix} \widehat{gwt}_{1,j} \\ \widehat{rzmc}_{1,j} \\ \ldots \\ \widehat{gwt}_{i,j} \\ \widehat{rzmc}_{i,j} \\ \ldots \\ \widehat{gwt}_{T,j} \\ \widehat{rzmc}_{T,j} \end{bmatrix}_k^- + [K]_k \left( \begin{bmatrix} TWS_1 \\ \ldots \\ TWS_\kappa \\ \ldots \\ TWS_m \end{bmatrix}_{obs} - \begin{bmatrix} \widehat{TWS}_1 \\ \ldots \\ \widehat{TWS}_\kappa \\ \ldots \\ \widehat{TWS}_m \end{bmatrix}_j^- \right) \quad (23)$$

The coarse-scale monthly (time index omitted) observation predictions ($\kappa = 1, \ldots, m$) are obtained by first calculating fine-scale vertically integrated TWS using $h_{TWS}(.)$ and then aggregating the fine-scale state variables in space ($k = 1, \ldots, N_\kappa$, for observation prediction $\kappa$) and time ($i = 1, \ldots, T$):

$$\hat{y}_j^- = h\left(\hat{x}_j^-\right) = \begin{bmatrix} \widehat{TWS}_1 \\ \ldots \\ \widehat{TWS}_\kappa \\ \ldots \\ \widehat{TWS}_m \end{bmatrix}_j^- = \begin{bmatrix} \frac{1}{T}\frac{1}{N_1} \sum_{k=1}^{N_1} \sum_{i=1}^{T} h_{TWS}\left(\left[\hat{x}_{i,j}^-\right]_k\right) \\ \ldots \\ \frac{1}{T}\frac{1}{N_\kappa} \sum_{k=1}^{N_\kappa} \sum_{i=1}^{T} h_{TWS}\left(\left[\hat{x}_{i,j}^-\right]_k\right) \\ \ldots \\ \frac{1}{T}\frac{1}{N_m} \sum_{k=1}^{N_m} \sum_{i=1}^{T} h_{TWS}\left(\left[\hat{x}_{i,j}^-\right]_k\right) \end{bmatrix} \quad \text{with} \quad \left[\hat{x}_{i,j}^-\right]_k = \begin{bmatrix} \widehat{gwt}_{i,j} \\ \widehat{rzmc}_{i,j} \\ \ldots \end{bmatrix}_k \quad (24)$$


The above update equation (Eq. 23) could potentially involve a large-dimensional Kalman gain, for which the error covariances are only valid if the number of included time steps (T) is small relative to the ensemble size, as in Dunne and Entekhabi (2006), who assimilated a batch of temporally distributed soil moisture observations, and in Pauwels and De Lannoy (2009), who assimilated time-integrated discharge observations. Alternatively, and in analogy with strong-constraint variational assimilation (Chapter X632X), one can limit the update to the initial conditions at the beginning (time step i0) of the assimilation window ($\hat{x}_{i0,j,k}^-$) so that the model trajectory over the entire smoothing window best fits the observations. The $[K_{i0}]_k$ matrix then becomes

$$[K_{i0}]_k = \mathrm{Cov}\left(\left[\hat{x}_{i0}^-\right]_k, \hat{y}^-\right) \left[ \mathrm{Cov}\left(\hat{y}^-, \hat{y}^-\right) + R \right]^{-1} \quad (25)$$

Yet, adding a full increment to the initial soil moisture state $\hat{x}_{i0,k,j}^-$ alone may not cause the desired persistent trajectory shift in hydrologic models. Instead, Zaitchik et al. (2008) computed an effective increment at each time step i by equally distributing the increment over the T time steps in the smoothing window, so that

$$\left[\hat{x}_{i,j}^+\right]_k = \left[\hat{x}_{i,j}^-\right]_k + \frac{1}{T} [K_{i0}]_k \left( \begin{bmatrix} TWS_1 \\ \ldots \\ TWS_\kappa \\ \ldots \\ TWS_m \end{bmatrix}_{obs} - \begin{bmatrix} \widehat{TWS}_1 \\ \ldots \\ \widehat{TWS}_\kappa \\ \ldots \\ \widehat{TWS}_m \end{bmatrix}_j^- \right), \quad \text{for each } i = 1, \ldots, T \quad (26)$$

The TWS innovations are thus partitioned in time and space and into different water storage components. The methodological development for smoothing GRACE observations in the context of soil moisture assimilation is still in its infancy. Alternative formulations are under investigation.
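The distributed-increment idea of Eq. 26 can be sketched with a single scalar TWS state (entirely synthetic numbers; the gain here is a scalar ensemble-diagnosed gain rather than the full spatial $[K_{i0}]_k$, and the "month" is a 30-step window):

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 500, 30                          # ensemble size; daily steps in the window

# Synthetic ensemble of daily TWS anomalies (mm): a persistent (bias-like)
# error per member plus day-to-day noise
tws = rng.normal(0.0, 20.0, (N, 1)) + rng.normal(0.0, 5.0, (N, T))
y = tws.mean(axis=1)                    # monthly-mean observation predictions

tws_obs, r = 12.0, 10.0**2              # monthly TWS observation, error variance

yd = y - y.mean()
var_y = yd @ yd / (N - 1)
K = var_y / (var_y + r)                 # scalar gain for the monthly TWS state

y_pert = tws_obs + rng.normal(0.0, np.sqrt(r), N)
full_incr = K * (y_pert - y)            # full increment per member

# Distribute the increment over the window (Eq. 26): 1/T of the increment is
# added at every step, so the applied correction accumulates through the month
frac = np.arange(1, T + 1) / T          # cumulative fraction applied by day i
tws_a = tws + np.outer(full_incr, frac)
```

By the last day of the window, the full increment has been applied, while early days are only mildly corrected, which mimics the persistent trajectory shift sought by Zaitchik et al. (2008).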

6.3 Joint State and Parameter Updating

The above examples update the land surface state in response to observations related to soil moisture. The term "data assimilation" can also be used for the estimation of model parameters from observations related to soil moisture. Such parameter estimates could be static, as typically obtained by optimizing long-term statistics that involve differences between long time series of simulations and observations. Slowly varying parameters could be updated dynamically through recursive filtering, using techniques similar to those described above, but after replacing (i) the state variables with parameters and (ii) the prognostic state propagation model with a persistence model. Combined state and parameter estimation has also been explored (e.g., Montzka et al. 2013), but not for large-scale soil moisture modeling systems:


the realism and observability of evolving parameters in interaction with dynamic state updates may pose difficulties.
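The recursive parameter filtering described above (parameters treated as state variables with a persistence forecast model) can be sketched as follows. The toy relaxation model, the "wilting point" parameter, and all numerical values are invented for illustration and do not represent any particular land surface model.

```python
import numpy as np

rng = np.random.default_rng(3)
N, n_steps = 200, 60

# Truth: soil moisture relaxes toward a "wilting point" parameter wp = 0.12
wp_true, rate = 0.12, 0.1
sm_true = 0.30

# Parameter ensemble (initially biased); its forecast model is persistence
wp = rng.normal(0.20, 0.05, N)
sm = np.full(N, 0.30)

for _ in range(n_steps):
    # Propagate: each member uses its own parameter; the parameter persists
    sm_true += rate * (wp_true - sm_true)
    sm = sm + rate * (wp - sm) + rng.normal(0.0, 0.002, N)

    # Assimilate a noisy surface soil moisture observation; the parameter is
    # updated through its ensemble covariance with the observation prediction
    obs = sm_true + rng.normal(0.0, 0.01)
    r = 0.01**2
    smd, wpd = sm - sm.mean(), wp - wp.mean()
    var_y = smd @ smd / (N - 1)
    k_wp = (wpd @ smd / (N - 1)) / (var_y + r)
    k_sm = var_y / (var_y + r)
    obs_pert = obs + rng.normal(0.0, 0.01, N)
    wp = wp + k_wp * (obs_pert - sm)
    sm = sm + k_sm * (obs_pert - sm)
```

The parameter ensemble drifts toward the value that makes the simulations fit the observations, which illustrates both the appeal and the risk noted in the text: once the parameter spread collapses, observability of further parameter errors is lost.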

7 Random Errors and Biases

The key to successful data assimilation is to qualify and quantify the errors in the model forecasts and observations. The errors include both random and systematic error, or bias. In the absence of knowledge about the true ensemble error at each time instant or location, land surface modelers often turn to the ergodicity principle and analyze errors in time series or spatial patterns. In that sense, bias can be defined as autocorrelated error, and the correlation length determines the temporal or spatial scale of bias. For large-scale (continental, global) hydrometeorological modeling, random errors have autocorrelation lengths of less than a few days in time (microscale, mesoscale), whereas bias or systematic error has time scales of several weeks or more. The following sections discuss the treatment of random errors and biases specifically for soil moisture data assimilation.

7.1 Bias, Autocorrelated Error

Statistically "optimal" data assimilation techniques, such as Kalman filtering, rely on observations and forecasts with zero-mean errors (first moment). A typical problem with assimilating satellite observations of soil moisture, however, is that their climatology differs from that of the land model integrations. If biases cannot be addressed through model calibration, then one can either treat the bias a priori and perform anomaly assimilation (i.e., after mapping the observations to the model climatology) or estimate the bias dynamically inside the assimilation scheme. Either approach to dealing with bias reduces the average magnitude of the innovations and avoids pushing the model away from its own climatology through data filtering. The climatological rescaling techniques are based on a priori knowledge and thus require historical data, whereas online bias estimation methods update the bias estimates dynamically at each assimilation event. Another difference is that rescaling techniques can address discrepancies between datasets in higher-order moments under the assumption of stationary differences in the first moment, whereas the online bias estimation methods are focused on resolving nonstationary differences in the first moments.

7.1.1 A Priori Static Bias Treatment

A commonly used approach in soil moisture assimilation systems is to remove long-term differences between observations and forecasts by rescaling the observations to the model climatology. Rescaling does not per se assign the systematic errors to either the model or the observations, but rather removes the total bias from the innovations.


Fig. 3 Illustration of scaling techniques to remove bias from the innovations in the assimilation of soil moisture observations at a single 36-km grid cell in Walnut Gulch (WG3, see Table 3). (a) CDFs of soil moisture from SMOS retrievals and GEOS-5 simulations. (b) Seasonal climatology of SMOS-observed and simulated brightness temperatures (horizontal polarization, 40° incidence angle, ascending overpass, smoothed, and multiyear averaged). Both panels are based on 3 years (1 July 2010-1 July 2013) of data

A first approach to rescaling is to match the cumulative distribution function (CDF) of observed soil moisture values to the CDF of the soil moisture simulations (Reichle and Koster 2004; Drusch et al. 2005), thereby matching the long-term first, second, and higher-order moments of the observations to those of the model. Figure 3a illustrates the CDF-matching approach. CDFs are calculated using a multiyear historical dataset of observations and land surface simulations for each location in space. This is illustrated by sampling SMOS retrievals and GEOS-5 simulations for ascending orbits (6:00 a.m. local time) at a 36-km grid cell inside the Walnut Gulch Experimental Watershed in Arizona (USA). To complement the temporal sampling, additional sampling can be performed in a spatial window, which effectively smoothes the statistics. Here, all 36-km grid cells within a 0.5° radius around the central Walnut Gulch pixel are sampled. In this example, the SMOS retrievals are systematically drier than the simulations. To rescale an individual soil moisture retrieval (e.g., at 0.075 m3/m3), the cumulative probability density for this value is found (e.g., 0.62 [-]). Then, this probability is transferred to the model CDF, and the corresponding modeled soil moisture (e.g., 0.12 m3/m3) is found. In practice, this mapping happens by fitting a relationship between the observed and simulated soil moisture at identical cumulative probability densities. The observation rescaling possibly involves a change in the dynamic range of the observations, and therefore it is important to also rescale the original observation errors (R). The error standard deviation $\sigma_{sfmc,obs}$ is rescaled to $\sigma'_{sfmc,obs}$, using the long-term (climatological) standard deviations of the observations $S[sfmc_{obs}]$ and simulations $S[\widehat{sfmc}^-]$, i.e.,

$$\sigma'_{sfmc,obs} = \sigma_{sfmc,obs} \cdot S\left[\widehat{sfmc}^-\right] / S\left[sfmc_{obs}\right] \quad (27)$$
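Empirical CDF matching as in Fig. 3a, together with the error rescaling of Eq. 27, can be sketched as follows (synthetic gamma-distributed climatologies stand in for the SMOS and GEOS-5 samples; the "observations" are constructed drier than the "model"):

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic multiyear climatologies: retrievals drier than the model,
# loosely mimicking the SMOS/GEOS-5 example in Fig. 3a
obs_clim = rng.gamma(4.0, 0.02, 5000)          # observed soil moisture samples
mod_clim = rng.gamma(4.0, 0.02, 5000) + 0.05   # wetter model climatology

def cdf_match(value, obs_clim, mod_clim):
    """Map an observed value to the model climatology at the same
    cumulative probability (empirical CDF matching)."""
    p = (obs_clim <= value).mean()      # cumulative probability of the obs value
    return np.quantile(mod_clim, p)     # model value at that probability

rescaled = cdf_match(0.075, obs_clim, mod_clim)

# Rescale the observation error standard deviation accordingly (Eq. 27)
sigma_obs = 0.04
sigma_obs_rescaled = sigma_obs * mod_clim.std() / obs_clim.std()
```

In operational practice, the mapping is fitted once from the historical samples and then applied to each incoming retrieval, rather than recomputed per observation as in this sketch.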


The static nature of the CDF-matching approach is not always ideal, because it discards seasonal cycles in biases and may therefore even introduce biases into the system. In some applications, the CDF-matching approach has therefore been applied by season (de Rosnay et al. 2014). The seasonality in biases is of particular concern when assimilating brightness temperatures, which are strongly impacted by seasonally varying surface temperature (Sect. 9.2). When it is important to resolve the seasonal and diurnal cycles of the climatology, rescaling is usually limited to the first moment and possibly the second moment, rather than the complete CDF, due to the limited availability of historical data. Figure 3b illustrates temporally variable biases between multiyear-averaged brightness temperatures from SMOS at 40° incidence angle and simulations with the GEOS-5 land system, both at 6:00 a.m. local time (ascending overpass). The temporally variable climatology of the mean brightness temperature is calculated by temporally smoothing the datasets (with the model cross-masked for the availability of observations) and calculating a mean brightness temperature for each pentad (5-day period), averaged across 3 years. The climatological mean values of the observed and modeled brightness temperature for each pentad p are given by $\langle Tb_{obs,p} \rangle$ and $\langle \widehat{Tb}_p^- \rangle$. Each individual observation $Tb_{obs,i}$ is then rescaled to $Tb'_{obs,i}$ using climatological information from the closest pentad p:

$$Tb'_{obs,i} = Tb_{obs,i} - \langle Tb_{obs,p} \rangle + \langle \widehat{Tb}_p^- \rangle = Tb_{obs\_anom,i} + \langle \widehat{Tb}_p^- \rangle \quad (28)$$

where $Tb_{obs\_anom,i}$ is the anomaly of the assimilated observations. The observation predictions can be similarly written as $\widehat{Tb}_i^- = \widehat{Tb}_{anom,i}^- + \langle \widehat{Tb}_p^- \rangle$, with $\widehat{Tb}_{anom,i}^-$ denoting the anomaly in the simulations. Consequently, the innovations can be written as

$$Tb'_{obs,i} - \widehat{Tb}_i^- = Tb_{obs\_anom,i} - \widehat{Tb}_{anom,i}^- \quad (29)$$

This effectively means that anomaly information is assimilated when a rescaling approach is implemented.
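The pentad-climatology rescaling of Eq. 28 can be sketched as follows, using synthetic brightness temperature series with assumed (invented) seasonal cycles and biases:

```python
import numpy as np

rng = np.random.default_rng(5)
doy = np.arange(3 * 365)                          # 3 years of daily time steps

# Synthetic Tb series (K) with different seasonal cycles for obs and model
tb_obs = 255 + 10 * np.sin(2 * np.pi * doy / 365) + rng.normal(0, 2, doy.size)
tb_mod = 260 + 6 * np.sin(2 * np.pi * doy / 365) + rng.normal(0, 2, doy.size)

# Pentad (5-day) climatologies, averaged across the 3 years
pentad = (doy % 365) // 5                         # pentad index 0..72
clim_obs = np.array([tb_obs[pentad == p].mean() for p in range(73)])
clim_mod = np.array([tb_mod[pentad == p].mean() for p in range(73)])

# Rescale each observation with the climatology of its pentad (Eq. 28)
tb_obs_rescaled = tb_obs - clim_obs[pentad] + clim_mod[pentad]
```

After rescaling, the observations carry only anomaly information relative to the model's own seasonal climatology, so the innovations reduce to anomaly differences as in Eq. 29.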

7.1.2 Online Dynamic Bias Estimation

Bias can be estimated dynamically inside the data assimilation system, using information in the innovations. Without knowledge of the origin of persistent errors in the innovations, it is impossible to assign the bias to state forecast errors or observation errors. Observation bias estimation removes bias from the innovations and leaves the model and assimilation output in its own climatology. Similarly to observation rescaling techniques, dynamic observation bias estimation effectively aims at anomaly assimilation. The difference is that the climatological differences between the observation predictions and observations (e.g., $\langle \widehat{Tb}_p^- \rangle - \langle Tb_{obs,p} \rangle$ in Eq. 28)


are replaced by temporally variable bias estimates $\hat{b}_i^{obs}$ at each assimilation time step i. As an example, the state update equation (Eq. 5) would become

$$\hat{x}_i^+ = \hat{x}_i^- + K_i \left[ y_{obs,i} - h_i\left(\hat{x}_i^-\right) - \hat{b}_i^{obs} \right] \quad (30)$$

Forecast bias estimation techniques, on the other hand, remove state bias from the innovations (similar to rescaling techniques) and correct the model simulations for bias (i.e., they assign the bias to the model, unlike rescaling techniques), either only in post-processing or with feedback into the model. Given a dynamic estimate of the forecast bias $\hat{b}_i^f$, the state update equation will be

$$\hat{x}_i^+ = \left( \hat{x}_i^- - \hat{b}_i^f \right) + K_i \left[ y_{obs,i} - h_i\left( \hat{x}_i^- - \hat{b}_i^f \right) \right] \quad (31)$$

For soil moisture data assimilation, the online forecast bias estimation technique has only been used in small-scale research studies (De Lannoy et al. 2007), but not yet in large-scale or operational applications. Forecast bias estimation and correction of soil moisture simulations are particularly useful if knowledge of the absolute levels of soil moisture is important in addition to knowledge of the temporal evolution of soil moisture. However, the main problem is that simple forecast bias models do not allow information to be propagated from observed (e.g., surface soil moisture) to unobserved (e.g., root-zone soil moisture) variables.
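A toy scalar sketch of online observation bias estimation in the spirit of Eq. 30 follows. The bias-update gain, the fixed state gain, and the synthetic system are all assumptions for illustration; operational schemes derive these gains from specified error covariances.

```python
import numpy as np

rng = np.random.default_rng(7)
n_steps, gamma = 300, 0.05        # assimilation cycles, bias-update gain

true_bias = 3.0                   # constant observation bias (e.g., K in Tb space)
b_obs = 0.0                       # online bias estimate, initially zero

x_true, x = 0.0, 0.0
k = 0.3                           # fixed state-update gain, for simplicity
bias_history = []
for _ in range(n_steps):
    x_true = 0.9 * x_true + rng.normal(0.0, 1.0)       # synthetic truth
    x = 0.9 * x                                        # bias-free forecast model
    y_obs = x_true + true_bias + rng.normal(0.0, 0.5)  # biased observation

    innov = y_obs - x - b_obs           # bias-corrected innovation (Eq. 30)
    b_obs = b_obs + gamma * innov       # slowly track the persistent part
    x = x + k * innov                   # state update with corrected innovation
    bias_history.append(b_obs)
```

Because only the persistent (autocorrelated) part of the innovations survives averaging, the small gain `gamma` accumulates it into the bias estimate, while the state update absorbs the random part; the analysis therefore stays in the model's own climatology.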

7.2 Random Error

The observation and forecast error covariances (second moments) determine the relative weight of the observations in the assimilation scheme, as well as the spatial (horizontal and vertical) distribution of the analysis increments. The optimality of a data assimilation system depends on how these uncertainties are defined upon input.

7.2.1 Forecast Error Covariance

The various data assimilation techniques described above differ in how they handle the forecast error covariance matrix $P_i^-$. For soil moisture assimilation, ensemble simulations have arguably become the most popular method to dynamically estimate the forecast error variance and the inter-variable error correlations (Reichle et al. 2002). This approach is also most directly relevant for ensemble hydrometeorological forecasting (Chapters ▶ Major operational ensemble prediction systems (EPS) and the future of EPS and ▶ Hydrological Ensemble Prediction Systems Around the Globe). In ensemble soil moisture assimilation, the ensemble is usually generated by perturbing the forcings, model prognostic variables, and possibly land model parameters. In practice, the data assimilation system is calibrated by tuning the magnitude of the perturbations to ensure that the innovations and increments show the expected behavior (Sect. 7.2.3) and that the assimilation results perform well (Sect. 8). Some a priori ensemble verification (De Lannoy et al. 2006; Chapter ▶ Verification metrics for hydrometeorological ensemble forecast systems) could also be performed before activating the data assimilation. Good ensembles should envelop the observations, with an ensemble standard deviation that is comparable to the root-mean-square error between the ensemble mean predictions and the observations. Perturbations to model forcing and prognostic variables are usually applied at regular time intervals during the model integration, because the dispersion in soil moisture simulations is bounded and would collapse unless model forcing or prognostic variables are perturbed. Alternatively, perturbing land model parameters ensures a persistent ensemble spread, but this approach creates temporal error autocorrelation and may not reflect random errors properly.

Table 2 Example of perturbations to forcing and model prognostic variables in the GEOS-5 land data assimilation system for brightness temperature assimilation. Values are from the preliminary system calibration used for the results discussed in Sect. 9.2

| Perturbation | Additive (A) or multiplicative (M) | Standard deviation | AR(1) time series correlation scale | Spatial correlation scale | Cross-correlation with perturbations in P, SW, LW or catdef, srfexc |
|---|---|---|---|---|---|
| Precipitation (P) | M | 0.5 | 24 h | 50 km | n/a, 0.8, 0.5 |
| Downward shortwave (SW) | M | 0.3 | 24 h | 50 km | 0.8, n/a, 0.5 |
| Downward longwave (LW) | A | 20 W/m2 | 24 h | 50 km | 0.5, 0.5, n/a |
| Catchment deficit (catdef) | A | 0.24 kg/m2/h | 3 h | 50 km | n/a, 0.0 |
| Surface excess (srfexc) | A | 0.16 kg/m2/h | 3 h | 50 km | 0.0, n/a |

As an example, Table 2 shows preliminary perturbations that are currently used for brightness temperature assimilation in the GEOS-5 land data assimilation system (Sect. 9.2). Select model prognostic variables and forcings are perturbed at each model time step, and land model parameters are not perturbed. Perturbations to radiation, temperature, and humidity are normally distributed and additive, whereas precipitation perturbations are lognormally distributed and multiplicative, because of the skewed nature of precipitation distributions. The forcing perturbations are applied with a temporal autocorrelation of 1 day and a spatial correlation scale of 50 km. Perturbations to the soil moisture (catdef, rzexc, srfexc) model prognostic variables (Sect. 4) are additive and applied with shorter correlation scales in space and time. Soil temperature is not explicitly perturbed, to avoid excessive ensemble spread in areas without vegetation, but it is indirectly perturbed through the perturbation of the radiation.
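Temporally correlated perturbation time series of the kind listed in Table 2 can be generated with a first-order autoregressive (AR(1)) process. The sketch below follows the table's standard deviations and 24-h time scale for the longwave and precipitation perturbations; the AR(1) formulation itself, the model time step, and the unit-mean lognormal construction are assumptions, and spatial correlation and cross-correlation are omitted.

```python
import numpy as np

rng = np.random.default_rng(6)

def ar1_perturbations(n_steps, dt_h, tau_h, std, rng):
    """AR(1) perturbation time series with temporal correlation scale tau_h
    (hours) and stationary standard deviation std, at a dt_h model step."""
    phi = np.exp(-dt_h / tau_h)                          # lag-1 autocorrelation
    eps = rng.normal(0.0, std * np.sqrt(1.0 - phi**2), n_steps)
    w = np.empty(n_steps)
    w[0] = rng.normal(0.0, std)                          # start from stationarity
    for i in range(1, n_steps):
        w[i] = phi * w[i - 1] + eps[i]
    return w

# Additive longwave perturbations: std 20 W/m2, 24-h time scale, 1-h step
w_lw = ar1_perturbations(20000, 1.0, 24.0, 20.0, rng)

# Multiplicative precipitation perturbations: lognormal with unit mean,
# built from a Gaussian AR(1) series with std 0.5
g = ar1_perturbations(20000, 1.0, 24.0, 0.5, rng)
w_p = np.exp(g - 0.5**2 / 2.0)
```

The `- std**2 / 2` shift in the lognormal construction keeps the multiplicative perturbations unbiased (ensemble mean of one), matching the constraint stated below.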
Cross-correlations are also imposed to ensure some physical realism between the perturbed fields. The ensemble mean of all perturbations is constrained to zero for additive perturbations and to one for multiplicative perturbations. When spatial filtering is applied, spatial error correlations are critical to ensure meaningful updates to the fine-scale state variables. The coarse-scale observation prediction error variance obtained by spatially aggregating N independent fine-scale state variables equals the fine-scale error variance divided by N, i.e., aggregation causes a reduction in uncertainty. With the inclusion of appropriate spatial correlations, the spatially aggregated uncertainty of the observation predictions is effectively increased. The same holds for temporal smoother applications, where temporal autocorrelation is important to ensure sufficient uncertainty in the observation predictions (relative to the observations). As opposed to these useful spatial autocorrelations, spurious long-range correlations are detrimental statistical artifacts in diagnosed ensemble $P_i^-$ matrices. To limit these spurious correlations and to reduce the undesirable impact of distant observations, ensemble filter techniques mostly include some covariance localization (Reichle and Koster 2003). Correlations beyond a certain separation distance are suppressed by using a Hadamard product with a local compactly supported

28

G.J.M. De Lannoy et al.

correlation function that reaches zero beyond a given distance. This product is    b b b , y , y applied to the sample covariance terms Cov b x and Cov y in the i i i i expression for the Kalman gain. Finally, the number of ensemble members needs to allow a statistically valid diagnosis of the forecast error covariance matrix and should thus increase with the number of variables in the state. Most soil moisture assimilation applications use 10–100 ensemble members.
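One widely used compactly supported correlation function for such localization is the fifth-order piecewise polynomial of Gaspari and Cohn; the text above does not name a specific function, so this choice is an assumption of the sketch. The example below (illustrative one-dimensional grid and ensemble sizes) applies the Hadamard (element-wise) product to a sample covariance matrix:

```python
import numpy as np

def gaspari_cohn(r):
    """Gaspari-Cohn 5th-order compactly supported correlation function.

    r is the separation distance normalized by the localization half-width c;
    the function equals 1 at r = 0 and reaches exactly zero at r = 2.
    """
    r = np.abs(np.asarray(r, dtype=float))
    f = np.zeros_like(r)
    m1 = r <= 1.0
    m2 = (r > 1.0) & (r < 2.0)
    x = r[m1]
    f[m1] = -0.25*x**5 + 0.5*x**4 + 0.625*x**3 - (5.0/3.0)*x**2 + 1.0
    x = r[m2]
    f[m2] = ((1.0/12.0)*x**5 - 0.5*x**4 + 0.625*x**3 + (5.0/3.0)*x**2
             - 5.0*x + 4.0 - (2.0/3.0)/x)
    return f

# Hadamard product of a sample covariance with the localization weights
dist = np.abs(np.subtract.outer(np.arange(10), np.arange(10)))  # 1-D grid distances
ensemble = np.random.default_rng(0).normal(size=(10, 30))       # 10 states, 30 members
anom = ensemble - ensemble.mean(axis=1, keepdims=True)
P_sample = anom @ anom.T / (30 - 1)                             # diagnosed P_i^-
P_local = P_sample * gaspari_cohn(dist / 3.0)                   # half-width c = 3 cells
```

Entries beyond twice the half-width are zeroed exactly, which removes the spurious long-range sample correlations while leaving nearby covariances almost unchanged.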

7.2.2 Observation Error Covariance
Observation uncertainty refers to all errors present in observation space, i.e., uncertainty due to sensor error, retrieval error, and representativeness error. Sensor error is given by instrument design specifications. For example, the Microwave Imaging Radiometer with Aperture Synthesis instrument on board SMOS measures multiangular brightness temperatures with a radiometric error of 4 K (Kerr et al. 2010), and the radiometer on board SMAP is designed to measure brightness temperatures at a single 40° incidence angle with an error of 1.3 K (Entekhabi et al. 2014). Retrieval error depends on the radiative transfer or backscattering model, dynamic auxiliary information, and parameters. Representativeness error is often due to a horizontal or vertical mismatch between the observation and the observation predictions. For example, a collection of model grid cells may not accurately capture the actual sensor field of view area. Moreover, the penetration depth of microwaves decreases with increasing soil moisture content, whereas the simulated soil moisture is valid for a surface layer with a fixed thickness. Observation error also includes representativeness errors in the observation operator h_i(.). This error specifically refers to uncertainty in the observation operator (e.g., due to erroneous parameters) and does not include errors in the state forecast x̂_i^-. One promising method to obtain an estimate of soil moisture observation errors is the triple collocation procedure (Scipal et al. 2008). This method uses three independent estimates of soil moisture in order to estimate the uncertainty in one of them. However, the success of this method relies on conditions that are often difficult to meet in practice.

7.2.3 Error Optimization and Adaptive Filtering
The specification of error parameters as input to the data assimilation system is often user-defined and not necessarily optimal.
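Returning to the triple collocation procedure of Sect. 7.2.2: a minimal covariance-notation sketch is given below. It assumes the three datasets are linearly related to a common truth with mutually independent errors (the conditions alluded to above); the synthetic data and error magnitudes are illustrative:

```python
import numpy as np

def triple_collocation_errvar(x, y, z):
    """Estimate error variances of three collocated soil moisture datasets.

    Covariance-notation triple collocation: assumes each dataset equals the
    same truth plus noise, with mutually independent errors. Returns the
    error variance of each dataset in its own units.
    """
    c = np.cov(np.vstack([x, y, z]))
    ex = c[0, 0] - c[0, 1] * c[0, 2] / c[1, 2]
    ey = c[1, 1] - c[0, 1] * c[1, 2] / c[0, 2]
    ez = c[2, 2] - c[0, 2] * c[1, 2] / c[0, 1]
    return ex, ey, ez

# Synthetic check: common truth plus independent noise of known variance
rng = np.random.default_rng(42)
truth = 0.25 + 0.05 * rng.standard_normal(100_000)
x = truth + 0.02 * rng.standard_normal(truth.size)   # e.g., model
y = truth + 0.04 * rng.standard_normal(truth.size)   # e.g., active retrieval
z = truth + 0.03 * rng.standard_normal(truth.size)   # e.g., passive retrieval
ex, ey, ez = triple_collocation_errvar(x, y, z)      # ~ (0.02^2, 0.04^2, 0.03^2)
```

When the independence or linearity assumptions are violated (e.g., cross-correlated retrieval errors), the estimates become biased, which is the practical limitation noted above.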
The innovations provide a means to assess the filter behavior: for a linear system, the innovations should have a Gaussian distribution with zero mean and a covariance of (H_i P_i^- H_i^T + R_i). In practice, land surface models are nonlinear, and the Gaussian assumption is difficult to meet. However, the consistency of the innovations with the imposed a priori error covariances and observation error covariances (Reichle et al. 2002) can be (approximately) verified by normalizing the (ensemble mean) innovation <y_obs,i - H_i x̂_i^-> for each point in space and at each assimilation time step by the imposed sqrt(H_i P_i^- H_i^T + R_i).

Soil Moisture Data Assimilation

29

Fig. 4 Ratio of the analysis error standard deviation (std-dev) to the forecast ensemble standard deviation in surface soil moisture averaged over 3 years (July 2010–July 2013). The analysis is obtained by assimilating both ascending and descending SMOS soil moisture retrievals (v551) in the GEOS-5 land data assimilation system. Locations with less than 50 SMOS retrievals available for assimilation are shown in white. The spatial mean and standard deviation are denoted by m and s in the figure title

The distribution of these normalized innovations (in time and/or space, ergodic sampling) should follow a standard normal distribution. If the standard deviation is less (larger) than one, then the imposed values for R_i and/or P_i^- are too large (small). This can be used to manually optimize the filter. The alternative is to automatically tune the forecast error covariance P_i^-, or more specifically the model error component Q_i, during the online cycling of the data assimilation system through adaptive filtering (Crow and Yilmaz 2014). However, the adaptive methods have not yet been thoroughly tested for large-scale data assimilation systems.
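A minimal sketch of this innovation-consistency check follows; the variances and sample size are synthetic, and the diagnostic messages are illustrative:

```python
import numpy as np

def normalized_innovations(innov, HPHt, R):
    """Normalize (ensemble mean) innovations by the imposed total spread.

    innov : series of innovations y_obs - H x_forecast, one per update
    HPHt  : imposed forecast error variance mapped to observation space
    R     : imposed observation error variance
    For a consistent filter the output should be ~ N(0, 1).
    """
    return np.asarray(innov) / np.sqrt(np.asarray(HPHt) + np.asarray(R))

def diagnose(norm_innov):
    """Crude reading of the check: std > 1 means the imposed errors are too small."""
    if np.std(norm_innov) > 1.0:
        return "imposed errors too small (filter overconfident)"
    return "imposed errors adequate or too large"

# Synthetic check: actual innovation variance 0.02^2 matches 3e-4 + 1e-4
rng = np.random.default_rng(1)
innov = 0.02 * rng.standard_normal(5000)
nu = normalized_innovations(innov, HPHt=3e-4, R=1e-4)  # std(nu) ~ 1: consistent
```

Doubling the innovations while keeping the imposed variances fixed would drive the normalized standard deviation to about two, flagging an overconfident filter.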

7.2.4 Analysis Error Covariance A typical feature of data assimilation is that it reduces the analysis errors, both in terms of ensemble errors and in time series errors (see Sect. 8). Figure 4 illustrates how the ensemble analysis uncertainty in GEOS-5-simulated soil moisture is reduced compared to the forecast uncertainty, with a globally averaged fraction of 0.8, when assimilating SMOS retrievals during the period July 2010–July 2013.


G.J.M. De Lannoy et al.

Table 3 Details of core validation sites (Entekhabi et al. 2014) and 36-km reference grid cells within each site used for independent validation of the SMOS data assimilation experiment in Sect. 8.3. Each reference grid cell contains a minimum of 5 sensors and a maximum of N sensors. The latitude and longitude refer to the center of the 36-km grid cells on the Equal-Area Scalable Earth Grid version 2

State (USA)  Core site       Reference grid cell  Latitude (°N)  Longitude (°W)  Maximum N sensors
Idaho        Reynolds Creek  RC1                  43.33          116.70          6
Idaho        Reynolds Creek  RC2                  42.95          116.70          9
Arizona      Walnut Gulch    WG1                  31.96          110.73          6
Arizona      Walnut Gulch    WG2                  31.62          110.35          14
Arizona      Walnut Gulch    WG3                  31.62          109.98          22
Oklahoma     Little Washita  LW                   34.99          98.03           15
Oklahoma     Fort Cobb       FC                   35.34          98.40           10
Georgia      Little River    LR1                  31.62          83.84           15
Georgia      Little River    LR2                  31.62          83.46           12
Indiana      Saint Joseph    SJ                   41.43          84.96           15
Iowa         South Fork      SF                   42.57          93.55           12

8 Evaluation of Soil Moisture Estimates from Data Assimilation

8.1 In Situ Soil Moisture

Large-scale soil moisture data assimilation results are typically validated with in situ soil moisture observations, or some other measurements that depend on soil moisture, such as turbulent fluxes or river discharge. This section specifically focuses on validation against independent in situ soil moisture measurements, i.e., observations that were not used in the data assimilation. The most accurate measurements of soil moisture involve collecting soil samples that are weighed before and after drying to determine the amount of water present in the soil matrix. Yet, such destructive gravimetric measurements cannot be frequent in time or space, and therefore, they mainly serve to calibrate automated in situ soil moisture measuring devices, such as capacitance probes, time domain reflectometry probes, and neutron probes. Across the world, thousands of measurement stations are equipped with soil probes that record time series of soil moisture at various depths in the ground. Select examples of networks with more than 50 soil moisture measuring sites are the US Natural Resources Conservation Service Soil Climate Analysis Network (SCAN), the Snowpack Telemetry network (SNOTEL), and the US Climate Reference Network (USCRN). Other in situ soil


moisture networks include the Cosmic-Ray Soil Moisture Observing System (COSMOS) and Global Positioning System receivers (GPS), both of which use relationships between signals collected above the ground and vertically and horizontally integrated soil moisture, rather than using probes inserted in the ground. The limited spatial support of the point measurements complicates the comparison of spatial patterns or absolute values against gridded regional or global LSM simulations. However, the temporal variability in soil moisture at point locations provides useful information to validate the dynamics of model and assimilation results. In addition, some watershed-averaged in situ measurements of surface soil moisture are available from core validation sites (Entekhabi et al. 2014, Chapter 7, pp.119–150) listed in Table 3. These USDA watersheds are equipped with locally dense sensor networks covering the area of a satellite footprint, which makes them directly relevant for the validation of remote-sensing retrievals and also very attractive for validating coarse-scale model or assimilation results. Table 3 provides details about the 36-km reference grid cells that are identified within each watershed for the evaluation of SMOS data assimilation in the example of Sect. 8.3 below. The soil moisture in each reference grid cell is computed as the average soil moisture across at least five profile sensors.

8.2 Validation Metrics

A number of metrics exist to validate simulation and assimilation results with in situ observations. It is often advised to focus on the temporal variability and use bias-free metrics for two reasons: (i) a comparison of gridded coarse-scale model output against point-scale in situ observations will suffer from representativeness biases and (ii) data assimilation for state updating alone is meant to correct for random errors and is not designed to fix any long-term biases between the model and observations. Examples of suitable metrics include the unbiased root-mean-square error (ubRMSE) (Entekhabi et al. 2010; Albergel et al. 2013) and the anomaly correlation coefficient. The RMSE between time series (i = 1, ..., T) of simulated soil moisture content (mc_est, either forecasts or analyses) and in situ soil moisture measurements (mc_insitu) can be expanded as

RMSE = sqrt( (1/T) Σ_{i=1}^{T} (mc_est,i - mc_insitu,i)^2 )   (32)
     = sqrt( S^2[mc_est] + S^2[mc_insitu] - 2 R S[mc_est] S[mc_insitu] + bias^2 )   (33)


Fig. 5 Performance of (a) surface and (b) root-zone soil moisture in terms of ubRMSE for (black) both ascending and descending SMOS retrievals, (dark gray) open loop, and (light gray) SMOS retrieval data assimilation at various 36-km core validation sites across the USA (Table 3). The metrics are based on 3 years (1 July 2010–1 July 2013) of 3-hourly output at the analysis time steps only. The error bars indicate the 99 % confidence intervals

where S[.] is the temporal standard deviation, R is the time series correlation coefficient, and "bias" is the difference between the long-term mean of the simulations and that of the observations. This metric thus focuses both on similarities in temporal variability and on bias. The ubRMSE removes the bias term, or

ubRMSE^2 = RMSE^2 - bias^2   (34)

and measures only the random error component of the RMSE. The anomaly time series correlation coefficient measures the linear correlation between simulations and observations after subtracting their respective seasonal climatologies, i.e., seasonally varying climatological mean values are subtracted from each data point. The temporal mean of the anomalies is zero by definition. The climatologies are obtained by smoothing the datasets (for a particular time of day, in case subdiurnal output is validated) and then calculating a multiyear average mc_clim(i) for each day, week, or month, depending on the temporal resolution. Note that the calculation of a climatology requires at least a few years of data. The anomaly correlation is thus given by

anomR = Σ_{i=1}^{T} (mc_est,i - mc_est,clim(i)) (mc_insitu,i - mc_insitu,clim(i)) / sqrt( Σ_{i=1}^{T} (mc_est,i - mc_est,clim(i))^2 · Σ_{i=1}^{T} (mc_insitu,i - mc_insitu,clim(i))^2 )   (35)

Examples of soil moisture data assimilation studies that used the anomaly correlation include Liu et al. (2011) and Draper et al. (2012).
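Equations 32–35 can be implemented directly. The sketch below uses synthetic daily series sharing a seasonal cycle, with a known constant bias in the "model" series; all values are illustrative:

```python
import numpy as np

def ubrmse(est, insitu):
    """Unbiased RMSE (Eq. 34): the random error left after removing the bias."""
    bias = np.mean(est) - np.mean(insitu)
    return np.sqrt(np.mean((est - insitu) ** 2) - bias ** 2)

def anomaly_correlation(est, insitu, clim_est, clim_insitu):
    """Anomaly correlation (Eq. 35): correlation of climatology-free anomalies.

    clim_est / clim_insitu hold the climatological mean for each time step
    (e.g., a smoothed multiyear average for the matching day of year).
    """
    a = est - clim_est
    b = insitu - clim_insitu
    return np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))

# Synthetic daily series sharing a seasonal cycle; "model" has a constant bias
t = np.arange(365)
season = 0.25 + 0.1 * np.sin(2.0 * np.pi * t / 365.0)      # shared climatology
rng = np.random.default_rng(0)
insitu = season + 0.02 * rng.standard_normal(t.size)
est = season + 0.05 + 0.02 * rng.standard_normal(t.size)   # 0.05 m3/m3 bias
```

With the shared seasonal cycle removed, the anomaly correlation isolates how well day-to-day variability is tracked: here the anomalies are independent noise, so anomR is near zero even though the raw correlation between the two series is high, illustrating why the two metrics answer different questions.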

8.3 Example

Figure 5 illustrates how SMOS retrieval assimilation, using a 1D EnKF with CDF matching, improves the ubRMSE of surface and root-zone soil moisture at select 36-km reference grid cells located in core validation watersheds across the USA (Table 3). The metrics are computed using 3-hourly model output and in situ data during the period July 2010–July 2013, at analysis time steps only (i.e., when SMOS observations are available) and excluding frozen conditions. The dark gray bars show that the current GEOS-5 system without assimilation (open loop) performs very well, with the ubRMSE at or below 0.04 m3/m3. When assimilating SMOS retrievals, the ubRMSE is further reduced in both the surface and the root zone, even though the SMOS retrievals have an uncertainty at or above 0.04 m3/m3. This highlights that data assimilation has the potential to improve model results, even if the assimilated observations are very uncertain. The confidence intervals are relatively large in this example, because only 3 years of data are used, at analysis time steps only. When forecast time steps are included (not shown), the reductions in ubRMSE become statistically significant.

9 Toward Operational Soil Moisture Data Assimilation

A number of research and operational centers routinely generate data products that include a soil moisture analysis (in the broadest sense) at continental or global scales, either in reanalysis mode or to support operational prediction systems. Examples of data products generated with land-only systems include those from the North American Land Data Assimilation System (NLDAS, Xia et al. 2012) and the Global Land Data Assimilation System (GLDAS, Rodell et al. 2004) as well as the MERRA-Land and ERA-Land data products (Sect. 4). These data products primarily rely on the use of precipitation observations from gauges and satellites to improve the soil moisture simulations from atmospheric assimilation systems. The NLDAS and GLDAS data products use the NASA Land Information System (LIS, Kumar et al. 2008) software infrastructure, which is also integrated into the atmospheric assimilation systems at the US National Oceanic and Atmospheric Administration’s (NOAA) National Centers for Environmental Prediction (NCEP) and the US Air Force Weather Agency. At NCEP, this combined land-atmosphere assimilation system is the basis for the Climate Forecast System Reanalysis (Saha et al. 2010). The reanalysis data products benefit from the use of precipitation gauge information mainly because they are not subject to the strict latency constraints of atmospheric assimilation systems used for numerical weather prediction (NWP), which require observations to be available within hours. For NWP, the objective of soil moisture data assimilation is to initialize soil moisture conditions in order to provide the best possible accuracy and consistency of surface and near-surface weather forecasts in near-real time. Operational centers such as the UK Met Office (Dharssi et al. 2011), Deutscher Wetterdienst (Hess et al. 2008), Météo-France (Mahfouf et al. 2009), ECMWF (de Rosnay et al. 2014), and Environment Canada (Bélair


Fig. 6 Illustration of ECMWF soil moisture analysis (left) mean innovations and (right) accumulated soil moisture increments for a 6-day period from 25 through 30 June 2013. Innovations show the differences between observations and the ECMWF first guess for (a) scaled ASCAT observations and ECMWF surface soil moisture (in m3/m3), (c) two-meter temperature (T2m), and (e) two-meter relative humidity (RH2m). The right panel shows accumulated surface soil moisture increments (m3/m3) due to (b) ASCAT data assimilation, (d) screen-level data assimilation, and (f) total accumulated increments. The spatial mean and standard deviation are denoted by m and s in the figure titles

et al. 2003) assimilate satellite soil moisture retrievals and/or screen-level observations to update the soil moisture state. This section describes two examples of soil moisture assimilation systems: (i) the soil moisture analysis in the ECMWF Integrated Forecasting System (IFS) and (ii) the NASA Goddard Earth Observing System Model, Version 5 (GEOS-5) land data assimilation system used to generate a soil moisture data assimilation product for the SMAP mission.

9.1 ECMWF Soil Moisture Data Assimilation for NWP

The ECMWF operational soil moisture analysis relies on a point-wise "simplified (E)KF" approach (de Rosnay et al. 2013, Sect. 6.1.1). For each grid point, the first three layers of soil moisture (of depth 0–7 cm, 7–28 cm, and 28–100 cm in the ECMWF IFS) are analyzed using observations of screen-level temperature and relative humidity, as well as ASCAT soil moisture retrievals. In this system, the observation operator is provided by the land surface model, which gives the relation between screen-level temperature and humidity and soil moisture. Using screen-level observations as proxy information to analyze soil moisture has proved to be very relevant for NWP applications, because it consistently improves screen-level variables whose accurate forecast is a key objective of NWP. However, since screen-level observations are only indirectly related to soil moisture, their assimilation is effective only in areas with strong coupling between soil moisture and screen-level variables. Recent developments at ECMWF focused on using satellite information from active and passive microwave sensors, in addition to screen-level observations, to analyze soil moisture. In particular, ASCAT soil moisture data assimilation was implemented in operations in May 2015. Figure 6 illustrates the ECMWF soil moisture analysis components when screen-level observations are assimilated together with ASCAT surface soil moisture data, for a 6-day numerical experiment from 25 through 30 June 2013. The global NWP experiment was conducted at a resolution of 40 km as part of preoperational tests. The soil moisture data assimilation system accounts for soil moisture background errors fixed at 0.01 m3/m3 and observation errors of 1 K, 4 %, and 0.04 m3/m3 for two-meter temperature, relative humidity, and ASCAT surface soil moisture, respectively.
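A minimal sketch of such a point-wise simplified EKF update is given below. The observation operator here is a toy stand-in (operationally it is the forecast model itself), and all state values, error variances, and coupling coefficients are illustrative assumptions, not the IFS configuration:

```python
import numpy as np

def simplified_ekf_increment(x, y_obs, h, B, R, eps=1e-3):
    """One point-wise simplified (E)KF update for a soil moisture column.

    x     : background soil moisture state, one value per soil layer
    y_obs : observations (e.g., T2m, or a scaled ASCAT surface soil moisture)
    h     : observation operator; a hypothetical stand-in function here
    B, R  : fixed background and observation error covariances
    The Jacobian H is estimated by one-sided finite differences, mimicking
    the perturbed model integrations of the simplified EKF.
    """
    x = np.asarray(x, dtype=float)
    y0 = np.asarray(h(x), dtype=float)
    H = np.empty((y0.size, x.size))
    for j in range(x.size):              # perturb each soil layer in turn
        xp = x.copy()
        xp[j] += eps
        H[:, j] = (np.asarray(h(xp)) - y0) / eps
    S = H @ B @ H.T + R                  # innovation covariance
    K = B @ H.T @ np.linalg.inv(S)       # Kalman gain
    return K @ (np.asarray(y_obs, dtype=float) - y0)

# Toy operator: surface soil moisture observed directly; "T2m" cools as the
# profile wets (hypothetical linear coupling, for illustration only)
h = lambda x: np.array([x[0], 290.0 - 20.0 * x.mean()])
x_b = np.array([0.20, 0.25, 0.30])           # three soil layers (m3/m3)
B = np.diag([1e-4, 1e-4, 1e-4])              # (0.01 m3/m3)^2 background errors
R = np.diag([0.04**2, 1.0**2])               # obs errors: soil moisture, T2m
dx = simplified_ekf_increment(x_b, y_obs=[0.24, 284.0], h=h, B=B, R=R)
x_a = x_b + dx                               # wetter/cooler obs -> moistening
```

Because the toy "T2m" depends on all layers, its innovation spreads moistening increments through the profile, while the direct surface observation mainly corrects the top layer; this mirrors the complementarity discussed for Fig. 6.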
The ASCAT soil moisture index mean and range are rescaled to those of the ECMWF volumetric soil moisture using a seasonal (3-month moving window) CDF-matching approach. Innovations of two-meter temperature and humidity (Fig. 6c, e) indicate a good complementarity between two types of screen-level observations. They generally show warmer and/or drier conditions than the model over India and China, Eastern Siberia, and the northern part of South America, whereas colder and wetter conditions are observed in northern Canada, Eastern Europe, and Kazakhstan. In some areas two-meter temperature and humidity innovations indicate relatively patchy discrepancies between observations and the model (e.g., Australia). There is also good agreement between ASCAT and screen-level temperature and relative humidity innovations: in both cases, the observations indicate that the model is too wet over India and Eastern Siberia and too dry in the eastern part of South America. Over North America, ASCAT and relative humidity innovations are also consistent in the Great Lakes area, with wetter (drier) observed conditions north (south) of the lakes. However, in Russia, ASCAT and relative humidity innovations suggest soil moisture errors of opposite sign. The right panel of Fig. 6 shows 6-day accumulated increments in the first soil layer (0–7 cm) due to ASCAT data assimilation (Fig. 6b), due to screen-level observations assimilation (Fig. 6d) and total surface soil moisture increments (Fig. 6f). It shows a good complementarity between ASCAT and screen-level data

36

G.J.M. De Lannoy et al.

assimilation in terms of the spatial distribution of the increments. ASCAT-induced increments are prominent at relatively high latitudes, as well as in Argentina and Australia. Screen-level data assimilation mainly contributes to soil moisture increments in Kazakhstan and North America. It is interesting to note that there is no contribution of screen-level observations to the soil moisture increments over India. This is because of the rainy (monsoon) conditions that prevail at this time of the year in this region, leading to small values of the Jacobians (not shown) for screen-level variables. The spatially averaged value of the combined accumulated increments is 0.004 m3/m3 for the first soil layer, with a standard deviation of 0.022 m3/m3 across the map. Spatially averaged (time series) standard deviation values of 0.02 m3/m3 and 0.01 m3/m3 are obtained for increments due to ASCAT and screen-level assimilation, respectively. These values indicate the average magnitude of the increments and show that the contribution of ASCAT to the top-layer soil moisture increments is larger than that of screen-level observations. This is consistent with the fact that satellite data provide direct information on surface soil moisture, whereas screen-level observations are only indirectly related to soil moisture (but include information on root-zone soil moisture). For NWP applications the assimilation window length is generally relatively short (12 hours at ECMWF). Such a time window is shorter than the timescales of most soil diffusion processes, and therefore the relation between observed surface soil moisture and root-zone soil moisture is weak. This results in low values of the Jacobian matrix elements that relate deep soil moisture to the surface soil moisture. Therefore ASCAT data assimilation mostly provides increments at the surface.
In the second soil layer (not shown), increments are smaller than in the first layer, with standard deviations of 0.0032 m3/m3 and 0.007 m3/m3 for the ASCAT and screen-level contributions, respectively. In the third layer, most of the increments result from screen-level data assimilation, with negligible increments due to ASCAT data assimilation. Although the screen-level observations provide only indirect information on soil moisture, their relation with root-zone soil moisture through soil-plant interaction processes makes them relevant to analyze the root-zone soil moisture profile. These results illustrate the complementarity between satellite information related to surface soil moisture and conventional observations related to root-zone soil moisture profiles, in terms of both the spatial and vertical distributions of the increments. Future implementation plans for soil moisture data assimilation at ECMWF include the combined use of screen-level observations, ASCAT soil moisture, and SMOS and SMAP brightness temperature observations.
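The seasonal rescaling of ASCAT observations to the model climatology described above can be sketched with a generic quantile-mapping (CDF-matching) routine. The climatological samples below are synthetic, and the routine is a simplified stand-in for the operational 3-month moving-window implementation:

```python
import numpy as np

def cdf_match(obs, obs_clim, model_clim):
    """Rescale observations to the model climatology by quantile mapping.

    obs        : observation values to rescale (e.g., ASCAT soil moisture index)
    obs_clim   : climatological sample of the observations (e.g., values from
                 a 3-month moving window around the current day of year)
    model_clim : matching climatological sample of the model (volumetric SM)
    Each observation is mapped to the model value at the same quantile.
    """
    obs_sorted = np.sort(obs_clim)
    model_sorted = np.sort(model_clim)
    # empirical non-exceedance probability of each obs within its climatology
    p = np.searchsorted(obs_sorted, np.asarray(obs), side="right") / obs_sorted.size
    p = np.clip(p, 0.0, 1.0)          # guard against edge cases
    # invert the model CDF at those probabilities
    return np.quantile(model_sorted, p)

# Synthetic example: an index in [0, 100] mapped to volumetric SM around 0.2
rng = np.random.default_rng(3)
obs_clim = rng.uniform(0, 100, size=2000)
model_clim = np.clip(rng.normal(0.20, 0.05, size=2000), 0.0, 0.5)
scaled = cdf_match(np.array([5.0, 50.0, 95.0]), obs_clim, model_clim)
```

By construction the rescaled observations inherit the model's mean and range, so the assimilation corrects only the temporal dynamics, consistent with the bias treatment discussed in Sect. 7.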

9.2 NASA SMAP Surface and Root-Zone Soil Moisture Product

Global estimates of surface soil moisture are routinely obtained through satellite remote sensing, but many applications need an estimate of root-zone soil moisture. The NASA SMAP mission (Entekhabi et al. 2014), launched on 31 January 2015, provides a number of operational soil moisture data products, including Level 2 (half-orbit) and Level 3 (daily composite) soil moisture retrievals and a value-


Fig. 7 (Top) Illustration of differences between scaled SMOS-observed and NASA GEOS-5 simulated brightness temperatures at 40° incidence angle for (a) H- and (b) V-polarization. (Bottom) Temporally averaged increments (Δ) to the Catchment land surface model (c) srfexc, (d) rzexc, (e) catdef, and (f) tsoil, after assimilation of SMOS brightness temperatures. The range of the color bars for rzexc is 20 times that of srfexc, because rzexc applies to a 1-m root zone, while srfexc applies to a 0.05-m surface layer. For catdef, the color-bar scaling factor of 40 is motivated by the average depth of bedrock (or profile soil moisture layer thickness) of 2 m. The time period covers ascending and descending orbits for 3 days from 27 through 29 July 2013. The spatial mean and standard deviation are denoted by m and s in the figure titles

added Level 4 surface and root-zone soil moisture (L4_SM) data product (Entekhabi et al. 2014, Chapter 5, pp.89–100). This latter product is based on the assimilation of SMAP 36-km brightness temperatures into the NASA GEOS-5 Catchment land surface model, using a 3D ensemble Kalman filter. Recall from Sect. 4 that the model prognostic variables related to soil moisture in the Catchment land surface model are the catchment deficit (catdef), root-zone excess (rzexc), and surface excess (srfexc). Other relevant model prognostic


variables are the land surface temperature (tsurf) and the near-surface ground heat content (ght), which determines the near-surface soil temperature (tsoil). These soil moisture variables as well as the forcings are suitably perturbed to generate an ensemble of forecasts (Table 2, Sect. 7.2.1). A radiative transfer model (Sect. 5.2) diagnoses brightness temperature based on the surface soil moisture and temperature in the assimilation system, i.e., Tb̂_i,j = h(x̂_i,j) for each time step i and ensemble member j. This radiative transfer model is optimized to simulate a realistic long-term mean and variability in brightness temperature. The optimization is based on multiangular SMOS observations and uses statistically optimized estimates of simulation and observation errors (De Lannoy et al. 2013). Consequently, long-term biases between observations and observation predictions [Tb_obs,i,j - Tb̂_i,j] are small by design, but seasonal biases are still present. To deal with the remaining biases, the instantaneous Tb_obs,i are rescaled to the model climatology using seasonally varying means, as discussed in Sect. 7.1.1 (Fig. 3b). The data assimilation then maps the differences between the rescaled brightness temperature observations and simulations to increments in the prognostic variables of the Catchment land surface model. Figure 7 illustrates the concept of assimilating coarse-scale (36 km) brightness temperatures into the GEOS-5 Catchment land surface model. In the absence of SMAP observations at the time of writing, SMOS (Kerr et al. 2010) brightness temperature observations are assimilated at 40° incidence angle, for both H- and V-polarization. For simplicity, it is assumed here that the state variables and observations are at the same coarse-scale 36-km resolution. Note however that the SMAP L4_SM product is produced at 9 km.
Figures 7a–b show the ensemble mean innovations <Tb′_obs,i - Tb̂_i> for (a) horizontal polarization and (b) vertical polarization and for both ascending and descending overpasses during 3 days. The swaths are relatively narrow, because of the strict quality control against aliased data and quality control on the angular fitting from multiangular data to 40° incidence brightness temperature. Some swaths are also incomplete, because insufficient historical data are available (often due to radio-frequency interference, primarily over China, the Middle East, and Eastern Europe) to ensure a statistically reliable rescaling. Figures 7c–f illustrate how the brightness temperature innovations are mapped to state increments. Negative brightness temperature innovations in Figs. 7a–b indicate that the model is too warm and/or too dry. Consequently, the negative innovations in the central western region of the USA and in the southwestern region of Australia result in an increase in soil moisture and a decrease in soil temperature. Likewise, positive innovations result in a decrease in soil moisture and an increase in soil temperature. Note that an increase in soil moisture corresponds to an increase in the water excess terms (srfexc, rzexc) and a decrease in the water deficit term (catdef). The magnitude of the increments to srfexc, rzexc, and catdef should be interpreted in relation to the storage capacity of each of these components (with equivalent soil layer thicknesses of 0.05, 1, and 1.3–10 m, respectively), and the color-bar range is scaled accordingly. While the absolute values of the increments are largest for catdef


(global standard deviation of about 5 mm, pertaining to the entire profile), the increments relative to the layer depth are largest in the 5-cm surface layer (srfexc). The spatial filtering introduces some additional important features: the increments are spatially smoothed, and the spatial coverage of the increments is wider than the coverage of the brightness temperature innovations. The spatial error correlations also introduce a horizontal propagation of information to unobserved areas. In this example, the support of the forecast error correlation function is limited to 1.25°, and increments thus taper off over a distance of 1.25° away from the observed swath. The L4_SM product is generated on a 9-km model grid, and the spatial filtering further improves soil moisture estimation through downscaling of the 36-km brightness temperature innovations. The L4_SM product is not limited to global surface and root-zone soil moisture. Research output includes other land surface state variables such as soil temperature and snow, as well as land surface fluxes and meteorological forcings. In addition, ensemble-derived error estimates are provided.

10 Conclusions

Soil moisture is a key variable in hydrological and Earth modeling and assimilation systems. Good estimates of soil moisture at regional to global scales are important for predictions of weather and climate, agricultural productivity, and natural hazards and for various other environmental and socioeconomic applications. Data assimilation provides a means to obtain enhanced soil moisture estimates by combining (often indirect) observations of soil moisture with land surface modeling. At large spatial scales, observations related to soil moisture are mainly provided by global surface observational networks or through remote sensing, either in the form of satellite retrievals, microwave radiances, or backscatter values. This chapter provides conceptual examples on how to assimilate these observations with widely accepted assimilation techniques, such as optimal interpolation and various types of Kalman filtering and smoothing. Special attention is paid to practical issues, such as dealing with multiple scales, the treatment of typical biases, and the characterization of random forecast and observation errors. Operational centers have used screen-level observations of temperature and relative humidity to update soil moisture and temperature for numerical weather prediction. Satellite-based microwave observations provide more direct measurements of surface soil moisture and recent satellite missions such as ASCAT, SMOS, and SMAP are aiming at continued and improved surface soil moisture observations. The assimilation of satellite-based surface soil moisture retrievals has the capability to improve both surface and root-zone soil moisture, as illustrated in various research applications. However, the soil moisture retrieval process relies on parameters and a priori information that may be inconsistent with the land surface model used in the assimilation system. 
It is thus more natural to couple a radiative transfer or backscatter model to a land surface model, and then directly assimilate microwave observations such as brightness temperature or backscatter. Further improvements


in continental-scale root-zone soil moisture can perhaps be obtained from the assimilation of integrated terrestrial water storage observations (e.g., from GRACE). The growing experience with assimilation of soil moisture observations is reflected in the preparation of new cutting-edge data assimilation systems for operational applications. This chapter provides details on two of these systems. A first example shows how ASCAT surface soil moisture retrievals will be assimilated along with screen-level observations for numerical weather prediction at ECMWF. A second example discusses the assimilation of brightness temperature observations from SMOS to prepare for the operational global SMAP surface and root-zone product (L4_SM) at NASA. Both these systems benefit from increasingly available computational power as well as from recent and future satellite missions that are specifically designed to measure soil moisture. These operational systems are able to provide improved soil moisture estimates that have the potential to improve hydrometeorological forecasting across a range of applications such as weather forecasting and the monitoring and prediction of droughts.

Soil Moisture Data Assimilation
G.J.M. De Lannoy et al.

Present and Future Requirements for Using and Communicating Hydro-Meteorological Ensemble Prediction Systems for Short-, Medium-, and Long-Term Applications

Geoff Pegram, Damien Raynaud, Eric Sprokkereef, Martin Ebel, Silke Rademacher, Jonas Olsson, Cristina Alionte-Eklund, Barbro Johansson, Göran Lindström, and Henrik Spångmyr

Contents
1 Introduction
2 Requirements for a Flood Nowcasting System Based on Radar Ensembles
2.1 Catchment Model
2.2 Forecast Updates Using the Kalman Filter
2.3 Short-Term Rainfall Forecasting
2.4 Forecasting with the "String of Beads" Model
2.5 The S_PROG Model
2.6 Forecast Model Comparisons

G. Pegram (*)
Satellite Applications and Hydrology Group, School of Civil Engineering, Surveying and Construction Management, University of KwaZulu-Natal, Durban, South Africa
e-mail: [email protected]

D. Raynaud
Université Joseph Fourier, Grenoble, France
e-mail: [email protected]

E. Sprokkereef
Water Management Centre Netherlands, Rijkswaterstaat Traffic and Water Management, Lelystad, The Netherlands
e-mail: [email protected]; [email protected]

M. Ebel
Bundesamt für Umwelt, Ittigen, Switzerland
e-mail: [email protected]

S. Rademacher
Bundesanstalt für Gewässerkunde, Koblenz, Germany
e-mail: [email protected]

J. Olsson • C. Alionte-Eklund • B. Johansson • G. Lindström • H. Spångmyr
Swedish Meteorological and Hydrological Institute, Norrköping, Sweden
e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]

© Springer-Verlag Berlin Heidelberg 2015
Q. Duan et al. (eds.), Handbook of Hydrometeorological Ensemble Forecasting, DOI 10.1007/978-3-642-40457-3_39-1


2.7 Areal Rainfall Estimates
2.8 Dissemination of Outputs to Disaster Managers
2.9 Communicating Uncertainty to the End Users
2.10 Conclusion on Nowcasting Systems Based on Radar Ensembles
3 Increasing Warning Time for Flash Floods with High-Resolution Ensemble Prediction Systems
3.1 Current Flash Flood Warning Systems in Europe
3.2 Using Limited Area Ensemble Prediction System for Extending the Predictability of Flash Floods
3.3 Conclusions for LEPS-Based Early Warning Systems for Flash Floods
4 Medium-Range Ensemble Prediction Requirements for a Multinational and Well-Controlled River
4.1 The International Rhine Basin
4.2 Operational Forecasting in the Rhine Basin
4.3 Experiences in Probabilistic Forecasting
4.4 Challenges and Developments in Probabilistic Forecasting: Towards Actual Predictive Uncertainty
4.5 Conclusions for Medium-Range Flood Forecasting for Controlled Rivers
5 Requirements for Using and Communicating Hydro-Meteorological EPS for Medium- to Seasonal Forecasting
5.1 Hydropower, Forecasting and Communication Requirements
5.2 Seasonal (Spring-Flood) Ensemble Forecasting
5.3 Medium-Range (10-Day) Forecasting
5.4 Conclusions and Future Outlook for Ensemble-Based Hydropower Forecasting
6 Summary
References

Abstract

This chapter provides representative examples of using and communicating hydro-meteorological ensemble prediction systems (HEPS) for short-, medium-, and long-term applications. The needs of specific end users in disaster management, flood forecasting centers, and water management are highlighted. In the first section, the requirements are presented for a nowcasting system based on radar data, designed to provide sufficient lead time for decision makers responsible for urban areas. The generation of rainfall ensembles from radar measurements is described using the so-called string-of-beads methodology. The aspects required by decision makers for flood management are presented first, followed by the technical set-up and constraints. The second section illustrates how short-term HEPS can contribute to increasing the predictability of flash flood events on a regional scale, with complementary indicators to higher-resolution local information systems based on short-term forecasts and observations. In the third section, the benefits of hydrological ensembles are elaborated for flood forecasting and shipping on a well-controlled, trans-national river basin such as the river Rhine. For several years, medium-range ensemble prediction systems have been explored for flood forecasting in medium to large river basins such as the Rhine.


The last section describes the requirements of HEPS for water management and how they are used, taking hydropower in Sweden as an example. In snow-dominated hydrological regimes such as in Scandinavia, reservoirs need to be carefully managed in order to have enough capacity for storing the spring flood volume, while keeping enough water to secure the required power generation. The chapter concludes with the strengths and the limitations of HEPS for various applications.

Keywords

Ensemble prediction systems • Radar • Trans-national river basins • Decision support

1 Introduction

Ensemble Prediction Systems (EPS) were launched in the early 1990s by the US National Meteorological Center (NMC) and the European Centre for Medium-Range Weather Forecasts (ECMWF) (Tracton and Kalnay 1993; Palmer et al. 1993). It was only after they became an integral part of operational medium-range weather forecasts that the hydrological community started exploring their benefit for the prediction of hydrological processes (de Roo et al. 2003; Schaake et al. 2006). Although research studies clearly indicated the potential for improving predictions in the medium range, the translation from research into operational hydrological applications, e.g., for flood forecasting, was initially slow (Cloke and Pappenberger 2009). With the development of EPS with different resolutions and lead times, this has changed in recent years, and the use of EPS now spans a wide range of hydro-meteorological applications, including short-term flash floods, medium-range riverine floods, and long-term water and energy resource management. Furthermore, where numerical weather predictions still do not provide sufficient spatial and temporal resolution and skill, alternative ways of constructing ensembles, e.g., from radar estimates, have been developed. In contrast to single, deterministic forecasts, EPS allow the magnitude of the uncertainty in the predictions to be quantified. This has opened the door to entirely new fields of research, ranging from the correct representation of probabilities and the evaluation of probabilistic forecasts to the communication of probabilistic results to decision makers. Depending on the situational context, decision makers have different requirements: a mayor responsible for evacuating campsites or urban areas in case of flash flood danger has different time constraints and critical thresholds to consider for decision making than a manager of a hydropower plant responsible for meeting production targets.
It is beyond the scope of a single chapter to describe all requirements for using and communicating hydro-meteorological EPS for short-, medium-, and long-term applications. Instead, representative examples of nowcasting, short-term, medium-term, and long-term applications are described, illustrating the specific end-user requirements, how these requirements are met by using EPS, and how the results are communicated to end users and decision makers.

2 Requirements for a Flood Nowcasting System Based on Radar Ensembles

In the year 2000, the Municipality Disaster Management Centre of the city of Durban in South Africa did not have any facility for anticipating floods other than emergency weather reports and forecasts. They typically found themselves reacting to information phoned in by people who had either experienced damage or noticed that flooding was occurring. By the year 2003 they had, in the Disaster Management Centre, a GIS display overlain by real-time images of rainfall measured by radar at 5-min intervals, showing them where rain was falling and had fallen. This quantum improvement meant that people living near rivers now had the potential for some warning of impending floods. In addition, most knew that the Disaster Management Group was working towards mitigating floods in their area in a proactive rather than reactive way. Major industrial developments had been established in Mgeni Park and in the Mlazi Basin in the Durban Metro, some of them strategic industries. With the flood forecasting capability in the Durban Metro Disaster Management Centre, a 6–12 h warning of an impending flood would enable industry to evacuate staff and perform controlled shutdowns, or take steps to reduce the damage to sensitive plants. The requirements of an effective flood forecasting system for the region, which carry over to other problem sites, were set out based on a definition of what constitutes such a system. The provincial utility, Umgeni Water, expected a model or suite of models that:

• Use radar data as precipitation input
• Forecast storm movement and precipitation
• Determine the expected runoff from catchments
• Are easily updateable
• Have a user-friendly front end

whereas the Durban municipality expected a system which:

• Is quick and easy to run
• Is easy to update
• Is spatially presented (preferably with a GIS)
• Is able to provide predictions with a good degree of confidence
• Makes it easy to obtain information
• Provides reasonable advance warning of potentially serious events (2–3 h)

Of key importance are reliable real-time measurements of rainfall and streamflow, which can be difficult to obtain in bad weather. It is usually difficult to get people who are in danger to move away from their homes and possessions; it is therefore crucial that the warnings are reliable, in order to facilitate the development of a relationship of trust between the affected communities and the disaster managers – there is a very high cost associated with "crying wolf." To obtain a measure of the reliability of forecasts, so that it can be meaningfully conveyed to the public, it is important that forecasts are accompanied by confidence bands indicating their precision. These confidence bands can realistically be obtained by examining forecasts of historical events and determining how quickly their information content decays. The best way to do this is to determine confidence intervals generated from ensembles of forecasts seeded with small initial perturbations. The next step, before issuing warnings to the public, is to translate the technical measures of uncertainty (as measured by the confidence bands) into easy-to-understand messages, e.g., visually through "traffic light" colors: Green, Yellow, Orange, and Red bands of alert associated with future periods. This aspect of measuring uncertainty is typical of other systems, but the alerting measures were left to the disaster managers to interpret, with assistance from the designers of the system. Some of the requirements of the alerting procedure have to be balanced against practical constraints due to the limited resources available. In the following development, the catchment model selected for the flood forecasting system is first briefly described; then the methodology of short-term rainfall forecasting using the "String of Beads" rainfall simulation model is outlined. The incorporation of the real-time flood forecasting rainfall-runoff model and the operation of the String of Beads model in parallel forecast streams are then described.
Finally, the manner in which the system developers responded to the decision makers' needs for user-friendly interfaces and clear information is reported.
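As a sketch of how ensemble uncertainty can be translated into the traffic-light alerts described above, the following maps the fraction of ensemble members exceeding a flood threshold to a color band. The threshold and the probability cut-offs are hypothetical illustrations, not values used in the Durban system:

```python
# Hypothetical sketch: translating ensemble forecast uncertainty into
# the Green/Yellow/Orange/Red alert bands described above.  The flood
# threshold and the probability cut-offs are illustrative choices that
# a disaster manager would set; they are not the Durban system's values.

def alert_level(ensemble, flood_threshold):
    """Return a color band from the fraction of ensemble members
    whose forecast peak flow exceeds the flood threshold."""
    p = sum(1 for member in ensemble if member >= flood_threshold) / len(ensemble)
    if p < 0.10:
        return "Green"
    if p < 0.35:
        return "Yellow"
    if p < 0.70:
        return "Orange"
    return "Red"

# Eight forecast peak flows (m3/s) from a hypothetical ensemble:
members = [80.0, 95.0, 110.0, 130.0, 150.0, 90.0, 105.0, 125.0]
level = alert_level(members, flood_threshold=120.0)  # 3 of 8 exceed -> "Orange"
```

A mapping of this kind makes the alert bands auditable: the disaster managers, with help from the system designers, can tune the cut-offs to balance missed warnings against the cost of "crying wolf."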

2.1 Catchment Model

The catchment model selected comprises an arrangement of three interconnected linear reservoirs, parameterized by the reservoir response constants and including a loss term for each reservoir; a complete description of the model is given by Pegram and Sinclair (2002), based on the ARMA-based linear tank model introduced by Pegram (1980). The model is used in a semidistributed form in which each of the 12 quaternary subcatchments (each approximately 300 km²) of the Umgeni catchment is treated in a lumped manner. These are forced with mean areal rainfall inputs obtained by aggregating radar estimates of rainfall measured at 5-min intervals. For the purposes of making forecasts, the model is formulated according to its governing state-space equations rather than in the more efficient pseudo-ARMA form, in order to take advantage of the Kalman (1960) and MISP (Todini 1978) filters for online forecast updates. In our application, the Kalman filter updates the model states, while the MISP algorithm, a modified version of the Kalman filter, provides both state and parameter updates at each time step. The advantage of recursive estimation techniques (such as the Kalman filter) is that they only require the measurements from the immediate past in order to provide the optimal (least squares) estimates; this makes them ideally suited to online computational situations. A further important benefit is the routine computation of the error covariance matrix, which provides an estimate of the forecast confidence at each time step. Figures 1 and 2 indicate the difference between state updates and parameter updates in the context of the catchment model used in this study. State updates refer to a change of the storages within the model's reservoirs, while parameter updates change the way the model responds to a given set of storages, reflecting changes in the catchment's expected response over time.

Fig. 1 Model state updates (note the different storage levels at the different epochs)

Fig. 2 Model parameter updates (note the same storages, but differing responses)
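A minimal sketch of such a cascade of three linked linear reservoirs in discrete time might look as follows; the response constants and the loss fraction are illustrative assumptions, not the calibrated Umgeni values:

```python
# Illustrative sketch of a cascade of three linked linear reservoirs
# with a loss term, in the spirit of the catchment model described
# above.  The response constants k and the loss fraction are invented
# for illustration, not calibrated values.

def step(storages, rainfall, k=(0.3, 0.2, 0.1), loss=0.05):
    """Advance the three reservoirs by one time step.

    Each reservoir releases k[i] * S_i; a fraction `loss` of each
    release is lost, and the remainder feeds the next reservoir."""
    s1, s2, s3 = storages
    q1, q2, q3 = k[0] * s1, k[1] * s2, k[2] * s3
    s1 = s1 + rainfall - q1                 # upstream reservoir
    s2 = s2 + (1.0 - loss) * q1 - q2        # middle reservoir
    s3 = s3 + (1.0 - loss) * q2 - q3        # downstream reservoir
    return (s1, s2, s3), (1.0 - loss) * q3  # new states, catchment outflow

states, outflows = (0.0, 0.0, 0.0), []
for rain in [10.0, 5.0, 0.0, 0.0, 0.0, 0.0]:   # a short storm, then recession
    states, q = step(states, rain)
    outflows.append(q)
```

Stacking the storages into a state vector x = (s1, s2, s3) puts the model in exactly the linear state-space form required by the Kalman filter of the next section.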

2.2 Forecast Updates Using the Kalman Filter

The Kalman filter is a recursive state estimation scheme. It produces optimal estimates (in a least squares sense) of a system's state vector $x_t$, where the system may be defined by the following linear stochastic difference equation:

$$x_t = A x_{t-1} + B u_t + w_{t-1} \tag{1}$$

where $A$ is the state transition matrix, $B$ is the system input conversion matrix, $u_t$ is the exogenous input vector, and $w_{t-1}$ is the system noise, which is assumed to be normally distributed white noise with mean $\bar{w}$ and covariance matrix $Q$. The system measurement equation is given by:

$$y_t = H x_t + v_t \tag{2}$$

where $y_t$ is the system measurement vector, $H$ is the state-to-measurement transfer matrix, and $v_t$ is the measurement noise, which is assumed to be normally distributed white noise with mean $\bar{v}$ and covariance matrix $R$. The derivation of the filter equations can be found in various texts (e.g., Gelb 1974; Young 1984) and is omitted here. The filter equations fall into two categories: prediction and correction equations. The prediction equations provide a priori (forecast) estimates of the system states and state error covariance matrix; the correction equations compute the a posteriori estimates of the system states. The matrices $A$, $B$, $Q$, $R$, and $H$, the measurements $y_t$, and the input vector $u_t$ are all assumed to be known up to time $t$; the unknowns are the state estimates $\hat{x}_t$ and the noise terms $w_t$ and $v_t$. The filter is applied recursively, with initial estimates of the system states and associated errors being made to start the filter. The prediction equations are used to forecast to the next time step, and the correction equations are applied once a measurement becomes available. This is illustrated in Fig. 3, along with the filter equations. In the figure, $\hat{x}_{t+1/t}$ is the a priori estimate of the system states, $\hat{x}_{t/t}$ the a posteriori estimate of the system states, $P_{t+1/t}$ the a priori estimate of the state error covariance matrix, $P_{t/t}$ the a posteriori estimate of the error covariance matrix, and $K_t$ the gain of the filter.
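To make the predict-correct cycle concrete, the filter can be exercised for a scalar system; every number below (state transition, input, noise statistics, and the single measurement) is an illustrative assumption, not a value from the Umgeni application:

```python
# Scalar walk-through of the predict-correct cycle defined by Eqs. 1
# and 2.  All numbers are illustrative assumptions.

def kalman_predict(x, P, A, B, u, w_bar, Q):
    """A priori estimates of the state and its error variance."""
    x_prior = A * x + B * u + w_bar
    P_prior = A * P * A + Q
    return x_prior, P_prior

def kalman_correct(x_prior, P_prior, y, H, v_bar, R):
    """A posteriori estimates once a measurement y becomes available."""
    K = P_prior * H / (H * P_prior * H + R)            # Kalman gain
    x_post = x_prior + K * (y - H * x_prior - v_bar)   # state update
    # Joseph-form variance update (scalar version)
    P_post = (1.0 - K * H) * P_prior * (1.0 - K * H) + K * R * K
    return x_post, P_post, K

# One full cycle: forecast, then correct against an observation.
x_prior, P_prior = kalman_predict(x=10.0, P=4.0, A=0.9, B=1.0, u=2.0,
                                  w_bar=0.0, Q=1.0)    # -> 11.0, 4.24
x_post, P_post, K = kalman_correct(x_prior, P_prior, y=12.5,
                                   H=1.0, v_bar=0.0, R=2.0)
```

Note that the a posteriori variance is smaller than both the forecast variance and the measurement variance; this routinely computed quantity is the basis for the confidence bands discussed earlier.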

2.3 Short-Term Rainfall Forecasting

Rainfall forecasts applicable to flood forecasting typically span a whole range of timescales, from seasonal outlooks to several days ahead, right down to forecasts for the next few minutes or hours. In order for the longer-range forecasts to be significant, the uncertainties associated with them need to be reduced by coarsening the spatial resolution at which the forecasts are made. This unfortunately has the effect of reducing their usefulness for flood forecasting, even for relatively large
Prediction equations:

States: $\hat{x}_{t+1/t} = A \hat{x}_{t/t} + B u_t + \bar{w}_t$

Error covariance: $P_{t+1/t} = A P_{t/t} A^{T} + Q$

When new measurements arrive, the correction equations are applied:

Kalman gain: $K_t = P_{t/t-1} H^{T} (H P_{t/t-1} H^{T} + R)^{-1}$

State update: $\hat{x}_{t/t} = \hat{x}_{t/t-1} + K_t (y_t - H \hat{x}_{t/t-1} - \bar{v}_{t-1})$

Error covariance update: $P_{t/t} = (I - K_t H) P_{t/t-1} (I - K_t H)^{T} + K_t R K_t^{T}$

Fig. 3 The Kalman filter cycle

catchments. For this reason, two statistically based spatial rainfall models were investigated as possible candidates to provide short-term forecasts of rainfall fields in this study: the "String of Beads" Model (SBM) of Pegram and Clothier (2001) and the "Spectral Prognosis" (S_PROG) model (Seed 2001). Numerical weather prediction is not able to provide sufficient quantitative detail at the appropriate spatial and temporal scales for the purposes of this urban flood forecasting system.

2.4 Forecasting with the "String of Beads" Model

Fig. 4 The SBM noise stack

A full description of the String of Beads Model in forecast mode has been published by Berenguer et al. (2011). A complete description of the SBM in
simulation mode is given by Clothier and Pegram (2002); a basic outline is given here. A typical simulation begins with the generation of a stack of random noise fields. The values of the noise at each pixel in the field are independent and follow the standard normal distribution (zero mean and unit variance). A new field is constructed (pixel by pixel) using an autoregressive lag-5, AR(5), model linking the new frame to the previous five, and placed at the first position on the stack, with the existing fields moving backwards one position and the final field falling away (Fig. 4). A mean field advection vector is used to establish the correct temporal alignment of the pixel values in the field. A warm-up period is required to ensure that the sequence of fields used in the generation is properly conditioned. Once the stack has been given a sufficient number of recursions to be correctly conditioned, the newly generated field is power-law filtered using the fast Fourier transform to ensure the spatial correlation has the desired shape; the conditioning image-scale statistics, "Wet Area Ratio" (WAR) and "Image Mean Flux" (IMF), are then imposed on the field by a thresholding and scaling process. The resulting field is exponentiated to produce a field of simulated rainfall rates. In forecast mode the process is more streamlined. The spatial correlation structure from the observed fields is retained, and the pixel-scale development is computed directly from the observed fields. The image-scale statistics are forecast using the same bivariate AR(5) process as in simulation mode, but this time conditioned on the mean values of WAR and IMF from the previous five observations. Instead of using Lagrangian advection, a sophisticated motion-tracking algorithm (Bab-Hadiashar et al. 1996; Seed 2001) was used to estimate the field advection. A dense grid of advection vectors is computed at each time step and used to advect the forecasts and maintain the appropriate temporal alignment between forecasts. The smoothed advection grid is updated at each time step as new information
becomes available. The computation of the advection vectors is efficient enough to allow it to be used for real-time applications. Optical flow algorithms that provide comparable (or better) accuracy are generally prohibitively inefficient to compute without specialized hardware components (Camus and Bulthoff 1995).

2.5 The S_PROG Model

The S_PROG model exploits the idea that rainfall fields exist as structures encompassing a range of spatial scales, with the persistence of a structure being proportional to its spatial scale, i.e., larger-scale structures have a longer persistence time than smaller-scale ones (Seed et al. 1999). New forecast rainfall fields are generated by first disaggregating the observed field into a multilayered hierarchy of fields, each of which represents features at a different scale. This disaggregation is achieved by applying a notch filter in the Fourier domain to separate the field into a number of spectral components. Each field in the hierarchy is modeled by a dynamically fitted AR(2) process at the relevant spatial scale, and the fields are combined to produce the forecasts. The forecast field is conditioned to have the same mean rainfall rate and wetted area as the latest observed image; this is in contrast to the SBM, where the evolution of these quantities is predicted using a bivariate AR(5) model.
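The cascade idea can be sketched as follows, under assumptions of ours: a band-pass decomposition in Fourier space standing in for the notch filtering, and one AR(2) recursion per level, with coefficients (placeholders, not fitted values) chosen so that larger scales persist longer.

```python
import numpy as np

def spectral_decompose(field, n_levels=3):
    """Split a field into cascade levels with annular band-pass filters in Fourier space."""
    n = field.shape[0]
    f = np.fft.fftfreq(n)
    k = np.sqrt(f[:, None] ** 2 + f[None, :] ** 2)
    F = np.fft.fft2(field)
    # logarithmically spaced wavenumber bands, large scales first
    edges = np.logspace(np.log10(k[k > 0].min()), np.log10(k.max() + 1e-9), n_levels + 1)
    return [np.real(np.fft.ifft2(F * ((k >= lo) & (k < hi))))
            for lo, hi in zip(edges[:-1], edges[1:])]

def ar2_step(x_t, x_tm1, phi1, phi2, noise_std, rng):
    """One AR(2) recursion for a single cascade level."""
    return phi1 * x_t + phi2 * x_tm1 + noise_std * rng.standard_normal(x_t.shape)

rng = np.random.default_rng(0)
obs = rng.standard_normal((64, 64))
levels = spectral_decompose(obs, n_levels=3)
prev = [lv.copy() for lv in levels]
phis = [(1.6, -0.64), (1.2, -0.40), (0.8, -0.20)]  # persistence decreasing with scale
for _ in range(6):  # six forecast steps
    for i, (p1, p2) in enumerate(phis):
        levels[i], prev[i] = ar2_step(levels[i], prev[i], p1, p2, 0.1, rng), levels[i]
forecast = sum(levels)  # recombine the cascade into a forecast field
```

Each AR(2) pair here is stationary (characteristic roots inside the unit circle), so every level decays toward zero at its own rate, mimicking the faster loss of predictability at small scales.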

2.6 Forecast Model Comparisons

Both qualitative and quantitative comparisons were made between the forecasts of SBM and S_PROG. Both models were configured to produce forecasts conditioned on an observed initialization period of a rainfall event measured by the Durban weather radar, and the resulting forecasts were compared to the observations. Figure 5 shows one of a set of four comparisons of forecasts made using the SBM and S_PROG models, together with the observed sequences from the equivalent times. The forecasts in Fig. 5 were made at time steps of approximately 5 min; thus, the forecasts extend up to half an hour ahead. The images in the figure represent observed and forecast reflectivity fields measured in dBZ. All fields are thresholded below a value of 10 dBZ because rainfall rates below this level are negligible (approximately 0.15 mm/h). To illustrate an ensemble of radar forecasts, Fig. 6 gives an example of fine-scale (1 km) radar image forecasting showing four members of an ensemble of forecasts using simulated sequences, comparing the expected, a possible growth, and a possible decay scenario with the actual sequence. The images are shown at 10-min intervals. The missing quadrant at lower right of each scan was blanked to remove persistent ground clutter. This image is taken from Clothier and Pegram (2002). The forecast fields in Fig. 5 were compared to the observed fields in terms of two measures. The first was the square root of the mean of the sum of squared errors over all pixels in the field (RMSSE), while the second was a comparison of the mean

Fig. 5 Comparative forecasts and observed reflectivity fields – top row Observed, second row SBM forecasts, third row S_PROG forecasts – 5 min intervals starting simultaneously


Fig. 6 An example of fine scale (1 km) radar image forecasting showing four members of an ensemble of forecasts using simulated sequences, comparing the expected, a possible growth and a possible decay scenario, together with the observed sequence

spatial difference error, which is closely related to the IMF (average spatial rain rate) in the rainfall rate domain. Figure 7 compares the magnitude of these two measures for the advancing forecast sequences produced by the models. S_PROG performs better than SBM if RMSSE at the primary pixel scale is the criterion used for the comparison. The RMSSE for S_PROG increases rapidly with forecast time, while the RMSSE for SBM remains fairly constant with time, despite starting at a higher initial value. It makes sense that S_PROG performs better (in this case), since the model is formulated to give an optimum forecast in the least squares sense, with the forecasts degrading towards a mean field as the forecast time increases, resulting in errors with a smaller magnitude. The SBM does not behave in this way, since the image-scale parameters (WAR and IMF) evolve according to the bivariate AR(5) model, so that changes in intensity and wetted area are evident in the forecasts. The RMSSE is very sensitive to the relative positioning of peaks in the field, because small displacements produce large squared errors. Squaring the error values
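Both measures are simple to compute on gridded fields; a minimal sketch (the function names are ours):

```python
import numpy as np

def rmsse(forecast, observed):
    """Square root of the mean of squared errors over all pixels in the field."""
    return float(np.sqrt(np.mean((forecast - observed) ** 2)))

def mean_error(forecast, observed):
    """Mean spatial difference error, closely related to the IMF in rain-rate units."""
    return float(np.mean(forecast - observed))

obs = np.array([[10.0, 20.0], [30.0, 40.0]])   # toy reflectivity fields (dBZ)
fcst = np.array([[12.0, 18.0], [33.0, 37.0]])
err_rms = rmsse(fcst, obs)        # sqrt of the mean of [4, 4, 9, 9] = sqrt(6.5)
err_mean = mean_error(fcst, obs)  # 0.0: the over- and under-forecasts cancel
```

The contrast between the two values on this toy pair already shows why the measures can disagree: sign errors cancel in the mean error but not in the RMSSE.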

Fig. 7 Root mean sum of squared forecast errors (RMSSE, in dBZ) and mean errors (in dBZ) for S_PROG and SBM at 5-min intervals, plotted against lead time (number of images)

Fig. 8 Sensitivity of RMSSE to the position of peaks; (a) and (b) are time series which are nearly the same in that the same amount of rain fell in both cases, but are staggered by one time step; (c) shows the differences, while (d) shows the squared differences

removes the effect of sign from the comparison. Figure 8 shows this effect graphically in one dimension. The RMSSE can be misleading in this context; because our goal was the right amount of rainfall integrated over subcatchments (not a match at the pixel scale), the mean errors were considered more appropriate for the model comparison. The final decision was to use SBM as the forecasting model in our application, instead of S_PROG, as it is better suited to event-scale simulation.
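The effect shown in Fig. 8 is easy to reproduce numerically: two series carrying the same total rain but staggered by one time step give a zero mean error yet a large RMSSE (the numbers are synthetic):

```python
import numpy as np

# Two series with identical totals, staggered by one time step (cf. Fig. 8)
a = np.array([0, 5, 0, 8, 0, 3, 0], dtype=float)
b = np.roll(a, 1)  # the same rain, shifted one step
assert a.sum() == b.sum()

rmsse_val = float(np.sqrt(np.mean((a - b) ** 2)))  # large, despite equal totals
mean_err = float(np.mean(a - b))                   # zero: the totals match
```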

2.7 Areal Rainfall Estimates

The first step is to accumulate the spatial rainfall fields over the required time period, as described in Pegram and Sinclair (2002). The radar rainfall fields are sampled instantaneously at discrete time steps, while the physical rainfall field evolves continuously with time. This evolution encompasses the growth and decay of the field’s structures as well as complex field advection processes. If the sampling interval is not fine enough to capture the dynamic changes in the field, then simple linear accumulation techniques are inadequate and more sophisticated accumulation schemes are required. Figure 9 shows a 6 h accumulation computed using a linear accumulation technique (where the time between radar scans is 15 min), and Fig. 10 the same accumulation period using the morphing approach described by Hannesen (2002). The second approach clearly produces an accumulated field closer to what we would expect from a rainstorm moving through the radar’s field of view.
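The difference between the two schemes can be illustrated with a crude one-dimensional stand-in. All names and numbers below are ours, and the real morphing scheme operates on 2-D fields along cross-correlation-tracked motion vectors rather than a single fixed shift:

```python
import numpy as np

def linear_accumulation(scans, dt_h):
    """Accumulate instantaneous rain-rate scans (mm/h) by simple summation."""
    return sum(scans) * dt_h

def morphed_accumulation(scans, dt_h, shift_per_step, substeps=10):
    """Advection-aware accumulation: each pair of consecutive scans is
    interpolated along the motion vector before summing. A crude 1-D
    stand-in for the 2-D cross-correlation tracking and morphing scheme."""
    total = np.zeros_like(scans[0])
    for a, b in zip(scans[:-1], scans[1:]):
        for s in range(substeps):
            w = (s + 0.5) / substeps  # fractional time between the two scans
            fa = np.roll(a, int(round(w * shift_per_step)))         # a advected forward
            fb = np.roll(b, -int(round((1 - w) * shift_per_step)))  # b advected backward
            total += ((1 - w) * fa + w * fb) * (dt_h / substeps)
    return total

# A narrow storm advecting 4 cells per 15-min scan interval
storm = np.zeros(40)
storm[5:8] = 20.0  # mm/h
scans = [np.roll(storm, 4 * i) for i in range(6)]
lin = linear_accumulation(scans[:-1], 0.25)                # spotty: rain only at scan positions
mor = morphed_accumulation(scans, 0.25, shift_per_step=4)  # fills the gaps between scans
```

The linear accumulation leaves dry gaps between the scan positions, while the morphed accumulation lays down a continuous swath, as in Figs. 9 and 10.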

Fig. 9 A crude superimposed accumulation technique (After Hannesen 2002)


Fig. 10 Cross-correlation tracking and morphing accumulation (After Hannesen 2002)

2.8 Dissemination of Outputs to Disaster Managers

The dissemination of outputs refers to the kinds of information presented to the end user of the flood forecasting system, in this case the disaster managers (DMs). The products were agreed upon after consultation with the members of the disaster management team at Durban Metro and Umgeni Water. The consultation process yielded two major requirements from the DMs’ point of view: some warning should be provided ahead of a likely flood, and the affected areas should be indicated (preferably in a graphical format). To meet these requirements, ArcView GIS was chosen as a display tool, since the DMs already made use of this system. Images of current and historical rainfall were available for display in the disaster management control room, and dynamically selected flood lines could be called up in response to current observed and forecast streamflows.

2.8.1 Integration of the Radar Rainfall Images into a GIS
When radar rainfall images are presented in a spatial context, they can provide a visual advanced warning of heavy rainfall approaching sensitive (flood-prone) catchments. Figure 11 shows an example of an instantaneous radar image of a


Fig. 11 A major storm over the upper Mlazi catchment (17 December 2002 at 14:58)

major storm event that moved over the Mlazi catchment and the lower reaches of the Mgeni catchment. The images were available in an ArcView compatible format at the Durban Municipality’s Disaster Management Centre in near real-time.

2.8.2 Indicating the Flood Affected Areas in a GIS
The disaster managers (DMs) expressed a need to determine which areas they might expect to be affected by floodwaters. It is all very well having a sequence of observed streamflows and possible forecasts of future flows expressed in m3/s, but these numbers usually have no meaning to the DMs. They wanted images interpreting flood levels and depths of inundation, showing which areas are to be affected. The chosen process for producing floodlines was as follows: a hydraulic model of the river channel and adjacent flood plains was produced, using a Digital Elevation Model (DEM) in combination with field surveys. This is a once-off procedure, since the same terrain model may be used to hydraulically model the effects of many different flow rates. The hydraulic model used in this study was the well-known HEC-RAS model. HEC-RAS routes the flood wave down the modeled channel and computes flood levels at each model cross-section. A typical output from HEC-RAS (steady state) is shown in Fig. 12, modeling the river near its mouth to the sea. HEC-RAS can also output the flood levels as a set of points in a three-dimensional coordinate system as a text file. In order to interpret these points as flood lines and inundation levels, an interpolation between channel cross-sections is required, preferably in conjunction with information from a DEM. A separate floodline must be produced for each flow


Fig. 12 HEC-RAS output for the lower reaches of the Umgeni River near its mouth, Northeast of the City of Durban, the turquoise surface being the modeled 50-year flood

rate considered. A typical one, for the 50-year flood in the Umgeni River 21 km from the river mouth, is shown in plan in Fig. 13. For the purposes of this pilot study, a simple solution was proposed. Floodlines at various recurrence intervals are available in the ArcView shapefile format for the Umgeni River from previous studies commissioned by the Durban municipality. These served as the starting point, and an ArcView script was written which selects the appropriate floodlines to display based on the current observed streamflow and the forecast flow. Images such as Fig. 11, tracking storms, and Fig. 13, showing computed levels, were then mounted in the city’s disaster management center and updated automatically as recent rainfall-runoff data were collected and processed. Figure 14 shows an ensemble of spatially averaged rainfall forecasts over the Mlazi catchment indicated in Fig. 11, modeled using the String of Beads Model. For this application, ensembles of future possible riverflows were not computed; however, the methodology is an extension of the above, and the reader is referred to Sect. 4 for a description of the process.

2.9 Communicating Uncertainty to the End Users

The major aims in communicating flood forecast information to the end users are clarity and simplicity. People in charge of decisions during flood events are not necessarily technical people and are working under stress; therefore, efforts have to be made to convert probabilities into meaningful decision tables or support matrices.



Fig. 13 A 2 % Annual exceedance probability (50 year) flood line (Umgeni catchment downstream of Inanda Dam) also using HEC-RAS, upstream of the river in Fig. 12

Fig. 14 An ensemble of forecast rainfall (average catchment rainfall in mm/h, at lead times from 3 to 12 h) over the Umlazi catchment created using the String of Beads Model. Three members are colored to distinguish their individual paths from the others

An example of one is offered in Table 1. The proposed methodology requires the preliminary definition, during a planning phase, of a number of alert levels (blue, yellow, and red) corresponding to bankfull (blue), inundation of property (yellow), and potential loss of life (red). The decision to take (or not to take) action is derived as a function of (1) the alert level, (2) the


Table 1 An example of a flood decision support matrix (A = take action, NA = no action); rows give the likelihood of the event, columns the reliability of the forecast

Blue alert level
Likelihood   Very low  Low  Medium  High  Very high
Very low     NA        NA   NA      NA    NA
Low          NA        NA   NA      NA    NA
Medium       NA        NA   NA      NA    A
High         NA        NA   A       A     A
Very high    NA        NA   A       A     A

Yellow alert level
Likelihood   Very low  Low  Medium  High  Very high
Very low     NA        NA   NA      NA    NA
Low          NA        NA   NA      NA    NA
Medium       NA        NA   A       A     A
High         NA        A    A       A     A
Very high    NA        A    A       A     A

Red alert level
Likelihood   Very low  Low  Medium  High  Very high
Very low     NA        NA   NA      NA    NA
Low          NA        NA   NA      NA    NA
Medium       NA        A    A       A     A
High         A         A    A       A     A
Very high    A         A    A       A     A

likelihood of the event, and (3) the reliability of the forecast. The likelihood of the event is defined as a function of the probability of exceedance (Very Low, Low, Medium, High, Very High) of each specific alert level, while the reliability (Very Low, Low, Medium, High, Very High) of the forecast is defined in terms of its coefficient of variation (CV). The probability of exceedance is computed as the probability of a future value conditional on all the prior knowledge one can establish, including the actual forecast of the flood forecasting model. The rationale for this approach is that, on the one hand, it becomes critical to take the right decision when the Red level is reached and, on the other hand, more weight can be given to the forecast when its uncertainty is low.
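The three-way dependence on alert level, likelihood, and reliability can be encoded directly. The sketch below hardcodes the action matrix as we read it from Table 1 (A = action, NA = no action); the function name and the encoding are illustrative only:

```python
LEVELS = ["very low", "low", "medium", "high", "very high"]

# One row per likelihood level (low to high), one column per reliability
# level (low to high), as read from Table 1
MATRIX = {
    "blue": ["NA NA NA NA NA", "NA NA NA NA NA", "NA NA NA NA A",
             "NA NA A A A", "NA NA A A A"],
    "yellow": ["NA NA NA NA NA", "NA NA NA NA NA", "NA NA A A A",
               "NA A A A A", "NA A A A A"],
    "red": ["NA NA NA NA NA", "NA NA NA NA NA", "NA A A A A",
            "A A A A A", "A A A A A"],
}

def take_action(alert, likelihood, reliability):
    """True if the support matrix prescribes action for this combination."""
    row = MATRIX[alert][LEVELS.index(likelihood)].split()
    return row[LEVELS.index(reliability)] == "A"
```

Action is triggered at progressively lower likelihood and reliability thresholds as the alert level escalates: `take_action("red", "high", "very low")` is True, while the same combination under a blue alert is not.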

2.10 Conclusion on Nowcasting Systems Based on Radar Ensembles

This section set out to describe the main factors to keep in mind when designing a flood forecasting system for small to medium-sized catchments. For this purpose, weather radar is an indispensable tool because the images are available in near real time. In addition, models like the String of Beads can provide ensemble forecasts of rainfall in near real time, starting within 5–10 min of the assimilation of new observations into the system. This is in contrast to ensembles available from Numerical Weather Prediction models, which have delays of the order of hours. With all the


sophistication of the process, it needs to be seamless and invisible to the disaster manager, who is interested in the forecast outcome, not the mathematics. The key words are speed and simplicity, for the amelioration of the fearful damage and loss of life caused by heavy rainfall.

3 Increasing Warning Time for Flash Floods with High-Resolution Ensemble Prediction Systems

Flash floods can be generated by a variety of different processes, or combinations of processes, that are often influenced by very small-scale and local features. For example, flash floods can be the result of short but intense precipitation, but can equally be triggered by moderate rainfall combined with snow melt. Rainfall can also be suddenly enhanced by local features related to orography, e.g., convection over sun-facing slopes (e.g., Thielen and Gadian 1996) or urban heat islands (e.g., Bornstein and Lin 2000; Kusaka et al. 2014; Thielen et al. 2000; Thielen and Gadian 1997), which can lead to extreme rainfall rates or stationary regeneration of storm cells. In principle, the small scales of most of the meteorological processes involved in flash flood triggering can be resolved to some extent by high-resolution numerical models (Younis et al. 2008; Alfieri 2011). With steadily increasing computing power, there is also an increasing number of operational numerical weather prediction models with grid spacings well below 10 km, which thus approach flash-flood-relevant scales (e.g., Stephan et al. 2008; Rotach et al. 2009). However, even with such high-resolution operational numerical weather prediction models, uncertainty about the exact intensity, timing, and location of precipitation remains high. More importantly, even if meteorologists were able to forecast rainfall accurately, predicting flash floods would remain difficult because the efficiency of the runoff processes in the catchments plays an important role as well: moderate precipitation on totally saturated soils can generate a flash flood just as heavy precipitation on unsaturated soils can. Therefore, in order to forecast flash floods, both the meteorological rainfall and the runoff processes in the catchments need to be well described (UCAR 2010).
Because rainfall cannot yet be predicted with sufficient skill at the local scale, flash flood predictions are often based on observations from in situ networks, radars, lidars, or other remote sensing techniques, which have the drawback that the lead times are very short, as illustrated in the previous section. Efforts to extend the lead time are made by blending remote sensing fields with very short-term forecasts, which extends the predictability from 1–3 h up to 6 h (e.g., Germann et al. 2009; Liguori et al. 2012; Leijnse et al. 2007; Kober et al. 2012). Such systems provide high-resolution and high-quality rainfall fields that can be applied in hydrological models specifically developed for certain catchments. The uncertainty in the estimation and prediction of the rainfall fields is increasingly taken into account (Villarini et al. 2010). However, while flash floods are local phenomena, they can occur almost anywhere, and a shift in rainfall of only a few hundred meters can lead to flash floods in a


neighboring catchment where a comprehensive data collection system, rainfall-runoff model, or flash flood guidance system may not have been set up. If the other catchment belongs to another authority or even country, possible prewarning information may be entirely lost because information is not shared and bilateral communication protocols may not have been established. This situation could be improved with a regional flash flood early warning platform which integrates products based on numerical weather prediction, available over large spatial areas, with specific local information, e.g., from radar and in situ measurements. In this section, the requirements for such a seamless regional flash flood forecasting system are laid out using the example of the Mediterranean, a region particularly prone to flash floods.

3.1 Current Flash Flood Warning Systems in Europe

Few operational early warning systems for flash floods are documented in Europe. Table 2 summarizes their locations and geographical extents, together with some general properties such as spatial resolution and forecasting methodology. Most likely other systems exist, but the lack of publications or information available online or from other sources makes it difficult to establish an exhaustive state of the art. The methodologies used vary considerably from one early warning system (EWS) to another. Some are based only on rainfall analysis and do not perform any hydrological simulation. They can be used for greater lead times, but their accuracy is often limited by the neglect of hydrological processes, as for the RISKMED warning system (Savvidou et al. 2009). The Extreme Rainfall Alert (ERA) developed in the UK is based only on the exceedance of rainfall thresholds for events of given return periods and for storm durations from 1 to 6 h. It is thus mostly designed for urban and flash floods. It gives a probabilistic assessment of the risk by taking into account the positional uncertainty of the quantitative precipitation estimate from the high-resolution UK Met Office 4 km model. The first evaluation of the warning system showed that ERA is better at timing possible rainfall threshold exceedances than at effectively forecasting flood occurrence (Hurford 2012). The second type of system uses hydrological models to assess the flash flood threat, sometimes at very high resolution (Verzasca EWS: Liechti et al. 2012), but the heavy computation time required makes them unsuitable for forecasting more than a few hours in advance or for spatial domains larger than a few hundred kilometers. A third approach is adopted by the FLASH system, which is completely different from the others as it assumes a direct link between rainfall intensity and the electrical activity of the associated thunderstorms (Llasat et al. 2010).
For this warning system, the lead time of prediction is again very short, as it relies on the triggering and observation of thunderstorms. The rainfall data chosen as input also change from one system to another. Most of the nowcasting systems take radar rainfall images as meteorological information (Verzasca EWS, AIGA), while the others use quantitative precipitation forecasts (QPF) from numerical weather prediction (RISKMED, EPIC). The PFFGS by


Table 2 Operational flash flood early warning systems in Europe

Name          Domain                                              Resolution                                          Lead-time    Methodology
Verzasca EWS  Verzasca watershed, Switzerland                     500 m                                               Nowcasting   Hydrological simulations
GFWS          Guadalhorce watershed, Spain                        1 km                                                Nowcasting   Meteorological and hydrological warnings
AIGA          South East of France                                1 km                                                Nowcasting   Meteorological and hydrological warnings
EHIMI         Catalonia                                           1 km                                                Nowcasting   Meteorological and hydrological warnings
ERA           UK                                                  4 km                                                120 h        Meteorological warnings
FloodPROOFS   Vall d’Aosta (Italy)                                –                                                   60 h         Hydrological warning
FLASH         Whole Mediterranean                                 ≈6 km (accuracy of the lightning detection system)  3 h          Lightning based
PFFGS         Catalonia                                           1 km                                                6 h          Meteorological and hydrological warnings
PIEDMONT EWS  Piedmonte region, Italy                             Depends on lead time: from 1 km with radar images   48 h         Hydrological simulations
                                                                  to 7 km with QPF
RISKMED       South of Italy (except Sicily), Western Greece,     7 km                                                72 h         Meteorological warnings
              Malta, Cyprus
EPIC          European part of the Mediterranean                  1 km (7 km for the input rainfall data)             132 h        Meteorological warnings

using blending methods between the two types of inputs develops an innovative technique (Atencia et al. 2010). Ensemble prediction techniques have been explored for hydrological forecasting for about a decade, in projects such as the European Flood Forecasting System (FP5 research project), through initiatives such as the Hydrologic Ensemble Prediction EXperiment (HEPEX), the COST 731 project (Rossa et al. 2010; Zappa et al. 2010), and many others (Cloke and Pappenberger 2009). Among the systems listed above, four already explore the technique for flash floods. The Verzasca EWS


uses a new technique called NORA, an acronym for Nowcasting of Orographic Rainfall by means of Analogues, which builds 25 possible rainfall forecasts up to 1 h in advance from the radar data archive and similar past meteorological situations (Panziera et al. 2011). The GFWS creates 16 possible rainfall inputs, up to 2 h ahead, from a radar rainfall ensemble developed during the IMPRINTS European project, at 1 km spatial and 10 min temporal resolution. The Flood-PROOFS forecasting system transforms a low-resolution deterministic forecast into a probabilistic high-resolution one using downscaling methods.

3.2 Using Limited Area Ensemble Prediction Systems for Extending the Predictability of Flash Floods

Currently, a quasi-European flash flood indicator named EPIC, an acronym for European Precipitation Index based on Climatology, is included in the European Flood Awareness System (EFAS; see chapter ▶ Flash Flood Forecasting Based on Rainfall Thresholds and Medium Range Flood Forecasting example EFAS in this handbook). EPIC is designed to detect catchments with a high probability of heavy precipitation leading to flash flooding within the upcoming 5 days (Alfieri et al. 2011, 2012). One drawback of EPIC is that it is a purely rainfall-driven indicator and thus not capable of capturing flash flooding generated by other processes, such as snow melt or the combination of saturated soils with moderate rainfall. Many studies have shown that certain geomorphologic features of the basin, as well as initial soil moisture, should not be neglected when assessing flash flood potential (Penna 2011). EPIC has thus been modified to take into account catchment properties and soil moisture, in order to be more accurate in identifying areas where flash floods can occur in the next 5 days. The new indicator, named ERIC for European Runoff Index based on Climatology, takes advantage of EPIC's simple methods and introduces a dynamic runoff coefficient to account for the hydrological contribution to flash flood triggering that is missing in EPIC. Equation 3 shows how the upstream runoff is computed in ERIC: at each cell of the 1 km river network, the rainfall falling on the whole upstream area is summed over durations of 6, 12, and 24 h, after weighting each contribution with a dynamic runoff coefficient:

UR_{d_k}(t) = \frac{1}{N} \sum_{i=1}^{N} C_{f,i}(t) \, P_{d_k,i}(t)    (3)

where d_k ∈ {6, 12, 24} h is the duration chosen for the rainfall accumulation, N is the number of grid cells in the upstream area, P_{d_k,i} is the rainfall accumulation within d_k over grid cell i, and C_{f,i} is the runoff coefficient.

24

G. Pegram et al.

This coefficient is derived from rainfall–runoff relationships for different initial soil moisture conditions, determined using the distributed hydrological model LISFLOOD, which runs operationally over Europe as part of EFAS. This model also provides, twice a day, the initial soil moisture information necessary to compute the simulated upstream runoff. The novel indicator, ERIC, is then computed similarly to EPIC by comparing the current upstream runoff to the mean annual maxima determined from the COSMO-LEPS and LISFLOOD climatologies (Eq. 4):

ERIC = \max_{d_k} \left( \frac{UR_{d_k}}{\frac{1}{M} \sum_{j=1}^{M} \max(UR_{d_k})_j} \right)    (4)

where M = 19 is the number of years of the climatology and \max(UR_{d_k})_j is the maximum value of UR_{d_k} during year j.

Results from a 1-year analysis showed that ERIC correlates well with reported flash flood events after selecting a suitable threshold for return periods of upstream runoff, proving that merging meteorological EPS with hydrological features using simple methods helps provide more accurate flash flood forecasts over large domains (Raynaud et al. 2014). The threat score, defined as the number of hits divided by the sum of hits, false alarms, and missed events, improved from 0.34 for EPIC to 0.5 for ERIC. Moreover, many flash flood events were detected by EPIC only from a few hours to 1 day in advance, whereas ERIC was able to locate the flood threat over several consecutive forecasts, increasing the confidence in the warning that would have been issued. One example is the torrential rain that hit Halkidiki in Greece on February 10, 2010, and triggered rapid flooding and mudslides. EPIC detected the event a day in advance, but ERIC already showed significant return periods of forecast upstream runoff over the region for lead times exceeding 3 days. Figure 15 presents ERIC's forecast 2 days before the event. The reporting points correspond to river sections where the 20-year return period of upstream runoff should be exceeded. For this event, COSMO-LEPS had difficulties catching the intensity and location of the event with sufficient lead time, but the wet soil conditions at the time explain why ERIC successfully detected a potential flood.
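Equations 3 and 4 reduce to a few lines of array arithmetic. The sketch below uses synthetic numbers — all sizes, coefficients, and climatology values are placeholders, not EFAS data — and includes the threat score used in the verification:

```python
import numpy as np

def upstream_runoff(rain_acc, runoff_coeff):
    """Eq. 3: mean of runoff-weighted rainfall accumulations over the N upstream cells."""
    return float(np.mean(runoff_coeff * rain_acc))

def eric_index(rain_by_dk, coeff_by_dk, annual_maxima_by_dk):
    """Eq. 4: current upstream runoff over the mean annual maximum of the
    climatology, maximized over the accumulation durations d_k."""
    return max(
        upstream_runoff(rain_by_dk[dk], coeff_by_dk[dk]) / np.mean(annual_maxima_by_dk[dk])
        for dk in rain_by_dk
    )

def threat_score(hits, false_alarms, misses):
    """Hits divided by the sum of hits, false alarms, and missed events."""
    return hits / (hits + false_alarms + misses)

rng = np.random.default_rng(1)
durations = (6, 12, 24)
rain = {dk: rng.uniform(0, dk, size=100) for dk in durations}      # mm per upstream cell
coeff = {dk: rng.uniform(0.2, 0.8, size=100) for dk in durations}  # dynamic Cf values
clim = {dk: rng.uniform(5, 15, size=19) for dk in durations}       # 19-year annual maxima
index = eric_index(rain, coeff, clim)  # values near or above 1 flag unusual runoff
```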

3.3 Conclusions for LEPS-Based Early Warning Systems for Flash Floods

Short-term EPS have been demonstrated to be useful for the calculation of flash flood indicators over a wide range of spatial scales, from local to quasi-continental. Although the temporal and spatial uncertainties are large, useful indications of the possibility of flash floods can be achieved, and awareness of potentially threatening situations in a certain area can be raised within the responsible authorities, with lead times of 24–72 h.


Fig. 15 Two-day lead time flash flood forecast of the ERIC indicator for the flash flood event of February 10, 2010. The reporting points represent river sections where the 20-year return period of upstream runoff is exceeded by at least 35 % of COSMO-LEPS members

Ideally, the early warning information is complemented with higher-resolution systems based on nowcasting, blending, remote sensing, or in situ observations, like the one described in the first part of this chapter. As the forecast events draw nearer, such systems allow quantitative estimates of the severity, duration, and location of the event and provide guidance for the decision makers. The building of such a “staggered warning information flow”, with systems providing information of different accuracies for different levels of preparedness action, is very well accepted in the response community (IFRCRCS 2009).

4 Medium-Range Ensemble Prediction Requirements for a Multinational and Well-Controlled River

Flood damage can generally be limited by taking timely precautionary measures. However, this depends on the availability of reliable information about the course of the flood with sufficient lead time. While the mathematical models currently


available enable a good understanding of the runoff in rivers, operational hydrological forecasts of future events remain by definition uncertain. Although knowledge about the hydrological behavior of river basins is growing, and model input data are abundantly available, several significant sources of uncertainty remain. The initial conditions of the catchment at the start of the flood period are not known precisely, nor is it precisely known how much rain or snow will fall, when, and at which location. Furthermore, even the sophisticated operational models used today are still a simplification of reality and are unable to precisely forecast all relevant hydrological processes in the catchment. In this section, the methods of including ensemble prediction system data for real-time estimation of hydrological uncertainty in the river Rhine forecasting system are described. First, the Rhine basin, a catchment spanning several national borders, is characterized. Then, several forecasting systems for the Rhine and the experiences with ensemble prediction flood forecasting are described. Finally, the challenges and developments in probabilistic forecasting towards actual predictive uncertainty in a connected multinational forecasting environment are discussed.

4.1 The International Rhine Basin

With a length of 1,239 km and a catchment area of 185,000 km2 (CHR 2010), the Rhine is among the larger rivers in Europe. On a global scale, however, the Rhine is not even among the 100 largest rivers of the world and is no more than a medium-sized river basin. Nevertheless, the Rhine is well known all over the world. This is partly due to the variety of attractive scenery in the river basin; for instance, the waterfall of Schaffhausen in Switzerland and the Loreley in Germany are famous tourist attractions. More importantly, the Rhine is one of the busiest shipping routes in the world, with a navigable length of more than 800 km. Approximately 60 million people live in the Rhine basin, which is shared by nine European states (Kalweit et al. 1993). Larger areas belong to Germany, France, The Netherlands, and Switzerland; medium-sized and smaller areas belong to Austria, Luxemburg, Italy, Liechtenstein, and Belgium (Fig. 16). The Rhine rises in the Swiss Alps at a height of 2,345 m. The river flows through the Alpine foreland and the German uplands, as well as the German and Dutch lowlands. The Rhine is the only river connecting the Alps to the North Sea and the Atlantic Ocean. The Alpine Rhine and the major tributaries rising from the high mountains show a rather uneven discharge regime, with large variations between low and high discharges. The discharge regime of the Rhine downstream from the Alpine region is steadier, partly due to the hydrological conditions, but also as a result of human interference. In the Alpine region, the ratio between the highest and the lowest discharge is about 1:70, whereas in the downstream part of the basin this ratio drops to 1:20 or even less.

Present and Future Requirements for Using and Communicating Hydro. . .

Fig. 16 The Rhine basin (map showing state boundaries, the Rhine, and its tributaries; source: CHR/KHR)

The Alpine part of the basin contributes a varying amount to the discharge at the outlet, depending on the time of year. On average, almost half of the discharge originates from the Alps. In summer this contribution may rise to over 70 %; in winter it drops to less than 30 %, because winter precipitation in the Alps is mainly stored as snow.

G. Pegram et al.

In comparison to the average discharge, the fluctuation in the discharge downstream of Lake Constance is moderate. In the Alpine region, low flows occur in winter and floods in summer; from Lake Constance to the Rhine Delta, this seasonal pattern of high and low discharges reverses. Traveling downstream, the influence of the German upland tributaries becomes more important, leading to higher winter discharges and lower summer discharges.

4.2

Operational Forecasting in the Rhine Basin

Operational flood forecasting is an essential part of flood protection, and information on expected water levels is an important basis for flood management. In Switzerland, the Federal Office for the Environment is responsible for hydrological forecasting for the major rivers and lakes in the Swiss Rhine catchment. Every day, at least one set of forecasts for all important gauges in the catchment is published. These forecasts are based on different numerical weather prediction (NWP) models in combination with different hydrological models. Currently, COSMO-2, COSMO-7, and ECMWF medium-range forecasts provided by MeteoSwiss are implemented in the Swiss forecasting system. New model results are available between two and eight times per day, depending on the lead time (from 33 h to 10 days) and spatial grid resolution (between 2 km and 10 km). In critical periods, hydrological forecasts are calculated and published as often as necessary and are supplemented by written warning reports. Challenges include not only the correct measurement, interpolation, and distinction of regional rainfall and snow accumulation in the alpine topography, but also the correct calculation of snow and glacier melt, as well as the consideration of anthropogenic influences such as large dams and hydropower stations or the runoff regulation of lakes and reservoirs. Following two major floods in 2005 and 2007, which caused major damage in large parts of Switzerland, the need became clear for more physically based models with a high spatial resolution to improve forecast quality, especially in small and medium-sized alpine catchments. Several new models have since been set up, calibrated, and implemented in the Swiss forecasting system for all Swiss catchments. The use of different hydrological models enables the forecaster to evaluate model uncertainty in operational real-time forecasts.

In Germany, the 16 Federal States are responsible for the flood forecasting service within their own territory.
In the German Rhine basin, competence is shared between six Federal States. Some of them run their own flood forecasting centers; others share the responsibility. Hence, there are several forecasting centers in the German Rhine basin, with a variety of systems, models, and publishing times. Most of them use forecasting systems that combine hydrological and hydraulic models driven by different NWP models provided by the German Weather Service. Nationally and internationally, the centers are well connected: the upstream centers supply forecasts to the downstream ones. Although until now only


a few centers use EPS-based systems, the discussion about how the upstream-downstream chain will work for EPS has already started. In the Netherlands, flood forecasting for the river Rhine is within the jurisdiction of the Dutch Water Management Centre, which is part of the Ministry of Infrastructure and the Environment. Under normal conditions, the Centre publishes daily water-level forecasts for the gauging station Lobith on the German-Dutch border, mainly focusing on navigation on the Rhine. In flood periods, the frequency of forecasts increases to a maximum of four times a day. Flood forecasts for the Rhine are published when the water level at the Lobith gauge rises to 14 m above mean sea level and a further rise to at least 15 m is expected, a situation that occurs twice a year on average. Until January 1999, a statistical model based on multiple linear regression was used for daily forecasts as well as for flood forecasting. After the large floods in 1993 and 1995, it became clear that a new, more physically based modeling approach was necessary, which led to the development of the current forecasting system RWSoS Rivers. The system is a combination of hydrological (HBV) and hydraulic (Sobek) models, developed for short-range as well as medium-range forecasts (4–14 days). It uses multiple weather forecasts, both deterministic and probabilistic.
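The Lobith publication rule above is a simple threshold condition. As a minimal sketch (the function and parameter names are illustrative and not part of the operational system):

```python
def publish_flood_forecast(current_level_m: float, expected_max_level_m: float) -> bool:
    """Decide whether a Rhine flood forecast is published, per the rule above:
    the water level at Lobith has reached 14 m above mean sea level AND a
    further rise to at least 15 m is expected. Names are illustrative only."""
    return current_level_m >= 14.0 and expected_max_level_m >= 15.0
```

As noted above, this condition is met about twice a year on average.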

4.3

Experiences in Probabilistic Forecasting

In Switzerland, the combination of hydrological forecasts and probabilistic meteorological ensemble forecasts was extensively tested and evaluated within the research project MAP D-PHASE (Arpagaus et al. 2009). Promising results and positive end-user feedback led to the decision to include this approach in the daily operational runoff forecasts as well. The probabilistic forecast is currently based on COSMO-LEPS (Marsigli et al. 2005), a limited-area ensemble developed and run by the COSMO consortium, with 16 members, a spatial resolution of 7 km, and a lead time of up to 132 h. The operating hydrologist usually discusses the spread and reliability of the current probabilistic NWP with a meteorological forecaster, as is the case with purely deterministic forecasts. In many cases, and especially in critical flood situations, there is a large variation between the results of the different deterministic NWP models. The spread of the probabilistic forecasts gives a good impression of the current predictability of the weather development in the next 3–4 days and of the variability of the meteorological input to be expected (Fig. 17). Nevertheless, there are many situations in which even a wide spread of the runoff calculated by the different ensemble members cannot guarantee that the actual runoff lies within the limits of the calculated forecasts. Another problem for flood forecasting is the long calculation time needed by the ensembles and the rather slow updating cycle. The initialization of the COSMO-LEPS run is more than 11 h old by the time it is sent to

Fig. 17 Runoff forecast for the gauge Andelfingen, Thur, based on the COSMO-LEPS forecast (discharge [Abfluss] in m3/s, 29.5.2013–2.6.2013; shown are the measured values, the ensemble median, the 25/75 and 33/66 percentile bands, the min/max range, and the HQ2, HQ10, HQ30, and HQ100 flood thresholds)

the forecasting center. As the current update cycle is 12 h, only two probabilistic forecasts per day are available. Nonetheless, including the probabilistic NWP in the hydrological forecast has considerably facilitated the process of understanding and evaluating the uncertainty of a forecast situation. All forecasts are published on the Internet, including those based on probabilistic NWP. Even though the interpretation of the results requires some training and knowledge, the reactions of the end users are generally very positive.

In Germany and the Netherlands, probabilistic water-level and discharge forecasts for the River Rhine have been calculated for more than 10 years (Renner et al. 2009). In the Netherlands, the probabilistic results are distributed on a daily basis to professional users only. Currently, different meteorological forecasting products are used to estimate the uncertainty in water-level prediction caused by the numerical weather prediction. These include both deterministic forecasts and ensemble forecasts at different temporal and spatial resolutions. To determine the bandwidth of the meteorological uncertainty, the ECMWF-EPS and COSMO-LEPS ensembles are used. The ECMWF Ensemble Prediction System produces 15-day probabilistic forecasts daily at 00 and 12 UTC (Buizza et al. 1999, 2006, 2007). Since 2010, the EPS probabilistic forecast has been based on 51 integrations with approximately 32-km resolution up to forecast day 10 and 65-km resolution thereafter. An impression of the resolution of the ensemble can be seen in Fig. 18. Both meteorological ensemble products are run through the combination of hydrological HBV and hydraulic Sobek models. The model output is error-corrected by statistical postprocessing, which leads to an ensemble water-level prediction.


Fig. 18 Precipitation forecast from ECMWF ensemble prediction system

As a result, all 51 ensemble member forecasts are shown. A derived product is shown in Fig. 19; this figure displays the 25th and 75th percentile forecasts and the median.
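Such percentile products are obtained by reducing the member matrix per lead time. A minimal NumPy sketch with synthetic data (the numbers are illustrative, not ECMWF output):

```python
import numpy as np

def percentile_bands(members: np.ndarray, probs=(25, 50, 75)) -> dict:
    """Reduce an (n_members, n_leadtimes) ensemble to percentile traces,
    one value per lead time, e.g., the 25th percentile, median, and 75th."""
    return {p: np.percentile(members, p, axis=0) for p in probs}

# Synthetic 51-member, 10-day water-level ensemble for illustration
rng = np.random.default_rng(0)
ens = 12.0 + rng.normal(0.0, 0.5, size=(51, 10)).cumsum(axis=1)
bands = percentile_bands(ens)
```

The same reduction applies to discharge ensembles; only the input matrix changes.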

4.4

Challenges and Developments in Probabilistic Forecasting: Towards Actual Predictive Uncertainty

Probabilistic forecasts do provide added value in comparison to deterministic ones, especially for navigation-related water-level forecasts (see Chapter ▶ Probabilistic Shipping Forecast [Meißner and Klein] within this handbook). But they must be improved to guarantee reliable flood forecasts; a more rapid updating cycle and faster availability are essential to achieve this. A reliable probabilistic forecast will become even more important in the future, especially due to the higher spatial resolution of future NWP generations. New generations of NWP ensemble models should improve the quality of the results significantly. Experiences with ensemble water-level prediction based only on meteorological uncertainty in the Netherlands showed the need to include other sources of uncertainty. Figure 20 shows an ensemble water-level forecast during a medium-sized flood in January 2011; the orange lines are the forecasted water levels based on ECMWF weather prediction, and the blue line is the observed water level. A hindcast of the event has shown that in this specific case the main sources of uncertainty were the initial conditions and the snow melt.

Fig. 19 Water-level forecast for the Rhine at Lobith based on ECMWF ensemble weather prediction during a medium-sized flood in January 2011 (water level in m, 07-01-2011 to 19-01-2011; national and regional alarm levels, the Rhine warning level, and the level at which the lower riverbed floods are indicated). The orange lines are the 50 ensemble forecasts, the blue line the observed water level

Fig. 20 COSMO-LEPS ensemble discharge forecast, dressed with a hydrological uncertainty postprocessor: an example for the River Meuse in the Netherlands (discharge in m3/s; national and regional alarm levels and the Meuse warning level are indicated)


Rainfall input is the largest obstacle to reliable flood forecasts and is responsible for most of the uncertainty in such forecasts. The inclusion of other sources of uncertainty, such as modeled snow melt, initial conditions (e.g., soil properties), and various sources of uncertainty in the hydrological models, can help to create a more realistic spread. Stable, robust, and easily interpreted approaches are essential to evaluate ensemble forecasts accurately and to improve their actual usability for end users.

Different approaches have been used to integrate these sources of uncertainty. In recent years, Bayesian Model Averaging (BMA) (Liang et al. 2013) and Quantile Regression (QR) (Weerts et al. 2011; Lopez et al. 2014) have been tested in the Netherlands. BMA corrects the bias and the uncertainty of an ensemble forecast using a training period prior to the present forecast, in which historical model forecasts are compared with observations. The spread within and between the model realizations is used to quantify the uncertainty of the overall forecast, and from the performance of each model in the training period, the likelihood of accuracy of the current forecast is calculated and used as a weight in the overall forecast (Ebel and Beckers 2009). The QR technique conditions the forecast uncertainty on the forecasted value itself, based on a retrospective quantile regression of hindcasted water-level forecasts and forecast errors (Weerts et al. 2011). Both methods are relatively simple, efficient, and robust means of estimating predictive uncertainty and will be further tested in the coming years. A relatively new way to produce operational forecasts in the Netherlands is the “Ensemble Dressing” method (Verkade et al. 2013), which assumes that the meteorological forecast ensemble is unbiased.
The “naked” ensemble prediction, as published by the Dutch forecasting center, does not contain information on the hydrological uncertainty (model, parameters, initial conditions). The hydrological uncertainty is therefore estimated through a hindcast based on observed precipitation and temperature (“perfect forcing”) and characterized with Quantile Regression. This estimate is determined for each of the ensemble members of the COSMO-LEPS forecast, depending on the lead time and on the height of the forecast at that lead time. The total uncertainty is estimated as the mean of the individual density functions. An example (here for the river Meuse) of the COSMO-LEPS ensembles “dressed” with hydrological uncertainty is shown in Fig. 21.
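The dressing step can be sketched as follows. This is a simplified stand-in: the error quantiles play the role of the lead-time- and level-dependent Quantile Regression fit, and pooling the dressed samples approximates taking the mean of the individual density functions; all numbers are invented.

```python
import numpy as np

def dress_ensemble(members, error_quantiles):
    """Dress each "naked" ensemble member with hydrological uncertainty.

    members:         (n_members, n_leadtimes) forecast water levels
    error_quantiles: (n_quantiles, n_leadtimes) hydrological-error quantiles,
                     a stand-in for the Quantile Regression fit in the text
    Returns pooled samples of shape (n_members * n_quantiles, n_leadtimes).
    """
    m = np.asarray(members, dtype=float)[:, None, :]          # (m, 1, t)
    e = np.asarray(error_quantiles, dtype=float)[None, :, :]  # (1, q, t)
    dressed = m + e                                           # (m, q, t)
    return dressed.reshape(-1, dressed.shape[-1])

# 3 members, 2 lead times, 3 error quantiles whose spread grows with lead time
naked = [[10.0, 10.5], [10.2, 11.0], [9.8, 10.1]]
errq = [[-0.3, -0.6], [0.0, 0.0], [0.3, 0.6]]
pooled = dress_ensemble(naked, errq)
```

Quantiles of the pooled samples then yield the dressed uncertainty bands per lead time.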

4.5

Conclusions for Medium-Range Flood Forecasting for Controlled Rivers

The introduction of information about the uncertainty of operational forecasts can be considered a prerequisite for state-of-the-art forecasts in the future, not only for operational flood forecasts but also for seasonal and drought forecasts. Currently, many end users are still accustomed to receiving only deterministic information on forecasted runoff and water levels, but there is growing appreciation of the added value of probability information in forecasts. Good communication, easy-to-handle


Fig. 21 Variation in annual inflow to Swedish hydropower plants, expressed in TWh/year as the deviation from the median value, 1960–2012. The conversion to TWh is based on the current production capacity (Source: Swedenergy)

information, and training are essential for decision makers to accept these new forms of forecasting. Currently, not all forecast centers along the Rhine are using EPS-based systems, but the discussion about how the upstream-downstream chain will work for EPS has already started. In a connected trans-boundary catchment with different models and systems, such as the one presented here, it is challenging but crucial to pass information about uncertainty from one forecasting center to the next. EPS will become an integral part of the flood forecasting system of the Rhine, allowing an extension of reliable lead times in many cases and helping to identify uncertain forecasting situations. To increase the level of acceptance, the calculations need to be consistently reliable and complemented with bias corrections. Stable approaches, which are not only easy to configure and calibrate but also easy to use and interpret, will evolve with time and experience. The introduction of improved ensemble NWP with higher spatial resolution and faster updating cycles will significantly help to reduce the level of uncertainty and to enhance the reliability of these new forecasting products.

5

Requirements for Using and Communicating Hydro-Meteorological EPS in Medium-Range to Seasonal Forecasting

While the previous sections of this chapter have addressed flash flood and flood forecasting, for which nowcasting, short-term, and medium-range ensembles are useful, there are many other hydrological applications that act on longer time scales, e.g., drought applications, water resource management, or hydropower. In the following, the requirements for using and communicating hydro-meteorological EPS on medium-range to seasonal scales are illustrated using the example of hydropower.

5.1
Hydropower, Forecasting and Communication Requirements

Worldwide, hydropower is the most common form of renewable energy. In Sweden, some 45 %, or 65 TWh, of the total electricity production comes from hydropower, and most of the major rivers are affected by hydropower production. Dams and reservoirs have been constructed to store runoff generated during snow melt and autumn rains for the winter season, when runoff is low and electricity demands are high. In hydropower plants, production can be increased quickly in response to increased demand; in combination with the availability of stored water, this makes hydropower especially valuable for balancing supply from both less flexible sources, like nuclear power, and more variable renewable sources, like wind. In Sweden, the annual runoff, expressed in energy terms, may vary by as much as 30 TWh (Fig. 22). Whether it is a wet year or a dry year has a large effect on the electricity prices in Scandinavia, and the increased transfer capacity between Scandinavia and the European continent means that the water availability in Norway and Sweden is starting to affect prices outside Scandinavia as well. Inflow forecasts are of interest for different lead times and time horizons: • Seasonal forecasts are used to predict the inflow during the next 3–6 months. Of particular interest is the estimation of the snow melt runoff volume. The runoff

Fig. 22 Examples of seasonal forecasts based on climatological ensembles. The examples from two different years illustrate the importance of the initial state for the outcome of the forecast. In 1989 the snow pack on the first of February was around three times higher than in February 1996


forecasts are used for long-term production and reservoir planning as well as for price estimates. • Medium-range forecasts are of importance for run-of-river plants with low storage capacity. Predictions of high inflow will most likely decrease electricity prices. During flood periods when reservoirs are full, medium-range forecasts become essential for reservoir management and the prevention of damage along the rivers. At the end of the winter, when many reservoirs are nearly empty, the interest is in predicting the date when the snow melt runoff starts.

The HBV rainfall-runoff model was originally developed in Sweden in the early 1970s (Bergström 1972; Bergström and Forsman 1973; Lindström et al. 1997). Since then, it has been widely used by the hydropower industry in Scandinavia to forecast the inflow to reservoirs and power plants; seasonal forecasting with a climatological ensemble was an early application. Around 1990, the first Windows user interface for the HBV model, called the Integrated Hydrological Modelling System (IHMS), was created. From being a fairly complicated tool handled by a few, HBV became an everyday tool for the hydrologists at the hydropower companies. At the beginning, seasonal forecasts were made once a month during winter and medium-range forecasts on special occasions; today, seasonal forecasts are made several times a week and medium-range forecasts every day. The uncertainty of the forecasts has largely been handled subjectively, taken into consideration in the decisions made on production plans. Only recently have the hydropower companies started to include the full forecast ensembles in their planning systems.

5.2

Seasonal (Spring-Flood) Ensemble Forecasting

In Sweden, seasonal runoff forecasts for hydropower applications are mainly carried out by hydrologists employed by the hydropower companies. They use a common tool, the Integrated Hydrological Modelling System (IHMS), which includes the HBV model. HBV was originally developed to assist hydropower operations; the aim was to create a conceptual hydrological model with reasonable demands on computing facilities and calibration data. The HBV approach has proved flexible and robust in solving water resource problems, and its applications now span a broad range. As mentioned in the previous section, hydrological forecasts of future events are by definition uncertain, and the longer the lead time, the larger the uncertainties. In order to capture this uncertainty, ensembles are applied. The seasonal forecasts are traditionally based on a climatological ensemble including all years from 1961 onwards. For the catchment in question, a set-up and calibrated HBV model is initialized by running it until the day of the forecast using observed meteorological inputs (temperature and precipitation). Then the catchment's meteorological time series in the forecast period from each historical year is used to feed the initialized HBV model, thereby generating one possible realization of the future runoff (Fig. 23). The variation in forecast results is thus due

Fig. 23 (a) Presentation of hydrological model runs based on different sets of meteorological forecasts. Before the forecast time (2014-02-15), black and red lines represent observations of different time resolution and the blue circles HBV model simulations. After the forecast time, blue circles represent the deterministic HBV forecast, the solid blue line and blue fields different probabilities of exceedance (top to bottom: 2 %, 25 %, 50 %, 75 %, and 98 %) from the HBV EPS forecast, and the yellow lines the deterministic forecast and EPS median from the S-HYPE model. (b) Flood probability maps for exceeding a given threshold (map legend, translated from Swedish: probability that warning class 1 is exceeded, ranked as almost certain, high probability, likely, fairly likely, low probability, unlikely, or no value)


to the initial conditions at the time of the forecast, mainly the amount of water stored in the snow pack. The main aim of the forecast is to predict the inflow volume to reservoirs, and the IHMS includes routines to accumulate the runoff over the forecast period and to compute runoff levels with different exceedance probabilities. The IHMS/HBV also includes routines to evaluate the seasonal forecast with respect to the ensemble mean: the forecast volume of each forecast is saved, and at the end of the season the program is run to compute the forecast error. Normally, the forecast made just before the start of the melt season is discussed by end users and modelers, and the forecast error is compared to the model error. So far, there is no routine to evaluate the probability levels.

Previously, many of the efforts to improve the forecasts focused on improving the hydrological model components, which have been estimated to contribute 60 % of the total forecast error (Arheimer et al. 2011). The remaining error is thus related to the meteorological forecast, and recently more focus has been put on improving this part of the forecasting chain. In the climatological ensemble approach, the mean forecast indicates the hydrological response to a future evolution of the weather that corresponds to normal conditions, and the spread between ensemble members indicates the expected range if the weather deviates from those conditions. But the forecast does not contain any information on whether the weather in the specific forecast period ahead is expected to be normal or not. Thus, a conceivable way to improve the hydrological forecasts is to incorporate information about the expected future development of the weather and its deviation from the climatological mean. One obvious possibility is to replace the historical meteorological time series by seasonal (ensemble) meteorological forecasts over the period of interest.
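Before turning to those alternatives, the traditional climatological-ensemble procedure can be sketched as follows; `run_model` is a placeholder for the calibrated HBV model, and the toy data are purely illustrative:

```python
def climatological_ensemble(model_state, historical_forcings, run_model):
    """Generate a seasonal ensemble from one initialized model state.

    model_state:         catchment state at the forecast date (snow pack etc.)
    historical_forcings: mapping year -> meteorological series (precipitation,
                         temperature) for the forecast period
    run_model:           function (state, forcing) -> runoff trace; here a
                         placeholder for the calibrated HBV model
    Each historical year yields one possible realization of the future runoff.
    """
    return {year: run_model(model_state, forcing)
            for year, forcing in historical_forcings.items()}

# Toy model: runoff from stored snow plus the forecast-period precipitation
toy_model = lambda state, forcing: [state["snow"] / 10.0 + p for p in forcing]
ens = climatological_ensemble({"snow": 50.0},
                              {1961: [1.0, 2.0], 1962: [0.0, 5.0]},
                              toy_model)
```

The spread of the resulting traces reflects the initial state (here the snow pack) combined with the historical range of forecast-period weather.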
Seasonal (ensemble) meteorological forecasts are available from, e.g., the European Centre for Medium-Range Weather Forecasts (ECMWF). Generally, the accuracy of seasonal meteorological forecasts is limited in Sweden, as in the rest of Northern Europe. This is especially the case for precipitation, whereas temperature forecasts are normally more reliable. As temperature is a key driver of spring flood generation, this implies a potential added value. Tests of using ECMWF seasonal forecasts for spring flood forecasting in Sweden have been performed over the last 5–10 years, but with limited success so far: no distinct improvement over the climatological ensemble approach has been attained. This is likely a combination of scale issues and model bias. The size of the grid cells in the ECMWF forecasting model (e.g., 1° × 1°) produces smoothed fields of temperature and precipitation that do not fully reflect the impact of altitudinal gradients and catchment-scale variability. This, as well as general model uncertainty, can make the forecasts differ systematically from catchment-scale observations. Better performance is likely attainable by statistical downscaling and bias correction of the ECMWF forecasts prior to the hydrological forecast simulations. An alternative approach, which is not affected by scale issues, is to reduce the historical, climatological ensemble by estimating which of the historical years are most likely representative of the evolution of the weather during the current year. The general features of the weather during a certain year or period may


be characterized using, e.g., teleconnection indices (TCIs) or circulation patterns (CPs). In these methods, the large-scale atmospheric circulation is described in terms of pressure anomalies (i.e., deviations from the climatological mean) or the frequency of different weather situations. By characterizing the current year's weather in terms of TCIs and/or CPs, using either observations up until the forecast time or forecasts over the period of interest, and comparing with the historical years' values, the most similar historical years may be identified and used in the forecast simulations. Preliminary testing of this approach has failed to show any distinct overall improvement over the climatological ensemble approach, although there are signs of increased accuracy in the early forecasts, e.g., spring flood forecasts issued as early as the beginning of the year.

A further approach, which does not involve hydrological modeling, is statistical downscaling of the spring flood volume directly from the atmospheric circulation. As (1) the spring flood is a direct reflection of the winter climate and (2) the winter climate in Scandinavia is known to be governed by large-scale climate teleconnections, as manifested in, e.g., the North Atlantic Oscillation (NAO) index, such a direct link is conceivable. Using advanced statistical tools, it has proved possible to link the spring flood volume to large-scale atmospheric variables such as pressure, wind speed, and humidity in winter. This approach has also demonstrated a potential to improve the accuracy of early spring flood forecasts. The Swedish hydropower industry is closely following the development of the new approaches and is partly funding the research (Olsson et al. 2016). An attempt to combine the current climatological procedure with the new approaches in a multimethod system has indicated that there is scope for increasing the overall forecast accuracy by 5–10 %.
This would translate into substantially more efficient energy production and, in turn, increased economic revenue. Preoperational testing of the system is ongoing.
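The analog-year reduction described above can be illustrated with a toy selection routine; the index values are invented, and a real application would combine several TCIs and/or CPs rather than a single scalar:

```python
def select_analog_years(current_tci, historical_tci, n_keep=10):
    """Reduce the climatological ensemble to the historical years whose
    teleconnection-index value (e.g., an NAO-type index) is closest to the
    value characterizing the current year. Illustrative only."""
    ranked = sorted(historical_tci,
                    key=lambda year: abs(historical_tci[year] - current_tci))
    return ranked[:n_keep]

# Invented index values for five historical years
history = {1961: -1.2, 1962: 0.4, 1963: 1.1, 1964: 0.5, 1965: -0.1}
analogs = select_analog_years(0.45, history, n_keep=2)
```

The selected years' meteorological series would then drive the initialized hydrological model, as in the full climatological ensemble.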

5.3

Medium-Range (10-Day) Forecasting

The hydrological forecasting division at SMHI has been running a hydrological ensemble prediction system (EPS) to generate probability forecasts operationally since 2004. The inputs to the HBV model are the meteorological 10-day EPS forecasts from ECMWF, consisting of 50 ensemble members and one undisturbed control forecast. An auto-regressive updating of the simulated flow is applied in each forecast run, making the simulated discharge at the time of the forecast agree with the concurrent observation. From the resulting 51 hydrological ensemble forecasts, “raw” nonexceedance probabilities are estimated. Evaluations of these raw forecasts have, however, clearly shown that the resulting nonexceedance probabilities are heavily biased (Olsson and Lindström 2008). Mainly, the spread in the hydrological ensemble is far too narrow, especially in the first days of the forecast. Thus, the forecasts often falsely indicate a high degree of certainty, whereas in reality there is a high probability of the actual flow lying outside the ensemble range. The reason for the underestimated spread is not yet fully clarified, but it is likely a combination of


limitations in both the meteorological forecasts (such as an underestimated spread in them as well, and scale issues that make the forecasts less variable than catchment-scale weather fluctuations) and in the hydrological process descriptions. A simple correction is applied to the raw probabilities in order to provide more realistic estimates, including the combined effect of uncertainties in both the meteorological forecast and the hydrological model.

Ensemble streamflow predictions based on the HBV model are made both for selected catchments and on the national scale. The selected catchments comprise 80 so-called indicator basins in Sweden, which are designed to represent the hydrological conditions in different parts of the country (and catchments of different sizes) and where a real-time discharge gauge is available. The ensemble streamflow prediction at SMHI is integrated with the national forecasting system, and tailored products are available to specialized end users via the Internet. An example of how the single-catchment ensemble forecasts are visualized for the end users is given in Fig. 24a. On the national scale, flood probability maps for exceeding a certain threshold, i.e., a certain warning level, are produced automatically once a day (Fig. 24b). The flood probability forecasts are based on an HBV-model set-up that covers the whole country (“HBV-Sweden”), divided into 1001 subbasins with sizes between 200 and 700 km2. Probabilities of exceeding a certain warning level (corresponding to return periods of, e.g., 2 or 10 years) are calculated for each of these 1001 subbasins, where the levels have been determined by frequency analysis of historical simulations. Hydrological probability forecasts should be seen as an early warning product that can give improved support in decision making to end-user communities, for instance, Civil Protection Offices and County Administrative Boards, as well as hydropower plant operators.
An early indication that “something is going to happen,” which may not be evident from a deterministic forecast, may be highly valuable, especially in conditions with already high flows. Besides the actual estimated levels, the ensemble spread itself gives an indication of the forecast uncertainty, which may be qualitatively useful. There are, however, also some perceived limitations of the medium-range probability forecasts when used in an operational context. One concerns the previously discussed limitations related to the spatial resolution of the meteorological ensemble forecasts and the resulting difficulties in capturing local variations. This particularly affects ungauged catchments, where there is no possibility of updating the hydrological forecasts.

5.4

Conclusions and Future Outlook for Ensemble-Based Hydropower Forecasting

Ensemble forecasts on both medium-range and seasonal timescales have been available to the Swedish hydropower industry for more than a decade. Their importance is underlined by the fact that the industry continuously funds research aimed at improving forecasts through probabilistic approaches. In actual practice, the use
of ensembles and probabilities for concrete planning and management still appears limited, although hydropower companies have very recently asked for full forecast ensembles to be delivered. There is clearly a general understanding of the potential benefits of probability forecasts, but procedures for using the added information in the actual operation remain to be developed. Further, the problems encountered with, e.g., scale issues and biased probabilities may have made hydropower end users hesitate and await the development of more robust and reliable products. A good overview of the experiences and expectations of the international community of hydropower practitioners was provided by the 2011 Workshop on Operational River Flow and Water Supply Forecasting arranged by the Canadian Society for Hydrological Forecasting (Weber et al. 2012). Concerning ensemble forecasting, it was recommended to sample the full range of uncertainties, including not only the future weather but also uncertainties in the initial state as well as in the hydrological model structure and parameter values. The latter issues are also being explored at SMHI, e.g., by studying new ways to estimate the snow cover and by using new, alternative model concepts. In addition to hydrological forecasting using the HBV model, the S-HYPE model (Strömqvist et al. 2012) is gradually being introduced at SMHI. S-HYPE is a high-resolution model for all of Sweden, based on the HYPE model code (Lindström et al. 2010). The average basin area in the current S-HYPE version is about 13 km2. Some advantages of using the S-HYPE model for hydrological forecasts are that it covers the whole country and updates predictions of discharge and water levels at gauging stations. Hydrological statistics are available for the whole country, based on a method described by Bergstrand et al. (2014). Today, S-HYPE is run with both alternative deterministic forecasts and ensemble forecasts from the ECMWF (Fig.
24a), and the plan is to also introduce seasonal forecasts. The operational forecasts are stored in a database, which makes it possible to follow up and evaluate forecast quality (see e.g., Pechlivanidis et al. 2014). Since late 2013, hydrological forecasts based on S-HYPE have been available at the operational flood warning service at SMHI. Since February 2014, S-HYPE forecasts have also been made available to the public at http://vattenwebb.smhi.se/hydronu/. This service puts the present hydrological situation into perspective by comparing it with historic data and also includes a 10-day forecast. Another recommendation from the above-mentioned workshop was to use a thorough suite of statistical and graphical measures for forecast evaluation. Knowledge of the historical forecasting performance is clearly beneficial, both for the forecaster and for the user of the forecast, but unfortunately only limited attention is generally paid to this issue. The workshop further concluded that postprocessing of forecast ensembles will continue to be necessary and also emphasized that the hydrological forecaster needs to have a good understanding of the physical processes involved, in order to properly interpret and convey the results of model forecasts. Concerning the future role of hydropower production, it is likely to become more and more important as a regulator, to meet both the variation in electricity use and the
variation in production from other renewable sources like wind and solar energy. Climate change is, however, expected to substantially change the future hydropower production as well as the future role of hydrological forecasts in Sweden. Global warming will make the winter climate substantially milder, which means that not all precipitation will accumulate as snow; runoff episodes will also occur during winter. As the power demand is high during winter, this increased inflow is beneficial for hydropower production. The reduced snow pack will lead to lower spring flood peaks, and overall the annual cycle will even out. With a reduced need for reservoir storage capacity before the spring flood, electricity can be produced with a higher hydraulic head. The added hydropower potential in Sweden by the end of the century owing to climate change has been estimated to equal the capacity of 1–3 nuclear reactors (Andréasson et al. 2007). Concerning the forecasts, spring flood forecasts will remain important for the foreseeable future, but in the long term their importance is expected to decrease along with the gradual weakening of the spring flood. Long-term forecasts for other seasons, winter in particular, will become more important. A generally expected consequence of global warming is a change towards higher variability and more pronounced extremes. This will likely make forecasting even more challenging than today. A particular difficulty concerns the discharge and water-level thresholds used for issuing warnings and alerts, e.g., 2- or 10-year levels. In a nonstationary, changing climate, these levels will gradually change, and in snowmelt-dominated hydrological regimes such as northern Sweden, these changes may happen relatively fast. Developing robust routines for frequency analysis and optimal estimation of warning levels under nonstationary conditions is an important and urgent task.
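The frequency analysis used to set such warning levels can be illustrated with a standard extreme-value calculation. The sketch below fits a Gumbel distribution to annual maxima by the method of moments and, as one simple response to nonstationarity, re-estimates the level over a sliding window of recent years. This is a generic textbook approach with invented data, not the routine actually used at SMHI:

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant

def gumbel_return_level(annual_maxima, T):
    """T-year return level from a Gumbel fit (method of moments)."""
    n = len(annual_maxima)
    mean = sum(annual_maxima) / n
    var = sum((x - mean) ** 2 for x in annual_maxima) / (n - 1)
    beta = math.sqrt(6.0 * var) / math.pi   # scale parameter
    mu = mean - EULER_GAMMA * beta          # location parameter
    # Quantile with annual nonexceedance probability 1 - 1/T
    return mu - beta * math.log(-math.log(1.0 - 1.0 / T))

def moving_window_levels(annual_maxima, T, window=30):
    """Re-estimate the T-year level over a sliding window of recent
    years, a simple way to let warning levels track a changing climate."""
    return [gumbel_return_level(annual_maxima[i:i + window], T)
            for i in range(len(annual_maxima) - window + 1)]

# Invented annual maximum discharges (m3/s) for a small record
annual_maxima = [80.0, 95.0, 110.0, 100.0, 90.0, 120.0, 105.0, 85.0, 98.0, 115.0]
print(f"2-year level:  {gumbel_return_level(annual_maxima, 2):.1f}")
print(f"10-year level: {gumbel_return_level(annual_maxima, 10):.1f}")
```

In an operational setting the window length trades robustness of the fit against responsiveness to change, which is exactly the tension the paragraph above describes.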

6

Summary

This chapter has illustrated, with four different examples, the use and communication of ensemble prediction systems: for very short-term applications to flooding in urban areas and flash flood indicators, for medium-range applications to flood forecasting in medium to large watersheds, and for medium-range to seasonal applications in reservoir management. The limitations of using Numerical Weather Prediction-based EPS in very short-term applications on the one hand and long-term applications on the other have been clearly demonstrated, while the benefits of EPS for short- to medium-range flood forecasting have been clearly illustrated. Examples of hydrological ensemble prediction systems driven by different inputs, such as remote sensing rainfall estimates, numerical weather prediction systems, or climatological data, have been shown. The examples have shown that, not surprisingly, the requirements depend largely on the application. Nowcasting systems have different requirements on spatial and temporal scales than seasonal forecasting systems, and systems for flood forecasting experts have different requirements for visualization than systems for disaster managers. However, the following common points can be highlighted: First of all, uncertainty is inherent in all forecasting and prediction systems and needs to be dealt
with and not ignored. Second, the reliability of the forecasts is considered key, and hydrological ensemble prediction systems require postprocessing to ensure this reliability. Biases and unreliability, particularly in the meteorological NWP products, limit the usability of the data for hydrological applications. Only reliable systems generate the necessary trust among experts and end users that the system is valuable. Third, the uncertainty needs to be quantified and visualized, and there is a multitude of different ways to do so. Fourth, information is typically presented either as time series (e.g., hydrographs or rainfall volumes) or as maps (e.g., flood inundation maps). The complexity of the visualization depends largely on the end user and differs for expert users, e.g., forecasters or operators, and downstream end users such as disaster managers. Fifth, in the examples presented here, the ensemble information is plotted or visualized against “deterministic” thresholds, i.e., warning levels.

References

L. Alfieri, D. Velasco, J. Thielen, Flash flood detection through a multi-stage probabilistic warning system for heavy precipitation events. Adv. Geosci. 29, 69–75 (2011)
L. Alfieri, J. Thielen, F. Pappenberger, Ensemble hydro-meteorological simulation for flash flood early detection in southern Switzerland. J. Hydrol. 424–425, 143–153 (2012)
J. Andréasson, S.-S. Hellström, J. Rosberg, S. Bergström, Summary of climate change maps of the Swedish water resources – background material for the Swedish Commission on Climate and Vulnerability, SMHI Hydrology No 106, SMHI, 601 76 Norrköping, 15 pp (in Swedish) (2007)
B. Arheimer, G. Lindström, J. Olsson, A systematic review of sensitivities in the Swedish flood-forecasting system. Atmos. Res. 100, 275–284 (2011). doi:10.1016/j.atmosres.2010.09.013
M. Arpagaus et al., MAP D-PHASE: demonstrating forecast capabilities for flood events in the Alpine region. Veröff. MeteoSchweiz 78, 75 pp (2009)
A. Atencia, T. Rigo, A. Sairouni, J. Moré, J. Bech, E. Vilaclara, J. Cunillera, M.C. Llasat, L. Garrote, Improving QPF by blending techniques at the Meteorological Service of Catalonia. Nat. Hazards Earth Syst. Sci. 10, 1443–1455 (2010)
A. Bab-Hadiashar, D. Suter, R. Jarvis, 2-D motion extraction using an image interpolation technique. SPIE 2564, 271–281 (1996)
M. Berenguer, D. Sempere-Torres, G. Pegram, SBMcast – an ensemble nowcasting technique to assess the uncertainty in rainfall forecasts by Lagrangian extrapolation. J. Hydrol. (2011). doi:10.1016/j.jhydrol.2011.04.033
M. Bergstrand, S.-S. Asp, G. Lindström, Nation-wide hydrological statistics for Sweden with high resolution using the hydrological model S-HYPE. Hydrol. Res. 45, 349 (2014)
S. Bergström, The application of a simple rainfall-runoff model to a catchment with incomplete data coverage. SMHI, Notiser och preliminära rapporter, ser. Hydrologi No. 26. IHD-Report (1972)
S. Bergström, A. Forsman, Development of a conceptual deterministic rainfall-runoff model.
Nord. Hydrol. 4, 147–170 (1973)
R. Bornstein, Q. Lin, Urban heat islands and summertime convective thunderstorms in Atlanta: three case studies. Atmos. Environ. 34(3), 507–516 (2000)
R. Buizza, A. Hollingsworth, F. Lalaurette, A. Ghelli, Probabilistic predictions of precipitation using the ECMWF Ensemble Prediction System. Weather Forecast. 14, 168–189 (1999)
R. Buizza et al., The new ECMWF variable resolution ensemble prediction system (VAREPS): methodology and validation. ECMWF Tech. Memo. No. 499 (2006)


R. Buizza et al., The new ECMWF VAREPS (variable resolution ensemble prediction system). Q. J. Roy. Meteorol. Soc. 133, 681–695 (2007)
T. Camus, H.H. Bulthoff, Real-time optical flow extended in time. Max-Planck-Institute. Technical report No. 13 (1995)
CHR, Website International Commission for the Hydrology of the Rhine basin. The length of the Rhine (2010)
H.L. Cloke, F. Pappenberger, Ensemble flood forecasting: a review. J. Hydrol. 375, 613–626 (2009)
A.N. Clothier, G.G.S. Pegram, Space-time modeling of rainfall using the String of Beads model: integration of radar and raingauge data. Water Research Commission Report, No. 1010/1/02 (2002). ISBN 1 86845 835 0
M. Ebel, J.L.V. Beckers, Operational river and coastal water level forecast using Bayesian Model averaging. DELTARES (Delft) (2010). doi: Nr. 1200379-003
A. Gelb, Applied Optimal Estimation (MIT Press, Cambridge, MA, 1974)
U. Germann, M. Berenguer, D. Sempere-Torres, M. Zappa, REAL – ensemble radar precipitation estimation for hydrology in a mountainous region. Q. J. Roy. Meteorol. Soc. 135, 445–456 (2009). doi:10.1002/qj.375
R. Hannesen, An enhanced surface rainfall algorithm for radar data. Progress report for MUSIC. European Commission contract No. EVK1-CT-2000-00058 (2002)
A.P. Hurford, S.J. Priest, D.J. Parker, D.M. Lumbroso, The effectiveness of extreme rainfall alerts in predicting surface water flooding in England and Wales. Int. J. Climatol. 32, 1768–1774 (2012). doi:10.1002/joc.2391
International Federation of Red Cross and Red Crescent Societies, World Disaster Report 2009; Focus on Early Warning, Early Action (ATAR Roto Presse, Satigny/Vernier, 2009)
R.E. Kalman, A new approach to linear filtering and prediction problems. Trans. ASME J. Basic Eng. 82(Series D), 35–45 (1960)
H. Kalweit et al., Der Rhein unter den Einfluss des Menschen – Ausbau, Schiffahrt, Wasserwirtschaft. CHR report No I-11 (in German) (1993)
K. Kober, G.C. Craig, C. Keil, A.
Dörnbrack, Blending a probabilistic nowcasting method with a high-resolution numerical weather prediction ensemble for convective precipitation forecasts. Q. J. Roy. Meteorol. Soc. 138, 755–768 (2012). doi:10.1002/qj.939
H. Kusaka, K. Nawata, A. Suzuki-Parker, Y. Takane, N. Furuhashi, Mechanism of precipitation increase with urbanization in Tokyo as revealed by ensemble climate simulations. J. Appl. Meteorol. Climatol. 53(4), 824–839 (2014)
H. Leijnse, R. Uijlenhoet, J.N.M. Stricker, Rainfall measurement using radio links from cellular communication networks. Water Resour. Res. 43, W03201 (2007). doi:10.1029/2006WR005631
Z. Liang et al., Application of Bayesian model averaging approach to multimodel ensemble hydrologic forecasting. J. Hydrol. Eng. 18(11), 1426–1436 (2013)
K. Liechti, M. Zappa, F. Fundel, U. Germann, Probabilistic evaluation of ensemble discharge nowcasts in two nested Alpine basins prone to flash floods. Wiley Online Libr. (2012). doi:10.1002/hyp.9458
S. Liguori, M.A. Rico-Ramirez, A.N.A. Schellart, A.J. Saul, Using probabilistic radar rainfall nowcasts and NWP forecasts for flow prediction in urban catchments. Atmos. Res. 103, 80–95 (2012)
G. Lindström, B. Johansson, M. Persson, M. Gardelin, S. Bergström, Development and test of the distributed HBV-96 model. J. Hydrol. 201, 272–288 (1997)
G. Lindström, C.P. Pers, R. Rosberg, J. Strömqvist, B. Arheimer, Development and test of the HYPE (Hydrological Predictions for the Environment) model – a water quality model for different spatial scales. Hydrol. Res. 41(3–4), 295–319 (2010)
M.C. Llasat, M. Llasat-Botija, M.A. Prat, F. Porcú, C. Price, A. Mugnai, K. Lagouvardos, V. Kotroni, D. Katsanos, S. Michaelides, Y. Yair, K. Savvidou, K. Nicolaides, High-impact
floods and flash floods in Mediterranean countries: the FLASH preliminary database. Adv. Geosci. 23, 1–9 (2010)
P. Lopez et al., Alternative configurations of quantile regression for estimating predictive uncertainty in water level forecasts for the Upper Severn River: a comparison. Geophys. Res. Abstr. 16, EGU2014-14591 (2014). EGU General Assembly 2014
C. Marsigli, F. Boccanera, A. Montani, T. Paccagnella, The COSMO-LEPS mesoscale ensemble system: validation of the methodology and verification. Nonlinear Proc. Geoph. 12, 527–536 (2005)
J. Olsson, G. Lindström, Evaluation and calibration of operational hydrological ensemble forecasts in Sweden. J. Hydrol. 350, 14–24 (2008). doi:10.1016/j.jhydrol.2007.11.010
J. Olsson, C.B. Uvo, K. Foster, W. Yang, Technical note: initial assessment of a multi-method approach to spring flood forecasting in Sweden. Hydrol. Earth Syst. Sci. 20, 1–9 (2016). doi:10.5194/hess-20-1-2016
T.N. Palmer, F. Molteni, R. Mureau, R. Buizza, P. Chapelet, J. Tribbia, Ensemble prediction, in Proceedings of 1992 ECMWF Seminar: Validation of Models Over Europe (ECMWF, Reading, 1993), pp. 21–66
L. Panziera, U. Germann, M. Gabella, P.V. Mandapaka, NORA – nowcasting of orographic rainfall by means of analogues. Q. J. Roy. Meteorol. Soc. 137, 2106 (2011). ISSN 0035-9009
I.G. Pechlivanidis, T. Bosshard, H. Spångmyr, G. Lindström, G. Gustafsson, B. Arheimer, Uncertainty in the Swedish operational hydrological forecasting systems, in Proceedings of the Second International Conference on Vulnerability and Risk Analysis and Management (ICVRAM2014), University of Liverpool, Liverpool, 13–16 July 2014 (2014)
G.G.S. Pegram, A continuous streamflow model. J. Hydrol. 47, 65–89 (1980)
G.G.S. Pegram, A.N. Clothier, High resolution space-time modelling of rainfall: the “String of Beads” model. J. Hydrol. 241, 26–41 (2001)
G.G.S. Pegram, D.S. Sinclair, A linear catchment model for real time flood forecasting. Water Research Commission Report, No. 1005/1/02 (2002)
D.
Penna, H.J. Tromp-van Meerveld, A. Gobbi, M. Borga, G. Dalla Fontana, The influence of soil moisture on threshold runoff generation processes in an alpine headwater catchment. Hydrol. Earth Syst. Sci. 15, 689–702 (2011)
D. Raynaud, J. Thielen del Pozo, P. Salamon, P.A. Burek, S. Anquetin, L. Alfieri, A dynamic runoff co-efficient to improve flash flood early warning in Europe: evaluation on the 2013 Central European floods in Germany. Meteorol. Appl. 22(3), 410–418 (2014)
M. Renner, M.G.F. Werner, S. Rademacher, E. Sprokkereef, Verification of ensemble flow forecasts for the River Rhine. J. Hydrol. 376, 463 (2009)
A. De Roo, B. Gouweleeuw, J. Thielen, P. Bates, A. Hollingsworth et al., Development of a European Flood Forecasting System. Int. J. River Basin Manag. 1(1), 49–59 (2003)
A. Rossa, G. Haase, C. Keil, P. Alberoni, S. Ballard, J. Bech, U. Germann, M. Pfeifer, K. Salonen, Propagation of uncertainty from observing systems into NWP: COST-731 Working Group 1. Atmos. Sci. Lett. 2, 145–152 (2010)
M.W. Rotach et al., MAP D-PHASE: real-time demonstration of weather forecast quality in the Alpine region. Bull. Am. Meteorol. Soc. 90, 1321–1336 (2009). doi:10.1175/2009BAMS2776.1
K. Savvidou, S. Michaelides, K.A. Nicolaides, P. Constantinides, Presentation and preliminary evaluation of the operational Early Warning System in Cyprus. Nat. Hazards Earth Syst. Sci. 9, 1213–1219 (2009). doi:10.5194/nhess-9-1213-2009
J. Schaake, K. Franz, A. Bradley, R. Buizza, The Hydrological Ensemble Prediction EXperiment (HEPEX). Hydrol. Earth Syst. Sci. Discuss. 3, 3321–3332 (2006)
A.W. Seed, A dynamic and spatial scaling approach to advection forecasting, in Proceedings of the Fifth International Symposium on Hydrological Applications of Weather Radar – Radar Hydrology, Kyoto, 2001
A.W. Seed, R. Srikanthan, M. Menabde, A space and time model for design storm rainfall. J. Geophys. Res. 100(D21), 31623–31630 (1999)


K. Stephan, S. Klink, C. Schraff, Assimilation of radar-derived rain rates into the convective-scale model COSMO-DE at DWD. Q. J. Roy. Meteorol. Soc. 134, 1315–1326 (2008). doi:10.1002/qj.269
J. Strömqvist, B. Arheimer, J. Dahné, C. Donnelly, G. Lindström, Water and nutrient predictions in ungauged basins: set-up and evaluation of a model at the national scale. Hydrol. Sci. J. 57(2), 229–247 (2012)
J. Thielen, A. Gadian, Influence of different wind directions in relation to topography on the outbreak of convection in Northern England. Ann. Geophys. 14(10), 1078–1087 (1996)
J. Thielen, A. Gadian, Influence of topography and urban heat island effects on the outbreak of convective storms under unstable meteorological conditions: a numerical study. Meteorol. Appl. 4(2), 139–149 (1997)
J. Thielen, W. Wobrock, P. Mestayer, J.-D. Creutin, A. Gadian, The influence of the lower boundary on convective rainfall development; a sensitivity study. Atmos. Res. 54, 15–39 (2000)
E. Todini, Mutually interactive state/parameter estimation (MISP), in Application of Kalman Filter to Hydrology, Hydraulics and Water Resources, ed. by C.-L. Chiu (University of Pittsburg, Pittsburgh, 1978)
M.S. Tracton, E. Kalnay, Operational ensemble prediction at the National Meteorological Center: practical aspects. Weather Forecast. 8, 379–398 (1993)
University Corporation for Atmospheric Research (UCAR), Flash Flood Early Warning System Reference Guide 2010 (2010), http://www.meted.ucar.edu/communities/hazwarnsys/ffewsrg/FF_EWS.frontmatter.pdf. ISBN 978-0-615-37421-5
J. Verkade et al., Real-time hydrologic probability forecasting using ensemble dressing, with application to river Rhine. Geophys. Res. Abstr. 15, EGU2013-10763 (2013). EGU General Assembly 2013
G. Villarini, W.F. Krajewski, A.A. Ntelekos, K.P. Georgakakos, J.A. Smith, Towards probabilistic forecasting of flash floods: the combined effects of uncertainty in radar-rainfall and flash flood guidance. J. Hydrol. 394(1–2), 275–284 (2010)
F.
Weber, D. Garen, A. Gobena, Invited commentary: themes and issues from the workshop “Operational River Flow and Water Supply Forecasting”. Can. Water Resour. J. 37, 151–161 (2012). doi:10.4296/cwrj2012-953
A.H. Weerts et al., Estimation of predictive hydrological uncertainty using quantile regression: examples from the National Flood Forecasting System (England and Wales). Hydrol. Earth Syst. Sci. 15, 255–265 (2011)
P. Young, Recursive Estimation and Time Series Analysis (Springer, Berlin, 1984)
J. Younis, S. Anquetin, J. Thielen, The benefit of high-resolution operational weather forecasts for flash flood warning. Hydrol. Earth Syst. Sci. 12, 1039–1051 (2008)
M. Zappa, K.J. Beven, M. Bruen, A.S. Cofino, K. Kok, E. Martin, P. Nurmi, B. Orfila, E. Roulin, K. Schroter, A. Seed, J. Szturc, B. Vehvilainen, U. Germann, A. Rossa, Propagation of uncertainty from observing systems and NWP into hydrological models: COST-731 Working Group 2. Atmos. Sci. Lett. 11, 83–91 (2010)

Overview of Forecast Communication and Use of Ensemble Hydro-Meteorological Forecasts

Jutta Thielen-del Pozo and Michael Bruen

Abstract

Over the last few decades, hydro-meteorological forecasting, warning, and decision making have benefited greatly from advances in the natural, physical, computing, and social sciences. A fast developing computing capability has enabled meteorologists to produce ensemble prediction systems (EPS) that quantify the uncertainty in forecasting and simulating floods and droughts, and in water management decision making. At the same time, the social sciences have helped to understand human perceptions of risk information and how different actors communicate hazard, risk, and uncertainty information. Ultimately, hydro-meteorological forecasts are used in making decisions. However, to be effective, such decisions must be communicated to the hazard response organisations and to the general public. For this, the communication must be simple, clear, and relevant, and it should come from a trusted source. This overview summarises how such communication is organised for a variety of applications in different countries. It is the effectiveness of the entire system that must be considered and assessed. As ensembles are increasingly used in longer-term management and policy decisions, the range of end users and their differing requirements can only expand, and flexibility and adaptability to individual circumstances will be required from both the natural and social scientists involved.

J. Thielen-del Pozo (*)
Joint Research Centre (JRC), European Commission, Ispra, Italy
e-mail: [email protected]
M. Bruen (*)
UCD Dooge Centre for Water Resources Research, UCD School of Civil Engineering, Dublin, Ireland
e-mail: [email protected]
© Springer-Verlag Berlin Heidelberg 2016
Q. Duan et al. (eds.), Handbook of Hydrometeorological Ensemble Forecasting, DOI 10.1007/978-3-642-40457-3_40-1


Keywords

Ensemble prediction systems (EPS) • Flood • Forecast communication • Hydrometeorological forecasts

Over the last few decades, hydro-meteorological forecasting, warning, and decision making have benefited greatly from advances in the natural, physical, computing, and social sciences. A fast developing computing capability has enabled meteorologists to produce ensemble prediction systems (EPS) that are now applied in various sectors, including hydrological applications for the forecasting and simulation of floods, flash floods, and droughts, and for water management decision making. Ensemble prediction systems allow us to quantify the uncertainty in our forecasts in a meaningful way and to better understand the limits of predictability (Thielen et al. 2009; Rossa et al. 2010; Zappa et al. 2010), and so provide decision makers with longer useful lead times to prepare for an event. The amount and breadth of material in this handbook are evidence of the importance that hydrological ensemble prediction systems are playing in various applications in hydrology. At the same time as the natural and technical sciences improved forecasting technology, the social sciences have helped to understand human perceptions of risk information and how different actors communicate and respond to hazard, risk, and uncertainty information, between peers (horizontally) as well as within a hierarchy (vertically) (e.g., Drobot and Parker 2007; Demeritt et al. 2007). The natural and social sciences communities are united in the realization that uncertainties must be managed rather than ignored or eradicated at all cost (Drobot and Parker 2007), a view that is also shared within the response community (IFRCRC 2012). Ultimately, hydro-meteorological forecasts are used in making decisions. Decision making or planning under uncertain conditions can be a complex, yet everyday, experience. “Uncertainties exist when details of situations are ambiguous, complex, unpredictable, or probabilistic; when information is unavailable or inconsistent” (Brasher 2011, p. 478).
In addition to the uncertainty of whether a predicted future event, e.g., a flood or drought, will occur, forecasters are also faced with the uncertainty an individual feels about taking the right or wrong decision on a course of action. Decisions could be urgent ones, i.e., related to immediate emergency response where life is at risk, or less time-sensitive ones, possibly relating to the longer-term management of resources or the formulation of policy. The most important requirement is that the decisions are effective. This means they save lives, reduce damage to health, property, or quality of life, or improve the management of infrastructure and resources. Hydro-meteorological forecasts are just one source of information available to the decision maker, who must take all other types of information from many other sources into account in deciding what should be done. Examples of other sources of information could be (i) the demographics of populations at risk, for instance their age profile – the very young and very old may be particularly
vulnerable – or their health/disability status, (ii) hydropower energy demand, (iii) temporal variations in requirements for irrigation water or public water supply, and (iv) transport traffic on waterways, just to name a few. Some indication of the reliability or uncertainty in each data source is essential to guide its influence on the final decisions. This is straightforward if the information is a direct measurement, e.g., of flows, water levels, or precipitation, for which the measurement uncertainties are well understood. However, it is much more problematic when the information is a forecast, and even more so when it involves such complex systems and models as the atmosphere or a river basin. Nevertheless, such information is essential if the decision maker is expected to consistently make good decisions. While forecasters tend to want this information, they find it difficult to quantify any improvement in skill due to having it (Frick and Hegg 2011). For hydro-meteorological forecasts based on ensembles of simulations, possibly with multiple models, some uncertainty information is already implicit in the ensembles. The critical issue is how to extract the uncertainty information and how to present it to the decision maker. It is important that the decision maker is neither over-burdened with unnecessary information nor expected to do complex analyses in a hurry, particularly in emergency response situations. Many approaches have been suggested, including some based on a cost-benefit analysis (Dale et al. 2014) and others based on a decision scaling method (Turner et al. 2014). Simplicity, clarity, relevance, and trust are thus the ideal characteristics of the way ensemble information is presented to a decision maker. Their resulting actions may be instrumental in saving lives and property, and they may, after an event, be held accountable for the decisions.
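The cost-benefit reasoning mentioned above is often formalized as the classic static cost-loss model: a user who can pay a cost C to protect against a loss L should act whenever the event probability exceeds C/L. The sketch below is this textbook idealization with invented numbers, not the specific methods of Dale et al. (2014) or Turner et al. (2014):

```python
def should_act(p_event, cost, loss):
    """Static cost-loss rule: protective action is worthwhile on average
    when the forecast probability exceeds the cost/loss ratio."""
    if not 0 < cost < loss:
        raise ValueError("expected 0 < cost < loss")
    return p_event > cost / loss

def expected_expense(p_event, cost, loss, act):
    """Expected expense: pay the cost if acting, risk the loss otherwise."""
    return cost if act else p_event * loss

# Invented numbers: protection costs 10, the flood loss would be 100.
p = 0.3
print(should_act(p, 10, 100))                       # True: 0.3 > 0.1
print(expected_expense(p, 10, 100, act=True))       # 10
print(expected_expense(p, 10, 100, act=False))      # 30.0
```

With an ensemble forecast, the probability would typically be the fraction of members predicting the event, which is one reason the reliability of the forecast probabilities matters so much in practice.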
This serious responsibility has a large influence on their approach to making decisions, often characterized by a risk-averse conservatism (Block 2011) and a tendency towards institutional inertia (Rayner et al. 2005). The different attitudes of the decision maker to costs and losses can lead to suboptimal decisions (Millner 2009), so there is a complex challenge in demonstrating the effectiveness of model-based decision support systems (Moser 2009). Millner (2008) develops a decision model for use with ensembles which incorporates the user’s cost-loss profile. Weaver et al. (2013) identify one barrier to the increased use of climate models as a failure to incorporate information from the decision science and social science disciplines and argue for a paradigm shift towards a multidisciplinary decision support framework. Marshall et al. (2011) maintain that social factors are important in the take-up of climate model information, and Renn (2011) identifies the perception of risk information and its social amplification as an important issue. End users’ requirements span many factors other than technical model or resolution improvements and include education and training (Wetterhall et al. 2013). The characteristics of good forecast communication, mentioned above, are: (i) Simplicity and Clarity: Spaghetti plots are too complex and potentially confusing (Zappa et al. 2013). This has been recognized by forecasters for some time, e.g., WMO (2003). Demeritt et al. (2010) explain the particular issues that decision makers have with spaghetti plots. Stephens et al. (2012)
review methods of visualization, and Bruen et al. (2010) describe the visualization strategies of some operational systems. (ii) Relevance: This depends on the use of the forecast, e.g., emergency response, hazard warning, or water resources management. The timescale may be immediate, medium, or longer term, typically ranging from single event scales to climate change impacts on water resources. Actively seeking end user feedback and responding accordingly is a key element of maintaining relevance (Demeritt et al. 2013). Contextually sensitive approaches to presenting information are important (PytlikZillig et al. 2010). (iii) Trust: Sector-specific and local sources of information are most likely to be trusted. However, in order for persons to respond correctly to a warning, it is also important that they are familiar with the event; e.g., occupants of a house that has recently been flooded will most likely react differently to a flood warning than occupants who have never experienced this threat. The social aspects of warning are very important and must be correctly understood (Drabek 1999). Individual relationships can be important (Lackstrom et al. 2014), as can the use of established networks (Kirchhoff 2013). Methods of communicating probabilistic forecasts, and their uses, are as numerous as their applications, spanning short, medium, and long ranges as well as local, national, continental, and global scales. A comprehensive review is provided in this handbook by Pappenberger et al. in Section 10 “▶ Ensemble Forecast Application Showcases.” In addition, Alfieri et al. “Flash flood early warning based on precipitation-indices: three examples at the European and regional scale” highlight the growing trend in the hydrological community to use ensemble prediction systems based on radar, nowcasting, and high-resolution short-term forecasting data, or combinations thereof, to improve the prediction of flash floods.
Both information from regional monitoring networks and the expertise of local flood forecasters are being integrated for better forecasting of the location and intensity of events (cf. Alfieri), which is also important for translating hazard forecasts into impact-based forecasting and risk information, as described by Wittwer et al. in “Challenges of decision making in the context of uncertain forecasts in France” (Section 10). Such information is crucial for the translation of hydrological forecasts into actionable warnings which serve as guidance for decision makers, the response community, and the public.

Bates et al. illustrate in “Probabilistic Inundation Modelling” (Section 10) how recent advances in computing power and high-resolution mapping have made probabilistic forecasting of flood inundation possible, allowing the probability of flood occurrence to be quantified with sufficient advance warning for taking useful preventive actions. Taking the June 2013 floods in Central Europe as an example, Bates et al. demonstrate how such applications can work across scales, e.g., how input from a medium-range, continental system such as the European Flood Awareness System (Thielen et al., Section 10) can be successfully combined with a high-resolution hydraulic model (LISFLOOD-FP) to derive probabilistic inundation maps. Such maps could provide important guidance to decision makers and provide information on the temporal as well as spatial uncertainty of the expected flood extent.

Overview of Forecast Communication and Use of Ensemble Hydro-Meteorological. . .


Thus, the range of examples provided in Section 10 shows that ensemble prediction systems have become a trusted and well-established feature of flood forecasting at short as well as medium ranges, for local systems as well as global systems. However, the application of ensemble prediction systems is far from limited to flood forecasting: it extends, for example, to shipping (“Probabilistic shipping forecast” by Meissner et al., Section 10), prediction of droughts (“Seasonal drought forecasting” by Wood et al., Section 10), hydropower (“Hydro-power forecasting in Brazil” by Tucci et al., and “Ensemble forecasting for hydropower in Canada” by Boucher and Ramos, Section 10), and water resources management in general.

The chapters in this section of the handbook aim to summarize a few key issues, illustrated with concrete examples. These complement the examples in Section 10 and present a snapshot of methods of communicating EPS in forecasting at a number of locations in three very different continents: Australia, Asia, and the USA. A broad range of challenges, in both climate and computational resources, is represented. A number of different timescales are involved, ranging from shorter-term flood forecasting for emergency response to longer-term streamflow forecasting for water resources planning.

Tuteja, Zhou, Lerat, Wang, Shin, and Robertson explain that, because of the very high variability of streamflows in Australia, the seasonal forecasts used in practical water resources applications carry a high degree of uncertainty which must be communicated with the forecasts. The focus is on minimizing the risk of misunderstanding as well as on communicating the forecast skill. A Bayesian joint probability model is used (Wang et al. 2009) with a 5000-member ensemble (some results are sensitive to ensemble size) generated from an empirical multivariate model of streamflows.
A number of different forecast skill metrics were evaluated. Forecast skill varies somewhat with season, but the best forecasts are for the latter half of the year, when storages are filling.

Hartman describes different approaches to the specific forecast information communicated in the USA. There, the resources to generate ensemble flood forecasts are widely available, and the issue is whether to generate an uncertainty product from the ensembles and deliver it to the end user, or to deliver all of the raw, unprocessed ensemble forecasts to the end user, allowing them to be analyzed in whatever way the user deems appropriate. Usually the provider of the forecasts is best placed to produce the decision support “tool” for the end user. Use of ensembles in longer-range water resources forecasting in the USA is complicated by the attenuating effects of its many reservoirs: the management of (and releases from) each must be included in the simulations. Interestingly, end users often want to see the related input data, such as precipitation and snowpack forecasts, as these convey significant information for the longer-term predictions. Considerable emphasis is placed on validating EPS forecasts using hindcasts, i.e., simulating the forecasts the system would have made for past weather. This helps to identify bias in an EPS but is computationally demanding.

The following chapter, consisting of a number of forecast communication examples from around the world, addresses the requirements for communicating


ensembles and uncertainty for short-term, medium-term, and long-term applications and decision making. They address (i) how the information is produced and used at different scales, (ii) how the specific needs of decision makers, forecasters, and water managers are reflected in the setup of the early warning systems, and (iii) where ensemble prediction systems can play a useful role and where their application is limited.

Pegram highlights the importance of robust and reliable information for disaster managers when emergency actions such as evacuations are required. Repeated false alarms result in a lack of trust between the public and the decision makers, which is particularly critical when the time to react is limited, e.g., in urban and flash-flood-prone areas. He gives the example of the city of Durban in South Africa and explains how ensembles are generated from radar data to quantify the uncertainty in the rainfall and hydrological response and to produce reliable forecasts for the decision makers. While early warning based on radar ensembles is typically limited to lead times of 0–6 h, high-resolution EPS from numerical weather prediction can provide an early warning indication that potentially critical conditions are possible within a time frame of 1–3 days, as illustrated in the second section by Raynaud. Although not sufficiently precise in terms of location and timing for local civil protection to take specific actions, such indicators can be useful for local authorities to be prewarned, take precautionary actions, and put staff and equipment on standby. In particular for transnational river basins, where typically different agencies and multiple authorities are involved in monitoring, forecasting, and decision making, the additional lead time gained through HEPS proves to be important.
For instance, Sprokkereef, Ebel, and Rademacher illustrate how HEPS have been used for more than 10 years on the Rhine, a river basin shared by nine countries, to calculate probabilistic water levels and discharges, and they describe how these are communicated and distributed on a daily basis to expert users. The added value of the probabilistic forecasts in comparison to deterministic ones has been demonstrated especially for navigation-related water-level forecasts.

Finally, longer-term EPS such as monthly and seasonal forecasts find application more in water resources and hydropower management than in emergency response. As hydropower is the most common form of renewable energy worldwide, it is not surprising that the optimization of hydropower output is of particular interest to different sectors. Olsson, Alionte-Eklund, Johansson, Lindström, and Spångmyr describe how HEPS are used in Sweden for hydropower optimization. The added value of hydrological probability forecasts as decision support for hydropower plant operators is demonstrated, but the limitations with regard to spatiotemporal resolution are also highlighted.

Hirpa, Fagbemi, Afiesimam, Shuaib, and Salamon highlight the special challenges of translating scientific information into practical operations in developing countries, where hydro-meteorological disasters have profound impacts on human lives. While, for example, the previous chapters illustrate advanced early warning systems benefiting from high-density observational networks, skillful weather forecasts and ensemble systems, as well as state-of-the-art web technologies for


communication and information sharing, there are still many places in the world where such levels of sophistication and technology are not yet available. The use and communication of ensemble predictions and uncertainty require a minimum of data sharing and IT infrastructure, while communities without Internet access require different means of information dissemination, such as by phone or through face-to-face conversation. Trans-disciplinary and multi-stakeholder partnerships offering global solutions for basic coverage and provision of medium-range forecasts and near-real-time remote sensing detection of events can play a major role in accelerating the incorporation of science into more effective disaster planning to save human lives and protect the economy.

Demeritt, Stephens, Créton-Cazanave, Lutoff, Ruin, and Nobert conclude this section by investigating the practical challenges of communicating and using ensemble forecasts, using operational flood incident management as an example. Based on recent social science research on the variety and effectiveness of visualizations of hydrological ensemble predictions, the chapter highlights cognitive and other difficulties experienced by users of probabilistic forecasts in understanding the information correctly. Recognizing that the way uncertainty information is presented influences how it is perceived, the chapter illustrates the most common ways of communicating EPS to different users with varying levels of expertise in probabilistic forecasting. It also highlights that probabilistic information can be misunderstood, by the general public but equally by expert users, if the underlying concept of probability is not fully understood. Dialogue between the producers of HEPS and the users, for example, through repeated discussions or training, is identified as a key action for the correct uptake of the information by decision makers.
However, even if the information is well communicated and received, anticipatory actions and decisions based on uncertain forecasts will always carry a risk, and mechanisms for managing that risk still need to be explored further. Accepting that no single solution fits all applications, it emerges that the visualization and communication of HEPS need to be tailored to the respective end users. This is particularly challenging the less well defined the end-user community is, e.g., in continental and global systems.

Scientists and engineers involved in hydro-meteorological forecasting have realized for some time now that their role is not confined to the generation of forecasts and associated uncertainty information. To be effective and make a real difference in people’s lives, this information must be communicated in a convincing way to the right decision makers, whether these are civil protection authorities, water resource managers, or the public. It is the effectiveness of the entire system that must be considered and assessed. The different approaches to what is communicated and how such communication takes place, described above, illustrate how diverse the area can be, how much progress has already been made, and how much more there is to be done. As ensembles are increasingly used in longer-term management and policy decisions, the range of end users and their differing requirements can only expand, and flexibility and adaptability to individual circumstances will be required.


References

P. Block, Tailoring seasonal climate forecasts for hydropower operations. Hydrol. Earth Syst. Sci. 15(4), 1355–1368 (2011)
D.E. Brashers, Communication and uncertainty management. J. Commun. 51(3), 477–497 (2001)
M. Bruen, P. Krahe, M. Zappa, J. Olsson, B. Vehvilainen, K. Kok, K. Daamen, Visualising flood forecasting uncertainty: some current European EPS platforms – COST731 Working Group 3. Atmos. Sci. Lett. 11(2), 92–99 (2010)
M. Dale, J. Wicks, K. Mylne, Probabilistic flood forecasting and decision-making: an innovative risk-based approach. Nat. Hazards 70(1), 159–172 (2014)
D. Demeritt, H. Cloke, F. Pappenberger et al., Ensemble predictions and perceptions of risk, uncertainty, and error in flood forecasting. Environ. Hazard. 7, 115–127 (2007). doi:10.1016/j.envhaz.2007.05.001
D. Demeritt, S. Nobert, H. Cloke, F. Pappenberger, Challenges in communicating and using ensembles in operational flood forecasting. Meteorol. Appl. 17(2), 209–222 (2010)
D. Demeritt, S. Nobert, H.L. Cloke, The European Flood Alert System and the communication, perception, and use of ensemble predictions for operational flood risk management. Hydrol. Process. 27(1), 147–157 (2013)
T.E. Drabek, Understanding disaster warning responses. Soc. Sci. J. 36(3), 515–523 (1999)
S. Drobot, D.J. Parker, Advances and challenges in flash flood warnings. Environ. Hazard. 7, 173–178 (2007)
J. Frick, C. Hegg, Can end-users’ flood management decision making be improved by information about forecast uncertainty? Atmos. Res. 100(2–3), 296–303 (2011)
International Federation of Red Cross and Red Crescent Societies, Community Early Warning Systems: Guiding Principles (2012), 81 pp. https://www.ifrc.org/PageFiles/103323/1227800-IFRCCEWS-Guiding-Principles-EN.pdf
C.J. Kirchhoff, Understanding and enhancing climate information use in water management. Clim. Change 119(2), 495–509 (2013)
K. Lackstrom, N.P. Kettle, B. Haywood, Climate-sensitive decisions and time frames: a cross-sectoral analysis of information pathways in the Carolinas. Weather Clim. Soc. 6(2), 238–252 (2014)
N.A. Marshall, I.J. Gordon, A.J. Ash, The reluctance of resource-users to adopt seasonal climate forecasts to enhance resilience to climate variability on the rangelands. Clim. Change 107(3–4), 511–529 (2011)
A. Millner, Getting the most out of ensemble forecasts: a valuation model based on user-forecast interactions. J. Appl. Meteorol. Climatol. 47(10), 2561–2571 (2008)
A. Millner, What is the true value of forecasts? Weather Clim. Soc. 1(1), 22–37 (2009)
S. Moser, Making a difference on the ground: the challenge of demonstrating the effectiveness of decision support. Clim. Change 95(1–2), 11–21 (2009)
L.M. PytlikZillig, Q. Hu, K.G. Hubbard, Improving farmers’ perception and use of climate predictions in farming decisions: a transition model. J. Appl. Meteorol. Climatol. 49(6), 1333–1340 (2010)
S. Rayner, D. Lach, H. Ingram, Weather forecasts are for wimps: why water resources managers do not use climate forecasts. Clim. Change 69(2–3), 197–227 (2005)
O. Renn, The social amplification/attenuation of risk framework: application to climate change. Wiley Interdiscip. Rev. Clim. Chang. 2(2), 154–169 (2011)
A. Rossa, K. Liechti, M. Zappa, M. Bruen, U. Germann, G. Haase, C. Keil, P. Krahe, The COST 731 action: a review on uncertainty propagation in advanced hydro-meteorological forecast systems. Atmos. Res. 100(2–3), 150–167 (2010)
E.M. Stephens, T.L. Edwards, D. Demeritt, Communicating probabilistic information from climate model ensembles – lessons from numerical weather prediction. Wiley Interdiscip. Rev. Clim. Chang. 3, 409–426 (2012). doi:10.1002/wcc.187
J. Thielen, K. Bogner, F. Pappenberger, M. Kalas, M. del Medico, A. de Roo, Monthly-, medium-, and short-range flood warning: testing the limits of predictability. Meteorol. Appl. 16, 77–90 (2009). doi:10.1002/met.140
S.W.D. Turner, D. Marlow, M. Ekstrom, B.G. Rhodes, U. Kularathna, P.J. Jeffrey, Linking climate projections to performance: a yield-based decision scaling assessment of a large urban water resources system. Water Resour. Res. 50(4), 3553–3567 (2014)
Q.J. Wang, D.E. Robertson, F.H.S. Chiew, A Bayesian joint probability modeling approach for seasonal forecasting of streamflows at multiple sites. Water Resour. Res. 45, W05407 (2009)
C.P. Weaver, R.J. Lempert, C. Brown, Improving the contribution of climate model information to decision making: the value and demands of robust decision frameworks. Wiley Interdiscip. Rev. Clim. Chang. 4(1), 39–60 (2013)
F. Wetterhall, F. Pappenberger, L. Alfieri, H.L. Cloke, J. Thielen-del Pozo, S. Balabanova, J. Danhelka, A. Vogelbacher, P. Salamon, I. Carrasco, Hydrol. Earth Syst. Sci. 17(11), 4389–4399 (2013)
WMO, Present and planned configurations of ensemble prediction systems at the National Centers for Environmental Prediction (NCEP). Report CBS ET/EPS/Doc.3(6) of WMO Expert Team on Ensemble Prediction Systems (2003)
M. Zappa, K.J. Beven, M. Bruen, A. Cofino, K. Kok, E. Martin, P. Nurmi, B. Orfila, E. Roulin, K. Schröter, A. Seed, J. Stzurc, B. Vehviläinen, U. Germann, A. Rossa, Propagation of uncertainty from observing systems and NWP into hydrological models: COST-731 Working Group 2. Atmos. Sci. Lett. 11(2), 83–91 (2010)
M. Zappa, F. Fundel, S. Jaun, A ‘Peak-Box’ approach for supporting interpretation and verification of operational ensemble peak-flow forecasts. Hydrol. Process. 27(1), 117–131 (2013)

Best Practice in Communicating Uncertainties in Flood Management in the USA Robert K. Hartman

Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 Conveying Probabilistic Streamflow Information . . . . . . . . . . . . . . . . . . . 2
2.1 Static Product Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 Interactive “User” Product Generation . . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 Provision of “Raw” and “Postprocessed” Ensembles . . . . . . . . . . . . . . 6
2.4 Managing Expectations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3 Applications to Water Resources Forecasting . . . . . . . . . . . . . . . . . . . . . 9
4 Hindcasting and Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.1 Validation and Associated Services . . . . . . . . . . . . . . . . . . . . . . . . . . 15
5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

Abstract

Ensemble forecasting has gained a great deal of popularity for addressing and estimating uncertainty associated with both meteorologic and hydrologic forecasts over the past decade. While ensemble-based hydrologic forecasts have been in routine operations for longer-term forecasts for many years, the notion of short- and medium-term probabilistic forecasts in support of water and flood management efforts is relatively new and is a developing science and service. Approaches to effectively conveying and communicating hydrologic forecast uncertainty are being actively developed and vetted with potential user communities. Important experience and insight will be gained over the next few years as the community of developers, forecasters, and end users works to leverage probabilistic forecasts in a risk-based decision environment. With proper focus and support, these efforts

R.K. Hartman (*) California-Nevada River Forecast Center, NOAA, National Weather Service, Sacramento, CA, USA e-mail: [email protected]

# Springer-Verlag Berlin Heidelberg (outside the USA) 2015 Q. Duan et al. (eds.), Handbook of Hydrometeorological Ensemble Forecasting, DOI 10.1007/978-3-642-40457-3_42-1


have the potential to significantly improve flood, ecosystem, and water management, with benefits to multiple sectors of our society.

Keywords

Ensemble • Communication • Probability • Risk • Uncertainty • Hydrology • Water resources • Hindcasting

1 Introduction

Hydrologic ensemble forecasting procedures have made great strides over the past decade. Progress has been attributable to a growing acceptance that uncertainty is something that can be leveraged to make more informed decisions (National Research Council of the National Academies 2006) and to substantial community support, as evidenced by the success of the Hydrological Ensemble Prediction Experiment (HEPEX; www.hepex.org).

Among the most vexing challenges of the hydrologic ensemble prediction process is the appropriate conveyance of uncertainty information to decision makers. These decision makers represent many sectors (e.g., emergency services, power generation, recreation, agriculture, navigation, municipal water supply, industry, ecological management). Each has different and specialized needs, and each has a different risk tolerance. In addition, their statistical backgrounds vary from nearly nonexistent (i.e., someone who plays the lotto) to very sophisticated.

This chapter describes approaches and examples of how ensemble-based hydrologic forecast information is conveyed to users by the US National Weather Service today. Positive and negative attributes of the approaches, along with challenges, are presented.

2 Conveying Probabilistic Streamflow Information

There are at least four fundamental approaches that can be used to provide uncertainty information associated with forecast streamflow. The oldest approach is to simply accommodate the expected “error” as a function of the user’s substantial experience with forecasts over time and, in particular, with events that were memorable. This anecdotal uncertainty is what the hydrologic forecast and user community is attempting to supplant with objective information generated through ensemble techniques. More quantitative vehicles include:

• Generating a collection of ensemble-based products (text and graphics)
• Providing access to an interface that can create custom ensemble-based products (text and graphics)
• Providing ensemble members (data) that can be analyzed by end users in their own decision support architecture

Each approach has benefits and challenges, and experience has shown that all three, together, may represent a more reasonable approach.


2.1 Static Product Generation

Ensemble forecasts can be analyzed to address a seemingly infinite number of information requirements. This flexibility is beneficial, but it also creates challenges for forecast producers. What are the “best” sets of static graphics and text products that meet the greatest need for information? The fundamental questions associated with product generation are:

• What is the time period of interest (e.g., next 3 days, next 2 weeks, month of June)?
• What is the data aggregation period (e.g., hourly, daily, weekly, monthly, seasonal, annual)?
• What aspect of flow is of interest (e.g., summation (volume), mean, peaks, minimums, time to a threshold of interest)?

The more precisely these questions are addressed, the more useful the information for a specific application. This sounds simple enough, but ultimately choices must be made if the number of routinely generated products is to be limited.

One of the most vexing issues encountered in generating ensemble-based graphics involves the impact of time aggregation on probability. Customers of hydrologic forecast information really want to see a “hydrograph” with associated uncertainty or “error bars.” What they actually get is a series of histograms, at the time-step of analysis, placed side by side in sequential order. Interpretation of these sorts of graphics can very easily lead to the wrong conclusions.

Look first at Fig. 1. This graphic includes ten (10) 1-day histograms for flow and stage. It has the look of a hydrograph, but the 1-day time-step defeats that interpretation tendency to some degree. One interpretation of this graphic might be that “the river has less than a ten percent chance of exceeding 12 feet over the next 5 days.” Now compare this with Fig. 2, which shows the distribution of peak flows within the coming 5-day period. This graphic suggests that the “river has a probability of between 25 % and 50 % of exceeding 12 feet over the next 5 days.” The proper interpretation of Fig. 1 is “the river has less than a ten percent chance of exceeding 12 feet on any of the next 5 individual days”; Fig. 2, however, indicates that collectively (all 5 days considered together), the probability of exceeding 12 feet is much higher. This phenomenon becomes more pronounced as the time-step of the analysis gets shorter. For example, if the time-step is reduced to 1 h, Fig. 1 really begins to look like a hydrograph, and the likelihood of any extremes (peaks or minimums) appears further reduced, because their likelihood in any specific hour is lower than it is for any day or for the entire period of consideration (e.g., 5 days). Best practices, therefore, limit the generation of graphics that are easily misinterpreted.

It is important for the users of hydrologic ensemble forecasts to continually remember that the generated products are simply an interpretation of the current set of ensemble members. For that reason, it remains good practice to provide a trace plot (spaghetti) among the set of routinely generated graphics. The 10-day trace plot that serves as the basis for the information in Figs. 1 and 2 is shown in Fig. 3. Note


Fig. 1 Streamflow histogram, 1-day duration for Navarro River in California

that while the analyzed probabilities shown in Figs. 1 and 2 may be well below the region of concern for an emergency/resource manager, a limited number of traces may be very problematic and are therefore very much worth being aware of.

Operators of reservoirs are normally more interested in volumetric (often multiday) forecasts of inflow than in instantaneous or single-day inflows. Understanding that the 1-day flows shown in Fig. 1 cannot be added together to form a probabilistic multiday volume, graphics such as that shown in Fig. 4 can serve a critical need of the reservoir management community. Again, without the provision of an accumulated volume plot (Fig. 4), a reservoir operator might be led to misinterpret a daily histogram (e.g., Fig. 1).

For some time, River Forecast Centers in the USA have generated 90-day graphics that depict weekly probabilities of maximum stage (Fig. 5) and the maximum stage probability distribution over the entire 90-day period (Fig. 6). These sorts of graphics are particularly valuable when preparing for spring snowmelt flooding as


Fig. 2 Streamflow histogram, 5-day duration for Navarro River in California

often occurs in the upper Midwest of the USA. As with all longer-range products, they rely heavily on the information content of the model states.
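The two pitfalls above (daily histograms understating period exceedance, and daily values that cannot be summed into a volume) share one remedy: compute the statistic within each ensemble trace first, and only then form probabilities or percentiles across traces. The following is a minimal sketch of that principle with synthetic traces; the threshold and flow values are invented for illustration and are not Navarro River or Lake Sonoma data:

```python
import random

random.seed(1)
threshold = 12.0            # stage/flow threshold (illustrative units)
n_members, n_days = 50, 5

# Synthetic ensemble: each member is one 5-day trace.
ensemble = [[max(0.0, random.gauss(9.0, 2.0)) for _ in range(n_days)]
            for _ in range(n_members)]

# Per-day exceedance probabilities (what side-by-side daily histograms show).
daily_prob = [sum(m[d] > threshold for m in ensemble) / n_members
              for d in range(n_days)]

# Period exceedance: fraction of traces whose 5-day PEAK exceeds the
# threshold. A trace counts if it exceeds on ANY day, so this value is
# always at least as large as every individual daily probability.
period_prob = sum(max(m) > threshold for m in ensemble) / n_members
assert all(period_prob >= p for p in daily_prob)

# Volumes: accumulate within each trace, then rank the totals across
# members. Summing the daily medians instead would discard the spread.
totals = sorted(sum(m) for m in ensemble)
vol_10, vol_50, vol_90 = (totals[int(q / 100 * n_members)] for q in (10, 50, 90))
assert vol_10 <= vol_50 <= vol_90

print(daily_prob, period_prob, (vol_10, vol_50, vol_90))
```

This trace-wise logic is exactly the distinction between the daily histograms of Fig. 1 on the one hand and the period-peak and accumulated-volume products of Figs. 2 and 4 on the other.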

2.2 Interactive “User” Product Generation

Given the diversity of interests in streamflow projections and the multitude of options for periods, durations, and flow attributes (maximum, minimum, mean, summation, and time to a threshold of interest), providing customers with a tool to analyze a set of ensembles using their specific criteria makes a lot of sense. This sort of feature does come with risk, as it assumes that the user is well informed enough to make the required selections and properly interpret the results. Such users represent a minority of the total number of forecast customers, but for those who use it, it is a very important and powerful service. Figure 7 shows the interface supported by the California-Nevada River Forecast Center. A substantial “help” section is provided to assist users in navigating the options and interpreting the product generated. This


Fig. 3 Ensemble streamflow traces for Navarro River in California

sort of interface allows users to “narrow” the scope of their information need and generate products that directly address their requirements.
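The analysis behind such a “create-your-own” interface reduces to applying a user-chosen statistic over a user-chosen window to every ensemble trace. The sketch below illustrates that core loop; the function name, option names, and data are invented for illustration and do not reflect the actual CNRFC implementation:

```python
def analyze_ensemble(ensemble, start, end, statistic, threshold=None):
    """Apply a per-trace statistic over the window [start, end)."""
    stats = {
        "max": max,
        "min": min,
        "mean": lambda w: sum(w) / len(w),
        "sum": sum,
        # First time step (within the window) at which the threshold is
        # reached, or None if the trace never reaches it.
        "time_to_threshold": lambda w: next(
            (i for i, v in enumerate(w) if v >= threshold), None),
    }
    fn = stats[statistic]
    return [fn(trace[start:end]) for trace in ensemble]

# Example: peak flow of each member over days 2-6, then an exceedance count.
ensemble = [[5, 8, 12, 9, 4, 3, 2],
            [4, 6, 7, 11, 10, 5, 3],
            [3, 4, 5, 6, 5, 4, 2]]
peaks = analyze_ensemble(ensemble, 2, 7, "max")
print(peaks)  # [12, 11, 6]
prob_exceed_10 = sum(p > 10 for p in peaks) / len(peaks)
print(prob_exceed_10)  # 2 of 3 members exceed a flow of 10
```

Every product family discussed in this chapter (histograms, peak distributions, accumulated volumes, time-to-threshold tables) is a different choice of `statistic` and window applied in this same way.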

2.3 Provision of “Raw” and “Postprocessed” Ensembles

While forecasters and developers struggle with the “best” ways of describing the uncertainty of hydrologic ensemble forecasts with complex and often difficult-to-interpret graphics, the most effective practice may be to simply provide the data and allow the customer to perform an analysis that is meaningful to them. For sophisticated users with resources, this is clearly the most effective alternative. As an example, a reservoir operator with a model that can simulate operations can easily process each member of an ensemble set to evaluate the benefits/costs of a selected release strategy. Alternatively, that same operator would have to use their imagination to understand how a histogram of daily inflow probabilities will impact their regulation strategy. The difference is profound. Substantial progress is being made on this front. In California alone, the INFORMS project (Georgakakos et al. 2007) as well as the Yuba-Feather Forecast Coordinated Operations (FCO)


[Figure: 10-day accumulated volume plot for Dry Creek – Lake Sonoma (WSDC1; 38.72° N, 123.01° W; elevation 440 ft; Russian/Napa river group, California), issued Feb 28, 2014, 9:46 AM PST. Traces show the 10 % (max), 50 %, and 90 % (min) exceedance accumulations in 1000s of acre-feet for March 1–10. Source: California Department of Water Resources / NWS California Nevada River Forecast Center.]

Fig. 4 10-day accumulated inflow volume for Lake Sonoma in California

project have engineered solutions to leverage the full potential of ensemble forecasts in a decision support model. Figure 8 shows the conceptual process schematic for the Yuba-Feather FCO ensemble-based decision support model.

Just as ensemble Numerical Weather Prediction (NWP) models exhibit biases and inappropriate spread, so too will the hydrologic forecasts without some sort of postprocessing methodology (Demargne et al. 2014). Sophisticated users, however, have a choice in this process. They may choose to accept the “raw” ensembles without the benefit of postprocessing and instead apply their own error correction processes based on an adequately long history of performance. Such is the case for the ensemble forecast services provided to the New York City Department of Environmental Protection (NYCDEP) by the US National Weather Service. This may be the most efficient way of objectively accounting for ensemble reliability issues, but it also may result in additional workload for operational entities required to issue both raw and postprocessed ensemble information.
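The kind of error correction a sophisticated user might apply to “raw” ensembles can be as simple as removing the mean bias estimated from a hindcast history. The following is a deliberately minimal sketch with invented numbers; operational postprocessors of the sort referenced above (Demargne et al. 2014) are far more elaborate:

```python
def mean_bias(hindcast_means, observations):
    """Average (forecast - observed) error over a hindcast history."""
    errors = [f - o for f, o in zip(hindcast_means, observations)]
    return sum(errors) / len(errors)

def debias_ensemble(raw_members, bias):
    """Shift every value of every raw ensemble member by the estimated bias."""
    return [[v - bias for v in member] for member in raw_members]

# Hindcast ensemble-mean flows vs. observed flows (illustrative numbers).
hindcast_means = [110.0, 95.0, 130.0, 80.0]
observations = [100.0, 90.0, 120.0, 75.0]
bias = mean_bias(hindcast_means, observations)
print(bias)  # 7.5: this hypothetical system runs high on average

raw = [[104.0, 99.0], [112.0, 108.0]]
corrected = debias_ensemble(raw, bias)
print(corrected)  # [[96.5, 91.5], [104.5, 100.5]]
```

A uniform shift of this kind addresses mean bias only; correcting under- or over-dispersive spread requires distribution-aware methods (e.g., quantile mapping), which is one reason the provision of properly postprocessed ensembles alongside raw ones remains valuable.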


R.K. Hartman

Fig. 5 Web interface for “Create Your Own” ensemble product

2.4

Managing Expectations

It is clear that, given all of the assumptions that must be made to generate an ensemble-based hydrologic forecast, there will be uncertainty in the estimates of uncertainty. The “discrimination” in the system may not be able to reliably differentiate between 85 % and 90 % probability of exceedance. Further, work is needed to help users understand that some risk must be assumed if one expects to leverage uncertainty in the long run. If you need to be 99 % sure before you will take action, you will likely miss a lot of opportunities.


Fig. 6 Weekly histogram of maximum stage for the Red River at East Grand Forks, Minnesota (latitude 47.9, longitude –97.0). Conditional simulation based on conditions as of 2/18/2014, for the period 2/25/2014–5/20/2014; weekly maximum stage (ft) shown against the flood level of 28.0 ft and the moderate and major flooding thresholds, with exceedance probability bands of 10–50 %, 50–90 %, and >=90 %

3

Applications to Water Resources Forecasting

Ensemble applications to water resources forecasting are not new but have seen substantial growth and acceptance over the past decade. Early work appeared in the 1970s, and the National Weather Service formalized a process within its forecasting system in the mid-1980s (Day 1985). Despite this, the predominant approach for seasonal streamflow forecasting in the Western USA has remained some form of regression modeling driven with data available on a monthly basis (Garen 1992). More recently, forecasters have begun to integrate daily observations, and the US National Weather Service is in the process of shifting toward full reliance upon ensemble processes evaluated every day and year-round. The attributes of relying on ensemble processes for longer-range water resources forecasting include:
– Integration with short-term hydrologic forecasting procedures


Fig. 7 90-day exceedance probability plot of maximum stage for the Red River at East Grand Forks, Minnesota (latitude 47.9, longitude –97.0), for the period 2/25/2014–5/26/2014. The conditional simulation (CS), based on conditions as of 2/18/2014, is plotted against the historical simulation (HS) for exceedance probabilities from 99 % to 1 %; flood level 28.0 ft, with minor (28.0–40.0 ft), moderate (40.0–46.0 ft), and major (above 46.0 ft) flooding categories

– Use and integration of near real-time observations (e.g., precipitation, air temperature, streamflow)
– Integration of current weather and climate forecast information
– Ability to update on a daily basis
As this transition takes place, forecasters are experimenting with graphical products that describe both the uncertainty and how the forecasts have changed or "trended" over time. Figures 9, 10, 11, and 12 show potential candidate graphics that describe expected volume over time. The "trend plots" in Figs. 10 and 11 do not show uncertainty but do show how the 50 % exceedance probability forecast of seasonal volume has changed over time. Note the precipitous drop in expected water supply volume during the fall and early winter, as California received only a very small percentage of normal precipitation during this period. Assuming that the ensemble forecasts leverage the skill in the weather and climate forecasts, one can quickly see that the forecasts were not effective in detecting the coming drought with much, or any, lead time. This may seem discouraging, but it highlights the need, value, and process for leveraging improved seasonal weather predictions through a hydrologic ensemble forecasting framework. In the water resources services domain, the streamflow forecast alone is not adequate to provide customers with a complete picture of the water supply situation. In many areas, reservoirs provide a buffer for interannual variation as


Fig. 8 Schematic of data flow for Yuba-Feather Forecast Coordinated Operations ensemble-based decision support model (by permission of David Ford Consulting Engineers, Sacramento, CA)

well as a means for shifting runoff from the time of occurrence to the time of need (e.g., irrigated agriculture). Information that summarizes and combines the expected runoff with the existing reservoir storage is critical for assessment purposes. In addition, water supply customers have historically expressed a need to see information that supports the streamflow forecast itself, such as monthly and seasonal precipitation and snowpack. Comparison of precipitation and snowpack expressed as a percent of normal provides excellent context for understanding and establishing confidence in a specific volumetric seasonal streamflow forecast.
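The exceedance-probability traces shown in volume products like those above can be derived as empirical quantiles of the ensemble. A minimal sketch follows; the linear-interpolation plotting position and all numbers are illustrative assumptions, not the forecast centers' operational procedure.

```python
def exceedance_quantile(members, p_exceed):
    """Value exceeded with probability p_exceed, by linear interpolation
    between the sorted ensemble members (one simple plotting-position
    choice; operational systems may use others)."""
    xs = sorted(members)
    idx = (1.0 - p_exceed) * (len(xs) - 1)  # non-exceedance position
    lo = int(idx)
    hi = min(lo + 1, len(xs) - 1)
    frac = idx - lo
    return xs[lo] * (1.0 - frac) + xs[hi] * frac

# Hypothetical 10-member accumulated-volume ensemble (thousand acre-feet).
volumes = [12.0, 15.5, 9.8, 22.3, 18.1, 11.2, 14.7, 25.0, 13.3, 16.9]
for p in (0.10, 0.50, 0.90):
    print(f"{p:.0%} exceedance volume: {exceedance_quantile(volumes, p):.1f}")
```

Recomputing these quantiles each time a new forecast is issued, and plotting the 50 % trace against issuance date, yields exactly the kind of trend plot described in the text.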

4

Hindcasting and Validation

The key attribute of ensemble-based probabilistic hydrologic forecasts that makes them useful is reliability. Reliability means that the ensemble members, as a package, (1) are unbiased and (2) have appropriate spread. If the ensembles have not been demonstrated to be adequately reliable, the user incurs a great deal of risk when applying the information contained in the ensembles to their specific decision-making process.
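One common way to test the two reliability conditions above is a rank (Talagrand) histogram: over many cases, count where the observation falls among the sorted ensemble members. A reliable ensemble yields a flat histogram; a U shape indicates under-spread or bias, a dome indicates over-spread. The sketch below uses synthetic data that is reliable by construction; it is an illustration, not a method prescribed by the chapter.

```python
import random

random.seed(7)
n_members, n_cases = 9, 5000
counts = [0] * (n_members + 1)

for _ in range(n_cases):
    truth = random.gauss(0.0, 1.0)
    # Members drawn from the same distribution as truth: reliable by construction.
    members = [random.gauss(0.0, 1.0) for _ in range(n_members)]
    rank = sum(1 for m in members if m < truth)  # observation rank: 0..n_members
    counts[rank] += 1

freqs = [round(c / n_cases, 3) for c in counts]
print("rank relative frequencies:", freqs)
print(f"flat target for a reliable ensemble: {1 / (n_members + 1):.3f} per bin")
```

Running the same tally on a biased or under-dispersive ensemble (e.g., members drawn with a shifted mean or smaller standard deviation) produces the skewed or U-shaped histograms that signal unreliability.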


Fig. 9 Monthly volume histogram for one year for the Feather River inflow to Lake Oroville (ORDC1), Butte County, California (latitude 39.53° N, longitude 121.52° W, elevation 992 ft; river group: Lower Sacramento). Issued Feb 28, 2014, at 9:13 AM PST; conditional simulation of monthly volume (ac-ft), based on conditions as of 2/28/2014, for the period 2/28/2014–1/28/2015, with exceedance probability bands of 10–25 %, 25–50 %, 50–75 %, 75–90 %, and >=90 %

The process of "hindcasting" is well established (Demargne et al. 2014). In essence, the complete hydrologic forecasting system is run retrospectively to create the set of forecasts that would have been generated over an adequately long period of time. That period is normally constrained to the last 25 or 30 years by the availability of the numerical weather prediction (NWP) models used to force the hydrologic model set (Hamill et al. 2013). Generating the NWP hindcasts requires a great deal of computing resources and is therefore expensive. Once generated, the NWP hindcast serves as a rich dataset that can be used to understand the behavioral climatology of the specific NWP model. Everyone accepts that NWP models are not perfect. They are biased to some extent and exhibit


Fig. 10 Seasonal (April-July) accumulated volume trend plot for the Feather River inflow to Lake Oroville in California

uncertainty. The hindcasts allow for measurements of the bias (difference between forecasts and observations) and uncertainty (correlation between forecasts and observations). This hindcast analysis information allows for the proper interpretation of "today's" NWP model run and the effective integration of the NWP forecast information into the hydrologic ensemble forecast process. One approach for doing so is well described by Demargne et al. (2014). It is important to note that the NWP hindcasts are specific to a model and its parameterization at the time of hindcast generation. If changes are made to the NWP model, the hindcasts would simply no longer apply and would need to be rerun, reanalyzed, and reintegrated into the hydrologic ensemble forecast process. Further, customers of the hydrologic ensembles may need to adjust their decision models to accommodate any resulting shifts. For these reasons, it is critically important that "frozen versions" of NWP models be operationally supported when the user community depends on representative hindcasting information. Further, it is important to


Fig. 11 Water Year (October-September) accumulated volume trend plot for the Feather River inflow to Lake Oroville in California

recognize that both the hydrologic forecast community and the user community need time (months) to integrate new NWP hindcast information before a “frozen version” is operationally discontinued. While the hindcast process provides a way to understand the behavior of the complete forecasting system and resulting information, it is not perfect. Fully replicating the somewhat interactive hydrologic forecasting process in practice today is not feasible. It is generally accepted that hydrologic forecasters add value (reduce errors) through interaction with the hydrologic forecasting modeling system. This might take the form of small adjustments to forcing data (precipitation or air temperature) or adjustments to model states to better align model simulations with observations of streamflow during the recent observed period. Even if you were able to insert a forecaster into the hindcast process (very labor intensive), it would be extremely difficult to replicate the full forecasting environment that influences human decision-making. As such, the hydrologic ensemble hindcasts are an


Fig. 12 Probabilistic water year (October-September) accumulated volume trend plot for the Feather River inflow to Lake Oroville in California

approximation of what we should expect from the current forecast process, but they may exhibit slightly more uncertainty as they do not benefit from forecaster experience and interaction.
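The two hindcast summary measures mentioned earlier, bias (the mean forecast-minus-observation difference) and forecast-observation correlation, reduce to simple statistics over the hindcast pairs. The sketch below uses invented numbers purely to make the computation concrete.

```python
def mean_bias(forecasts, observations):
    """Average forecast-minus-observation difference over hindcast pairs."""
    return sum(f - o for f, o in zip(forecasts, observations)) / len(forecasts)

def correlation(xs, ys):
    """Pearson correlation between hindcast forecasts and observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical hindcast of a seasonal volume (same units for both lists).
fcst = [110, 95, 130, 80, 150, 100, 90, 120, 105, 140]
obs = [100, 90, 125, 85, 140, 95, 88, 110, 100, 130]

print(f"mean bias: {mean_bias(fcst, obs):+.1f} (forecasts run high)")
print(f"correlation: {correlation(fcst, obs):.3f}")
```

A positive mean bias computed this way is exactly the quantity that must be re-estimated, and removed, every time the underlying NWP model changes and the hindcast is rerun.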

4.1

Validation and Associated Services

With all their conditions and issues, hydrologic ensemble forecast hindcasts offer keen insight into the value of current probabilistic hydrologic forecasts. They provide the body of information that allows for the development of trust. Ensembles allow for a nearly infinite number of questions to be addressed. In turn, the hindcast analysis allows one to assess the reliability of the information used to address those very same questions.


Ensemble verification capabilities such as those described by Brown et al. (2010) provide the flexibility and rigorous statistical testing needed. Substantial education and training are needed to help consumers of this information fully understand the implications of the ensemble forecast verification metrics and how they affect their specific decision-making processes. Substantial work will be required to create validation information that is general enough to apply to most cases and specific enough to build and demonstrate value and trust.

5

Conclusion

The pace of improvement in hydrologic forecasts is steady but very slow. Rather than waiting for the perfect forecast, a great deal of value and insight can be gained by understanding and leveraging the uncertainty associated with today's forecasts. Resource managers are pressed harder every day to work "smarter." Integrating risk into every aspect of decision-making is warranted as long as the information being used is reliable and understood. While hydrologic forecasters and water resource managers have years of experience in using probabilistic seasonal streamflow volume forecasts, the notion and technology of short- and medium-range probabilistic hydrologic forecasting are quite new. Developers, forecasters, and users are challenged to create, provide, and integrate probabilistic information that will yield understanding and improved outcomes for end users.

References
J.D. Brown, J. Demargne, D.-J. Seo, Y. Liu, The Ensemble Verification System (EVS): a software tool for verifying ensemble forecasts of hydrometeorologic and hydrologic variables at discrete locations. Environ. Model. Software 25, 854–872 (2010)
G.N. Day, Extended streamflow forecasting using NWSRFS. J. Water Resour. Plan. Manag. 111, 157–170 (1985)
J. Demargne, L. Wu, S.K. Regonda, J.D. Brown, H. Lee, M. He, D.-J. Seo, R. Hartman, H.D. Herr, M. Fresch, J. Schaake, Y. Zhu, The science of NOAA's operational hydrologic ensemble forecast service. Bull. Am. Meteorol. Soc. 95, 79–98 (2014)
D.C. Garen, Improved techniques in regression-based streamflow volume forecasting. J. Water Resour. Plan. Manag. 118(6), 654–670 (1992)
K.P. Georgakakos, N.E. Graham, A.P. Georgakakos, H. Yao, Demonstrating Integrated Forecast and Reservoir Management (INFORM) for Northern California in an operational environment. IAHS Publ. 313, 1–6 (2007)
T.M. Hamill, G.T. Bates, J.S. Whitaker, D.R. Murray, M. Fiorino, T.J. Galarneau, Y. Zhu, W. Lapenta, NOAA's second generation global medium-range ensemble reforecast dataset. Bull. Am. Meteorol. Soc. 94, 1553–1565 (2013)
National Research Council of the National Academies, Completing the Forecast: Characterizing and Communicating Uncertainty for Better Decisions Using Weather and Climate Forecasts (The National Academies Press, Washington, DC, 2006), p. 124

Saving Lives: Ensemble-Based Early Warnings in Developing Nations Feyera A. Hirpa, Kayode Fagbemi, Ernest Afiesimam, Hassan Shuaib, and Peter Salamon

Contents 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 Regional Disparities of Disaster Impacts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1 Flood Disaster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Drought Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Early Warnings from Ensemble-Based Systems Could Save Lives . . . . . . . . . . . . . . . . . . . 3 The State of Early Warning Systems in Regions with Fragmented Infrastructure: Nigeria Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1 The 2012 Flood in Nigeria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 The Need for Early Warning Systems in Nigeria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3 Ensemble Forecast for the 2012 Flood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 Global Partnerships to Address Local Problems: Joining Forces . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1 Global Flood Partnership . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 Integrated Drought Management Programme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .


F.A. Hirpa (*) • P. Salamon
European Commission, Joint Research Centre (JRC), Institute for Environment and Sustainability (IES), Climate Risk Management Unit, Ispra, VA, Italy
e-mail: [email protected]; [email protected]
K. Fagbemi
National Emergency Management Agency (NEMA), Abuja, Nigeria
e-mail: [email protected]
E. Afiesimam
Nigerian Meteorological Agency (NiMet), Abuja, Nigeria
e-mail: ernafi[email protected]
H. Shuaib
University of Abuja, Abuja, Nigeria
e-mail: [email protected]
© Springer-Verlag Berlin Heidelberg 2015
Q. Duan et al. (eds.), Handbook of Hydrometeorological Ensemble Forecasting, DOI 10.1007/978-3-642-40457-3_43-1


F.A. Hirpa et al.

Abstract

Natural disasters disproportionately affect developing nations due to the lack of effective early warning systems. In this chapter, we present the needs, challenges, and opportunities of early warning systems in developing nations for decision making in disaster risk management and demonstrate the added value of ensemble forecasting, in particular in data- and infrastructure-scarce regions. First, we review the global extent of flood and drought disaster damages to human lives and the economy over the last few decades and demonstrate that a disproportionately high rate of death (per event) occurred in developing regions, where operational early disaster warning systems are absent or ineffective. Next, we present the everyday needs and challenges of preparing for and responding to natural disasters in Nigeria, a typical developing country with fragmented data infrastructure and limited national early warning capability. In particular, we share experiences from the most recent major flood disaster and demonstrate the potential value of ensemble-based flood early warnings, using streamflow forecasts from the Global Flood Awareness System (GloFAS). However, forecasting disasters alone is not sufficient if the information is not translated into actionable advice at the local community level. This is particularly important for ensemble forecasting, which requires training for the forecasters as well as for the receiving authorities. To achieve this, technical knowledge and communication infrastructure are needed to deliver early warning information to the relevant communities and concerned authorities. Multi-stakeholder partnerships bringing together the scientific community, policy and decision makers, and end users from the international to the local level could help humanitarian aid organizations and decision makers understand and use ensemble predictions on a timely basis before, during, and after disaster strikes.
The chapter concludes by highlighting the multi-stakeholder partnership initiatives on floods (Global Flood Partnership (GFP)) and droughts (Integrated Drought Management Programme (IDMP)), established with the common goal of reducing flood and drought risk across the globe. Keywords

Ensemble forecasting • Early warning • Risk reduction • Disaster response • Global Flood Partnership • Integrated Drought Management Programme

1

Introduction

Floods and droughts rank among the world's deadliest and costliest natural disasters. Since the turn of the twentieth century, almost 5000 major hydrological disasters have occurred (CRED 2015); more than seven million people have been killed; over 3.5 billion have been affected; and estimated damages of 650 billion USD have been reported. During the same period, droughts have been responsible for over 11 million deaths and 134 million USD of economic damages (CRED 2015; Hirpa et al. 2016).


While these disasters occur worldwide, they disproportionately affect the poor, with more than 97 % of the associated deaths occurring in the developing world and the disasters amounting to a significant portion of their GDP (Pilon 2002; CRED 2015). Therefore, national authorities as well as the international community are seeking to cope with and adapt to these phenomena and to reduce their impact through risk prevention policies. Effective risk prevention policy extends over a set of long- to short-term actions. Long-term measures include action on prevention and adaptation, such as planning (control of urban development and construction following legal rules and risk prevention master plans), construction works resilient to disaster risk, and educating experts, decision makers, and the public. Shorter-term measures include monitoring, forecasting, and warning before disaster strikes. The importance of strengthening early warning systems and tailoring them to the needs of users (including quantification and communication of uncertainty) remains one of the priorities of the Sendai Framework for Disaster Risk Reduction 2015-2030 (UNISDR 2015). Forecasting hazards and the associated uncertainty is an important element of any early warning system. However, to fully understand the risk and consequently translate it into actionable warning information, the forecast needs to be complemented with information on exposure, vulnerability, and impact over a wide range of spatial and temporal scales. For instance, major flooding in a rural area with no population may not appear critical, while a flood of the same magnitude could cause significant damage in an urbanized area with a large population. Conversely, flooding in a generally unpopulated area can turn into a disaster if temporary camps or public events are located there when it strikes.
An effective risk reduction procedure, thus, should take all three components – hazard, exposure, and vulnerability – into consideration. Many of the methods, tools, and applications described in this handbook require a functioning data collection and sharing infrastructure, remote sensing tools, hydrometeorological models with different levels of sophistication, and broad expert knowledge in scientific methods and data handling. Furthermore, effective risk reduction requires platforms for communicating the information to end users and decision makers and, at the end of the chain, skilled end users who understand the hazard information. While numerous initiatives and services exist, such prerequisites are still not available in several countries, particularly developing ones; consequently, large numbers of human lives are often lost to natural disasters, as indicated in this chapter. To this end, we discuss the regional variations of disaster (flood and drought) impacts using data from global disaster archives. Next, using the example of Nigeria, a country with limited early warning infrastructure, we present the challenges of coping with disasters in developing countries. Furthermore, we demonstrate the potential value of an ensemble prediction system (ENS) using flood forecasts from the Global Flood Awareness System (www.globalfloods.eu) for the unprecedented 2012 flood disaster in the country. Finally, new initiatives on global flood and drought partnerships for risk assessment and reduction are highlighted. The partnerships have goals of closing the gaps in disaster risk management with


Table 1 Global flood disaster occurrence and its impact since year 2000 (as of September 2015) (Data was obtained from CRED (2015))

Region    | Area                                          | Flood occurrence | Total deaths | Total affected (millions) | Total damage (million USD)
Region I  | Asia                                          | 712              | 46,452       | 1148.8                    | 214,237
Region I  | Africa                                        | 473              | 9,263        | 38                        | 4,591
Region I  | Caribbean and Central and South (CCS) America | 346              | 10,484       | 31.2                      | 19,097
Region II | Europe                                        | 255              | 1,108        | 6.9                       | 63,589
Region II | USA and Canada                                | 82               | 362          | 11.6                      | 33,025
Region II | Oceania                                       | 43               | 125          | 0.59                      | 11,062
Total     |                                               | 1911             | 67,794       | 1237.09                   | 345,601

regard to floods, particularly for developing countries, but with added-value products also for developed countries.

2

Regional Disparities of Disaster Impacts

2.1

Flood Disaster

Since the year 2000 (as of September 2015), close to two thousand riverine flood disasters were recorded worldwide (CRED 2015; see Table 1). More than one third (37 %) of the disasters during this period occurred in Asia; a quarter were recorded in Africa; and close to one fifth (18 %) of the events were reported in the Americas (excluding Canada and the USA). This means that four out of five of the total flood disasters reported worldwide occurred in the developing regions (denoted "Region I" in this chapter for brevity), while the remaining part of the globe (Region II: Europe, Oceania, Canada, and the USA) experienced 20 % of the total count of riverine flood disasters during the same period. Note that this broad regional classification differs from the World Meteorological Organization's (WMO) six regional associations (WMO 2014). In terms of impact, more than 67,000 people have lost their lives, and a total economic damage of more than 345 billion USD was caused worldwide by riverine floods since the turn of this century. The Asian continent was the most affected, accounting for 69 % of the total deaths, 93 % of the total people affected (e.g., injured or displaced), and 62 % of the total economic damage since the year 2000. The Caribbean and Central and South (CCS) America and Africa had the second and third most recorded deaths at 10,484 (15.5 %) and 9,263 (13.7 %), respectively. As shown in Fig. 1, 97.6 % of the total deaths attributed to riverine floods occurred in the developing regions (Region I). This is significantly higher than the proportion of the reported flood occurrences (80.1 %) in these parts of the globe (see Fig. 1). Similarly, a disproportionately high number of people were affected by flood (e.g.,


Fig. 1 The regional breakdown of flood occurrence and impact for the last 15+ years (2000–September 2015). Region I includes Asia, Africa, South and Central America, and the Caribbean. Region II includes Europe, Oceania, Canada, and the USA (Data was obtained from CRED (2015))

injured or displaced) in Region I. This disparity could be attributed to, among other factors, the lack of early warning systems in the developing nations. Several countries in the developed regions (Region II) have operational flood forecasting services that help them prepare for an upcoming flood disaster. For example, in Europe, local, regional, and national as well as continental flood forecasting services complement each other (www.efas.eu; Thielen et al. 2009) to provide early warning information; Meteorological Model Ensemble River Forecasts (MMERF) produce operational streamflow forecasts across the USA (NWS 2015); several flood forecasting centers provide operational forecasts across Canada (EC 2015); and Australia has a national-scale operational flood forecasting and warning service (BoM 2015). Furthermore, looking at the impact of floods per occurrence across different regions reveals that for every flood event that occurred since the year 2000, an average of 65 people lost their lives in Asia, while 30 were killed in CCS America and 20 in Africa (see Fig. 2). To a lesser extent, every flood event during the same period caused the deaths of four persons in Europe (similar to the USA and Canada combined). This means that about 16 times as many deaths were caused by a flood disaster in Asia as were recorded in Europe. The total number of people affected by a flood event tells a similar story (also shown in Fig. 2). There were substantially more people affected in Asia per flood event compared to other regions (e.g., 60 times as many as in Europe and 11 times as many as in the USA and Canada). The larger impacts on human life seem to be a direct reflection of (besides the lack of early warning systems) the high vulnerability and the lack of


Fig. 2 The impacts of riverine flood per occurrence (since 2000) for different regions and continents (Data was obtained from CRED (2015))

coping capacity to the flood hazard in developing countries (De Groeve et al. 2014a). Interestingly, the reported economic damages due to floods stand in stark contrast to the losses of human lives. They were higher (compared to the proportion of the occurrences) in the developed regions (i.e., Europe, Oceania, the USA, and Canada): 31.2 % of the total damage compared to 19.9 % of the total flood disaster count globally (see Fig. 1). This can be attributed to the relatively higher economic assets and infrastructure exposed to flood hazard in those developed regions (Region II) compared to Region I. The average cost of a flood event in the USA and Canada is the highest at 400 million USD. Moreover, the damage per flood event in both Europe and Oceania is about 250 million USD. While Asia experienced an average loss of 300 million USD per event, Africa (10 million USD) and CCS America (55 million USD) suffered comparatively lower damages per flood event (Fig. 2). The data from CRED show that the economic damages of the flood hazard are more concentrated in developed countries.
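The per-event impact figures quoted in this section (about 65 deaths per flood event in Asia, 30 in CCS America, 20 in Africa, and 4 in Europe) follow directly from the Table 1 totals. A short sketch reproducing that arithmetic, using the occurrence and death counts as printed in the table:

```python
# Flood occurrences and total deaths since 2000, as printed in Table 1
# (CRED 2015 data).
table1 = {
    "Asia": (712, 46_452),
    "Africa": (473, 9_263),
    "CCS America": (346, 10_484),
    "Europe": (255, 1_108),
    "USA and Canada": (82, 362),
    "Oceania": (43, 125),
}

for region, (events, deaths) in table1.items():
    print(f"{region}: {deaths / events:.0f} deaths per flood event")
```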

2.2

Drought Problem

Even though it occurs less frequently than floods worldwide, drought causes significantly higher socioeconomic damages and humanitarian crises per event. It affects


Fig. 3 The regional breakdown of the global drought events and their impacts since 1980 (as of September 2015). Region I includes Asia, Africa, South and Central America, and the Caribbean. Region II includes Europe, Oceania, Canada, and the USA (Data was obtained from CRED (2015))

a large population, mainly due to its large spatial extent (sometimes extending to a continental scale) and long duration (possibly lasting multiple years) (Haile 2005; Sheffield and Wood 2011; Dutra et al. 2012). Since 1980, a total of 519 drought events worldwide have killed more than half a million people and caused losses of 129 billion USD (see Fig. 3). A large majority (86.3 %) of these events were reported in developing nations (Region I), and almost all people affected belong to these regions. Notably, the 1983/1984 drought in Ethiopia and Sudan accounted for 450,000 deaths, which remains the biggest share of casualties caused by a single drought event (CRED 2015). The 2010/2011 drought in the Horn of Africa resulted in a humanitarian crisis affecting more than 10 million people. Similarly, between 2002 and 2009, different drought events affected hundreds of millions of people in China and India. Since 1980, a drought event has killed, on average, more than 1120 people and has affected 3.5 million people worldwide (CRED 2015). A continental breakdown reveals that almost all of the deaths were recorded in Africa (2560 persons killed per drought event) and Asia (45 persons). The African continent depends largely on rainfed agriculture, which is highly affected by climate variability (Sheffield et al. 2013; Dixon et al. 2001), and, therefore, drought has a direct devastating impact on the food production and supply of the continent. While the overall economic loss due to drought was the highest in the USA (3.1 billion USD per event, compared to 0.01 billion USD per event in Africa), there was no reported loss of human life as a direct consequence of drought in the USA. This clearly indicates that drought causes


humanitarian crises (due to shortages of food supply) disproportionately in the poorer nations.

2.3

Early Warnings from Ensemble-Based Systems Could Save Lives

The importance of disaster early warning systems in developing nations cannot be overstated. As demonstrated above, flood disasters were more frequent and disproportionately killed more persons per event in the developing regions (Asia, Africa, the Caribbean, and South and Central America). Similarly, the largest share of droughts occurred in these regions, and a strikingly large portion of casualties (99 %) over the last 35 years was reported in Africa. Early flood warning systems can raise awareness of, and increase preparedness for, an upcoming disaster and thereby reduce the risk to human lives. The regions with (some sort of) operational early flood warning systems experienced significantly lower losses of human lives to natural disasters. A large majority of the countries in Region II have operational hydrologic ensemble prediction systems (HEPSs; Pappenberger et al. 2015; Cloke and Pappenberger 2009) providing near real-time ensemble streamflow forecasts at the national or continental level. While increasing local technical knowledge and management capacity for communicating early warning information to the concerned authorities and the public is equally important, advanced warnings play a key role in predicting the extent of the foreseen hazard. To this end, implementing effective flood forecasting and early warning systems in developing regions would save a considerable number of human lives. This is particularly significant in light of the number of persons losing their lives in every flood event (e.g., 65 persons in Asia), mainly due to the lack of effective advanced warnings. Drought early warning systems and timely seasonal forecasts may greatly help farmers and pastoralists in developing nations plan to protect their crops and livestock (Sheffield et al. 2013; Pozzi et al. 2013).
However, most developing countries do not have advanced seasonal forecasting, owing to the lack of reliable monitoring networks, low institutional capacity, and the absence of national drought policies (Sheffield et al. 2013; Naumann et al. 2014). This is particularly evident in sub-Saharan Africa, where seasonal climate forecasts are generated by regional climate outlook forums (Ogallo et al. 2008) based primarily on statistical regression, without the detailed information needed for agricultural adaptation (Sheffield et al. 2013; Patt et al. 2007). Skillful seasonal climate forecasts with long lead times (up to 6 months) could help national authorities plan the storage and transport of food to vulnerable communities and inform farmers in advance so they can relocate their livestock to areas where water is available. Moreover, early drought forecasts may help humanitarian aid organizations (e.g., the World Food Programme (WFP)) plan a rapid disaster response. An ensemble prediction system provides a range of hazard scenarios, allowing emergency managers to weigh their likelihoods and prepare accordingly.

Saving Lives: Ensemble-Based Early Warnings in Developing Nations

3 The State of Early Warning Systems in Regions with Fragmented Infrastructure: Nigeria Case

A practical implementation of an effective flood early warning system in developing regions with fragmented infrastructure faces several challenges. The lack of adequate data, advanced modeling and communication infrastructures, and the limited technical knowledge for understanding the extent of a disaster obstruct forecasting and monitoring efforts. Drawing on the experience of the devastating 2012 flood in Nigeria, we discuss the need for, and the challenges of, effective advance flood warning in developing nations. Furthermore, we demonstrate the value of ensemble (ENS) flood forecasting using forecasts from the Global Flood Awareness System.

3.1 The 2012 Flood in Nigeria

Floods are the most common and recurrent disaster in Nigeria (PDNA 2013). While they impact the country each year, the damage and losses from the 2012 floods were unprecedented. Heavy rainfall between July and October 2012, combined with rising water, contributed to the flooding of human settlements located downstream of the Kainji, Shiroro, and Jebba dams on the Niger River; the Lagdo Dam in Cameroon on the Benue River; the Kiri Dam on the Gongola River; and several other irrigation dams. In some cases the dams were damaged; in others, water had to be released at full force to avert a dam overflow. Reports from the National Emergency Management Agency (NEMA) showed that between July and the end of October 2012, about 7.7 million people were affected by the floods, over 2.1 million persons were internally displaced (IDPs), and 363 persons were reported dead. In addition, more than 600,000 houses were damaged or destroyed (PDNA 2013). The result was Nigeria's worst flooding in over four decades, brought about by high rainfall intensity and the emergency opening of the spillway gates of the Kainji Dam on the Niger River and the Lagdo Dam on the Benue River. The disaster was thus partly human-induced, as uncontrolled releases of water from upstream dams compounded the problem.

3.2 The Need for Early Warning Systems in Nigeria

An advanced, basin-wide early warning system is currently not available in Nigeria, or in many parts of the developing world, mainly due to the lack of data and modeling infrastructures. Without early warning, individuals and communities do not have sufficient time to act to reduce flood risk, loss of life, and damage to property and the environment. For a flood warning to be effective, it should be issued sufficiently far ahead of the potential inundation to allow adequate preparation (Kundzewicz 2013). The appropriate time frame is affected by the size of the catchment relative to the vulnerable zones. The warning should also be expressed in a way that persuades people to take appropriate action to reduce


F.A. Hirpa et al.

damage and costs of the flood. An assessment of risks provides the basis for an effective warning system by identifying potential threats from hazards and establishing the degree of local vulnerability and resilience to extreme weather conditions. In Nigeria, efforts are being made to enhance weather and climate monitoring facilities and to provide early warning advisories, but far-reaching steps are still required to mitigate the impacts of extreme weather on communities. Several limitations in the data, modeling, and communication infrastructures obstruct the monitoring efforts. For example, the ground networks in Nigeria for monitoring weather and climate elements are sparse, with low station coverage density; the upper-air monitoring network is even more limited. Unfortunately, the station networks were in many cases originally installed to support aviation services (to meet international requirements) and never expanded to wider coverage of the country, a situation that now limits the applicability of the limited data to other uses such as flood and drought monitoring, water resources management, and disaster risk reduction. Without adequate ground-based measurements, the meteorological services rely more heavily on circulation model outputs from outside the country, whose parameterizations may not account for the specifics of the region, or on satellite products whose accuracy has not been verified locally, leaving large uncertainty in the spatiotemporal distribution and intensity of rainfall. Seasonal forecast tools have also been tailored more toward volumetric rainfall totals (or the probability of rainfall totals expressed as a percentage of the long-term mean), which do little to offer guidance to users requiring more specific information.
For example, to be comprehensively informed, vulnerable communities need not only the probability that the total rainfall of a rainy season will be above or below the long-term mean but, more importantly, information on the timing, intensity, and volume of the rains. A farmer is interested in the timing of the onset of the rainy season (the separation between the real and false onset dates) and in the distribution of the rains within the season relative to the germination, maturity, and harvest stages of the crops, more than in the season's cumulative rainfall. In Nigeria, the seasonal rainfall prediction is usually released between January and February every year. However, the prediction provides no specifics on the magnitude and timing of the rainfall, and no uncertainty information. For example, the 2014 seasonal rainfall prediction in Nigeria stated, "the 2014 growing season is predicted to experience normal onset across the country except in and around Gusau and Yola in the North, Shaki and Abeokuta in the Southwest which are predicted to experience delayed onset" (NiMet 2014). This is inadequate for a farmer who wants to know on which dates planting can start. A flood forecast with a long lead time (up to a month or longer) may allow those involved in agricultural and pastoral activities to make preparations to protect assets and reduce the impacts of a flood. For example, if it is known that a significant flood may occur near the end of a growing season, farmers may plan ahead to harvest crops early. Or, if a significant flood is anticipated early in a season, planting may be

delayed, and different varieties of seed may be chosen that have a shorter growing season or are better adapted to saturated soils. The two major rivers that traverse the country are transboundary, which complicates the monitoring of potential floods. Furthermore, Nigeria has more than 20 dams (FAO 2004), a number that is still growing, constructed for multiple purposes. Some are designated as multipurpose dams to provide water for irrigation in the dry season and for hydroelectric and domestic uses. In reality, however, in some cases the water is impounded and used neither for hydroelectric generation nor for irrigation. Compounding this is the siltation that has reduced the storage capacities of the dams. To avoid situations where water has to be released suddenly, skillful short-term (24 h to 3 days) precipitation and river discharge forecasts would be useful. Dissemination of disaster information to the disaster management agencies, such as the National Emergency Management Agency (NEMA), and to the communities at risk is also a crucial step in planning an early disaster response. Early warning messages based on ensemble forecasts, with the uncertainty attached, are preferable for disaster managers and civil protection, since they can plan their actions around the probability scenarios. For example, the general practice is that forecast probabilities below 70 % should be communicated to NEMA and the state agencies, while forecast probabilities above 70 % should be communicated to all stakeholders, from the national authorities down to the local communities. However, without ensemble-based prediction systems in Nigeria, it is difficult to implement these policies. Understanding flood and drought forecasts undoubtedly requires training of the users. The agencies responsible for communicating early warning messages must train users of the information to avoid misinterpretation of forecast messages.
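The 70 % dissemination rule described above can be expressed as a simple decision function. The sketch below is illustrative only; the recipient names and structure are our assumptions, not NEMA's actual protocol.

```python
def alert_recipients(flood_probability, cutoff=0.70):
    """Route a flood alert following the dissemination policy described
    in the text: below the cutoff, warn NEMA and the state agencies only;
    above it, warn all stakeholders down to the local communities.
    (Hypothetical sketch; recipient labels are assumptions.)"""
    recipients = ["NEMA", "state agencies"]
    if flood_probability > cutoff:
        recipients += ["national authorities", "local communities"]
    return recipients
```

In practice the input probability would come from an ensemble prediction system (e.g., the fraction of members exceeding a flood threshold), which is exactly what is missing without an operational HEPS.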
For example, the 2012 flood in Nigeria caused so much damage to farmers that when the Seasonal Rainfall Prediction was released in 2013, some farmers were unwilling to invest in crop production for fear of losing their investment to flooding, because they did not clearly understand the rainfall prediction. This was attributed to the manner in which the prediction was communicated, through general mass media (TV and radio) rather than through organizational channels attentive to the needs of the vulnerable communities. Communication through television tends to be ineffective in reaching rural communities in Nigeria due to the high proportion of inhabitants without access to electricity. To accommodate the diverse economic, language, social, and infrastructure levels in Nigeria, a wide range of communication channels needs to be used. Disaster warning messages could be sent, where possible, over telephones and mobile devices, on the Internet, and through social media and print media. There are also well-organized indigenous mass communication systems (e.g., influential community and religious leaders) that can be used to disseminate emergency information to rural communities. To benefit fully from ensemble forecasts, their implications for floods and droughts should be analyzed and communicated to experts in hydrology, agriculture, transport, health, civil protection, and tourism, as well as to dam managers. Ever-improving means of communication, through advances in information and communication technologies, provide new opportunities for

making the communication of disaster information to rural communities faster and more reliable (Ewolabi and Ekechi 2014).

3.3 Ensemble Forecast for the 2012 Flood

Here, we examine the ensemble flood predictions from the Global Flood Awareness System (GloFAS; www.globalfloods.eu) for the 2012 flood in Nigeria. The GloFAS (Alfieri et al. 2013) provides global ensemble-based flood early warning information for major rivers worldwide. It produces probabilistic flood forecasts based on a modeling setup combining the land surface model (LSM) Hydrological Tiled ECMWF Scheme for Surface Exchanges over Land (HTESSEL) (Balsamo et al. 2009) with a river routing scheme used by LISFLOOD (van der Knijff et al. 2010). HTESSEL is used operationally in the Integrated Forecast System (IFS) of the European Centre for Medium-Range Weather Forecasts (ECMWF) for describing the evolution of soil, vegetation, and snow over the continents at diverse spatial resolutions. The GloFAS combines state-of-the-art weather forecasts with a hydrologic model set up at a 0.1° × 0.1° grid resolution to produce, once a day, 51-member ensemble streamflow forecasts and probabilities of flood occurrence (in terms of threshold exceedance probability). The flood thresholds were extracted from long-term reforecasts (see Hirpa et al. 2016). We investigate the GloFAS forecasts during the devastating 2012 flood at three locations (see Fig. 4): on the main stems of the Niger River (A) and the Benue River (C), and downstream of the confluence of the two rivers (B). The Post-Disaster Needs Assessment report from the Federal Government of Nigeria (PDNA 2013) indicated that states along the Benue River and the lower reaches of the Niger River (below point B in Fig. 4) were considerably affected by the flood disaster, with the most downstream states suffering the heaviest damage. Conversely, the damages reported in the states along the upper reaches of the Niger River (including Niger State, which contains point A) were relatively small.
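The return-period thresholds used above can be illustrated with a small sketch. The actual GloFAS thresholds are derived from long-term reforecasts (Hirpa et al. 2016); as a simplified stand-in, the code below fits a Gumbel distribution to a series of annual maximum discharges by the method of moments (the function name and the synthetic data are our own assumptions, not the GloFAS implementation).

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant

def gumbel_thresholds(annual_maxima, return_periods=(2, 5, 20)):
    """Estimate discharge thresholds (same units as the input) for the
    given return periods by fitting a Gumbel distribution to annual
    maximum discharges with the method of moments."""
    n = len(annual_maxima)
    mean = sum(annual_maxima) / n
    var = sum((x - mean) ** 2 for x in annual_maxima) / (n - 1)
    beta = math.sqrt(6.0 * var) / math.pi   # Gumbel scale parameter
    mu = mean - EULER_GAMMA * beta          # Gumbel location parameter
    # Quantile of the Gumbel CDF: x_T = mu - beta * ln(-ln(1 - 1/T))
    return {T: mu - beta * math.log(-math.log(1.0 - 1.0 / T))
            for T in return_periods}
```

By construction the 20-year threshold always lies above the 5-year and 2-year levels, which is why the three horizontal lines in Fig. 5 are nested.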
To see how the HEPS captured the streamflow in regions with (and without) reported flood damage, we selected these three points for analyzing the GloFAS forecasts. Figure 5a–c shows the streamflow time series of the 5-day lead forecasts produced by the GloFAS over a period of more than 2 months (from 5 August to 6 October 2012). The time period was selected based on the start and end dates of the flood event recorded in the Dartmouth Flood Observatory archive (DFO, Brakenridge 2015). We extended the period 20 days before the recorded start and 10 days beyond the recorded end of the flood to fully capture the rising and falling limbs of the GloFAS forecast. In the figures, we also show the 2-year, 5-year, and 20-year flood levels. At all three locations, the discharge magnitude increases, with the peak occurring between 9 and 20 September 2012 (see Fig. 5a–c). However, the extent of the increase varied from one location to another. For the location in the state of Niger (location A), the ENS mean did not reach the 2-year flood level at any time throughout the duration of the flood. This is consistent with the Nigerian government's post-disaster report (see PDNA 2013), which documented no flood damage in this part

Fig. 4 Selected locations on the Benue and Niger rivers in Nigeria. The government report indicated severe flood damage in the states along the Benue River (location C), Kogi (location B), and downstream states (PDNA 2013)

of the river. The GloFAS indicated large flow magnitudes for the other two locations. The ENS mean at the point on the Benue River (in the state of Benue, C) exceeded the 2-year flood threshold for 43 days (beginning on 21 August), and it came close to or exceeded the 5-year threshold for 36 days, with the peak forecast on 16 September. The forecast at the downstream point (location B) showed a consistent increase in flow volume until 20 September, when it approached the 20-year threshold. This high-to-severe flow forecast agrees with the reports of severe damage in those regions of the country (especially the downstream states). The most attractive feature of an ENS compared with a deterministic forecast is that, besides predicting the flood magnitude, it provides the uncertainty attached to that prediction in the form of a probability of occurrence of a given flood magnitude. Ensemble prediction offers a range of scenarios that may occur and enables forecasters to identify the most likely outcome. We calculated the probability of the streamflow forecast exceeding each of three threshold levels (the 2-, 5-, and 20-year floods). Figure 5d shows the probability of the 5-day lead forecast exceeding the 5-year return level for locations B and C (location A is excluded since no high flood was indicated there). For the downstream location (B), a high 5-year exceedance probability (p = 1.0) was indicated over a 3-week period between 5 September (during the rise) and 26 September (during the recession of the flow). The forecast for the Benue River (C) indicated a high probability of 5-year exceedance after mid-September.
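The exceedance probability plotted in Fig. 5d is, at its core, the fraction of ensemble members above a threshold at each forecast time. A minimal sketch (the function name and the toy values are ours; GloFAS itself uses 51 members):

```python
def exceedance_probability(members, threshold):
    """Fraction of ensemble members forecasting a discharge above the
    flood threshold, i.e., the threshold-exceedance probability."""
    return sum(1 for q in members if q > threshold) / len(members)

# Toy 5-member discharge forecast (m^3/s) against a hypothetical
# 5-year threshold of 2500 m^3/s:
ens = [2300.0, 2750.0, 2600.0, 3100.0, 2450.0]
p5 = exceedance_probability(ens, 2500.0)  # 3 of 5 members exceed -> 0.6
```

Repeating this count for every forecast day and every threshold (2-, 5-, and 20-year) yields the probability time series shown in Fig. 5d.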


Fig. 5 GloFAS forecasts for the 2012 flood in Nigeria. Panels (a), (b), and (c) show the 5-day lead forecasts at the three locations (see Fig. 4), and panel (d) shows the 5-year flood threshold exceedance probabilities at locations B and C

These probabilities would be valuable information for the National Emergency Management Agency, since its decision on whether to disseminate early warning information is based on such values (probability > 70 %). For instance, if these forecasts had been used by NEMA and if there were GloFAS reporting points at locations A, B, and C, the agency would have communicated flood alerts for 5 September to all stakeholders in the country, from the national authorities to the local communities in Kogi State and all areas downstream of point B. Similarly, communities in Benue State and its surroundings would have been alerted five days ahead of 12 September. The 5-day lead forecasts may provide enough time to temporarily relocate vulnerable communities to low-risk places and help save lives. The quality of the GloFAS forecasts, like that of any forecast, needs to be verified by comparison with ground-based observations. Nevertheless, these forecasts for the 2012 Nigerian flood demonstrate the potential value of an ENS and how it could assist disaster risk reduction efforts in developing countries without their own early warning systems. The challenge, then, is to translate the forecast information (including the freely available GloFAS) into actions that save human lives and protect the economy. This may require the integration of forecast datasets (e.g., flood hazard) into the national early warning system and a

creation of an effective channel of communication from the scientific communities and forecast centers to the local authorities, the vulnerable communities, and international disaster response organizations. Fragmented infrastructure and limited technical knowledge pose great challenges to preparedness, adaptation, and response to extreme weather and climate conditions in developing countries. These challenges can be mitigated through partnerships with different stakeholders and by raising the knowledge of local staff in understanding and using already available data from forecasting and observation systems (e.g., near real-time satellite detection).

4 Global Partnerships to Address Local Problems: Joining Forces

As illustrated in the previous sections, floods and droughts are global problems that require global, regional, and national coordination to mitigate their impacts. The increasing availability of advanced remote sensing technologies for earth observation and ever-improving global climate and land surface models (GCMs and LSMs) provide an attractive platform for skillful flood and drought forecasting at global and regional scales; however, it is evident that not all national services yet have the technological capacity to put such systems into practice. Regional and global solutions are therefore key to overcoming such limitations and can become particularly important for transboundary river basins where there is no, or only limited, cooperation between the upstream and downstream countries (Nishat and Faisal 2000; Turton et al. 2003; Gerlak et al. 2011; Veilleux 2013). In such cases, global and regional flood and drought early warning systems can provide national and local authorities with added-value information they would otherwise not have access to, and thus strengthen disaster preparedness and improve recovery efforts. The local agencies, which naturally have easier access to on-site observations of a hazard (both real time and historical) and possess a better understanding of disaster risk (e.g., in terms of the population and economic assets exposed), can provide useful feedback to the forecasting and early warning centers. This feedback helps the centers improve their performance and, in doing so, gain the confidence of local stakeholders in the usability of the forecast information. Effective knowledge sharing about an upcoming disaster between scientists, operational forecasters, and local government agencies, and communication to end users, are necessary in order to manage the associated risks optimally.
It is evident that this is much more challenging in countries with fragmented data infrastructures, low computational capacities, and no coherent early warning systems or disaster risk management policies than in highly developed industrialized countries. To overcome these challenges, global partnerships can help fill the gaps that exist in scientific data, modeling infrastructure, and effective knowledge sharing. To this end, two global initiatives related to flood and drought disasters have been created through the collaboration of multidisciplinary groups of scientists, operational forecasters, international disaster response organizations, and national stakeholders: the Global Flood Partnership and the Integrated Drought Management Programme.

4.1 Global Flood Partnership

The Global Flood Partnership (GFP, http://portal.gdacs.org/Global-Flood-Partnership; De Groeve et al. 2014b) was launched in March 2014 with a vision of bringing together the scientific community, decision makers, and end users for the common goal of raising global flood awareness and reducing disaster risks. The partnership aims at closing the gaps in data availability, monitoring, detection, and forecasting for flood hazard and risk assessments, as well as in communication between all parties concerned. The overall objective of the Global Flood Partnership is "the development of flood observational and modeling infrastructure, leveraging on existing initiatives for better predicting and managing flood disaster impacts and flood risk globally" (GFP 2013). The partnership promotes the sharing of flood-related information among the partners, and it is stated that "open data policy, where partners have access to data, tools, and services, [is] considered to be a cornerstone of the Partnership" (De Groeve et al. 2014b). The GFP has five components which work hand in hand to increase flood risk preparedness and organize effective disaster responses, as summarized in De Groeve et al. (2014b):

– The flood service and toolbox component provides operational flood detection, forecasting, mapping, and risk assessment services and shares the tools (software, algorithms, and models) used to support such services. It also creates a platform for historical analysis, validation, and comparison studies.

– The flood observatory creates a link between flood services and decision makers to facilitate the easy dissemination of flood information to stakeholders.

– The flood record collects, archives, and shares flood data (mainly from remote sensing but also ground observations) and related impacts, which are helpful for understanding flood risk.
– User guidance and capacity building supports developing regions with limited or no flood early warning systems by providing timely information, tools, and expertise from the partnership.

– The user forum creates a platform for knowledge exchange among scientists, local authorities, and disaster response organizations through training and user conferences.

Several organizations are actively involved in the GFP. These include academic and scientific organizations (e.g., universities and research centers) developing global-scale flood tools and services, national and continental hydrometeorological modeling and operational centers, private companies involved in flood risk analysis (insurance and reinsurance), remote sensing centers (e.g., NASA), disaster response organizations (e.g., the World Food Programme, the Red Cross, and the World Bank), and the World Meteorological Organization. In contrast to purely research-driven initiatives such as HEPEX, the Global Flood Partnership aims at providing operational services delivering daily, global information on floods, including medium-range ensemble prediction-based flood forecasts from different models, multiple

satellite-based flood detection, short-term flood forecasting, and risk assessment tools, all of which will provide disaster response organizations such as the World Food Programme with a wealth of information for assessing the probability of emerging critical situations. One of the systems providing global ensemble-based flood early warning information is the Global Flood Awareness System (GloFAS, www.globalfloods.eu), which has been running pre-operationally since June 2011. Another central element of the GFP is global flood risk assessment and mapping based on flood hazard, vulnerability, and exposure (e.g., Winsemius et al. 2013; Ward et al. 2013). Global flood risk under current and future climate scenarios is produced and provided freely with the Aqueduct Global Flood Analyzer (http://www.wri.org/resources/maps/aqueduct-global-flood-analyzer). Regions with limited or no early flood warning systems are expected to benefit greatly from the GFP. A first demonstration of the benefit of the GFP was provided at the onset of the flood event in Malawi in January 2015. From 12 to 17 January, heavy rainfall was observed in Mozambique and Malawi, largely affecting the Zambezi River Basin. The heavy rains were forecast well by the numerical weather prediction models; for example, the probabilities of exceeding 300 mm of rain accumulated over a 10-day forecast period indicated heavy precipitation at the border of Mozambique and Malawi (Fig. 6), which coincided well with subsequent observations. Due to the heavy rains and previous flooding, the President of Malawi declared a state of emergency on 13 January 2015 for 15 districts, and a number of humanitarian aid organizations started acting to provide assistance to the affected population. One day after the declaration of the state of emergency, the World Food Programme posted a request for more information to the partnership.
On the same day, information from GloFAS and the public portal of the Emergency Response Coordination Centre (http://erccportal.jrc.ec.europa.eu) was shared with the community. On 15 January an extreme rainfall assessment for Malawi was presented by ITHACA (www.ithacaweb.org), and the Vienna University of Technology communicated information on available satellite data, which was then shared by the WFP with the country offices. The Global Facility for Disaster Reduction and Recovery (https://www.gfdrr.org/) also informed the GFP that it had become involved in aid management. On 16 January further information was exchanged by the Dartmouth Flood Observatory (www.dartmouth.edu/~floods), NASA, and UNOSAT on available satellite imagery and the activation of the UN charter. Thus, within a few days, a wealth of information ranging from ensemble forecasts and flood detection to aid management was shared within the community. This illustrates clearly how, through the partnership, information flow can be enhanced, relevant content shared, and knowledge increased. In the future, the GFP plans to develop a more proactive approach, with daily analysis of forecasts and flood detection information as well as the building of a flood record database. It is important to highlight that the partnership works in close collaboration with the national authorities and neither interferes with authoritative information, such as national warnings, nor overrides national policies and communication channels.
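The rainfall exceedance probabilities shown in Fig. 6 follow the same ensemble-counting logic as the streamflow thresholds: accumulate each member's daily precipitation over the 10-day window, then count the fraction of members above the 300 mm (or 150 mm) total. A sketch under our own naming assumptions:

```python
def accumulation_exceedance(member_daily_precip, total_mm=300.0):
    """Probability that rainfall accumulated over a forecast window
    exceeds total_mm, estimated as the fraction of ensemble members
    whose summed daily precipitation passes the threshold.
    member_daily_precip: one list of daily totals (mm) per member."""
    n_exceed = sum(1 for days in member_daily_precip
                   if sum(days) > total_mm)
    return n_exceed / len(member_daily_precip)
```

Applied on every grid cell of an ensemble rainfall forecast, this yields a probability map such as the one in Fig. 6.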


Fig. 6 Probabilities of exceeding 300 mm (150 mm) of rainfall accumulated over a 10-day forecast, shown in monochromatic red (blue) shading, for the period 5–15 January 2015, based on the ECMWF 00:00 UTC ensemble rainfall forecast

4.2 Integrated Drought Management Programme

Effective drought risk reduction requires the coordination of several stakeholders at different levels, ranging from seasonal forecasting and risk assessment to designing proactive drought policies. To support these coordination efforts, the Integrated Drought Management Programme (IDMP) has been established as a joint initiative of the World Meteorological Organization (WMO) and the Global Water Partnership (GWP, http://www.gwp.org/). The central objective of the IDMP (http://www.droughtmanagement.info/) is "to support stakeholders at all levels by providing policy and management guidance and by sharing scientific information, knowledge and best practices for Integrated Drought Management" (WMO/GWP 2011). The coordination of existing national and regional drought-related agencies and centers revolves around four central issues: better scientific understanding of drought and knowledge sharing; drought risk assessment, monitoring, prediction, and early warning; policy and planning for drought preparedness and mitigation; and drought risk reduction and response. Drought early warning, one of the core elements of the IDMP, plays a central role in planning and preparedness for drought risk reduction. A Global Drought Early Warning System (GDEWS, Pozzi et al. 2013) has been developed as the outcome of a partnership between several international organizations. The early warning system consists of monthly to seasonal (up to 15 months lead) ensemble drought forecasts worldwide provided by GDEWS partner organizations such as the European Centre

for Medium-Range Weather Forecasts (ECMWF) and the US National Centers for Environmental Prediction (NCEP)/Climate Prediction Center (CPC). The skill of the ensemble-based forecasts is continuously enhanced by model merging and downscaling techniques (Pozzi et al. 2013). The European Drought Observatory (EDO, Sepulcre-Canto et al. 2012) has demonstrated good forecast skill over the European continent, and a drought monitoring and forecasting system for sub-Saharan Africa has been proposed and evaluated (Sheffield et al. 2013). The IDMP operates three regional programmes (IDMP 2015): in Central and Eastern Europe (IDMP CEE), located in Bratislava, Slovakia; in the Horn of Africa (IDMP HoA), centered in Entebbe, Uganda; and in West Africa (IDMP WAF), located in Ouagadougou, Burkina Faso. In addition, there are two regional initiatives in South Asia (Colombo, Sri Lanka) and Central America (Tegucigalpa, Honduras), as well as a number of other drought management efforts supported by the IDMP (IDMP 2015).

5 Conclusions

Natural hazards disproportionately affect people in developing countries, mainly due to the lack of early warning systems. Forecasting and early detection of hazards are important elements of preparedness and disaster risk reduction. Early detection of natural hazards allows disaster response organizations to provide assistance to those affected and to manage recovery efforts more effectively. However, due to the inherent complexity and uncertainty of natural hazards, this remains a difficult task. Ensemble forecasts indicate a set of possible hydrometeorological scenarios (e.g., flood predictions and possible hurricane tracks) and provide confidence information on the likelihood of these scenarios. This makes ensemble models attractive for assisting national governments and international organizations in planning for a wide range of disaster scenarios, with the potential to allow resources to be allocated more carefully. However, effective early detection and warning systems require a functioning data collection and sharing infrastructure, hydrometeorological models with different levels of sophistication, and broad expert knowledge in scientific methods and data handling, as well as platforms for communicating the information to end users and decision makers. Developing nations tend to have limited or no infrastructure for early warning systems (as discussed above for the Nigeria case) and, as a result, are less prepared and consequently disproportionately affected by disasters. To fill these gaps, global and/or regional partnerships have been established to share scientific knowledge, models, and tools with regard to upcoming flood and drought disasters.
This can be achieved by taking advantage of the increasing availability of remote sensing data (for disaster detection and monitoring), advanced hydrometeorological models for short-term (flood) and seasonal (drought) forecasting, and interoperable data sharing platforms on the Internet to reach the concerned authorities. The GFP and IDMP are two of the typical showcases of such global partnerships.


F.A. Hirpa et al.

References

L. Alfieri, P. Burek, E. Dutra, B. Krzeminski, D. Muraro, J. Thielen, F. Pappenberger, GloFAS – global ensemble streamflow forecasting and flood early warning. Hydrol. Earth Syst. Sci. 17, 1161–1175 (2013). doi:10.5194/hess-17-1161-2013

G. Balsamo, A. Beljaars, K. Scipal, P. Viterbo, B. van den Hurk, M. Hirschi, A.K. Betts, A revised hydrology for the ECMWF model: verification from field site to terrestrial water storage and impact in the integrated forecast system. J. Hydrometeorol. 10, 623–643 (2009)

BoM, Australian Government Bureau of Meteorology, National flood forecasting and warning services (2015). Available online at http://www.bom.gov.au/water/floods/index.shtml. Last accessed 29 Sept 2015

G.R. Brakenridge, Global Active Archive of Large Flood Events, Dartmouth Flood Observatory (University of Colorado, 2015). Available online at http://floodobservatory.colorado.edu/Archives/index.html. Last accessed 13 July 2015

H.L. Cloke, F. Pappenberger, Ensemble flood forecasting: a review. J. Hydrol. 375, 613–626 (2009)

CRED, Centre for Research on the Epidemiology of Disasters Emergency Database (2015). Available online at http://www.emdat.be/. Last accessed 14 Sept 2015

T. De Groeve, K. Poljansek, L. Vernaccini, Index for Risk Management – INFORM: Concept and Methodology (Publications Office of the European Union, Luxembourg, 2014a). doi:10.2788/78658

T. De Groeve, J. Thielen, R. Brakenridge, R. Adler, L. Alfieri, D. Kull, F. Lindsay, O. Imperiali, F. Pappenberger, R. Rudari, P. Salamon, N. Villars, K. Wyjad, Joining forces in a global flood partnership. Bull. Am. Meteorol. Soc. (2014b). doi:10.1175/BAMS-D-14-00147.1

J. Dixon, A. Gulliver, D. Gibbon, Farming Systems and Poverty: Improving Farmers’ Livelihoods in a Changing World (FAO & World Bank, Rome/Washington, DC, 2001)

E. Dutra, F. Di Giuseppe, F. Wetterhall, F. Pappenberger, Seasonal forecasts of drought indices in African basins. Hydrol. Earth Syst. Sci. Discuss. 9, 11093–11129 (2012). doi:10.5194/hessd-9-11093-2012

EC, Environment Canada, Flood forecasting centres across Canada (2015). Available online at www.ec.gc.ca/eau-water/default.asp?lang=En&n=7BF9B012-1. Last accessed 29 Sept 2015

T.O.S. Ewolabi, C.O. Ekechi, Communication as critical factor in disaster management and sustainable development in Nigeria. Int. J. Dev. Econ. Sustain. 2(3), 58–72 (2014)

FAO, Review of the public irrigation sector in Nigeria, Food and Agricultural Organization of the United Nations. Report No: 0009/TF/NIR/CPA/27277-2002/TCOT (2004). Available online at ftp://ftp.fao.org/AGL/AGLW/ROPISIN/ROPISINreport.pdf. Last accessed 31 Mar 2015

A.K. Gerlak, J. Lautze, M. Giordano, Water resources data and information exchange in transboundary water treaties. Int. Environ. Agreements 11, 179–199 (2011). doi:10.1007/s10784-010-9144-4

GFP, Global Flood Partnership, Draft paper (2013). Available online at http://portal.gdacs.org/Portals/0/GFP/Concept%20Paper%20Global%20Flood%20Partnership%20v4.3.pdf. Last accessed 1 Sept 2015

M. Haile, Weather patterns, food security and humanitarian response in sub-Saharan Africa. Philos. Trans. R. Soc. B 360(1463), 2169–2182 (2005). doi:10.1098/rstb.2005.1746

F.A. Hirpa, P. Salamon, L. Alfieri, J. Thielen, E. Zsoter, F. Pappenberger, The effect of reference climatology on global flood forecasting. J. Hydrometeorol. 17(4), 1131–1145 (2016). doi:10.1175/JHM-D-15-0044.1

IDMP, Integrated Drought Management Programme (2015). Available online at http://www.droughtmanagement.info/idmp-activities/. Last accessed 18 May 2015

Z.W. Kundzewicz, Floods: lessons about early warning systems, in Late Lessons from Early Warnings: Science, Precaution, Innovation, Emerging Lessons from Ecosystems (European Environment Agency, Publications Office of the European Union, Luxembourg, 2013), p. 347. doi:10.2800/73322

Saving Lives: Ensemble-Based Early Warnings in Developing Nations


G. Naumann, E. Dutra, P. Barbosa, F. Pappenberger, F. Wetterhall, J.V. Vogt, Comparison of drought indicators derived from multiple data sets over Africa. Hydrol. Earth Syst. Sci. 18, 1625–1640 (2014). doi:10.5194/hess-18-1625-2014

NiMet, Seasonal Rainfall Prediction for 2014 provided by the NiMet (2014). Available online at http://nimet.gov.ng/sites/default/files/publications/SRP%20BROCHURE%20FINAL.pdf. Last accessed 6 Oct 2015

A. Nishat, I. Faisal, An assessment of the institutional mechanisms for water negotiations in the Ganges–Brahmaputra–Meghna system. Int. Negot. 5(2), 289–310 (2000)

NWS, United States National Weather Service, Meteorological Model Ensemble River Forecast (2015). Available online at http://www.erh.noaa.gov/mmefs/. Last accessed 29 Sept 2015

L. Ogallo, P. Bessemoulin, J.P. Ceron, S. Mason, S.J. Connor, Adapting to climate variability and change: the climate outlook forum process. WMO Bull. 57, 93–102 (2008)

F. Pappenberger, L. Stephens, S.J. van Andel, J.S. Verkade, M.H. Ramos, L. Alfieri, J. Brown, M. Zappa, G. Ricciardi, A. Wood, T. Pagano, R. Marty, W. Collischonn, M. Le Lay, D. Brochero, M. Cranston, D. Meissner, HEPEX – operational HEPS systems around the globe (2015). Available online at http://hepex.irstea.fr/operational-heps-systems-around-the-globe. Last accessed 3 Oct 2015

A.G. Patt, L. Ogallo, M. Hellmuth, Learning from 10 years of climate outlook forums in Africa. Science 318(5847), 49–50 (2007). doi:10.1126/science.1147909

PDNA, Nigeria post-disaster needs assessment 2012 floods (2013). Available online at https://www.gfdrr.org/sites/gfdrr/files/NIGERIA_PDNA_PRINT_05_29_2013_WEB.pdf. Last accessed 6 Oct 2015

P.J. Pilon, Guidelines for reducing flood losses. United Nations Office for Disaster Risk Reduction (UNISDR) (2002). Available online at http://www.unisdr.org/files/558_7639.pdf. Last accessed 10 Mar 2015

W. Pozzi, J. Sheffield, R. Stefanski, D. Cripe, R. Pulwarty, J.V. Vogt, R.R. Heim Jr., M.J. Brewer, M. Svoboda, R. Westerhoff, A.I.J.M. van Dijk, B. Lloyd-Hughes, F. Pappenberger, M. Werner, E. Dutra, F. Wetterhall, W. Wagner, S. Schubert, K. Mo, M. Nicholson, L. Bettio, L. Nunez, R. van Beek, M. Bierkens, L.G.G. de Goncalves, J.G.Z. de Mattos, R. Lawford, Toward global drought early warning capability: expanding international cooperation for the development of a framework for monitoring and forecasting. Bull. Am. Meteorol. Soc. 94, 776–785 (2013). doi:10.1175/BAMS-D-11-00176.1

G. Sepulcre-Canto, S. Horion, A. Singleton, H. Carrao, J. Vogt, Development of a combined drought indicator to detect drought in Europe. Nat. Hazards Earth Syst. Sci. 12, 3519–3531 (2012)

J. Sheffield, E.F. Wood, Drought: Past Problems and Future Scenarios (Earthscan, Routledge, London, 2011), p. 192

J. Sheffield, E.F. Wood, N. Chaney, K. Guan, S. Sadri, X. Yuan, L. Olang, A. Amani, A. Ali, S. Demuth, A drought monitoring and forecasting system for sub-Sahara African water resources and food security. Bull. Am. Meteorol. Soc. 95, 861–882 (2013). doi:10.1175/BAMS-D-12-00124.1

J. Thielen, J. Bartholmes, M.-H. Ramos, A. de Roo, The European flood alert system – part 1: concept and development. Hydrol. Earth Syst. Sci. 13, 125–140 (2009). doi:10.5194/hess-13-125-2009

A. Turton, P. Ashton, E. Cloete, An introduction to the hydropolitical drivers in the Okavango River basin, in Transboundary Rivers, Sovereignty and Development: Hydropolitical Drivers in the Okavango River Basin, ed. by A. Turton et al. (African Water Issue Research Unit/Green Cross, Pretoria, 2003), pp. 7–30

UNISDR, Third United Nations World Conference on Disaster Risk Reduction, Geneva, 17–18 November 2014 (2014). Available at http://www.wcdrr.org/uploads/Zero-draftpost2015-framework-for-DRR-20-October-.pdf. Last accessed 10 Aug 2015

UNISDR, Sendai Framework for Disaster Risk Reduction 2015–2030. United Nations Office for Disaster Risk Reduction (UNISDR), 32p (2015)



J.M. van der Knijff, J. Younis, A.P.J. de Roo, LISFLOOD: a GIS-based distributed model for river basin scale water balance and flood simulation. Int. J. Geogr. Inf. Sci. 24(2), 189–212 (2010)

J.C. Veilleux, The human security dimensions of dam development: the Grand Ethiopian Renaissance Dam. Glob. Dialogue 15(2), 41 (2013)

P.J. Ward, B. Jongman, F.S. Weiland, A. Bouwman, R. van Beek, M.F.P. Bierkens, W. Lictvoet, H.C. Winsemius, Assessing flood risk at the global scale: model setup, results, and sensitivity. Environ. Res. Lett. 8(4), 044019 (2013)

H.C. Winsemius, L.P.H. Van Beek, B. Jongman, P.J. Ward, A. Bouwman, A framework for global river flood risk assessments. Hydrol. Earth Syst. Sci. 17(5), 1871–1892 (2013)

WMO, World Meteorological Organization, Composition of the WMO (2014). Available at http://www.wmo.int/wmocomposition/documents/wmocomposition.pdf. Last accessed 19 Oct 2015

WMO/GFP, Integrated Drought Management Programme, A joint WMO-GFP Programme, Concept Note, Version 1.2 (2011)

Communicating and Using Ensemble Flood Forecasts in Flood Incident Management: Lessons from Social Science

David Demeritt, Elisabeth M. Stephens, Laurence Créton-Cazanave, Céline Lutoff, Isabelle Ruin, and Sébastien Nobert

Contents
1 Introduction
2 Visualizing HEPS Forecasts
3 Perception and Understanding of Probabilistic Forecast Information
4 Case Studies of HEPS in Operational Use
4.1 France
4.2 United Kingdom
4.3 European Flood Awareness System (EFAS)
4.4 The Global Flood Awareness System (GloFAS)
5 Summary and Conclusions
References

D. Demeritt (*) Department of Geography, King’s College London, Strand, London, UK e-mail: [email protected]
E.M. Stephens School of Archaeology, Geography and Environmental Science, University of Reading, Whiteknights, Reading, UK e-mail: [email protected]
L. Créton-Cazanave Université Paris Est Marne-la-Vallée, Labex Futurs Urbains (LATTS, LEESU, Lab’Urba), Marne-la-Vallée, France e-mail: [email protected]
C. Lutoff Université Grenoble 1, PACTE UMR 5194 (CNRS, IEPG, UJF, UPMF), Grenoble, France e-mail: [email protected]
I. Ruin Laboratoire d’étude des Transferts en Hydrologie et Environnement (LTHE), CNRS, Grenoble, France e-mail: [email protected]
S. Nobert School of Earth and Environment, University of Leeds, Leeds, UK e-mail: [email protected]

© Springer-Verlag Berlin Heidelberg 2016
Q. Duan et al. (eds.), Handbook of Hydrometeorological Ensemble Forecasting, DOI 10.1007/978-3-642-40457-3_44-1

Abstract

This chapter explores the practical challenges of communicating and using ensemble forecasts in operational flood incident management. It reviews recent social science research on the variety and effectiveness of hydrological ensemble prediction systems (HEPS) visualization methods and on the cognitive and other challenges experienced by forecast recipients in understanding probabilistic forecasts correctly. To explore how those generic findings from the research literature work out in actual operational practice, the chapter then discusses a series of case studies detailing the development, communication, and use of HEPS products in various institutional contexts in France, Britain, and internationally at the EU and global levels. The chapter concludes by drawing out some broader lessons from those experiences about how to communicate and use HEPS more effectively.

Keywords

Risk communication • Risk perception • Cognitive biases • Heuristics • Visualization • Spaghetti plots • HEPS, public saliency of • Decision-making • Probability of precipitation forecasts • Uncertainty, public understanding of • Civil protection • EFAS • GloFAS • SCHAPI • UK Met Office • Environment Agency

1 Introduction

Recent years have seen rapid advances in flood forecasting. With developments in supercomputing, data assimilation, and modeling, real-time ensemble flood forecasting is now increasingly an operational reality for forecasting agencies. However, the technical capacity of fully coupled hydrological ensemble prediction systems (HEPS) to extend forecast lead times and provide quantitative information about the uncertainties associated with those forecasts has often galloped ahead of the institutional capacity to communicate and use those forecasts effectively. For all their promise, HEPS will be of little practical benefit in reducing the toll from flood disasters if the information they generate cannot be successfully communicated, interpreted, and used by those immediately at risk and the emergency services charged with protecting them. Accordingly, this chapter explores the practical challenges of communicating and using ensemble forecasts in operational flood incident management. It summarizes recent social science research on the variety and effectiveness of HEPS visualization methods and on the cognitive and other difficulties experienced by forecast recipients in understanding probabilistic forecasts correctly. To explore how those generic


findings from the research literature shape operational practice, the chapter then draws on a series of case studies from the published literature about the development, communication, and use of HEPS products in various institutional contexts. The chapter concludes by drawing out some broader lessons from those experiences about how to communicate and use HEPS more effectively.

2 Visualizing HEPS Forecasts

Risk communication research highlights the ways in which the design and framing of uncertainty information can shape its perception (Visschers et al. 2009; Spiegelhalter et al. 2011). HEPS is a new technology, and there is, as yet, no expert consensus on the best methods for visualizing its forecasts or even, as Demeritt et al. (2010) and Ramos et al. (2010) document, on what is the most important information to extract and transmit from a probabilistic flood forecast. However, it is increasingly recognized that there can be no “one-size-fits-all” solution: different users have different decision-making needs and will require different information visualized in different ways to meet those needs (Pappenberger et al. 2013). For the most part, operational HEPS have tended to rely on fairly conventional visualization techniques (Lumbroso et al. 2009; Cloke and Pappenberger 2009; Bruen et al. 2010). So-called spaghetti-style hydrographs representing the temporal evolution of different ensemble members are quite common, as they can illuminate the degree of dispersion among ensemble members and thus provide an at-a-glance assessment of forecast uncertainty (Fig. 1). Particularly for less expert users, these may be displayed in a more simplified form. Sometimes they are converted to percentiles and displayed as a plume chart (Fig. 2) or box plot, or alternatively presented in reduced form as the mean of the ensemble members and their maximum and minimum values (Fig. 3). Another method for summarizing the ensemble – and one that does not require postprocessing – is in tabular form, listing the number of ensemble members for a given location exceeding various thresholds over time (Fig. 4). This enables the timing and persistence over time of any signal to be assessed (Pappenberger et al. 2011).
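The plume-chart and tabular displays described above are simple reductions of the raw ensemble. A minimal sketch of both reductions, assuming the ensemble is stored as a members-by-lead-times NumPy array (function names, thresholds, and data here are illustrative, not any agency's operational code):

```python
import numpy as np

def plume_percentiles(ensemble, qs=(5, 25, 50, 75, 95)):
    """Reduce an (n_members, n_leadtimes) array of discharge hydrographs
    to the percentile bands of a plume (fan) chart, one band per q."""
    return {q: np.percentile(ensemble, q, axis=0) for q in qs}

def exceedance_table(ensemble, thresholds):
    """For each lead time, count how many ensemble members exceed each
    warning threshold (the tabular display described in the text)."""
    return {name: (ensemble > level).sum(axis=0)
            for name, level in thresholds.items()}

# Toy 4-member, 5-step ensemble (discharge in m^3/s); the threshold
# levels are made up for illustration.
rng = np.random.default_rng(0)
ens = 100 + 30 * rng.standard_normal((4, 5)).cumsum(axis=1)
bands = plume_percentiles(ens)
counts = exceedance_table(ens, {"alert": 100.0, "severe": 150.0})
```

Either reduction discards the pairing between members and time steps, which is exactly why such displays cannot convey the timing of individual-member peaks that the peak-box approach discussed below tries to recover.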
However, neither hydrographs nor these tabular displays provide any spatial information about the location of the risk and its spatial patterning, which can be important for some users. To address that gap, HEPS forecasts in Sweden are presented on the public website in a choropleth map in which the shading of 1001 sub-basins in a national overview map represents the probability of exceeding the very lowest warning threshold for that sub-basin (Fig. 5). This very high-level summary gives a good overview but does not provide much local detail. The European Flood Alert System (EFAS) has developed a hybrid visualization that combines threshold exceedance maps, in which map pixels are color coded to represent the number of ensemble members at each point exceeding various thresholds, with additional clickable tabular summaries for any pixel (Fig. 6) as well as, for some selected gauging stations where data has been made available, postprocessed hydrographs of the ensemble (Fig. 2). Rather than just conventional hydrographs, Zappa et al. (2013)


[Fig. 1 plot: “Konvektive Starkniederschläge 23.05.2005”; y-axis Niederschlag [mm], x-axis Vorhersagezeit [h]; labeled lines: Beobachtung (observation) and Hauptlauf (deterministic run)]
Fig. 1 In Austria, emergency services personnel working in the Abteilung Feuerwehr und Zivilschutz Landeswarnzentrale (the Fire Service and Civil Defence Early Warning Centre) have additional access to much richer EP outputs, including “spaghetti” plots of the 51-member ALADIN-LAEF ensemble of convective rainfall, shown as the light colored lines in this plot, which also shows the deterministic forecast (Hauptlauf) in black and the observations in red

recommend what they call a “peak-box” approach to draw attention to the temporal spread and magnitude of peak flood flow forecasts in different ensemble members. As with many such best practice recommendations for communicating HEPS (Martini and De Roo 2007), it is not backed by any empirical testing or systematic evaluation of its communicative effectiveness, which is now recognized as a research priority (Spiegelhalter et al. 2011; Demeritt and Nobert 2014). One of the few studies to empirically evaluate HEPS visualization was conducted by Pappenberger et al. (2013). Based on focus group discussions with operational forecasters and other experts, they found strong support for the idea that HEPS needed to provide expert users with information about the following:

• Discharge
• Lead time in the form of a fully written date and time
• Warning/alert levels and/or return period
• Observations and past model performance
• Representation of uncertainty, either in percentiles or more simplified quantiles
• Worst/best case scenarios
• Metadata about station location and model, forecast provider, and whom to contact for further information
• A risk measure (expected cost, population affected, etc.)

However, it was also recognized that displaying such a rich array of information may be more appropriate for hydrological experts than for many other HEPS forecast


Fig. 2 Fan chart display of HEPS forecast from the European Flood Awareness System. The x-axis displays the time in days. The y-axis displays the flow discharge. MHQ stands for mean high discharge (taken from observed). MQ stands for mean flow (from observed). The plot shows the simulated forecast (pink line) and the observations (open circles). It also displays the three main meteorological forecasting systems driving EFAS (DWD, ECMWF, and COSMO-LEPS). It gives a full account of all uncertainties (model and predictive). Probability of threshold exceedances is shown in the right-hand panel. A detailed explanation can be found in the work of Bogner and Pappenberger (2011). Courtesy of EFAS, Joint Research Centre, European Commission, Ispra, Italy

recipients lacking hydrological expertise, such as members of the emergency services, elected officials, or members of the general public. Since user needs are likely to differ by customer type, level of expertise, and decision-making context, Pappenberger et al. (2013) cautioned that firm conclusions must await further research focused specifically on evaluating the effectiveness of different HEPS visualizations at meeting those needs. Nevertheless, research on other domains of ensemble forecast communication does suggest some potential lessons for the communication of HEPS. A report from the US National Research Council (NRC 2006) provides an extensive review of the literature on communicating the uncertainty of meteorological forecasts. Among other things, it highlights the need for communication to be tailored to user needs. For instance, in the USA, the National Hurricane Center has been using “cone of probability” visualizations to depict the uncertainty about forecasted storm tracks (Fig. 7). However, Broad et al. (2007) found that the meaning of these visualizations was not always understood correctly by the general public because of ambiguity about what the cone represents: is it uncertainty around the track of the storm or the


[Fig. 3 plot: Wasserstandsprognose (water-level forecast) for station Korneuburg (Mst-Nr. 207241P) on the Donau; y-axis WasserstandPrognose [cm]; legend: Messwerte (measurements), Wahrscheinlichste Prognose (most likely forecast), Vertrauensbereich (confidence band); forecast issued 07.12.2007]

Fig. 3 In Austria, the public has access to simplified HEPS forecasts of streamflow, with the blue line showing observation, the green line a “best-guess” forecast, and the two gray lines the 10 and 90 % confidence intervals

area that will be impacted by the storm, which is very much broader than the path of the storm eye itself. Research on the cone of uncertainty points to the importance of being explicit about what, exactly, is being represented in HEPS visualizations and what other features are being abstracted away. In this case the graphic clearly shows the uncertainty about the storm track, but at the cost of omitting other information that is arguably more important for public safety, such as the uncertainties about maximum wind speeds or about the scale and extent of secondary flooding, which, as the general public often fails to appreciate, is in fact the leading cause of hurricane fatalities in the USA (Morss and Hayden 2010). Distilling lessons for the communication of climate ensembles, Stephens et al. (2012) conclude that, when designing a means of communication, attention needs to be paid to balancing three priorities (Fig. 8). The first is robustness: making sure that the visualization reflects the scientific confidence in the prediction. Second, richness is the amount of detail, data dimensionality, and contextualizing information that are represented. Finally, saliency is the degree to which the information is relevant to the decision needs of the user. These priorities are often in tension with one another. As Stephens et al. (2012) explain: “Some users may demand increases in informational richness (e.g., a full probability distribution rather than a range) that impact the ability of others to understand or use the information. Likewise concerns

Communicating and Using Ensemble Flood Forecasts in Flood Incident. . .

7

Fig. 4 Persistence in the number of COSMO-LEPS driven HEPS ensemble members for River Sihl, Zurich, Switzerland. The columns represent the days being forecasted and the rows the 5-day forecasts generated on a given date. The numbers inside each cell give the maximum forecasted flow and the number of ensemble members at the threshold level represented by the color of the cell, while the color of the cell border shows the highest warning level exceeded by any single ensemble member. Further details on this HEPS product are provided by Bruen et al. (2010)

with robustness (e.g., limitations and ambiguities of the EP) might require reduced informational richness, given that highly contested or incomplete predictions should not be communicated with unwarranted precision. Such alterations in richness, in turn, also affect perceptions of saliency, potentially decreasing it for users who want access to particular predictions, or increasing it for those who prefer simple, unambiguous results.” Beyond highlighting the general point that the design of HEPS products needs to be sensitive to the needs of the particular audiences for which they are intended, the framework developed by Stephens et al. (2012) provides a way for articulating the trade-offs and compromises involved in selecting among alternative HEPS visualizations.

3 Perception and Understanding of Probabilistic Forecast Information

While some very simplified HEPS forecast products are now being disseminated directly to the public in a few countries, including Austria (Fig. 3) and Sweden (Fig. 5), with France, Canada, and several others planning to do so as well, there has been no research as yet on whether and how they are understood by public audiences. There is, however, a rich body of research on public understandings of other forms of forecast and uncertainty information, and it provides a sound basis for


Forecast results are displayed as maps of the probability of exceeding the mean annual rainstorm (i.e., P(EPIC > 1)), where different probabilities are indicated with shades of red. Also, the three layers of alert points defined above are shown with triangles of size proportional to the probability of EPIC exceeding the

Flash Flood Forecasting Based on Rainfall Thresholds


corresponding alert class. An example of the resulting display of EPIC on the EFAS web interface (www.efas.eu) is shown in Fig. 1 for forecasts of 28 February 2011 12:00 UTC for the Marche region in central Italy. At each reporting point, a time-plot displays with blue shadings the forecast return period of EPIC for a probability range of 5–95 % (e.g., see Fig. 2). According to the operational alerting rules currently used in EFAS, these forecasts would have resulted in a flash flood alert for six rivers (indicated by red triangles in Fig. 1), in the Marche and Abruzzo regions. On 02 March 2011, media news confirmed widespread flooding in most rivers of the two regions, due to the most intense rainfall in 40 years (http://www.ansa.it/web/notizie/rubriche/english/2011/03/02/visualizza_new.html_1561001936.html, last accessed on 11 July 2016). The location and timing of this event were skillfully captured by EPIC forecasts, together with the river basins where the most extreme conditions would occur. One can see in Fig. 2 that return period time-plots put the focus on extreme conditions, while rainfall depths below the mean annual maxima tend to converge to the 1-year return period. Conversely, in such plots the uncertainty spread increases for extreme values, proving the usefulness of the probabilistic information but also showing the difficulty in providing accurate alerts in operational warning systems. As an example, the event peak in Fig. 2 has a coefficient of variation (CV) of the predicted EPIC ensemble of CV_EPIC = 0.17, while the corresponding value calculated on the ensemble of return periods, CV_T = 0.87, is about five times larger.
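The two quantities used in this discussion, the ensemble probability of exceeding a threshold and the coefficient of variation of the ensemble, are straightforward to compute. A hedged sketch with made-up member values (not the actual 28 February 2011 forecast), illustrating how the CV of the return-period ensemble can be several times larger than the CV of the EPIC ensemble itself:

```python
import numpy as np

def exceedance_probability(members, threshold):
    """Fraction of ensemble members above a threshold,
    e.g. P(EPIC > 1) for the mean annual rainstorm."""
    members = np.asarray(members, dtype=float)
    return float((members > threshold).mean())

def coefficient_of_variation(members):
    """CV = ensemble standard deviation / ensemble mean."""
    members = np.asarray(members, dtype=float)
    return float(members.std() / members.mean())

# Hypothetical 10-member EPIC forecast at one reporting point, with the
# return period (years) implied by each member's EPIC value.  Return
# periods grow nonlinearly with EPIC, hence their much wider spread.
epic = [0.9, 1.1, 1.2, 1.0, 1.3, 1.5, 1.1, 0.8, 1.4, 1.2]
ret_periods = [0.7, 2.0, 3.5, 1.0, 6.0, 18.0, 2.2, 0.5, 9.0, 3.0]

p_exceed = exceedance_probability(epic, 1.0)  # P(EPIC > 1)
cv_epic = coefficient_of_variation(epic)
cv_T = coefficient_of_variation(ret_periods)  # much larger spread
```

The nonlinear stretching of the return-period scale is what makes the probabilistic display in Fig. 2 informative for extremes, but also what inflates the apparent uncertainty there.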

2.3 Case Study: Flash Floods in Sardinia (Italy) in November 2013

During the night between 18 and 19 November 2013, exceptionally high rainfall fell in Sardinia, Italy. According to the website of ARPA Sardegna, the regional environment agency, more than 350 mm of precipitation was recorded within 24 h over a considerably large region in central-east Sardinia. News reports from the BBC (http://www.bbc.co.uk/news/world-europe-24996292, last accessed on 11 July 2016) quote up to 440 mm in 90 min, which places it among the highest measurements on record worldwide. A total of 18 casualties were reported, the area near Olbia being the most severely affected. Weather forecasts used in EFAS had predicted high rainfall for Sardinia, Corsica, northern Spain, and southern France since 14 November, though the forecasts were not persistent and underestimated the actual precipitation totals. For this event, EPIC showed a medium probability of flash floods across the island about 1 day before the event (see Fig. 3). Only the forecast of 18 November 2013 12:00 UTC indicated a high probability of flash floods, which would have resulted in an EFAS warning to authorities, though by then the event was already ongoing. This example highlights how the performance of EPIC is directly related to that of the underlying weather models in predicting extreme precipitation. In addition, at the grid resolution of 5–10 km the forecast uncertainty is high. Often only a few members of the ensemble are able to reproduce extreme conditions potentially leading to flash floods, making alert detection particularly challenging. A key issue in operating early warning systems based on ensemble or probabilistic forecasts is the definition

Fig. 1 Map of forecast probability of exceeding the mean annual rainstorm (i.e., EPIC = 1). Model run of 28 February 2011 12:00 UTC. Color saturation of red lines is proportional to the percent probability level (see legend). EPIC reporting points and flash flood alerts are shown with pink and red triangles, respectively

L. Alfieri et al.


Fig. 2 Forecast return period of EPIC on 28 February 2011 12:00 UTC for a reporting point at the outlet of the Potenza River in central Italy. The maximum probabilities of exceeding the three alert thresholds are shown on the left. The ensemble mean and the interquartile range are also shown with solid and dashed black lines

of minimum probability thresholds used to trigger the alerts. Lower thresholds enable the detection of events with high forecast uncertainty, as in the case described above, and improve the hit rate of the system though at the cost of a higher average false alarm rate. Ideally, probability thresholds should be calibrated depending on the ratio between the cost of issuing an alert and the economic losses in case a flash flood occurs.
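The calibration principle stated in the last sentence is the classic static cost-loss decision model. Under that idealized model (a textbook sketch, not the EFAS operational alerting rule), the probability threshold that minimizes expected expense is simply the cost/loss ratio:

```python
def optimal_alert_threshold(cost, loss):
    """Static cost-loss model: issuing an alert costs `cost` (response
    mobilization, mitigation); a flash flood without an alert costs
    `loss`.  Expected expense is minimized by alerting whenever the
    forecast probability p exceeds the ratio C/L."""
    if not 0 < cost < loss:
        raise ValueError("model assumes 0 < cost < loss")
    return cost / loss

def should_alert(p_forecast, cost, loss):
    """Alert decision for one forecast probability."""
    return p_forecast > optimal_alert_threshold(cost, loss)

# Cheap alerts relative to potential losses give a low threshold,
# which raises the hit rate but also the false alarm rate, exactly
# the trade-off described in the text.
t = optimal_alert_threshold(cost=1.0, loss=10.0)  # C/L = 0.1
```

In practice the 60 % threshold used by Alfieri and Thielen (discussed in the next section) was obtained empirically rather than from an explicit cost-loss analysis, but the ratio view explains why lower thresholds trade false alarms for hits.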

2.4 Performance in Early Detection of Extreme Storms and Flash Floods

By its definition, EPIC is not designed to detect all types of floods, but rather those in small catchments (with a lower size boundary depending on the resolution of the NWP) induced by short and intense rainfall events. The collection of quantitative discharge data and of flood thresholds in small rivers throughout Europe, for validation purposes, is a huge and painstaking task. Flash floods usually occur in ungauged catchments, where the only source of information is post-event descriptive reports. Moreover, even when gauging stations are available, they are sometimes damaged and made inoperative by the force of the flood flow. Performance in operational monitoring of the early warning system described above was therefore assessed through a qualitative approach, by selecting the strongest recorded signals of

Fig. 3 EPIC flash flood forecast of 18 November 2013 00:00 UTC, at the outlet of the Cedrino River, Sardinia

upcoming severe events from EPIC and verifying the actual occurrence of flooding in the areas where they were forecast. Alfieri and Thielen (2015) analyzed 22 months of EPIC forecasts driven by COSMO-LEPS, ending in September 2011. They derived a threshold for flash flood alerts corresponding to a 60 % probability of exceeding the 5-year return period of EPIC. This criterion yielded 363 alerts over the full set of reporting points (see Fig. 4), belonging to 50 distinct events. Their occurrence was investigated by searching for reported news on the internet. The main source of information used is the flooding section of the European Media Monitor (EMM, http://emm.jrc.it/, last accessed on 19 August 2015). EMM NewsBrief was developed at the Joint Research Centre of the European Commission. It is a summary of world news in several languages, generated automatically by software algorithms. The EMM news was complemented by the Emergency Events Database (EM-DAT, http://www.emdat.be/, last accessed on 11 July 2016) of the Centre for Research on the Epidemiology of Disasters (CRED) and by targeted internet searches on national and regional news

Fig. 4 EPIC reporting points (black dots) and flash flood alerts (red circles) between December 2009 and September 2011. The COSMO-LEPS spatial domain is indicated by the gray shaded area (From Alfieri and Thielen (2012), Copyright © 2012 Royal Meteorological Society, first published by John Wiley & Sons, Ltd.)

websites. For 42 of the 50 events, reported news of rainstorms and economic losses was found, due to a combination of floods, flash floods, surface water flooding, debris flows, landslides, hail, lightning, sea waves, or wind storms. The remaining eight cases were not confirmed as hazardous events by media news. The discrepancies were attributed to errors in the event severity (two cases), to a shift in location (three cases), and to boundary issues for regions at the edge of the simulation domain (three cases).

2.5 An Outlook to Future Flash Flood Early Warning in Europe

EPIC is used for flash flood early warning as part of the operational EFAS suite. A semiautomated procedure was set up to speed up the preparation of alert emails and reduce the component of human error due to stress and time pressure. A fully automated procedure to send flash flood alerts is envisaged for the near future, in order to maximize the warning lead time once the latest forecast results are ready. In addition, since 2014 EFAS flash flood alert emails include information on the landslide susceptibility of the affected catchments (from Günther et al. 2013), to

help identify possible hotspots of upcoming debris flow events and rainfall-driven landslides. Raynaud et al. (2015) showed how event detection performance can be improved through a modified version of EPIC based on runoff instead of precipitation, which takes into account the initial soil moisture conditions and geomorphological features to weight the contribution of rainfall to the severity of the forecast event. Similarly, Alfieri et al. (2014) extended the formulation of EPIC to detect floods over a wider range of basin sizes, using a nonparametric approach relying exclusively on the output of a state-of-the-art global circulation model coupled with a land-surface scheme. They defined the extreme runoff index (ERI), which is designed to detect extreme accumulations of surface runoff over critical flood durations for each section of the river network.

3 Deterministic and Ensemble Flash Flood Early Warning in Southern Switzerland

The flash flood early warning tool deployed in southern Switzerland (canton of Ticino) has been running in real time since July 2013. For some regions of the target area, real-time hydrological ensemble predictions have been established since 2007 (Zappa et al. 2008, 2013), including the use of ensemble weather radar Quantitative Precipitation Estimates (QPEs) (Germann et al. 2009; Zappa et al. 2011; Liechti et al. 2013b). Such real-time systems based on the hydrological model PREVAH (Viviroli et al. 2009) need to be calibrated against observed streamflow and are difficult to configure for small ungauged areas. The examples presented in Sects. 2 and 4 demonstrate the potential of early warning based on accumulated precipitation; thus, a tailored tool based on these principles was developed for southern Switzerland as well. First-order catchments are used as the baseline spatial unit over which precipitation is accumulated. To calculate the areas of the first-order catchments, a digital terrain model with a spatial resolution of 200 m was processed following the algorithms presented in Binley and Beven (1992). In the target area, 759 first-order catchments have been delineated (Fig. 5, left).

3.1 Estimation of Return Periods

In the Swiss application, return periods have been estimated with the generalized extreme value (GEV) distribution introduced by Jenkinson (1955). It incorporates the three commonly used types of extreme value distribution in a single function: the Gumbel distribution (type I), the Fréchet distribution (type II), and the Weibull distribution (type III) (Martins and Stedinger 2000). This standardization of the three types simplifies the extreme value analysis substantially: no subjective decision is needed, as the data themselves select the matching type of extreme value distribution through the shape parameter (Leadbetter et al. 1983).

(Fig. 5 consists of two map panels in Swiss km coordinates; labeled stations: Piotta, Robièi, S. Bernardino, Acquarossa/Comprovasco, Cimetta, Magadino/Cadenazzo, Locarno/Monti, Lugano, Stabio.)

Fig. 5 Left: First-order catchments, southern Switzerland (Ticino river basin). 759 first-order catchments exist with an area between 0.02 and 29.7 km2. Right: Location of the nine MeteoSwiss rain gauge stations with a temporal resolution of 10 min

Three parameters define the GEV: location μ, scale σ, and shape ξ. It reduces to one of the three distribution types for different ranges of the shape parameter: the Gumbel distribution for ξ = 0, the Fréchet distribution for ξ > 0, and the Weibull distribution for ξ < 0. The cumulative distribution function of the GEV is written as:

G(x) = exp{−[1 + ξ(x − μ)/σ]^(−1/ξ)}   for ξ ≠ 0   (7)

G(x) = exp{−exp[−(x − μ)/σ]}           for ξ = 0   (8)

where 1 + ξ(x − μ)/σ > 0, −∞ < μ < ∞, σ > 0, and −∞ < ξ < ∞ (Coles 2001). The parameters of the GEV were estimated by the maximum likelihood method (Aldrich 1997). As the GEV does not satisfy the regularity conditions required for the usual asymptotic properties of the maximum likelihood estimator to hold, Smith (1985) gave the following guidelines:
• For ξ > −0.5, the maximum likelihood estimators are valid and have the usual asymptotic properties.
• For −1 < ξ < −0.5, maximum likelihood estimators can be obtained, but the standard asymptotic properties do not hold.
• For ξ < −1, maximum likelihood estimators are unlikely to be obtainable.

Shape parameters ξ ≤ −0.5 are rare for extreme values, so the theoretical restriction of the maximum likelihood method is normally no obstacle in practice (Coles 2001). For series of yearly maxima from natural processes, shape parameters above 0.5 indicate problems with the data or different processes involved. For the analysis, the block maxima approach was chosen. Homogeneity and stationarity were tested by trend analysis using the Mann-Kendall test (Kendall 1970) and the run test (Wald and Wolfowitz 1940).
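The GEV relations in Eqs. (7) and (8) can be sketched directly in code, together with the return-level formula obtained by inverting the CDF at the non-exceedance probability 1 − 1/T. This is a minimal pure-Python sketch; the parameter values in the example are hypothetical, not fitted to any of the datasets discussed here:

```python
import math

def gev_cdf(x, mu, sigma, xi):
    """GEV cumulative distribution function G(x), Eqs. (7)-(8)."""
    if abs(xi) < 1e-12:  # Gumbel limit as xi -> 0, Eq. (8)
        return math.exp(-math.exp(-(x - mu) / sigma))
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0:  # outside the support of the distribution
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def gev_return_level(T, mu, sigma, xi):
    """Value with return period T years: solve G(x) = 1 - 1/T."""
    y = -math.log(1.0 - 1.0 / T)  # -log of the non-exceedance probability
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)
    return mu + sigma / xi * (y ** (-xi) - 1.0)

# Hypothetical parameters for daily rainfall maxima (mm): mu=60, sigma=15, xi=0.1
x2 = gev_return_level(2.0, 60.0, 15.0, 0.1)
# Round-trip check: the CDF at the 2-year return level equals 1 - 1/2 = 0.5
assert abs(gev_cdf(x2, 60.0, 15.0, 0.1) - 0.5) < 1e-9
```

Note that some libraries (e.g., scipy's `genextreme`) use the opposite sign convention for the shape parameter, so care is needed when comparing fitted values with the ξ above.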

3.2 Combined Use of Station Information and Gridded Data for IDF Estimations

Like EPIC, the early warning tool used in southern Switzerland adopts intensity-duration-frequency (IDF) curves. Different data sources are evaluated according to the IDF formulation described in Koutsoyiannis et al. (1998). This IDF equation relates the three indicators of heavy precipitation: intensity, duration, and return period. IDF curves are generally created to assess the frequency of a certain rainfall intensity for a certain event duration. Once the relation is established, a real-time product can be evaluated with respect to its probability of recurrence. In a first step, intensity-duration-frequency (IDF) curves for each catchment have been calculated based on:
• Rain gauge data for the period 1980–2013 from the SwissMetNet monitoring network of MeteoSwiss. Nine stations within the target area observing precipitation with a temporal resolution of 10 min have been used (Fig. 5, right).
• The gridded RhiresD dataset (Schiemann et al. 2010; MeteoSchweiz 2013). RhiresD has a spatial resolution of 1 km and a temporal resolution of 1 day. It has been calculated from approximately 420 rain gauge measurements covering the entire territory of Switzerland and is available from 1961 to 2012.
• A thirty-year hindcast of the COSMO-LEPS numerical weather prediction model. The hindcast is available for the period between 1971 and 2001. The data used have a temporal resolution of 1 day and a grid size of 10 km (Fundel et al. 2010; Jörg-Hess et al. 2015).
The gridded data have been downscaled (nearest neighbor, Fundel and Zappa 2011) to generate local integral time series for each of the evaluated first-order catchments. Finally, IDF curves have been calculated for each precipitation dataset (single gauges, RhiresD, and COSMO-LEPS) and first-order catchment. A GEV fit (see Sect. 3.1) has been estimated for every duration for which the relation between precipitation intensity and return period is of interest (from 10 min up to 10 days in the specific case of the early warning system for southern Switzerland). Since only the single gauges provide information on sub-daily precipitation intensities, a methodology has been developed to combine the gridded datasets with the information stemming from the ground stations (Knechtl 2013). The information from the 10-min rain gauge measurements has been adopted to extend

Fig. 6 Catchment-specific rainfall intensities [mm/h] for a duration of 2 h and a return period of 2 years. Left: RhiresD. Right: COSMO-LEPS

the catchment IDF functions. Knechtl (2013) associated each catchment with one of the local rain gauges. The association is based on the similarity between the IDF of the stations and of the sub-areas for durations of 1–10 days and return periods of 2–20 years. The assumption is that similar IDF values in this interval indicate that the IDF curves are similar for all durations, so that the local information provided by the station can be used to complement the information from the two gridded datasets. Figure 6 is an example of the resulting spatial intensity distribution based on the gridded precipitation datasets RhiresD (left) and COSMO-LEPS (right) for a duration of 2 h and a return period of 2 years. For the RhiresD dataset, the weakest intensities occur in the eastern and southern parts of the canton of Ticino, medium intensities in the northern part and in central Ticino, and the strongest intensities north of Lago Maggiore. Furthermore, there is a strong intensity gradient between the Italian and the Swiss catchments west of Lago Maggiore. In the COSMO-LEPS hindcast (right-hand side of Fig. 6), relatively weak intensities occur in southern Ticino, medium intensities in northern Ticino, and the highest intensities in the eastern and western parts of the target area. The spatial distributions of precipitation intensities in the RhiresD and COSMO-LEPS datasets are clearly different. These differences are due to how the datasets were produced: the COSMO-LEPS hindcast contains the COSMO-LEPS-related model biases, while RhiresD contains biases due to the limited spatial representativity of rain gauges. Furthermore, the strong intensity gradient between the Italian and Swiss catchments west of Lago

Maggiore only appears in the RhiresD dataset, whereas there is no such gradient in the COSMO-LEPS dataset. The resulting IDF equations for the Italian catchments are erroneous due to missing data in the RhiresD dataset outside Switzerland. To a weaker extent, the same phenomenon occurs in the southeast, where Italian sub-catchments also show weaker intensities than the neighboring Swiss sub-catchments in the RhiresD dataset. This limitation could be eliminated only if a homogeneous transnational dataset were available. In the presented application, the availability of separate IDF analyses for different data sources allows switching between IDFs in real-time operations: alerts based on observations are triggered by comparison to the RhiresD IDF, while alerts for the next days are triggered by comparison to the COSMO-LEPS IDF. This reduces the problem of inhomogeneity and bias between the raw COSMO-LEPS output and the observation-based products (Fundel et al. 2010).
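An IDF relation of the Koutsoyiannis et al. (1998) family can be sketched as follows. The code uses a common simplified form, i(d, T) = a(T)/(d + θ)^η, with a Gumbel-type frequency term a(T); the specific functional choice and all parameter values are hypothetical and serve only to illustrate how a real-time accumulation would be evaluated against such a curve:

```python
import math

# Simplified IDF form i(d, T) = a(T) / (d + THETA)**ETA, with a Gumbel-derived
# frequency term a(T) = LAM * (PSI - log(-log(1 - 1/T))).
# All parameter values below are hypothetical, for illustration only.
THETA, ETA = 0.15, 0.75   # duration terms (h, dimensionless)
LAM, PSI = 20.0, 1.5      # frequency terms (mm/h, dimensionless)

def idf_intensity(duration_h: float, return_period_yr: float) -> float:
    """Rainfall intensity (mm/h) for a given duration and return period."""
    a_T = LAM * (PSI - math.log(-math.log(1.0 - 1.0 / return_period_yr)))
    return a_T / (duration_h + THETA) ** ETA

# Intensity decreases with duration and increases with return period:
i_2h_2yr = idf_intensity(2.0, 2.0)
i_2h_20yr = idf_intensity(2.0, 20.0)
```

In an operational setting, an observed or forecast intensity for a given duration would be compared against such curves to assign its return period, with a separate parameter set per data source (gauges, RhiresD, COSMO-LEPS), as described above.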

3.3 Operational Implementation Forced by Deterministic and Ensemble NWP

The IDFs associated with the COSMO-LEPS and RhiresD long time series are the core of the deployed early warning system. Real-time forecasts and gridded observations are obtained from MeteoSwiss. Three datasets are used:
• COSMO-LEPS forecast. COSMO-LEPS is a limited-area ensemble prediction system developed and run by the Consortium for Small-Scale Modeling (Marsigli et al. 2005). This probabilistic numerical weather forecast model has 16 members and is available for lead times up to 132 h at a spatial resolution of 7 km (Addor et al. 2011; Zappa et al. 2008). Forecasts are initialized at 12:00 UTC and delivered approximately 10 h later. Thus, analyses are completed only for 120 h starting from 00:00 UTC of the next day, 12 h after initialization.
• COSMO-2 is a high-resolution realization of COSMO-LEPS and includes explicit calculation of small-scale convection. The grid resolution of the model is 2.2 km and the forecast lead time is 33 h with a temporal resolution of 1 h. A new forecast is calculated every 3 h (Zappa et al. 2008; Ament et al. 2011).
• CombiPrecip is the observation-based dataset used within the Swiss flash flood early warning tool. It combines rain gauge and radar measurements, with the aim of merging the quantitatively accurate gauge measurements with the large areal coverage of the radar (Sideris et al. 2014). It is operationally available at a temporal resolution of 1 h, and different accumulations can be computed.
In operational mode, COSMO-2 is nudged to real-time weather radar precipitation data obtained from CombiPrecip. Figure 7 presents the setup of the real-time tool, consisting of an “early detection” component based on COSMO-LEPS, which is evaluated against the COSMO-LEPS IDF for accumulated rainfall of 1–5 days. A second “nowcast” component consists of the

(Fig. 7 diagram: a “Nowcast” branch (CombiPrecip + COSMO-2, updated hourly, evaluated against the RhiresD IDF) and an “Early detection” branch (COSMO-LEPS, updated daily, evaluated against the COSMO-LEPS IDF); accumulations Σ1d–Σ5d; alerts based on the probability of exceeding the 2-year return period T2, with alert levels at return periods of 1.25, 2.0, and 5.0 years and member counts from 0/16 to 16/16.)

Fig. 7 Sketch of the real-time Swiss flash flood early warning tool consisting of an early detection component (right part) and a nowcast component (left side). See main text for further information

combination of the latest 12 hourly fields of CombiPrecip and the forecast precipitation fields of COSMO-2 for the next 12 h. The accumulated rainfall is evaluated for different intervals (from 1 to 12 h), and depending on the source of the data, either the RhiresD or the COSMO-LEPS IDF curves are used. The nowcast component is updated every hour, while the early detection component runs once a day. Knechtl (2013) evaluated this setup against observed events, either discharge peaks in gauged sub-areas or reports of damage caused by flash flood events. The hypothesis that hydrological events can be detected with the flash flood early warning tool could be partly confirmed. The highest skill is obtained when the return period of CombiPrecip is assessed at the hourly time scale; with this, it was possible to confirm most of the damage events that occurred in 2010 and 2011. The prototype tool is affected by several false alarms, because the initial conditions of the soils are not considered.
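The probabilistic alert logic of Fig. 7 reduces to counting ensemble members. A minimal sketch, with invented accumulations and an invented T2 level (the real system derives T2 from the fitted IDF curves per catchment and duration):

```python
# For each accumulation interval, count how many of the 16 COSMO-LEPS members
# exceed the 2-year return level T2 and express the alert as a probability.

def exceedance_probability(member_accumulations, t2_level):
    """Fraction of ensemble members whose accumulated rainfall exceeds T2."""
    n_exceed = sum(1 for r in member_accumulations if r > t2_level)
    return n_exceed / len(member_accumulations)

# 16 hypothetical 24-h accumulations (mm) and a hypothetical T2 level of 90 mm:
members = [55, 70, 95, 102, 88, 110, 60, 97, 91, 85, 120, 99, 76, 93, 105, 89]
prob = exceedance_probability(members, 90.0)
print(f"{int(prob * 16)}/16 members exceed T2 (p = {prob:.2f})")  # prints "9/16 members exceed T2 (p = 0.56)"
```

The same count, mapped onto the 0/16–16/16 scale of Fig. 7, drives the color coding of the alert maps.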

3.4 Case Study in October 2014

The early warning system presented in Fig. 7 has been operational since July 2013. In the period between the start of operations and December 2014, damages related to floods

(Fig. 8 map matrix: rows Σ24h, Σ48h, Σ72h; columns from the initialization day 20141010 (INIT DAY) to 20141014 (LT4 DAYS); shading gives the number of members exceeding T2, from 0/16 to 16/16.)

Fig. 8 Early detection of flash floods in southern Switzerland on the basis of operational COSMOLEPS daily rainfall fields. The examples display alert based on the probability of exceeding the return period of 2 years (T2) for each accumulation interval (24, 48 and 72 h). The situation relates to a forecast issued on 10 October 2014 and indicating possible floods for the next 4 days (see text for additional information)

(Hilker et al. 2009) have been reported for more than 20 calendar days (Andres et al. 2015). Numerous damage events occurred on 13 October 2014, and in this section we present the situation as seen by the tool for this particular day. Figure 8 shows the output of the early detection component as available on 10 October 2014 (about 72 h ahead of the event). The analysis of the probabilistic COSMO-LEPS rainfall fields indicates that there is locally some probability of exceeding the 2-year return period (T2) for 24-, 48-, and 72-h accumulated rainfall. The 24-h maps indicate that in the northwestern areas the intense rainfall should already occur on 11 October 2014, while in the southeastern areas the main rainfall is to be expected on 13 October 2014. The 48-h maps present a band extending from southwest to northeast with probabilities of about 30 % (five to six COSMO-LEPS members) of exceeding T2 for the period 12–13 October 2014. A similar interpretation is obtained when inspecting the map of 72-h accumulated rainfall for the period 11–13 October 2014. The event itself was characterized by a period of intense precipitation between 01:00 UTC and 13:00 UTC on 13 October 2014. Figure 9 presents three different visualizations of rainfall information during this 12-h interval. The top panel (source MeteoSwiss via https://www.gin.admin.ch/, last accessed on 19 August 2015; Heil et al. 2014) presents the accumulated rainfall from the weather radar QPE for a duration of 12 h. The lower left panel presents the evaluation of the CombiPrecip accumulated rainfall during the same time interval under consideration of the

Fig. 9 Heavy precipitation event in southern Switzerland on 13 October 2014. Visualization of analyses of accumulated precipitation between 01:00 UTC and 13:00 UTC. Top panel: accumulated rainfall from the weather radar QPE of MeteoSwiss. Bottom left panel: evaluation of the CombiPrecip accumulated rainfall. Bottom right panel: evaluation of the COSMO-2 forecast available at 20:00 UTC of 12 October 2014. See also Fig. 7 and the text for further information

RhiresD IDF (Fig. 6, left). The lower right panel of Fig. 9 illustrates the evaluation of the COSMO-2 forecast available at 20:00 UTC on 12 October 2014 (5 h prior to the beginning of the relevant accumulation interval). The return period of the COSMO-2 accumulated rainfall was assigned after comparison to the COSMO-LEPS IDF (Fig. 6, right). Since CombiPrecip is a post-processed version of the radar QPE, the general shape of the evaluated CombiPrecip rainfall and of the accumulated radar QPE is very similar. The early warning tool highlights several spots with high precipitation intensities, lying within an ellipse stretching from southwest to northeast. This is very similar to the indications obtained from the early prediction obtained on

(Fig. 10 map panels (a)–(g); legend: T = 1.01–1.25 yrs, T = 1.25–2 yrs, T = 2–5 yrs, T > 5 yrs, water/debris flow locations; probability scale 0–1 for the COSMO-LEPS panels.)

Fig. 10 Visualization of locations affected by floods and debris flow (blue dots) and correspondent alerts issued by the flash flood early warning tool for southern Switzerland on 13 October 2014. Top row: alerts related to CombiPrecip accumulated rainfall. Central row: alerts related to COSMO-2 rainfall predictions. Bottom row: probabilistic alerts based on COSMO-LEPS. See Fig. 7 and the text for full information

10 October 2014 by evaluating COSMO-LEPS (Fig. 8). The COSMO-2 prediction presents very high rainfall intensities, exceeding the 5-year return period in many areas of the target region. The spots with the highest intensity are located about 30 km further south than in the observed fields (radar QPE and CombiPrecip). Nevertheless, the COSMO-2 information was available some hours in advance and could have been used to anticipate the event and trigger mitigation measures. All flood and debris flow events occurring in Switzerland are documented in a database collecting information from newspapers and online sources (Hilker et al. 2009). Andres et al. (2015) evaluated the damages that occurred in the target area on 13 October 2014 and provided the exact coordinates of the locations affected by

damages. This information has been used to roughly evaluate the potential of the early warning tool (Fig. 10). In Fig. 10, the damage locations are plotted within the fields of seven flash flood indicators (panels (a) to (g)):
• (a) Return period of 24-h accumulated CombiPrecip precipitation for the event day (13 October 2014)
• (b) Return period of 24-h accumulated CombiPrecip precipitation for the day antecedent to the event
• (c) Maximal return period of hourly rainfall intensity obtained from CombiPrecip during the event day
• (d) Return period of 12-h accumulated COSMO-2 precipitation forecast for the run started at 00:00 of the event day
• (e) Return period of 24-h accumulated COSMO-2 precipitation forecast for the run started at 00:00 of the event day
• (f) Probability of exceeding a return period of 2 years for 24-h accumulated rainfall on the event day, as obtained from the 16-member COSMO-LEPS forecast delivered 1 day ahead
• (g) Probability of exceeding a return period of 2 years for 24-h accumulated rainfall on the event day, as obtained from the 16-member COSMO-LEPS forecast delivered 2 days ahead
The inspection of Fig. 10 indicates that both the rainfall observed 1 day ahead (Fig. 10b) and the maximal 1-h intensity of CombiPrecip during the event (Fig. 10c) would have been poor indicators for locating the damage spots. The 24-h accumulated CombiPrecip rainfall (Fig. 10a) appears to be better correlated with the locations affected by damages. This might indicate that the events were not triggered by local high-intensity rainfall; the cause of the event is most probably long-lasting rainfall, as also indicated by the early prediction component of the tool (Fig. 8). Figure 10d, e also indicates that the event was characterized by high-intensity 24-h accumulated rainfall. The COSMO-2 evaluation (Fig. 10e) identifies well the regions that were finally affected by damages.
The probabilistic early predictions based on COSMO-LEPS present a higher probability of high 24-h rainfall on the event day in the forecast delivered 2 days in advance (Fig. 10g), while the evaluation available 1 day ahead (Fig. 10f) shows a lower probability of exceeding the 2-year return period for 24-h accumulated rainfall in the areas affected by damages according to the newspaper reports collected by Andres et al. (2015).

3.5 An Outlook to Future Flash Flood Early Warning in Switzerland

An initial assessment of the value of the early warning tool demonstrated the potential of precipitation-based indexes for flash flood prediction. The current

situation concerning the tool implemented in southern Switzerland can be summarized in four points:
• Real-time collection of forecasts has been ongoing since July 2013. The next step will be the verification of this first period of real-time operations.
• The realization of a second prototype in northern Switzerland is planned.
• Further efforts will focus on using probabilistic extrapolation of weather radar QPE (Mandapaka et al. 2012; Foresti and Seed 2014) for hydrological applications (Liechti et al. 2013a).
• Finally, real-time hydrological information obtained from high-resolution distributed models should be used to estimate initial conditions and reduce the number of false alarms observed in this prototype application (Knechtl 2013). This will make use of concepts linked to the mapping of dominant runoff processes (Antonetti et al. 2015).

4 Flash Flood Detection in Catalonia

The flash flood forecasting system adopted in Catalonia (Corral et al. 2009; Alfieri et al. 2011; Versini et al. 2014) is based on rainfall estimates and forecasts generated from weather radar observations. This system, named FF-EWS, is designed as a tool for monitoring the hazard of intense rainfall situations in the context of a flash flood early warning system. Radar rainfall inputs depict the evolution of the rainfall field with a resolution (on the order of 1 km and 5–10 min) suited to monitoring the precipitation phenomena that produce flash floods. Extrapolation of radar rainfall observations has also been used successfully to forecast the evolution of the rainfall field for a few hours. With these rainfall inputs, flash flood hazard assessment is based on the assumption that the rainfall accumulated upstream of a point of the drainage network (i.e., the basin-aggregated rainfall) can be used to characterize the flash flood hazard, especially for high return periods, when the pdf of discharges and the pdf of precipitation tend to have the same slope (Guillot and Duband 1967).

4.1 Probabilistic Rainfall Inputs

Quantitative Precipitation Estimates (QPEs) and Quantitative Precipitation Forecasts (QPFs) are produced from weather radar observations with the Integrated Tool for Hydrometeorological Forecasting (Corral et al. 2009). The use of probabilistic rainfall inputs makes it possible to characterize the uncertainty in QPE and QPF and assess its impact on the estimated hazard. Most existing techniques to generate radar-based probabilistic rainfall products are based on the ensemble approach (e.g., Bowler et al. 2006; Llort et al. 2008; Germann et al. 2009; Villarini et al. 2009; Berenguer et al. 2011; Panziera et al. 2011; Quintero et al. 2012): they produce a

number of realistic rainfall scenarios (members) that are compatible with radar observations and at the same time respect the spatial and temporal structure of the rainfall field. The latter is a crucial aspect for properly assessing the impact of rainfall uncertainty on hazard assessment. Ensuring the quality of the QPE maps is fundamental to guarantee the good performance of the system. This requires processing radar observations with a chain of algorithms to reduce the effect of the sources of uncertainty affecting radar QPE (e.g., Corral et al. 2009; Villarini and Krajewski 2009). In the Integrated Tool for Hydrometeorological Forecasting (Corral et al. 2009), the production of QPE maps includes (1) mitigating the effects of the interception of the radar beam by the terrain (Delrieu et al. 1995), (2) eliminating non-meteorological echoes (Berenguer et al. 2005; Park and Berenguer 2015), (3) identifying precipitation types in volumetric radar data, (4) extrapolating elevated radar observations to the surface with a vertical profile of reflectivity that depends on the type of precipitation (as described by Franco et al. 2006, 2008), and (5) converting reflectivity into rain rate using a relationship also adapted to the type of precipitation. From instantaneous rainfall maps, rainfall accumulations are computed at a resolution of 1 × 1 km2, considering the motion of the precipitation systems and the evolution of rainfall intensities between consecutive radar scans. Radar QPE ensembles (EQPE) are obtained by perturbing the deterministic QPE with the method of Llort et al. (2008), which considers the space-time structure of the errors affecting radar QPE. Each member of the EQPE is a possible realization of the unknown precipitation field given the radar measurements.
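Step (5) of the chain above, converting reflectivity to rain rate, is commonly done with a Z-R power law. The sketch below uses the classic Marshall-Palmer coefficients (Z = 200 R^1.6) purely for illustration; the operational chain adapts the coefficients to the identified precipitation type, which is not reproduced here:

```python
# Invert the Z-R relation Z = a * R**b, with Z in mm^6/m^3 obtained from the
# logarithmic radar measurement as Z = 10**(dBZ/10), to get rain rate R in mm/h.

def dbz_to_rain_rate(dbz: float, a: float = 200.0, b: float = 1.6) -> float:
    """Rain rate (mm/h) from reflectivity (dBZ) under a Z = a * R**b relation."""
    z_linear = 10.0 ** (dbz / 10.0)
    return (z_linear / a) ** (1.0 / b)

rate = dbz_to_rain_rate(40.0)  # roughly 11-12 mm/h for 40 dBZ under Marshall-Palmer
```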
Deterministic rainfall nowcasting is based on Lagrangian extrapolation (i.e., advection of the most recently observed rainfall map with the estimated motion field, neglecting the evolution of rainfall intensities), which has proven to generate useful rainfall forecasts for lead times up to a few hours (see, e.g., Germann et al. 2006; Berenguer et al. 2005, 2012). It is composed of two modules:
• Rainfall tracking: The algorithm implemented to estimate the motion field of precipitation is based on matching three rainfall maps within 24 min with a modified version of the COTREC algorithm (Li et al. 1995).
• Extrapolation of rainfall observations: The last observed rainfall field is advected in time according to the motion field estimated with the tracking technique. The motion field is kept stationary in time along the series of generated forecasts.
The uncertainty in rainfall nowcasting is characterized with the SBMcast technique (Berenguer et al. 2011). It generates an ensemble of realistic future rainfall scenarios that evolve from the most recent QPE field, assuming the String of Beads model (Pegram and Clothier 2001) to characterize the space-time variability of the rainfall field.
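The Lagrangian persistence idea can be sketched in a few lines. The toy example below advects a rainfall grid by a uniform, stationary, whole-cell motion vector; the real scheme uses a spatially varying motion field from COTREC and sub-cell interpolation, and the grid and motion values here are invented:

```python
# Minimal Lagrangian persistence nowcast: shift the latest observed rainfall
# grid by a motion vector, with no growth or decay of intensities.

def advect(field, shift_rows, shift_cols, fill=0.0):
    """Shift a 2-D rainfall grid by whole cells; rain moving off-grid is lost."""
    n_rows, n_cols = len(field), len(field[0])
    out = [[fill] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            ii, jj = i + shift_rows, j + shift_cols
            if 0 <= ii < n_rows and 0 <= jj < n_cols:
                out[ii][jj] = field[i][j]
    return out

# A single rain cell (5 mm/h) moving one cell east per time step:
obs = [[0, 0, 0],
       [5, 0, 0],
       [0, 0, 0]]
lead1 = advect(obs, 0, 1)    # forecast at t + 1
lead2 = advect(lead1, 0, 1)  # forecast at t + 2 (motion field kept stationary)
```

Keeping the motion field stationary along the forecast series mirrors the extrapolation module described above.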

4.2 Flash Flood Hazard Assessment

Hazard assessment uses estimated and forecast 30-min rainfall accumulations. This accumulation period is thought to be relevant at the point scale (e.g., for urban drainage or at sensitive points of the road network). For each point of the drainage network, the rainfall inputs available at a given time are used to compute the basin-aggregated rainfall accumulated over a duration corresponding to the concentration time of the catchment (the computations are made for durations between 0.5 and 24 h and for catchments between 4 and 2000 km2). Hazard assessment (expressed in terms of probability of occurrence or as a return period) is based on comparing the computed rainfall accumulations with the values of the available intensity-duration-frequency (IDF) curves. In the case of basin-aggregated rainfall, point IDF values are reduced by a scaling factor that depends on the area of the drained catchment. Hazard assessment is recalculated every time a new QPE map becomes available, at a resolution of 1 × 1 km2 and for lead times between t + 0 and t + 3 h. Because of the simplicity of this flash flood hazard assessment approach, the results sometimes differ from what could be obtained with a system based on a complete rainfall-runoff simulation. Relating the probability of occurrence of rainfall to the probability of occurrence of discharges neglects some hydrological variables that play an important role in the catchment response (such as the initial moisture state of the catchment or the presence of accumulated snow). On the other hand, the main advantage of the implemented approach is that it does not use parameters that require calibration. This is an important advantage for an operational implementation over large domains, where the aim of the system is to detect flash flood events in small and medium catchments that are often ungauged.
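The comparison step can be sketched as follows. The areal reduction factor (ARF) form and all numbers below are hypothetical; the operational system derives its scaling factor from the catchment area as described above, and its point IDF values from the ACA curves:

```python
import math

# Compare basin-aggregated rainfall with a point IDF value reduced by an
# areal reduction factor that shrinks with catchment area. ARF form and all
# numeric values are invented for illustration.

def areal_reduction_factor(area_km2: float) -> float:
    """Illustrative ARF: 1 for a point, decaying slowly with catchment area."""
    return max(0.5, 1.0 - 0.05 * math.log10(max(area_km2, 1.0)))

def exceeds_hazard_threshold(basin_rain_mm, point_idf_mm, area_km2):
    """True if basin-aggregated rainfall exceeds the area-reduced IDF value."""
    return basin_rain_mm > point_idf_mm * areal_reduction_factor(area_km2)

# Hypothetical 2-yr point IDF value of 60 mm over a 100 km2 catchment:
arf = areal_reduction_factor(100.0)                  # 1 - 0.05 * 2 = 0.90
alert = exceeds_hazard_threshold(56.0, 60.0, 100.0)  # 56 mm > 54 mm -> True
```

Repeating this check over all durations (0.5–24 h) and all points of the drainage network yields the hazard maps that are refreshed with every new QPE.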

4.3 Implementation in Catalonia (NE Spain) and Case Studies

The FF-EWS is used operationally for flash flood hazard assessment at the control center of the Catalan Water Agency (Barcelona, Spain) for monitoring the evolution of rainfall situations that might lead to flash floods in Catalonia (NE Spain; see Fig. 11). In this region, the littoral and pre-littoral mountain ranges (approximately parallel to the coast) act as natural barriers to the warm and humid air from the Mediterranean Sea, favoring the genesis of the convective processes that lead to intense rains. Yearly rainfall accumulations range from 400 to 1200 mm, but some individual events contribute significantly to the yearly totals (the 10-year return period daily accumulation exceeds 100 mm, and almost every year accumulations of over 200 mm in 24 h are recorded somewhere on the Spanish Mediterranean coast). The response times of the mountainous catchments are rather short due to the steep slopes of the streams and the urbanization of the flood plains, especially near the coast (where some ephemeral streams have become streets, which increases the risk of flash floods).

Flash Flood Forecasting Based on Rainfall Thresholds

Fig. 11 Network of hydrometeorological sensors available in Catalonia: The circles indicate the location of the 4 C-band radars of XRAD and the C-band radar of the Spanish Agency of Meteorology (AEMET); the gray triangles and white diamonds show the location of the rain gauges and stream-level sensors, respectively

These factors, combined with heavy rainfall events, are the ingredients that lead to flash floods in this region. Operationally, the FF-EWS uses the observations of the rain gauges and stream-level sensors of the Automatic Hydrological Information System and of the four C-band radars of the XRAD (the radar network of the Meteorological Service of Catalonia; see Fig. 11). These are processed with the Integrated Tool for Hydrometeorological Forecasting to generate the radar-based QPE and QPF ensembles. The IDF curves applied to estimate the return period of the measured rainfall are those used for river planning and flooding studies at the Catalan Water Agency (ACA 2003).

Fig. 12 Results obtained with the radar-based hazard assessment system during the event of 12–14 September 2006. (a) Scatterplot of radar QPE (R) versus rain gauge measurements (G) and (b) radar QPE accumulation for 13 September 2006 (the circles indicate rain gauge accumulations); (c) hazard assessment for the entire event (the red dashed ellipses indicate the areas where floods were reported)

4.3.1 Case Study: 12–14 September 2006

This case was a typical autumn event during which several mesoscale convective systems crossed Catalonia from southeast to northwest. The maximum rain gauge
accumulations were reported in Constantí (near Tarragona), with 267 mm, and in the area of the Gulf of Roses (north of Girona), with 256 mm (216 mm of which fell in 24 h; Fig. 12a). This event caused significant material losses (intense flooding in urban areas in the regions of Tarragona and Barcelona and failures of the road and railway networks due to flooding and landslides) and one casualty. The intense rainfall produced flash floods in ephemeral torrents near the coast and in some sub-basins of the main rivers, for example, in the lower part of the rivers Llobregat (5000 km²) and La Muga (850 km²). The estimated rainfall accumulation map for 13 September 2006 is shown in Fig. 12b. The comparison between radar QPE and the available rain gauge records shows reasonable agreement, except for the largest accumulations (probably more affected by attenuation of the radar signal due to intense rain). Figure 12c summarizes the maximum hazard level estimated at each point of the drainage network throughout the event and shows how the system was able to successfully detect the importance of the event and identify the areas most affected by intense rainfall and flash floods (marked with red dashed ellipses in Fig. 12c; see also Fig. 11 of Barnolas et al. 2008). Figure 13 shows the variety of hazard assessment products generated by the system every time a new radar QPE map is available: besides the hazard assessment based on radar observations (Fig. 13a), the system forecasts the expected evolution of the hazard level based on radar QPF (the hazard level forecasted on 13 September 2006 at 15:00 UTC for a lead time of 2 h is shown in Fig. 13b). Additionally, the system estimates the uncertainty in the forecasted hazard level in terms of the probability of exceeding a given return period (Fig. 13c, d shows, respectively, the probability of exceeding a return period of 2 and 5 years for the case shown in Fig. 13b). As presented in Sect. 4.1, this product describes how the uncertainty in rainfall QPE and QPF affects the hazard level estimated with the FF-EWS.
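The exceedance-probability product described above reduces, in its simplest form, to the fraction of ensemble members whose diagnosed hazard exceeds a given return period. A minimal sketch (the function name and inputs are illustrative, not the operational code):

```python
def prob_exceedance(member_return_periods, t_years):
    """Probability that the hazard level exceeds return period t_years,
    estimated as the fraction of ensemble members whose diagnosed
    return period exceeds it (illustrative estimator)."""
    n = len(member_return_periods)
    if n == 0:
        raise ValueError("empty ensemble")
    return sum(1 for t in member_return_periods if t > t_years) / n
```

Evaluating this for several thresholds (e.g., T = 2 and T = 5 years) yields maps like those in Fig. 13c, d.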

4.3.2 Case Study: 02 November 2008

In the early morning of 02 November 2008, a convective system coming from the Mediterranean moved toward the interior of Tarragona. The intense rain and strong winds caused significant damage in the coastal area. Some flash floods occurred in the streams near the coastal city of Salou (the closest rain gauge recorded 50 mm between 01:00 and 03:00 UTC). Also, the Segre river flooded some areas in the region of Lleida. The maximum rain gauge accumulation exceeded 130 mm in the village of Prades, with over 90 mm recorded on 02 November 2008 between 02:00 and 05:00 UTC. In the area of Girona, the rainfall was more sustained throughout the day, and maximum accumulations reached up to 125 mm, resulting in numerous calls to the fire brigades for flooding in cities like Girona and Olot. The radar QPE accumulation shows good correspondence with collocated rain gauge measurements (Fig. 14a, b). Figure 14c shows that the system identified significant hazard in the main spots where floods occurred. In the zone of Tarragona, the area affected by large rainfall accumulations is a 15-km-wide band along which an intense convective cell developed and propagated. The system was able to identify a significant hazard level in the torrents around Salou, where the most

Fig. 13 Hazard assessment for 13 September 2006 at 17:00 UTC: (a) obtained from radar rainfall observations; (b) obtained for a lead time of 2 h. (c) and (d) Probability of exceeding a return period of 2 and 5 years, respectively, as estimated from probabilistic rainfall forecasts with a lead time of 2 h

damaging flash floods occurred (see Figs. 14c and 15a). Figure 15b–e shows the evolution of the hazard map forecasted for 06:00 UTC. The first signal of possible hazard in this area was detected 1.5 h ahead (i.e., with the forecast generated at 04:30 UTC). Thirty minutes later, the hazard map forecasted with a lead time of 1 h was almost identical to what was finally diagnosed from radar observations (Fig. 15a). For this case, the European Flood Awareness System's flash flood component (EPIC; see Sect. 2) forecasted the possibility of a significant event in the coastal streams of Tarragona almost 3 days in advance (based on the NWP forecasts produced on 31 October 2008 at 12:00 UTC). However, the differences in the exact location where the different members of the NWP Ensemble Prediction System

Fig. 14 Same as Fig. 12, but for the event of 02 November 2008

used in EPIC forecasted the intense rainfall resulted in a wide region with some (low) probability of flash flood occurrence (Alfieri et al. 2011). This case illustrates the complementarity of a system like EPIC (based on NWP forecasts) and the radar-based flash flood hazard system (with a time horizon of a few hours): once a system like EPIC has indicated the possibility of flash floods over a certain area a few days ahead, radar-based QPE and QPF allow us to increase the resolution of the hazard assessment in the context of flash flood monitoring.

Fig. 15 (a) Hazard estimated on 02 November 2008 at 06:00 UTC. (b–e) Evolution of the hazard assessment product for lead times of 2.0, 1.5, 1.0, and 0.5 h

4.3.3 Case Study: 17–19 June 2013

This event caused significant losses all throughout the central Pyrenees (on both the southern and northern sides). In Catalonia, the northwestern counties were the most affected, with serious flooding in the headwaters of the Garona and Noguera Pallaresa catchments, in the Pyrenees (with mountains above 3000 m amsl). Many of the mountain torrents and the rivers Garona and Noguera Pallaresa overflowed and flooded several villages, causing major damage: several roads and the bridges of Salardú, Arties, and Llavorsí collapsed, several houses were destroyed by the waters in Arties and Bossòst, and 400 people were evacuated (see http://www.catalannewsagency.com/society-science/item/severe-floods-in-the-north-western-catalan-pyrenees). In the county of Val d'Aran, the total damages were estimated at €100 million. Daily rainfall accumulations reached 115 mm in Vielha (the capital of this county), with maximum rainfall intensities of 12 mm in 30 min. This event is presented to illustrate two of the limitations of the presented hazard assessment system:
1. The location of the affected area falls beyond the limits of what can be considered the maximum range for radar Quantitative Precipitation Estimation (as can be seen in Figs. 11 and 16, the closest radar is located over 100 km away), and it is a very mountainous area affected by visibility problems from the radar perspective

Fig. 16 Results obtained with the radar-based hazard assessment system during the event of 17–19 June 2013. (a) Radar QPE accumulation between 17 June 2013 at 20:00 UTC and 19 June 2013 at 24:00 UTC; the red dashed ellipse indicates the location where radar QPE significantly underestimated precipitation and where floods occurred (the black dots correspond to the villages mentioned in the text: from west to east, Salardú, Vielha, Arties, Bossòst, and Llavorsí). (b) Radar-rain gauge blending accumulation. (c) Scatterplot of radar QPE versus rain gauge measurements (the blue diamonds correspond to the locations where radar QPE significantly underestimated rainfall accumulations). (d) Hazard assessment for the entire event based on radar-rain gauge blending in the domain indicated in panel (b)

due to beam blockage and beam overshooting (e.g., Pellarin et al. 2002). Consequently, the quality of radar QPE in these areas is in general rather poor, and radar underestimates the rain gauge accumulations (see Fig. 16a, b).
2. Although this was quite a significant rainfall event, the severe response of the basin was strongly influenced by the accelerated melting of a large part of the snow accumulated in the mountains at the end of spring (after the event, the snow depth recorded at stations located above 2200 m showed a reduction of up to 700 mm).

32

L. Alfieri et al.

These factors resulted in a serious underestimation of the hazard level in this area when using radar-based QPE (not shown here). To improve the quality of radar QPE in such mountainous areas, several authors have proposed using small X-band radars to fill the gaps of regional radar networks (e.g., Beck and Bousquet 2013; Campbell and Steenburgh 2014). When these small radars are not available (as in the case of the Catalan Pyrenees), the estimated QPE field can be improved by blending radar QPE maps with rain gauge observations (e.g., Velasco-Forero et al. 2009; Schiemann et al. 2011; Sideris et al. 2014, and references therein). The blended fields are constrained by rain gauge observations while reproducing the structure of the rainfall field as depicted by radar. The blended radar-rain gauge QPE (obtained with the technique of Velasco-Forero et al. 2009) has been used to recalculate the hazard level along the event. Figure 16c, d shows, respectively, the resulting total rainfall accumulation and the summary of the maximum hazard level estimated along the event. Now, the areas most affected by floods show a significant flood hazard level (above a return period of 5 years in some parts of the drainage network). However, the damages indicate that the real magnitude of the event was significantly higher (the return period was most likely above 25 years), with an important role played by snowmelt (not considered in the current version of the hazard assessment system).
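As a rough illustration of the radar-rain gauge merging idea (not the geostatistical method of Velasco-Forero et al. 2009 used in the FF-EWS), a mean-field bias correction rescales the radar field so that, on average, it matches collocated gauge totals while preserving the radar-depicted spatial structure. The function and its inputs are assumptions for this sketch.

```python
import numpy as np


def mean_field_bias_blend(radar, gauge_vals, gauge_ij):
    """Crude radar-rain gauge merging by mean-field bias correction.

    radar      : 2-D array of radar accumulations (mm)
    gauge_vals : gauge accumulations (mm), one per gauge
    gauge_ij   : (row, col) radar pixel collocated with each gauge

    A single multiplicative bias is derived from the ratio of gauge
    totals to collocated radar totals; a deliberately simple stand-in
    for geostatistical blending (e.g., kriging with external drift).
    """
    radar_at_gauges = np.array([radar[i, j] for i, j in gauge_ij])
    gauge_vals = np.asarray(gauge_vals, dtype=float)
    mask = radar_at_gauges > 0
    if not mask.any():
        return radar.copy()
    bias = gauge_vals[mask].sum() / radar_at_gauges[mask].sum()
    return radar * bias
```

Unlike the operational technique, this correction is spatially uniform, so it cannot fix the localized beam-blockage underestimation discussed above; it only conveys the principle of constraining radar fields with gauges.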

4.4 An Outlook to Future Flash Flood Early Warning in Catalonia

The hazard assessment system presented here uses radar-based probabilistic QPE and QPF inputs. The system is used in real time in the control center of the Catalan Water Agency for monitoring the evolution of rainfall events that might lead to flash floods. Some of its outputs are also disseminated to the general public through the website "Water in Real Time" (AETR – http://aca-web.gencat.cat/aetr/aetr2/, last accessed on 17 July 2016). The main advantage of the system is its high resolution (1 km and 6 min), which enables very precise determination of the areas at risk, at the expense of shorter lead times (in Catalonia, up to 3 h). Consequently, extending this time horizon with NWP models with lead times beyond 1 day is necessary to enable earlier preparedness and to trigger effective emergency and response plans. The best practice is to use the radar-based FF-EWS in the context of monitoring potentially hazardous rainfall situations as detected with systems based on NWP (as suggested by Alfieri et al. 2011; Versini et al. 2014). Current work in progress focuses on extending the coverage of the system for flash flood hazard assessment in the context of the EC Civil Protection project European Demonstration of a Rainfall- and Lightning-Induced Hazard Identification Nowcasting Tool (EDHIT, www.edhit.eu, last accessed on 17 July 2016). This project is implementing a similar system at the European scale based on the European radar mosaics generated by the EUMETNET project OPERA (Huuskonen et al. 2014). The goal of EDHIT is to demonstrate the potential of the European radar mosaic for rainfall-induced hazard assessment in the context of Civil


Protection. A prototype of the system has been operating since mid-2012, and the first analyses show promising results. Figure 17 shows an example of the regional hazard assessment based on radar observations during the event of May 2012 in central Europe, compared with the results obtained with an operational numerical weather prediction system. In parallel, current research focuses on how to account for soil moisture conditions in hazard assessment: the chosen approach is based on adapting the rainfall thresholds used for hazard assessment to the soil moisture conditions depicted by the rainfall-runoff model LISFLOOD (van der Knijff et al. 2010), applied operationally at the European scale in the context of EFAS.

Fig. 17 (a) Hazard level overlaid on hourly rainfall accumulations for 01 June 2013 at 17:00 obtained from radar observations (left) and from rainfall forecasts with a lead time of 3 h (right). (b) Correlation between hourly accumulations estimated from the OPERA radar mosaic and rainfall forecasts obtained with the HIRLAM run corresponding to 01 June 2013 at 00:00 UTC (dashed blue line) and with the analyzed nowcasting system run on 01 June 2013 at 00:00, 08:00, 16:00 UTC and on 02 June 2013 at 00:00, 08:00 UTC (red lines)

5 Conclusions

Three flash flood early warning systems have been presented in this chapter. The systems show a number of similarities while making different uses of probabilistic information. EPIC and the Swiss tool use probabilistic forecasts of COSMO-LEPS and consistent hindcast datasets to obtain spatially distributed information on possible upcoming floods. The Catalan FF-EWS makes full use of ensemble weather radar QPE and QPF, while the Swiss tool uses an advanced weather radar product combined with high-resolution numerical weather predictions for the nowcasting of flash floods. A clear challenge in flash flood forecasting based on NWP is the accurate prediction of the location and timing of extreme storms, given that they are small-scale, short, and intense events. Probabilistic approaches and ensemble forecasting can help address this issue: if some chance of extreme events is predicted, regional systems are triggered to monitor the evolution of the event as it approaches and develops, making use of the more detailed information of regional monitoring networks and of the expertise of local flood forecasters. Interestingly, ongoing developments of all three systems aim at integrating initial soil moisture conditions and at better characterizing the timing and magnitude of flood events. All three systems present and target applications in areas extending across different countries. Problems of data homogeneity are often not easy to address, particularly when merging data from different national and regional networks and when blending point observations with gridded output from numerical models. EPIC is not affected by this issue, as it is based on modeled precipitation from a single data product. On the other hand, both the Swiss tool and the large-scale application of the Catalan approach might suffer from using transnational data.
Hence, the implementation of these latter applications in different areas requires a careful and tailored design based on the local data availability.

References

ACA, Recomanacions tècniques per als estudis d'inundabilitat d'àmbit local (Agència Catalana de l'Aigua, Barcelona, 2003), p. 106
N. Addor, S. Jaun, F. Fundel, M. Zappa, An operational hydrological ensemble prediction system for the city of Zurich (Switzerland): skill, case studies and scenarios. Hydrol. Earth Syst. Sci. 15, 2327–2347 (2011). doi:10.5194/hess-15-2327-2011
J. Aldrich, R.A. Fisher and the making of maximum likelihood 1912–1922. Stat. Sci. 12(3), 162–176 (1997)
L. Alfieri, J. Thielen, A European precipitation index for extreme rain-storm and flash flood early warning. Meteorol. Appl. 22(1), 3–13 (2015). doi:10.1002/met.1328
L. Alfieri, D. Velasco, J. Thielen, Flash flood detection through a multi-stage probabilistic warning system for heavy precipitation events. Adv. Geosci. 29, 69–75 (2011). doi:10.5194/adgeo-29-69-2011


L. Alfieri, J. Thielen, F. Pappenberger, Ensemble hydro-meteorological simulation for flash flood early detection in southern Switzerland. J. Hydrol. 424–425, 143–153 (2012). doi:10.1016/j.jhydrol.2011.12.038
L. Alfieri, F. Pappenberger, F. Wetterhall, The extreme runoff index for flood early warning in Europe. Nat. Hazards Earth Syst. Sci. 14(6), 1505–1515 (2014). doi:10.5194/nhess-14-1505-2014
F. Ament, T. Weusthoff, M. Arpagaus, Evaluation of MAP D-PHASE heavy precipitation alerts in Switzerland during summer 2007. Atmos. Res. 100(2–3), 178–189 (2011)
N. Andres, A. Badoux, C. Hegg, Unwetterschäden in der Schweiz im Jahre 2014. Rutschungen, Murgänge, Hochwasser und Sturzereignisse. Wasser Energie Luft 107(1), 47–54 (2015)
M. Antonetti, R. Buss, S. Scherrer, M. Margreth, M. Zappa, Mapping dominant runoff processes: an evaluation of different approaches using similarity measures and synthetic runoff simulations. Hydrol. Earth Syst. Sci. Discuss. 12, 13257–13299 (2015). doi:10.5194/hessd-12-13257-2015
M. Barnolas, A. Atencia, M.C. Llasat, T. Rigo, Characterization of a Mediterranean flash flood event using rain gauges, radar, GIS and lightning data. Adv. Geosci. 17, 35–41 (2008). doi:10.5194/adgeo-17-35-2008
J.C. Bartholmes, J. Thielen, M.H. Ramos, S. Gentilini, The European flood alert system EFAS – part 2: statistical skill assessment of probabilistic and deterministic operational forecasts. Hydrol. Earth Syst. Sci. 13(2), 141–153 (2009)
J. Beck, O. Bousquet, Using gap-filling radars in mountainous regions to complement a national radar network: improvements in multiple-Doppler wind syntheses. J. Appl. Meteorol. Climatol. 52, 1836–1850 (2013). doi:10.1175/JAMC-D-12-0187.1
M. Berenguer, C. Corral, R. Sánchez-Diezma, D. Sempere-Torres, Hydrological validation of a radar-based nowcasting technique. J. Hydrometeorol. 6, 532–549 (2005). doi:10.1175/JHM433.1
M. Berenguer, D. Sempere-Torres, G.G.S. Pegram, SBMcast – an ensemble nowcasting technique to assess the uncertainty in rainfall forecasts by Lagrangian extrapolation. J. Hydrol. 404, 226–240 (2011). doi:10.1016/j.jhydrol.2011.04.033
M. Berenguer, M. Surcel, I. Zawadzki, M. Xue, F. Kong, The diurnal cycle of precipitation from continental radar mosaics and numerical weather prediction models, part II: intercomparison among numerical models and with nowcasting. Mon. Weather Rev. 140, 2689–2705 (2012). doi:10.1175/mwr-d-11-00181.1
A. Binley, K. Beven, Three dimensional modelling of hillslope hydrology. Hydrol. Process. 6(3), 253–368 (1992)
N.E. Bowler, C.E. Pierce, A.W. Seed, STEPS: a probabilistic precipitation forecasting scheme which merges an extrapolation nowcast with downscaled NWP. Q. J. Roy. Meteorol. Soc. 132, 2127–2155 (2006). doi:10.1256/qj.04.100
L.S. Campbell, W.J. Steenburgh, Finescale orographic precipitation variability and gap-filling radar potential in Little Cottonwood Canyon, Utah. Weather Forecast. 29, 912–935 (2014). doi:10.1175/WAF-D-13-00129.1
V.T. Chow, D.R. Maidment, L.W. Mays, Applied Hydrology (McGraw-Hill, New York, 1988). ISBN 0-07-010810-2
S. Coles, An Introduction to Statistical Modeling of Extreme Values, vol. 208 (Springer, London, 2001)
C.G. Collier, Flash flood forecasting: what are the limits of predictability? Q. J. Roy. Meteorol. Soc. 133(622), 3–23 (2007)
C. Corral, D. Velasco, D. Forcadell, D. Sempere-Torres, Advances in radar-based flood warning systems. The EHIMI system and the experience in the Besòs flash-flood pilot basin, in Flood Risk Management: Research and Practice, ed. by P. Samuels, S. Huntington, W. Allsop, J. Harrop (Taylor & Francis, London, 2009), pp. 1295–1303
D.P. Dee, S.M. Uppala, A.J. Simmons, P. Berrisford, P. Poli, S. Kobayashi, U. Andrae, M.A. Balmaseda, G. Balsamo, P. Bauer, P. Bechtold, A.C.M. Beljaars, L. van de Berg, J. Bidlot, N. Bormann, C. Delsol, R. Dragani, M. Fuentes, A.J. Geer, L. Haimberger, S.B. Healy, H. Hersbach, E.V. Hólm, L. Isaksen, P. Kållberg, M. Köhler, M. Matricardi, A.P. McNally, B.M. Monge-Sanz, J.-J. Morcrette, B.-K. Park, C. Peubey, P. de Rosnay, C. Tavolato, J.-N. Thépaut, F. Vitart, The ERA-Interim reanalysis: configuration and performance of the data assimilation system. Q. J. Roy. Meteor. Soc. 137(656), 553–597 (2011). doi:10.1002/qj.828
G. Delrieu, J.D. Creutin, H. Andrieu, Simulation of radar mountain returns using a digitized terrain model. J. Atmos. Oceanic Tech. 12, 1038–1049 (1995). doi:10.1175/1520-0426(1995)0122.0.CO;2
M. Fiorentino, F. Rossi, P. Villani, Effect of the basin geomorphoclimatic characteristics on the mean annual flood reduction curve. Proc. IASTED Int. Conf. Model. Simul. 5, 1777–1784 (1987)
L. Foresti, A. Seed, The effect of flow and orography on the spatial distribution of the very short-term predictability of rainfall from composite radar images. Hydrol. Earth Syst. Sci. 18, 4671–4686 (2014). doi:10.5194/hess-18-4671-2014
M. Franco, R. Sánchez-Diezma, D. Sempere-Torres, Improvements in weather radar rain rate estimates using a method for identifying the vertical profile of reflectivity from volume radar scans. Meteorol. Z. 15, 521–536 (2006). doi:10.1127/0941-2948/2006/0154
M. Franco, R. Sánchez-Diezma, D. Sempere-Torres, I. Zawadzki, Improving radar precipitation estimates by applying a VPR correction method based on separating precipitation types, 5th European Conference on Radar in Meteorology and Hydrology (Helsinki, 2008), P14.16
J. French, R. Ing, S. Von Allmen, R. Wood, Mortality from flash floods: a review of National Weather Service reports, 1969–81. Public Health Rep. 98(6), 584 (1983)
F. Fundel, M. Zappa, Hydrological ensemble forecasting in mesoscale catchments: sensitivity to initial conditions and value of reforecasts. Water Resour. Res. 47, W09520 (2011). doi:10.1029/2010WR009996
F. Fundel, A. Walser, M.A. Liniger, C. Appenzeller, Calibrated precipitation forecasts for a limited-area ensemble forecast system using reforecasts. Mon. Weather Rev. 138(1), 176–189 (2010)
E. Gaume, V. Bain, P. Bernardara, O. Newinger, M. Barbuc, A. Bateman, L. Blaškovičová, G. Blöschl, M. Borga, A. Dumitrescu, I. Daliakopoulos, J. Garcia, A. Irimescu, S. Kohnova, A. Koutroulis, L. Marchi, S. Matreata, V. Medina, E. Preciso, D. Sempere-Torres, G. Stancalie, J. Szolgay, I. Tsanis, D. Velasco, A. Viglione, A compilation of data on European flash floods. J. Hydrol. 367(1–2), 70–78 (2009)
K.P. Georgakakos, Analytical results for operational flash flood guidance. J. Hydrol. 317(1), 81–103 (2006)
U. Germann, I. Zawadzki, B. Turner, Predictability of precipitation from continental radar images, part IV: limits to prediction. J. Atmos. Sci. 63, 2092–2108 (2006). doi:10.1175/JAS3735.1
U. Germann, M. Berenguer, D. Sempere-Torres, M. Zappa, REAL – ensemble radar precipitation estimation for hydrology in a mountainous region. Q. J. Roy. Meteorol. Soc. 135, 445–456 (2009). doi:10.1002/qj.375
P. Guillot, D. Duband, La méthode du Gradex pour le calcul de la probabilité des crues à partir les pluies. AISH Publ. 84, 560–569 (1967)
A. Günther, M. Van Den Eeckhaut, J.P. Malet, P. Reichenbach, J. Hervás, The European landslide susceptibility map ELSUS 1000 Version 1, EGU General Assembly Conference Abstracts, 15, 10071 (2013), http://adsabs.harvard.edu/abs/2013EGUGA..1510071G. Last accessed 19 Aug 2015
B. Heil, I. Petzold, H. Romang, J. Hess, The common information platform for natural hazards in Switzerland. Nat. Hazards 70(3), 1673–1687 (2014)
N. Hilker, A. Badoux, C. Hegg, The Swiss flood and landslide damage database 1972–2007. Nat. Hazards Earth Syst. Sci. 9, 913–925 (2009)
J.R.M. Hosking, L-moments: analysis and estimation of distributions using linear combinations of order statistics. J. R. Stat. Soc. Ser. B Methodol. 52(1), 105–124 (1990)
A. Huuskonen, E. Saltikoff, I. Holleman, The operational weather radar network in Europe. Bull. Am. Meteorol. Soc. 95, 897–907 (2014). doi:10.1175/BAMS-D-12-00216.1


P. Javelle, C. Fouchier, P. Arnoud, J. Lavabre, Flash flood warning at un-gauged locations using radar rainfall and antecedent soil moisture estimations. J. Hydrol. 394, 267–274 (2010) A.F. Jenkinson, The frequency distribution of the annual maximum (or minimum) values of meteorological elements. Q. J. Roy. Meteorol. Soc. 81(348), 158–171 (1955) S.N. Jonkman, Global perspectives on loss of human life caused by floods. Nat. Hazards 34(2), 151–175 (2005) S. Jörg-Hess, S. B. Kempf, F. Fundel, M. Zappa, The benefit of climatological and calibrated reforecast data for simulating hydrological droughts in Switzerland, Meteorological Applications, 22(3), 444–458 (2015) doi:10.1002/met.1474 M.G. Kendall, Rank Correlation Methods (Griffin, London, 1970). ISBN 0-85264-199-0 V. Knechtl, Flash-flood early warning tool. Use of intensity-duration-frequency curves for flashflood warning in southern Switzerland and forecast skill evaluation, Master Thesis, ETH Z€ urich, 2013 D. Koutsoyiannis, D. Kozonis, A. Manetas, A mathematical framework for studying rainfall intensity-duration-frequency relationships. J. Hydrol. 206, 118–135 (1998) M.R. Leadbetter, G. Lindgren, H. Rootzén, Extremes and Related Properties of Random Sequences and Processes (Springer, New York a.o., XII, 336 pp, 1983) L. Li, W. Schmid, J. Joss, Nowcasting of motion and growth of precipitation with radar over a complex orography. J. Appl. Meteorol. 34, 1286–1300 (1995). doi:10.1175/1520-0450(1995) 0342.0.CO;2 K. Liechti, L. Panziera, U. Germann, M. Zappa, The potential of radar-based ensemble forecasts for flash-flood early warning in the southern Swiss Alps. Hydrol. Earth Syst. Sci. 17, 3853–3869 (2013a). doi:10.5194/hess-17-3853-2013 K. Liechti, M. Zappa, F. Fundel, U. Germann, Probabilistic evaluation of ensemble discharge nowcasts in two nested Alpine basins prone to flash floods. Hydrol. Process. 27, 5–17 (2013b). doi:10.1002/hyp.9458 X. Llort, C.A. Velasco-Forero, J. Roca-Sancho, D. 
Sempere-Torres, Characterization of uncertainty in radar-based precipitation estimates and ensemble generation, 5th European Conference on Radar in Meteorology and Hydrology (Helsinki, 2008) P.V. Mandapaka, U. Germann, L. Panziera, A. Hering, Can Lagrangian extrapolation of radar fields be used for precipitation nowcasting over complex Alpine orography? Weather Forecast. 27, 28–49 (2012). doi:10.1175/WAF-D-11-00050.1 C. Marsigli, F. Boccanera, A. Montani, T. Paccagnella, The COSMO-LEPS mesoscale ensemble system: validation of the methodology and verification. Nonlinear Processes Geophys. 12(4), 527–536 (2005) E.S. Martins, J.R. Stedinger, Generalized maximum-likelihood generalized extreme-value quantile estimators for hydrologic data. Water Resour. Res. 36(3), 737–744 (2000) Meteoschweiz, Documentation of MeteoSwiss Grid-Data Products – Daily Precipitation (final analysis): RhiresD (2013), http://www.meteoschweiz.admin.ch/content/dam/meteoswiss/de/ser vice-und-publikationen/produkt/raeumliche-daten-niederschlag/doc/ProdDoc_RhiresD.pdf. Last accessed 13 July 2015 D. Norbiato, M. Borga, S. Esposti, E. Gaume, S. Anquetin, Flash flood warning based on rainfall thresholds and soil moisture conditions: an assessment for gauged and ungauged basins. J. Hydrol. 362, 274–290 (2008) L. Panziera, U. Germann, M. Gabella, P.V. Mandapaka, NORA–Nowcasting of Orographic Rainfall by means of Analogues. Q. J. Roy. Meteorol. Soc. 137, 2106–2123 (2011). doi:10.1002/qj.878 S. Park, M. Berenguer, Adaptive reconstruction of radar reflectivity in clutter-contaminated areas by accounting for the space–time variability. J. Hydrol. 520, 407–419 (2015). doi:10.1016/j. jhydrol.2014.11.013 G.G.S. Pegram, A.N. Clothier, High-resolution space-time modelling of rainfall: the “String of Beads” model. J. Hydrol. 241, 26–41 (2001). doi:10.1016/S0022-1694(00)00373-5 T. Pellarin, G. Delrieu, G.M. Saulnier, H. Andrieu, B. Vignal, J.D. 
Creutin, Hydrologic visibility of weather radar systems operating in mountainous regions: case study for the Ardeche catchment


(France). J. Hydrometeorol. 3, 539–555 (2002). doi:10.1175/1525-7541(2002)0032.0.co;2 F. Quintero, D. Sempere-Torres, M. Berenguer, E. Baltas, A scenario-incorporating analysis of the propagation of uncertainty to flash flood simulations. J. Hydrol. 460–461, 90–102 (2012). doi:10.1016/j.jhydrol.2012.06.045 D. Raynaud, J. Thielen, P. Salamon, P. Burek, S. Anquetin, L. Alfieri, A dynamic runoff co-efficient to improve flash flood early warning in Europe: evaluation on the 2013 central European floods in Germany. Meteorol. Appl. 22(3), 410–418 (2015). doi:10.1002/met.1469 S. Reed, J. Schaake, Z. Zhang, A distributed hydrologic model and threshold frequency-based method for flood forecasting at ungauged locations. J. Hydrol. 337, 402–420 (2007) R. Schiemann, M. Liniger, C. Frei, Reduced space optimal interpolation of daily rain gauge precipitation in Switzerland. J. Geophys. Res. 115, D14109 (2010). doi:10.1029/2009JD013047 R. Schiemann, R. Erdin, M. Willi, C. Frei, M. Berenguer, D. Sempere-Torres, Geostatistical radar-raingauge combination with nonparametric correlograms: methodological considerations and application in Switzerland. Hydrol. Earth Syst. Sci. 15, 1515–1536 (2011). doi:10.5194/hess-15-1515-2011 I.V. Sideris, M. Gabella, R. Erdin, U. Germann, Real-time radar-rain-gauge merging using spatiotemporal co-kriging with external drift in the alpine terrain of Switzerland. Q. J. Roy. Meteorol. Soc. 140, 1097–1111 (2014). doi:10.1002/qj.2188 R.L. Smith, Maximum likelihood estimation in a class of non-regular cases. Biometrika 72(1), 67–90 (1985) J. Thielen, J. Bartholmes, M.H. Ramos, A. de Roo, The European flood alert system – part 1: concept and development. Hydrol. Earth Syst. Sci. 13(2), 125–140 (2009) J.M. van der Knijff, J. Younis, A. de Roo, A GIS-based distributed model for river basin scale water balance and flood simulation. Int. J. Geogr. Inf. Sci. 24, 189–212 (2010) C.A. Velasco-Forero, D. Sempere-Torres, E.F. Cassiraga, J.J. 
Gómez-Hernández, A non-parametric automatic blending methodology to estimate rainfall fields from rain gauge and radar data. Adv. Water Resour. 32, 986–1002 (2009) P. Versini, M. Berenguer, C. Corral, D. Sempere-Torres, An operational flood warning system for poorly gauged basins: demonstration in the Guadalhorce basin (Spain). Nat. Hazards 71, 1355–1378 (2014). doi:10.1007/s11069-013-0949-7 A. Viglione, G. Blöschl, On the role of storm duration in the mapping of rainfall to flood return periods. Hydrol. Earth Syst. Sci. 13(2), 205–216 (2009). doi:10.5194/hess-13-205-2009 G. Villarini, W. Krajewski, Review of the different sources of uncertainty in single polarization radar-based estimates of rainfall. Surv. Geophys. 31, 107–129 (2009). doi:10.1007/s10712-009-9079-x G. Villarini, W. Krajewski, G. Ciach, D. Zimmerman, Product-error-driven generator of probable rainfall conditioned on WSR-88D precipitation estimates. Water Resour. Res. 45, W01404 (2009). doi:10.1029/2008wr006946 D. Viviroli, M. Zappa, J. Gurtz, R. Weingartner, An introduction to the hydrological modelling system PREVAH and its pre- and post-processing-tools. Environ. Model. Software 24(10), 1209–1222 (2009) A. Wald, J. Wolfowitz, On a test whether two samples are from the same population. Ann. Math. Stat. 11, 147–162 (1940) M. Zappa, M. Rotach, M. Arpagaus, M. Dorninger, C. Hegg, A. Montani, R. Ranzi, F. Ament, U. Germann, G. Grossi, S. Jaun, A. Rossa, S. Vogt, A. Walser, J. Wehrhan, C. Wunram, MAP D-PHASE: real-time demonstration of hydrological ensemble prediction systems. Atmos. Sci. Lett. 9(2), 80–87 (2008) M. Zappa, S. Jaun, U. Germann, A. Walser, F. Fundel, Superposition of three sources of uncertainties in operational flood forecasting chains. Atmos. Res. 100(2–3), 246–262 (2011). doi:10.1016/j.atmosres.2010.12.005. Thematic Issue on COST731 M. Zappa, F. Fundel, S. Jaun, A "Peak-Flow Box" approach for supporting interpretation and evaluation of operational ensemble flood forecasts. Hydrol. 
Process. 27, 117–131 (2013). doi:10.1002/hyp.9521

Medium Range Flood Forecasting Example EFAS

Jutta Thielen-del Pozo, Peter Salamon, Peter Burek, Florian Pappenberger, C. Alionte Eklund, E. Sprokkereef, M. Hazlinger, M. Padilla Garcia, and R. Garcia-Sanchez

Contents
1 Introduction
2 EFAS – A Novel Concept for Improving Preparedness for Flooding in Europe
2.1 Hydrological Model and System Setup
2.2 Data and How It Is Used
2.3 Concepts and Methodologies
2.4 Postprocessing
2.5 Calculation of Scores
2.6 Training
2.7 Partner Network and Connection to International Initiatives
3 Summary
References


J. Thielen-del Pozo (*) • P. Salamon Joint Research Centre (JRC), European Commission, Ispra, Italy e-mail: [email protected] P. Burek Water Program (WAT), International Institute for Applied System Analysis (IIASA), Laxenburg, Austria F. Pappenberger European Centre for Medium-Range Weather Forecast, ECMWF, Reading, UK C. Alionte Eklund Swedish Meteorological and Hydrological Institute, Norrkoeping, Sweden E. Sprokkereef Rijkswaterstaat, Water Management Centre for the Netherlands, Lelystad, Netherlands M. Hazlinger Slovak Hydro-Meteorological Institute, Bratislava, Slovakia M.P. Garcia REDIAM, Sevilla, Spain R. Garcia-Sanchez ELIMCO SISTEMAS S.L., Sevilla, Spain © Springer-Verlag Berlin Heidelberg 2015 Q. Duan et al. (eds.), Handbook of Hydrometeorological Ensemble Forecasting, DOI 10.1007/978-3-642-40457-3_51-1


J. Thielen-del Pozo et al.

Abstract

Europe repeatedly observes flood events that affect several countries at the same time and which require the coordination of assistance at the European level. The European Flood Awareness System (EFAS) has been developed specifically to respond to the need for forecasting transnational floods with sufficient lead time to allow coordination of aid at the European level in case the national capacities for emergency management are exceeded. In order to achieve robust and reliable flood forecasting at the continental scale with lead times up to 10 days, EFAS promotes probabilistic forecasting techniques based on multiple numerical weather prediction inputs, including ensemble prediction systems. Its aim is to complement existing national flood forecasting services with added-value information and to provide European decision makers with coherent overviews of ongoing and upcoming floods in Europe for better planning and coordination of aid. To date, EFAS is a unique system providing daily, probabilistic flood forecast information for the whole of Europe on a single platform. Being a complementary system to national ones, EFAS predicts the probabilities of exceeding critical flood thresholds rather than quantitative information on stream flows. By maintaining a dedicated, multinational partner network of EFAS users, novel research could be transferred directly to the operational flood forecasting centers in Europe. EFAS development started in 2003, and the system became fully operational under the umbrella of the Emergency Management Service of the European Copernicus Space Program in 2011.

Keywords

Continental • Ensemble prediction • Early flood warning

1 Introduction

With around 40 cross-border rivers, Europe is particularly exposed to floods that require bilateral or international cooperation for effective management of flood prevention and preparedness measures. Most European rivers cross two to three countries; the Danube, however, is shared by as many as 18 different countries and controlled by many more authorities (http://www.icpdr.org/main/publications/15-years-managing-danube-basin). During the last 20 years, several critical transnational flood events took place in Europe, for example, in the river basins of the Odra (1997), Rhine and Meuse (1993, 1995), and Elbe and Danube (2002, 2006, and 2013). In particular, the 2002 floods in the Elbe and Danube were disastrous and affected eight countries simultaneously (Brazdil et al. 2005; Yiou et al. 2006; Toothill 2002). While the flooding exceeded the national capacities of many of those countries to fight the floods and reduce the impacts of the event, the planning and coordination of aid between the countries, and in particular at the European level, proved difficult because decision makers were faced with incoherent flood warning information from different sources and


of variable quality and information level. In fact, despite the high number of transnational river basins and the exposure to floods, fully integrated, transnational flood forecasting systems for European rivers are rare, and the exchange of discharge and water level data between cross-border authorities is still a challenge. Without sharing the same forecasting model for upstream and downstream river sections, or using the same information platform, appropriate planning is difficult. Bakker (2007) suggests that transboundary flood events can be particularly hazardous and result in higher numbers of victims and greater financial damage than floods on national rivers, which can be partly attributed to the lack of integrated information platforms and coherent management plans. Therefore, the European Commission initiated the development of the European Flood Awareness System (EFAS). The aim of EFAS is to operate a single forecasting system for the whole of Europe capable of forecasting the probability of severe flood events with sufficient lead time for various decision makers (Thielen et al. 2009; Bartholmes et al. 2009), including experts in flood forecasting authorities and international civil protection services. The system is to provide the national authorities with information that adds value to their own national, higher-resolution systems. Added value for national services can include fully catchment-based simulation covering river sections in neighboring countries, flood forecasting information based on different weather forecasts, flood forecasting based on another hydrological model, as well as overview information from neighboring river basins. Where a national flood forecasting system is absent, as is the case in a few countries, EFAS can serve as the sole flood forecasting system. 
For planning and coordination of international aid in support of the affected countries, the system provides the European civil protection community with unique overview information on ongoing and upcoming flood events. EFAS information is provided on a single platform using the same color codes and thresholds for all river basins in Europe and has become an important planning tool for European decision makers (Cloke et al. 2013a). A recent study also evaluated the potential monetary value of the system and found a return of up to €400 for every €1 invested (Pappenberger et al. 2015a). An important driver for floods is rainfall, which can be forecast skillfully only up to a few days ahead with single (deterministic) forecasts (Buizza et al. 1999). Meteorologists have responded to this predictability limitation by using ensemble prediction systems (EPS). While EPS were already commonly applied in the meteorological community in the 1990s to extend the predictability of forecasts (Buizza et al. 1999), hydrological EPS (HEPS) were still novel in hydrological applications, and forecasters were only starting to familiarize themselves with their use and with how to deal with the associated uncertainties (Demeritt et al. 2007; Thielen et al. 2011). Over the last decade, HEPS have increasingly been seen as a means of extending the predictability of flood events and consequently of improving decision-making (Cloke et al. 2013b). They are already an integral part of many operational centers, in particular in Europe (Cloke and Pappenberger 2009; Cloke et al. 2009; Wetterhall et al. 2013), and today represent the state of the art in forecasting science and related


disciplines (Schaake et al. 2006; Thielen et al. 2008a, 2011; Collins and Knight 2007; Cloke et al. 2013c; Stephens and Cloke 2014). In this chapter, the setup of the EFAS system is described first, including a brief description of the hydrological model, the technical setup, and the meteorological inputs. A subsection is dedicated to the visualization of EFAS results and their communication to the different end users. Finally, the calculation of scores and the evaluation of EFAS results on an event basis are described. A brief summary and an outlook on future development avenues are given at the end of the chapter.

2 EFAS – A Novel Concept for Improving Preparedness for Flooding in Europe

The development of EFAS from a research project to a fully operational system was a continuous process over a time span of approximately 10 years. The system grew out of a research project called the "European Flood Forecasting System" (EFFS, 1999–2002) (Gouweleeuw et al. 2004, 2005; Pappenberger et al. 2005), in which, for the first time in Europe, the feasibility of using ensemble prediction system inputs for early flood warning was tested for several European river basins. From 2003 to 2011, the Joint Research Centre of the European Commission brought EFAS to an operational state in collaboration with the European hydrological and meteorological institutions in the Member States and the international research community represented through HEPEX. During the first 2 years of development, the technical skeleton was set up, data were collected, and the model was calibrated. From 2005 to 2010, EFAS was tested in real-time mode and continuously improved and adapted to the needs of the national hydrological services and the European Civil Protection, which could consult EFAS information through the EFAS information system (EFAS-IS, launched in 2007), a username- and password-protected web interface to which partners can connect at any time to view EFAS results, provide feedback, and share comments and suggestions with others (www.efas.eu). While at the beginning of the EFAS project the emphasis was put on products for large-scale riverine floods in transnational river basins, this started changing once the system had proved its added value. Increasingly, EFAS partners also looked to EFAS for additional information on national and smaller rivers and on flash flood events. Subsequently, a variety of indicators for rapid flood-generating rainfalls were produced (Alfieri et al. 2012a, b, this issue). 
By 2011 EFAS was sufficiently developed to be transferred to full operations as part of the Emergency Management Service of the Copernicus Initial Operations (Regulation (EU) No 377/2014 of the European Parliament and of the Council of 3 April 2014 establishing the Copernicus Program and repealing Regulation (EU) No 911/2010) in direct support to European Civil Protection (Decision No 1313/2013/EU of the European Parliament and of the Council of 17 December 2013 on a Union Civil Protection Mechanism). All operational components have been


Fig. 1 Illustration of the data and information flow for the operational EFAS. There are four operational centers executing distinct tasks of hydrological and meteorological data collection, computations, postprocessing and visualization, and finally analysis and dissemination of results. Partner networks play major roles as data providers and receivers of EFAS forecast information

outsourced to competent institutions in a decentralized manner as illustrated in Fig. 1. The establishment of a partner network with national, regional, and local flood forecasting centers was key during the development of EFAS and remains an integral part of the system management during operations. Direct contact with partners as well as annual meetings with the representatives from the entire network ensured that the development of EFAS products was in line with the evolving needs of the end users. The core features of EFAS are described comprehensively in the two EFAS publications by Thielen et al. (2009) and Bartholmes et al. (2009), which provide a good overview of the system. Further publications on specific topics and further developments are numerous and referred to in this chapter, which represents a concise summary of the current state of the art of the system.

2.1 Hydrological Model and System Setup

The core hydrological model of EFAS is the LISFLOOD model, a spatially distributed hybrid between a conceptual and a physical rainfall-runoff model combined with a routing module in the river channel, which has been specifically designed for the simulation of hydrometeorological processes in large European river basins (de Roo et al. 2003; van der Knijff et al. 2008). It is programmed in a dynamic Geographical Information System (GIS) language allowing optimal use of data layers of land use, soil type, depth and texture, and river network (Thielen


Fig. 2 Distribution of calibration stations used for the calibration conducted in 2011 (blue) and new stations added during the calibration conducted in 2013 (red)

et al. 2008b). LISFLOOD has several parameters that require calibration based on observed discharges. Currently 693 stations across Europe are available for the calibration, as illustrated in Fig. 2. In case observed discharges are not available, LISFLOOD is run with standard parameters. For EFAS, LISFLOOD has been set up on a rectangular 5 × 5 km² grid for a geographical domain including all 27 EU countries. Currently, an extension of the


model domain towards Eastern countries is ongoing. Time steps are variable: in order to establish the initial conditions, the model is driven with observed meteorological data at a 24 h time step. The forecasts are driven with various numerical weather prediction (NWP) inputs and executed with either 6-hourly or daily time steps. Hourly time steps for flash-flood-type applications have also been implemented experimentally (Alfieri et al. 2011, this issue), in that case combined with a higher grid resolution. In the near future, a multihydrological model setup is also envisaged.

2.2 Data and How It Is Used

2.2.1 Static Data

The LISFLOOD model requires information on the topography, such as elevation and slope gradient. General land use information is essential and is complemented with additional maps on the fraction of forest and urban areas. Seasonality is accounted for by incorporating monthly information on leaf area index and total water withdrawal demand. Soil is described by soil depth and soil texture for an upper and a lower soil layer. The description of the river network is crucial and not always straightforward when upscaling from high resolution to lower resolution, in particular for tributaries and rivers at close distance. The channel is defined through maps on width, length, gradient, and slope. Although the model is set up on a 5 × 5 km² grid, information on the river itself is available at higher resolution.

2.2.2 In Situ

For EFAS, both hydrological and meteorological in situ data are required. The data are collected as real-time data for the forecasting applications and as historic time series for model calibration and validation purposes. After collection, the raw data are submitted to rigorous quality and quantity checks before being aggregated further. Data are typically aggregated to 6-hourly or 24-hourly values, and the raw, flagged, and aggregated data are stored in dedicated databases. During the calibration and validation phase, the meteorological in situ data are used for long-term simulations, which in turn are used to calculate statistics and to study specific flood events. In operational mode, the observed data are used to establish the initial conditions for the next forecast runs. The gap between the last available observed data and the next forecast run is filled with forecasted NWP data. The observed hydrological data are used for several applications. First of all, with a subset of reliable data the model is calibrated and validated. 
In real-time mode, all available data are used for comparison against the model output, for visual evaluation (and identification of observational errors). For a subset of the stations, the model output is bias- and error-corrected to obtain fully probabilistic discharge forecasts (Bogner and Kalas 2008; Bogner and Pappenberger 2011). These postprocessed data are made available to the EFAS partners. Finally, for all stations where critical discharge or water levels are available, the actual values are compared against the critical thresholds; where these are exceeded, the station is highlighted with information on the time, location, and which critical threshold has been exceeded, as well as a link to the national providers.

Table 1 List of weather forecast products used in EFAS in 2016

Forecast name              Members          Lead time (days)   Spatial resolution (km)      Mode
DWD (German                1                7                  ~6.5 km (days 1–5),          Operational, twice a day
Weather Service)                                               ~15 km (days 6–7)
ECMWF-Deterministic        1                10                 ~8 km (days 1–10)            Operational, twice a day
ECMWF – VAREPS             51               15                 ~18 km                       Operational, twice a day
COSMO-LEPS                 16               5                  ~7 km (does not cover        Operational, twice a day
                                                               most of Scandinavia)
UKMO                                                                                        Case studies
TIGGE archive including    Variable: 5–30                      Variable: 30–110 km          Case studies
ensembles from at least
eight global NWP
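The quality screening and 6-hourly aggregation of raw in situ observations described above can be sketched as follows. This is a minimal, standard-library Python illustration; the function name, the record layout, and the flagging convention are assumptions for the sketch, not the operational EFAS code:

```python
from datetime import datetime, timedelta

def aggregate_6h(records, start, flag_bad=lambda v: v < 0.0):
    """Aggregate irregular (timestamp, value) gauge records into 6-hourly
    means, flagging implausible raw values and excluding them from the mean.
    Returns a list of (window_start, mean_or_None, n_flagged) tuples."""
    bins = {}
    for ts, value in records:
        idx = int((ts - start).total_seconds() // (6 * 3600))
        if idx < 0:
            continue  # observation before the first window
        bins.setdefault(idx, {"ok": [], "flagged": 0})
        if flag_bad(value):
            bins[idx]["flagged"] += 1  # raw value kept in the DB, not averaged
        else:
            bins[idx]["ok"].append(value)
    out = []
    for idx in sorted(bins):
        good = bins[idx]["ok"]
        mean = sum(good) / len(good) if good else None
        out.append((start + timedelta(hours=6 * idx), mean, bins[idx]["flagged"]))
    return out
```

A 24-hourly aggregation follows the same pattern with a wider window.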

2.2.3 Numerical Weather Prediction (NWP)

EFAS promotes the use of ensemble prediction systems (EPS) as well as multimodel inputs to achieve robust and reliable probabilistic outputs (Thielen et al. 2009). NWP products are used from the European Centre for Medium-Range Weather Forecasts (ECMWF, deterministic and EPS), the German Weather Service (DWD, deterministic, dynamically downscaled for days 1–3), and the COSMO-LEPS consortium (COSMO-LEPS, EPS). Products from the UK Meteorological Office (UKMO, EPS) are currently also being tested. Furthermore, all global EPS stored in the THORPEX Interactive Grand Global Ensemble (TIGGE) archive can be fed into EFAS. TIGGE data have so far only been applied in research mode and not in operational mode, for example, Pappenberger et al. (2008). Spatial resolutions and lead times in days are given in Table 1. In EFAS, each ensemble member is treated like a single deterministic forecast. Only after each member has been run through the hydrological model are the results postprocessed into different probabilistic products. Real-time weather forecasts are used for the operational flood forecasting application twice a day, using the 00:00 and the 12:00 forecasts. Forecasts initialized at intermediate times, such as 03:00 and 09:00, are not included. EFAS forecasts are initiated as soon as a weather forecast is available. Archived forecasts and reforecasts are applied in hindcast studies and to recalculate long-term skill scores (Pappenberger et al. 2011a, b).
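The member-by-member strategy described above can be sketched as follows. Here `toy_model` is a trivial linear store standing in for LISFLOOD, and all names are purely illustrative:

```python
def toy_model(precip_series, coeff=0.8):
    """Stand-in for the hydrological model: maps one member's precipitation
    series to a discharge series via a simple linear storage."""
    q, state = [], 0.0
    for p in precip_series:
        state = coeff * state + p
        q.append(state)
    return q

def run_ensemble(members):
    """Treat each NWP member as a single deterministic forcing and run it
    through the model; probabilistic products are formed only afterwards."""
    return [toy_model(m) for m in members]

def exceedance_fraction(discharges_per_member, step, threshold):
    """Fraction of members whose discharge at time `step` exceeds `threshold`,
    i.e., a simple probabilistic product derived from the member runs."""
    hits = sum(1 for q in discharges_per_member if q[step] > threshold)
    return hits / len(discharges_per_member)
```

For example, with three members the exceedance fraction at a given step takes the values 0, 1/3, 2/3, or 1.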

2.3 Concepts and Methodologies

By comparing the forecasts to predefined critical thresholds, described below, the ensemble streamflow calculations are converted into effective flood forecasts with up to 10 days of lead time.

2.3.1 Flood Threshold Exceedance Versus Discharge Forecasts

The role of EFAS is to notify its partners when the system indicates a probability that critical flood levels might be exceeded. This task can be challenging for a continental flood forecasting system, which lacks the spatial and temporal resolution of most national systems, the high-resolution input data to correctly determine the initial conditions, and, most of all, the local knowledge of when discharges become critical. Therefore, a different approach needed to be developed. The EFAS development team opted for a model-consistent framework: using the same model setup and parameterization, long-term discharge simulations are produced, from which different return periods are calculated. For EFAS, the 1.5-, 2-, 5-, and 20-year return periods were chosen. These thresholds are calculated for every grid point in the model domain in the same way. When the forecasted discharges are then compared against these thresholds, the streamflow forecasts are effectively converted into dichotomous time series of "flood threshold exceeded"/"flood threshold not exceeded." This approach has the advantage that systematic over- or underestimations of discharges in the model are compensated for. The only remaining bias then arises from the discrepancies between the observed and forecasted weather inputs. This approach, which was developed early in EFAS, has proved very successful as guidance for decision makers. Return periods of 1.5 years are classified as "low," 2 years as "medium," 5 years as "high," and 20 years as "severe." Threshold exceedances also play a role in the visualization and are color coded as green (1.5), yellow (2), red (5), and purple (20). 
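The threshold derivation and classification above, together with the persistence criterion used for notifications (described in the next subsection), can be sketched as follows. The Gumbel method-of-moments fit is one common way to estimate return-period quantiles from the annual maxima of a long-term simulation; it is an illustrative assumption here, not necessarily the estimator used operationally:

```python
import math

def gumbel_thresholds(annual_maxima, return_periods=(1.5, 2.0, 5.0, 20.0)):
    """Estimate discharge thresholds for the EFAS return periods from the
    annual maxima of a long-term simulation (Gumbel, method of moments)."""
    n = len(annual_maxima)
    mean = sum(annual_maxima) / n
    var = sum((x - mean) ** 2 for x in annual_maxima) / (n - 1)
    beta = math.sqrt(6.0 * var) / math.pi
    mu = mean - 0.5772 * beta  # Euler-Mascheroni constant
    # Quantile of the Gumbel CDF F(x) = exp(-exp(-(x - mu)/beta)) at 1 - 1/T
    return {T: mu - beta * math.log(-math.log(1.0 - 1.0 / T))
            for T in return_periods}

def classify(q, thresholds):
    """Map a forecast discharge to the EFAS alert level."""
    level = "none"
    for T, label in [(1.5, "low"), (2.0, "medium"), (5.0, "high"), (20.0, "severe")]:
        if q >= thresholds[T]:
            level = label
    return level

def persistent_alert(consecutive_levels, n=3):
    """EFAS notification rule: flag 'risk of flooding' only if the high or
    severe threshold is exceeded in at least n consecutive 12-hourly forecasts."""
    run = 0
    for lvl in consecutive_levels:
        run = run + 1 if lvl in ("high", "severe") else 0
        if run >= n:
            return True
    return False
```

Because the thresholds come from the same model setup as the forecasts, systematic discharge biases cancel in the comparison, which is exactly the point of the model-consistent framework.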
2.3.2 Incorporating Persistence into Alerting Procedures

The main objective of EFAS is to report large riverine floods, which are typically caused by widespread severe or long-lasting precipitation, snowmelt, or rainfall combined with snowmelt. The driving processes for such floods are mostly large synoptic-scale weather phenomena that build up over several days and should therefore be captured repeatedly by the NWP. Therefore, the principle of temporal "persistence" was introduced for notifying EFAS partners: a pixel is flagged as "risk of flooding" only if the discharges in that river pixel exceed the EFAS high or EFAS severe flood threshold in at least three consecutive 12-hourly forecasts. This is equivalent to lagged forecasts. It has been shown that introducing a criterion of persistence in flood forecasting increases the forecast reliability (Bartholmes et al. 2009).

2.3.3 Visualization of Multiple Forecasts and EPS as Aggregated Exceedance Information

EFAS results are distributed to more than 38 EFAS partners from 23 different countries, most of them with different languages. Since end users include both


[Fig. 3 image: discharge (m³/s) versus date, showing box plots of the ECMWF VAREPS ensemble, the DWD and ECMWF deterministic forecasts, the water-balance simulations and observations, the EFAS thresholds (Low = 774.64, Med = 877.07, High = 1129.1, Sev = 1456.1 m³/s), and return-period levels from 1 to 20 years]

Fig. 3 Box-plot diagram showing the discharge time series in reference to the four EFAS thresholds low (green, 1.5-year return period), medium (yellow, 2-year return periods), high (red, 5-year return periods), and severe (purple, 20-year return period) for the ECMWF EPS. The first 3 days show the simulation of the hydrological conditions based on observations (indicated as dots), while the forecasts are shown for a lead time of 10 days for EPS. In addition, two deterministic forecasts of ECMWF (red line) and of DWD (black line) are shown

experts from hydrological forecasting and civil protection services, the recipients have different levels of training and understanding. Therefore, EFAS information must be clear and unambiguous in order to be correctly understood and to lead to appropriate action (Demeritt et al. 2007). EFAS results are visualized as time series or maps on the customizable web interface. Time series are not shown as spaghetti plots, since the EFAS partners dismissed these as a useful representation (Ramos et al. 2007). Instead, the information is shown as classical box diagrams with min/max whiskers and the 25th, 50th, and 75th percentiles in a box (Fig. 3), where the y-axis is expressed as return period and color coded according to the threshold. Information is aggregated where possible; for example, the results based on EPS are visualized together with the deterministic forecasts. However, even with the concise box-plot representation, visualizing results from two or more HEPS in the same diagram is confusing. Instead, a more condensed view has been developed: first, information is aggregated to daily values, then expressed either in terms of threshold exceedance for the deterministic forecasts or in terms of the percentage of members exceeding thresholds for the HEPS. This is illustrated in Fig. 4. The x-axis represents the date for which the forecast has been made. Each row represents a forecast. The first two rows illustrate the two deterministic forecasts, which are simply expressed in terms of threshold exceedance. For further information, the tendency of rising or falling discharges is indicated by arrows, and the peak by a star. The next row represents the percentage of HEPS members exceeding the 5-year return period threshold (color coded red) and the 20-year return period (color coded purple). The two rows below show the equivalent for the COSMO-LEPS. In this


[Fig. 4 image: "Overview of DWD, ECMWF, EUE > HAL, EUE > SAL" - a grid of forecast types (DWD, EUD, EUE > HAL, COS > HAL, EUE > SAL, COS > SAL) against forecast dates, with exceedance percentages (between 4 and 45 %), trend arrows, and stars marking the forecast peaks]

Fig. 4 EFAS representation of flood threshold exceedances from multiple forecasts in one diagram, corresponding to the box-plot diagram in Fig. 3. The flood thresholds are color coded in the same way as in Fig. 3. For probabilistic forecasts, the probabilities of exceeding the 5- or 20-year return periods are indicated in percent and color coded by intensity. Arrows indicate whether the discharges are increasing or decreasing during the 24 h aggregation, and a star symbolizes the peak of the time series

representation, the lead times are clearly visible, for example, the 5-day lead time for COSMO-LEPS as opposed to the 10 days of ECMWF. In this very condensed representation, all information is summarized at a glance, and information from further forecasts could easily be added. It also has the advantage that the consistency between the forecasts can easily be assessed, for example: are all the forecasts indicating the same day for the peak, and is the event forecast with similar severity in all forecasts? Furthermore, this representation also allows checking back in time whether the forecasts are persistent (Fig. 5). The forecast rows are shifted so that entries valid for the same day align. Figure 5 clearly shows that the forecasts cluster a higher probability for exceeding the EFAS high alert around the 3rd to 5th of the coming month. It also clearly shows that the probabilities increase from forecast to forecast, from an initial 4–8 % to 45 % in the last forecast. Spatial information is shown in the form of maps, which where appropriate follow the same color coding concept: pixels exceeding the 5-year return period thresholds are color coded in red, while those exceeding the 2-year return period are color coded in yellow.
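The daily aggregation into member-exceedance percentages behind this representation can be sketched as follows (illustrative only; it assumes 12-hourly values summarized by the daily maximum):

```python
def daily_exceedance_percent(member_series, threshold, steps_per_day=2):
    """For each forecast day, the percentage of ensemble members whose daily
    maximum discharge exceeds `threshold`, as shown in the EFAS overview.

    `member_series`: list of per-member discharge lists of equal length,
    sampled `steps_per_day` times per day (12-hourly values by default)."""
    n_members = len(member_series)
    n_days = len(member_series[0]) // steps_per_day
    percents = []
    for d in range(n_days):
        window = slice(d * steps_per_day, (d + 1) * steps_per_day)
        hits = sum(1 for q in member_series if max(q[window]) > threshold)
        percents.append(round(100.0 * hits / n_members))
    return percents
```

Running this once with the 5-year threshold and once with the 20-year threshold yields the red and purple percentages of one row of the overview diagram.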

2.4

Postprocessing

Although the concept of threshold exceedance has proven successful, experts in the flood forecasting centers are also interested in quantitative information for better comparison of national results with EFAS. Therefore, at selected points where real-time data are available, the streamflow output of EFAS is postprocessed so that the forecasts start with the same value as the observed


J. Thielen-del Pozo et al.

Fig. 5 Persistence diagram for the current probabilistic forecasts based on the ECMWF Ensemble Prediction System (last line) and the history of previous forecasts (lines above the last). The current forecast is the same as the EUE > HAL of Fig. 4

discharges and the information can be readily uploaded into national systems and compared. For this, a novel method based on wavelet transformation has been developed, which is described in Bogner and Pappenberger (2011) and Bogner and Kalas (2008). For specific stations where real-time data are available, the method applies AutoRegressive models with eXogenous input (ARX) to relate the observed streamflow at one time step to the previous observations, with a time lag, and to the simulated streamflow. This approach works particularly well for the first time lags. However, model errors usually show spatio-temporal variations that can, for example, depend on the season or local weather conditions. Such errors can be handled most efficiently using wavelet transformations, as described in detail by Bogner and Kalas (2008). The postprocessing method developed for EFAS has the further advantage that information on both model uncertainty (for past days) and total predictive uncertainty (for forecasts) can be provided to the end user, as shown in Fig. 6. Currently, developments are ongoing to publish these postprocessed forecasts also as a web service using Open Geospatial Consortium (OGC) standards for better direct integration into the national forecasting systems.
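The ARX error propagation idea can be sketched as follows. This is a simplified single-lag illustration, not the operational EFAS implementation; the AR coefficient `a` would be fitted to station data in practice:

```python
def arx_error_correct(simulated, last_error, a=0.8):
    """Propagate the most recent observed forecast error through the horizon.

    simulated: simulated discharges for lead times 1..n
    last_error: observed minus simulated discharge at forecast start
    a: assumed AR(1) coefficient (hypothetical value; fitted in practice)
    """
    corrected = []
    err = last_error
    for q in simulated:
        err = a * err            # error influence decays with lead time
        corrected.append(q + err)
    return corrected
```

The correction is largest at short lead times and decays toward the raw simulation, which mirrors the statement that the approach works particularly well for the first time lags.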

2.5

Calculation of Scores

Quality control and validation of results is essential for EFAS (de Roo et al. 2011). While deterministic forecasts can be relatively easily classified as “hit,” “false alarm,” or “miss” when comparing the forecasts to the observed event, for probabilistic forecasts it needs to be verified whether the distribution of the hydrographs as well as the probability of the extreme events has been captured correctly (Bartholmes et al. 2009; Cloke and Pappenberger 2009). Since different scores test


Fig. 6 Postprocessed EFAS results for selected stations indicating the model uncertainty in gray shading for the period where the model is driven with observed meteorological data and the predictive uncertainty in blue shading

different performance attributes such as reliability, resolution, sharpness, the spread-skill relationship, and bias, a single score does not capture the full performance of a HEPS (Hsu and Murphy 1986; Jolliffe and Stephenson 2003). For EFAS, the HEPS are validated against the EFAS water balance for each grid point. Average scores are updated on the 13th day of each month. In order to reduce possible effects of seasonality, the scores are calculated over a time window including the corresponding first day of the month and the past 365 days. For EFAS, several scores are calculated, including the rank histogram, the root mean square error (RMSE) and the RMSE scaled by mean discharge (also referred to as the coefficient of variation of RMSE, CV), the Brier (skill) score, the continuous ranked probability (skill) score (CRPS), the relative operating characteristic (ROC), and the area under the ROC (AROC). Current skill scores are published regularly in the bimonthly EFAS bulletins accessible on www.efas.eu. A longer term assessment of


the skill of EFAS has been performed and published by Pappenberger et al. (2011a, b), and the use of benchmarks in EFAS for skill score analysis was evaluated by Pappenberger et al. (2015b).
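Of the scores listed above, the continuous ranked probability score can be estimated directly from the ensemble members using its standard empirical form. The following is an illustrative sketch, not the operational EFAS code:

```python
import numpy as np

def crps_ensemble(members, obs):
    """Empirical CRPS: E|X - y| - 0.5 * E|X - X'| for ensemble X, observation y."""
    m = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(m - obs))                      # mean distance to obs
    term2 = 0.5 * np.mean(np.abs(m[:, None] - m[None, :]))  # mean member spread
    return term1 - term2
```

For a single-member (deterministic) forecast, the spread term vanishes and the CRPS reduces to the absolute error, which makes the score directly comparable across deterministic and probabilistic forecasts.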

2.6

Training

Training plays an important role in EFAS. The concepts, setup, and functionalities of the EFAS interface are demonstrated regularly to the partners during the annual EFAS meetings as well as during dedicated training sessions held at the national hydrological services. In addition, during the EFAS annual meetings, different topics have been addressed, such as the concept of probabilistic forecasting, skill scores, making decisions based on uncertain forecasts, and different ways of visualizing results (Pappenberger et al. 2013; Wetterhall et al. 2013).

2.7

Partner Network and Connection to International Initiatives

A dedicated and active partner network has been key to ensuring that EFAS products are designed as added-value products. Interaction and feedback from the national hydrological services as well as the civil protection community have ensured that the products are complementary, understood, and used by the various decision makers. The EFAS partner network has grown to 38 dedicated partners and several associated partners. In addition to serving the operational end users, the development of a continental flood forecasting system has put EFAS at the forefront of research in probabilistic flood forecasting, served as a testbed for many research studies not directly linked to the operational system, and provided an example for others to follow (for example, Alfieri et al. 2011; He et al. 2009; Pappenberger et al. 2015a, b; Bogner et al. 2012a, b). HEPEX has played an important role for EFAS, since the HEPEX community has addressed many topics of relevance for the development of the system, including preprocessing of meteorological data, bias correction of both meteorological and hydrological data, estimation of uncertainty, visualization of results, and definition of end user needs. Furthermore, close links will be kept with the Global Flood Partnership, a recently launched international initiative to combat the effects of floods on an international scale (http://portal.gdacs.org/Global-Flood-Partnership).

3

Summary and Conclusion

With the European Flood Awareness System, the first operational, continental flood early warning system for Europe has been developed. It is operated under the umbrella of the European space program Copernicus as part of the Emergency Management Service in support of improved preparedness for floods in Europe, and it serves both the national hydrological services and the European civil protection community.


EFAS is a comprehensive system incorporating several ensemble prediction systems as well as deterministic weather forecasts to assess the probability of critical flood thresholds being exceeded in the coming 10–15 days. By combining different concepts (flood threshold exceedance based on a model-consistent framework, the use of forecast persistence to increase reliability, easy visualization of forecasts on a twice-daily updated web interface for a wide end-user base, sophisticated postprocessing of forecasts for an improved quantitative comparison with national results, and regular calculation and publication of skill scores), EFAS has become a unique, continental-scale flood forecasting system. It is embedded in a management structure including a strong partner network which provides valuable feedback on the performance and future developments of the system. Continuous training sessions held at the national hydrological services and strong links to research projects and international initiatives such as HEPEX ensure that EFAS products are designed as added-value products and maintain the system as state of the art.

References

L. Alfieri, D. Velasco, J. Thielen Del Pozo, Flash flood detection through a multi-stage probabilistic warning system for heavy precipitation events. Adv. Geosci. 29, 69–75 (2011)
L. Alfieri, P. Salamon, F. Pappenberger, F. Wetterhall, J. Thielen, Operational early warning systems for water-related hazards in Europe. Environ. Sci. Pol. 21, 35–49 (2012a)
L. Alfieri, J. Thielen, F. Pappenberger, Ensemble hydro-meteorological simulation for flash flood early detection in southern Switzerland. J. Hydrol. 424–425, 143–153 (2012b)
L. Alfieri, M. Berenguer, V. Knechtl, K. Liechti, D. Sempere-Torres, M. Zappa, Flash flood forecasting based on rainfall thresholds, in Handbook of Hydrometeorological Ensemble Forecasting, ed. by Q. Duan et al. (Springer, Berlin/Heidelberg, this issue). doi:10.1007/978-3-642-40457-3_49-1
M.H.N. Bakker, Transboundary river floods: vulnerability of continents, international river basins and countries. Ph.D. dissertation, Oregon State University, 276 pp (2007), http://hdl.handle.net/1957/3821
J.C. Bartholmes, J. Thielen, M.H. Ramos, S. Gentilini, The European Flood Alert System EFAS – part 2: statistical skill assessment of probabilistic and deterministic operational forecasts. Hydrol. Earth Syst. Sci. 13, 141–153 (2009)
K. Bogner, F. Pappenberger, Multiscale error analysis, correction, and predictive uncertainty estimation in a flood forecasting system. Water Resour. Res. 47 (2011). doi:10.1029/2010WR009137
K. Bogner, M. Kalas, Error-correction methods and evaluation of an ensemble based hydrological forecasting system for the Upper Danube catchment. Atmos. Sci. Lett. 9(2), 95–102 (2008)
K. Bogner, H.L. Cloke, F. Pappenberger, A. De Roo, J. Thielen, Improving the evaluation of hydrological multi-model forecast performance in the Upper Danube Catchment. Int. J. River Basin Manage. 10(1), 1–12 (2012a). doi:10.1080/15715124.2011.625359
K. Bogner, F. Pappenberger, H.L. Cloke, Technical note: the normal quantile transformation and its application in a flood forecasting system. Hydrol. Earth Syst. Sci. 16, 1085–1094 (2012b). doi:10.5194/hess-16-1085-2012
R. Brazdil, C. Pfister, H. Wanner, H. von Storch, J. Luterbacher, Historical climatology in Europe – the state of the art. Clim. Change 70, 363–430 (2005)
R. Buizza, A. Hollingsworth, F. Lalaurette, A. Ghelli, Probabilistic predictions of precipitation using the ECMWF ensemble prediction system. Weather Forecast. 14, 168–189 (1999)
H.L. Cloke, F. Pappenberger, Ensemble flood forecasting: a review. J. Hydrol. 375(3–4), 613–626 (2009)
H. Cloke, J. Thielen, F. Pappenberger, S. Nobert, G. Balint, C. Edlund, A. Koistinen, C. de Saint-Aubin, E. Sprokkereef, C. Viel, P. Salamon, R. Buizza, Progress in the implementation of Hydrological Ensemble Prediction Systems (HEPS) in Europe for operational flood forecasting. ECMWF Newsletter 121, 20–24 (2009)
H. Cloke, F. Pappenberger, J. Thielen, V. Thiemig, Operational European flood forecasting, in Environmental Modelling: Finding Simplicity in Complexity, ed. by J. Wainwright, M. Mulligan, 2nd edn. (Wiley, Chichester, 2013a). doi:10.1002/9781118351475.ch25
H.L. Cloke, F. Pappenberger, S.J. van Andel, J. Schaake, J. Thielen, M.-H. Ramos, Hydrological ensemble prediction systems. Hydrol. Process. 27, 1–4 (2013b). doi:10.1002/hyp.9679
H.L. Cloke, F. Wetterhall, Y. He, J.E. Freer, F. Pappenberger, Modelling climate impact on floods with ensemble climate projections. Q. J. R. Meteorol. Soc. 139(671), 282–297 (2013c). doi:10.1002/qj.1998
M. Collins, S. Knight, Ensembles and probabilities: a new era in the prediction of climate change. Phil. Trans. R. Soc. A 365(1857) (2007)
A. de Roo, J. Thielen, P. Salamon, K. Bogner, S. Nobert, H.L. Cloke, D. Demeritt, J. Younis, M. Kalas, K. Bódis, D. Muraro, F. Pappenberger, Quality control, validation and user feedback of the European Flood Alert System (EFAS). Int. J. Digital Earth 4(Supplement 1), 77–90 (2011)
A. de Roo, B. Gouweleeuw, J. Thielen, J. Bartholmes, P. Bongioannini-Cerlini, E. Todini, P. Bates, M. Horritt, N. Hunter, K.J. Beven, F. Pappenberger, E. Heise, G. Rivin, M. Hills, A. Hollingsworth, B. Holst, J. Kwadijk, P. Reggiani, M. van Dijk, K. Sattler, E. Sprokkereef, Development of a European Flood Forecasting System. Int. J. River Basin Manage. 1, 49–59 (2003)
D. Demeritt, H. Cloke, F. Pappenberger, J. Thielen, J. Bartholmes, M.H. Ramos, Ensemble predictions and perceptions of risk, uncertainty, and error in flood forecasting. Environ. Hazards 7, 115–127 (2007)
B. Gouweleeuw, P. Reggiani, A. De Roo, A European Flood Forecasting System EFFS. Full Report, European Report EUR21208, EC DG JRC & WL Delft Hydraulics, 304 pp (2004)
B.T. Gouweleeuw, J. Thielen, G. Franchello, A.P.J. de Roo, R. Buizza, Flood forecasting using medium-range probabilistic weather prediction. Hydrol. Earth Syst. Sci. 9(4), 365–380 (2005)
Y. He, F. Wetterhall, H.L. Cloke, F. Pappenberger, M. Wilson, J. Freer, G. McGregor, Tracking the uncertainty in flood alerts driven by grand ensemble weather predictions. Met. Apps 16, 91–101 (2009). doi:10.1002/met.132
W.-R. Hsu, A.H. Murphy, The attributes diagram: a geometric framework for assessing the quality of probability forecasts. Int. J. Forecast. 2, 285–293 (1986)
I.T. Jolliffe, D.B. Stephenson, Forecast Verification: A Practitioner’s Guide in Atmospheric Science (Wiley, Chichester, 2003)
F. Pappenberger, K.J. Beven, N.M. Hunter, P.D. Bates, B.T. Gouweleeuw, J. Thielen, A.P.J. de Roo, Cascading model uncertainty from medium range weather forecasts (10 days) through a rainfall-runoff model to flood inundation predictions within the European flood forecasting system (EFFS). Hydrol. Earth Syst. Sci. 9(4), 381–393 (2005)
F. Pappenberger, J. Bartholmes, J. Thielen, H.L. Cloke, R. Buizza, A. de Roo, New dimensions in early flood warning across the globe using grand-ensemble weather predictions. Geophys. Res. Lett. 35, L10404 (2008). doi:10.1029/2008GL033837
F. Pappenberger, J. Thielen Del Pozo, M. Del Medico, The impact of weather forecast improvements on large scale hydrology: analysing a decade of forecasts of the European Flood Alert System. Hydrol. Process. 25(7), 1091–1113 (2011a)
F. Pappenberger, K. Bogner, F. Wetterhall, H. Yi, H. Cloke, J. Thielen Del Pozo, Forecast convergence score: a forecaster’s approach to analysing hydro-meteorological forecast systems. Adv. Geosci. 29, 27–32 (2011b)
F. Pappenberger, E. Stephens, J. Thielen, P. Salamon, D. Demeritt, S.J. van Andel, F. Wetterhall, L. Alfieri, Visualizing probabilistic flood forecast information: expert preferences and perceptions of best practice in uncertainty communication. Hydrol. Process. 27(1), 132–146 (2013). http://onlinelibrary.wiley.com/doi/10.1002/hyp.9253/full
F. Pappenberger, H.L. Cloke, D.J. Parker, F. Wetterhall, D.S. Richardson, J. Thielen, The monetary benefit of early flood warnings in Europe. Environ. Sci. Pol. 51, 278–291 (2015a). doi:10.1016/j.envsci.2015.04.016
F. Pappenberger, M.H. Ramos, H.L. Cloke, F. Wetterhall, L. Alfieri, K. Bogner, A. Mueller, P. Salamon, How do I know if my forecasts are better? Using benchmarks in hydrological ensemble prediction. J. Hydrol. 522, 697–713 (2015b). doi:10.1016/j.jhydrol.2015.01.024
M.-H. Ramos, J. Bartholmes, J. Thielen-del Pozo, Development of decision support products based on ensemble forecasts in the European flood alert system. Atmos. Sci. Lett. 8(4), 113–119 (2007)
J. Schaake, K. Franz, A. Bradley, R. Buizza, The Hydrological Ensemble Prediction EXperiment (HEPEX). Hydrol. Earth Syst. Sci. Discuss. 3, 3321–3332 (2006)
E. Stephens, H. Cloke, Improving flood forecasts for better flood preparedness in the UK (and beyond). Geogr. J. 180, 310–316 (2014). doi:10.1111/geoj.12103
J. Thielen, J. Schaake, R. Hartman, R. Buizza, Aims, challenges and progress of the Hydrological Ensemble Prediction Experiment (HEPEX) following the third HEPEX workshop held in Stresa 27 to 29 June 2007. Atmos. Sci. Lett. 9, 29–35 (2008a)
J. Thielen, P. Salamon, A. De Roo, Geographical information systems – an integral part of the European Flood Alert System (EFAS). GeoFocus 8, 12–16 (2008b)
J. Thielen, J. Bartholmes, M.-H. Ramos, A. de Roo, The European Flood Alert System – part 1: concept and development. Hydrol. Earth Syst. Sci. 13, 125–140 (2009)
J. Thielen, F. Pappenberger, P. Salamon, K. Bogner, P. Burek, A. de Roo, State of the art of flood forecasting – from deterministic to probabilistic approaches, in Flood Hazards: Impacts and Responses for the Built Environment, ed. by J. Lamond, C. Booth, F. Hammond, D. Proverbs (Taylor and Francis, London, 2011), 371 pp
J. Toothill, Central European Flooding August 2002, Technical Report, EQECAT, ABS Consulting, 21 pp (2002)
J.M. van der Knijff, J. Younis, A.P.J. de Roo, LISFLOOD: a GIS-based distributed model for river basin scale water balance and flood simulation. Int. J. Geogr. Inf. Sci. 24, 189–212 (2010)
F. Wetterhall, F. Pappenberger, H. Cloke, J. Thielen del Pozo et al., Forecasters’ priorities for improving probabilistic flood forecasts. Hydrol. Earth Syst. Sci. 17, 4389–4399 (2013)
P. Yiou, P. Ribereau, P. Naveau, M. Nogaj, R. Brazdil, Statistical analysis of floods in Bohemia (Czech Republic) since 1825. Hydrol. Sci. J. 51(5), 930–945 (2006)

Seasonal Drought Forecasting on the Example of the USA Eric F. Wood, Xing Yuan, Joshua K. Roundy, Ming Pan, and Lifeng Luo

Contents 1 Introduction: Princeton’s Seasonal Drought Forecast System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 Drought Forecast Application over USA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .


Abstract

Drought is a slowly developing process and usually begins to impact a region without much warning once the water deficit reaches a certain threshold. Predicting the drought a few months in advance will benefit a variety of sectors for drought planning and preparedness. In response to the National Integrated Drought Information System (NIDIS), the Princeton land surface hydrology group has been working on drought monitoring and forecasting for over 10 years and has developed a seasonal drought forecasting system based on global climate forecast models and a large-scale land surface hydrology model.

E.F. Wood (*) • M. Pan Department of Civil and Environmental Engineering, Princeton University, Princeton, NJ, USA e-mail: [email protected]; [email protected] X. Yuan RCE-TEA, Inst. of Atmosph. Phys., Chinese Academy of Sciences, Beijing, China e-mail: [email protected] J.K. Roundy Department of Civil, Environmental and Architectural Engineering, University of Kansas, Lawrence, KS, USA e-mail: [email protected]; [email protected] L. Luo Department of Geography, Michigan State University, East Lansing, MI, USA e-mail: [email protected] # Springer-Verlag Berlin Heidelberg 2015 Q. Duan et al. (eds.), Handbook of Hydrometeorological Ensemble Forecasting, DOI 10.1007/978-3-642-40457-3_52-1


E.F. Wood et al.

This chapter will showcase the performance of the system in predicting soil moisture drought area, frequency, and severity over the Conterminous United States (CONUS) at seasonal scales; discuss the challenges in forecasting streamflow for hydrologic drought; and provide an outlook on future developments and applications.

Keywords

Drought • Seasonal forecast • Hydrology • Soil moisture • Streamflow • Severity • Climate model • Land surface model • CFSv2 • VIC • NMME • Ensemble prediction • Postprocessing • Downscaling • Bayes

1

Introduction: Princeton’s Seasonal Drought Forecast System

The central element of Princeton’s Seasonal Drought Forecast System (Luo and Wood 2007, 2008) is the Variable Infiltration Capacity (VIC) hydrological model (Liang et al. 1996), which transforms seasonal climate forecast information into hydrological information such as soil moisture and streamflow. VIC is one of the state-of-the-art macroscale hydrological models, and it has been calibrated, validated, and evaluated in numerous studies at grid, basin, and continental scales (e.g., Wood et al. 1997). The forecast system implements a Bayesian merging procedure (Luo et al. 2007) to combine seasonal forecasts from dynamical climate models with observed climatology at the monthly level to obtain posterior distributions for monthly precipitation and temperature at each grid cell for each month of the forecast period. With the mean and variance of the posterior distribution for the spatially downscaled monthly variables, a hybrid method that includes both a historical-analogue criterion and random selection is used to generate daily forecast time series (Luo and Wood 2008). Finally, a simple scaling method is used to adjust the resulting series according to the monthly ensemble mean of the posterior distribution (Yuan et al. 2013a). Runoff generated within a grid cell is routed to the stream gauge location using the linear routing model developed by Lohmann et al. (2004). The inputs of the forecast system are monthly precipitation and temperature from seasonal climate forecast models and the initial land surface conditions generated from Princeton’s drought monitoring system as part of the North American Land Data Assimilation System (NLDAS). The outputs include high-resolution (1/8th degree spatially) soil moisture and streamflow at selected stations, which can be converted into percentiles to reflect the agricultural and hydrologic drought conditions, respectively.
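The core of the Bayesian merging step can be illustrated with the standard normal-normal conjugate update, with climatology acting as the prior and the bias-corrected model forecast as the likelihood. This is only a simplified sketch of the idea; the actual Luo et al. (2007) procedure is more elaborate:

```python
def bayes_merge(mu_clim, var_clim, mu_fcst, var_fcst):
    """Precision-weighted posterior for one month and grid cell
    (normal-normal conjugacy; all inputs are illustrative scalars)."""
    w = var_clim / (var_clim + var_fcst)          # weight on the forecast
    mu_post = (1.0 - w) * mu_clim + w * mu_fcst
    var_post = 1.0 / (1.0 / var_clim + 1.0 / var_fcst)
    return mu_post, var_post
```

A sharp (low-variance) model forecast pulls the posterior toward itself, while an uninformative one leaves the posterior near climatology, which is why the merged forecast is designed not to underperform climatology.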
In the following sections, the system is evaluated for predicting soil moisture drought area, frequency, and severity. The ensemble streamflow forecast is also assessed using the root mean square error skill score (SSRMSE) and the relative operating characteristic (ROC) diagram. The latest version of the forecast system (Yuan et al. 2013a), which is based on the National Centers for Environmental Prediction (NCEP) Climate Forecast System version 2 (CFSv2; Yuan et al. 2011),

Seasonal Drought Forecasting on the Example of the USA

[Figure: monthly number of grid cells (0–4000, Jan–Dec) where soil moisture falls below the drought threshold, Southeast region]

the correlation coefficient results are important to verify that major issues do not affect the results, for example, whether the rainy period is displaced or shifted. In this sense, the results obtained for the Três Marias inflow are suitable. The correlation coefficient using all data is always above 0.5, and the correlation coefficient for high flows (only values above Q10) is above 0.5 for lead times of up to 4 days and above zero until 13 days of lead time. Figure 10 shows the rank histograms for different lead times, from 48 to 360 h. Figure 10a shows the rank histograms for the whole data set, while Fig. 10b shows the rank histogram only for the cases when the observed reservoir inflow exceeded 1400 m3 s−1. The U-shaped form of the histograms suggests that in most cases one of two outcomes happened: all 14 members of the ensemble predicted a lower inflow than was actually observed, or all members predicted a higher inflow than was actually observed. This indicates a lack of spread of the ensemble at all lead times. A rank histogram with a U shape at early lead times can be considered normal for ensemble forecasting systems that ignore the uncertainty in the initial conditions of the hydrological model and in the observed data and only consider the meteorological uncertainty of the weather forecasts. As a result, the ensemble shows very low spread during the early time steps of the forecast, almost acting as a deterministic forecast. The U-shaped form of the rank histogram even at long lead times, shown in Fig. 10a, is probably related to the kind of forecasting errors that occur during the long dry season of the São Francisco River. During the dry season, forecast rainfall is normally close to zero in all members of an ensemble, and, therefore, all members result in the same inflow forecast.
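The rank histogram computation used here can be sketched as follows. This is an illustrative version: tie handling, which matters precisely when all members are equal as in the dry season, is simplified by counting only members strictly below the observation:

```python
import numpy as np

def rank_histogram(ensemble, observations):
    """Count, over all cases, the rank of the observation among the members.

    ensemble: array-like (n_cases, n_members); observations: array-like (n_cases,)
    Returns counts of length n_members + 1; a U shape signals underdispersion.
    """
    ens = np.asarray(ensemble, dtype=float)
    obs = np.asarray(observations, dtype=float)
    n_members = ens.shape[1]
    counts = np.zeros(n_members + 1, dtype=int)
    for members, y in zip(ens, obs):
        rank = int(np.sum(members < y))  # members strictly below the observation
        counts[rank] += 1
    return counts
```

Cases where the observation falls below all members land in the first bin and cases above all members land in the last bin, so the two extreme bins together measure how often the ensemble entirely misses the observation.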
In this case, the observed inflow will usually be above (or below) all the members, because all the members are equal. Figure 10b shows results where the influence of the dry season has been removed by preparing the rank histogram using only the cases when the observed inflow was higher than 1400 m3 s−1. It can be seen that the form of the histogram did not change for short lead times. However, for lead times of about 96 h (4 days) to 360 h


C.E.M. Tucci et al.

Fig. 10 Rank histogram assessment for Três Marias inflow: (a) using all data and (b) only for high flows, using a threshold of Q10 (1400 m3/s)

(15 days), the form of the rank histogram showed a tendency toward a more uniform distribution. An assessment of 1400 m3/s threshold exceedance using ROC curves for the Três Marias inflow is presented in Fig. 11 for lead times of 48, 120, 240, and 360 h. The point representing the ensemble mean is also shown in Fig. 11. Results for the first lead time (48 h) are very similar for all the ensemble percentiles and for the ensemble mean because at this lead time the spread in the forecast is usually very small. At this lead time, the probability of detection (POD) of the threshold is around 0.9 for a probability of false detection (POFD) near zero. Results for the lead times of 120 h (5 days) and 240 h (10 days) suggest that it is possible to make decisions related to the 1400 m3/s threshold exceedance with a higher true alarm rate by using the higher percentiles of the ensemble forecast than by using the ensemble mean. With the upper bounds of the ensemble forecast, it is possible to have a true alarm rate of approximately 0.9 with a false alarm rate between 0.02 and 0.12. Using the ensemble mean, the true alarm rate would be around 0.8 for false alarm rates near 0.09. In the case of the 360 h lead time ROC curve, there is a trade-off between the use of the full ensemble and the ensemble mean. With the higher ensemble percentiles, it is possible to have a higher probability of true alarm (between 0.85 and 0.9), but also a higher false alarm rate (0.18–0.2). The use of the ensemble mean provides a smaller false alarm rate (0.13), although also a smaller true alarm rate (0.83) than the ensemble upper bounds.
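The ROC points for each lead time can be derived by sweeping a decision level over the ensemble exceedance probabilities. The sketch below illustrates the computation; the function names and the choice of 11 probability levels are illustrative, not those of the original study:

```python
import numpy as np

def roc_points(ensemble, observations, flow_threshold, n_levels=11):
    """(POFD, POD) pairs obtained by warning whenever the ensemble
    exceedance probability reaches each probability level."""
    ens = np.asarray(ensemble, dtype=float)
    obs = np.asarray(observations, dtype=float)
    event = obs > flow_threshold                      # did the flood occur?
    probs = np.mean(ens > flow_threshold, axis=1)     # fraction of members warning
    points = []
    for level in np.linspace(0.0, 1.0, n_levels):
        warn = probs >= level
        hits = np.sum(warn & event)
        misses = np.sum(~warn & event)
        false_alarms = np.sum(warn & ~event)
        correct_neg = np.sum(~warn & ~event)
        pod = hits / max(hits + misses, 1)
        pofd = false_alarms / max(false_alarms + correct_neg, 1)
        points.append((pofd, pod))
    return points
```

Each probability level corresponds to one ensemble percentile used as the decision rule, which is how the percentile-by-percentile curves in Fig. 11 arise.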

Hydropower Forecasting in Brazil


Fig. 11 ROC curves for four lead times for the Três Marias inflow using a threshold of 1400 m3/s

Figure 12 shows the results of the Brier score (BS) for detection of 1400 m3/s floods in the Três Marias inflow. Lower BS values indicate better forecasting performance in terms of threshold detection. The main information given by Fig. 12 is that the BS is generally lower for the ensemble than for the ensemble mean, which means that threshold exceedance is usually better detected by the full ensemble forecast than by using the ensemble mean as a consensus deterministic forecast. The mean squared probability errors vary from 0.02 (lead time 48 h) to 0.10 (lead time 360 h) using the full ensemble, while using the ensemble mean they vary from 0.02 (lead time 48 h) to 0.12 (lead time 360 h). At early lead times, up to 48 h, results are very similar for both forecasts due to the greater dependence of the system on observed conditions and the lower uncertainty in the QPF. Figure 13 shows the forecast convergence score (FCS) results for detection of 1400 m3/s floods in the Três Marias inflow. This metric describes the consistency of two sequential forecasts in terms of threshold detection (Pappenberger et al. 2011). Values near zero indicate more consistent decisions. The FCS, as discussed by Pappenberger et al. (2011), is therefore not a performance metric, as it does not use


Fig. 12 Brier score analysis for the Três Marias inflow using a threshold of 1400 m3/s
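The Brier score comparison of Fig. 12 can be reproduced in a few lines, taking the forecast probability as the fraction of members exceeding the threshold (an assumption made here for illustration; a deterministic forecast enters with probabilities of 0 or 1):

```python
import numpy as np

def exceedance_probs(ensemble, threshold):
    """Fraction of members above the threshold, per forecast case."""
    return np.mean(np.asarray(ensemble, dtype=float) > threshold, axis=1)

def brier_score(forecast_probs, event_occurred):
    """Mean squared difference between forecast probability and the
    binary outcome (1 if the threshold was exceeded, else 0)."""
    p = np.asarray(forecast_probs, dtype=float)
    o = np.asarray(event_occurred, dtype=float)
    return np.mean((p - o) ** 2)
```

Because intermediate probabilities can hedge between outcomes while a 0/1 forecast cannot, the full ensemble can achieve the lower BS reported in the text.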

Fig. 13 Forecast convergence score analysis for the Três Marias inflow using a threshold of 1400 m3/s

comparisons with observations. It is, however, a descriptive metric that addresses a useful characteristic of forecasts: the capability of issuing consistent consecutive forecasts.


This characteristic is especially important in short-term reservoir operations when the reservoir is also used for flood control, which is the case for the Três Marias reservoir. More consistent forecasts could lead to less erratic decision-making concerning water releases or storage for preventing damages and maximizing power production. Results obtained for the FCS indicate that for early lead times, between 1 and 2 days, results are very similar for the full ensemble and the mean. From this lead time onward, the FCS of the ensemble is always smaller than the FCS of the ensemble mean. This means that the full ensembles are more consistent in their indications than the deterministic forecasts. It is important to mention that, in terms of BS and FCS, ensemble forecasts can be issued for intermediate probabilities of occurrence, while deterministic forecasts can only be issued for binary decisions (occurrence or non-occurrence). Beyond the results already suggesting that full ensembles perform better and are more consistent, there is one additional value in such probability-based results. Given the availability of probabilities of future outcomes, a hydropower reservoir operation strategy could use this information to set decisions for the early operation lead times considering a scenario described by multiple probable futures. Put another way, it is possible to make a weighted decision that would allow a satisfactory operation considering all possible futures. A deterministic forecast, on the other hand, generally leads to a single decision, and if, given all future uncertainties, something deviates from the predicted future, abrupt maneuvers of spillway gates may be necessary to offset the impact of the previously incorrect decisions. Figure 14 shows the reliability diagram for detection of 1400 m3/s floods in the Três Marias inflow.

The reliability diagram shows the conditional bias of the forecasts, comparing the forecast probability (P(F), on the x-axis) with the conditional observed frequency (P(O|F), on the y-axis). A sample count (sharpness) histogram for forecast probabilities partitioned into five classes is also shown below each diagram. In the sample count diagrams, the y-axis is plotted on a logarithmic scale, as also done by Hamill et al. (2008), for better visualization of the number of samples in each class. For the lead time of 48 h, the reliability diagram indicates good calibration, although only a small number of samples fell in the intermediate probability classes. This may influence the results, but can be considered normal for early lead times, where the spread of the members is small. For the lead time of 120 h, the forecasts also show good calibration, with the curve close to the perfect-reliability 45° line. For example, when forecasts were issued for the threshold with a probability of 0.7, the event was observed with a frequency near 0.6. In terms of sample count, small probabilities had a greater number of occurrences than others, but all classes had more than 30 samples. For the two longer lead times, results generally show a positive conditional bias in the forecasts. This means that the forecast probabilities were higher than the conditional observed frequencies. For example, at the 240 h lead time, when forecasts were issued with a probability of 0.7, the event was

[Figure 14 comprises four reliability diagrams — P(F) on the x-axis against P(O|F) on the y-axis, each showing the perfect 45° line and the ensemble curve — with the corresponding sample count histograms (five bins of P(F), logarithmic y-axis) for lead times of 48 h, 120 h, 240 h, and 360 h.]

Fig. 14 Reliability diagram analysis of the Três Marias inflow using a threshold of 1400 m³/s

observed with a frequency near 0.4. And at the 360 h lead time, when forecasts were issued with a probability of 0.7, the event was observed with a frequency near 0.35. This indicates that for these longer lead times, the system overestimates the probability of event occurrence, requiring caution when the spread of the members is


used as an uncertainty measurement. In these cases, the sharpness histograms always indicated more than a hundred samples in all classes. Overall, the metric assessment results indicate that the evaluated hydrological forecasting system produces results consistent with observed reality. However, increasing uncertainty is expected with lead time, and the results for lead times longer than 10 days may not have the quality desirable for the system. Shorter lead times' results, though, are useful from a reservoir operation point of view, since errors are in an acceptable range and the hit–false alarm trade-off given by the ROC curves has acceptable rates. In terms of the comparison with deterministic forecasts given by the ensemble mean as a consensus, the ensemble forecasts indicate benefits in terms of the Brier score and in terms of the consistency given by the forecast convergence score (FCS). In terms of reliability, the system presents a good calibration up to 5 days of lead time. Beyond this lead time, more caution should be taken in the use of probabilities for threshold exceedance. More spread at early lead times of the forecasting system (as measured by the rank histograms) could be obtained if the system also made use of a hydrological ensemble, not only a meteorological one. However, the most important problems identified at early lead times are those related to the lack of observed information and to the low quality of inflow observations obtained from the reservoir water budget. Work to reduce the importance of these errors could include adding more data sources to the forecasting system, such as radar or satellite rainfall information, and also trying other methods of data assimilation that consider uncertainties in the input data. Finally, it is worth noting that the hydrological model calibration period and the forecasting test period partially overlap.
This overlap potentially leads to higher-quality results than could be obtained in operational applications.
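The reliability-diagram computation used throughout this assessment — binning forecast probabilities into five classes and reporting, per bin, the mean P(F), the conditional observed frequency P(O|F), and the sample count — can be sketched as follows (synthetic data, not the study's verification sample):

```python
import numpy as np

def reliability_table(p_forecast, occurred, n_bins=5):
    """For each forecast-probability bin, return
    (mean forecast probability, conditional observed frequency P(O|F), sample count)."""
    p = np.asarray(p_forecast, dtype=float)
    o = np.asarray(occurred, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Last bin is closed on the right so p = 1.0 is counted
        in_bin = (p >= lo) & ((p < hi) if hi < 1.0 else (p <= hi))
        n = int(in_bin.sum())
        mean_p = float(p[in_bin].mean()) if n else float("nan")
        freq = float(o[in_bin].mean()) if n else float("nan")
        rows.append((mean_p, freq, n))
    return rows

# Synthetic, perfectly reliable sample: events occur with the forecast probability
rng = np.random.default_rng(0)
p = rng.uniform(0, 1, 2000)
obs = (rng.uniform(0, 1, 2000) < p).astype(float)

for mean_p, freq, n in reliability_table(p, obs):
    print(f"P(F)~{mean_p:.2f}  P(O|F)={freq:.2f}  n={n}")
```

Plotting mean P(F) against P(O|F) per bin gives the reliability curve; for the synthetic sample the points fall close to the 45° line by construction, while a positively biased system would fall below it, as in the 240 h and 360 h panels of Fig. 14.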

4 Conclusion

This case study presents an assessment of a short-term ensemble forecasting system for predicting inflows to a major Brazilian hydropower reservoir. Hindcasting experiments were conducted for the period from 2008 to 2012 using data from a global numerical weather prediction model of the Brazilian meteorological center CPTEC, in combination with a large-scale hydrological model widely used in Brazil for hydrological forecasting. Results for the São Francisco River are encouraging in terms of ensemble applicability. It is believed that ensemble inflow forecasts to major reservoirs in Brazil will be used in the near future as input to the chain of models used for the optimization of the national electric power producing system, owing to the results shown here and in other references (Fan et al. 2014, 2015a, b; Schwanenberg et al. 2015). The forecasting system framework described here is currently being tested in other river basins in Brazil, including the Tocantins River, located in the Northern region of Brazil, and the Uruguay River, located in the South.


Results also show that forecasts can be improved by adopting a scheme that considers the uncertainty in the initial conditions of the hydrological model and in the observed data. Another improvement could be obtained by increasing the number of gauging stations with hourly data and telemetric transmission. One meteorological radar recently installed in the Southeast of the basin will probably lead to improvements in short lead time forecasts, and the use of satellite information could be interesting for improving the spatial representation of observed rainfall. A benefit of ensemble forecasts discussed in this text is the ability to issue forecasts for intermediate probabilities of occurrence, while deterministic forecasts can only be issued for binary decisions (occurrence or non-occurrence). And, given the availability of these probabilities in the future, a hydropower reservoir operation strategy could make use of this information for setting the decisions for the early operation lead times, considering a scenario described by multiple probable futures. Put another way, using ensemble forecasts it should be possible to make a weighted decision that would allow a satisfactory operation considering all possible future uncertainties. This kind of assessment for short-term reservoir operation, where the ensemble forecast probabilistic scenarios are used, can be conducted with multistage stochastic optimization approaches. Such a method was tested by Schwanenberg et al. (2015) for the Três Marias reservoir, pursuing the objectives of flood control and energy generation in the São Francisco River, with encouraging results.
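As a hedged illustration of the stochastic-optimization idea (collapsed here to two stages, with entirely hypothetical numbers — this is not the Schwanenberg et al. setup), the hedging effect of optimizing a pre-release over all inflow scenarios rather than over a single consensus inflow can be sketched as:

```python
# Hypothetical inflow scenarios (volume over the horizon, hm³) and probabilities
SCENARIOS = [(300.0, 0.3), (500.0, 0.5), (800.0, 0.2)]
STORAGE0, CAPACITY = 400.0, 600.0  # current / maximum storage (hm³)
RELEASE_COST = 2.0                 # value of water lost per hm³ pre-released
SPILL_PENALTY = 50.0               # flood-damage cost per hm³ spilled

def cost(release, inflow):
    """First stage: choose a pre-release. Second stage (recourse): spill
    whatever exceeds the storage capacity once the inflow has arrived."""
    spill = max(0.0, STORAGE0 - release + inflow - CAPACITY)
    return RELEASE_COST * release + SPILL_PENALTY * spill

def expected_cost(release):
    return sum(p * cost(release, q) for q, p in SCENARIOS)

candidates = [50.0 * i for i in range(9)]  # feasible pre-releases, 0..400 hm³
stochastic = min(candidates, key=expected_cost)
mean_inflow = sum(p * q for q, p in SCENARIOS)  # single "consensus" forecast
deterministic = min(candidates, key=lambda r: cost(r, mean_inflow))

print(f"pre-release hedging all scenarios: {stochastic:.0f} hm³")
print(f"pre-release for the mean inflow  : {deterministic:.0f} hm³")
```

With the asymmetric penalty assumed here, the stochastic solution pre-releases more water than the consensus-based one, precisely to avoid the abrupt spillway maneuvers described above if the wet scenario materializes.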

References

P. Bougeault, Z. Toth, C. Bishop, B. Brown, D. Burridge, D.H. Chen, B. Ebert, M. Fuentes, T.M. Hamill, K. Mylne, J. Nicolau, T. Paccagnella, Y. Park, D. Parsons, B. Raoult, D. Schuster, P.S. Dias, R. Swinbank, Y. Takeuchi, W. Tennant, The THORPEX interactive grand global ensemble. Bull. Am. Meteorol. Soc. 91(8), 1059–1072 (2010)
A.A. Bradley, S.S. Schwartz, Summary verification measures and their interpretation for ensemble forecasts. Mon. Weather Rev. 139(9), 3075–3089 (2011)
J.M. Bravo, A.R. Paz, W. Collischonn, C.B. Uvo, O.C. Pedrollo, S. Chou, Incorporating forecasts of rainfall in two hydrologic models used for medium-range streamflow forecasting. J. Hydrol. Eng. 14, 435–445 (2009)
J.D. Brown, J. Demargne, D.-J. Seo, Y. Liu, The ensemble verification system (EVS): a software tool for verifying ensemble forecasts of hydrometeorological and hydrologic variables at discrete locations. Environ. Model. Softw. 25(7), 854–872 (2010)
L. Calvetti, Hydrometeorologic ensemble forecasts in the Upper Iguassu river basin using WRF and TopModel. Ph.D. thesis (in Portuguese). Instituto de Astronomia, Geofísica e Ciências Atmosféricas, Universidade de São Paulo, 2011. 141p
M. Cataldi, C.O. Machado, L.G.F. Guilhon, S.C. Chou, J.L. Gomes, J.F. Bustamante, Análise de previsões de precipitação obtidas com a utilização do modelo ETA como insumo para modelos de previsão semanal de vazão natural. Rev. Bras. Recursos Hídricos 12, 5–12 (2007)
S.C. Chou, C.A. Tanajura, Y. Xue, C.A. Nobre, Validation of the coupled ETA/SSiB model over South America. J. Geophys. Res. 107(D20), 8088 (2002)
W. Collischonn, C.E.M. Tucci, R. Haas, I. Andreolli, Forecasting river Uruguay flow using rainfall forecasts from a regional weather-prediction model. J. Hydrol. 305, 87–98 (2005)
W. Collischonn, D.G. Allasia, B.C. Silva, C.E.M. Tucci, The MGB-IPH model for large scale rainfall-runoff modeling. Hydrol. Sci. J. 52, 878–895 (2007a)
W. Collischonn, C.E.M. Tucci, R.T. Clarke, S.C. Chou, L.G. Guilhon, M. Cataldi, D.G. Allasia, Medium-range reservoir inflow predictions based on quantitative precipitation forecasts. J. Hydrol. 344, 112–122 (2007b)
W. Collischonn, A. Meller, P.L. Silva Dias, D.S. Moreira, Hindcast experiments of ensemble streamflow forecasting for the Paraopeba river. Geophys. Res. Abstr. 14, 3596 (2012)
W. Collischonn, A. Meller, F. Fan, D.S. Moreira, P.L. Silva Dias, D. Buarque, J.M. Bravo, Short-term ensemble flood forecasting experiments in Brazil. Geophys. Res. Abstr. 15, 11910 (2013)
F.S. Costa, I.P. Raupp, J.M. Damazio, P.D. Oliveira, L.G.F. Guilhon, The methodologies for the flood control planning using hydropower reservoirs in Brazil, in 6th International Conference on Flood Management, ICMF6, São Paulo, 2014 (ABRH, São Paulo, 2014)
EPE – Empresa de Pesquisa Energética, Anuário estatístico de energia elétrica 2013 (EPE, Rio de Janeiro, 2013). 253p
B.A. Faber, J.R. Stedinger, Reservoir optimization using sampling SDP with ensemble streamflow prediction (ESP) forecasts. J. Hydrol. 249, 113–133 (2001)
F.M. Fan, V.A. Siqueira, P.R.M. Pontes, W. Collischonn, Avaliação do uso de dados de precipitação do MERGE para simulação hidrológica da bacia do Rio Paraná, in XVII Congresso Brasileiro de Meteorologia, Gramado, 2012a
F.M. Fan, P.R.M. Pontes, W. Collischonn, L.F.S. Beltrame, Sistema de previsão de vazões para as bacias dos rios Taquari-Antas e Pelotas, in XI Simpósio de Recursos Hídricos do Nordeste, João Pessoa, 2012b
F.M. Fan, W. Collischonn, A. Meller, L.C.M. Botelho, Ensemble streamflow forecasting experiments in a tropical basin: the São Francisco river case study. J. Hydrol. 519, 2906–2919 (2014)
F.M. Fan, W. Collischonn, K. Quiroz, M.V. Sorribas, D.C. Buarque, V.A. Siqueira, Flood forecasting on the Tocantins River using ensemble rainfall forecasts and real-time satellite rainfall estimates. J. Flood Risk Manag. (2015a). doi:10.1111/jfr3.12177
F.M. Fan, D. Schwanenberg, W. Collischonn, A. Weerts, Verification of inflow into hydropower reservoirs using ensemble forecasts of the TIGGE database for large scale basins in Brazil. J. Hydrol.: Reg. Stud. (2015b, in press). doi:10.1016/j.ejrh.2015.05.012
L.G.F. Guilhon, V.F. Rocha, J.C. Moreira, Comparison of forecasting methods for natural inflows in hydropower developments. Braz. J. Water Res. 12(3), 13–20 (2007) (in Portuguese)
T. Hamill, R. Hagedorn, J. Whitaker, Probabilistic forecast calibration using ECMWF and GFS ensemble reforecasts. Part II: precipitation. Mon. Weather Rev. 136(7), 2620–2632 (2008)
A.F. Hamlet, D. Huppert, D.P. Lettenmaier, Economic value of long-lead streamflow forecasts for Columbia river hydropower. J. Water Resour. Plann. Manage. 128(2), 91–101 (2002)
H. Hersbach, Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15, 559–570 (2000)
I.T. Jolliffe, D.B. Stephenson (eds.), Forecast Verification: A Practitioner's Guide in Atmospheric Science, 2nd edn. (Wiley, Chichester, 2012)
M.E.P. Maceira, J.M. Damázio, Periodic auto-regressive streamflow models applied to operation planning for the Brazilian hydroelectric system, in Regional Hydrological Impacts of Climatic Change – Impact Assessment and Decision Making. IAHS Publ., vol. 295 (IAHS Press, Wallingford, 2005)
M.E.P. Maceira, L.A. Terry, F.S. Costa, J.M. Damázio, A.C.G. Melo, Chain of optimization models for setting the energy dispatch and spot price in the Brazilian system, in Proceedings of the XIV Power Systems Computation Conference, session 43, paper 1, Sevilla, 2002
E.P. Maurer, Predictability of Runoff in the Mississippi River Basin. Water Resour. Ser. Tech. Rep., vol. 172 (University of Washington, Seattle, 2002)
A. Meller, Short-range ensemble flood forecasting. Ph.D. thesis (in Portuguese). Instituto de Pesquisas Hidráulicas, Universidade Federal do Rio Grande do Sul, 2013. 224p
A. Meller, J.M. Bravo, W. Collischonn, Assimilação de dados de vazão na previsão de cheias em tempo-real com o modelo hidrológico MGB-IPH. Rev. Bras. Recur. Hidr. 17, 209–224 (2012)
A.M. Mendonça, J.P. Bonatti, Experiments with EOF-based perturbation methods and their impact on the CPTEC/INPE ensemble prediction system. Mon. Weather Rev. 137, 1438–1459 (2009)
R.C.D. Paiva, D.C. Buarque, W. Collischonn, M.-P. Bonnet, F. Frappart, S. Calmant, C.A. Bulhões Mendes, Large-scale hydrologic and hydrodynamic modeling of the Amazon River basin. Water Resour. Res. 49, 1226–1243 (2013)
F. Pappenberger, K. Bogner, F. Wetterhall, Y. He, H.L. Cloke, J. Thielen, Forecast convergence score: a forecaster's approach to analysing hydro-meteorological forecast systems. Adv. Geosci. 29, 27–32 (2011). doi:10.5194/adgeo-29-27-2011
A.R. Paz, W. Collischonn, C.E.M. Tucci, R.T. Clarke, D.G. Allasia, Data assimilation in a large-scale distributed hydrological model for medium-range flow forecasts, in Quantification and Reduction of Predictive Uncertainty for Sustainable Water Resources Management (Proceedings of Symposium HS2004 at IUGG2007, Perugia) (IAHS Press, Wallingford, 2007), pp. 471–478
M. Pereira, L. Pinto, Multi-stage stochastic optimization applied to energy planning. Math. Program. 52(1–3), 359–375 (1991)
L. Raso, N. Van De Giesen, P. Stive, D. Schwanenberg, P.J. Van Overloop, Tree structure generation from ensemble forecasts for real time control. Hydrol. Process. 27, 75–82 (2013)
D. Schwanenberg, F.M. Fan, S. Naumann, J.I. Kuwajima, R.A. Montero, A. Assis Dos Reis, Short-term reservoir optimization for flood mitigation under meteorological and hydrological forecast uncertainty. Water Resour. Manag. 1–17 (2015). doi:10.1007/s11269-014-0899-1
H. Stanski, L. Wilson, W. Burrows, Survey of Common Verification Methods in Meteorology (World Meteorological Organization, Geneva, 1989)
C.E.M. Tucci, P.L.S. Dias, R.T. Clarke, G.O. Sampaio, W. Collischonn, Long-term flow forecasts based on climate and hydrologic modeling: Uruguay river basin. Water Resour. Res. 39(7), 1–2 (2003)
C.E.M. Tucci, W. Collischonn, R.T. Clarke, A.R. Paz, D. Allasia, Short- and long-term flow forecasting in the Rio Grande watershed (Brazil). Atmos. Sci. Lett. 9, 53–56 (2008)
D. Wilks, Statistical Methods in the Atmospheric Sciences (Academic, Amsterdam/Boston, 2006)
W.W.-G. Yeh, L. Becker, R. Zettlemoyer, Worth of inflow forecast for reservoir operation. J. Water Resour. Plann. Manage. 108(WR3), 257–269 (1982)

Probabilistic Shipping Forecast

Dennis Meißner and Bastian Klein

Contents
1 Introduction: Inland Waterway Transport in Europe
2 Hydrological Impacts on Shipping
3 Hydrological Forecasts for Inland Waterway Transport
3.1 Utilization and User Requirements
3.2 Components of Hydrological Forecasting Systems for Shipping
3.3 Demonstrating the Added Value of Probabilistic Forecasts for Navigation
4 Outlook
5 Conclusion
References

Abstract

Inland waterway transport is an important, even though often neglected, economic sector relying on hydrological forecasts in order to increase its operating efficiency. Besides river ice and floods, low stream flow is the main hydrological impact on shipping along the European inland waterways, as it substantially affects the load capacity of vessels, and thus transportation costs, several times a year. For this reason, inland waterway transport benefits from water-level forecasts in order to take preventive action and adjust the draft, especially during stream flow droughts. Although most navigation-related water-level forecasts are still deterministic, the waterway transport sector is a well-suited customer of probabilistic forecast products for several reasons: The number of decisions to be taken is quite high, especially in comparison to the operation of protection measures against rare flood events. Furthermore, in waterway transport, the user's costs and losses

D. Meißner (*) • B. Klein
Department Water Balance, Forecasting and Predictions, Federal Institute of Hydrology (BfG), Koblenz, Germany
e-mail: [email protected]; [email protected]
© Springer-Verlag Berlin Heidelberg 2015
Q. Duan et al. (eds.), Handbook of Hydrometeorological Ensemble Forecasting, DOI 10.1007/978-3-642-40457-3_58-1


associated with possible forecast-based decisions are well known, and monetary valuation of losses, like non-operation times or additional effort due to lighterage, is more feasible than it is, for example, with regard to human lives or environmental pollution. Last but not least, shipping is an inhomogeneous stakeholder, as different vessel types and routes cause different cost structures and sensitivities to navigation conditions. Selecting one "best-guess" forecast that is optimal for all users is impossible. In this chapter, hydrological forecasts as one component to support inland waterway transport are presented, and the added value of probabilistic forecasts is demonstrated by applying a simulation-based cost model for the River Rhine, one of the world's most frequented inland waterways.

Keywords

Cost structure model • Flood • Inland waterway • Inland waterway transport • Low stream flow • Navigation • River ice • River Information Service (RIS) • River Rhine • Seasonal forecast • Shipping • Traffic • Water-level forecast

1 Introduction: Inland Waterway Transport in Europe

Inland navigation has a long history of providing safe and environment-friendly transport. As one of the modes of transport – besides road, railway, and air – numerous types of cargo are carried by inland waterway vessels. The European inland waterways offer a network of canals, rivers, and lakes of more than 40,000 km, connecting cities and industrial regions across the continent. Some 18 out of the 28 European member states have inland waterways, most of them being part of interconnected waterway networks (European Union 2013). The European waterway network is particularly dense in the north-western part of the continent, where the large waterways (especially Rhine, Danube, Elbe) in combination with their tributaries and canals enable inland shipping to reach many destinations and, for example, to travel from the North Sea to the Black Sea. All important industrial areas in Western Europe as well as the major sea ports (like Rotterdam, Antwerp, Hamburg) are accessible to inland shipping vessels (Fig. 1). In light of continuing transport growth within the European Union – freight transport performance is expected to grow by up to 32 % by 2020 and up to 50 % by 2030 (Petersen et al. 2009) – there is a need to use the free capacity that inland navigation offers more consistently in order to relieve the overloaded road and railway networks. As freight transport by inland waterway currently accounts for approximately 5 % of all freight transport in the European Union, it is a political objective to promote and strengthen the position of inland waterways within the overall European transportation network in order to increase its modal share (European Union 2013). One main disadvantage of inland waterway transport is the comparatively long travel time. A trip from Rotterdam to Basel by ship takes approximately 5 days, whereas the travel times of the competing transport modes are just a small fraction of this. Therefore, inland waterway transport is particularly suitable for the transport of large quantities of cargo, especially bulk cargo and containers. A standard inland shipping vessel with a length of 135 m can carry up to 3,800 t, which is equal to nearly 150 lorries. As with any transportation system, reliability and availability are the main drivers for using inland waterway transport as a traffic carrier. Unscheduled or excessive scheduled delays are one of the key factors determining the reliability of the transport system (International Navigation Association 2002). Although the inland waterways offer a nearly congestion-free network (sometimes standby time occurs due to passing locks), the ease, safety, and efficiency of inland waterway transport are sensitive to hydrological impacts. Therefore, inland waterway transport is one of the economic sectors, like hydropower or agriculture, relying on hydro-meteorological and hydrological forecasts in its short- to long-term business planning and decision making (Moser et al. 2012).

Fig. 1 Important inland waterways in Central Europe (Modified source: German Federal Ministry of Transport and Digital Infrastructure, Fachstelle für Geoinformation Süd, Regensburg, Germany)

2 Hydrological Impacts on Shipping

The main hydrological hazards concerning inland waterway transport in Europe are low stream flows, floods, and river ice (Nilson et al. 2012). While river ice and floods limit the operational availability of waterways by closing affected stretches, low stream flows do not force navigation to be abandoned, but they lead to restrictions on


load carrying capacity and, as a consequence, to higher operation costs. The impacts are of different relevance depending on climate conditions as well as on the characteristics of the waterway. River ice primarily occurs in waterways with low or nearly no flow velocities (canals, impounded rivers) in areas with low air temperatures over longer periods (like Scandinavia and Eastern Europe). Besides blocking the waterway and interrupting its trafficability, ice runs can damage vessels and harm technical structures, like weirs, locks, or harbor facilities (Ashton 1986; Carstensen 2008). Beyond this purely economic impact, river ice might cause so-called ice jam floods by restraining the discharge, which causes the water-level at the jam to rise significantly (Beltaos 1995). Therefore, river ice forecasts, indicating affected stretches as well as estimating ice thickness, are valuable information for coordinating the operation of icebreakers (e.g., pooling the vessels at hot spots) trying to keep the waterway clear for as long as possible, as well as for taking limitations of waterway availability into account (e.g., shifting transport to another mode). The fact that river ice has a significant impact on inland navigation is indicated by Fig. 2 (top part), showing the number of days with suspension of navigation due to river ice on the Odra waterway along the German-Polish border and on the Main-Danube canal connecting the River Rhine and the River Danube, and therefore the North Sea with the Black Sea. In most parts of Europe, river ice is relevant for shipping over a limited period of the year, whereas the water-level – high as well as low – is the hydrological parameter influencing navigation for most of the year. Therefore, this chapter focuses on water-level forecasts for inland waterways. High water-levels affect rivers, regulated as well as free flowing, whereas floods are normally not relevant for canals due to the absence of natural inflows.
Restrictions related to floods depend on the absolute water height, as above a given level river traffic is halted (e.g., Donaukommission 2005). In addition to the protection of the infrastructure, the security of navigation is the main motivation, because the high flow velocities occurring during floods reduce the maneuverability of vessels traveling downstream. Additionally, the guaranteed clearance below bridges might become too low and limit the possible number of container layers. Therefore, water-levels are not solely relevant for the flood protection community but also for navigational users (Belz et al. 2013). Although the duration and frequency of occurrence of floods is significantly lower than that of low flows (see Fig. 2, bottom part), floods can cause substantial costs with regard to inland waterway transport. For example, the big floods in 1993 and 1995 caused costs due to lost proceeds of more than 25 million Euros in the international Rhine basin (International Commission for the Hydrology of the Rhine Basin 1999). Figure 2 (bottom part) shows the high variability of restrictions due to high and low stream flows between different years at the station Vienna/Danube, using the highest navigable discharge (HSQ) and the (low) flow value that is exceeded on 95 % of the days within a reference period (FlowDurationCurve_Q95 %) as relevant indicators. Restrictions caused by low stream flows occur in free-flowing waterways, as discharges and water-levels are correlated and the interannual flow regime leads to corresponding water-level conditions. In canals and impounded rivers, the water-level is determined artificially and is therefore only indirectly affected by hydro-meteorological drivers. Here, water-levels might be affected during low flows


Fig. 2 Top part: suspension of navigation on the Main-Danube Canal and the Odra along the German-Polish border in the period 1971–2007 (left); bottom part: annual number of days above the high-flow threshold HSQ (blue columns) and below the low-flow indicator FDC_Q95 (red columns) for the station Vienna/Danube in the period 1951–2007

when the operation rules of the canal/weirs no longer allow for abstraction or retention of water. Unlike floods, there is no threshold beyond which navigation is prohibited due to low stream flows. It is the responsibility of each vessel's skipper to decide whether it is possible to travel within a given section of the waterway despite the reduced water depth. So, it is an individual evaluation of risk (in terms of safety


and cost-effectiveness of the transport) given the intensity of the low flow situation, the ship and cargo type, and the destination of the transport. Low water-levels reduce the load factor of inland waterway vessels and thereby increase costs per transported unit (cost per ton). At the same time, the danger of ship grounding or ship-to-ship collisions increases due to the reduced depth and width of the fairway. As low flow situations occur more often than floods and as they are relatively long lasting, they are regarded as the major threat to the reliability of inland waterway transport (see Fig. 2, bottom). The estimated damage in shipping caused by the extreme drought of 2003 was, for example, about 91 million Euros for the Rhine basin (Jonkeren et al. 2007).
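To illustrate why the cost per ton rises steeply as water-levels fall, a toy load-factor model can be sketched. Every number below (trip cost, drafts, tonnes per metre of draft) is hypothetical and is not taken from the cost model applied later in this chapter:

```python
def cost_per_ton(water_depth_m, max_load_t=3800.0, trip_cost_eur=30000.0,
                 full_load_depth_m=3.5, empty_draft_m=1.2, t_per_m=2000.0):
    """Toy load-factor model (hypothetical parameters): below the depth
    needed for a full load, every lost metre of usable draft removes
    t_per_m tonnes of cargo, while the trip's fixed cost stays the same."""
    usable_draft = min(water_depth_m, full_load_depth_m) - empty_draft_m
    load = max(0.0, min(max_load_t, usable_draft * t_per_m))
    return trip_cost_eur / load if load > 0 else float("inf")

for depth in (4.0, 3.0, 2.5, 2.0, 1.6):
    print(f"depth {depth:.1f} m -> {cost_per_ton(depth):7.2f} EUR/t")
```

Because the fixed trip cost is spread over ever fewer tonnes, the curve is flat at high water-levels and rises roughly hyperbolically at low ones — the same "initially moderate, then exponential" pattern visible in the Kaub data of Fig. 3.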

3 Hydrological Forecasts for Inland Waterway Transport

3.1 Utilization and User Requirements

Water-level as well as river ice forecasts are a fundamental part of many River Information Services (RIS). These services provide harmonized information in order to support traffic and transport management in inland waterway transport as part of the intermodal transport chain (International Navigation Association 2002). At the same time, RIS aims at increasing the safety of transport and reducing accidents. The main use of hydrological forecasts for inland waterway transport is to optimize the amount of cargo a ship can take before it starts its trip. That is why the water-level, which directly determines the available water depth, is the most important forecast parameter. Figure 3 gives an example of how operational water-level forecasts published via RIS are used by the waterway transport sector. Figure 3 shows the number of accesses per day (black dots) to the forecast for the station Kaub, which is one of the main bottlenecks of the international waterway Rhine (Fig. 3, right), published via the German RIS component ELWIS (www.elwis.de). It is obvious that decreasing water-levels increase the need for water-level forecast information. Also, when water-levels tend toward the highest navigable level (indicated as "HSW" in Fig. 3), more users are interested than in the case of medium water-levels, which offer sufficient water depths to fully load the vessels. The link between user demand and economic sensitivity is visible in Fig. 3, too. The transport costs likewise increase with decreasing water-levels (blue dots), initially moderately, subsequently exponentially. At high water-levels, large-sized vessels have advantages (economies of scale), which turn into disadvantages compared to smaller vessels at low fairway depths.
Although the sensitivity of the transport costs differs as a function of the absolute water-level and the vessel type, hydrological shipping forecasts are in demand along the complete range of water-levels, as waterway vessels are operated throughout the whole year (apart from suspension times, e.g., due to river ice or floods). Therefore,


Fig. 3 Left part: number of daily users of the operational low flow forecast for the River Rhine at station Kaub (black dots) and simulated transport costs (blue dots), both plotted against the absolute water-level; right part: schematic overview of the River Rhine waterway and the guaranteed fairway depths

availability and soundness are basic demands on navigation-related forecasts. This requires a high and stable forecast quality over the complete water-level range. As users in the waterway transport sector have to take a large number of decisions over the year based on hydrological forecasts, they benefit from the "law of large numbers," which means that single wrong decisions are compensated by a majority of good choices. This aspect is also relevant with respect to probabilistic forecasts. According to probability theory, decisions based on (the expected value of) numerous probabilistic forecasts tend to result in an optimum in the long term, although several single decisions might produce losses. Besides the absolute height, the timing of forecasted water-levels is also important for transport usage, because skippers calculate quite exactly when they will reach load-limiting locations along their trip, which is the point in time the forecasted water-level is particularly relevant for. That is why the vast majority of waterway users are interested in the shape of the water-level hydrograph instead of just looking at whether the forecast overtops or drops below single threshold levels. As the water-levels, especially during low stream flows, are sensitive to the height and shape of the river bed, it is important for shipping forecasts to take current bed levels into account, especially within morphodynamically active river stretches, like the Elbe or the Lower Rhine. Forecast lead times should be equal to the travel time of the vessels or at least cover the period the ships need to pass the main bottlenecks of a waterway after leaving the lading port. Normally this takes several days. If forecast lead times are too short, the forecasts are not completely usable for optimizing the load before starting the trip,


D. Meißner and B. Klein

and cargo capacity is wasted, or the ships are overloaded and have to dump cargo on their way, leading to additional costs (due to unloading, stocking, further transport via truck or rail, etc.). The required forecast frequency is relatively low compared to flood forecasts, for example. The water-level dynamics are lower than in the case of floods, and the navigational users are not able to adjust the load of their vessels every time a new forecast is published anyway. Skippers have to decide on the load-carrying capacity at a specific point in time before leaving the port. Most navigation-related forecasts are issued once a day.
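The law-of-large-numbers argument can be made concrete with a small Monte Carlo sketch. All numbers and the cost function below are illustrative assumptions, not taken from this chapter: loading to a cautious quantile of an ensemble forecast beats acting on a single deterministic member once the loss is asymmetric, even though individual trips may still lose.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical, asymmetric per-tonne costs: overloading forces expensive
# lighterage, underloading only wastes capacity.
PENALTY_OVERLOAD = 10.0
COST_UNUSED = 1.0

def trip_cost(loaded_t, allowed_t):
    if loaded_t > allowed_t:
        return (loaded_t - allowed_t) * PENALTY_OVERLOAD
    return (allowed_t - loaded_t) * COST_UNUSED

def simulate(n_trips=5000, sigma=100.0):
    """Compare loading on a single 'deterministic' member with loading
    on a cautious quantile of a 51-member ensemble."""
    cost_det = cost_prob = 0.0
    for allowed in rng.uniform(1000.0, 2000.0, n_trips):  # tonnes, illustrative
        ensemble = allowed + rng.normal(0.0, sigma, 51)   # unbiased ensemble
        cost_det += trip_cost(ensemble[0], allowed)       # one member only
        cost_prob += trip_cost(np.quantile(ensemble, 0.25), allowed)
    return cost_det / n_trips, cost_prob / n_trips

mean_det, mean_prob = simulate()
print(f"mean cost per trip: deterministic {mean_det:.0f}, 0.25-quantile {mean_prob:.0f}")
```

Single quantile-based trips can still go wrong (a forecast bust still hurts), but over many decisions the expected-value strategy wins, which is exactly the law-of-large-numbers point made above.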

3.2 Components of Hydrological Forecasting Systems for Shipping

Forecasting systems for shipping are quite similar to those for flood forecasting or power production. The systems are a cascade of different models driven by hydro-meteorological forecasts. Precipitation and air temperature are the meteorological variables used most often. Precipitation directly influences discharges and, as a consequence, water-levels along the rivers. Air temperature is relevant as it influences snow processes as well as evaporation; all processes related to river ice depend almost completely on the air temperature. These hydro-meteorological inputs are transformed into discharges by a hydrological model covering the whole basin of the waterway of interest. As navigational users are primarily interested in water-levels, a hydrodynamic model, normally one-dimensional, representing the waterway itself with all its relevant structures (e.g., groins, weirs), is part of the forecasting system. Quite often, forecast locations relevant for waterway transport are affected by backwater effects or the regulation of structures, which can be represented adequately within the hydrodynamic model. In most cases, the hydrodynamic model is the computationally most demanding component, especially when using ensemble forecasts. Hydrodynamic models require a comparatively high spatial resolution (one grid point every 100–500 m) and temporal resolution (less than 1 h) in order to guarantee numerical stability and reliable results. The hydrodynamic model has to contain current data on the river morphology, as it influences the calculation of the water-level; the frequency of model updates is therefore important and comparatively high compared to hydrological models. The models within a forecasting system for shipping are not calibrated with a focus solely on a specific discharge or water-level range (e.g., floods); the aim is rather to find a balanced model setup giving satisfying forecast results over the whole hydrological range.
The quality measures used to evaluate calibration have to allow for these characteristics. For example, care should be taken that single major discrepancies at flood peaks are not overrated in the quality index used. On the other hand, the amount and in most cases also the quality of the training data available in

Probabilistic Shipping Forecast


order to calibrate the models is significantly larger than if the focus were on extreme events only.

3.3 Demonstrating the Added Value of Probabilistic Forecasts for Navigation

Shipping forecasts have value if users are able to base decisions on them that reduce transport costs and therefore maximize their profit over the years. Several methods are available to estimate the economic value of meteorological or hydrological forecasts (Katz and Murphy 1997; Roulin 2007). Simulation-based cost modeling is an approach to assess the economic value in an objective way with a close relation to practice. It is beyond question that hydrological forecasts are valuable information for inland waterway transport. The following application of a cost structure model demonstrates that forecast uncertainty information can yield an additional economic benefit for the inland waterway transport sector, using the example of the River Rhine. The River Rhine is one of the world's most frequented waterways and the backbone of the European inland waterway network; approximately 600 vessels pass the Dutch-German border per day. A cost structure model considers various influences and cost components of inland waterway transport. The relevant expense factors are composed, on the one hand, of fixed costs, like capital costs for investment and insurance, and labor costs depending on ship size and operation mode. On the other hand, the variable operation costs have to be considered, which depend on the navigation conditions, mainly water depth, and on the hydrodynamic properties of the vessels. The model considers the impact of water-levels and stream velocities on load-carrying capacity, power demand, speed, and the resulting fuel consumption. The water-level determines the maximum draft and therefore the maximum payload, but fuel consumption and possible speed are also influenced significantly.
In order to analyze the effect of water-level forecasts on the performance of inland waterway transport, a deterministic forecast and three water-level quantiles (0.25-quantile, 0.5-quantile, 0.75-quantile) of a probabilistic forecast were coupled to a cost structure model. For each forecast variant, cost analysis simulations were performed for seven representative vessel types traveling upstream on three origin-to-destination routes (starting point: the port of Rotterdam) over a period of approximately 4.5 years (1530 forecasts). Within the model, on each day of the time span a ship of each type travels to each of the three destinations, as long as the waterway is not closed due to a flood (Holtmann and Bialonski 2009; Bruinsma et al. 2012). Based on these results, Fig. 4 compares the best probabilistic variant with the deterministic forecast for five vessel types and three routes of different lengths and waterway characteristics (see Fig. 3, right panel). Negative values indicate lower costs than using the deterministic forecast. The analysis shows that probabilistic forecasts have the potential to improve decisions for waterway transport with respect to costs. Even a cost reduction of just a few percentage points per ton leads to relevant effects due to the large tonnage transported per year. It is


Fig. 4 Cost differences between deterministic and probabilistic forecasts for five vessels and three routes based on cost structure modeling for the River Rhine

also visible that the longer the trip, and therefore the longer the required lead times, the higher the sensitivity to the hydrological forecast variant becomes. The probabilistic forecasts become more and more advantageous – also in economic terms – with increasing uncertainty. Furthermore, larger vessels show a higher potential to profit from probabilistic forecasts. Figures 5 and 6 illustrate two individual forecasts out of the 1530 forecasts analyzed with the cost structure model for the trip from the port of Rotterdam to the industrial region of Rhine-Neckar near Mannheim. The bottleneck near Kaub is passed after 3 days, which is therefore the relevant lead time for determining the load capacity. Figure 5 shows a positive example of the probabilistic forecast, which is defined by the area between the 5%- and 95%-quantiles. The uncertainty range is relatively narrow, and the measured values (black squares) are quite close to the median (blue line). The deterministic forecast (red line) underestimates the water-level from day one onwards, with the consequence that less cargo than possible is carried (up to nearly 150 t for large-sized vessels). The effects on transport costs are depicted in Fig. 5 as well. As for any probabilistic forecast, it is disadvantageous for navigation-related forecasts if the range of forecasted water-levels does not cover the measured values. Figure 6 shows such a forecast bust: the forecast clearly overestimates the water-level rise at the station Kaub for the coming days. Consequently, skippers tend to overload their vessels and have to lighter surplus goods or wait until the water-level rises high enough to pass the bottleneck along their trip (although safety margins are already included). The commercial consequences depend on the size of the vessel, but on average the additional costs due to lighterage represent more than half of the total transport costs per ton, for smaller ships significantly more.
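The load decision at the bottleneck can be sketched as a simple function of the forecast depth at the relevant lead time (day 3 for Kaub on this route). The linear draft-payload relation and all constants below are hypothetical; the real cost structure model resolves vessel hydrodynamics, fuel use, and route details.

```python
# All constants are hypothetical placeholders.
EMPTY_DRAFT_M = 1.2           # draft of the unloaded vessel
SAFETY_MARGIN_M = 0.3         # under-keel clearance
TONNES_PER_M_DRAFT = 1000.0   # cargo per metre of additional draft
MAX_PAYLOAD_T = 2500.0

def load_capacity(forecast_depth_m):
    """Payload that still clears the bottleneck at the forecast depth."""
    usable = forecast_depth_m - SAFETY_MARGIN_M - EMPTY_DRAFT_M
    return max(0.0, min(MAX_PAYLOAD_T, usable * TONNES_PER_M_DRAFT))

# The decision at departure uses the forecast for day 3, when the
# bottleneck is reached; the quantile values below are illustrative.
for name, depth in {"0.25-quantile": 2.1, "median": 2.4, "0.75-quantile": 2.6}.items():
    print(name, round(load_capacity(depth), 1))
```

Picking a higher quantile loads more cargo but raises the risk of lighterage if the realized water-level falls short; the quantile choice thus encodes the user's individual cost asymmetry.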


Fig. 5 Example of a successful probabilistic forecast with a tight uncertainty range

4 Outlook

The growth of transport requires an increased shift of cargo to the inland waterways, with their vast capacities, in order to relieve roads and railways. To tap the full potential of inland waterway transport as an environmentally friendly and energy


Fig. 6 Example of a forecast bust – the measured water-levels are completely outside the uncertainty range (top), transport costs including charges due to excess load based on different water-level quantiles for three vessel types (bottom)

efficient transport mode, the importance of hydrological shipping forecasts will increase in the future. The effort to incorporate water-level forecasts into River Information Services will continue and has to be extended to probabilistic forecast products, as they offer additional benefit to navigational users.


A second challenge is the development and design of medium-range up to seasonal forecast products for waterway transport. Additional lead time is needed in order to strengthen the inclusion of waterway transport within multi-modal transport chains. Seasonal forecasts with lead times of several months have great potential to become a valuable tool for the optimization of waterborne transport and for medium- to long-term waterway management. Extended lead times offer the possibility to optimize the fleet structure of shippers as well as the stock management of enterprises. By taking into account periods with above- or below-average water-levels for the coming months, the timing of transport could be rescheduled to an earlier or later date, or multiple smaller ships could be ordered in times of lower water-levels to execute the transport efficiently. Of course, seasonal forecasts have to be probabilistic to be useful, and they have to be communicated and explained to end-users in an adequate way. Last but not least, the (probabilistic) forecasting of additional parameters relevant for navigation along the inland waterways is a matter of development. In addition to optimizing the amount of cargo a ship can take before it starts, fuel consumption can be minimized by following the optimal track based on the flow velocity and by using the optimal ship speed in relation to the water depth and the desired time of arrival. Hydrological forecasts for shipping could thus be used to set up advanced trip advisors, which give the optimal track and ship speed in addition to the maximum load capacity or container height with which it is possible to pass the critical points on the particular route.

5 Conclusion

Switching from deterministic to probabilistic information is beneficial also for navigation-related water-level forecasts. This is particularly true if longer lead times, up to monthly and seasonal scales, are required. But as analyses based on the coupling of probabilistic water-level forecasts with a cost structure model for the River Rhine demonstrate, probabilistic forecast information can be valuable for short- to medium-range forecasts, too. Of course, forecast busts are not excluded, but in the end probabilistic forecasts enable waterway users to make decisions on an objective and consistent basis according to their own individual sensitivity to the occurrence or nonoccurrence of a hydrological event. A single "best-guess" forecast cannot serve all users, as inland waterway transport covers a wide range of stakeholders with extremely different sensitivities to navigation conditions. The essential transition from deterministic to probabilistic forecasts is a joint process lasting several years for forecast providers and forecast users. Only if we succeed in transforming the undisputed theoretical advantage of probabilistic forecasts into practical use will establishing probabilistic shipping forecasts go beyond being a purely academic exercise.


References

G.D. Ashton, River and Lake Ice Engineering (Water Resources Publications LLC, Littleton, 1986)
S. Beltaos, River Ice Jams (Water Resources Publications LLC, Littleton, 1995)
J. Belz, N. Busch, M. Hammer, M. Hatz, P. Krahe, D. Meißner, A. Becker, U. Böhm, A. Gratzki, F.J. Löpmeier, G. Malitz, T. Schmidt, Das Juni-Hochwasser des Jahres 2013 an den Bundeswasserstraßen – Ursachen und Verlauf, Einordnung und fachliche Herausforderungen. Korrespondenz Wasserwirtschaft (2013). doi:10.3243/kwe2013.001
F. Bruinsma, P. Koster, B. Holtmann, E. van Heumen, M. Beuthe, N. Urbain, B. Jourquin, B. Ubbels, M. Quispel, Consequences of climate change for inland waterway transport. Deliverable 3.3, ECCONET (2012). http://www.ecconet.eu/deliverables/ECCONET_D3.3_final.pdf (last access: 11/07/2016)
D. Carstensen, Eis im Wasserbau – Theorie, Erscheinungen, Bemessungsgrößen. Dresdner Wasserbauliche Mitteilungen, vol. 37 (TU Dresden, Dresden, 2008). ISBN 978-3-86780-099-0
Donaukommission, Lokale Schifffahrtsregeln auf der Donau (Sonderbestimmungen) (Budapest, 2005). http://www.danubecommission.org/uploads/doc/publication/Lokale_Schifff_Reg/Lokale_Schifffahrtsregeln.pdf (last access: 11/07/2016)
European Union, EU Transport in Figures – Statistical Pocketbook 2013 (2013). doi:10.2832/19314
B. Holtmann, W. Bialonski, Einfluss von Extremwasserständen auf die Kostenstruktur und Wettbewerbsfähigkeit der Binnenschifffahrt, in Tagungsband KLIWAS – Auswirkungen des Klimawandels auf Wasserstraßen und Schifffahrt in Deutschland, 1. Statuskonferenz am 18. und 19. März 2009, ed. by BMVBS (Bonn, 2009)
International Navigation Association, Vessel Traffic and Transport Management in the Inland Waterways and Modern Information Systems. Report of Working Group 24 – INCOM (2002). ISBN 2-87223-124-2
Internationale Kommission für die Hydrologie des Rheingebietes, Eine Hochwasserperiode im Rheingebiet – Extremereignisse zwischen Dez. 1993 und Febr. 1995 (1999). ISBN 90-7098028-2
O. Jonkeren, P. Rietveld, J. van Ommeren, Climate change and inland waterway transport: welfare effects of low water levels on the river Rhine. J. Transp. Econ. Policy 41(3), 387–411 (2007)
R.W. Katz, A.H. Murphy, Economic Value of Weather and Climate Forecasts (Cambridge University Press, Cambridge, 1997). ISBN 9780521435710
H. Moser, J. Cullmann, S. Kofalk, S. Mai, E. Nilson, S. Rösner, P. Becker, A. Gratzki, K.J. Schreiber, An integrated climate service for the transboundary river basin and coastal management of Germany, in Climate ExChange, ed. by WMO (2012). ISBN 978-0-9568561-4-2
E. Nilson, I. Lingemann, B. Klein, P. Krahe, Impact of hydrological change on navigation conditions. Deliverable 1.4, ECCONET – Effects of Climate Change on the Inland Waterway Transport Network (2012)
M.S. Petersen, R. Enei, C.O. Hansen, E. Larrea, O. Obisco, C. Sessa, P.M. Timms, A. Ulied, Report on Transport Scenarios with a 20 and 40 Year Horizon. Final report, TRANSvisions project (Copenhagen, 2009)
E. Roulin, Skill and relative economic value of medium-range hydrological ensemble predictions. Hydrol. Earth Syst. Sci. 11, 725–737 (2007)

Probabilistic Inundation Forecasting A. Mueller, C. Baugh, P. Bates, and F. Pappenberger

Contents

1 Introduction . . . 2
2 Methodology . . . 3
2.1 Modeling Framework . . . 3
2.2 Numerical Weather Prediction . . . 3
2.3 Hydrological Modeling . . . 4
2.4 Hydraulic Modeling . . . 4
2.5 Forecast Evaluation . . . 5
3 Event Description . . . 6
4 Model Framework Settings and Simplifications . . . 6
5 Results . . . 8
5.1 Probability Maps . . . 8
5.2 Forecast Skill Scores . . . 10
6 Conclusion . . . 12
References . . . 12

A. Mueller (*)
Geography and Environmental Science Department, University of Reading, and European Centre for Medium-Range Weather Forecasts (ECMWF), Reading, UK
e-mail: [email protected]

C. Baugh (*)
European Centre for Medium-Range Weather Forecasts (ECMWF), Reading, UK
e-mail: [email protected]

P. Bates (*)
Department of Geography, School of Geographical Sciences, University of Bristol, Bristol, UK
e-mail: [email protected]; [email protected]

F. Pappenberger (*)
European Centre for Medium-Range Weather Forecasts, Reading, UK
e-mail: fl[email protected]

© Springer-Verlag GmbH Germany 2016
Q. Duan et al. (eds.), Handbook of Hydrometeorological Ensemble Forecasting, DOI 10.1007/978-3-642-40457-3_59-1


Abstract

Many existing operational hydrological ensemble forecasting systems only produce forecasts of river discharge. It is possible to convert discharge forecasts into inundation extents, in particular because there are well-established tools for the estimation of inundation hazard. The basic components of the modeling framework from which to produce inundation forecasts are: (1) meteorological forcing; (2) a hydrological model; (3) a hydraulic model; and (4) a methodology to derive probabilistic inundation maps. We perform all of those steps using the example of the 2013 River Elbe event. We validate the maps of flooding probability against observations. We stress the importance of the spatial discretization of the digital elevation model (DEM) and the influence of the resolution of flood defense topographic features. This study shows that up to 80% of the flooded area along the Elbe in 2013 could have been forecast as inundated 7 days in advance using the proposed probabilistic modeling framework.

Keywords

Probabilistic forecast • Ensemble forecast • Inundation forecast • Forecast skill

1 Introduction

Prior to a forecasted emergency flooding situation, it is important that responders have access to predictions of flood inundation extent (Wetterhall et al. 2013; Dale et al. 2014). Many existing operational hydrological ensemble forecasting systems only produce forecasts of river discharge (Cloke et al. 2009; Pappenberger et al. 2014; Voisin et al. 2011; Pappenberger et al. 2011; Cloke and Pappenberger 2009); therefore, it is necessary to translate these into forecasts of flood inundation extent. Pappenberger et al. (2005) and Schumann et al. (2013) illustrate that it is possible to convert discharge forecasts into inundation extents, in particular because there are well-established tools for the estimation of inundation hazard (Pappenberger et al. 2012). Therefore, the basic components of the modeling framework from which to produce inundation forecasts are: (1) meteorological forcing, (2) a hydrological model, (3) a hydraulic model, and (4) a methodology to derive probabilistic inundation maps (see Fig. 1). However, floods are by definition extreme events, and establishing prediction skill in such cases is notoriously difficult due to the extreme difference from normal conditions. Hence, the evaluation of skill can be undertaken through an assessment of case studies to establish the suitability and usability of such forecasts (Alfieri et al. 2013; Pappenberger et al. 2011). This study evaluates the skill of a probabilistic flood inundation forecast for the June 2013 floods in Central Europe using meteorological forcing from the ECMWF ensemble system, the operational setup of the European Flood Awareness System (EFAS) to forecast discharges, and the hydraulic model LISFLOOD-FP for inundation predictions.

2 Methodology

2.1 Modeling Framework

The modeling framework in this study, as described above, consists of four components (Fig. 1): (1) meteorological forcing from a numerical weather prediction (NWP) system to provide forecasts of variables such as precipitation, (2) a hydrological model to route precipitation through the catchment and river network, providing discharge forecasts at a given river location, and (3) a hydraulic model to route the forecasted discharges through the river and floodplain. In this study, the meteorological forcing comes from an ensemble forecasting system; hence, multiple predictions of flood inundation extent are produced. The fourth modeling component deals with the evaluation and presentation of these probabilistic forecasts.
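As a toy illustration of the four-component cascade, each ensemble member is propagated independently through the chain and the member-wise wet/dry outcomes are aggregated into a probability. The functions below are illustrative stand-ins, not the real IFS/LISFLOOD/LISFLOOD-FP interfaces:

```python
from typing import List, Sequence

def hydrology(precip: Sequence[float]) -> List[float]:
    """Toy rainfall-runoff: a linear-reservoir style discharge response."""
    q, out = 0.0, []
    for p in precip:
        q = 0.8 * q + p
        out.append(q)
    return out

def hydraulics(discharge: Sequence[float], bankfull: float = 5.0) -> bool:
    """Toy hydraulic step: 'inundated' if peak discharge exceeds bankfull."""
    return max(discharge) > bankfull

def run_cascade(ensemble_precip: Sequence[Sequence[float]]) -> float:
    """Component 4: run every member through the chain and aggregate
    the member-wise outcomes into a probability of inundation."""
    flags = [hydraulics(hydrology(m)) for m in ensemble_precip]
    return sum(flags) / len(flags)

members = [[1.0, 1.0, 1.0]] * 30 + [[4.0, 4.0, 4.0]] * 21  # 51 'members'
print(run_cascade(members))  # fraction of members that flood (21 of 51)
```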

2.2 Numerical Weather Prediction

Probabilistic meteorological forcing data are taken from the ECMWF Integrated Forecasting System (IFS) ensemble. This is a global forecasting system with one control member and 50 members perturbed according to uncertainties in the initial model conditions and to deficiencies in the model. The spatial resolution of this forecasting system is 32 km for the first 10 days of lead time; thereafter it is 64 km.

Fig. 1 Schematic procedure for probabilistic flood prediction

2.3 Hydrological Modeling

The meteorological data are used to force a hydrological model in order to produce forecasts of river discharge. In this study, the LISFLOOD hydrological model, which is part of the operational EFAS suite, was used. LISFLOOD is a distributed hydrological rainfall-runoff model which simulates canopy and surface processes as well as flow and wave routing in the river channel (Van der Knijff et al. 2010). EFAS provides probabilistic flood forecasts and early warnings, based on discharge predictions, to a network of European national hydrological forecasting institutions and civil protection agencies (Thielen et al. 2009). The hydrological model is run at a 5 km resolution; therefore, the meteorological data are downscaled using a nearest-neighbor approach. EFAS has been evaluated in multiple scientific studies and has been proven to provide skillful discharge forecasts for rivers in Europe (Pappenberger et al. 2011; Bogner et al. 2014; Demeritt et al. 2013; Ramos et al. 2013; Alfieri et al. 2012; Bogner et al. 2011).
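Nearest-neighbor downscaling of the coarse NWP fields to the 5 km hydrological grid can be sketched as follows (a brute-force version for small grids; operational systems would use spatial indexing):

```python
import numpy as np

def nearest_neighbor_regrid(coarse_vals, coarse_xy, fine_xy):
    """Give each fine-grid cell the value of the nearest coarse-grid cell,
    via brute-force pairwise distances."""
    coarse_xy = np.asarray(coarse_xy, float)   # (Nc, 2)
    fine_xy = np.asarray(fine_xy, float)       # (Nf, 2)
    d2 = ((fine_xy[:, None, :] - coarse_xy[None, :, :]) ** 2).sum(axis=-1)
    return np.asarray(coarse_vals, float)[d2.argmin(axis=1)]

# two coarse cells 32 km apart; fine cells every 5 km along a transect
fine = nearest_neighbor_regrid([2.0, 8.0],
                               [(0.0, 0.0), (32.0, 0.0)],
                               [(x, 0.0) for x in range(0, 35, 5)])
print(fine)  # cells nearer 0 km inherit 2.0, cells nearer 32 km inherit 8.0
```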

2.4 Hydraulic Modeling

The LISFLOOD-FP hydraulic model is used to route the EFAS discharge predictions across the floodplain, thus providing forecasts of inundation extent. LISFLOOD-FP simulates the spreading of fluvial or coastal flooding by solving the St. Venant shallow water equations (Bates and de Roo 2000; Bates et al. 2010; Neal et al. 2012) of mass (1) and momentum (2) conservation. The equations in the horizontal direction x are as follows:

∂A/∂t + ∂Q/∂x = 0    (1)

∂Q_x/∂t + ∂/∂x(Q_x²/A) + gA ∂h/∂x = gA(S_0 − S_f)    (2)

where S_0 = −∂z/∂x is the bed slope and S_f = n²Q²/(A²R^(4/3)) is the friction slope; A is the cross-sectional area of the flow, t is time, Q_x is the volumetric flow rate in the x direction, g is the gravitational acceleration, h is the depth of the flow, n is the Manning coefficient of friction, z is the bed elevation, and R is the hydraulic radius. In this study, a version of the code called Lisflood-ACC (or "acceleration") was used (Bates et al. 2010; de Almeida and Bates 2013). It solves Eqs. 1 and 2 omitting the convective acceleration term ∂/∂x(Q_x²/A). When solved numerically, the above equations provide a good simulation of wave propagation when flow is subcritical. The discretization in LISFLOOD-FP is achieved on a regular square (raster) mesh by means of a Finite Difference Method with explicit


time-stepping. The solver uses adaptive time-stepping, with the Courant-Friedrichs-Lewy (CFL) number set to 0.7 for stability reasons (de Almeida and Bates 2013).
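A minimal sketch of one explicit step of the inertial ("ACC") scheme, following the formulation of Bates et al. (2010) with a semi-implicit friction term. The numbers are illustrative, and q here is a unit-width flux rather than the full cross-section discharge:

```python
import math

G = 9.81  # gravitational acceleration [m/s^2]

def adaptive_dt(h_max, dx, cfl=0.7):
    """CFL-limited explicit time step: dt = cfl * dx / sqrt(g * h_max)."""
    return cfl * dx / math.sqrt(G * max(h_max, 1e-6))

def inertial_update(q, h_flow, dSdx, n, dt):
    """One momentum update with the convective acceleration omitted and
    friction treated semi-implicitly for stability; q is unit-width flux
    [m^2/s], dSdx the water-surface slope, n the Manning coefficient."""
    num = q - G * h_flow * dt * dSdx
    den = 1.0 + G * dt * n * n * abs(q) / h_flow ** (7.0 / 3.0)
    return num / den

dt = adaptive_dt(h_max=4.0, dx=100.0)   # 100 m raster cells, 4 m deep water
q_new = inertial_update(q=2.0, h_flow=4.0, dSdx=1e-4, n=0.035, dt=dt)
print(round(dt, 2), round(q_new, 3))
```

The semi-implicit friction denominator is what keeps the explicit scheme stable in shallow, high-friction cells; the adaptive time step shrinks automatically as depths grow.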

2.5 Forecast Evaluation

Forecasts of flood extent provided by this model cascade can be evaluated against an observed extent with multiple measures. The possible outcomes of a binary (yes/no, wet/dry) event, such as a flood, can be represented using a contingency table (Mason 2003). There are two ways for the forecast to be correct (correct hit (A) and correct rejection (D)) and two ways to fail (false alarm (B) and missed occurrence (C)); see Table 1. In this case, each cell of the domain falls into category A, B, C, or D, and we perform such a classification for each pixel of the domain. The forecast performance for the 2013 flood event is measured by the skill scores detailed in Table 2. Another way of evaluating the forecasts is the Brier Score, which summarizes the skill of the probabilistic forecast information as a single value. It is defined as the mean squared error over N samples:

BS = (1/N) Σ_{i=1}^{N} (p_i − o_i)²    (3)

where p_i ∈ [0, 1] is the forecast probability and o_i = 0 or 1 is the observation. Our sample size N is the number of pixels in the model. Each pixel or cell has 51 possible 0/1 states: 0 when the pixel is dry and 1 when it is wet. In this way we obtain the probability p_i for each pixel separately (51 being the number of ensemble members):

p_i = (1/51) Σ_{j=1}^{51} (pixel_wet = true)    (4)

Table 1 Contingency table for a flood event

                 Observed wet     Observed dry
Simulated wet    A (hit)          B (false alarm)
Simulated dry    C (miss)         D (correct dry)

Table 2 Forecast verification measures used

Name                           Definition
Hit rate (H)                   A / (A + C)
False alarm rate (F)           B / (B + D)
Peirce skill score (PSS)       (AD − BC) / ((B + D)(A + C)) = H − F
Critical Success Index (CSI)   A / (A + B + C)
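The measures of Tables 1 and 2 and the Brier score of Eqs. 3 and 4 can be computed directly from boolean wet/dry rasters, for example:

```python
import numpy as np

def contingency_scores(sim_wet, obs_wet):
    """Counts of Table 1 and the measures of Table 2, per pixel."""
    sim = np.asarray(sim_wet, bool).ravel()
    obs = np.asarray(obs_wet, bool).ravel()
    A = int(np.sum(sim & obs))      # hit
    B = int(np.sum(sim & ~obs))     # false alarm
    C = int(np.sum(~sim & obs))     # miss
    D = int(np.sum(~sim & ~obs))    # correct dry
    H, F = A / (A + C), B / (B + D)
    return {"H": H, "F": F, "PSS": H - F, "CSI": A / (A + B + C)}

def brier_score(member_wet, obs_wet):
    """Eq. 3 with p_i from Eq. 4: member_wet has shape (n_members, n_pixels)."""
    p = np.asarray(member_wet, float).mean(axis=0)
    return float(np.mean((p - np.asarray(obs_wet, float)) ** 2))

obs = np.array([1, 1, 0, 0])
sim = np.array([1, 0, 1, 0])
print(contingency_scores(sim, obs))             # here H = F = 0.5, so PSS = 0
print(brier_score(np.tile(obs, (51, 1)), obs))  # perfect ensemble scores 0.0
```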

3 Event Description

Extreme flooding in Central Europe began after several days of heavy rain in late May and early June 2013. Prior to the torrential rains, the SMOS (Soil Moisture and Ocean Salinity) satellite mission showed that soils in Germany had reached record levels of saturation due to an unusually wet spring (Pappenberger et al. 2013). The areas affected included southern and eastern German states along the Elbe, the Danube, and their tributaries, with high water and flooding along their banks. The area around the city of Wittenberge was chosen for this demonstration. The flood wave reached the city on 8 June 2013. The inundation extent data (Radarsat-2, 50 m resolution) were provided by the German Center for Satellite Based Crisis Information (DLR-ZKI) (see Fig. 2a). The dark blue areas represent the normal body of water, while the light blue shows the extent of flooding. Figure 2b presents the bitmap of the body of water obtained from the polygon supplied by DLR-ZKI. The areas circled in green were removed from the mask before the comparison with the simulation results; they represent a lake and inundation not connected directly with the inundation caused by the river Elbe. The area marked with a dashed blue line is a bridge and the road leading to it; these were also removed from the comparison polygon, as they do not appear in the 100 m resolution Digital Elevation Model (DEM) used to run LISFLOOD-FP. The water flows under the bridge; including it in the simulation would represent it as a dry area obstructing the flow in the river channel.

4 Model Framework Settings and Simplifications

Meteorological predictions from the 51-member ECMWF IFS ensemble for the forecast of 30 May 2013 were used to force the LISFLOOD hydrological model to produce a 10-day, 6-hourly ensemble discharge series (Fig. 3) at the location marked with the red cross in Fig. 2a. The date of this forecast was chosen in order to assess the ability of the model framework to forecast the flood extent with a 1-week lead time. The LISFLOOD-FP model domain was set to the same extent as in Fig. 2a (lower left corner at ETRS-LAEA coordinates 4417000, 3308000); it extended 40 km to the east and 24 km to the north. The input data requirements are a DEM of floodplain topography, definitions of channel geometry, a declaration of Manning's n friction parameter values across the domain, and flow input and output boundary conditions. The Shuttle Radar Topography Mission (SRTM) DEM was used to define the floodplain topography (Fig. 4a). According to the global assessment of the SRTM mission (Rodriguez et al. 2006), the absolute height error in Eurasia is of the order of 6 m. The mission radar did not penetrate the vegetation canopy, so the DEM contains vegetation and building artifacts. Version 4.1 of the CGIAR-CSI (Consultative Group for International Agricultural Research – Consortium for Spatial Information) SRTM DEM (downloaded from http://srtm.csi.cgiar.org/) was used. It was resampled from WGS84 to the ETRS-LAEA coordinate system at 100 m resolution. Channel


Fig. 2 Flood extent on 8 June 2013, picture courtesy of DLR-ZKI (a); bitmap of the flooded region, with white for observed flooding and black for no flooding, circled areas excluded when comparing with simulations (b). Axes in (b) are ETRS-LAEA coordinates (×10⁶ m)

geometry requires the definition of bed elevation and channel widths along the river course. Since this information was not available to this study, the SRTM DEM was also used to define this input. Manning’s n friction parameter values were defined across the domain (Fig. 4b) for different landcover classes taken from the Corine 2006 dataset (http://www.eea.europa.eu/data-and-maps/data/corine-land-cover-2006-raster-2). The

Fig. 3 Members of the EFAS ensemble forecast of 30.05.2013 (Q_EFAS in m³/s over the 10-day lead time) used as the forcing term. The red line is the mean of the ensemble, green stars are observations

lowest values were declared in the channel, thus allowing for preferential flow in these areas. Flow inputs, as described above, were taken from the results of the LISFLOOD hydrological model within EFAS driven by the ECMWF-IFS ensemble. To account for the lack of channel geometry representation in the SRTM DEM, owing to the inability of the radar signal to penetrate the water surface, the flow inputs were corrected: the mean annual flow was removed before the simulation. This means that LISFLOOD-FP in this study routes only the flood event discharge anomaly, as using the entire discharge would result in an overestimation of flood extent. Flow outputs were declared in the form of free boundary conditions at the northern and western edges of the domain. Finally, the soil was treated as impermeable, and it was assumed that no precipitation or evaporation occurred within the floodplain. These input data are available on a global or European scale and were used "as-is" to show whether, without modification, they are suitable for producing a probabilistic flooding forecast in this area. Moreover, we also sought to identify what modifications and/or additional information would be needed in order to reproduce the flooding observed in June 2013.
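Assigning Manning's n from a landcover raster reduces to a lookup table. The class-to-n values below are illustrative assumptions; the chapter does not list the values actually assigned to the Corine 2006 classes:

```python
import numpy as np

# Illustrative class-to-n mapping (hypothetical values).
MANNING_BY_CLASS = {
    "channel": 0.030,   # lowest values in the channel -> preferential flow
    "cropland": 0.045,
    "forest": 0.100,
    "urban": 0.075,
}

def manning_grid(landcover, table=MANNING_BY_CLASS):
    """Build the friction-parameter raster from a landcover raster."""
    lut = np.vectorize(table.__getitem__, otypes=[float])
    return lut(np.asarray(landcover))

grid = manning_grid([["channel", "cropland"], ["forest", "urban"]])
print(grid)
```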

5 Results

5.1 Probability Maps

LISFLOOD-FP was run 51 times, once for each member of the ensemble forecast. These were independent simulations with different forcing discharges, as depicted in Fig. 3. Each run provides the water levels and flood extent over time.

Probabilistic Inundation Forecasting

Fig. 4 Digital elevation map (SRTM, 100 m resolution) (a); Manning coefficient (b)

Combining these 51 realizations, we can estimate a map of probable flooding for this particular forecast, as calculated from Eq. 4, where 0 means the area is never flooded and 1 means the area is flooded in all 51 cases. Figure 5 shows the result of the simulations run on the original DEM with 100 m resolution, compared to the observed flood extent marked with a blue polygon line (from DLR-ZKI). Pixels are colored with the probability pi (Eq. 4). The simulation overestimates the area of flooding, partly because of the low resolution of the SRTM DEM: flood defenses, dykes, and roads are not continuously represented in this dataset, so there are gaps in these linear features through which the entire valley can be flooded (see, for example, the satellite terrain image of the area contained in the yellow rectangle in Fig. 5). The inability of the SRTM DEM to

Fig. 5 Map of flooding probability on 08.06.2013. EFAS forecast from 30.05.2013

resolve such topographic features, and the consequent hindrance to flood extent prediction, has also been noted in previous studies (Sanders 2007). It was therefore decided to introduce an approximate representation of the flood defenses: the original DEM was modified by increasing the elevation of pixels lying along the course of the flood defenses. The locations of flood defenses and dykes were approximated from open-access data sources such as Google Earth. The city of Wittenberge is protected by flood defenses, while in peripheral areas elevated road networks acted as dykes; their locations are shown by the yellow line in Fig. 6. Flood defense and dyke pixels at these locations were raised by 2 m relative to their neighbors. This particular height was chosen based on the average elevation of the roads in Google Earth (yellow lines in Fig. 6). The resulting probability map of inundation can be seen in Fig. 6. Here again, the blue line represents the flood extent observed on 8 June 2013. The flood defenses in this simulation contained water within the banks throughout the majority of the domain; of the overbank flow that did occur, approximately 80 % was onto the southern bank. Neither of the simulations captured the inundation around the city of Wittenberge itself (northern bank, center of Fig. 2a). Flooding here may have resulted from the small Stepenitz river, which flows along the eastern flank of the city; flow inputs for this river were not defined in the LISFLOOD-FP model.
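The per-pixel flooding probability of Eq. 4, i.e., the fraction of the 51 runs in which a pixel floods, can be sketched as follows. The 0.05 m wet/dry threshold and all names here are our own assumptions for illustration, not values from the study.

```python
import numpy as np

def flood_probability(depth_stack, wet_threshold=0.05):
    """Per-pixel flooding probability from an ensemble of inundation runs.

    depth_stack   : array (n_members, ny, nx) of simulated max water depth [m]
    wet_threshold : depth above which a pixel counts as flooded (assumed 5 cm)

    Returns an (ny, nx) array p in [0, 1]: the fraction of ensemble members
    (51 LISFLOOD-FP runs in this study) that flood each pixel.
    """
    wet = np.asarray(depth_stack) > wet_threshold   # boolean wet/dry masks
    return wet.mean(axis=0)                         # fraction of wet members

# Toy 3-member ensemble on a 2x2 grid
depths = np.array([[[0.0, 0.2], [0.5, 0.0]],
                   [[0.0, 0.3], [0.0, 0.0]],
                   [[0.1, 0.4], [0.6, 0.0]]])
p = flood_probability(depths)
print(p)  # pixel (0,1) is flooded in all members -> 1.0
```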

5.2 Forecast Skill Scores

Both flood predictions discussed above were obtained by applying the EFAS ensemble discharge forecast as a forcing term. In the first case, an unmodified SRTM DEM of

Fig. 6 Map of flooding probability on 8 June 2013. EFAS forecast from 30 May 2013. Modified DEM including flood defenses (dashed yellow line)

100 m resolution was used to describe the terrain. In the second case, we included an approximation of the flood defenses by modifying the original DEM; the spatial resolution and roughness remained the same in both cases. We ran 51 simulations for each DEM, which provided a probability distribution of flooded cells. Next, we quantified the skill scores of the two ensemble simulations. Table 3 compares the scores from the simulations with the different DEMs; Q1, Q2, and Q3 are the first (25 %), second (50 %), and third (75 %) quartiles, respectively. The hit rate (H) increased after introducing the flood defenses, because the water is kept within the secondary channel and covers more of the in-channel islands within the area of the observed flood. The false alarm rate (F) is small in both cases, despite the significant flooding extent in the case without flood defenses; this is likely due to the inclusion of upland, never-flooded areas in the count of correctly predicted dry cells (D). Still, the false alarm rate decreased significantly after introducing the flood defenses into the simulation, which reduced the forecasted inundation extent and hence the number of false-positive cells. Both the Peirce skill score and the Critical Success Index also increased after the introduction of the flood defenses, reflecting the increased proportion of correct hits relative to false positives and misses. The Brier Score for the forecast with the original DEM was BSoriginalDEM = 0.28; after introducing flood defenses, BSwithdykes = 0.08. This means that the mean squared error of the simulation against the observations is reduced when

Table 3 Distributions of skill scores, compared with the observation on 8 June 2013 (DLR-ZKI, Fig. 2b). Each cell lists min, Q1, Q2, Q3, max, where Q1, Q2, and Q3 are the first, second, and third quartiles, respectively

Score   | Original DEM                 | DEM with flood defenses
H [%]   | 62.2, 67.8, 70.6, 74.3, 79.9 | 70.0, 73.8, 75.0, 76.1, 78.8
F [%]   | 17.4, 22.4, 24.3, 27.1, 31.1 | 1.5, 3.0, 6.1, 12.1, 21.5
PSS [%] | 43.9, 45.7, 46.4, 47.1, 47.8 | 57.2, 63.8, 68.8, 70.3, 71.4
CSI [%] | 18.0, 18.8, 19.2, 19.6, 21.5 | 23.5, 32.6, 45.1, 55.4, 61.1

modifying the DEM. This is also shown by the Brier Skill Score (BSS), relating the forecast with the original DEM to the one with the modified DEM:

BSS = 1 - BSwithdykes / BSoriginalDEM

The resulting value of BSS = 0.71 shows that the forecast using the DEM with flood defenses has more skill for the 2013 event than the one using the original SRTM DEM.
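The scores discussed in this section can be computed schematically as below. This is an illustrative sketch with our own naming, not the code used in the study; only the two Brier score values come from the text above.

```python
import numpy as np

def contingency_scores(forecast_wet, observed_wet):
    """H, F, PSS, and CSI from binary forecast and observed flood maps."""
    f = np.asarray(forecast_wet, bool).ravel()
    o = np.asarray(observed_wet, bool).ravel()
    a = np.sum(f & o)       # hits
    b = np.sum(f & ~o)      # false alarms
    c = np.sum(~f & o)      # misses
    d = np.sum(~f & ~o)     # correctly predicted dry cells (D)
    H = a / (a + c)         # hit rate
    F = b / (b + d)         # false alarm rate
    return {"H": H, "F": F, "PSS": H - F, "CSI": a / (a + b + c)}

def brier_score(p, observed_wet):
    """Mean squared error of flood probability against a binary observation."""
    p = np.asarray(p, float).ravel()
    o = np.asarray(observed_wet, float).ravel()
    return np.mean((p - o) ** 2)

# BSS of the modified-DEM forecast relative to the original-DEM one,
# using the Brier scores reported in the text (0.08 and 0.28):
bss = 1.0 - 0.08 / 0.28
print(round(bss, 2))  # 0.71
```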

6 Conclusion

The probabilistic forecasting of flood inundation is important for emergency management: it can give advanced warning and quantify the probability of flood occurrence. In this chapter, we demonstrate, using the example of the June 2013 floods in Central Europe, a modeling framework that can derive probabilistic inundation maps. In this framework, meteorological forcing from the ECMWF-IFS ensemble was used to drive the LISFLOOD hydrological model within EFAS to produce discharge forecasts; these forecasts were in turn routed across the floodplain using the LISFLOOD-FP hydraulic model. The resulting ensemble forecasts of inundation extent were then combined to produce a probabilistic forecast that highlights the areas most likely to flood. The forecasted inundation extent is extremely sensitive to the accuracy of the input DEM. Using the original SRTM DEM resulted in an overestimation of inundation; modifying it to include topographic features such as flood defenses reduced this problem and improved the forecast skill. The quality of the input DEM will therefore be an important consideration when establishing an operational flood inundation forecasting framework. Overall, this study shows that up to 80 % of the area flooded along the Elbe in 2013 could have been forecasted to inundate 7 days in advance, using the probabilistic modeling framework proposed.

References

L. Alfieri, P. Salamon, F. Pappenberger, F. Wetterhall, J. Thielen, Operational early warning systems for water-related hazards in Europe. Environ. Sci. Policy 21, 35–49 (2012). doi:10.1016/j.envsci.2012.01.00


L. Alfieri, P. Burek, E. Dutra, B. Krzeminski, D. Muraro, J. Thielen, F. Pappenberger, GloFAS – global ensemble streamflow forecasting and flood early warning. Hydrol. Earth Syst. Sci. 17, 1161–1175 (2013). doi:10.5194/hess-17-1161-2013
P. Bates, A. de Roo, A simple raster-based model for flood inundation simulation. J. Hydrol. 236, 54–77 (2000)
P. Bates, M.S. Horritt, T.J. Fewtrell, A simple inertial formulation of the shallow water equations for efficient two-dimensional flood inundation modelling. J. Hydrol. 387, 33–45 (2010)
K. Bogner, H. Cloke, F. Pappenberger, A. de Roo, J. Thielen, Improving the evaluation of hydrological multi-model forecast performance in the Upper Danube catchment. Int. J. River Basin Manag. (2011). doi:10.1080/15715124.2011.625359
K. Bogner, D. Meißner, F. Pappenberger, Korrektur von Modell- und Vorhersagefehlern und Abschätzung der prädiktiven Unsicherheit in einem probabilistischen Hochwasservorhersagesystem. Hydrol. Wasserbewirtsch. 58(2), 73–75 (2014). doi:10.5675/HyWa_2014,2_2
H.L. Cloke, F. Pappenberger, Ensemble flood forecasting: a review. J. Hydrol. 375(3–4), 613–626 (2009)
H.L. Cloke, J. Thielen, F. Pappenberger, S. Nobert, P. Salamon, R. Buizza, G. Bálint, C. Edlund, A. Koistinen, C. de Saint-Aubin, C. Viel, E. Sprokkereef, Progress in the implementation of Hydrological Ensemble Prediction Systems (HEPS) in Europe for operational flood forecasting. ECMWF Newsletter No. 121 (2009), pp. 20–24
M. Dale, J. Wicks, J. Mylne, F. Pappenberger, S. Laeger, Probabilistic flood forecasting and decision-making: an innovative risk-based approach. Nat. Hazards 70(1), 159–172 (2014). doi:10.1007/s11069-012-0483-z
G.A.M. de Almeida, P. Bates, Applicability of the local inertial approximation of the shallow water equations to flood modelling. Water Resour. Res. 49, 4833–4844 (2013)
D. Demeritt, S. Nobert, H.L. Cloke, F. Pappenberger, The European Flood Alert System and the communication, perception, and use of ensemble predictions for operational flood risk management. Hydrol. Process. 27, 147–157 (2013). doi:10.1002/hyp.9419
I.B. Mason, Binary events, in Forecast Verification: A Practitioner's Guide in Atmospheric Science, ed. by I.T. Jolliffe, D.B. Stephenson (Wiley, Hoboken, 2003), 240 pp
J. Neal, G. Schumann, P. Bates, A subgrid channel model for simulating river hydraulics and floodplain inundation over large and data sparse areas. Water Resour. Res. 48, W11506 (2012)
F. Pappenberger, K.J. Beven, N.M. Hunter, P.D. Bates, B.T. Gouweleeuw, J. Thielen, Cascading model uncertainty from medium-range weather forecasts (10 days) through a rainfall-runoff model to flood inundation predictions within the European Flood Forecasting System (EFFS). Hydrol. Earth Syst. Sci. 9(4), 381–393 (2005)
F. Pappenberger, J. Thielen, M. del Medico, The impact of weather forecast improvements on large scale hydrology: analysing a decade of forecasts of the European Flood Alert System. Hydrol. Process. 25(7), 1091–1113 (2011). doi:10.1002/hyp.7772
F. Pappenberger, E. Dutra, F. Wetterhall, H.L. Cloke, Deriving global flood hazard maps of fluvial floods through a physical model cascade. Hydrol. Earth Syst. Sci. 16, 4143–4156 (2012). doi:10.5194/hess-16-4143-2012
F. Pappenberger et al., Floods in Central Europe in June 2013. ECMWF Newsletter No. 136, Summer (2013)
F. Pappenberger, L. Stephens, S.J. van Andel, J.S. Verkade, M.H. Ramos, L. Alfieri, J.D. Brown, M. Zappa, G. Ricciardi, A. Wood, T. Pagano, R. Marty, W. Collischonn, M. Le Lay, D. Brochero, M. Cranston, D. Meissner, Operational HEPS systems around the globe (2014), http://hepex.irstea.fr/operational-heps-systems-around-the-globe/. Accessed 10 Apr 2014
M.H. Ramos, S.J. van Andel, F. Pappenberger, Do probabilistic forecasts lead to better decisions? Hydrol. Earth Syst. Sci. 17, 2219–2232 (2013). doi:10.5194/hess-17-2219-2013
E. Rodriguez, C.S. Morris, J.E. Belz, A global assessment of the SRTM performance. Photogramm. Eng. Remote Sens. 72(3), 249–260 (2006)
B. Sanders, Evaluation of on-line DEMs for flood inundation modeling. Adv. Water Resour. 30, 1831–1843 (2007)


G.J.-P. Schumann, J.C. Neal, N. Voisin, K.M. Andreadis, F. Pappenberger, N. Phanthuwongpakdee, A.C. Hall, P.D. Bates, A first large scale flood inundation forecasting model. Water Resour. Res. 49, 6248–6257 (2013). doi:10.1002/wrcr.20521
J. Thielen, J. Bartholmes, M.-H. Ramos, A. de Roo, The European flood alert system – part 1: concept and development. Hydrol. Earth Syst. Sci. 13, 125–140 (2009)
J.M. Van der Knijff, J. Younis, A.P.J. de Roo, LISFLOOD: a GIS-based distributed model for river basin scale water balance and flood simulation. Int. J. Geogr. Inf. Sci. 24, 189–212 (2010)
N. Voisin, F. Pappenberger, D.P. Lettenmaier, R. Buizza, J.C. Schaake, Application of a medium-range global hydrologic probabilistic forecast scheme to the Ohio River Basin. Weather Forecast. 26, 425–446 (2011)
F. Wetterhall, F. Pappenberger, L. Alfieri, H.L. Cloke, J. Thielen-del Pozo, S. Balabanova, J. Daňhelka, A. Vogelbacher, P. Salamon, I. Carrasco, A.J. Cabrera-Tordera, M. Corzo-Toscano, M. Garcia-Padilla, R.J. Garcia-Sanchez, C. Ardilouze, S. Jurela, B. Terek, A. Csik, J. Casey, G. Stankūnavičius, V. Ceres, E. Sprokkereef, J. Stam, E. Anghel, D. Vladikovic, C. Alionte Eklund, N. Hjerdt, H. Djerv, F. Holmberg, J. Nilsson, K. Nyström, M. Sušnik, M. Hazlinger, M. Holubecka, HESS Opinions "Forecaster priorities for improving probabilistic flood forecasts". Hydrol. Earth Syst. Sci. 17, 4389–4399 (2013). doi:10.5194/hess-17-4389-2013

Challenges of Decision Making in the Context of Uncertain Forecasts in France

Caroline Wittwer, C. de Saint-Aubin, and C. Ardilouze

Contents
1 Introduction
2 Vigilance Procedure and Communication
3 Flood Vigilance
4 Operational Activity at Local and National Level
5 Added Value of Ensemble Platforms
6 Summary
7 Conclusion
References

Abstract

Flooding is a major risk in France, and an operational hydrometeorological organization (a team of 450 employees for national coordination and local services) has been in place since 2003 to monitor the main river courses: about 22,000 km that can be affected by flash flood, fluvial flood, and coastal flood hazards. Besides building a national network of hydrometric stations, with real-time access to the data on the Internet, the services are responsible for flood vigilance over the next 24 h and for more detailed forecasts at shorter lead times. This information has been disseminated since 2006 on the www.vigicrues.gouv.fr website. Hydrological and hydraulic deterministic models are used to increase the forecast lead time for some 500 hydrometric stations located close to vulnerable flood areas. Within the set of information integrated into the operational procedure, ensemble meteorological forecasts are used by the hydro-forecasters to evaluate the rainfall distribution in the following days. Besides these routine activities, the European Flood Awareness System (EFAS), an ensemble-based forecasting platform, has been tested since 2009 on some 60 flood events to study how the system can best be used within the French flood vigilance and warning procedure. During the first years of the evaluation, the results remained quite difficult to account for in the operational activities, as the number of false alarms and missed events was too high. Since 2013, the tendency is completely different, with the production of Flash Flood reports for watersheds smaller than 3,000 km2. The localization of the impacted area and the timing are better foreseen; this tendency should improve further once real-time discharges of the main gauging stations are transferred to EFAS.

Keywords

Vigilance • France • Operational flood forecasting services

C. Wittwer (*)
BRGM, Orléans, France
e-mail: [email protected]
C. de Saint-Aubin
Service Central d'Hydrométéorologie et d'Appui à la Prévision des Inondations (SCHAPI), Toulouse, France
C. Ardilouze
Météo France, Toulouse, France
© Springer-Verlag Berlin Heidelberg 2016
Q. Duan et al. (eds.), Handbook of Hydrometeorological Ensemble Forecasting, DOI 10.1007/978-3-642-40457-3_72-1

1 Introduction

Two-thirds of the 36,000 French municipalities are subject to one or more natural hazards (floods, earthquakes, clay deformation), with more than 6 million people living in flood-prone areas. Reduction of natural risk is therefore a national priority, in order to adapt to these phenomena and to minimize their impact. A risk prevention policy is in place with the aim of reducing the consequences of potential damage. It is based on a large set of activities, including long-term adaptation measures (control of urban development and construction following legal standards and risk prevention master plans, reduction of vulnerability, adaptation to risk and climate change to prepare for future crises); resilient infrastructure development (protection, security of protection infrastructures); provision of training and real-time information to citizens; and other short-term crisis management measures. This covers not only the surveillance, forecasting, and warning activities during crises but also all the preparatory measures linked to awareness of hazards and vulnerability, the gathering of historical information, the analysis of past crises, and preparedness for crisis situations. Here, we discuss how the French hydro-meteorological services translate flood forecast information, including probabilistic forecasts, into so-called vigilance maps and written information for the public services and the population.

2 Vigilance Procedure and Communication

A new disaster awareness system, called the vigilance procedure, was established in France following the devastating storm of 1999 along the Atlantic coast. Although the storm's level (intensity, location) had been well estimated by meteorological
forecasters, the public underestimated the effects of the storm. The vigilance procedure provides clear instructions to the public on what to do in the case of a possible natural hazard over the following 24 h. It combines information on the current situation with forecasts of the vulnerability of the area, in order to determine the right level of action to be taken by the services in charge of citizen security and by citizens themselves. Vigilance implies that the extents of natural hazards are continuously assessed and that threshold levels are set based on historical events, model simulations, local knowledge, and any type of information that allows linking the hazard to its impact on citizens and assets. Vigilance ensures not only that the technical and civil security services are aware of the level of any future event but also that the public is informed and becomes responsible for its own behavior. For this, communication means need to be in place for real-time assessment and coordination, and the responsibility and information flow between the various technical services, civil security, and local services must be agreed. The vigilance procedure is used for hydro-meteorological disasters, including heavy rainfall and floods, as well as for meteorological and health-related hazards such as strong wind, snow-ice, avalanche, storm, heat wave, cold wave, and wave-storm surge. In the French risk prevention policy, the terminology distinguishes between vigilance, a warning procedure issued by the technical services (either meteorological or hydrological) to the civil services and to the public so that they can prepare to react properly in case of a natural hazard, and alert, which triggers civil protection actions and is launched by the Prefect (the administrative coordinator in each of the 100 departments) when the danger is confirmed and the forecasts indicate that safety measures and assistance become necessary.

3 Flood Vigilance

Flood vigilance information is produced by a coordinated effort of some 20 local flood forecasting services across France, each responsible for a series of watersheds, and one national hydro-meteorological and flood forecasting center (SCHAPI). Together with the services in charge of the hydrometric stations, this organization is operated by about 450 persons, working either on daily duty in normal situations or around the clock (24 h a day, 7 days a week) during crises. All of them are government technicians or engineers at the national and regional levels. The goal of the vigilance procedure is to deliver, twice daily or more frequently, information on the level of hydro-meteorological hazard before and during a disaster to the parties in charge of alerting the responsible government services, such as mayors at the local level, prefects at the department level, the civil security services, and the general public. Vigilance information is provided for the upcoming 24 h. Four levels of vigilance are used for meteorological hazards (by Météo France) and for hydrological hazards (by the flood forecasting services). The levels are represented by four color codes: green, yellow, orange, and red. Green stands for no flood risk, yellow for localized floods with minor damage or a rapid increase of water level, orange for more dangerous events with people at risk and damages on
infrastructures and houses, and red stands for major floods extensively threatening people and property and involving the support of national civil security services. For each of these vigilance levels, behavioral guidelines have been provided to reduce the risk to people and assets. The visualization of vigilance does not include information about probability or uncertainty; e.g., there is no color shading or changing intensity of colors reflecting the probability range of the event. Instead, the vigilance maps are accompanied by bulletins in which the uncertainties are expressed verbally and ranges of water-level forecasts are described, either in the text itself or in attached files. The flood hazard level classification is performed for linear segments of the river network covering the 22,000 km of streams under surveillance. The 250 linear segments, called river sections, each up to 40 km long, were identified depending on the need for detailed forecasts, the location of vulnerable urbanized areas, and the availability of hydrometric measuring stations. All the gauging stations are linked to one of these river sections, and they relate water levels in the river course to damages in the area. Each past event can be classified on the vigilance scale, as shown in Fig. 1. The products delivered by the flood forecasting services range from hydrographs at hydrometric stations to maps and bulletins for the river network. Since 2006, the website www.vigicrues.gouv.fr has provided direct access to real-time measurements from some 1,500 stations, as well as to national and local vigilance information. This information is either sent to the decision-making public authorities or made available for free access on the Internet (Fig. 2). Radio and TV are also major tools of the dissemination process. Two of the regional services also notify registered users when threshold levels are reached at selected measuring stations.
This functionality will be implemented at the national level in the near future, with the upgrading of the flood vigilance web services. Real-time communication therefore gives fast and easy access to field data, as well as to flood forecasting information (vigilance levels and local forecasts). With continuous improvements (for example, access to historical data and event descriptions), it is a very effective tool for increasing the participation of people in hazard and crisis management (Fig. 2). The vigilance procedures for hydrological risks are fully aligned with those for meteorological risks – from the exchange of data to the dissemination of a joint meteorological and hydrological vigilance map, produced since 2007 at the request of the Ministry of the Interior in order to provide a single, concerted, expert point of access to decision-making information.
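As a schematic illustration of the vigilance classification described in this section, the mapping from a forecast water level at a reference station to a vigilance color might be sketched as follows. The threshold values are invented for illustration; operational services set thresholds per river section and combine many more criteria, as noted above.

```python
# Hypothetical vigilance thresholds [m] at one reference station. Real
# thresholds are set per river section from historical floods, model
# simulations, and local knowledge linking water level to damage.
THRESHOLDS = [("red", 6.0), ("orange", 4.0), ("yellow", 2.5)]

def vigilance_color(forecast_level_m):
    """Map a forecast water level to the four-color vigilance scale."""
    for color, threshold in THRESHOLDS:   # checked from most to least severe
        if forecast_level_m >= threshold:
            return color
    return "green"                        # below all thresholds: no flood risk

print(vigilance_color(1.8))   # green
print(vigilance_color(4.7))   # orange
```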

4 Operational Activity at Local and National Level

A set of meteorological and hydrological information is used by SCHAPI to assess the overall hydro-meteorological situation across France for the current day and the following days. This information at the national scale is valuable for ensuring that the local vigilance levels and forecasts produced by the 20 regional services are consistent and in line with the behavior of the main watersheds, as most

Fig. 1 Vigilance section of the Garonne River in the Toulouse area: compilation of past flood events for two gauging stations in relation to the vigilance levels (SPC GARONNE-TARN-LOT, river section GARONNE TOULOUSAINE (1)). Content of the figure, reconstructed:

Vigilance levels and expected consequences:
- GREEN (Level 1): no particular vigilance required. Normal situation.
- YELLOW (Level 2): risk of high or rapidly rising water not involving significant damage but requiring particular vigilance in the case of seasonal and/or outdoor activities. Disruption of activities related to the river courses, localized overflow, localized secondary road closures, isolated houses involved, inundated cellars, disruption of agricultural activity.
- ORANGE (Level 3): risk of flood with considerable overflow liable to significantly affect the daily life and security of people and property. Generalized and damaging overflow, threatened human lives, evacuations, greatly affected circulation, partial disruption of social, agricultural, and economic activities.
- RED (Level 4): major risk of flood directly and extensively threatening people and property. Rare and catastrophic flood, numerous human lives threatened, generalized overflow, generalized and simultaneous evacuations, large-scale disruption of urban, agricultural, and industrial activities.

Reference stations of the river section (a reference station is the station selected for defining the vigilance color of the river section):
- Station CAZERES / GARONNE, historical floods and water levels: February 2003 (1.59 m), April 2004 (2.66 m), June 2000 (3.06 m), January 2004 (3.70 m), February 1952 (4.00 m), May 1977 (4.74 m), June 1875 (8.60 m).
- Station TOULOUSE / GARONNE, historical floods and water levels: April 2004 (2.70 m), November 2011 (2.67 m), February 2003 (3.38 m), January 2004 (3.52 m), May 1977 (4.31 m), June 2000 (4.38 m), February 1952 (4.57 m), June 1875 (8.32 m).

Warning: the choice of the color will also account for special cases, such as an extremely rapid increase of water level, an unusual event for the season, or seasonal dangerous activity.


Fig. 2 Example of vigilance information available on www.vigicrues.gouv.fr during the June 2013 event in the Pyrenees. (a) Publication of the joint meteorological and hydrological map on June 17th and 18th. (b) Hydrological maps at national and regional level in June, and the related hydrograph for one of the gauging stations


of the main river courses are surveyed by a series of regional services. For the following days, the information is used to prepare and organize the work of the forecasting team (shifts, 24-h presence in the office). In any case, a first telephone call takes place between SCHAPI and the national civil security body, the Inter-ministerial Center for Operational Crisis Management (COGIC), as soon as an orange or red vigilance map is foreseen, that is, before any information is pushed via the Internet. During crises, the whole process is continuously repeated, in order to follow the gravity and evolution of the situation. The operational dataset used by the forecasters to assess the hydro-meteorological hazard includes the following information:

– Operational radar-gauge rainfall grids (Tabary 2007) and moisture maps calculated with the SIM model (Noilhan and Mahfouf 1996), both provided by Météo France
– Observed hydrological data records (water levels and discharges)
– Deterministic meteorological forecasts at large scale from the ARPEGE model of Météo France
– Quantitative rainfall forecasts for the next 3 days, with expert-adjusted quantitative forecasts for some 100 meteorologically homogeneous subzones over the French watersheds at various ranges, the latest method being produced at 3-h time steps, as well as comparisons of the available numerical models AROME (Seity et al. 2011) and ECMWF-ENS
– Meteorological hazard at medium range, with rainfall probability maps from the European Centre for Medium-Range Weather Forecasts (ECMWF)
– Medium-range ensemble hydro-meteorological models covering the entire national area, such as EFAS (Thielen et al. 2009) and SIM-PE provided by Météo France (51 members of the ECMWF ENS forcing the SAFRAN-ISBA-MODCOU hydro-meteorological model; Habets et al. 2008)

Each of the local forecasting services is responsible for assessing the extent of future flood hazard along the set of river sections under its responsibility.
This means that various types of decision-aiding tools have been developed and are regularly used to assign a vigilance color to each river section, usually at 10 am and 4 pm, for the following 24 h. The vigilance level (color) is often defined using simple graphical representations of the water-level, or discharge, increase for ranges of precipitation forecasts over the next day. Additional criteria include initial conditions (e.g., river water level, soil moisture, snow cover, and frozen soil). For some 500 stations, quantitative forecasts can be calculated using numerical models to provide values at lead times ranging from a few hours to 10 days for probabilistic systems. In order to account for the heterogeneity of hydrological conditions at the local level, from mountainous to fluvial and coastal watersheds, the forecasters use quite a large number of modeling tools, from empirical, conceptual, lumped, and distributed rainfall-runoff models to hydrological routing models. Hydraulic propagation is also represented by 1-D models. All models run with deterministic rainfall distributions, either accumulations or forecasts at various
time steps (12 h, down to 3 h) and spatial grids (8 × 8 km, 2.5 × 2.5 km, and 1 × 1 km) issued from numerical weather prediction models and/or from the radar network. When several models provide a set of forecasts at the same station, the forecaster uses either expert judgment or simple routines to determine the uncertainty around the chosen forecast value. These forecasts, and their uncertainty, are disseminated in the bulletin within the comment line of the river sections or as an attached file. The visual representation of the forecast and its related uncertainty on the hydrograph is planned for the forthcoming new version of the Vigicrues website. An illustration of the types of information involved in the forecaster's assessment over a 10-day period before an event is given in Fig. 3, which shows when the various products provide valuable information and how they complement each other. Decisions, whether for selecting a vigilance color on the 250 river sections, for coordinating operational activities inside the flood forecasting services at the national and regional levels, or for warning the civil security services of future events, are based on the whole set of data. No action is taken based on only one source of information. Ensemble products are particularly valuable for early warning, as their forecast range is in line with the reaction time of most of the large watersheds, for

Fig. 3 Use of various meteorological and hydrological information by the forecasters over a 10-day sliding period

Challenges of Decision Making in the Context of Uncertain Forecasts in. . .


which a first signal within 3–5 days before an event is an additional criterion used to inform the civil security services about a possible flood occurrence.

5 Added Value of Ensemble Platforms

Besides the operational procedure, SCHAPI is also involved in the testing of new products and the development of the national modeling strategy. Ensemble methods are used to assess longer-term flood hazard; EFAS has been part of this program since 2009, when the first agreement was signed with the Joint Research Centre for testing the platform. Results from SIM-PE (Thirel et al. 2008; Cloke et al. 2009), a similar system developed by Météo France at the national scale, are also integrated into this testing procedure. During the last 5 years, EFAS and SIM-PE signals have been checked daily to identify possible areas with future flood events with as much anticipation as possible, to compare the signals with the routinely used forecasting tools and the observed vigilance level during the event. Over the period 2009–2014, each event announced either with EFAS alerts and watches or with orange and red flood vigilance has been evaluated on various criteria, such as the size of the watershed, spatial distribution and type of precipitation (rain/snow), alert level compared to the real situation, and lead-time. More than 60 events with orange vigilance and almost 10 with red vigilance have been analyzed and classified into “hits,” “missed,” and “false alarm” groups. Results have shown that the medium-range forecasts provided for watersheds larger than 4000 km2 are less useful for forecasters in their decision making for the vigilance maps. With the introduction of the so-called flash flood layers into EFAS in 2012 (see Alfieri in this handbook), warning on watersheds below 3000 km2 is now also provided by the system. In 2013, a ratio of “hits” of about 50 % was observed for the relatively large watersheds, and a very high ratio of 80 % for the Flash Flood reports on smaller watersheds.
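The hit/missed/false-alarm classification described above can be summarized with standard contingency-table scores. The sketch below is illustrative only: the function name is hypothetical and the counts are invented, not the SCHAPI/EFAS evaluation figures.

```python
# Sketch: contingency-table scores for an event-based evaluation such as the
# hits / missed / false alarm classification described above.
# All counts here are hypothetical, not the SCHAPI/EFAS figures.

def contingency_scores(hits: int, misses: int, false_alarms: int) -> dict:
    """Probability of detection (hit ratio) and false alarm ratio."""
    pod = hits / (hits + misses) if (hits + misses) else float("nan")
    far = false_alarms / (hits + false_alarms) if (hits + false_alarms) else float("nan")
    return {"POD": pod, "FAR": far}

# Example with invented counts: 30 hits, 20 missed events, 10 false alarms.
scores = contingency_scores(hits=30, misses=20, false_alarms=10)
print(scores)  # {'POD': 0.6, 'FAR': 0.25}
```

A "hit ratio of about 50 %" as reported above corresponds to POD ≈ 0.5 in this notation.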
These improving results provide more valuable information to the operational forecasters: the system can issue warnings up to 4 days in advance, which is greatly valuable for organizing the forecasting activities and for first exchanges with the civil security services. Warnings sent only 1 or 2 days before an event by the EFAS system are less informative, as the situation has already been localized by other models and the timing is also better defined. An example from the EFAS system for one of the November 2014 events on the Mediterranean shore is shown in Fig. 4, while the corresponding vigilance maps sent over the same period are presented in Fig. 5. Besides the very good performance of EFAS for the Aude, Têt, and Tech rivers, it is worth noting that an EFAS Flood Watch had been received on November 22 for the Hérault (2525 km2) for November 27, therefore with a longer anticipation. On the other hand, no signal had been sent for the Berre river, although it reached the red vigilance level on Vigicrues on November 30. A detailed evaluation of the EFAS flash flood results for 2014 is still underway, but first results show that valuable information concerning the area covered by an incoming event and its timing is provided to the operational teams by the ensemble system.


Fig. 4 Flood probability forecast provided by EFAS system for Aude river (5,500 km2) for November 26, 2014 issued 3 days ahead. The red triangles show the flood alert and the orange triangles show thunderstorm for Tech (731 km2) and Têt (1,412 km2)

The active participation of the operational team in the testing of the two ensemble platforms, EFAS and SIM-PE, during their development phase and thereafter has been very valuable training for the operational forecasters. It allows them not only to learn about the methodology but also to assess in real time, as well as a posteriori, the value of the warnings within the overall set of hydro-meteorological information from various sources which is provided to the forecaster in normal situations and during crises. The forecasters use additional information to evaluate whether the level of warnings issued by the ensembles is in agreement with the local hydrological models and with the real water levels reached during events. Examples include the accuracy of the location and extent of the impacted area, and the timing, duration, lead-time, and consistency of the signal. In return, the possibility of providing feedback to the developers has helped improve the systems, reduce false alarm rates, and visualize information in the best way for the expert users. In line with the testing of the EFAS and SIM-PE platforms, several national projects have been developed, and implemented, by the flood forecasting services and their research partners, in order to concentrate on more local floods and short-term forecasting. The purpose is to provide new methodologies and tools to deal


Fig. 5 Vigilance maps and hydrographs of two stations on Aude and Hérault rivers for the November 30, 2014 event

with the spatial and temporal variability of precipitation forecasts, which influence the behavior of the smaller, upper parts of watersheds and have caused devastating floods in recent years. These new systems aim at providing forecasts for areas which are not yet covered by the river network under “hydrological vigilance surveillance,” thereby responding to the expectations of the citizens affected by these rapid floods, as well as of the public authorities. Following two tragic events in 2010, along the Atlantic coast and in the Var department on the Mediterranean, the ministry in charge of sustainable development enacted a national management plan for rapid submersions (MEDDE 2011) to decrease the effects of coastal and small-watershed floods. These two projects are among the measures of this large program covering four main areas: (a) control of urban planning and adaptation of constructions; (b) improvement of hazard understanding, warning systems, forecasting, vigilance, and alert; (c) security of waterworks and protecting infrastructures; and (d) improvement of resilience.

6 Summary

In France, a major reform of the national and regional flood services in 2002 modernized the former procedures, based on observation of river levels, into a national real-time operational system delivering forecasting and warning information on the main river courses (Tanguy et al. 2005). The network covers about 22,000 km of streams (out of the 120,000 km of streams wider than 1 m) and improves the anticipation before flood events for 90 % of the population at risk, on the basis of a variety of data and numerical hydro-meteorological models, including ensemble prediction systems. A complementary procedure is under study for flash floods on small watersheds in order to inform an even larger share of the population. Up to now, the probabilistic information, whether meteorological or hydrological, is mainly used by the operational expert teams, who evaluate the platform results and provide a “deterministic” assessment to the decision makers (Prefects, civil security). A first success has already been achieved by Météo France, the French National Meteorological Service, in conveying uncertain warnings to forecasters, technical public authorities, and civil security, by producing a daily map of possible extreme (orange and red) events for the next 2 and 3 days at the regional level. The probabilistic information, or the degree of uncertainty, is not yet communicated to the public, but a new version of the Vigicrues website will include uncertainty ranges on the forecasts. However, the aim of the vigilance procedure is to make the public aware that a potentially dangerous situation is expected and that adapted behaviors must be followed. The situation has also evolved inside the flood services: 5 years ago, only the national center, SCHAPI, was convinced of the value of ensemble warnings and ready to invest time to be trained in this new approach and to regularly assess the platform results (EFAS and SIM-PE).
Now, several local services, mainly those concerned by short-term rapid floods (6–12 h reaction time of the watersheds), are also interested in this type of information, as EFAS has produced particularly well-located and timely forecasts over the 2013 and 2014 flood events. Furthermore, the tests performed with the models currently under development (AIGA and CHROME) are also providing excellent forecasts, especially in the case of orographic effects and rapid floods. Through working groups and evaluation of the different components of the EFAS and SIM-PE systems, improvements and needs for adaptation to the national procedures and local hydrological behavior have been identified by the French forecasters. These results can not only be implemented into the European platform but are also valuable for developing additional dedicated numerical models and approaches. The future of ensemble forecasting is therefore broad for operational forecasters, not only in testing and developing new products but also in transferring the information to the end users, at the technical and decision-making levels, as well as to the public in the longer term.

7 Conclusion

With more than 6 million people living in flood-prone areas, two thirds of French cities are subject to one or more natural disasters (floods, earthquakes, clay deformation).

References

P. Arnaud, J. Lavabre, Coupled rainfall model and discharge model for flood frequency estimation. Water Resour. Res. 38(6) (2002). doi:10.1029/2001WR000474
H. Cloke, J. Thielen, F. Pappenberger, S. Nobert, G. Balint, C. Edlund, A. Koistinen, C. De Saint-Aubin, E. Sprokkereef, C. Viel, P. Salamon, R. Buizza, Progress in the implementation of hydrological ensemble prediction systems (HEPS) in Europe for operational flood forecasting. ECMWF Newslett. 121, 20–24 (2009)
F. Habets, A. Boone, J.L. Champeaux, P. Etchevers, L. Franchisteguy, E. Leblois, E. Ledoux, P. Le Moigne, E. Martin, S. Morel, J. Noilhan, P. Quintana Segui, F. Rousset-Regimbeau, P. Viennot, The SAFRAN-ISBA-MODCOU hydrometeorological model applied over France. J. Geophys. Res. 113, D06113 (2008). doi:10.1029/2007JD008548
MEDDE, Plan Submersion Rapide. Document of the Ministry for Ecology, Sustainable Development and Energy (2011)
J. Noilhan, J.F. Mahfouf, The ISBA land surface parameterisation scheme. Glob. Planet. Change 13(1–4), 145–159 (1996)
Y. Seity, P. Brousseau, S. Malardel, G. Hello, P. Bénard, F. Bouttier, C. Lac, V. Masson, The AROME–France convective-scale operational model. Mon. Weather Rev. 139, 976–991 (2011)
P. Tabary, The new French operational radar rainfall product. Part I: methodology. Weather Forecast. 22(3), 393–408 (2007)
J.-M. Tanguy, J.-M. Carrière, Y. Le Trionnaire, R. Schoen, Réorganisation de l’annonce des crues en France. Houille Blanche – Revue Internationale de l’Eau 2, 44–48 (2005). doi:10.1051/lhb:200502005
J. Thielen, J. Bartholmes, M.-H. Ramos, A. de Roo, The European flood alert system – Part 1: concept and development. Hydrol. Earth Syst. Sci. 13, 125–140 (2009). doi:10.5194/hess-13-125-2009
G. Thirel, F. Rousset-Regimbeau, E. Martin, F. Habets, On the impact of short-range meteorological forecasts for ensemble streamflow prediction. J. Hydrometeorol. 9, 1301–1317 (2008)
B. Vincendon, V. Ducrocq, O. Nuissier, B. Vié, Perturbation of convection-permitting NWP forecasts for flash-flood ensemble forecasting. Nat. Hazards Earth Syst. Sci. 11, 1529–1544 (2011)

Overview of Communication Strategies for Uncertainty in Hydrological Forecasting in Australia With a Focus on “Assessing Forecast Quality of the National Seasonal Streamflow Forecast Service”

Narendra Kumar Tuteja, Senlin Zhou, Julien Lerat, Q. J. Wang, Daehyok Shin, and David E. Robertson

Contents
1 Introduction
2 Methodology Used to Assess Forecast Quality of the National Seasonal Streamflow Forecast Service
2.1 Modeling Approach
2.2 Seasonal Streamflow Forecast Verification
2.3 Aggregated Forecast Performance
3 Results and Discussion
3.1 Forecast Reliability
3.2 Forecast Accuracy
3.3 Aggregated Forecast Performance
4 Conclusion
References

Abstract

The National Seasonal Streamflow Forecasting Service operated by the Bureau of Meteorology since 2010 delivers monthly updates of 3-month ensemble forecasts at 147 locations across 75 river basins using the statistical Bayesian joint probability (BJP) model. Seasonal forecasts are communicated to the public using statistical concepts such as “chances,” “ensembles,” “lower/higher than median,” etc. However, these concepts require advanced competencies in statistics, and they cannot be conveyed to a general audience easily. This chapter focuses on the challenge of communicating forecast skill to a wide range of users more effectively. A simple forecast performance measure called the “Aggregated Forecast Performance Index (AFPI)” was introduced which captures key attributes such as forecast reliability and accuracy and combines them into a single easy-to-understand and well-informed aggregated measure. Based on this index, it was demonstrated that the Bureau’s seasonal streamflow forecasts are reliable. They also offer improved accuracy by narrowing down the forecast uncertainty (up to 25 %) with respect to reference climatology and hence offer a value proposition for water managers to improve their decision-making.

N.K. Tuteja (*) • J. Lerat, Bureau of Meteorology, Canberra, ACT, Australia, e-mail: [email protected]
S. Zhou • D. Shin, Bureau of Meteorology, Docklands, VIC, Australia
Q.J. Wang • D.E. Robertson, CSIRO Land and Water, Clayton, VIC, Australia
© Crown Copyright, as represented by Bureau of Meteorology 2016
Q. Duan et al. (eds.), Handbook of Hydrometeorological Ensemble Forecasting, DOI 10.1007/978-3-642-40457-3_73-1

Keywords

Seasonal streamflow forecasting • Ensemble forecasting • Uncertainty estimation • Forecast verification • Aggregated forecast performance • Forecast accuracy • Forecast precision • Forecast reliability • Continuous Rank Probability Score (CRPS) • Root mean squared error (RMSE) • Root mean squared error in probability space (RMSEP) • Hit rates • Bayesian joint probability model (BJP)

1 Introduction

Hydrological conditions in Australia are among the most variable on earth (McMahon et al. 1987). Its streamflow regime can go through prolonged periods of drought, such as the “Millennium drought” that occurred between 1995 and 2007 across most parts of eastern Australia (Chiew et al. 2008). This variability has a profound impact on the management of water resources in Australia and more specifically on the management of risks related to water supply for urban, irrigation, and environmental needs. On 26 January 2007, following a prolonged period of severe drought and rapidly diminishing water supplies, the Australian Prime Minister announced the National Plan for Water Security, a 10-point plan significantly enhancing Commonwealth involvement in the nation’s water affairs (Vertessy 2013). One of the pillars of the reforms was a significant commitment to improving the quality and coverage of Australia’s water information, which includes a National Seasonal Streamflow Forecasting Service (SSF) provided by the Bureau of Meteorology since December 2010 (www.bom.gov.au/water/ssf). As of July 2016, the SSF service provides forecasts at 147 locations across 75 river basins in Australia. Work is currently underway to include more locations not covered to date. Seasonal forecasting is probabilistic by nature due to large uncertainties in the dynamics of the atmosphere and uncertainties in the response of catchments to climate forcing. As a result, seasonal forecasts are communicated to the public using statistical concepts such as “chances,” “ensembles,” “lower/higher than median,” etc. However, these concepts require advanced competencies in


statistics, and they cannot be conveyed to a general audience easily (Morss et al. 2008). More precisely, seasonal forecasting involves two main challenges. The first is to issue probabilistic forecasts using communication methods that aim to minimize the risks of misinterpretation. An overview of the products provided by the Bureau of Meteorology to address this challenge is discussed here, and more details can be found in a recent stakeholder survey indicating a high level of satisfaction with the SSF products (Wilson et al. 2014). Here we focus on the second challenge in the seasonal forecasting domain, i.e., communicating forecast skill. Forecasting models have varying levels of skill depending on the forecast location and period of the year. Measures of skill can have a strong influence on how forecasts impact decisions related to water management, and they must be communicated to the users of the forecasts. Various forecast verification methods are available for assessing the multiple facets of forecast performance, including notions such as accuracy and reliability (Murphy 1993). However, these methods remain fairly complex and target a scientific audience. This chapter describes a rigorous approach used by the Bureau of Meteorology to verify streamflow forecasts using a variety of complementary performance metrics, which are then integrated into a single index. This index can be easily communicated to and discussed with a wide audience as described below.

2 Methodology Used to Assess Forecast Quality of the National Seasonal Streamflow Forecast Service

2.1 Modeling Approach

Probabilistic 3-month-ahead streamflow forecasts for this service are derived from a statistical modeling approach called the Bayesian Joint Probability model (BJP; Wang et al. 2009; Wang and Robertson 2011; Robertson and Wang 2012) using a modeling system called the Water Forecasting System for Australian Rivers (WAFARi; Shin et al. 2011). Currently, the service updates streamflow forecasts every month for 147 key water supply catchments in eastern Australia, with the objective of helping water suppliers and users make better decisions. In the period 2014–15, the service was expanded across all jurisdictions in Australia (Fig. 1). The BJP model uses statistical relationships between climate indices, recent catchment conditions, and historical rainfall and streamflow at a site to forecast streamflow for the next 3 months. The BJP approach assumes that a transformed set of streamflows and their predictors follow a multivariate normal distribution. A typical model comprises a predictor representing antecedent catchment conditions, commonly recently observed streamflow, one climate index predictor, and a streamflow predictand (see Wang and Robertson (2011) for details). Such a model would require up to 13 parameters. Model parameters and their uncertainty are inferred using a Bayesian formulation, which is implemented through a Markov Chain Monte Carlo (MCMC) sampling method. Forecasts are made by conditioning

Fig. 1 (a) Spatial extent of the seasonal streamflow forecast locations across Australia (146 locations as of July 2016). (b) Forecast distribution derived from the Bayesian Joint Probability model, historical reference distribution (1950–2010), and tercile forecasts derived from the historical reference distribution for the Acheron River at Taggerty, Goulburn River Basin


the multivariate distribution on predictor values, represented by a 5000-member ensemble. The BJP model for forecasting 3-month streamflow volumes is calibrated and verified separately for each 3-month period (e.g., JFM, FMA, . . ., DJF). The cross-validation process, i.e., model calibration and forecast verification in retrospective mode at a given site and season, is illustrated in Fig. 2. The current cross-validation procedure uses the period 1980–2008 of available streamflow records, in which 5 years are left out of model calibration and used for forecast verification.
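The conditioning step can be illustrated with the standard formula for a conditional multivariate normal distribution. The sketch below is a simplification under strong assumptions: it takes a single fixed mean and covariance in the transformed space, whereas the BJP also applies data transformations and samples parameter uncertainty via MCMC; all numbers here are illustrative, not fitted values.

```python
import numpy as np

# Sketch of the conditioning step only: given a (transformed) joint normal
# model of [predictors, streamflow], draw an ensemble of streamflow values
# conditioned on observed predictor values. Transformations and MCMC
# parameter inference used by the BJP are omitted; numbers are illustrative.

def conditional_normal(mu, cov, x_obs, n_pred):
    """Condition a multivariate normal on its first n_pred components."""
    mu = np.asarray(mu, float)
    cov = np.asarray(cov, float)
    mu_x, mu_y = mu[:n_pred], mu[n_pred:]
    s_xx = cov[:n_pred, :n_pred]
    s_xy = cov[:n_pred, n_pred:]
    s_yx = cov[n_pred:, :n_pred]
    s_yy = cov[n_pred:, n_pred:]
    gain = s_yx @ np.linalg.inv(s_xx)
    mu_c = mu_y + gain @ (np.asarray(x_obs, float) - mu_x)
    cov_c = s_yy - gain @ s_xy
    return mu_c, cov_c

# Predictors: antecedent streamflow and one climate index; predictand: 3-month flow.
mu = [10.0, 0.0, 50.0]
cov = [[4.0, 0.5, 6.0],
       [0.5, 1.0, 2.0],
       [6.0, 2.0, 25.0]]
mu_c, cov_c = conditional_normal(mu, cov, x_obs=[12.0, 1.0], n_pred=2)
rng = np.random.default_rng(0)
ensemble = rng.multivariate_normal(mu_c, cov_c, size=5000)  # 5000-member ensemble
```

Wetter-than-average antecedent conditions (predictors above their means) shift the conditional forecast mean upward and shrink its variance relative to the unconditional climatology.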

2.2 Seasonal Streamflow Forecast Verification

The need to establish requirements for a comprehensive national system to verify hydrologic forecasts and guidance products which satisfies end user requirements is unequivocal. Key elements of the Bureau’s approach to national seasonal streamflow forecast verification are:
(a) Scientific rigor in the development of stringent forecast verification metrics, encompassing a multitude of existing and/or new and evolving techniques that can represent probabilistic forecast performance across all sites and seasons. Central to this requirement is the objective to publicly display a range of forecast performance metrics at a granular level in a transparent manner, so that end users have an informed view of the forecasting system performance prior to its use in decision making.

Fig. 2 Illustration of the BJP model calibration and validation procedure implemented in the WAFARi modeling system for each forecast month and location (146 locations as of July 2016). The calibration scheme is repeated for each year of the streamflow records


(b) Simplifying the communication of ensemble streamflow forecast performance for practitioners across the water industry, to support informed decision making against the background of imperfect forecasts within an ensemble-based risk assessment framework. While this latter issue is complex, it is nevertheless critical for the adoption of the seasonal streamflow forecasts to directly support operational planning and decision making in reservoir operations, water allocation planning and delivery for consumptive use, and environmental flow management.
Streamflow forecast performance can be classified into two broad categories: (a) reliability and (b) accuracy. Both of these categories are important for the end user. The Bureau of Meteorology, in collaboration with its research partners from CSIRO and the university sector, is developing methods to describe streamflow forecast performance designed to address end user needs across the water industry in Australia. This is largely a work in progress, and these methods will continue to evolve as end user needs are better understood over the coming years.

2.2.1 Forecast Reliability
Forecast reliability is related to the correspondence between the distribution of forecasts and the distribution of observations. It is critically important so that water managers can confidently eliminate the least plausible options in their water allocation and delivery planning, which invariably involves extensive scenario analysis. The streamflow forecast distribution is conditioned on the assumptions made during the inference stage. Unsupported assumptions may lead to inadequate and unreliable forecast distributions, which in turn impacts end user confidence in the forecasts. Forecast reliability is assessed using the Probability Integral Transform (PIT) plot (Dawid 1984; De Gooijer and Zerom 2000; Gneiting et al. 2007; Laio and Tamea 2007; Thyer et al. 2009; Wang et al. 2009). The diagram plots the cumulative distribution of the PIT, computed by transforming each observation with the corresponding forecast CDF, against a standard uniform variate. The proximity of the plotted curve to the diagonal line in the plot indicates the level of reliability of the forecasts (Fig. 3a).
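As a minimal illustration (not the operational WAFARi code), PIT values can be computed by evaluating each forecast's empirical CDF at the corresponding observation and then compared against a standard uniform distribution, here via a hand-rolled one-sample Kolmogorov-Smirnov distance; all data below are synthetic.

```python
import numpy as np

# Minimal sketch: PIT values are the forecast CDFs evaluated at the
# observations; for reliable forecasts they should be close to U(0, 1).
# Synthetic data only; this is not the operational implementation.

def pit_values(ensembles, observations):
    """ensembles: (n_forecasts, n_members); observations: (n_forecasts,)."""
    ens = np.asarray(ensembles, float)
    obs = np.asarray(observations, float)[:, None]
    # Fraction of ensemble members at or below each observation.
    return (ens <= obs).mean(axis=1)

def ks_statistic_uniform(pit):
    """One-sample K-S distance between the PIT sample and U(0, 1)."""
    p = np.sort(np.asarray(pit, float))
    n = p.size
    ecdf_hi = np.arange(1, n + 1) / n
    ecdf_lo = np.arange(0, n) / n
    return max(np.max(ecdf_hi - p), np.max(p - ecdf_lo))

rng = np.random.default_rng(1)
obs = rng.normal(size=200)
ens = rng.normal(size=(200, 1000))  # forecasts drawn from the true distribution
d = ks_statistic_uniform(pit_values(ens, obs))
# For a reliable forecast, d should typically fall below the asymptotic 5 %
# critical value, roughly 1.36 / sqrt(n) (about 0.096 here for n = 200).
```

The operational procedure summarizes the same comparison with the K-S test p-value, as described in Sect. 2.3.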

Fig. 3 (a) Forecast reliability represented through the Probability Integral Transform (PIT) plot or the predictive Q-Q plot. (b) Skill scores CRPS, RMSE, and RMSEP derived from forecast verification with respect to the historical reference period (1980–2008). (c) Tercile hit rate for the low flow, near median flow, and high flow categories over the cross-validation period. (d) Comparison of the interquantile range of the streamflow forecasts and observed streamflow over the cross-validation period 1950–2010


2.2.2 Forecast Accuracy
Forecast accuracy is usually defined in a relative way as “the relative accuracy of a set of forecasts, with respect to some set of standard control or reference forecasts” (Wilks 1995, pp. 236–237). In many cases, the distribution of data from historical records has been used as the reference forecast. The generic form of a skill score is represented through Eq. 1:

Skill Score = [(Score_fcast − Score_ref) / (Score_perf − Score_ref)] × 100 (%)    (1)

where Score_fcast quantifies the error measure in streamflow forecasts, Score_ref is the same error measure applied to the reference forecasts, and Score_perf is the measure applied to perfect forecasts (any error measure might be used, e.g., CRPS, RMSE, RMSEP). The lower bound of the skill score depends on the selected measure, but the upper bound of the skill score is 100 (%) for any error measure. It is noteworthy that the actual value of a skill score can be different even for the same data, depending on the reference forecasts selected for the calculation. Note that observed streamflow for the period 1970–2010 (or the starting year if observations commenced after 1970) is used as the reference period in this study and referred to as “climatology.” The accuracy of the streamflow forecasts is derived using the following metrics: three skill scores based on the error measures CRPS, RMSE, and RMSEP; tercile hit rate; and forecast precision (Fig. 3b, c; Table 1; see www.bom.gov.au/water/ssf; Laio and Tamea 2007; Wang and Robertson 2011; Tuteja et al. 2011; Alfieri et al. 2014):

Table 1 Forecast rating thresholds for different metrics used to quantify accuracy and precision rating of the seasonal streamflow forecasts

Forecast rating                         0         1         2         3         4         5
Reliability rating^a (p-value)          ≤ 1.0 %   < 2.5 %   < 5.0 %   < 7.5 %   < 10.0 %  ≥ 10.0 %
Accuracy rating (CRPS)                  ≤ 0       < 10      < 20      < 30      < 40      ≥ 40
Accuracy rating (RMSE)                  ≤ 0       < 10      < 20      < 30      < 40      ≥ 40
Accuracy rating (RMSEP)                 ≤ 0       < 10      < 20      < 30      < 40      ≥ 40
Accuracy rating – Low flow tercile      ≤ 0.33    < 0.40    < 0.50    < 0.70    < 0.90    ≥ 0.90
Accuracy rating – High flow tercile     ≤ 0.33    < 0.40    < 0.50    < 0.70    < 0.90    ≥ 0.90
Precision rating – IQR (10 %, 90 %)     ≥ 1.0     > 0.95    > 0.85    > 0.75    > 0.60    ≤ 0.60

^a Reliability rating is derived from the “p-value” obtained from the Kolmogorov–Smirnov test at significance thresholds varying from 1 % to 10 %
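Equation 1 can be applied directly once the error scores are computed. Below is a minimal sketch, assuming a negatively oriented score (lower is better, perfect forecast scoring 0) such as CRPS, RMSE, or RMSEP; the function name is illustrative.

```python
# Sketch of Eq. 1, assuming a negatively oriented error score (lower is
# better) for which a perfect forecast scores 0, e.g., CRPS, RMSE, RMSEP.

def skill_score(score_fcast: float, score_ref: float, score_perf: float = 0.0) -> float:
    """Percentage improvement of the forecast over the reference (Eq. 1)."""
    return (score_fcast - score_ref) / (score_perf - score_ref) * 100.0

# A forecast CRPS of 6 against a climatology CRPS of 10 gives 40 % skill:
print(skill_score(6.0, 10.0))  # 40.0
```

A forecast no better than climatology scores 0 %, and a perfect forecast scores 100 %, matching the upper bound noted above.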


• CRPS – Continuous Rank Probability Score, described by the area between the forecast distribution and a step function at the observation.
• RMSE – Root Mean Squared Error, representing forecast errors directly in the measurement space.
• RMSEP – Root Mean Squared Error in Probability space, the expected value of the distance between the median of the streamflow forecasts and the corresponding observations in probability space.
• Tercile hit rate – the ratio of the number of cases in which the observation falls within the forecast tercile with the largest probability to the total number of observations (Figs. 1b and 3c).
• Forecast precision – the ratio of the Inter Quantile Range (IQR) (10 %, 90 %) of the forecast distribution to the IQR (10 %, 90 %) of the observations across the cross-validation period (IQR-80 %; Fig. 3d).

Given that RMSE is based upon errors in the measurement space, it is sensitive to a small number of large errors relative to many small ones, and therefore it can potentially lead to conservative forecasts that are influenced by a few high flow events. CRPS measures the departure of the forecast distribution from observations, and like RMSE it is also sensitive to large errors. Note that for a deterministic forecast, the CRPS reduces to the MAE (Mean Absolute Error). CRPS is sensitive to ensemble size (Ferro et al. 2008), and therefore a large 5000-member ensemble is used. In the case of RMSEP, forecast errors are represented in the probability domain, and therefore the influence of large errors in the original flow domain has a smaller impact on RMSEP compared to RMSE and CRPS. More frequent events have more influence on the errors in estimating RMSEP, which reduces the impact of outliers in the measurement space. However, for periods with small variability in streamflow volumes, such as the low flow season, a slight error in the measurement space can be amplified to an extremely large error in the probability space.
In such a case, RMSEP can respond sensitively to even marginal errors in streamflow volumes. Forecast precision for a given forecast location and month across all years is determined through the interquantile range. It is an important metric for determining optimal storage operations, e.g., elimination of the least plausible options in scenario analysis and designing environmental watering events for significant wetlands.
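Several of the metrics above can be estimated directly from the ensemble. As one example, the CRPS for a single ensemble forecast can be estimated with the kernel (sample) form CRPS = E|X − y| − 0.5 E|X − X′|; the sketch below is an illustration, not the operational implementation.

```python
import numpy as np

# Sketch: sample-based CRPS estimate for one ensemble forecast, using the
# kernel form CRPS = E|X - y| - 0.5 * E|X - X'|. Illustration only, not the
# operational implementation.

def crps_ensemble(members, obs):
    x = np.asarray(members, float)
    term1 = np.mean(np.abs(x - obs))
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return term1 - term2

# For a single-member ("deterministic") ensemble the CRPS reduces to the
# absolute error, as noted in the text:
print(crps_ensemble([42.0], 40.0))  # 2.0
```

Averaging this quantity over all cross-validated forecasts gives the Score_fcast used in Eq. 1; the same computation on climatology samples gives Score_ref.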

2.3 Aggregated Forecast Performance

In statistics, the Kolmogorov–Smirnov test (K–S test) is a nonparametric test of the equality of continuous, one-dimensional probability distributions; it can be used to compare a sample with a reference probability distribution (one-sample K–S test) or to compare two samples (two-sample K–S test).

N.K. Tuteja et al.

The p-value from the one-sample K–S test is used to categorize the reliability rating (Table 1), with significance thresholds varying from 1 % to 10 %. A p-value greater than 10 % implies a high level of confidence in the reliability of the forecasts, while a p-value of less than 1 % implies a weak chance that the forecast distributions are reliable. Since each metric emphasizes a different aspect of forecast accuracy, it is desirable to inspect all the performance metrics when comparing modeling results. The forecast accuracy rating for each metric – CRPS, RMSE, RMSEP, tercile hit rate, and forecast precision – is categorized across the range 0–5 (Table 1). Water managers in Australia are most interested in low flow and high flow conditions, and near-median flow conditions are of least interest. Therefore, the tercile hit rates of low flows and high flows are considered, and near-median flow hit rates are ignored in deriving the overall composite indicator of forecast performance. Given the high demand for water for consumptive and environmental use, a very conservative estimate (often the minimum observed flow) is used by water managers at the beginning of the annual water planning cycle, and allocations are adaptively increased over time during the filling season. For this reason, the forecast precision rating based upon the IQR (10 %, 90 %) is categorized in the range 0–5 depending upon the proportional narrowing of the uncertainty relative to climatology (Table 1). The interrelationship between precision and reliability is important: a very precise forecast that turns out to be incorrect most of the time is undesirable, while a very reliable forecast with large uncertainty is also unhelpful to water managers, because only a very small number of possibilities can then be eliminated in their scenario planning.
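The mapping from K–S p-value to reliability rating might be sketched as below. Only the outer thresholds (10 % and 1 %) are stated in the text; the intermediate breakpoints stand in for Table 1, which is not reproduced here, and are purely illustrative.

```python
def reliability_rating(p_value):
    """Map a K-S p-value to a 0-5 reliability rating. Ratings 5 (p >= 10 %)
    and 0 (p < 1 %) follow the text; the breakpoints in between are
    illustrative placeholders for Table 1."""
    breakpoints = [0.10, 0.075, 0.05, 0.025, 0.01]  # assumed intermediate values
    rating = 5
    for b in breakpoints:
        if p_value >= b:
            return rating
        rating -= 1
    return rating  # p below 1 %: weak chance the distributions are reliable
```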
Finally, the aggregated streamflow forecast performance rating is determined using a hierarchical approach, wherein the forecast reliability of the baseline forecasts and the accuracy representing incremental improvements over the baseline forecasts are combined to derive the Aggregated Forecast Performance Index (AFPI) (Fig. 4).

3 Results and Discussion

3.1 Forecast Reliability

The PIT uniform probability plots are used here to assess the overall reliability of the forecasts at individual locations. If the forecasts are reliable, the PITs will be uniformly distributed, and the p-value derived from the Kolmogorov-Smirnov (K-S) uniform test of the PITs can be used to indicate the reliability of the forecast distributions (Fig. 5): the higher the p-value, the better the reliability. The plots are suitable for small sample sizes, and they are also useful for diagnosing problems such as bias and/or an unrealistic ensemble spread. However, the p-value alone cannot represent all aspects of the forecast errors. Following an investigation of the p-values across all sites and forecast periods, along with a visual inspection of the PIT plots, it is concluded that seasonal streamflow forecasts derived from the BJP model are reliable (see also Sect. 3.3).

Overview of Communication Strategies for Uncertainty in Hydrological. . .

Fig. 4 Hierarchical approach to determine a single Aggregated Forecast Performance Index (AFPI) rating, which combines the reliability and accuracy of the streamflow forecasts. The rating is bounded between (0, 10) and is classified into 5 categories – low (0–2), moderate (2–4), medium (4–6), high (6–8), and very high (8–10)
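Computing the PITs and the K-S uniformity statistic can be sketched as follows. This is a minimal numpy version; an operational implementation would typically obtain the p-value from `scipy.stats.kstest(pits, "uniform")`.

```python
import numpy as np

def pit_values(ensembles, observations):
    """PIT of each observation under its forecast: the fraction of ensemble
    members not exceeding the observed value (the empirical forecast CDF
    evaluated at the observation)."""
    return np.array([np.mean(np.asarray(ens, dtype=float) <= obs)
                     for ens, obs in zip(ensembles, observations)])

def ks_uniform_statistic(pits):
    """One-sample K-S statistic of the PIT sample against Uniform(0, 1);
    small values indicate PITs consistent with uniformity, i.e. reliable
    forecast distributions."""
    u = np.sort(np.asarray(pits, dtype=float))
    k = np.arange(1, u.size + 1)
    return float(max(np.max(k / u.size - u), np.max(u - (k - 1) / u.size)))
```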

3.2 Forecast Accuracy

3.2.1 Skill Scores Based on CRPS, RMSE, and RMSEP
Streamflow forecast accuracy is assessed using the CRPS, RMSE, and RMSEP skill scores; the strengths and limitations of each metric were discussed in Sect. 2.2.2. Figure 6 shows the overall skill of the seasonal streamflow forecasts across the cross-validation period for the 147 service locations. On average, the forecasts are more skilful for the second half of the year, which coincides with the filling of the storages, the active growing season, and high irrigation demand. Averaged across all sites, the RMSEP skill score is the highest and the RMSE skill score the lowest of the three metrics. Since each skill score emphasizes a different aspect of forecast accuracy, it is desirable to inspect all the skill scores when assessing modeling results. It is evident from Fig. 6 that the seasonal streamflow forecasts are accurate and that they add improvements over climatology. Note that the baseline forecasts from climatology itself provide a reasonable basis, given that observations at all the forecast locations extend beyond 30 years.

Fig. 5 PIT uniform probability plots with different p-values for six forecast locations: Turon River at Sofala (421026, Dec-Feb 1950-2010), Tarcutta Creek at Old Borambola (410047, Mar-May 1950-2010), Mitta Mitta River at Hinnomunjie (401203, Mar-May 1955-2010), Adelong Creek at Batlow Rd (410061, Mar-May 1950-2010), Bloomfield River at China Camp (108003A, Oct-Dec 1970-2010), and Murray River at Biggara (401012, Apr-Jun 1969-2010). Each panel plots the PIT values against a uniform variate with the 95 % K-S band; the p-values range from 0.012 to 0.999

Fig. 6 CRPS, RMSE, and RMSEP skill scores of seasonal streamflow forecasts for the 147 service locations
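The chapter does not spell out the skill-score formula. The conventional definition for a negatively oriented error score (CRPS, RMSE, or RMSEP) measured against a climatology reference, which is presumably what Fig. 6 plots, can be sketched as:

```python
def skill_score(score_forecast, score_reference):
    """Skill score for a negatively oriented error score whose perfect value
    is 0: 1 means a perfect forecast, 0 means no improvement over the
    climatology reference, and negative values mean worse than climatology."""
    return 1.0 - score_forecast / score_reference
```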

3.2.2 Tercile Hit Rates for Low and High Flows
Forecast accuracy in terms of the tercile hit rates for low and high flows is high for all 147 forecast locations across the cross-validation period (Fig. 7). On average, the forecasts for both high and low flows are better for the period extending from July to October. This is an important outcome for water managers, as tercile forecasts are commonly used for water resource planning in Australia.

3.2.3 Forecast Precision
Forecast precision is assessed on the basis of the mean ratio of the interquantile ranges derived across the cross-validation period. Figure 8 shows the mean ratios of the forecast interquantile range (10 %, 90 %), denoted IQR-80 %, divided by that of climatology for the 147 service locations. On average, the forecasts have higher precision in the second half of the year. Across all sites and forecast periods, streamflow forecast uncertainty is narrower than climatology, with a reduction in uncertainty of around 15 % on average. Given that the streamflow forecasts are both reliable and more precise than climatology, they offer insights for water managers to discount the least plausible options in their scenario planning.
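The tercile hit rate and the IQR-80 % precision ratio described in Sect. 2.2.2 can be sketched as follows; the handling of observations lying exactly on a tercile boundary is an assumption.

```python
import numpy as np

def tercile_hit_rate(tercile_probs, observations, tercile_bounds):
    """Fraction of forecasts whose highest-probability tercile contains the
    observation. `tercile_probs` is an (n, 3) array of [low, middle, high]
    probabilities; `tercile_bounds` are the climatological boundaries (t1, t2)."""
    probs = np.asarray(tercile_probs, dtype=float)
    obs = np.asarray(observations, dtype=float)
    t1, t2 = tercile_bounds
    observed = np.where(obs <= t1, 0, np.where(obs <= t2, 1, 2))  # observed tercile
    favoured = np.argmax(probs, axis=1)  # tercile with the largest probability
    return float(np.mean(favoured == observed))

def precision_ratio(fc_q10, fc_q90, clim_q10, clim_q90):
    """IQR-80 % precision: forecast (10 %, 90 %) interquantile range divided by
    the climatological one; values below 1 mean narrower uncertainty."""
    return (fc_q90 - fc_q10) / (clim_q90 - clim_q10)
```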


Fig. 7 Tercile hit rates for low and high flows of the seasonal streamflow forecasts generated for the 147 service locations

Fig. 8 Mean ratios of forecast interquantile range (10 %, 90 %) divided by climatology for the 147 service locations


Fig. 9 Reliability rating based on the p-value of the K-S uniform test of forecast distributions

3.3 Aggregated Forecast Performance

The p-value derived from the Kolmogorov-Smirnov (K-S) uniform test of the PITs is categorized across the range 0–5 as a measure of the reliability of the forecast distributions (Fig. 5; Table 1). Figure 9 shows the reliability rating derived from the K-S p-value for the JFM, AMJ, JAS, and OND forecast periods and indicates that the seasonal streamflow forecasts are very reliable for most service locations and forecast periods. The forecast accuracy rating for each of the error measures – the CRPS, RMSE, and RMSEP skill scores and the tercile hit rates – is categorized across the range 0–5 (Table 1). The forecast precision rating is obtained by categorizing the mean ratio of IQR-80 % in the range 0–5 (Table 1). Figure 10 shows the average of the precision rating and the five accuracy ratings for the JFM, AMJ, JAS, and OND


Fig. 10 Average of the six accuracy and precision ratings – the CRPS, RMSE, and RMSEP skill scores, the tercile hit rates of low and high flows, and the interquantile range (10 %, 90 %)

forecast periods; these averages represent the incremental improvements over the baseline forecasts. The final streamflow forecast performance index aggregates the forecast reliability rating with the average of the precision rating and the five accuracy ratings. Figure 11 shows the aggregated forecast performance index (AFPI) for the JFM, AMJ, JAS, and OND forecast periods and demonstrates that the AFPI can adequately summarize the performance metrics of the seasonal streamflow forecasts. It can therefore be regarded as a simplified approach to communicating probabilistic seasonal streamflow forecast performance across key water supply catchments in Australia.
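Assuming the AFPI simply adds the reliability rating (0–5) to the average of the precision and accuracy ratings (each 0–5), which is one plausible reading of Fig. 4 but not stated explicitly, the aggregation can be sketched as:

```python
def afpi(reliability_rating, accuracy_ratings):
    """AFPI sketch: reliability rating (0-5) plus the average of the precision
    and accuracy ratings (each 0-5), giving a score bounded between (0, 10)
    that is classified into the five categories of Fig. 4."""
    score = reliability_rating + sum(accuracy_ratings) / len(accuracy_ratings)
    for upper, label in [(2, "low"), (4, "moderate"), (6, "medium"),
                         (8, "high"), (10, "very high")]:
        if score <= upper:
            return score, label
```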


Fig. 11 Aggregated forecast performance index (AFPI) of seasonal streamflow forecasts. JFM and JAS are the high flow seasons in northern and southern Australia, respectively. AMJ and OND are transition periods between high flow and low flow regimes

4 Conclusion

The Seasonal Streamflow Forecasting Service delivers monthly updates of 3-month ensemble forecasts at 147 locations across 75 river basins using the statistical Bayesian Joint Probability (BJP) model, which is operational within the end-to-end seasonal forecasting system WAFARi. The current service coverage extends largely across eastern Australia (as of September 2014), and work is underway to extend the service to other jurisdictions. The BJP model is calibrated and verified separately for each site and forecast period using a rigorous "leave-one-year-out" cross-validation procedure for


1980–2008. The service delivers extensive forecast verification products at a granular level for each forecast location, including metrics that describe reliability (PIT plots), accuracy (CRPS, RMSE, RMSEP), tercile hit rates for the low and high flow ranges, and precision represented by the interquantile range (10–90 %). Water managers in Australia now increasingly rely on these forecast products in their decision making, e.g., reservoir management, water allocation planning and delivery, and environmental flow management to support water markets and trading.

Increased adoption of the water availability forecasts in Australia depends directly on improved user understanding of forecast quality. A wide range of forecast verification products is available through the seasonal streamflow forecasting service of the Bureau of Meteorology. However, these products are innately complex and not easily understood, as they require advanced competencies in statistics. There is therefore a need to summarize the various forecast verification products with a simple, easily understandable composite measure of forecast performance. Such a measure ought to make use of the multitude of performance measures derived from the rigorous cross-validation methods discussed here, each of which has its own strengths and limitations.

A simple forecast performance measure, the Aggregated Forecast Performance Index (AFPI), was presented which captures key attributes such as forecast reliability and accuracy. It was demonstrated that the Bureau's seasonal streamflow forecasts are very reliable. They also offer improved accuracy by narrowing the forecast uncertainty (by up to 25 %) with respect to the reference climatology and hence offer a value proposition for water managers to improve their decision making. Finally, on the basis of this single, easy-to-understand aggregated measure, the skill of the Bureau's seasonal streamflow forecasts can be communicated to its users effectively.

References

L. Alfieri, F. Pappenberger, F. Wetterhall, T. Haiden, D. Richardson, P. Salamon, Evaluation of ensemble streamflow predictions in Europe. J. Hydrol. 517, 913–922 (2014). doi:10.1016/j.jhydrol.2014.06.035
F. Chiew, J. Vaze, K.J. Hennessy, Climate Data for Hydrologic Scenario Modelling Across the Murray-Darling Basin: A Report to the Australian Government from the CSIRO Murray-Darling Basin Sustainable Yields Project (CSIRO, Canberra, 2008)
A.P. Dawid, Present position and potential developments: some personal views: statistical theory: the prequential approach. J. R. Stat. Soc. Ser. A 147, 278–292 (1984)
J.G. De Gooijer, D. Zerom, Kernel-based multistep-ahead predictions of the US short-term interest rate. J. Forecast. 19, 335–353 (2000)
C.A.T. Ferro, D.S. Richardson, A.P. Weigel, On the effect of ensemble size on the discrete and continuous ranked probability scores. Meteorol. Appl. 15, 19–24 (2008). doi:10.1002/met.45
T. Gneiting, F. Balabdaoui, A.E. Raftery, Probabilistic forecasts, calibration and sharpness. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 69, 243–268 (2007)
F. Laio, S. Tamea, Verification tools for probabilistic forecasts of continuous hydrological variables. Hydrol. Earth Syst. Sci. 11, 1267–1277 (2007). www.hydrol-earth-syst-sci.net/11/1267/2007/
T.A. McMahon, B.L. Finlayson, A. Haines, R. Srikanthan, Runoff variability: a global perspective. IASH-AISH 168, 3–11 (1987)
R.E. Morss, J.L. Demuth, J.K. Lazo, Communicating uncertainty in weather forecasts: a survey of the US public. Weather Forecast. 23(5), 974–991 (2008)
A.H. Murphy, What is a good forecast? An essay on the nature of goodness in weather forecasting. Weather Forecast. 8(2), 281–293 (1993)
D.E. Robertson, Q.J. Wang, A Bayesian approach to predictor selection for seasonal streamflow forecasting. J. Hydrometeorol. 13, 155–171 (2012). doi:10.1175/JHM-D-10-05009.1
D. Shin, A. Schepen, T. Peatey, S. Zhou, A. MacDonald, T. Chia, J. Perkins, N. Plummer, WAFARi: a new modelling system for the seasonal streamflow forecasting service of the Bureau of Meteorology, Australia. MODSIM2011, Perth (2011)
M. Thyer, B. Renard, D. Kavetski, G. Kuczera, S.W. Franks, S. Srikanthan, Critical evaluation of parameter consistency and predictive uncertainty in hydrological modeling: a case study using Bayesian total error analysis. Water Resour. Res. 45, 22 (2009)
N.K. Tuteja, D. Shin, R. Laugesen, U. Khan, Q. Shao, E. Wang, M. Li, H. Zheng, G. Kuczera, D. Kavetski, G. Evin, M. Thyer, A. MacDonald, T. Chia, B. Le, Experimental evaluation of the dynamic seasonal streamflow forecasting approach. Technical Report, Bureau of Meteorology, Melbourne (2011). http://www.bom.gov.au/water/about/publications/document/dynamic_seasonal_streamflow_forecasting.pdf
R.A. Vertessy, Water information services for Australians. Aust. J. Water Resour. 16(2), 91–106 (2013). doi:10.7158/W13-MO01.2013.16.2
Q.J. Wang, D.E. Robertson, Multisite probabilistic forecasting of seasonal flows for streams with zero value occurrences. Water Resour. Res. 47, W02546 (2011)
Q.J. Wang, D.E. Robertson, F.H.S. Chiew, A Bayesian joint probability modeling approach for seasonal forecasting of streamflows at multiple sites. Water Resour. Res. 45, W05407 (2009)
D.S. Wilks, Statistical Methods in the Atmospheric Sciences – An Introduction (Academic, San Diego, 1995)
T. Wilson, P. Feikema, J. Ridout, 2013 user feedback on the seasonal streamflow forecasts service, Bureau of Meteorology (2014)