Forecasting Electricity Demand: A Time Series Approach (ISBN 9783844331325)


CONTENTS

Preface
Chapter 1 Prologue
Chapter 2 ARIMA Modelling
Chapter 3 Some Methodological Issues
Chapter 4 A Time Series Analysis
Chapter 5 Sectoral Analysis
References

List of Figures and Tables

Figure 1: A stationary process
Figure 2: A non-stationary process
Figure 3: ACF of a white noise (stationary) series
Figure 4: ACF of a non-stationary series
Figure 5: PACF of a stationary series
Figure 6: PACF of a non-stationary series
Figure 7: ACF of an AR(1) (stationary) series
Figure 8: PACF of an AR(1) (stationary) series
Figure 9: A Moving Average (MA) series of order 1
Figure 10: ACF of a MA(1) series
Figure 11: PACF of a MA(1) series
Flow Chart 1: Box–Jenkins methodology
Table 1: Identification of ARIMA processes
Table 2: Identification of ARIMA processes

PREFACE

Forecasting of electricity consumption has become an essential element of the planning exercise in the power sector, especially against the backdrop of the continuing energy crisis. The present work, a revised version of one of my earlier papers at the Centre for Development Studies, Kerala, India, critically evaluates the electricity demand forecasting methodology available in general and proposes a methodology in the classical bi-variate time series framework.

An unfortunate consequence of the widespread fascination with econometrics in the context of Indian power sector analysis has been the demise of a consistent, logical approach to power demand analysis within the particular confines of its objective environment. In the industrialized, advanced countries, where the electricity service contributes significantly to everyday social and work life, correlating aggregate electricity consumption with gross domestic product has become a standard tool of simple analysis for some obviously general conclusions, and there it is beyond any methodological error as such. However, the story is different, and turns vicious, with attempts at mapping this methodology onto an alien terrain: an underdeveloped power system where the contribution of the electricity service is insignificant, as it is even in the industrial sector in India. Adopting this methodology here then amounts to correlating the national/state domestic product, or the industrial product, exclusively with an insignificant input, in violation of the ethics of a consistent and logical analytical exercise, and results in gross specification error.

Despite this glaring fact, a number of econometric studies have come up analyzing the power sector in India. However, the simplicity of their approach, whether in terms of aggregation or of mechanical adoption of econometrics, has in fact detracted from their analytical rigour and usefulness.

Till the turn of the 1980s, Kerala had been pictured as a power surplus state, ostensibly exporting power to neighbouring states (Period I). Since the drought year of 1982-83, unprecedented power shortage has become a part of life in the state (Period II). Any demand analysis in the context of the Kerala power system must distinguish between Period I of no power shortage (till 1981-82) and Period II of power cuts (since 1982-83), and hence adjust the supply-constrained demand data series of Period II upward in order to ensure its continuity and conformity. This we accomplish here by assuming that the unconstrained demand of Period II is basically a continuum of the demand function of Period I, but for changes in the demand intensity rate. That is, we assume that the Period I demand relationship continues unhampered into Period II also, but with a time-varying slope coefficient, obtained in effect by allowing the slope of the Period I demand function to grow over time at the rate at which the 'actual' rate of change of demand in Period II grows in relation to the change in the number of consumers.

Now that we have the complete consumption database, consisting of the original consumption data till 1981-82 and the estimated, unconstrained data since 1982-83, we can feed it into an appropriate model for forecast purposes. We follow the logic of the multi-variate autoregressive moving average (MARMA) model, also called the transfer function model, which correlates a dependent variable with its own lagged values, with current and lagged values of one or more explanatory variables, and with an error term whose behaviour is partially 'explained' by a time series model.

It goes without saying that any work, howsoever privately conceived, is a concrete outcome of the collective consciousness; this work thus has a social foundation too. I am grateful that the academic atmosphere at the Centre for Development Studies has significantly contributed to that social basis of this study.

A good deal of my grateful appreciation and love is due to my near and dear ones; and this I dedicate to the burning memory of my parents and my sisters Akkan and Ammini with an unsatiated sense of unfulfilled obligations in the cruel face of untimely exits. And finally my family now – as Rju as usual smiles away my excuses for my absences from her little kingdom and Kala resigns herself to my non-corrigible only-the-weekend return to family care, I just seek to balance all, bring all to mind … In balance with this life, this academic pursuit!


Tasmim stwayi kim veeryamitya peedhaam sarvam Dhaheyam yadhidham prdhivyaamiti.

(What power is in you? I could burn all whatever there is on earth.) – Kena Upanishadh 3.5


An unseen presence,
So munificently beneficent;
But a dangerous touch;
A torturing scorcher;
Can you forecast it?
But forecasting has a limit….
Yes, the sky is the limit!
And they argue:
There is a method even in madness….
And my Socrates adds: Vice versa!
We put equations in time;
But who puts time in equations?
Are you a whole of parts;
Or a part of a whole?
Forecasting a whole upon the parts,
Or, the parts upon a whole?
Or, both? A dialectics!

– Vijayamohanan Pillai N.


Chapter 1: Prologue

“That silent force
Buzzing through the overhead wires
Encircling the house
Penetrating the walls
Stopping
Waiting
Needs connecting
And silently starts working
Do you see it?
Do you think about it?
Do you know that it affects your day?
But blindly you press on – unaware
Until it’s gone – taken from you
And you cry out
You cry out
How can I live without you!”
– Jono Miller (‘Electricity’)

“Soviets + Electricity = Socialism” was one of Lenin’s development slogans. It reflects the significance of electricity as a vital input to the wellbeing of any society, which is why the demand for it from an ever-expanding set of diverse needs is growing at an increasing rate. This in turn places increasing demands on scarce resources of capital investment, material means, and manpower. Prognosis and forecasting of electricity consumption have thus become an essential element of the planning exercise in the power sector. More specifically, the continuing ‘energy crisis’ has made accurate projection of electricity demand crucial; hence the importance of forecasting methods.

In what follows we critically evaluate the electricity demand forecasting methodology and propose a methodology in the classical time series framework. The book is divided into five chapters including this introduction. The next section in this chapter presents a brief theoretical discussion on forecasting and demand analysis, and the next chapter a brief introduction to classical time series analysis. The third chapter discusses some methodological issues involved and introduces a new forecasting technique in the context of the Kerala power system beset with supply shortages. The last two chapters give the empirical results of our attempt to forecast electricity demand in Kerala in a simple, objective and theoretically sufficient manner, one for the aggregate system and the other for different sectors.

Electricity Demand: Forecasting Methods

The forecasting methods used for electricity demand in general may be divided into formalized and non-formalized methods.1

The first set includes extrapolation and correlation (regression) methods. For more distant periods, in which considerable changes in the structure of the power sector must be considered, attempts have been made to use non-formalized methods such as variants of the Delphi method.

In the case of the formalized forecasting methods, two approaches may be distinguished in their scope:

i) an approach in which we try to penetrate the internal structure, and the internal and external linkages, of the observed object and to explain its response to input impulses; and

ii) a statistical approach in which the object is treated as a ‘black box’ whose internal workings are unknown.

The first approach is partly represented by the method of structural analysis. It proceeds from a relatively detailed description of the most important parts of the power system, from energy sources through processes of refining and of transforming individual forms of energy, to the final consumption of energy. This method, however, is used to a limited extent only.

1 This section largely draws on Pillai (2010).

More common are the statistical approaches that take the object as a ‘black box’ and try to explain its mechanism on the basis of the interconnections of the individual elements of the observed path of the system. Here the analysis of time series and one-dimensional or multi-dimensional correlation (regression) are used.

Time series analysis is used to predict the future path of the system by extrapolation on the basis of past observations. However, though individual mathematical functions may characterize the past development of the system well, extrapolation of future trends is very risky, and the mathematical function to be used must be chosen very carefully. For more details see my Econometrics of Electricity Demand: Questioning the Tradition (Pillai 2010).

The first-order autoregressive trend model is generally useful only for short-term forecasts, since the long-term forecast obtained from this model is nothing but the stationary mean of the series. A variant of this model is the logarithmic (log-linear) autoregressive trend model. In these models one has the option of setting the intercept term (a) = 0; then the slope parameter, 'b', in the simple autoregressive model represents the rate of change of the series yt and, in the logarithmic model, the compounded rate of growth of the series. Both linear and compounded extrapolations based on these two autoregressive models are commonly used as a simple means of forecasting, as in the sketch below. Note that these models involve regression with a lagged dependent variable. If the additive error process is serially correlated, the coefficient estimates will be inconsistent.
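To make the two extrapolation schemes concrete, here is a minimal Python sketch (not from the book; the annual series y is invented) fitting the autoregressive trend model and its log-linear variant by OLS and extrapolating one step ahead:

import numpy as np

# Hypothetical annual consumption series (million units); invented data.
y = np.array([4200.0, 4430, 4610, 4890, 5120, 5400, 5710, 5980])

def fit_ar1_trend(series):
    # OLS fit of series_t = a + b * series_{t-1}; returns (a, b).
    dep, lag = series[1:], series[:-1]
    X = np.column_stack([np.ones_like(lag), lag])
    a, b = np.linalg.lstsq(X, dep, rcond=None)[0]
    return a, b

a, b = fit_ar1_trend(y)                    # linear autoregressive trend
print("linear one-step forecast:", a + b * y[-1])

la, lb = fit_ar1_trend(np.log(y))          # log-linear variant
print("log-linear one-step forecast:", np.exp(la + lb * np.log(y[-1])))
# With the intercept suppressed (a = 0), b in the linear model would be the
# gross rate of change y_t / y_{t-1}, and lb the compound growth factor.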

It should be remembered that the simple extrapolation methods are at best useful only as a quick way of making casual forecasts, as they usually provide little forecasting accuracy. It is advisable to consider some forecast statistics such as the standard error of forecast, root-mean-square simulation error, etc. The time series analysis in the autoregressive model may be extended to the more sophisticated ARIMA (autoregressive integrated moving average) model for forecasting.

In any given case, such regression functions need not be equally convenient. For instance, in the case of electricity consumption, numerous analyses have revealed that three very different time intervals can be defined in the long-run development of individual countries. The first corresponds to low values of energy consumption per capita and is marked by considerable variation in annual increments. After a certain value of per capita consumption has been reached, the development becomes steadier and its trend begins to conform to an exponential pattern; annual increments stabilize and assume a normal distribution. Having achieved a certain development level, the development gradually slows down. That is why extrapolation then requires functions with decreasing annual increments.

The time trend analysis explains only the most important basic components of the development. The explanation of other regular residuals about the line of the trend requires the application of correlation (regression) methods, mainly those of a multidimensional nature. To a certain extent, they also help consumption forecasts for more distant periods of time.

Using correlation (regression) methods, we try to consider the influence of internal and external factors affecting energy consumption, such as:

a) demographic factors: population development, manpower development, etc.;

b) level of economic activity: total national product (income), material consumption, size of public sector, price level, etc.;

c) climatic conditions: number of frosty days, degree days, etc.


The relation between electricity consumption and social-economic factors may be described in terms of either simple linear models or log-linear models. In some cases an incremental method, with all the variables taken in increments, is also used.

It should be pointed out that the use of such models is connected with the prognosis of the independent variables. This in turn may involve macro-econometric modeling.

Econometric Studies

A classical survey of the studies on the demand for electricity (in the U.S.) was given by Taylor (1975) in the Bell Journal of Economics; it was later updated and extended to natural gas, heating fuels, diesel and aviation fuels, coal, and gasoline (Taylor, 1977). In summarizing the empirical results on the demand for electricity in his 1975 survey paper, Taylor concluded:

(a) The price elasticity of demand for electricity, for all classes of consumers, is much larger in the long run than in the short run.

(b) The same holds for the income elasticity of demand.

(c) The long run price elasticity of demand is indicated to be elastic.

(d) The evidence on the magnitude of the long run income elasticity is much more mixed. Estimates range from 0 to 2, and clearly depend on the type of model employed.

No econometric study of electricity demand had dealt with the decreasing block pricing in a completely satisfactory way, and the estimates of price (as also income) elasticities probably contained biases of indeterminate sign and magnitude as a consequence. Taylor's suggestions (1975) to deal with this problem were two-fold:


(a) Multi part tariffs require the inclusion of marginal price and intramarginal expenditure as arguments in the demand function, and

(b) The prices employed should be derived from actual rate schedules.

All the studies (in the US) had used either ex post average prices or prices derived from Typical Electric Bills, an annual publication of the US Federal Power Commission. In response to Taylor's suggestions, most of the studies since then have heeded 'the wisdom of employing electricity prices from actual rate schedules'. Several studies have also sought to improve the modeling of the dynamics of electricity demand through the inclusion of stocks of electricity-consuming appliances in the demand function, as well as the possibilities of inter-fuel substitution. Some other studies have utilized data on individual households, small geographical areas, or the area served by a utility, in a bid to use a data set of higher quality than that provided by data at the state or national level, as well as to avoid (or at least reduce) aggregation bias in estimates of price and income elasticities.

To account for the dynamic character of demand, a lagged dependent variable is usually used as a regressor in the log-linear model with a partial adjustment mechanism (a Koyck distributed lag with geometrically declining weights). This specification makes it possible to distinguish between short run and long run elasticities: while the coefficient of the price variable in this model represents the short run price elasticity of demand, the long run price elasticity is obtained by dividing the short run coefficient by one less the coefficient of the lagged dependent variable, i.e., by the rate of adjustment. However, the presence of the lagged dependent variable, as already noted, makes the OLS estimator inconsistent, owing to the possible correlation between the lagged endogenous variable and the random disturbance, as well as the serial correlation among the successive values of the latter.
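As a worked numerical instance of this conversion (coefficient values assumed for illustration, not estimates from any study):

# Hypothetical estimates from a log-linear partial adjustment model:
# ln q_t = a + b ln p_t + c ln y_t + lam ln q_{t-1} + e_t   (all values assumed)
b_price = -0.20            # short-run price elasticity (assumed)
lam = 0.75                 # coefficient on the lagged dependent variable (assumed)

long_run = b_price / (1 - lam)   # divide by the rate of adjustment (1 - lam)
print(long_run)                  # -0.8: the response builds up in the long run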


Demand Forecasting in India

At the all-India level, forecasts of electricity requirement and demand are made by the Planning Commission and by the Annual Electric Power Surveys (APS) convened by the Ministry of Energy, the Central Electricity Authority (CEA) being the Secretariat to the APS. These are not independent forecasts, to the extent that the two bodies hold extensive discussions that usually lead to a reconciliation of results. Still, subtle differences exist between the methodologies they employ.

The Planning Commission projects electricity demand as part of its macro-economic analysis for all the sectors of the economy. Industrial requirements are derived for a set of ‘major’ industries – mostly large scale or very electricity intensive – by applying consumption norms to production targets. In the Draft Plan, consumption in the rest of the industrial sector is assumed to be proportional to that in the major industries (60 per cent of that for major industries). Railway and irrigation requirements are also estimated by using sectoral targets and norms of electricity use. Consumption in other sectors – domestic, commercial, public lighting, water works, and miscellaneous – is based on trend growth rates or on regression analysis that relates sector growth rates to electricity demands. Using the input-output model, the Planning Commission is able to check the consistency of the resultant macro-forecast with its expectations for the economy at large.

On the other hand, the APS begins with a detailed survey of industrial establishments that demand 1 MW or more of electricity, to solicit their views on how rapidly their requirements for power will increase. The methods used to derive forecasts for other consumer categories are much the same as those employed by the Planning Commission – many of the coefficients relating output to electricity are identical – but with two differences: regression analysis, as opposed to trend analysis, has not been used, and the exercise is conducted state-wise and then region-wise before being aggregated to the national level. The views the APS forms about the prospects for power in each state form the basis for the involvement of the SEBs and state governments in the demand forecasting exercises. Since the forecasts underpin the state-wise investment programs, and these in turn influence the case for Central Plan assistance, there is a fundamental concern with the APS position.

Both the Planning Commission and the APS rely heavily for forecasting on the targets for other sectors – major industries, agriculture (irrigation), railway traction – and then use electricity consumption norms to derive electricity forecasts. Thus they are both often characterized as end-use forecasting methods. The major difference is that the Planning Commission’s analysis begins with macro-economic considerations and the APS’ with completely disaggregated data. Forecasts based on the end-use methods, in the sectors where end-use coefficients are applied, have the advantage that they are not affected by uncertainties about the extent of shortages, past or present. But the regression and trend analyses of the APS and Planning Commission approaches are affected, primarily in the estimation of the regression coefficients and secondly in the choice of base period values. The Planning Commission approach has the virtue of strong consistency with the overall Plan, but the disadvantage of not providing clues to levels of consumption in individual states or in particular load centres. The APS, while providing disaggregated data, suffers from its length of preparation and the considerable cost involved in organizing a detailed survey of so many units throughout India. Furthermore, it has been found that the APS forecasts may often be upwardly biased: they have exceeded the demand met by between 20 and 80 per cent, and the divergence generally increases in the later years, as might be expected. Thus the energy consumption forecasts for Kerala for 1994 by the successive APS were of the order of 12466, 9328, 9409, and 8567 million units (MU) respectively (by the 12th, 13th, 14th, and 15th, the latest, APS). It should, however, be pointed out that it does no good to compare these forecasts with the actual demand met (about 7027.7 MU of energy internally sold) in Kerala, fraught as it is with severe power cuts and load shedding. One way to account for such upward divergence is to regard it as reflecting the unsuppressed demand more faithfully than the realized demand.


Demand Projections: Kerala

Demand projections for Kerala based on the above (12th, 13th, 14th and 15th) APS results are now under consideration in the state. A steady decrease in the peak demand/energy requirement is discernible across these successive forecasts, attributed to some restrictions and revisions in the trends relative to the base year (reflecting the increasing quanta of suppressed demand due to lack of generation capability). The State has accepted the 14th APS as ‘more dependable’ (Government of Kerala, 1997: 12), whereas the Balanandan Committee (constituted to study the development of electricity in Kerala) finds the 15th APS ‘as the better estimates for future planning’ (Government of Kerala, 1997: 37). Considering the divergences in these APS forecasts for Kerala (as shown above, for example, for 1994), the State Planning Board constituted a working group to study the demand forecasts for Kerala. The committee used a log-linear model and growth rates of 4.72, 10, and 15 per cent for the HT and EHT industries to arrive at three different demand projections. The domestic demand projections in all these exercises were based on the growth of population as per the Census report. For the other sectors, the projections were made based on the trends (using a semi-log scale). It should be noted that the energy demand forecast for 1994 by the Committee was only 8945 MU.

A number of computer software packages of energy planning models are available at present for energy demand forecasts, such as LEAP (Long range Energy Alternative Planning), BEEAM-TEESE (Brookhaven Energy Economy Assessment Model–TERI Economy Simulation and Evaluation), MEDEE-S, ELGEM, etc. International Energy Initiative (IEI), Bangalore, has put forward a Development Focused End-Use oriented, Service directed methodology (DEFENDUS) for estimating demand and supply of energy in an energy system (see Amulya Kumar N. Reddy 1990), and an exercise based on this has been done for the KSEB. This methodology, with its twin focus of developed living standard and improved end-use efficiency, seeks to estimate the demand for a particular energy source/carrier in a given year based on two variables – the number of energy users and their actual energy requirement in any base year, as well as the expected changes in the subsequent years. The total energy demand is then equal to the aggregate demand of all the categories of users over every end-use, as in the toy sketch below.
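The accounting logic of such end-use methods can be sketched in a few lines; the categories, user counts and norms below are invented, not the DEFENDUS/KSEB figures:

# Invented user counts and per-user annual norms (kWh); not the KSEB figures.
categories = {
    "domestic lighting":   (1_500_000, 180),
    "domestic appliances": (600_000, 420),
    "irrigation pumpsets": (90_000, 1100),
}

# Demand per category = users x requirement; total = sum over all end-uses.
total_kwh = sum(users * kwh for users, kwh in categories.values())
print(total_kwh / 1e6, "MU")     # about 621 million units in this toy example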

The trend analysis is simple and viable, and logical enough in that the time trend incorporates the effects of technological advance on the supply side (which facilitates availability upon demand) and the progress in requirement levels on the demand side over time. The latter may in general be approximated by the improvement in the living standard of the customers, and to this extent the trend analysis is sufficient in itself. However, from the perspective of a more general and comprehensive social-economic framework, a closer examination of the influence of other, more immediate determinants is in order. The variables of primary significance here are the income of the consumer and the price of energy he is to pay, along with the relevant social, demographic, environmental or climatic, and other economic factors. This in turn involves an econometric analysis.


Chapter 2: ARIMA Modelling

Modelling for forecasting has a limit…. Yes, the sky is the limit!

Non-stationarity or Unit Root Problem

Many macroeconomic time series, such as national income and electricity demand, are non-stationary. A time series, which is theoretically a particular realization (i.e., a sample) of a stochastic process, is said to be stationary if its characteristics are finite and invariant with respect to time. This simply means that the mean, variance and autocovariances of the series are all constants. In this case, the time series can be described in terms of a regression model with fixed coefficients, estimated from past data. If, on the other hand, the series is non-stationary, with non-finite and time-varying parameters, then its regression modelling with fixed coefficients is not possible.

For instance, consider the process given by a difference equation with a stochastic term ut and intercept α = 0:

yt = φyt−1 + ut,     ….. (1)

where φ is the root of (1). In the above equation, the error term ut may be considered an ‘information shock’ or ‘innovation’, because it is the only new information to yt.


The error term ut is assumed to be a ‘white noise process’. The term is taken from signal theory in physics: it is analogous to white light, which contains all frequencies; a white noise signal is a random signal (or process) with a flat power spectral density, that is, with equal power in any frequency band. This in turn implies the following:

(a) E(ut) = E(ut–1) = …. = 0;
(b) Var(ut) = Var(ut–1) = …. = σ²;
(c) Cov(ut, ut–k) = Cov(ut–s, ut–s–k) = …. = 0, for all k and s, k ≠ 0: that is, ‘no memory’; and
(d) ut is normally distributed (this assumption is not essential).     ….. (2)

A white noise process is a stationary process, but not vice versa! For a stationary process, mean, variance, and covariances are all constant, independent of t. This definition is termed covariance stationarity or weak stationarity. Thus a process yt is weakly stationary if

1. E(yt) = µ < ∞ (constant mean)
2. Cov(yt, yt–k) = γk = E[(yt – µ)(yt–k – µ)] < ∞; t = 0, ±1, ±2, ….; k = 0, ±1, ±2, …. (autocovariance depends only on k, not on t)

For k = 0, the second condition implies

3. Var(yt) = γ0 = σ² (constant variance).


Given the autocovariances and variance, we may define the autocorrelation function (ACF) or correlogram, that is, the sequence of correlation coefficients between pairs of values of the process {yt} separated by an interval of length k, as

ρk = γk/γ0 = Cov(yt, yt–k)/Var(yt).     ….. (3)

[Fig. 1: A Stationary Process]

[Fig. 2: A Non-Stationary Process]

Note that the process given above in (1) is the first order autoregressive, AR(1), model. If the root |φ| < 1, the process is dynamically stable (i.e., stationary), and if |φ| > 1, it is dynamically explosive (i.e., non-stationary). If the process has a unit root, i.e., if |φ| = 1, the process neither dampens down nor explodes. In this case, yt can be represented, through successive substitution and assuming that the initial value y0 (of yt at t = 0) is zero, in terms of the cumulation of all the past shocks: yt = ut + ut−1 + … = Σui for i = 1, 2, …, t; thus the behaviour of yt is determined solely by the cumulation (from past to present) of random shocks. That is, shocks persist and the process is non-stationary.2 Thus the unit root problem refers to the non-stationarity problem of economic time series. In this case, the process has a zero mean (its trend, Σui for i = 1, 2, …, t, is stochastic and cannot be predicted perfectly), but its variance and autocovariances increase infinitely with time:

1. The mean of yt is E(yt) = ΣE(ui) = 0,
2. the variance of yt is Var(yt) = γ0 = ΣVar(ui) = tσu², for i = 1, 2, …, t, and
3. the autocovariances for lag k are Cov(yt, yt–k) = γk = E{Σui · Σui–k} = (t – k)σu², both functions of time; and
4. ACF = ρk = γk/γ0 = (t – k)/t ≈ 1 for large t, k = 0, 1, 2, …     ….. (4)

Thus for a non-stationary process there is perfect autocorrelation; note that φ, which in absolute value is unity, is both the autoregression coefficient and the autocorrelation coefficient for equation (1). A non-decaying ACF is thus indicative of non-stationarity. In general, however, the theoretical ACF is unknown; hence we estimate the sample ACF (ρ̂k), using sample autocovariances and the sample variance:

ρ̂k = Σ(yt – ȳ)(yt+k – ȳ) / Σ(yt – ȳ)²,     ….. (5)

where the numerator sum runs over t = 1, …, T–k and the denominator sum over t = 1, …, T.

2 Remember the solution of a homogeneous first order difference equation yt = φyt−1 is given by y0φ^t. The time path of the process converges, persists in oscillation, or diverges (explodes) according as the root |φ| is less than, equal to, or greater than unity.

Note that in the sample estimate, as k increases, T – k decreases, so the numerator sum falls; the denominator remaining the same, the sample ACF tends to decay as k increases. Thus, a slowly decaying sample ACF suggests a non-stationary series.

Let us now define the partial autocorrelation function (PACF) or partial correlogram as the correlation between the current value and the value k periods ago, after controlling for observations at intermediate lags (i.e., all lags < k). This measure is thus analogous to the partial correlation coefficient

r12·3 = (r12 – r13r23) / √[(1 – r13²)(1 – r23²)]     ….. (6)

for three variables x1, x2, and x3, obtained after removing the linear relation of x3 with x1 and with x2 (i.e., r13r23) from r12. Note that the autocorrelation is easy to interpret as the correlation coefficient of the current value of the dated variable with itself lagged a certain number of periods. The partial autocorrelations, however, are a bit more difficult to interpret; they measure the correlation of the current and lagged series after taking into account the predictive power of all the values of the series with smaller lags. The partial autocorrelation for lag 3, for example, measures the added predictive power of yt–3, when yt–1 and yt–2 are already in the prediction model.
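A small Python sketch of these two estimators, assuming an array y; the PACF here is computed as the last coefficient of successively longer autoregressions, which is equivalent in spirit to the recursion in (8) below:

import numpy as np

def sample_acf(y, k):
    # Sample ACF of equation (5): lag-k autocovariance over the variance.
    d = np.asarray(y, float) - np.mean(y)
    return (d[:len(d) - k] * d[k:]).sum() / (d ** 2).sum()

def sample_pacf(y, k):
    # PACF at lag k: last OLS coefficient of y_t on y_{t-1}, ..., y_{t-k}.
    y = np.asarray(y, float)
    Y = y[k:]
    X = np.column_stack([np.ones(len(Y))] +
                        [y[k - j:len(y) - j] for j in range(1, k + 1)])
    return np.linalg.lstsq(X, Y, rcond=None)[0][-1]

rng = np.random.default_rng(0)
walk = rng.standard_normal(200).cumsum()   # a random walk, for illustration
print([round(sample_acf(walk, k), 2) for k in (1, 2, 5)])  # slow decay
print(round(sample_pacf(walk, 1), 2))                      # single spike near 1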

We know that in a regression the coefficients are partial regression coefficients. So in an AR model, say yt = φ1yt–1 + φ2yt–2 + …. + εt, the autoregression coefficients φi, i = 1, 2, …., are the partial autocorrelation coefficients. At lag 1, the autocorrelation coefficient is the partial autocorrelation coefficient itself, since with yt and yt–1 there are no intermediate lag effects to eliminate. Thus, given yt = φ21yt–1 + φ22yt–2 + εt, the PAC of yt with yt–2 is

φ22 = (ρ2 – ρ1²) / (1 – ρ1²),     ….. (7)

where ρ1 = γ1/γ0 and ρ2 = γ2/γ0 are AC coefficients. For additional lags,

φkk = (ρk – Σj φk–1,j ρk–j) / (1 – Σj φk–1,j ρj), k = 3, 4, 5, …,     ….. (8)

where both sums run over j = 1, …, k–1, and φk,j = φk–1,j – φkk φk–1,k–j, j = 1, 2, 3, …., k–1.

Random walk

With a unit root, the above process in (1), yt = φyt−1 + ut, is called a random walk (without drift), in recognition of the similarity of the evolution of yt to the random stagger of a drunkard. The change in yt, i.e., the first difference ∆yt = yt − yt−1, is then simply (stationary) white noise (ut) and is hence independent of past changes. Thus in this case yt can be made stationary through first-differencing. A series that can be made stationary through differencing is said to belong to the difference stationary process (DSP) class.

Adding a constant (α ≠ 0) to the above simple random walk model yields a random walk with drift, which accounts for a deterministic trend in the series, upward or downward depending on α > or < 0. The process in this case can be written as αt + Σui for i = 1, 2, …, t, so that the mean, variance and autocovariances of the process are all functions of time. On the other hand, for a stationary (less-than-unit root) process, all these characteristics are constant, and this property is made use of in econometric estimation.

Consequences of non-stationarity

As already noted, if the series is non-stationary with non-finite and time-varying parameters, then its regression modelling with fixed coefficients is not possible. Remember that the OLS estimator from a regression of yt on xt is the ratio of the covariance between yt and xt to the variance of xt. If yt is a stationary (|φ| < 1) variable and xt is non-stationary (|φ| = 1), the OLS estimator from the regression of yt on xt converges to zero asymptotically, because the variance of xt, the denominator of the OLS estimator, increases infinitely with time and thus dominates the numerator, the covariance between yt and xt. Thus the OLS estimator cannot have an asymptotic distribution. This is the unit root (non-stationarity) problem.
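A quick simulation sketch (illustrative parameters only) makes the contrast visible: the cross-sectional variance of a stationary AR(1) settles near σ²/(1 – φ²), while that of a unit root process grows roughly like tσ²:

import numpy as np

rng = np.random.default_rng(42)
T = 500
u = rng.standard_normal((1000, T))         # 1000 replications of white noise

def paths(phi):
    # y_t = phi * y_{t-1} + u_t, with y_0 = 0, for each replication
    y = np.zeros_like(u)
    for t in range(1, T):
        y[:, t] = phi * y[:, t - 1] + u[:, t]
    return y

stationary, walk = paths(0.5), paths(1.0)
for t in (50, 200, 499):
    # variance across replications: bounded vs growing roughly like t
    print(t, stationary[:, t].var().round(2), walk[:, t].var().round(1))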

Stationarity Checking

We know that for a white noise process,

Cov(εt, εt±k) = γk = 0, for all k, t; k ≠ 0 (serially uncorrelated);
= σ², k = 0 (constant variance).

So the ACF is

ρk = γk/γ0 = 0, for all k ≠ 0;
= 1, k = 0.     ….. (29)

This is the basis for tests of white noiseness or stationarity. We have two types of tests of significance on AC. One is based on the individual autocorrelation coefficient. Here the null hypothesis is that a particular value of the autocorrelation function is equal to zero:

H0: sample ρk = 0, k > 0.     ….. (30)

[Fig. 3: ACF of a White Noise (Stationary) Series]

[Fig. 4: ACF of a Non-Stationary Series]

Bartlett (1946) has shown that if the series is generated by a white noise process, i.e., yt ∼ N(0, 1), then sample ρ̂k ∼ approximately N(0, 1/T), where T = sample size. Hence a 95% confidence interval is:

± 1.96 √(1/T).     ….. (31)

The null, H0: sample ρk = 0, is rejected if sample ρ̂k falls outside this region for any k.

The second statistical test is based on the joint hypothesis that all of the autocorrelation coefficients are zero:

H0: sample ρ1 = 0, ρ2 = 0, ρ3 = 0, …….     ….. (32)

Here we make use of the Box–Pierce Q statistic (1970):

Q = T Σ ρ̂k², summed over k = 1, …, m,     ….. (33)

where T = sample size and m = maximum lag. Q ∼ χ²(m) under the null (that all m coefficients are zero). However, the Q statistic is beset with small-sample problems, and hence we have another statistic, the Ljung–Box Q statistic (1978):

Q* = T(T + 2) Σ ρ̂k²/(T – k), summed over k = 1, …, m,     ….. (34)

where Q* ∼ χ²(m) under the null. Note that asymptotically Q* is equivalent to Q, as the factors (T + 2) and (T – k) effectively cancel out. If the calculated value of Q* is greater than the critical value at, say, the 5 per cent significance level, we can be 95 per cent sure that the true autocorrelation coefficients ρk are not all zero.

Tests of significance on the PACF are the same as for the ACF. For an individual partial autocorrelation coefficient, the 95% confidence interval is given by ± 1.96 √(1/T). The null, H0: sample PACC = 0, is rejected if it falls outside this region for any k.
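In practice these statistics are rarely computed by hand; a minimal sketch using statsmodels' acorr_ljungbox on simulated white noise (seed and lag length are arbitrary choices):

import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(1)
e = rng.standard_normal(200)               # white noise: H0 should survive

# Q* of equation (34) at m = 10 lags; a large p-value means we cannot
# reject the null that the first 10 autocorrelations are all zero.
print(acorr_ljungbox(e, lags=[10], return_df=True))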


Spurious Regression and Differencing

[Fig. 5: PACF of a Stationary Series]

[Fig. 6: PACF of a Non-Stationary Series]

Results from regressions with non-stationary variables can be very misleading. However, for a long time, despite the fact that macroeconomic time series are often non-stationary, researchers had access only to standard methods developed for stationary data. In 1974, Clive Granger, along with his colleague Paul Newbold (Granger and Newbold 1974), demonstrated that estimates of relationships between non-stationary variables could yield nonsensical results by erroneously indicating significant relationships between wholly unrelated variables. They found that the regression coefficient estimated from two series generated by independent random walk processes was statistically significant, with a very high R² but a very low DW statistic (indicating high autocorrelation in the residuals). When the regression was run in first differences of the series, the R² was close to zero and the DW statistic close to 2, showing that there was no relationship between the series and that the significant result obtained earlier was spurious. Hence they suggested that the event R² > DW signalled a ‘spurious regression’ and that the series should therefore be examined for association by running the regression in the first differences of the variables, since the differenced series are stationary. Plosser and Schwert (1978) gave further empirical evidence in favour of first-differencing in regression models.
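The Granger–Newbold experiment is easily replicated; a minimal sketch (illustrative seed) regressing one simulated random walk on another, independent one, in levels and then in first differences:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(7)
x = rng.standard_normal(200).cumsum()      # two independent random walks
y = rng.standard_normal(200).cumsum()

levels = sm.OLS(y, sm.add_constant(x)).fit()
diffs = sm.OLS(np.diff(y), sm.add_constant(np.diff(x))).fit()

# Levels regression: typically high R^2 with low DW (R^2 > DW, spurious);
# first differences: R^2 near zero and DW near 2.
print(levels.rsquared, durbin_watson(levels.resid))
print(diffs.rsquared, durbin_watson(diffs.resid))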

Thus a common approach to dealing with the problem of nonstationary data was to specify statistical models as relationships between differences, i.e., rates of increase. Instead of using the exchange rate and the relative price level, for example, one would estimate the relationship between currency depreciation and relative inflation. If the rates of increase are indeed stationary, traditional methods provide valid results. But even if a statistical model based solely on differences can capture the short-run dynamics in a process, it has less to say about the long-run covariation of the variables. This is unfortunate because economic theory is often formulated in terms of levels and not differences.

Trend removal via differencing to induce stationarity in non-stationary series is still an important stage in autoregressive-integrated-moving average (ARIMA) model building (Box and Jenkins 1970). If a series requires differencing k times to be stationary, the series is said to be integrated of order k, denoted by I(k). In the earlier example of random walk model (1), yt is I(1) variable, while ∆yt is I(0), stationary, variable.


Since regression with non-stationary series involves the risk of spuriousness, it is essential that we check the variables for stationarity. In the classical time series approach, the correlogram is used as a non-stationarity test.

ARIMA Modelling

The modelling and forecasting procedures in general involve knowledge about the mathematical model of the process. However, in real-life research and practice, patterns of the data are unclear, individual observations involve considerable error, and we still need not only to uncover the hidden patterns in the data but also to generate forecasts. The ARIMA methodology developed by Box and Jenkins (1970) allows us to do just that; it has gained enormous popularity in many areas, and research practice confirms its power and flexibility. However, because of its power and flexibility, ARIMA is a complex technique: it is not easy to use, it requires a good deal of experience, and although it often produces satisfactory results, those results depend on the researcher's level of expertise. The following sections introduce the basic ideas of this methodology. For a brief, applications-oriented introduction to ARIMA methods, we recommend Makridakis, Wheelwright and McGee (1983).

The ARMA Processes

Suppose a stationary process {xt} is taken as the input (signal) to some dynamic system which then produces an output:

yt = Σk bk xt–k.     ….. (8)

Such a system is a (linear) filter (a nomenclature from electronics), where bk is the response in the output at time t+k to a unit pulse in the input at time t, and is called the k-period multiplier, transient response, or impulse response: ∂yt+k/∂xt = ∂yt/∂xt–k.

The time path (for all k) of all such multipliers is called the impulse response function, which shows how the entire time path of a variable is affected by a stochastic shock.

Note that the impact multiplier is: ∂yt /∂xt, the impact effect of a change in xt on yt, which in this case is b0.

The above system may be taken as (1) a regression of yt on xt; (2) an autoregressive process (AR) in yt, if xt = yt; (3) a moving average process (MA) in yt, if xt is white noise. An autoregressive model is essentially an infinite impulse response filter, having an impulse response function that is non-zero over an infinite length of time. A MA process, on the other hand, is a finite impulse response filter, which has only fixed-duration impulse responses.

Below we consider the properties of different AR and MA processes.

Autoregressive (AR) process. Most time series consist of elements that are serially dependent in the sense that one can estimate a coefficient or a set of coefficients that describe consecutive elements of the series from specific, time-lagged (previous) elements.

The first order Autoregressive process AR(1)

The process yt is an AR(1) process with zero mean if

yt = φyt–1 + εt,

where εt is a white noise process. Note that yt is a first order stochastic difference equation. The terminology ‘autoregressive process’ is due to the fact that yt has a regression on its own past values. In AR(1) there is only one period dependence, and thus only one period memory; such a process is also called a Markov process.

By repeated substitutions, we get

yt = φyt–1 + εt = Σ φ^k εt–k for k = 0, 1, …, t–1.     ….. (9)

The impulse response function is φ^k, k = 0, 1, 2, …, which is infinite; thus an autoregressive model is essentially an infinite impulse response filter. This process will be stationary if and only if |φ| < 1. It can be easily checked that when this is so,

1. E(yt) = 0,
2. Var(yt) = σ²/(1 – φ²), |φ| < 1 (constant variance),
3. Cov(yt, yt–k) = γk = φ^k σ²/(1 – φ²), |φ| < 1 (depends only on k, not on t),
4. ACF = ρk = γk/γ0 = φ^|k|, k = 0, 1, 2, …     ….. (10)

Since |φ| < 1 for a stationary series, the ACF falls to zero as k increases, and this serves as a sign of stationarity. A positive φ implies direct convergence and a negative φ, oscillatory convergence. Also note that the ACF equals the impulse response function.

5. PACF = ρ1 = φ.     ….. (10a)

There is only one partial autocorrelation coefficient. This, together with the fast-decaying ACF (proving stationarity), serves as a sign of an AR(1) process.

If φ = 1 (unit root), we have a random walk:

yt = yt–1 + εt = Σ εt–k for k = 0, 1, …, t–1,     ….. (11)

an accumulation of past shocks. For this non-stationary yt,

1. E(yt) = 0,
2. Var(yt) = σ²t (function of t),
3. Cov(yt, yt–k) = γk = σ²(t – k) (function of t),
4. ACF = ρk = γk/γ0 ≈ 1,
5. PACF = φ = 1.     ….. (12)

The non-decaying ACF (or the slow-decaying sample correlogram) is a sign of non-stationarity. That there is only one partial autocorrelation coefficient may be taken (together with the non-decaying ACF) as a sign of the order of integration (that is, the order of differencing needed to make the series stationary). Thus the first difference of the above process, ∆yt = yt – yt–1 = εt, is a stationary (more precisely, white noise) process. Note that E(yt) is finite if |φ| ≠ 1, that is, if there is no unit root, and we know that yt is stationary if |φ| < 1.

[Fig. 7: ACF of an AR(1) (Stationary) Series]

[Fig. 8: PACF of an AR(1) (Stationary) Series]

The general AR(1) process is specified as

yt = α + φyt–1 + εt = α Σ φ^k + Σ φ^k εt–k, both sums over k = 0, 1, …, t–1.     ….. (13)

For a stationary series, with |φ| < 1,

E(yt) = α/(1 – φ),     ….. (14)

a constant; and the variance and covariances are also constant, exactly the same as in (10). If φ = 1 (unit root),

yt = α + φyt–1 + εt = αt + Σ εt–k for k = 0, 1, …, t–1,     ….. (15)

E(yt) = αt,     ….. (16)

a function of time; and the variance and covariances are also functions of time, exactly the same as in (12).

The second order Autoregressive process AR(2)

yt = φ1yt–1 + φ2yt–2 + εt,     ….. (17)

where εt is a white noise process. Note that yt is a second order stochastic difference equation and that we can rewrite it using the lag operator L as

(1 – φ1L – φ2L²) yt = εt,     ….. (17a)

where Lyt = yt–1, …, L^k yt = yt–k. Or,

φ(L) yt = εt,     ….. (17b)

where φ(L) = (1 – φ1L – φ2L²) is a lag polynomial of order 2. The stationarity conditions for AR(2) are obtained from the expressions of the moments.

Mean of AR(2): taking expectations of (17b), E{(1 – φ1L – φ2L²)yt} = (1 – φ1 – φ2)E(yt) = E(εt) = 0. With a constant c,

E(yt) = c/(1 – φ1 – φ2).     ….. (18)

Thus one stationarity condition is:

(φ1 + φ2) < 1.     ….. (19)

Second moments: Given yt = φ1yt–1 + φ2yt–2 + εt, multiply by yt–k and take expectations to get the Yule–Walker equations:

E(yt yt–k) = E(φ1 yt–1 yt–k) + E(φ2 yt–2 yt–k) + E(εt yt–k).     ….. (20)

Remember we have

E(εt yt–k) = 0, k ≥ 1;
E(εt yt) = E(εt²) = σ², k = 0.

Now from (20):

Cov(yt, yt–k) = γk = φ1γk–1 + φ2γk–2, k = 1, 2, …..     ….. (21)

Thus the ACF is given by

ρk = φ1ρk–1 + φ2ρk–2, k = 1, 2, …..,

a difference equation of order 2, so all ρk can be calculated recursively. When k = 1, we have

ρ1 = φ1ρ0 + φ2ρ–1 = φ1ρ0 + φ2ρ1 = φ1/(1 – φ2), |φ2| < 1.     ….. (22)

When k = 2,

ρ2 = φ1ρ1 + φ2ρ0 = φ1²/(1 – φ2) + φ2, |φ2| < 1.     ….. (23)

Similarly we get AC coefficients ρk that all show geometric decay. Thus another stationarity condition is given by:

|φ2| < 1.     ….. (24)

Now we can find the variance:

Var(yt) = γ0 = φ1γ–1 + φ2γ–2 + σ² = φ1γ1 + φ2γ2 + σ².

Substituting γk = ρkγ0 and rewriting,

γ0(1 – φ1ρ1 – φ2ρ2) = σ²; or γ0 = σ²/(1 – φ1ρ1 – φ2ρ2);

and substituting ρ1 = φ1/(1 – φ2) and ρ2 = φ1²/(1 – φ2) + φ2, we have

Var(yt) = γ0 = σ²(1 – φ2) / [(1 + φ2)(φ1 + φ2 – 1)(φ2 – φ1 – 1)].     ….. (25)

This gives the stationarity conditions:

1. |φ2| < 1;  2. (φ1 + φ2) < 1;  3. (φ2 – φ1) < 1.     ….. (26)

To find the PACF, we consider yt = φ1yt–1 + φ2yt–2 + εt. In this equation the first PAC coefficient is the first AC coefficient itself:

PACC1 = ρ1,     ….. (27)

and φ2 is the PAC coefficient at lag 2:

PACC2 = φ2 = (ρ2 – ρ1²)/(1 – ρ1²),     ….. (28)

where ρ1 = φ1/(1 – φ2) and ρ2 = φ1²/(1 – φ2) + φ2.

The notation AR(p) refers to the autoregressive model of order p. The AR(p) model is written

yt = c + Σ φi yt–i + εt, summed over i = 1, …, p,

where φ1, …, φp are the parameters of the model, c is a constant and εt is an error term. The constant term is omitted by many authors for simplicity.

Identifying an AR process

In an AR(1), yt = φ1yt–1 + εt, φ1 is both the AC and the PAC coefficient: PACC1 = φ1 = ρ1. In an AR(2), yt = φ1yt–1 + φ2yt–2 + εt, the first PAC coefficient is PACC1 = ρ1, where φ1 = ρ1(1 – φ2), and φ2 is the PAC coefficient at lag 2: PACC2 = φ2 = (ρ2 – ρ1²)/(1 – ρ1²).

Thus in an AR(1) model there is only one non-zero partial autocorrelation coefficient, and in an AR(2) model there are two. Generalizing, we find that in an AR(p) process the number of non-zero PAC coefficients is p. This property is used for identifying (the order of) an AR model.
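A short simulated check of this identification rule (parameter values illustrative), treating PACF spikes outside the ±1.96/√T band as non-zero:

import numpy as np
from statsmodels.tsa.stattools import pacf

rng = np.random.default_rng(3)
T = 500
y = np.zeros(T)
for t in range(2, T):                      # AR(2): y_t = 0.5y_{t-1} + 0.3y_{t-2} + e_t
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + rng.standard_normal()

p = pacf(y, nlags=6)
band = 1.96 / np.sqrt(T)                   # approximate 95% band
print([round(v, 2) for v in p[1:]])        # two clear spikes, then noise
print("suggested AR order:",
      max((k for k in range(1, 7) if abs(p[k]) > band), default=0))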

Moving average process. A moving average forecasting model uses lagged values of the forecast error to improve the current forecast. A first order moving average term uses the most recent forecast error, a second order term uses the forecast errors from the two most recent periods, and so on. Suppose a stationary process {εt} is taken as the input (signal) to some dynamic system which then produces an output:

yt = Σ θk εt–k, summed over k = 0, 1, …, q.     ….. (35)

The relation is a MA(q) of yt in terms of εt, the output of a filter with a white noise input and transient response θk. In general, θ0 = 1. With the lag operator,

yt = θ(L)εt,     ….. (36)

where θ(L) = (1 – θ1L – θ2L² – …. – θqL^q).

The first order Moving Average process MA(1)

yt = εt – θ1εt–1,     ….. (37)

or, with the lag operator,

yt = (1 – θ1L)εt.     ….. (37a)

Note that there is only a one period impulse response, unlike in AR(1); thus the MA process is a finite impulse response filter. The moments are:

1. E(yt) = E(εt) = 0; with a constant c, E(yt) = c.
2. Var(yt) = Var(εt – θ1εt–1) = σ²(1 + θ1²), constant.
3. Cov(yt, yt–k) = γk:
with k = 1: γ1 = E(yt yt–1) = E{(εt – θ1εt–1)(εt–1 – θ1εt–2)} = –θ1σ².
with k = 2: γ2 = E(yt yt–2) = E{(εt – θ1εt–1)(εt–2 – θ1εt–3)} = 0.
Thus Cov(yt, yt–k) = γk = 0, for all k > 1.
4. ACF = ρk = γk/γ0 = –θ1/(1 + θ1²), k = 1; = 0, k > 1.     ….. (38)

The process has a memory of only one period: only a one period impulse response. It should be noted that even though εt is white noise, yt is not, if the θk differ from zero: yt has zero mean and constant variance, but

Cov(yt, yt–1) = –θ1σ² ≠ 0.     ….. (39)

5. PACF:

Remember that the PACF applies to yt and its lagged values; so the MA process needs to be expressed in AR form. As long as the MA process is invertible, it can be expressed as an AR(∞), and vice versa, thanks to the invertibility condition, from Wold’s Decomposition Theorem.

Wold’s Decomposition Theorem: Wold’s Decomposition Theorem states that “Every weakly stationary, purely non-deterministic, stochastic process (yt – µ) can be written as a linear filter (linear combination) of a sequence of uncorrelated random variables.” (Wold 1938).

Here the term ‘purely non-deterministic’ means purely non-predictable, that is, all linearly deterministic components, say, a (constant) mean, if any, have been subtracted from yt.

[Fig. 9: A Moving Average (MA) Series of Order 1]

[Fig. 10: ACF of a MA(1) Series]

Such a linear filter is:

yt – µ = Σ ψj εt–j, summed over j = 0, 1, 2, …, with ψ0 = 1,     ….. (40)

where εt is white noise and the ψj are psi-weights. Now let µ = 0, without loss of generality.

(1) Let ψ1 = –θ1 and ψj = 0 for j ≥ 2; then we get a MA(1) process: yt = εt – θ1εt–1.

(2) Let ψj = φ^j; then yt = εt + φεt–1 + φ²εt–2 + ….. = εt + φ(εt–1 + φεt–2 + …..) = εt + φyt–1, an AR(1) process.

Thus there is a relationship between the two processes.

(1) By inverting an AR(1) process, we get an infinite MA process: yt = φyt–1 + εt, or (1 – φL)yt = εt, so that yt = (1 – φL)⁻¹εt. Expanding by the binomial theorem:

yt = (1 + φL + φ²L² + ……)εt = εt + φεt–1 + φ²εt–2 + …..

This MA(∞) series will converge asymptotically if |φ| < 1, which gives the stationarity condition or bounds of stationarity. Note that the moments of AR(1) can also be found from this MA(∞) series.

(2) By inverting a MA(1) process, we get an infinite AR process: yt = εt – θ1εt–1 = (1 – θ1L)εt, so that (1 – θ1L)⁻¹yt = εt. Expanding by the binomial theorem:

(1 + θ1L + θ1²L² + ……)yt = εt, that is, yt + θ1yt–1 + θ1²yt–2 + …… = εt.

This AR(∞) series will converge asymptotically if |θ1| < 1, which gives the invertibility condition or bounds of invertibility.


Thus, there is a ‘duality’ between the moving average process and the autoregressive process; that is, the moving average equation above can be rewritten (inverted) into an autoregressive form (of infinite order). However, analogously to the stationarity condition and as shown above, this can be done only if the moving average parameters satisfy certain conditions, that is, if the model is invertible; otherwise the series cannot be so expressed. Now let us go back to our MA(1) case.

The PACF of the MA(1) process:

Given the AR(∞) series obtained by inverting the MA(1) series, yt + θ1yt–1 + θ1²yt–2 + …… = εt, we can find the PACF of MA(1): yt = εt – θ1εt–1. Thus, given yt = φ1yt–1 + φ2yt–2 + …….. + εt, the first PAC coefficient is

φ1 = ρ1 = –θ1/(1 + θ1²),     ….. (41)

and the second PACC is

φ2 = (ρ2 – ρ1²)/(1 – ρ1²) = –θ1²/(1 + θ1² + θ1⁴),     ….. (42)

where ρ2 = 0 and ρ1 is as above. For other higher order PACs, we make use of (8). Note that if the MA(1) shock coefficient is positive (negative), both ACF and PACF will be negative (positive); and in both cases the PACF declines exponentially. Also note that for MA(1), the ACF has value zero after the first autocorrelation coefficient, while the PACF declines geometrically; on the other hand, for AR(1), the ACF declines geometrically, while the PACF has value zero after the first partial autocorrelation coefficient. Remember, this is how we identify a MA(1) or AR(1) process.

[Fig. 11: PACF of a MA(1) Series]
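A quick simulated check of this cut-off/decay contrast for an MA(1) (θ1 = 0.8 is an arbitrary illustrative value):

import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(5)
e = rng.standard_normal(1000)
y = e[1:] - 0.8 * e[:-1]                   # MA(1): y_t = e_t - theta1*e_{t-1}

print([round(v, 2) for v in acf(y, nlags=4)[1:]])   # ~ -0.49 at lag 1, then ~0
print([round(v, 2) for v in pacf(y, nlags=4)[1:]])  # negative, decaying tail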

The second order Moving Average process MA(2)

yt = εt – θ1εt–1 – θ2εt–2;     ….. (43a)

with the lag operator, we have

yt = θ(L)εt,     ….. (43b)

where θ(L) = (1 – θ1L – θ2L²).

1. E(yt) = E(εt) = 0; with a constant c, E(yt) = c, as in MA(1).
2. Var(yt) = Var(εt – θ1εt–1 – θ2εt–2) = σ²(1 + θ1² + θ2²), constant.
3. Cov(yt, yt–k) = γk:
with k = 1: γ1 = E(yt yt–1) = E{(εt – θ1εt–1 – θ2εt–2)(εt–1 – θ1εt–2 – θ2εt–3)} = –θ1(1 – θ2)σ².
with k = 2: γ2 = E(yt yt–2) = E{(εt – θ1εt–1 – θ2εt–2)(εt–2 – θ1εt–3 – θ2εt–4)} = –θ2σ².
Cov(yt, yt–k) = γk = 0, for all k > 2.
4. ACF = ρk = γk/γ0 = –θ1(1 – θ2)/(1 + θ1² + θ2²), k = 1; = –θ2/(1 + θ1² + θ2²), k = 2; = 0, k > 2.     ….. (44a)

Thus the process has a memory of only two periods.

5. PACF: The first PAC coefficient = φ1 = ρ1; the second PACC = φ2 = (ρ2 – ρ1²)/(1 – ρ1²),     ….. (44b)

and similarly for other higher order PACs in accordance with (8).

Note that if the MA(2) shock coefficients are positive (negative), both ACF and PACF will be negative (positive); and in both cases the PACF declines exponentially. Also note that for MA(2), the ACF has value zero after the second autocorrelation coefficient, while the PACF declines geometrically; on the other hand, for AR(2), the ACF declines geometrically, while the PACF has value zero after the second partial autocorrelation coefficient. Remember, this is how we identify a MA(2) or AR(2) process.

The general Moving Average process MA(q)

yt = Σ θk εt–k, summed over k = 0, 1, …, q, with θ0 = 1;     ….. (44a)

or

yt = εt – θ1εt–1 – θ2εt–2 – ………. – θqεt–q.     ….. (44b)

With the lag operator, yt = θ(L)εt,     ….. (44c)

where θ(L) = (1 – θ1L – θ2L² – …. – θqL^q).

1. E(yt) = E(εt) = 0; with a constant c, E(yt) = c.
2. Var(yt) = σ²(1 + θ1² + θ2² + …… + θq²), constant.
3. Cov(yt, yt–k) = γk = σ²(–θk + θ1θk+1 + …… + θq–kθq), k = 1, 2, ….., q; = 0, k > q.
4. ACF: ρk = γk/γ0 = (–θk + θ1θk+1 + ….. + θq–kθq)/(1 + θ1² + ….. + θq²), k = 1, 2, ….., q; = 0, k > q.     ….. (45)

The ACF of MA(q) cuts off after lag q: that is, the process has only a q period memory. The PACF, on the other hand, can be shown to be infinite in extent, though tailing off. Expressions for the PACF of MA processes are complicated, but in general they appear to be dominated by combinations of exponential decays for real roots in the lag polynomial, and damped sine waves for complex roots. Note that these patterns are very similar to the ACFs of the AR processes. In fact, by virtue of the duality between AR and MA processes, we find that while the ACF of an AR(p) process is infinite in extent, the PACF cuts off after lag p; the ACF of a MA(q) process cuts off after lag q, while the PACF is infinite in extent.
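This duality can be crudely mechanized; the helper below is only a heuristic spike count, not the Box–Jenkins identification procedure itself:

import numpy as np
from statsmodels.tsa.stattools import acf, pacf

def tentative_orders(y, max_lag=10):
    # Count ACF/PACF spikes outside the +-1.96/sqrt(T) band and read the
    # largest significant lag as a tentative cut-off point.
    band = 1.96 / np.sqrt(len(y))
    a = np.abs(acf(y, nlags=max_lag)[1:]) > band
    p = np.abs(pacf(y, nlags=max_lag)[1:]) > band
    q_hat = max((k + 1 for k in range(max_lag) if a[k]), default=0)
    p_hat = max((k + 1 for k in range(max_lag) if p[k]), default=0)
    return p_hat, q_hat

If the PACF cuts off while the ACF tails away, read the PACF count as a tentative p; in the mirror case, read the ACF count as q; both tailing off suggests a mixed ARMA process.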

The Moving Average process in economic data:

MA processes arise in econometrics through trend elimination or over-differencing:

(1) Trend elimination through differencing:

For example, consider the deterministic trend (also called a trend stationary process, TSP):

yt = a + bt + εt;     ….. (46)

then yt–1 = a + b(t–1) + εt–1, so that

∆yt = yt – yt–1 = b + εt – εt–1.     ….. (47)

∆yt has a MA unit root! Differencing detrends, but generates a MA process that can show a cycle when there is none in the original series; such a ‘spurious cycle’ is called the Slutsky effect (Slutsky 1937). Thus differencing a TSP results in over-differencing.

(2) Over-differencing: Differencing a stationary (white noise) series, say yt = εt, generates a MA process:

∆yt = yt – yt–1 = εt – εt–1.     ….. (48)

∆yt has a MA unit root! Note that if |θ| = 1, as above, the MA process is non-invertible.
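A three-line check of the over-differencing point: differencing pure white noise manufactures a lag-1 autocorrelation near –0.5, the signature –θ/(1 + θ²) with θ = 1:

import numpy as np

rng = np.random.default_rng(11)
e = rng.standard_normal(5000)              # a stationary white noise series
d = np.diff(e)                             # over-differenced: e_t - e_{t-1}

lag1 = np.corrcoef(d[:-1], d[1:])[0, 1]
print(round(lag1, 2))                      # ~ -0.5 = -theta/(1+theta^2), theta = 1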


Autoregressive Moving Average Process: ARMA(p, q)

ARIMA (autoregressive integrated moving average) models are generalizations of the simple AR model that include three components:

The first component is the autoregressive (AR) term of any appropriate order. The second component is the integration order term. Each integration order corresponds to differencing the series being forecast. A first-order integrated component means that the forecasting model is designed for the first difference of the original series, and a second-order component corresponds to using second differences, and so on. The third component is the moving average (MA) term.

The general ARMA(p, q) has an AR of order p and a MA of order q:

yt = φ1yt–1 + φ2yt–2 + …. + φpyt–p + εt – θ1εt–1 – θ2εt–2 – …. – θqεt–q,     ….. (49a)

where εt is a white noise process. With the lag operator:

φ(L)yt = θ(L)εt,     ….. (49b)

where φ(L) = (1 – φ1L – φ2L² – …. – φpL^p) and θ(L) = (1 – θ1L – θ2L² – …. – θqL^q).

It goes without saying that the properties of an ARMA process are a mixture of those of the AR and MA processes.


The simplest ARMA process, ARMA(1, 1):

yt = φ1 yt–1 + εt – θ1εt–1.   ….. (50)

(1) E(yt) = 0; with a constant c, E(yt) = c/(1 – φ1), |φ1| < 1.

(2) Var(yt) = E(φ1 yt–1 + εt – θ1εt–1)² = φ1²Var(yt) – 2φ1θ1E(yt–1εt–1) + Var(εt) + θ1²Var(εt–1). Since Var(εt) = Var(εt–1) = σ² and E(yt–1εt–1) = E(ytεt) = σ²,

Var(yt) = σ²(1 + θ1² – 2φ1θ1)/(1 – φ1²),   |φ1| < 1.

(3) Cov(yt, yt–k) = γk: the covariances are obtained recursively. With k = 1:

γ1 = E(yt yt–1) = E{(φ1 yt–1 + εt – θ1εt–1) yt–1} = φ1Var(yt) – θ1σ² = φ1γ0 – θ1σ² = σ²(1 – φ1θ1)(φ1 – θ1)/(1 – φ1²),   |φ1| < 1.

With k = 2: γ2 = E(yt yt–2) = E{(φ1 yt–1 + εt – θ1εt–1) yt–2} = φ1E(yt–1 yt–2) = φ1E(yt yt–1) = φ1γ1. Thus

γk = φ1γk–1,   k ≥ 2.

(4) ACF:

ρk = γk/γ0 = (1 – φ1θ1)(φ1 – θ1)/(1 + θ1² – 2φ1θ1),   k = 1;
   = φ1ρk–1,   k ≥ 2.   ….. (51)


Thus the ACF decays geometrically from the starting value ρ1, which is a function of both φ1 and θ1. Note that the decay is governed by φ1 alone, as the MA part has only a one-period memory. Also note that when φ1 = θ1, ρk = 0 for all k, as yt collapses to a white noise process:

yt = εt + θ1(yt–1 – εt–1).   ….. (52)

(5) PACF: The first partial autocorrelation coefficient = φ1 = ρ1; the second = φ2 = (ρ2 – ρ1²)/(1 – ρ1²), and similarly for the other higher-order coefficients. The PACF generally declines as the lag increases.
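As a numerical check on (51), one can compare the closed-form ACF with the one computed by a standard package; a minimal sketch, with illustrative parameter values φ1 = 0.7 and θ1 = 0.4 (not taken from the text):

```python
# A sketch verifying the ARMA(1,1) ACF in (51); phi1 and theta1 are
# illustrative choices. statsmodels takes the lag polynomials phi(L), theta(L)
# directly, so the signs below follow the text's conventions.
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

phi1, theta1 = 0.7, 0.4
proc = ArmaProcess(ar=[1, -phi1], ma=[1, -theta1])

rho1 = (1 - phi1*theta1) * (phi1 - theta1) / (1 + theta1**2 - 2*phi1*theta1)
closed_form = [rho1 * phi1**(k - 1) for k in range(1, 6)]   # rho_k = phi1 * rho_{k-1}

print(np.round(closed_form, 4))
print(np.round(proc.acf(lags=6)[1:], 4))   # the two sequences coincide
```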

Non-stationarity and differencing:

Consider the random walk: yt = yt–1 + εt, where εt is a white noise process. It is obvious that its first difference, ∆yt = yt – yt–1 = εt, is a stationary process. If a non-stationary series is transformed into a stationary one by differencing once, the series in levels is said to be integrated of order one, denoted I(1), which also implies one unit root; the differenced, stationary series is then integrated of order zero, I(0), with no unit root.3 Thus we write yt ∼ I(1) and ∆yt ∼ I(0). Similarly, if a non-stationary series yt is transformed into a stationary one by differencing d times, ∆d, the series in levels is said to be integrated of order d: yt ∼ I(d); then ∆dyt is a stationary process. We can extend this definition to autoregressive integrated moving average (ARIMA) processes as follows: if ∆dyt is a stationary process that can be represented by an ARMA(p, q) model, then yt is an integrated autoregressive moving average (ARIMA) process of order p, d, and q, that is, ARIMA(p, d, q), where d refers to the number of differences, or the order of integration.

3 Note that integration is the inverse of differencing.

ARIMA Modelling: Box – Jenkins Methodology:

AR models were first introduced by Yule (1926) and later generalized by Walker (1931); MA models were first used by Slutsky (1937). Wold (1938) provided a theoretical foundation for combined ARMA processes. However, it was George Box and Gwilym Jenkins (1970; 1976) who comprehensively put together all the threads of autoregressive integrated moving average (ARIMA) modelling, such that 'time series/ARIMA modelling' has come to be used synonymously with 'Box-Jenkins methodology'.

The general model introduced by Box and Jenkins (1976) includes autoregressive as well as moving average parameters, and explicitly includes differencing in the formulation of the model. Specifically, the three types of parameters in the model are: the autoregressive parameters (p), the order of differencing required (d), and the moving average parameters (q). In the notation introduced by Box and Jenkins, models are summarized as ARIMA(p, d, q); so, for example, a model described as (0, 1, 2) means that it contains p = 0 (zero) autoregressive parameters and q = 2 moving average parameters, computed for the series after it was differenced once (d = 1).

The objective of Box – Jenkins methodology is to obtain a parsimonious model, one that describes all the features of the data of interest using as few parameters (or as simple a model) as possible. Why?

Remember that the variance of the estimators is inversely related to the number of degrees of freedom:

Var(β̂) = σ̂u²(X′X)⁻¹,   where σ̂u² = Σût² / (T – k).   …. (53)

Thus the larger the k, the higher the standard errors, which in turn implies smaller t-values and hence a greater risk of not rejecting a false null, that is, a Type 2 error.

There are 3 stages involved in ARIMA model building: identification, estimation and diagnostic checking (see the flowchart given below).

Identification.

The first stage in ARIMA model building is identification (see the Table below for details).

As mentioned earlier, the input series for ARIMA needs to be stationary, that is, it should have a constant mean, variance, and autocorrelation over time. Therefore, usually the series first needs to be differenced until it is stationary (this also often requires log transforming the data to stabilise the variance). As already explained, the number of times the series needs to be differenced to achieve stationarity is reflected in the d parameter. In order to determine the necessary level of differencing, one should examine the plot of the data and autocorrelogram. Significant changes in level (strong upward or downward changes) usually require first order non-seasonal (lag = 1) differencing; strong changes of slope usually require second order non-seasonal differencing. Seasonal patterns require respective seasonal differencing. If the estimated autocorrelation coefficients decline slowly at longer lags, first order differencing is usually needed. However, one should keep in mind that some time series may require little or no differencing, and that over-differenced series produce less stable coefficient estimates.

In a typical ARIMA modelling application, it is preferable that there be a minimum of about 50 data points in the series in order to get reasonably accurate maximum likelihood estimates (MLE) for the parameters. Therefore we need to think of modelling at all only if at least the minimum required information is available.


Plot of the original series.

A visual inspection of a plot of the time series may reveal one or more of the following characteristics:

(1) seasonality, (2) trends either in the mean level or in the variance of the series, (3) persistence, (4) long-term cycles, or (5) extreme values and outliers.

Autocorrelation function (ACF).

Before the estimation can begin, we need to identify the specific number and type of ARIMA parameters necessary to yield an effective but still parsimonious model of the process (as noted earlier, parsimonious model means that it has the fewest parameters and greatest number of degrees of freedom among all models that fit the data). The major tools used in the identification phase are plots of the series, correlograms of autocorrelation (ACF), and partial autocorrelation (PACF).

The ACF, ρk, measures the amount of linear dependence between observations in a time series that are separated by lag k. As already explained above, Box and Jenkins (1970: 32-36) recommend a specific estimation procedure to find an estimate rk for ρk and also give approximate standard errors for the ACF estimates. To use the ACF in model identification, calculate and then plot rk against lag k up to a maximum lag of roughly T/4.
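A minimal sketch of these identification tools, assuming `y` is any stationary (already differenced, if necessary) series held in a numpy array:

```python
# Sample ACF and PACF up to roughly T/4 lags, with the approximate 95% bands
# (+/- 1.96/sqrt(T)) used to judge which coefficients are significant.
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

def identify(y):
    T = len(y)
    nlags = T // 4                    # maximum lag of roughly T/4
    r = acf(y, nlags=nlags)
    phi = pacf(y, nlags=nlags)
    band = 1.96 / np.sqrt(T)          # approximate limits under white noise
    for k in range(1, nlags + 1):
        flag = "*" if abs(r[k]) > band or abs(phi[k]) > band else ""
        print(f"lag {k:2d}: acf = {r[k]:6.3f}   pacf = {phi[k]:6.3f} {flag}")
```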

The first step is to examine a plot of the ACF to detect the presence of non-stationarity in the series. When the data are non-seasonal, failure of the ACF to damp out indicates that non-seasonal differencing is needed. For seasonally correlated data with the seasonal length equal to s, the ACF often follows a wave pattern with peaks at s, 2s, 3s, and other integer multiples of s. As is shown by Box and Jenkins (1970: 174-175), if the estimated ACF at lags that are integer multiples of the seasonal length s does not die out rapidly, this may indicate that seasonal differencing is needed to produce stationarity. Failure of other ACF estimates to damp out may imply that non-seasonal differencing is also required.

Once the data have been differenced just enough to produce non-seasonal stationarity for a non-seasonal time series, and both seasonal and non-seasonal stationarity for seasonal data, then check the ACF of the series to determine the number of AR and MA parameters required in the model. The series is also used at the other steps of the identification procedure. The ACF for the series should not be computed beyond a maximum lag of approximately T/4. For a seasonal model a maximum lag of about 5s (where 5s < T/4) is usually sufficient.

When the series is not white noise, then the following general rules may be invoked to help determine the type of model required.

Nonseasonal model:

For a pure MA (0, d, q) process, rk cuts off and is not significantly different from zero after lag q. When the model is pure MA, then after lag q (Bartlett, 1946),

Var(rk) ≅ (1/T)(1 + 2 Σ (j = 1 to q) rj²),   k > q.   ….. (56)

If rk tails off and does not truncate, then the appropriate model for the time series might include AR terms.

Partial autocorrelation function (PACF).

The theoretical PACF, φk, for an AR process of order k satisfies the Yule-Walker equations (Box and Jenkins, 1970: Chapter 3). For model identification, calculate and plot the estimates of φk against lag k. These estimated partial autocorrelations are usually calculated for about 20 to 40 lags (where 40 < T/4). For seasonal models, higher lags of the PACF may be required for identification. The following general rules may prove helpful for interpreting the PACF of the series:

Nonseasonal model:

For a pure AR (p, d, 0) process, the estimated PACF truncates and is not significantly different from zero after lag p. After lag p, the estimates are approximately NID(0, 1/T). However, if the PACF tails off, then MA terms are required to model the series.

The decision, however, is not straightforward and in less typical cases requires not only experience but also a good deal of experimentation with alternative models (as well as the technical parameters of ARIMA). However, a majority of empirical time series patterns can be sufficiently approximated using one of the 5 basic models given below that can be identified based on the shape of the autocorrelogram (ACF) and partial autocorrelogram (PACF):

One autoregressive (p) parameter: ACF – exponential decay; PACF – spike at lag 1, no correlation for other lags.

Two autoregressive (p) parameters: ACF – a sine-wave shape pattern or a set of exponential decays; PACF – spikes at lags 1 and 2, no correlation for other lags.

One moving average (q) parameter: ACF – spike at lag 1, no correlation for other lags; PACF – damps out exponentially.

Two moving average (q) parameters: ACF – spikes at lags 1 and 2, no correlation for other lags;

PACF – a sine-wave shape pattern or a set of exponential decays.

One autoregressive (p) and one moving average (q) parameter: ACF – exponential decay starting at lag 1; PACF – exponential decay starting at lag 1.

Also, note that since the number of parameters (to be estimated) of each kind is almost never greater than 2, it is often practical to try alternative models on the same data. Also see the Table given below.

Estimation: There are several different methods for estimating the parameters. All of them should produce very similar estimates, but may be more or less efficient for any given model. In general, during the parameter estimation phase a function minimization algorithm is used (the so-called quasi-Newton method) to maximize the likelihood (probability) of the observed series, given the parameter values. In practice, this requires the calculation of the (conditional) sums of squares (SS) of the residuals, given the respective parameters. Different methods have been proposed to compute the SS for the residuals: (1) the approximate maximum likelihood method, (2) the approximate maximum likelihood method with backcasting, and (3) the exact maximum likelihood method.
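For reference, modern software automates this stage; a minimal sketch with statsmodels, which by default maximizes the (Gaussian) likelihood through a state-space representation (the ARMA(1,1) order and the series `y` are assumptions for illustration):

```python
# A sketch of the estimation stage: fit an ARMA(1,1) by maximum likelihood.
from statsmodels.tsa.arima.model import ARIMA

res = ARIMA(y, order=(1, 0, 1)).fit()   # p = 1, d = 0, q = 1
print(res.summary())                    # estimates, standard errors, AIC/BIC
```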

Box and Jenkins (1970: Chapter 7) suggest that the approximate MLE for the ARIMA model parameters be obtained by minimizing the unconditional sum of squares function.

McLeod (1976) suggests minimizing a modified sum of squares function which provides parameter estimates that are closer approximations to the exact maximum likelihood estimates than those of Box and Jenkins (1970: Chapter 7). Utilizing simulation experiments, McLeod (1976) demonstrates some inherent properties of the modified estimation procedure that yield improved parameter estimates. Some general conclusions regarding the advantages of the modified sum of squares method over the unconditional sum of squares technique are that (a) the modified method is more efficient, (b) the modified method gives better parameter estimates for shorter series, and (c) significantly improved parameter estimates are obtained for MA parameters. An additional advantage of the modified sum of squares method is that it is an exact MLE procedure for AR processes.

Various optimization techniques are available to minimize the sum of squares function, unconditional or modified. Some of the optimization algorithms that have been extensively applied include (a) Gauss linearization (Draper and Smith, 1966: Chapter 10), (b) steepest descent (Draper and Smith, 1966: Chapter 10), (c) the Marquardt algorithm, a combination of (a) and (b) (Marquardt, 1963), and (d) conjugate directions (Powell, 1964, 1965). McLeod (1976) recommends the use of conjugate directions to minimize the modified sum of squares function.

Box-Cox Transformation:

Sometimes, at the estimation stage, the data may need to be transformed by a Box-Cox transformation; this is important because in Box-Jenkins modelling the residuals are assumed to be independent, homoscedastic (i.e., of constant variance), and usually normally distributed. The independence assumption is the most important of all, and its violation can have drastic consequences (Box and Tiao, 1973: 522); the normality assumption is critical for many univariate intervals and hypothesis tests. The assumption of normality often leads to tests that are simple, mathematically tractable, and powerful compared to tests that do not make the normality assumption. Unfortunately, many real data sets are in fact not approximately normal. However, an appropriate transformation of a data set can often yield a data set that does follow approximately a normal distribution. This increases the applicability and usefulness of statistical techniques based on the normality assumption.


It should however be noted that the normality assumption of the residuals is usually not critical for obtaining good parameter estimates. As long as the errors are independent and possess finite variance, reasonable estimates (called Gaussian estimates) of the parameters can be obtained (Hannan, 1970: 372). McLeod (1974: 76-85) demonstrates this fact by Monte Carlo experiments. Simulated data from specified models with known parameters are generated for the errors distributed as uniform, double exponential, contaminated normal, and normal. For the first three cases, Gaussian estimates of the parameters for the generating model fit to the simulated data are very close to the known values. Of course this is also true for the MLE of the parameters for the normal case.

In practice, however, it is advantageous to satisfy the normality assumption reasonably well, for a number of reasons. First, parameter estimates can be expected to improve at least slightly if a suitable transformation of the series makes the residuals approximately normal. Second, without the normality assumption, estimation of confidence intervals for the forecasted data would be impossible for practical use. And finally, if the diagnostic analysis of the residuals from the model estimation reveals errors that are heterogeneous and often non-normal, then a Box-Cox power transformation of the dependent variable is a useful method to alleviate heteroscedasticity when the distribution of the dependent variable is unknown (McLeod, 1974: 14; Box and Cox, 1964). In cases where the dependent variable y is positive, the following transformation can be used:

y(λ) = {(y + constant)^λ – 1}/λ,   when λ ≠ 0;
     = log(y + constant),   when λ = 0,   ….. (54)

where y is the variable being transformed and λ is the transformation parameter. Note that for λ = 0, the natural log of the data is used, since

lim (λ → 0) {(y + constant)^λ – 1}/λ = log(y + constant).   ….. (55)


In certain instances a chosen standardized transformation, such as a natural logarithm or a square root, may fail to remove heteroscedasticity and/or non-normality of the residuals. It may therefore be desirable to compute an MLE of λ. The first step is to choose a value for the constant that causes all the values of the given series to be greater than zero. Then λ can be estimated simultaneously with the other model parameters.

Once the transformation is carried out, a normal probability plot is computed, and the linearity of the normal probability plot is summarized by means of the correlation coefficient, computed between the vertical and horizontal axis variables of the probability plot; the correlation coefficient is a convenient measure of the linearity of the probability plot – the more linear the probability plot, the better a normal distribution fits the data. The Box-Cox normality plot is a plot of these correlation coefficients for various values of the λ parameter. The value of λ corresponding to the maximum correlation on the plot is then the optimal choice for λ. Note that it is not recommended that λ be estimated unless diagnostic checks for the residuals indicate that the normality and/or constant variance assumptions are not satisfied.
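Both routes to λ, maximum likelihood and the probability-plot correlation just described, are available off the shelf; a minimal sketch with scipy, assuming `y` holds the series:

```python
# A sketch of choosing the Box-Cox lambda for a positive series.
import numpy as np
from scipy import stats

y_pos = y - y.min() + 1.0                    # add a constant so all values > 0
_, lam_mle = stats.boxcox(y_pos)             # lambda by maximum likelihood
lam_ppcc = stats.boxcox_normmax(y_pos, method="pearsonr")   # lambda maximizing the
                                             # normality-plot correlation coefficient
print(lam_mle, lam_ppcc)                     # the two criteria usually agree closely
```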

Usually, a Box-Cox transformation does not change the form of the model. However, this is not always true: as pointed out by Granger and Newbold (1976), certain transformations can change the type of model to be estimated. When a transformation does change the type of model to be used, diagnostic checks would detect this fact, and the transformed data can then be properly identified.

Diagnostic Checking:

The tentative model is then diagnosed for adequacy, to find whether the residuals satisfy the classical assumptions. The correlogram is used here to verify the non-autocorrelated behaviour of the residuals (see below for details).

Evaluation of the Model (Diagnostic Checking)

Parameter estimates:

We need to test for the significance of the t values, computed from the parameter standard errors. If not significant, the respective parameter can in most cases be dropped from the model without affecting substantially the overall fit of the model.

Other quality criteria:

Another straightforward and common measure of the reliability of the model is the accuracy of its forecasts generated from partial data, so that the forecasts can be compared with known (original) observations.

Model Adequacy Tests (Tests for whiteness of the residuals)

An important stage that precedes hypothesis testing in regression modelling is model adequacy diagnostic checking. The fitted model is said to be adequate if it explains the data set adequately, i.e., if the residual does not contain (or conceal) any 'explainable non-randomness' left over from the ('explained') model. It is assumed that the error term in the model is a normally distributed white noise (with zero mean, constant (finite) variance, no serial (auto) correlation and no (cross) correlation with the explanatory variables). Since the ordinary least squares (OLS) estimators are linear functions of the error term, they themselves are (under its normality assumption) normally distributed. This normality assumption is essential for deriving the probability (sampling) distributions of the OLS estimators and facilitates hypothesis testing using t and F statistics, which follow t and F distributions, in finite samples, only under the normality assumption. Hence a diagnostic check of the normality assumption must be carried out before proceeding with hypothesis (significance) tests. The normality test described in Doornik and Hansen (1994) tests whether the skewness and kurtosis of the OLS residuals correspond to those of a normal distribution. A reasonably high probability (p-) value, associated with a small test statistic value, indicates non-rejection of the normality assumption. It should also be noted here that the mean of the OLS residuals is zero by construction when an intercept is included in the model.

The no serial correlation assumption may be tested by checking whether the residual autocorrelation coefficients are statistically zero compared with standard deviation limits. Alternatively, we can test the joint hypothesis that all the autocorrelation coefficients (for a given lag) are statistically zero, using the residual correlogram ('portmanteau') statistic, viz., the Ljung-Box (1978) statistic.4 Too large a value of the 'portmanteau' statistic can be viewed as evidence against model adequacy; conversely, a large p-value confirms model adequacy. However, as residual autocorrelations are biased towards zero when a lagged dependent variable is included as a regressor in the model, this (as well as the Durbin-Watson, DW) statistic is then not reliable. The correct procedure in such conditions is to use a Lagrange Multiplier (LM) test as residual correlogram; the F-form LM test, suggested by Harvey (1981), is the recommended diagnostic test of no residual autocorrelation. The Durbin h test for first-order serial correlation is an LM test. It should also be noted that a low DW statistic need not be due to autoregressive errors, warranting correction for first-order autoregression5 (AR(1)). Mis-specification in time series models can also induce serious serial correlation among the residuals, reflected in a low DW statistic. The RESET (Regression Specification Test, due to Ramsey 1969) tests the null of no functional form mis-specification, which is rejected if the test statistic is too high.
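A minimal sketch of the portmanteau check, assuming `res` is a fitted statsmodels results object from the estimation stage:

```python
# Ljung-Box Q statistic on the residuals; large p-values support model adequacy.
from statsmodels.stats.diagnostic import acorr_ljungbox

print(acorr_ljungbox(res.resid, lags=[10, 20]))   # Q and p-value at lags 10 and 20
```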

4 This corresponds to the Box-Pierce Q statistic (Box and Pierce 1970), but with a degrees of freedom correction (Ljung and Box 1978), and has better small sample properties than the Box-Pierce Q statistic.
5 Hendry and Doornik (1999) remark: "…most tests also have some power to detect other alternatives, so rejecting the null does not entail accepting the alternative, and in many instances, accepting the alternative would be a non sequitur" (p. 187). "Perhaps the greatest non sequitur in the history of econometrics is the assumption that autocorrelated residuals entail autoregressive errors, as is entailed in 'correcting serial correlation using Cochrane-Orcutt'" (p. 131).

In addition to these, the assumption of no heteroscedastic errors should also be checked, using, say, White's (1980) general heteroscedasticity (F-) test; a small p-value (associated with a large F-value) rejects the null of no heteroscedasticity in the errors. Often the observed serial correlation in errors may be due to what is called the autoregressive conditional heteroscedasticity (ARCH) effect, which makes the residual variance at time t depend on past squared disturbances (Engle 1982). Hence it is advisable to test for the ARCH effect too before accepting the routine statistics at face value. We can also test for instability of the parameters in the model through a joint (F-) statistic, large values of which reveal parameter non-constancy and indicate a fragile model with some structural breaks (Hansen 1992). Note that the indicated significance is valid only in the absence of non-stationary regressors.

Thus a good model should not only provide sufficiently accurate forecasts; it should also be parsimonious and produce statistically independent residuals that contain only noise and no systematic components (e.g., the correlogram of the residuals should not reveal any serial dependencies). A good test of the model, in addition to the above, is (a) to plot the residuals and inspect them for any systematic trends, and (b) to examine the autocorrelogram of the residuals (there should be no serial dependency between residuals).

Model Selection Criteria

Another important aspect of ARIMA model building is employing model selection criteria. Based on the usual preliminary specification and residual-based diagnostics, several models often appear essentially equivalent in representing the behaviour of a time series. In such cases, we have to resort to model selection criteria. Note that the so-called statistical model refers to the distribution of the random term, εi, i = 1, …, n, in, say, the regression equation; more specifically, it refers to the joint distribution of εi, i = 1, …, n, not the trend function y = α + βx.

The model selection criteria take account simultaneously of both the goodness of fit (likelihood) of a model and the number of parameters used to achieve that fit. The criteria take the form of a penalized likelihood function, that is, the negative log likelihood plus a penalty term which increases with the number of parameters.

As already stated, given a class of almost equivalent models, and assuming that there is a true model, the goal of statistical analysis is to find the model, among all the available ones, that is closest to the true model. The first question then is how to define such closeness. In information theory, if the true model is characterized by a probability density function (pdf) f(x, θ*), and the estimated model is characterized by g(x, θ), then the closeness between the two is given by the Kullback-Leibler (K-L) information, or Kullback-Leibler (K-L) distance:

I(f(x, θ*), g(x, θ)) = ∫ f(x, θ*) log {f(x, θ*)/g(x, θ)} dx
                     = ∫ f(x, θ*) log f(x, θ*) dx – ∫ f(x, θ*) log g(x, θ) dx.

It can be shown that I(f(x, θ*), g(x, θ)) ≥ 0, with the distance equal to zero if and only if f(x, θ*) = g(x, θ). The smaller the distance I(f(x, θ*), g(x, θ)), the better g(x, θ) approximates f(x, θ*). In the K-L distance, the first term is not estimable, but it is a constant across all competing models for a given f(x, θ*) and so can be ignored. The second term can be estimated from the given sample data. A strategy called the Entropy Maximization Principle (EMP) (Akaike, 1974) is used to choose a g(x, θ) that maximizes the second term, which is equivalent to minimizing the distance. In statistical inference there are different ways to evaluate the second term, each leading to a particular information criterion.

In the theory of estimation, the information content (of a parameter) in a random sample is represented by the variance of its (unbiased) estimator; the smaller the variance, the larger the information. In an ARIMA model, adding more lags (p, q) necessarily reduces the residual sum of squares, so that R², the goodness of fit, increases; but the degrees of freedom fall, and the error variance estimate increases. At the same time, parsimony demands including fewer parameters: if a MA(2) model provides the same fit as an AR(10) model, the law of parsimony selects the former. Thus there is a trade-off between goodness of fit and parsimony. This trade-off is incorporated into information criteria, IC(k), where k stands for the number of parameters in the model. In general,

IC(k) = T ln σ̂² + k f(T),   for k = 1, …, p + q,

where σ̂² is the estimated error variance, given by RSS/(T – k), and k f(T) is a 'penalty' for increasing the order of the model, that is, for the loss of degrees of freedom.

Depending upon the penalty factor f(T), we have different information criteria:

(1) When f(T) = 2, we have the Akaike information criterion (AIC: Akaike 1974):

AIC = T ln σ̂² + 2k.

(2) When f(T) = ln T, we have the Schwarz information criterion, or Bayesian information criterion (SIC or BIC: Schwarz 1978):

BIC = T ln σ̂² + k ln T.

(3) When f(T) = 2 ln(ln T), we have the Hannan-Quinn information criterion (HQIC: Hannan and Quinn 1979):

HQIC = T ln σ̂² + 2k ln(ln T).

BIC gives more weight to k than AIC if T > 7, such that an increase in k requires a larger fall in σ̂² under BIC than under AIC. As ln T > 2 for such T, BIC always selects a more parsimonious model than AIC. For HQIC, the weight on k exceeds 2 if T > 15.


In the general model selection process, we fit all the models under consideration and choose the one that minimizes the information criterion. One significant advantage of this approach is that it removes the ambiguities of sequential hypothesis testing and provides an objective tool for model selection.
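A minimal sketch of criterion-based selection over a small grid of candidate orders (the grid limits are illustrative; `y` is the stationary series being modelled):

```python
# Fit all candidate ARMA(p, q) models and pick the criterion minimizers.
import itertools
from statsmodels.tsa.arima.model import ARIMA

ic = {}
for p, q in itertools.product(range(3), range(3)):
    res = ARIMA(y, order=(p, 0, q)).fit()
    ic[(p, q)] = (res.aic, res.bic, res.hqic)

print("AIC picks:", min(ic, key=lambda k: ic[k][0]))
print("BIC picks:", min(ic, key=lambda k: ic[k][1]))   # typically the more parsimonious
print("HQIC picks:", min(ic, key=lambda k: ic[k][2]))
```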

Over-fitting

One of the diagnostic checks is devised to test model adequacy by over-fitting. This test assumes that the possible types of model inadequacy are known in advance. Over-fitting involves fitting a more elaborate model than the one estimated, to see if including one or more extra parameters greatly improves the fit. Extra parameters should be estimated for the more complex model only where it is feared that the simpler model may require more parameters. For example, the PACF may possess decreasing but significant values at lags 1 and 2. If an ARMA(2, 0) model is originally estimated, then a model to check by over-fitting would be ARMA(3, 0). An MLE of φ3 three or four times its standard error would definitely indicate that the new model be selected. In this case the AIC, and the remaining tests in the diagnostic checking stage, would also point to the higher-order model as the best one to use. In general, if an ARIMA(p, d, q) model is tentatively selected, then also try the two models ARIMA(p+1, d, q) and ARIMA(p, d, q+1) and check the significance of the estimates of the additional parameters.
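A minimal sketch of this over-fitting check, with an assumed tentative order (p, d, q) = (2, 0, 0) and series `y`:

```python
# Fit the two neighbouring models and inspect the t-ratio of the added
# ARMA parameter (the last parameter before the error variance).
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

p, d, q = 2, 0, 0                       # tentatively selected order (assumed)
for order in [(p + 1, d, q), (p, d, q + 1)]:
    res = ARIMA(y, order=order).fit()
    tvals = np.asarray(res.params) / np.asarray(res.bse)
    print(order, "t-ratio of the added parameter:", round(tvals[-2], 2))
```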

Forecasting:

If the residual analysis is satisfactory, the tentative model is accepted. The estimates of the parameters are used in this last stage (of forecasting) to calculate new values of the series (beyond those included in the input data set) and confidence intervals for those predicted values.


Flow Chart 1: Box – Jenkins Methodology

1. Plot the series. Does the series appear stationary? If not, apply a transformation and/or differencing and plot again; if yes, obtain the ACF and PACF.

2. Does the correlogram (ACF) decay to zero? If not, difference the series and return to step 1; if yes, proceed to identification.

3. Identification (model selection: check the ACF and PACF). A sharp cut-off in the ACF suggests an MA model; a sharp cut-off in the PACF suggests an AR model; if neither cuts off sharply, an ARMA model is indicated.

4. Estimate the parameters.

5. Diagnosis: are the residuals white noise (check their ACF and PACF)? If not, modify the model and return to estimation; if yes, forecast.

Table 1: Identification of ARIMA Processes

ARIMA(0,0,0), white noise:
ACF – no significant spikes; PACF – no significant spikes.

ARIMA(0,1,0), d = 1:
ACF – slow decay, all spikes significant; PACF – single significant spike at the order of differencing (lag 1).

ARIMA(1,0,0), φ1 > 0:
ACF – exponential decay, with the first few lags significant; PACF – single significant positive spike at lag 1.

MA(1), θ1 > 0:
ACF – one negative spike at lag 1; PACF – exponential decay of negative spikes.

MA(1), θ1 < 0:
ACF – one positive spike at lag 1; PACF – oscillating decay of positive and negative spikes.

MA(2), θ1, θ2 > 0:
ACF – two negative spikes at lags 1 and 2; PACF – exponential decay of negative spikes.

MA(2), θ1, θ2 < 0:
ACF – two positive spikes at lags 1 and 2; PACF – oscillating decay of positive and negative spikes.

MA(q):
ACF – spikes up to lag q; PACF – exponential/oscillating decay.

Hybrid Processes: ARIMA(p, 0, q)

ARIMA(1, 0, 1), φ1 > 0, θ1 > 0:
ACF – exponential decay of positive spikes; PACF – exponential decay of positive spikes.

ARIMA(1, 0, 1), φ1 > 0, θ1 < 0:
ACF – exponential decay of positive spikes; PACF – oscillating decay of positive and negative spikes.

ARIMA(1, 0, 1), φ1 < 0, θ1 > 0:
ACF – oscillating decay; PACF – exponential decay of negative spikes.

ARIMA(1, 0, 1), φ1 < 0, θ1 < 0:
ACF – oscillating decay of negative and positive spikes; PACF – oscillating decay of negative and positive spikes.

ARIMA(p, 0, q):
ACF – decay (either direct or oscillatory) beginning at lag q; PACF – decay (either direct or oscillatory) beginning at lag p.

The estimation process is performed on transformed (differenced) data; before the forecasts are generated, the series needs to be integrated (integration is the inverse of differencing) so that the forecasts are expressed in values compatible with the input data. This automatic integration feature is represented by the letter I in the name of the methodology (ARIMA = Auto-Regressive Integrated Moving Average).
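A minimal sketch of the forecasting stage; statsmodels performs the re-integration implied by d internally, so forecasts come back on the original scale (the ARIMA(1, 1, 1) order and 12-step horizon are illustrative):

```python
# Point forecasts and confidence intervals from a fitted ARIMA model.
from statsmodels.tsa.arima.model import ARIMA

res = ARIMA(y, order=(1, 1, 1)).fit()
fc = res.get_forecast(steps=12)
print(fc.predicted_mean)           # forecasts in the units of the original series
print(fc.conf_int(alpha=0.05))     # 95% confidence intervals
```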

The constant in ARIMA models.

In addition to the standard autoregressive and moving average parameters, ARIMA models may also include a constant, as described above. The interpretation of a (statistically significant) constant depends on the model that is fit. Specifically,

(1) if there are no autoregressive parameters in the model, then the expected value of the constant is the mean of the series;

(2) if there are autoregressive parameters in the model, then the constant represents the intercept.

If the series is differenced, then the constant represents the mean or intercept of the differenced series. For example, if the series is differenced once, and there are no autoregressive parameters in the model, then the constant represents the mean of the differenced series, and therefore the linear trend slope of the un-differenced series.

Seasonal models.

Multiplicative seasonal ARIMA is a generalisation and extension of the method introduced in the previous paragraphs to series in which a pattern repeats seasonally over time. In addition to the non-seasonal parameters, seasonal parameters for a specified lag (established in the identification phase) need to be estimated. Analogous to the simple ARIMA parameters, these are: seasonal autoregressive (ps), seasonal differencing (ds), and seasonal moving average (qs) parameters. The model is given as ARIMA(p, d, q)(ps, ds, qs). For example, the model (0,1,2)(0,1,1) describes a model that includes no autoregressive parameters, 2 regular moving average parameters and 1 seasonal moving average parameter, and these parameters are computed for the series after it has been differenced once with lag 1 and seasonally differenced once. The seasonal lag used for the seasonal parameters is usually determined during the identification phase and must be explicitly specified.

The general recommendations concerning the selection of parameters to be estimated (based on ACF and PACF) also apply to seasonal models. The main difference is that in seasonal series, ACF and PACF will show sizable coefficients at multiples of the seasonal lag (in addition to their overall patterns reflecting the non-seasonal components of the series).
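A minimal sketch of the (0,1,2)(0,1,1) example above for monthly data (the seasonal length s = 12 is an assumption; the text leaves the seasonal lag to the identification phase):

```python
# A multiplicative seasonal ARIMA(0,1,2)(0,1,1)_12 fitted by state-space MLE.
from statsmodels.tsa.statespace.sarimax import SARIMAX

res = SARIMAX(y, order=(0, 1, 2), seasonal_order=(0, 1, 1, 12)).fit(disp=False)
print(res.summary())
```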



Chapter 3: Some Methodological Issues

They argue: There is a method even in madness…. And my Socrates adds: Vice versa!

An unfortunate consequence of the widespread fascination for econometrics in the context of the Indian power sector analysis has been the fatality of a consistent, logical approach to power demand analysis, in the particular confines of its objective environment. Correlating aggregate electricity consumption and gross domestic product in the industrialized, advanced countries, where electricity service contributes significantly to everyday, social, and work life, has become a standard tool of simple analysis for some obviously general conclusions and is beyond any methodological error as such there. However, the story is different and turns vicious with attempts at mapping this methodology on to an alien range in an underdeveloped power system where the contribution of the service of electricity is insignificant. This is so even in the industrial sector in India.

Correlating the Less Correlatables

For example, in 1994-95, the cost of fuels, electricity, and lubricants consumed in the production process in all the industries in the factory sector in India was just 9.64 per cent of the total input costs (and in Kerala, only 6.01 per cent). The cost of electricity purchased was only 4.02 per cent (and in Kerala, 2.85 per cent). Again, this was only 14.8 per cent of the net value added (in Kerala, 10.41 per cent). (Annual Survey of Industries, Factory Sector, 1994-95, Vol. 1, Tables 3 and 9.) In 1995-96, the percentage share of fuels, electricity, and lubricants consumed in the ASI factory sector of India in the value of total inputs was only 9.56 per cent, and that in the value of products, 7.81 per cent; in Kerala, it was respectively only 5.77 and 4.72 per cent (Statistical Abstract – India, 1998, Central Statistical Organization, Government of India, Table 9.1, pp. 85-86). During the 1990s (1991-92 to 1997-98), power and fuel expenses of the whole manufacturing sector in India remained at about 6 per cent of net sales and at about 7.5 per cent of total production costs. The percentage share of electrical energy consumption in net sales, as well as in total production costs (given in brackets below), in some of the major industries in India in 1997-98 was as follows:

• Textiles: 8.8 (10.2);
• Chemicals: 4.5 (5.8);
• Fertilizer: 13.7 (17.6);
• Pesticides: 4.4 (5.8);
• Polymers: 11.2 (14.2);
• Petroleum Products: 0.9 (1.2);
• Cement: 30.1 (44.0);
• Metal and Metal Products: 10.1 (12.1);
• Steel: 9.8 (11.6);
• Ferrous Metals: 9.1 (10.7);
• Non-Ferrous Metals: 15.3 (19.8);
• Machinery: 2.0 (2.6);
• Electrical Machinery: 3.0 (3.9);
• Paper: 22.6 (26.2).

(Source: CMIE, Industry – Financial Aggregates and Ratios, June, 1999, Company Finance.)

The methodology is questionable even in industrial sector power demand analysis in a less industrialized region like Kerala, that too with a very limited number of electricity-intensive industries. Adoption of this methodology here then amounts to correlating the national/state domestic product or industrial product exclusively with an insignificant input, in violation of the ethics of a consistent and logical analytical exercise, and results in gross specification error. Moreover, there are a large number of small scale and cottage industries that use practically little electricity but together contribute significantly to the industrial product.

Thus the percentage share of the unregistered firms in the manufacturing sector's contribution to net domestic product (at current prices) in India was 35.4 per cent; in 1994-95, it was 34.99 per cent (and in 1970-71, 1980-81, and 1990-91, it was respectively 46.66, 46.25, and 39.08 per cent). In the case of Kerala, about 41 per cent of the net State domestic product that originated in the manufacturing sector was contributed by the unregistered firms in 1997-98. It was found in 1994-95 that about 66 per cent of the enterprises in rural India (and about 52 per cent in urban India) did not use any energy in their manufacturing process (Unorganized Manufacturing Enterprises in India – NSS 51st Round, 1994-95, NSSO, GOI, July 1998, p. ii). Only 7.9 per cent of the rural firms and 30.4 per cent of the urban firms in India (and 11.9 and 17.7 per cent respectively in Kerala) are reported to have used some electrical energy in their production process in that year (ibid., pp. 35-36). The contribution of the unorganized manufacturing sector, on the other hand, in terms of gross value added to the national economy in 1994-95 was estimated at Rs. 322748.9 million, of which 41 per cent came from the rural sector; and that to the (Kerala) State's economy at Rs. 6466.4 million, with 72.07 per cent from the rural enterprises (ibid., pp. 57-64). It is worth noting that among the Extra High Tension (EHT) consumers in Kerala, only 37.5 per cent were consuming above 50 million units (MU) of electricity in 1995. All this is in addition to the problem of using aggregate variables, which conceal everything of the characteristics of the very units into which the analysis is paradoxically intended to look.

Despite this fact, the craze for econometrics has made a fetish of this methodology, correlating the less correlatable aggregates in almost all the energy demand studies at the national/State levels. The Fuel Policy Committee of India (1974), P. D. Henderson (1975), R. K. Pachauri (1975), Nirmala Banerjee (1979), World Bank (1979), J. K. Parikh (1981), and P. P. Pillai (1981) are some of the forerunners of this accursed tradition. P. P. Pillai (1981) analyses the aggregate as well as the sector-wise demand in Kerala, considering constant net domestic product, average price of electricity, a time trend, and the one-period lagged dependent variable, and computes short- and long-run elasticities.

Unwarranted Craze for Econometrics

P. P. Pillai's work, 'the outcome of an attempt for a quantitative study of the supply and demand aspects of electricity in Kerala using some known analytical tools' (as the author prefaces, p. ix), may be the first of its kind in Kerala. However, the simplicity of the approach, whether in terms of aggregation or by means of mechanical adoption of econometrics, has in fact detracted from its value as a rigorous and useful analytical exercise. Average price, as used in this study as an explanatory variable in demand analysis, especially with time series data, carries dangers of aggregation bias as well as measurement error. Proper estimation of the price elasticity of demand requires (in conformity with the suggestion made long ago by Taylor, 1975, 1977) the use of data on the actual rate paid by a cross-section of consumers, rather than the aggregate average revenue (to the utility) over time; pooled time series cross-section data can also be used. Average revenue might be an indicator of the supply price in the aggregate, but never a representative of the demand price, the particular price a customer faces at a decision-making juncture, especially in the context of the block rate tariff system. Moreover, the use of time series data simply ignores the possibility of changes in the intercept (that on average accounts for the influence of factors other than those considered in the model) and in the slope of the line (that reflects the average intensity of energy consumption with respect to that variable); when using pooled data, it should also be checked that the intercept and slope have not changed significantly over time.

One problem in using the time series average price is that it has the same (increasing) trend over time as electricity consumption (the 'common factor problem'), thus misrepresenting the price-demand relationship and the corresponding elasticity measure. The choice of a suitable deflator also poses problems. Thus, P. P. Pillai's use of a non-deflated average price series must have ipso facto produced a positive price-demand relationship, even though he reports negative price elasticity in all cases. This negativity could as well be due to severe multi-collinearity among the regressors, all having the same (increasing) trend along with time, itself used as one of the regressors, and the author may have unknowingly accepted it as the true sign. Moreover, the use of OLS in the presence of a lagged dependent variable as a regressor and of possible serial correlation may have marred the credibility of the estimates.

An apt example of the mechanical adoption and use of econometrics against its grain, usual in academic circles, is P. P. Pillai's Cobb-Douglas production function approach to Kerala's hydro-electric power system, with capital and labor as 'variable' inputs. It is common sense that labor is not at all a variable factor of production in hydro-electric power generation, being part of sunk capital; and capital is 'variable' only in the long run in a power system, while the production function is used to depict short-run stories, with ceteris paribus assumptions.

Even the very elasticity of demand analysis is open to serious questioning. The price elasticity of demand loses its relevance in an underdeveloped power system such as ours. Demand for electricity remains largely unresponsive or less responsive to its price, as electricity has almost become a necessity in the absence of important fuel substitutes, and, at the same time, it commands a substantially lower budget share (due both to a lower unit price and to a low consumption level) in the case of most consumers. Thus, for example, the share of fuel and power in the total private final consumption expenditure in the domestic market (at current prices) in India in 1997-98 was just 3.29 per cent, and electricity consumption accounted for only 0.68 per cent of the total (National Income Statistics, Central Statistical Organization, Government of India). In 1980-81 and 1990-91, the share of fuel and power was 4.64 and 4.52 per cent respectively, and that of electricity, 0.40 and 0.62 per cent respectively. The growth in electricity consumption has not been strong enough to facilitate a pronounced rate of substitution for other fuels, especially the traditional one, kerosene oil. The percentage share of electricity in the private final consumption expenditure on total fuel and power grew from 8.63 per cent in 1980-81 to 20.57 per cent in 1997-98, marking an average annual compound growth rate of 5.24 per cent, while that of kerosene oil fell from 15.22 per cent to 10.5 per cent over the same period, at a decay rate of (–)2.16 per cent p.a. This in turn suggests a very weak marginal rate of substitution of electricity for kerosene oil (or elasticity) of just about (–)0.40; i.e., a one percentage point increase in the share of electricity consumption expenditure could on average substitute for (or induce a fall of) 0.4 percentage points in the share of kerosene oil consumption expenditure. In short, electricity has not yet made an effective inroad upon economic life in India in general to the extent it should have done.

An Input of Insignificance

The results of the National Sample Survey on consumer expenditure also present the same picture of an input of insignificance. The percentage share of monthly per capita consumption expenditure on fuel and light in the total has been hovering about a virtually frozen range of 6-7 per cent over the whole second half of the last century in India. The expenditure share in 1995-96 (52nd round of National Sample Survey: NSS) for the rural consumers was just 7.1 per cent and for the urban ones, 6.4 per cent (Household Consumer Expenditure and Employment Situation in India, 1995-96, NSSO, Government of India, September 1998). The proportions were respectively 6.2 and 5.6 per cent in 1952-53 (5th round), 5.8 and 6.0 in 1960-61 (16th round), and 7.5 and 6.8 per cent in 1987-88 (43rd round). Implicit in this almost stationary consumption share scenario might be some stunning explanations of a stunted expression of Engel’s law.

For a more concrete example, let us consider the case of the connected consumers themselves. The per capita electricity consumption of the connected domestic customers (who made up about 75 per cent of the total customers) in India in 1995-96 was 772.32 units at an average tariff rate of Ps. 95.94 per unit (for 18 State Electricity Boards), thus giving in general an average per capita electricity consumption expenditure of Rs. 740.97 (or Rs. 61.75 per month), only 7.04 per cent of the per capita income (of Rs. 10524.8) of that year. In the case of Kerala State, the electricity consumption per (electrified) domestic consumer in 1997-98 was 953.72 units at an average rate of Ps. 76.96 per unit, which indicates an average domestic consumption expenditure of Rs. 733.99 (or Rs. 61.66 per month) on electricity. The domestic sector, which made up about 75 per cent of the total customers, consumed nearly 50 per cent of the electricity sold in Kerala in that year. In general, the consumption of electricity per connected consumer in Kerala in 1997-98 was 1480.71 units at an average tariff rate of Ps. 123.74 per unit, giving an average electricity consumption expenditure of Rs. 1832.16 (or Rs. 152.68 per month), only 15.35 per cent of the per capita income (of Rs. 11936) of that year. Similarly, as a substantial share of residential and commercial electricity consumption goes to serve the basic need of lighting, which is fairly unresponsive to income, rather than more income-elastic, luxury end-uses, power demand remains less income-elastic also to this extent. Moreover, the whole edifice of demand analysis crumbles to dust in an encounter with power cuts and load shedding, which restrict actual consumption to availability rather than to actual requirement, as has been the long-run experience of Kerala.

Power Demand Analysis and Power Cuts

Till the turn of the 1980s, Kerala had been pictured as a power surplus state, ostensibly exporting power to neighbouring states (Period I). Since the drought year of 1982-83, unprecedented power shortage has become a part of life in the state (Period II). Recurring drought coupled with inadequate installed capacity has thus unleashed a reign of power cuts and load shedding, constraining the actual demand down. Reliance on past demand data for forecasting purposes thus becomes grossly erroneous and highly questionable. If some measurement of these shortages can be made, the constrained demand can be adjusted accordingly to arrive at a probable measure of unsuppressed demand, which in turn can be used as the data base for forecasting. One method is to assume first that when restrictions are imposed on consumers, their level of consumption is held at some fraction of their consumption during an earlier base period. Then the shortfall in supply equal to these percentage restrictions can be found and inflated by a factor that reflects suppressed growth in demand since the base period and the impact of unscheduled load shedding. This in turn can be used to adjust the suppressed demand data (World Bank, 1979: 13). Another method uses, as the demand inflative factor, the fraction of customers affected by load shedding during the peak period and thus deprived of the chance to contribute to peak period demand. The main problem with all such methods is the non-availability of accurate data and information.

We have seen that the economic relationship demand is hypothesized to have with (per capita) income and unit price is weak and hence unwarranted in the case of an underdeveloped power system such as in India/Kerala. This then leaves us with a bare option of demographic variables to consider in the power demand analysis. However, with nearly 50 per cent of the households in Kerala (and nearly 60 per cent in the rural areas) remaining unelectrified (as per 1991 Census; the restrictions imposed on providing connections on account of the serious power shortage may not have altered this situation substantially in the later years), population may not be a sufficiently strong variable. The search for the appropriate power demand determinants may thus end in the domain of technical factors such as number of electricity consumers, etc., which are more immediate and direct causatives. The growth of demand for power is generally assumed to be determined by the growth of number of (connected) consumers and that of intensity of their power consumption (i.e., electricity consumption per customer), as also the interaction between these two factors. (See P. P. Pillai, 1981: 81 – 82; Henderson (1975) uses sectoral output in the place of number of consumers.)

Another immediate factor of influence is connected load (CL), the total of the ratings (in kilowatts) of all the electricity-using appliances installed on a consumer's premises. This also may be considered along with the relevant intensity of energy consumption (electricity consumption per kilowatt (KW) of connected load) and the interaction between the two. However, the demand function we specify correlates power (energy) demand with the number of customers only. The other variables, connected load, consumption intensity, and the interaction, are left out in view of the serious multi-collinearity problem. Moreover, the number of customers (N) is more immediate and direct than connected load in determining energy demand: not only is N in fact the causative of CL, but a customer may not use all his electric devices simultaneously or continuously, while there are times, on the other hand, when all the consumers together exert demand pressure on the system. There is yet another significant reason. A growing power system is expected to become more and more electricity intensive, in that its CL grows faster than N (so that the electricity intensification factor, i.e., connected load per customer (CL/N), increases over time). Despite the restrictions imposed on providing new connections since 1982-83, the domestic, commercial, and LT industrial consumers in Kerala have behaved on the expected lines, becoming more electricity intensive (in terms of appliance installations), but the HT-EHT industry and 'others' (agriculture, public services, licensees, etc.) have not. This surprising tendency of a faster decaying electricity intensity in the State's HT-EHT industrial and agricultural sectors has overshadowed the normal growth in the other sectors and been reflected in the aggregate, the growth of CL trailing behind that of N. Thus, considered on all counts, N is preferred to CL in this study.

Any demand analysis in the context of the Kerala power system must distinguish between the Period I of no-power-shortage (till 1981-82), and the Period II of power cuts (since 1982-83), and hence adjust the supply–constrained demand data series of Period II upward in order to ensure its continuity and conformity. This we accomplish here by assuming that the unconstrained demand of Period II is basically a continuum of the demand function of Period I, but for changes in the demand intensity rate. That is, we assume that the Period I demand relationship continues unhampered into Period II also, but with time-varying slope coefficient, which is obtained in effect by allowing the slope of the Period I demand function to grow over time at a rate at which the ‘actual’ rate of change of demand in Period II grows in relation to change in number of consumers.

The so-called 'Gulf boom' of increasing remittances of the non-resident Keralites from the Gulf has triggered an unprecedented growth of the housing sector and encouraged an increasing demand for electricity-intensive appliances in Kerala, especially since the mid-seventies. The number of houses in the electrified group must also have increased (in absolute terms) as a result of the social security schemes of the government (IRTC and IEI, 1996). Though the serious power shortage situation has entailed restrictions on providing new connections since 1982-83, energy consumption in the domestic and commercial (as also in the LT industrial and agriculture) sectors has outgrown the number of connections, resulting in higher consumption intensity in this period (Period II). Consumption intensity in relation to CL has also increased during this period in the domestic, commercial, and agricultural sectors (but declined for the industrial customers). Thus, in the aggregate, while N has grown at an exponential rate of 6.68 per cent and CL at 6.1 per cent during Period II, consumption of electricity (C) has increased at 7.14 per cent per annum.

A New Methodology

Thus we find that both the ‘horizontal’ (growth of N or CL) and ‘vertical’ (growth of consumption intensity, i.e., growth of C/N or C/CL) growth components contributed to the total growth of consumption during Period II in Kerala. Note that

r(C) = r(N) + r(C/N) = r(CL) + r(C/CL),

where ‘r’ represents the growth rate.
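For instance, with the Period II aggregate rates reported above, r(C/N) = r(C) – r(N) = 7.14 – 6.68 = 0.46 per cent per annum, the 'vertical' component of the aggregate growth.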

This dramatic growth in consumption intensity (i.e., consumption per customer, C/N as well as per KW of CL, C/CL) in Period II even in the face of power cuts and restrictions is ample evidence to back our assumption of an increasing slope (dC/dN – marginal electricity consumption intensity of customers) of the energy consumption function during Period II in relation to Period I function.

Now, below we discuss how we obtain and apply this time-growing slope coefficient.

We can approximate the ratio of the annual increment of consumption to that of the number of customers (∆C/∆N, where ∆C = Ct − Ct−1, and similarly for ∆N) in Period II to the slope (dC/dN) of the actual consumption function obtainable in that period. Next we take the annual average growth rate of this ∆C/∆N series and allow the slope of the Period I consumption function to grow at this rate during Period II. The intercept of this function (which stands to account for the influence of all other factors on consumption) may also be allowed to vary in Period II; however, we can justifiably assume that the changes in the influence of these other factors are minimal, so that the Period II consumption function keeps the continuum upon the Period I intercept, but with a time-growing slope. Thus the adjusted consumption (C^adjusted) data series of Period II is obtained as:

Ct^adjusted = a + b(1 + r)^t Nt,

where a = the Period I function intercept, b = the Period I function slope, r = the annual growth rate of ∆C/∆N in Period II, t = 1, 2, …, for years 1982-83, …, and N = the number of customers.
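As a concrete illustration, a minimal Python sketch of this adjustment follows. It is our own, not code from the study; the default parameter values are the Period I estimates and the ∆C/∆N growth rate reported in Chapter 4, quoted here only for illustration.

```python
def adjusted_consumption(N_period2, a=382.959, b=1.625, r=0.01734):
    """Consumption model I: Ct^adjusted = a + b*(1 + r)**t * Nt,
    with t = 1, 2, ... indexing the years 1982-83 onward.
    Defaults are the Chapter 4 estimates, used for illustration only."""
    return [a + b * (1.0 + r) ** t * N
            for t, N in enumerate(N_period2, start=1)]
```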

This then provides us with a measure of the unconstrained energy consumption of the actually existing (i.e., constrained) number of customers in Period II: consumption model I. We can also construct another scenario of the unconstrained consumption of an unconstrained number of customers: consumption model II, where we assume no restriction on providing connections, which therefore grow unhampered at the same rate as in Period I of no power shortage. Here both C and N are adjusted upward in terms of the Period I growth structure, and the growth rate of ∆C/∆N is applied to the slope of the Period I consumption function along with the adjusted N.
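Consumption model II differs only in that N itself is first projected forward at the Period I growth rate before the same formula is applied. A sketch, where g stands in for that Period I exponential growth rate of customers (a placeholder, not a figure from the study):

```python
def unconstrained_customers(N_base, g, years):
    """Project the number of customers at the Period I exponential
    growth rate g, as if no connection restrictions had been imposed."""
    return [N_base * (1.0 + g) ** t for t in range(1, years + 1)]

# Model II: unconstrained consumption of the unconstrained customer base,
# reusing adjusted_consumption() from the sketch above, e.g.:
# C_model2 = adjusted_consumption(unconstrained_customers(N_1981_82, g, 16))
```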

Now that we have the complete consumption data base, consisting of the original consumption data till 1981-82 and the estimated, unconstrained data since 1982-83, we can feed it into an appropriate model for forecasting purposes. We follow the logic of the multi-variate autoregressive moving average (MARMA) model, also called the transfer function model, which relates a dependent variable to its own lagged values, to current and lagged values of one or more explanatory variables, and to an error term whose behaviour is partially 'explained' by a time series model.


In our bi-variate model Ct = α + β Nt + Ut, the error term Ut, representing unexplained variations in Ct, can be analyzed and 'explained' by fitting an autoregressive integrated moving average (ARIMA) model to it. The combined regression-time series model is then

Ct = α + β Nt + φ^(−1)(L) θ(L) ηt,

where ηt is an independently and identically distributed (iid) normal error variate (white noise), φ(L) and θ(L) are respectively the autoregressive (AR) and moving average (MA) lag polynomials, and L is the lag operator (for example, LCt = Ct−1, L^2 Ct = Ct−2, etc.). As the combined model yields a structural explanation of that part of the variance of Ct that can be explained in terms of a structural regression, and a time series explanation of that part of the variance of Ct that cannot be explained structurally, it is likely to provide much better forecasts (see Box and Jenkins, 1970, chaps. 10 and 11; Makridakis and Wheelwright, 1978, chap. 11; Pindyck and Rubinfeld, 1981, chap. 20).
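One way to estimate such a combined model today (our suggestion; the study predates this software) is to fit the regression and the ARMA error process jointly, for example with Python's statsmodels. The data below are toy stand-ins, not the study's series; order=(1, 0, 2) anticipates the noise model chosen later in Chapter 4.

```python
import numpy as np
import statsmodels.api as sm

# Toy stand-ins for the study's series: N = customers, C = consumption.
rng = np.random.default_rng(0)
N = np.linspace(300.0, 2500.0, 41)
C = 60.0 + 2.05 * N + rng.normal(0.0, 300.0, 41)

# Regression on N with ARMA(1, 2) errors, estimated in one step.
model = sm.tsa.SARIMAX(C, exog=N, order=(1, 0, 2), trend="c")
result = model.fit(disp=False)
print(result.params)  # intercept, slope on N, ar.L1, ma.L1, ma.L2, sigma2
```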

It should be acknowledged that forecasting is essentially uncertain; forecasts should define a range (confidence interval) around a central estimate (the expected value or mean) within which there is agreed to be a high probability that the unsuppressed demand will actually fall. Only the central estimates are considered in evaluating the APS and Planning Commission forecasts, but for capacity planning purposes a range of values would more accurately reflect the extent of our knowledge about the future. Moreover, forecasts should be principally concerned with estimating demand more than five years ahead. Generation projects in particular take a long time to gestate; initial decisions on such large commitments of capital must be made at least 4 to 8 years before a project can be commissioned, and generation projects typically have a life of at least 30 years. Hence the significance of long-range forecasting. There is also a need, at the official level, to develop a single, generally acceptable demand forecast (a single exercise that defines a single range of demand to be expected). A first step would be to reach a consensus on the methodology to be employed for short-run forecasts. For long-range planning, a macro-econometric regression model would suit better.

An advantage of the regression approach is its flexibility, but an equally strong disadvantage is its vulnerability in practice to the intrusion of errors. Any decision maker using an analytical tool would like to know where in the whole exercise errors are likely to arise and to be able to make allowances for them. In econometric forecasting models there are five commonly recognized reasons why forecasts may go astray: (i) the wrong equation or variables are used (specification error); (ii) coefficients are inaccurately estimated (estimation error); (iii) one-off shocks to the economy may alter the underlying situation (e.g., the oil crisis or a change in state boundaries); (iv) more gradual changes in the economic structure may not be anticipated (e.g., an acceleration of rural electrification under a new government); and (v) the exogenous variables (the weather, the level of economic activity, etc.) may be incorrectly forecast. In a relatively simple model it is easier to make allowances for errors of types (iii) to (v). Again, a simple forecasting model is to be preferred to taking the risk of making specification errors with a complex model.

Uncertainty accompanies every forecast, and the range of uncertainty increases with the length of the forecast period. For very long forecast periods, non-formal methods are used, such as some variant of the Delphi method. The basic features of this approach are the use of questionnaires, anonymous collective work, iterative refinement of results, evaluation of the experts' qualifications, and a statistical evaluation of the answers obtained.



Chapter 4: A Time Series Analysis

We put equations in time; but who puts time in equations?

Remember that both the data bases we have used here are ones actually constrained by power shortages, that is, power cuts and load shedding. It goes without saying that reliance on these data for forecasting purposes involves a high risk of underestimation, as already explained; the steady decrease observed in the APS demand forecasts for Kerala is an apt case in point. Hence the need to adjust these data upward to account for the suppressed part of the actual demand. The methodology explained earlier for accomplishing this is illustrated in detail with the data series of total energy consumption and of maximum demand under the two scenarios (models). We then take up an analysis of sectoral energy consumption in the framework of the 'bottom-up' method of forecasting total consumption. Once the constrained part of the data series is adjusted upward, the whole data base is used to develop the bi-variate ARIMA model for forecasting.

1. Total (Internal) Energy Consumption

The electricity consumption (C) – number of customers (N) linear relationship during Period I (1957-58 to 1981-82) is found to be highly significant, with very small simulation errors:

Ct = 382.959 + 1.625 Nt
     (5.755)    (19.224)

Adjusted R^2 = 0.9389; SER = 197.119; F = 369.575; DW = 0.3073; RMSE = 189.07; TIC = 0.0587;

where the figures in brackets are t-values, SER is the standard error of regression, DW is the Durbin-Watson statistic, RMSE is the root-mean-square error, and TIC is the Theil inequality coefficient. The low DW statistic indicates the presence of positive first-order serial correlation. However, no correction is made here, and we carry these vestiges over into the next period; the whole data base is purged of time-series disturbances at the stage of developing our bi-variate ARIMA (BARIMA) model.
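The simulation statistics quoted throughout this chapter can be recomputed for any fitted series. A minimal sketch follows; the TIC formula assumes the Pindyck and Rubinfeld (1981) definition, bounded between 0 (perfect fit) and 1.

```python
import numpy as np

def rmse(actual, fitted):
    a, f = np.asarray(actual, float), np.asarray(fitted, float)
    return float(np.sqrt(np.mean((f - a) ** 2)))

def theil_inequality(actual, fitted):
    # Theil inequality coefficient, assumed here in the Pindyck-Rubinfeld
    # form: RMSE scaled by the root-mean-squares of the two series.
    a, f = np.asarray(actual, float), np.asarray(fitted, float)
    return rmse(a, f) / (np.sqrt(np.mean(f ** 2)) + np.sqrt(np.mean(a ** 2)))

def durbin_watson(residuals):
    # DW near 2 indicates no first-order serial correlation; near 0, positive.
    e = np.asarray(residuals, float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))
```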

During 1982-83 to 1997-98, electricity consumption in the State grew at an exponential rate of 7.14 per cent per annum. The widely fluctuating consumption data series of this period is smoothed out using this annual growth rate in order to obtain an increasing trend over the period. The series of the ratio of the annual increment of this consumption to that of number of customers (∆C/∆N) is then obtained, and its growth rate is found to be 1.734 per cent per annum.
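A sketch of this construction follows; treating the 'annual average growth rate' of ∆C/∆N as the slope of a log-linear trend is our reading, offered for illustration only.

```python
import numpy as np

def smoothed_series(C0, g, years):
    """Smooth the fluctuating Period II consumption with an exponential
    trend growing at rate g (0.0714 in the study) from the base value C0."""
    return C0 * (1.0 + g) ** np.arange(1, years + 1)

def increment_ratio_growth(C, N):
    """Annual average growth rate of the ratio of first differences,
    i.e. of the dC/dN series."""
    ratio = np.diff(np.asarray(C, float)) / np.diff(np.asarray(N, float))
    t = np.arange(ratio.size)
    slope = np.polyfit(t, np.log(ratio), 1)[0]  # log-linear trend (our choice)
    return float(np.exp(slope) - 1.0)           # e.g. 0.01734 for 1.734 per cent
```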

Now, the slope coefficient (1.625) of the Period I relationship is made to grow annually at this rate during Period II, and the corresponding consumption series is generated for the respective number of customers upon the same Period I function intercept, which gives the adjusted, unconstrained consumption. Thus we find that in 1982-83, the first year of the worst drought era, the actual (i.e., constrained) consumption fell by 71.5 MU over the previous year (in 1981-82, consumption had increased by 144.1 MU over 1980-81). Had the above consumption – number of customers relationship continued into that year, unhampered by the restricting drought but with an increased consumption intensity of the actual number of customers, consumption should have been about 3637.6 MU. Similarly, in 1997-98, the last year of the period of our study, energy consumption should have been 11528.5 MU, given the actual number of customers, instead of the restricted 7716.23 MU.


The whole data series, consisting of the original data up to 1981-82 and the adjusted one since 1982-83, when correlated with the actual number of consumers, gives the following relationship:

Ct = 59.468 + 2.0501 Nt
     (0.755)    (60.093)

Adjusted R^2 = 0.989; SER = 336.956; F = 3611.22; DW = 0.1685; RMSE = 328.635; TIC = 0.0343.

To improve upon the forecasting performance of this model, we construct and apply an ARIMA noise model to its residual series (Ut). Based on examination of the patterns of the sample autocorrelation and partial autocorrelation functions, we have tried a number of specifications and tentatively chosen an ARIMA(1, 0, 2) model for these residuals. That is, in lag polynomial terms,

Ut = (1 − φ1 L)^(−1) (1 − θ1 L − θ2 L^2) Vt,

where φ1 is the AR parameter, θ1 and θ2 are the MA parameters, and Vt is an iid normal variate. The parameter estimates are:

Estimate of φ1 = 0.893; t-statistic = 13.519;
Estimate of θ1 = 0.240; t-statistic = 1.731;
Estimate of θ2 = 0.582; t-statistic = 4.21;

Adjusted R^2 = 0.856; SER = 127.755; DW = 1.9404; F = 78.278; χ^2(3, 40) = 8.53; RMSE = 121.199; TIC = 0.1885.
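The two-step procedure used here (a structural regression first, then an ARIMA(1, 0, 2) noise model fitted to its residuals) can be sketched in Python. The data are toy stand-ins, not the study's series, and the software choice is ours; the joint estimation shown in the previous chapter is an alternative.

```python
import numpy as np
import statsmodels.api as sm

# Toy stand-ins for the study's series: N = customers, C = consumption.
rng = np.random.default_rng(1)
N = np.linspace(300.0, 2500.0, 41)
C = 60.0 + 2.05 * N + rng.normal(0.0, 300.0, 41)

ols = sm.OLS(C, sm.add_constant(N)).fit()               # step 1: Ct = a + b*Nt + Ut
noise = sm.tsa.ARIMA(ols.resid, order=(1, 0, 2)).fit()  # step 2: noise model on Ut
print(noise.params)  # includes ar.L1 (phi1) and ma.L1, ma.L2 (theta1, theta2)
```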

The parameters θ1 and θ2 of an MA(2) model are constrained to lie in the triangular region in the (θ1, θ2) plane defined by the inequalities θ2 + θ1 < 1, θ2 − θ1 < 1, and −1 < θ2 < 1.
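A quick check (arithmetic only) confirms that the estimates reported above satisfy all three inequalities:

```python
theta1, theta2 = 0.240, 0.582  # MA estimates reported above

print(theta2 + theta1 < 1,     # 0.822 < 1
      theta2 - theta1 < 1,     # 0.342 < 1
      -1 < theta2 < 1)         # |0.582| < 1  ->  True True True
```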

Similarly, the parameters φ1 and φ2 of an AR(2) process must lie within the same triangular region as do the MA(2) parameters. The restriction is necessary for the AR(2) model to ensure stationarity, while for an MA(2) process it is to ensure invertibility, since any MA process of finite order is necessarily stationary. (For an AR(1) process, the necessary condition for stationarity is −1 < φ1 < 1.)