Portfolio Diversification 1785481916, 9781785481918

Portfolio Diversification provides an update on the practice of combining several risky investments in a portfolio with


English. Pages: 274 [267]. Year: 2017.



Table of contents:
Cover
Portfolio Diversification
Copyright
Introduction
1 Portfolio Size, Weights and Entropy-based Diversification
2 Modern Portfolio Theory and Diversification
3 Naive Portfolio Diversification
4 Risk-budgeting and Risk-based Portfolios
5 Factor Models and Portfolio Diversification
6 Non-normal Return Distributions, Multiperiod Models and Time Diversification
7 Portfolio Diversification in Practice
8 Conclusion
Bibliography
Index
Back Cover


Portfolio Diversification

Quantitative Finance Set coordinated by Patrick Duvaut and Emmanuelle Jay

Portfolio Diversification

François-Serge Lhabitant

First published 2017 in Great Britain and the United States by ISTE Press Ltd and Elsevier Ltd

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:

ISTE Press Ltd
27-37 St George’s Road
London SW19 4EU
UK

Elsevier Ltd
The Boulevard, Langford Lane
Kidlington, Oxford, OX5 1GB
UK

www.iste.co.uk

www.elsevier.com

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility. To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

For information on all our publications visit our website at http://store.elsevier.com/

© ISTE Press Ltd 2017
The rights of François-Serge Lhabitant to be identified as the author of this work have been asserted by him in accordance with the Copyright, Designs and Patents Act 1988.

British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library

Library of Congress Cataloging in Publication Data
A catalog record for this book is available from the Library of Congress

ISBN 978-1-78548-191-8

Printed and bound in the UK and US

Introduction

Portfolio diversification, or the practice of spreading one’s money among many different investments, aims to reduce risk. It has many parallels in common parlance, for instance, in the old saying “don’t put all your eggs in one basket”. It is also widely advocated in non-financial literature. In William Shakespeare’s The Merchant of Venice (1598), the title character Antonio says: “My ventures are not in one bottom trusted, nor to one place; nor is my whole estate upon the fortune of this present year”. A similar sentiment was expressed in Robert Louis Stevenson’s Treasure Island (1883), where the main villain Long John Silver comments on how he keeps his wealth: “I put it all away, some here, some there, and none too much anywhere, by reason of suspicion”.

In finance, the first official attempt to introduce portfolio diversification goes back to the 18th century and the development of mutual funds in the Netherlands. The goal was to create diversified pools of securities specifically designed for citizens of modest means. For instance, the mandate of the 1774 “Negotiatie onder de Zinspreuk Eendragt Maakt Magt” (Fund under the Motto Unity is Strength), organized by Abraham van Ketwich, was to hold a portfolio, as close to equally weighted as possible, of bonds issued by foreign governments and plantation loans in the West Indies. These included bonds from the Bank of Vienna, Russian government bonds, government loans from Mecklenburg and Saxony, Spanish Canal loans, English colonial securities, South American plantation loans, as well as other securities from various Danish American ventures. All of them were traded on the Amsterdam market.


More formal discussions about portfolio diversification started with Leroy-Beaulieu [LER 06] and his “division of capital” idea. He described it as “ten, fifteen or twenty values, especially values that are not of a similar nature and which were issued by different countries”. Similarly, Lowenfeld [LOW 07] stated that “the only means for insuring permanent investment success consists of the adoption of a true and systematic method of averaging investment risks” and introduced the idea of a “geographical distribution of capital”. Using price data from global securities traded on the London Exchange around the turn of the century, Lowenfeld carried out quantitative studies of the risk-adjusted performance of equally weighted industry-neutral international portfolios. Later, Neymarck [NEY 13] suggested that “a portfolio should be composed of stocks of different sorts which will not be influenced in the same way by a given event and for which, on the contrary, the fall in price of certain stocks would be, as far as possible, counterbalanced by the simultaneous increase of the price of other stocks”. He also introduced the notion of “general scale” risks, which affect all the stocks, and “inside scale” risks, which affect only one company. Today, all these ideas seem remarkably familiar and echo what we would call modern portfolio theory. However, the associated works remained purely literary with occasional calculations but no formal mathematical approach to portfolio diversification. This was probably intentional, as most investors at the time had no advanced education in mathematics and/or statistics. Thus, for many years, investors were aware of the notion of portfolio diversification, probably practiced it informally, but only discussed it in very general terms. Simply stated, the question of the underlying common characteristic according to which it would make sense to diversify assets had never been formally addressed. 
No analysis had been conducted on how to measure the benefits of diversification with respect to this characteristic. By contrast, Markowitz’s [MAR 52] approach was both normative and positive. He provided not only the scientific arguments to support portfolio diversification, but also the tools to measure it and build better portfolios. Many other authors have since built on these foundations and most portfolio construction paradigms are based on the idea that diversification pays.


Warren Buffett once said: “Diversification is protection against ignorance”. Indeed, from a financial perspective, portfolio diversification seems like a common-sense approach in the presence of uncertainty, or equivalently when future asset returns cannot be forecasted perfectly. In such a case, one of the simplest ways to avoid a financial disaster is to hold several investments, which ideally behave differently from each other. Keep in mind that we live in an increasingly litigious society where financial advisors, financial planners and money managers are justifiably afraid of being sued if a company or issuer blows up, causing client asset values to plunge. It is therefore not surprising that portfolio diversification has become an established tenet of conservative investing. The growth of mutual funds and exchange-traded funds has further facilitated the creation of diversified portfolios for smaller and less sophisticated investors, making diversification inherent in portfolio construction.

However, this trend seems to have weakened over the recent past. First, many investors have pushed diversification to an extreme by adding assets to their portfolios just for the sake of dampening volatility, rather than based on their investment quality. They often ended up holding a little bit of everything and, more importantly, disappointed by their portfolio performance. Second, most investors endured a hard lesson in 2008, when what they thought were well-diversified portfolios collapsed with the market and experienced large losses. Thus, many of them started claiming that portfolio diversification no longer works. In reality, the question of how best to create a well-diversified portfolio is one that comes with many answers. Going back to basics, even the definition of diversification and the way it can be measured are not unique. Moreover, these notions are continuously evolving with the progression of financial theory.
Therefore, it is probably a good time to carefully revisit the concepts behind portfolio diversification, and see how one may integrate them into portfolio construction. This is exactly what we plan to do in the following chapters.

The structure of this book is as follows. Chapter 1 introduces portfolio diversification when there is very little information about the risk and return behavior of the underlying assets. In such cases, investors must rely either on asset weights or on the notion of entropy to assess the diversification of their portfolios. Chapter 2 analyzes the well-known parametric modern portfolio theory and its impact on portfolio diversification. Investors often forget that modern portfolio theory does not aim to deliver a diversified portfolio and, worse yet, that it is structurally biased towards generating concentrated portfolios, due to the combination of approximation and estimation errors with an optimization process. There are of course some workarounds to force optimizers to behave in a better way; however, they come with drawbacks. Chapter 3 discusses a very simple heuristic called naïve portfolio diversification, which consists of allocating an equal amount of capital to each asset. Despite its simplicity, it generally performs well and will often be used as a benchmark case when testing more sophisticated approaches. Chapter 4 focuses on risk-based portfolios, meaning portfolios built through an optimization process exclusively focused on risk allocations and ignoring expected return considerations. Specifically, it reviews: (1) risk parity portfolios, which split risk evenly across assets; (2) the so-called most diversified portfolio, whose exclusive focus is to maximize diversification benefits; and (3) the minimum variance portfolio, i.e. the portfolio that has the smallest variance of all possible portfolios. Chapter 5 explores factor-based diversification, including principal components analysis, and how the technique can be used to efficiently partition large universes of assets to increase diversification benefits. Chapter 6 discusses a series of more complex models for portfolio diversification, including the case of non-normal return distributions, the diversification of skewness and kurtosis, as well as the use of tail risk measures. It also provides an introduction to multiple-period models, in which investors can choose a mix of asset diversification (spreading their wealth over several assets) and time diversification (changing their allocation to assets over time). Chapter 7 discusses various empirical observations on how investors effectively diversify – or not – their portfolios. It also challenges the usual practical interpretation of what correlation coefficients effectively measure. Chapter 8 concludes our journey.

1 Portfolio Size, Weights and Entropy-based Diversification

Investors willing to diversify their portfolio will typically spread it amongst various assets. Their implicit assumption is that diversification increases as a function of the number of assets they hold. In the financial literature, the latter is often referred to as “portfolio size” or “number of lines”, and is commonly used as a quick indicator of how well or poorly diversified a portfolio is. Intuitively, we would expect a portfolio made of $N_1$ assets to be more diversified than a portfolio made of $N_2$ assets if $N_1$ is larger than $N_2$. For instance, Markowitz [MAR 52] reports that “the adequacy of diversification is not thought by investors to depend solely on the number of different securities held”. Sharpe [SHA 72] also affirms that “the number of securities in a portfolio provides a fairly crude measure of diversification”. However, in practice, there are several cases where these statements happen to be wrong. For instance, a 50-stock portfolio can have all its positions equally weighted at 2%, or be 99% invested in one stock and share the remaining 1% between the other 49 stocks. Both portfolios would have an identical size, but their diversification levels would obviously be very different. To be meaningful, a measure of portfolio diversification should therefore consider the distribution of asset weights in its calculation.

The concept of diversity was introduced by Fernholz [FER 99] as a measure of the distribution of capital in an equity market. It was later extended by Fernholz [FER 02] and Fernholz et al. [FER 05] in the context of stochastic portfolio theory¹. Heuristically, a market is considered as being “diverse” if its capital is spread amongst a reasonably large number of assets rather than concentrated into a few very large positions. The same notion is applicable in the context of a long-only portfolio of assets, which is a subset of the market. We will say that a portfolio exhibits diversity if no single asset or group of assets dominates it in terms of relative weighting. This definition is both intuitive and simple. At one end of the spectrum, a “completely diverse” portfolio will have its capital equally distributed across all the assets. At the other end of the spectrum, a “completely undiversified” portfolio will concentrate all its capital into one single asset. All other portfolios will fall between these two extreme cases. What we need now is a quantitative indicator to measure and compare the degree of diversity of various portfolios or, by symmetry, their degree of concentration².

A capital distribution curve is useful to visualize the concentrated or diverse nature of a given portfolio. It is essentially a graph showing the portfolio weights versus their respective ranks in descending order, most of the time in a log–log format, i.e. using logarithmic scales on both the horizontal and vertical axes. Figure 1.1 shows the capital distribution curves for the Swiss Market Index (SMI, 20 stocks), the French CAC 40 index (40 stocks) and the German DAX index (30 stocks). The SMI appears to be the most concentrated of these indices. This was somewhat predictable given the very large weighting of its top three components at the end of 2016 (Nestlé: 22.7%; Novartis: 19.5%; Roche: 16.3%) compared with the CAC 40 (Total: 10.9%; Sanofi: 8.4%; BNP Paribas: 6.4%) and the DAX (Siemens: 10%; Bayer AG: 8.8%; BASF: 8.7%). Keep in mind that the DAX and the CAC 40 cap the weights of their components at 10% and 15%, respectively, while the SMI has no cap.

Capital distribution curves are useful visual indicators, but they do not summarize all the portfolio weights into one single number or index. It would be helpful to have a descriptive statistic to summarize diversity information. Fortunately, many concentration and diversity measures have already been introduced in various areas of economics such as welfare and monopoly theory, competitive strategy or industrial economics. Such measures have typically been used to assess competition in a given market, country or group of countries, with the generally accepted view that competition should benefit consumers, workers, entrepreneurs, small businesses and the economy more generally. With a few adjustments, these measures can also be used to assess a portfolio’s concentration level.

1 Stochastic portfolio theory offers a relatively novel approach to portfolio construction. It aims at building portfolios that outperform an index over a given time horizon with probability one, whenever possible.
2 Portfolio concentration and portfolio diversity are two sides of the same coin. If we can measure one, we should easily be able to translate it into a measure of the other. Intuitively, there are close conceptual ties between portfolio concentration, portfolio diversity and the inequality of the portfolio weights, as we will see in this chapter.


Figure 1.1. Capital distribution curves calculated for the Swiss Market Index (SMI), the CAC 40 index and the DAX index as of October 2016
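The rank–weight pairs behind a capital distribution curve like Figure 1.1 are straightforward to compute. The sketch below uses hypothetical weights (not the actual SMI/CAC 40/DAX compositions) and simply builds the data points that would be plotted on log–log axes:

```python
import numpy as np

def capital_distribution_curve(weights):
    """Return (ranks, weights) pairs for a capital distribution curve:
    weights sorted in descending order versus their rank."""
    w = np.sort(np.asarray(weights, dtype=float))[::-1]
    ranks = np.arange(1, len(w) + 1)
    return ranks, w

# Hypothetical weights: a concentrated portfolio versus a flat one
concentrated = [0.40, 0.30, 0.15, 0.10, 0.05]
flat = [0.20] * 5

ranks_c, w_c = capital_distribution_curve(concentrated)
ranks_f, w_f = capital_distribution_curve(flat)

# The concentrated curve starts higher and falls faster, which is what
# makes it look steeper on a log-log plot
print(w_c[0], w_f[0])  # 0.4 0.2
```

Plotting `ranks` against `w` with logarithmic scales on both axes reproduces the format of Figure 1.1.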

Other interesting concentration and diversity measures come from the worlds of physics and information theory, given the functional parallels that exist between nature, the transmission of information, probabilities and financial markets. These measures are also directly applicable in the context of assessing a portfolio’s diversification level. Before reviewing these measures and discussing which might be “superior”, let us first introduce a few mathematical notations.


1.1. Mathematical notations

In its most general form, a portfolio can be modeled mathematically as a collection of “exposures” to each of the underlying assets in the investment universe. The term “exposure” should be taken in a broad sense here. It can include any non-negative numerical value, for instance, notional dollars invested, beta-adjusted dollar amounts, duration-adjusted capital, market values, etc., as long as the individual exposures add up to the overall portfolio exposure. We will denote by $E_i$ the portfolio exposure to asset $i$ and by

$E = \sum_{i=1}^{N} E_i$    [1.1]

the total exposure of the portfolio. We define the weight of asset $i$ in the portfolio as

$w_i = E_i / E$    [1.2]

For notation consistency with other chapters, we will store these weights as an $N \times 1$ column vector $w$:

$w = (w_1, w_2, \dots, w_N)^T$    [1.3]

In some instances, we will need to sort these weights from the smallest one to the largest one. To avoid confusion, we will use a tilde for sorted weights, so that:

$\tilde{w} = (\tilde{w}_1, \tilde{w}_2, \dots, \tilde{w}_N)^T$    [1.4]

is the vector of sorted weights in ascending order, i.e. $\tilde{w}_1 \le \tilde{w}_2 \le \dots \le \tilde{w}_N$. Since all exposures are required to be non-negative, all the weights are also non-negative:

$w_i \ge 0$    [1.5]

By construction, in a fully invested portfolio, asset weights must sum to one

$\sum_{i=1}^{N} w_i = 1$    [1.6]

which implies that

$0 \le w_i \le 1$    [1.7]
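The notation above translates directly into code. A minimal sketch in Python, with hypothetical exposures, computing the total exposure [1.1], the weights [1.2] and the sorted weights [1.4]:

```python
import numpy as np

exposures = np.array([50.0, 30.0, 15.0, 5.0])  # hypothetical exposures E_i
E = exposures.sum()                            # total exposure, eq. [1.1]
w = exposures / E                              # weights w_i, eq. [1.2]
w_tilde = np.sort(w)                           # sorted (tilde) weights, eq. [1.4]

assert np.all(w >= 0)            # eq. [1.5]: non-negative weights
assert np.isclose(w.sum(), 1.0)  # eq. [1.6]: fully invested portfolio
print(w_tilde)                   # weights from smallest to largest
```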

These properties are essential because they will allow us to interpret asset weights as probabilities. Formally, we now have a probability space whose elementary events are the random selection of a “small” fractional portfolio exposure out of the overall portfolio exposure $E$, and the identification of the exposure $E_i$ that it belongs to. By construction, any fractional portfolio exposure will be part of an actual exposure $E_i$ and will have a probability $p_i \equiv w_i$ associated with it. From there, we can define two random variables: $W$, which corresponds to the size of the exposure it belongs to, and $R$, which is the ranking order of the exposure it belongs to. While this recasting of portfolio weights as probabilities may look cumbersome at first glance, it will allow for a consistent and uniform interpretation of the various portfolio concentration and diversity measures to be discussed.

1.2. Portfolio concentration and diversity measures

As mentioned above, several metrics are available to capture the concentration of a given portfolio. Most of them take the form of a weighted sum of some function of the portfolio weights. The weighting scheme used in the sum essentially determines the sensitivity of the concentration measure towards changes at the tail end of the asset weights distribution. In practice, there are four types of commonly used weighting schemes to calculate the concentration measure: (1) use weights of one for some assets and zero for others; (2) use portfolio weights as weights; (3) use the rankings of the asset weights as weights; and (4) use the negative of the logarithm of the portfolio weights as weights. Each of these approaches will be illustrated shortly. In addition, the concentration measure may be discrete and focus only on some assets in the portfolio, or it may be cumulative and use the entire set of asset weights. The former is simpler when the portfolio is dominated by a few assets. The latter is better when any asset in the portfolio might have an influence.


1.2.1. Properties of a portfolio concentration measure

Several attempts have been made to propose an axiomatic approach, i.e. a set of desirable properties that a “good” concentration measure should possess³. The first six listed below are the most important ones:

1) Transfer principle: A concentration measure should not decrease when a given exposure is reduced and a larger exposure is increased by the same amount.

2) Uniform distribution principle: A concentration measure should attain its minimum value when all exposures are of equal size.

3) Lorenz criterion: If two portfolios composed of the same number $N$ of exposures are such that the aggregate size of the $k$ biggest exposures of the first portfolio is greater than or equal to the aggregate size of the $k$ biggest exposures of the second portfolio for every $k = 1, \dots, N$, then the same inequality must hold between the concentration measures of the two portfolios.

4) Super-additivity: If two or more exposures are merged into one within a portfolio, its concentration measure should not decrease.

5) Independence of exposure quantity: Consider a portfolio consisting of exposures of equal size. An increase in the number of exposures should not lead to an increase in the portfolio concentration measure.

6) Irrelevance of small exposures: Adding a new exposure of a relatively small amount to a given portfolio should not increase its concentration measure.

A few remarks on these properties can be useful. First, properties 1–3 consider portfolios with a fixed number of exposures, while properties 4–6 introduce a change in that number. Second, we can prove that if a concentration measure satisfies property 1, then it also fulfills properties 2 and 3. Further, if a concentration measure satisfies properties 1 and 6, then it also meets property 4. Finally, fulfilling properties 2 and 4 implies that property 5 is met. Thus, if a concentration measure satisfies properties 1 and 6, then it satisfies all six properties⁴.

3 See, for instance, Marfels [MAR 71], Hause [HAU 77], Hannah and Kay [HAN 77], Curry and George [CUR 83] or Becker et al. [BEC 04]. 4 For a formal demonstration of this, please see Calabrese and Porro [CAL 06].
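The transfer principle (property 1) is easy to illustrate numerically with the HHI, formally defined in section 1.2.3: shifting weight from a smaller position to a larger one should not decrease the measure. The weights below are hypothetical:

```python
def hhi(weights):
    # Sum of squared weights (see section 1.2.3)
    return sum(w * w for w in weights)

before = [0.40, 0.30, 0.20, 0.10]
after = [0.45, 0.30, 0.20, 0.05]  # 5% moved from the smallest to the largest line

# The transfer principle requires concentration not to decrease
assert hhi(after) >= hhi(before)
print(hhi(before), hhi(after))  # concentration increases after the transfer
```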


Additional desirable properties are also occasionally discussed in the academic literature:

7) Symmetry: A permutation of two exposures should not change the value of the concentration measure.

8) Withdrawal and entry: The concentration measure should be unaffected by the withdrawal or addition of an asset with zero exposure.

9) Maximum concentration: The concentration measure should reach its maximum value when only one exposure is present.

As we will see in the next sections, not all concentration measures commonly used in practice satisfy all these properties.

1.2.2. The concentration ratio

One of the most straightforward ways to measure the concentration of a portfolio is to calculate the cumulative weight of its $k$ largest positions, where $k$ is an exogenous parameter. The corresponding indicator is called the concentration ratio of order $k$. It is defined as

$CR_k = \sum_{i=N-k+1}^{N} \tilde{w}_i$    [1.8]

The result varies between $k/N$ (portfolio with $N$ equally weighted assets) and 1 (highly concentrated portfolio, particularly if $k$ is small). The concentration ratio verifies the six properties discussed in section 1.2.1, but it suffers from several drawbacks. The main one is the fact that the number $k$ is arbitrary and must be chosen cautiously. If $k$ is too small, $CR_k$ will only consider a small fraction of the entire distribution of exposures. If $k$ is too large, the concentration ratio conveys less and less information. In the most extreme case, $k \to N$, $CR_k \to 1$ and the concentration ratio becomes useless. As an illustration, Figure 1.2 shows the concentration ratios of the top 1, 5, 10 and 20 positions for various equity indices. In all cases, the SMI is the most concentrated of these indices, and the equally weighted version of the S&P 500 is the least concentrated.
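Equation [1.8] reduces to a one-line computation. A sketch in Python, using the two hypothetical 50-stock portfolios mentioned at the start of the chapter:

```python
def concentration_ratio(weights, k):
    """CR_k: cumulative weight of the k largest positions (eq. [1.8])."""
    return sum(sorted(weights, reverse=True)[:k])

equal = [0.02] * 50                   # 50 stocks, equally weighted at 2%
lopsided = [0.99] + [0.01 / 49] * 49  # 99% in one stock, 1% spread over 49

print(concentration_ratio(equal, 5))     # ≈ 0.1, i.e. k/N
print(concentration_ratio(lopsided, 1))  # 0.99
```

Both portfolios have the same size, but even the smallest concentration ratio, CR_1, already separates them clearly (0.02 versus 0.99).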


Figure 1.2. Concentration ratios for various equity indices

The probabilistic interpretation of the concentration ratio of order $k$ is as follows: $CR_k$ is the likelihood that a randomly selected unit exposure will belong to the $k$ largest exposures. Technically, it is the cumulative distribution function of the (sorted) portfolio weight distribution. Intuitively, a concentrated portfolio will have a high $CR_k$ or, equivalently, a high likelihood that a randomly selected unit exposure will be part of a particularly large position.

1.2.3. The Herfindahl–Hirschman index (HHI)

Originally introduced by Herfindahl [HER 50], an environmental economist, and Hirschman [HIR 64], a member of the Institute for Advanced Study at Princeton University, the HHI is a widely accepted measure of the degree of competition in a market or, conversely, of the closeness to a monopolistic market structure⁵. If we consider a market in which $N$ companies operate, and denote the market share of the $i$th company by $s_i$, with $i = 1, \dots, N$ and $0 \le s_i \le 1$, the HHI is defined as

$HHI = \sum_{i=1}^{N} s_i^2$    [1.9]

All other things being equal, a market is said to have no concentration if $HHI < 0.15$, moderate concentration if $0.15 \le HHI \le 0.25$, and high concentration if $HHI > 0.25$. The HHI verifies the six properties discussed in section 1.2.1. The HHI can be adapted to assess the degree of concentration of a portfolio by replacing market shares with asset weights:

$HHI = \sum_{i=1}^{N} w_i^2$    [1.10]

The result is bounded between $1/N$ (portfolio with $N$ equally weighted assets) and 1 (portfolio holding one single asset). Note that, by construction, the HHI puts higher weights on the larger assets of a given portfolio. Since the lower bound of the HHI changes with the number of assets, it is difficult to compare the concentration level of portfolios of different sizes, particularly if they are well diversified. To avoid the issue, we can use the scaled HHI, calculated as follows:

$SHHI = \dfrac{HHI - 1/N}{1 - 1/N}$    [1.11]

5 It is called the Simpson [SIM 49] diversity index in biology and ecology, the Greenberg [GRE 56] diversity index in linguistics, and the Blau [BLA 77] index in sociology.

The scaled HHI is bounded between 0 (equally weighted portfolio) and 1 (single-asset portfolio). It is invariant to the number of portfolio holdings. Note that the functional form of the scaled index suggests that it describes how far the weights are from an equal allocation. We will come back to that point later. As an illustration, Figure 1.3 shows the HHI and the scaled HHI for a series of equity indices. Once again, it suggests that the SMI is the most concentrated index in our sample, while the equally weighted version of the S&P 500 is the least concentrated (with its scaled HHI equal to zero).
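Equations [1.10] and [1.11] can be sketched as follows, using hypothetical four-asset portfolios at the two extremes:

```python
def hhi(weights):
    """Herfindahl-Hirschman index, eq. [1.10]."""
    return sum(w * w for w in weights)

def scaled_hhi(weights):
    """Scaled HHI, eq. [1.11]: 0 for equal weights, 1 for a single asset."""
    n = len(weights)
    return (hhi(weights) - 1.0 / n) / (1.0 - 1.0 / n)

equal = [0.25] * 4            # equally weighted portfolio
single = [1.0, 0.0, 0.0, 0.0] # single-asset portfolio

print(hhi(equal))          # 0.25, the lower bound 1/N
print(scaled_hhi(equal))   # 0.0
print(scaled_hhi(single))  # 1.0
```

Because the scaled version removes the $1/N$ lower bound, portfolios of different sizes become directly comparable.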



Figure 1.3. HHI and scaled HHI for a series of equity indices

The HHI can be expressed in terms of the moments of the underlying asset weights distribution, as

$HHI = \sum_{i=1}^{N} w_i^2 = N \, E[w^2]$    [1.12]

where $E[\cdot]$ denotes the expectation operator. Since $E[w] = 1/N$, we obtain

$HHI = N \sigma_w^2 + \dfrac{1}{N}$    [1.13]

where $\sigma_w^2 = E[w^2] - (E[w])^2$ is the variance of the asset weights. Equation [1.13] clearly illustrates the complex relationship between the number of assets $N$ in the universe and the concentration of the portfolio, as $\sigma_w^2$ also depends on $N$. In addition, it also shows that very different portfolios may end up with the same HHI.
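Equation [1.13] is easy to verify numerically. A sketch with hypothetical weights, using the population variance (the mean weight being 1/N):

```python
import numpy as np

w = np.array([0.40, 0.30, 0.20, 0.10])  # hypothetical weights
n = len(w)

hhi = np.sum(w ** 2)          # eq. [1.10]
rhs = n * np.var(w) + 1.0 / n # eq. [1.13]; np.var is the population variance

assert np.isclose(hhi, rhs)
print(hhi)  # ≈ 0.30 for this portfolio
```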


The HHI also has an interesting probabilistic interpretation. It is the ratio of the expected exposure

$E^* = \sum_{i=1}^{N} w_i E_i$    [1.14]

over the total portfolio exposure $E$. Note that the expected exposure is different from the average exposure $\bar{E}$, which is defined as

$\bar{E} = \dfrac{1}{N} \sum_{i=1}^{N} E_i$    [1.15]

The expected exposure picks a unit of exposure at random. So, a higher portfolio concentration implies a higher expected exposure. In the most extreme case, there is only one single exposure in the portfolio, with unit probability, and the expected exposure equals the total exposure. By contrast, the average exposure picks one of the $N$ exposures at random. It is therefore not affected by changes in the portfolio concentration⁶.

1.2.4. The Lorenz curve and the Gini index

Another interesting way to visualize portfolio concentration is to draw a Lorenz curve, as devised by the American statistician Max O. Lorenz in 1905. It requires sorting weights in ascending order and then plotting their cumulative value on the y-axis against the cumulative proportion of the sample on the x-axis. In the case of no concentration (i.e. equal weights for all assets), the Lorenz curve becomes a straight 45-degree line. If there is some concentration, the resulting curve will plot below that 45-degree line. Figure 1.4 provides an example of such curves for the SMI, the CAC 40 index and the DAX index. Formulated by the Italian statistician Gini [GIN 21], the Gini coefficient is a summary statistic of the Lorenz curve and a commonly used measure of inequality. The Gini coefficient is calculated as the ratio of the area that lies between the straight 45-degree line and the Lorenz curve (marked Area A in Figure 1.5) over the total area under the straight 45-degree line (marked Areas A and B). That is, the Gini coefficient is equal to $A/(A+B)$.

6 From a mathematical perspective, the average and the expected exposure calculate the same quantity, but under different probability measures. They therefore provide different numerical results.
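Footnote 6 is worth illustrating with numbers. The sketch below (hypothetical exposures) contrasts the size-weighted expected exposure [1.14] with the average exposure [1.15], and checks that the ratio of the former to the total exposure recovers the HHI:

```python
import numpy as np

exposures = np.array([90.0, 5.0, 3.0, 2.0])  # hypothetical concentrated book
E = exposures.sum()
w = exposures / E

expected = np.sum(w * exposures)  # eq. [1.14]: a unit of exposure picked at random
average = exposures.mean()        # eq. [1.15]: one of the N lines picked at random

print(expected, average)  # ≈ 81.38 versus 25.0: only the first reflects concentration
assert np.isclose(expected / E, np.sum(w ** 2))  # its ratio to E is the HHI
```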



Figure 1.4. Lorenz curves of the SMI, the CAC 40 index and the DAX index


Figure 1.5. The Gini coefficient for a Lorenz curve is calculated as the ratio of the surface of Area A divided by the surface of Areas A and B



Mathematically, the Gini index of concentration is calculated as the mean expected absolute deviation between all pairs of observations, scaled by their mean. In the case of a given portfolio, this gives:

$GIC = \dfrac{\sum_{i=1}^{N} \sum_{j=1}^{N} |w_i - w_j|}{2 N^2 \bar{w}}$    [1.16]

Using the sorted weights $\tilde{w}_i$ and the fact that $\bar{w} = 1/N$, this expression simplifies into⁷

$GIC = \dfrac{2}{N} \sum_{i=1}^{N} i \, \tilde{w}_i - \dfrac{N+1}{N}$    [1.17]

The result varies between 0 (portfolio with equally weighted assets) and 1 (portfolio holding one asset only). The Gini index verifies properties 1, 2, 3 and 5 discussed in section 1.2.1, but not properties 4 and 6.
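Both forms of the Gini index can be cross-checked in a few lines (hypothetical weights; the pairwise form follows [1.16], the closed form follows [1.17]):

```python
import numpy as np

def gini_pairwise(weights):
    """Eq. [1.16]: mean absolute difference over all pairs of weights,
    scaled by twice the mean weight (which is 1/N)."""
    w = np.asarray(weights, dtype=float)
    n = len(w)
    mean_abs_diff = np.abs(w[:, None] - w[None, :]).sum() / (n * n)
    return mean_abs_diff / (2.0 / n)

def gini_sorted(weights):
    """Eq. [1.17]: closed form with weights sorted in ascending order."""
    w = np.sort(np.asarray(weights, dtype=float))
    n = len(w)
    i = np.arange(1, n + 1)
    return 2.0 * np.sum(i * w) / n - (n + 1.0) / n

w = [0.40, 0.30, 0.20, 0.10]  # hypothetical portfolio weights
assert np.isclose(gini_pairwise(w), gini_sorted(w))
print(gini_sorted(w))  # ≈ 0.25 for this portfolio
```

The closed form is O(N log N) because of the sort, while the naive pairwise form is O(N²), which matters for large universes.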

Figure 1.6. Gini coefficients for a series of equity indices

7 Allison [ALL 78] provides a similar formula, but it is unfortunately incorrect.
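The equivalence between the pairwise form [1.16] and the sorted-weights form [1.17] can be checked numerically. The sketch below is ours (the function names are not from the book) and assumes a long-only, fully invested portfolio.

```python
def gini_pairwise(weights):
    """Gini index via the mean absolute difference between all weight
    pairs, scaled by twice the mean weight (equation [1.16])."""
    n = len(weights)
    mean_w = sum(weights) / n  # equals 1/n for a fully invested portfolio
    mad = sum(abs(wi - wj) for wi in weights for wj in weights)
    return mad / (2 * n * n * mean_w)

def gini_sorted(weights):
    """Equivalent closed form using weights sorted in ascending order
    (equation [1.17])."""
    n = len(weights)
    w = sorted(weights)
    return (2.0 / n) * sum(i * wi for i, wi in enumerate(w, start=1)) - (n + 1.0) / n

equal = [0.25] * 4             # equally weighted -> no concentration
single = [1.0, 0.0, 0.0, 0.0]  # single asset -> maximal concentration
tilted = [0.4, 0.3, 0.2, 0.1]

print(gini_pairwise(equal))    # 0.0
print(gini_pairwise(single))   # (n-1)/n = 0.75
print(abs(gini_pairwise(tilted) - gini_sorted(tilted)) < 1e-12)  # True
```

Note that the single-asset portfolio yields (N-1)/N rather than exactly 1, consistent with the bounds discussed above.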


Figure 1.6 shows the Gini coefficients for our series of equity indices. As per this figure, the SMI is no longer the portfolio with the highest inequality in terms of weights; it is replaced by the Nikkei. As expected, the equally weighted version of the S&P 500 is the index with the least inequality of portfolio weights. Figure 1.6 also highlights one of the weaknesses of the Gini coefficient, namely that Lorenz curves can have different shapes, yet still yield similar Gini coefficients. For instance, the Gini coefficients of the S&P 500, the SMI and the FTSE 100 are very close to one another, but their respective weight distributions are very different.

1.2.5. Other concentration indices

There are many alternative indices that we could use to measure portfolio concentration or diversity. Hereafter, we will just mention a few that are frequently quoted in the economic literature.

1.2.5.1. The Hall–Tideman index

In the context of measuring industrial concentration, Hall and Tideman [HAL 67] suggested using the following index:

HT = \frac{1}{2 \sum_{i=1}^{N} i \, w_{(i)} - 1}   [1.18]

where the weights w_{(i)} are sorted in descending order. As it weights each exposure by its rank, the Hall–Tideman index will typically give more importance to larger positions and to the total number of assets considered. Like the HHI, its value varies between 1/N (equally weighted portfolio) and 1 (single-asset portfolio). We will not spend a lot of time on it, because it is directly linked to the Gini index:

HT = \frac{1}{N (1 - \mathrm{GIC})}   [1.19]

However, we should note that the Hall–Tideman index verifies the six important properties outlined in section 1.2.1, which in theory makes it a better measure than the Gini coefficient.
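The link [1.19] between the two indices can be verified directly. In the sketch below (helper names are ours), the Hall–Tideman index sorts weights in descending order per [1.18], while the Gini form uses the ascending-order expression [1.17].

```python
def hall_tideman(weights):
    """Hall-Tideman index: exposures weighted by rank, with weights
    sorted in descending order (equation [1.18])."""
    w = sorted(weights, reverse=True)
    return 1.0 / (2.0 * sum(i * wi for i, wi in enumerate(w, start=1)) - 1.0)

def gini_sorted(weights):
    """Gini index from the ascending sorted weights (equation [1.17])."""
    n = len(weights)
    w = sorted(weights)
    return (2.0 / n) * sum(i * wi for i, wi in enumerate(w, start=1)) - (n + 1.0) / n

w = [0.4, 0.3, 0.2, 0.1]
n = len(w)
# Equation [1.19]: HT = 1 / (N (1 - GIC))
print(abs(hall_tideman(w) - 1.0 / (n * (1.0 - gini_sorted(w)))) < 1e-12)  # True
print(hall_tideman([0.25] * 4))   # 0.25 = 1/N for an equal-weight portfolio
print(hall_tideman([1.0, 0.0, 0.0]))  # 1.0 for a single-asset portfolio
```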


1.2.5.2. The Hannah–Kay index

In the context of measuring industrial concentration, Hannah and Kay [HAN 77] observed that the HHI is just one amongst several measures that use the sums of the market shares weighted by the shares themselves, raised to some power. They therefore suggested a more general formula as a concentration indicator, namely:

HK(\alpha) = \left(\sum_{i=1}^{N} w_i^{\alpha}\right)^{\frac{1}{1-\alpha}}   [1.20]

with \alpha > 0 and \alpha \neq 1. The parameter \alpha is an elasticity parameter, which allows varying the weighting attached to the upper portion of the weight distribution relative to the lower portion. A high \alpha emphasizes the role of larger positions, while a low \alpha emphasizes smaller positions. The Hannah–Kay index verifies the six properties discussed in section 1.2.1.

The Hannah–Kay index is inversely proportional to the level of concentration. For practical purposes, it is often replaced by the reciprocal Hannah–Kay index, which is defined as

RHK(\alpha) = \left(\sum_{i=1}^{N} w_i^{\alpha}\right)^{\frac{1}{\alpha-1}}   [1.21]

with \alpha > 0 and \alpha \neq 1. The reciprocal Hannah–Kay index is proportional to the level of concentration. In the case of \alpha = 2, we obtain the HHI. It varies between 1/N (equally weighted portfolio) and 1 (portfolio of one unique asset).

1.2.5.3. The comprehensive concentration index

Introduced by Horvath [HOR 70], the comprehensive concentration index aims to solve two diametrically opposed problems. On the one hand, dispersion measures like the Gini coefficient tend to undervalue the importance of assets with large weights. On the other hand, discrete concentration measures ignore smaller positions in the portfolio and exclusively focus on the largest ones. To circumvent these deficiencies, the comprehensive concentration index attempts to reflect both relative dispersion and absolute magnitude. It is defined as the weight of the


largest position, plus the sum of the squares of the weights of the other assets, each weighted by a multiplier that reflects the proportional size of the rest of the portfolio:

CCI = w_{(1)} + \sum_{i=2}^{N} w_{(i)}^2 \left[1 + (1 - w_{(i)})\right]   [1.22]

where w_{(1)} is the largest weight. The index equals 1 for a single-asset portfolio. For an equally weighted long-only portfolio, it reaches a minimum value of (3N^2 - 3N + 1)/N^3, which is always larger than the largest asset weight.

1.2.5.4. The variance of natural logarithms of weights

Lipczynski et al. [LIP 09] suggested using the variance of the natural logarithms of the weights as a relative concentration measure. Mathematically:

V = \frac{1}{N}\sum_{i=1}^{N}\left(\ln w_i - \overline{\ln w}\right)^2   [1.23]

where \overline{\ln w} = \frac{1}{N}\sum_{j=1}^{N} \ln w_j, which can also be written as

V = \frac{1}{N}\sum_{i=1}^{N}\left(\ln \frac{w_i}{\widetilde{w}}\right)^2   [1.24]

where \widetilde{w} = \left(\prod_{j=1}^{N} w_j\right)^{1/N} is the geometric mean of the weights.
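The two forms [1.23] and [1.24] are algebraically identical, since the log of the geometric mean is the mean of the log-weights. A minimal sketch checking this (function names are ours; weights must be strictly positive for the logarithms to exist):

```python
import math

def var_log_weights(weights):
    """Variance of the natural logarithms of the weights around their
    mean log-weight (equation [1.23])."""
    n = len(weights)
    logs = [math.log(w) for w in weights]
    mean_log = sum(logs) / n
    return sum((lw - mean_log) ** 2 for lw in logs) / n

def var_log_weights_geo(weights):
    """Same quantity written against the geometric mean of the weights
    (equation [1.24])."""
    n = len(weights)
    geo = math.exp(sum(math.log(w) for w in weights) / n)
    return sum(math.log(w / geo) ** 2 for w in weights) / n

w = [0.4, 0.3, 0.2, 0.1]
print(abs(var_log_weights(w) - var_log_weights_geo(w)) < 1e-12)  # True
print(var_log_weights([0.25] * 4))  # 0.0 for an equally weighted portfolio
```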

Since the weights are expressed as positive percentages and sum to 1, the natural logarithms of the weights are negative, except in the case of a single-asset portfolio. The minimum value of V is zero; it is reached for an equally weighted portfolio. The maximum value varies as a function of the number of assets in the universe. It can exceed one, but it declines consistently as more assets are added to the portfolio with a very small weight, and the weight of the largest asset is reduced accordingly.

1.2.5.5. Some academic measures

Academics have introduced various other weight-based measures of diversification, but most of them are rarely used in practice and thus have no official name. For instance, Chamberlain [CHA 83a] uses the 2-norm of the weight vector and defines a well-diversified portfolio as one with

\lim_{N \to \infty} \sum_{i=1}^{N} w_i^2 = 0   [1.25]


For portfolios with a very large number of assets, this measure is directly linked to the HHI. Ingersoll [ING 87] defines a portfolio as being fully diversified if

\lim_{N \to \infty} w_i \to 0   [1.26]

with probability one and

\lim_{N \to \infty} N w_i = C_i \le C   [1.27]

for all i, where C is a constant independent of N. This is obviously a stricter condition than that defined by Chamberlain, but it suffers from the same issue: investors want to know the exact weight of each asset in their portfolio, and how these amounts can be discerned when the number of assets goes to infinity remains unclear.

More recently, Bouchaud et al. [BOU 97] observed that characterizing the concentration of portfolio weights is a problem that resembles analyzing the random spin structure of a spin glass8. They therefore propose using the quantity

Y_q = \sum_{i=1}^{N} w_i^q   [1.28]

as an indicator of portfolio concentration. Obviously, the case q = 1 is uninteresting, as Y_1 = 1 for all portfolios. For q \to 1, the quantity (1 - Y_q)/(q - 1) converges towards the Shannon entropy, which will be discussed in section 1.3. The measure Y_q is therefore a generalization of Shannon's entropy. When q = 2, Y_q is equivalent to the HHI. In an equally weighted portfolio, Y_q = N^{1-q}, which converges to 0 for large values of N. Bouchaud et al. therefore suggest using 1/Y_2 as an approximation of the number of effective assets in a portfolio. We will revisit this idea later.
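The quantity Y_q and the resulting effective number of assets 1/Y_2 are straightforward to compute; a minimal sketch (the function name is ours):

```python
def y_q(weights, q):
    """Bouchaud et al.'s concentration indicator Y_q = sum_i w_i**q
    (equation [1.28])."""
    return sum(w ** q for w in weights)

w = [0.4, 0.3, 0.2, 0.1]
print(round(y_q(w, 1), 10))        # 1.0 for any fully invested portfolio
print(round(y_q(w, 2), 10))        # 0.3: this is the HHI
print(round(1.0 / y_q(w, 2), 4))   # 3.3333 effective assets out of 4
print(1.0 / y_q([0.25] * 4, 2))    # 4.0: all four assets are "effective"
```

The last line illustrates why 1/Y_2 is a natural effective-number measure: it recovers N exactly for an equally weighted portfolio and 1 for a single-asset portfolio.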

8 Spin glasses are disordered magnetic alloys that exhibit a variety of properties characteristic of complex systems. The orientations of the north and south magnetic poles of their component atoms are not aligned in a regular pattern. See Mézard et al. [MÉZ 87] for a discussion.


1.3. Entropy

The notion of entropy was introduced by Clausius [CLA 65] in thermodynamics to explain the maximum energy available for useful work in heat engines. Boltzmann [BOL 77] applied it in classical statistical mechanics to measure the uncertainty that remains about a system after observing its macroscopic properties (such as pressure, temperature or volume). Shannon [SHA 48] introduced it in engineering and mathematics, gave it a probabilistic interpretation in the theory of communication and transmission of information (a.k.a. "information theory"), and used it to capture uncertainty. More recently, Theil [THE 67, THE 72] developed several applications of Shannon's entropy, and more generally of information theory, in economics.

1.3.1. Defining Shannon entropy

Although it is well beyond the scope of this book to engage in a comprehensive discussion of the Shannon entropy, some historical context is important to understand it. Shannon was originally trying to model a general communication system made of five parts: (1) a source, which generates a message symbol by symbol; (2) a transmitter, which turns the message into a signal to be transmitted; (3) a channel, which is the medium used to transmit the signal from the transmitter to the receiver; (4) a receiver, which reconstructs the message from the signal; and (5) a destination, which receives the message. Shannon deliberately decided to ignore the semantic aspects of the message and focused on the physical and statistical constraints limiting its transmission, notwithstanding its meaning. He modeled the source as a stochastic process choosing successive states (symbols) with given probabilities. Let us denote by s_1, \dots, s_N the possible states, by p_1, \dots, p_N their respective probabilities and by P = (p_1, \dots, p_N) their associated probability distribution. Shannon introduced the quantity I(s_i) = \log(1/p_i) as a measure of the "information" generated at the source by the occurrence of state s_i, knowing the probability distribution P. In a sense, I(s_i) is the "surprise" in observing the occurrence of state s_i, given prior knowledge of the source summarized in P. We can show that the log function is the only functional form that satisfies all the properties required from an information function. Since the


source produces sequences of states to form a message, Shannon defined the entropy of the source as the average amount of information it produces:

SE(P) = \sum_{i=1}^{N} p_i \log_2(1/p_i)   [1.29]

Similarly, the entropy of the receiver will also be defined as the average amount of information it receives. In information theory, entropy is measured in bits, hence the base 2 logarithms. In communication theory, entropy corresponds to the minimum number of bits that should be transmitted to discriminate one message from all the other possible ones. However, the definition of entropy is only unique up to a positive multiplicative constant, so we may choose any positive real number as a base. In the rest of this book, we will follow the usual convention in economics and use natural logarithms rather than base 2 logarithms.

At this stage, the notion of what "information" represents might be unclear, so let us try to illustrate it with a simple example. Suppose we are receiving an English message made of a single word through a communication channel and that we learn that the first letter is a "t". This indication is useful, but it provides limited information, as "t" is the most frequent letter in any English word (16.67% frequency). By contrast, if this letter was a "z" (0.034% frequency), then the information content would be higher. In this example, the amount of information is a function of the underlying probabilities. In general, English characters have low entropy (0.6–1.3 bits of entropy per character), or stated differently, they are predictable. By contrast, Chinese characters have much higher entropy (11.3 bits per character) and are therefore much more difficult to predict.

In a more general context, the Shannon entropy measures the degree of uncertainty – or by symmetry, the degree of predictability – of a dynamic stochastic system and its associated probability distribution. It is calculated by weighting some information values by their respective probabilities9. For instance, say there are N different possible states, with probabilities P = (p_1, \dots, p_N), where p_i \ge 0 represents the probability of state i = 1, \dots, N. States with a smaller probability yield more information, because they are the least expected. A measure of the amount of information in such a system should therefore be a function of the probabilities p_i. We will denote it by H(P). In general, there is an infinite set of possible information measures based on arbitrary functions of the p_i.

9 For the sake of simplicity, we will limit our discussion to the case of discrete probability distributions.


How can we select the most suitable one? Shannon further explored this idea, and came to the conclusion that the measure should satisfy three requirements: (1) it should be a continuous function of the p_i; (2) if all the p_i are equal, it should be a monotonically increasing function of N; and (3) it should be additive. Shannon then proved that the only function satisfying these three requirements (up to a multiplicative constant) was:

SE(P) = -\sum_{i=1}^{N} p_i \ln p_i   [1.30]

where \ln(\cdot) denotes the natural logarithm function and we define 0 \ln 0 \equiv 0 because \lim_{x \to 0} x \ln x = 0. This quantity has become known as the Shannon entropy. It is minimal (SE = 0) if one state has a probability of one and all other states have a probability of zero, i.e. the system is fully deterministic. It reaches its maximal value SE = \ln N if each state is equally likely, which means that the system is unpredictable. Note that maximum entropy increases with N, but decreasingly so.

The idea of measuring information by the Shannon entropy can easily be applied in the context of a long-only portfolio, because asset weights display the structure of a proper probability distribution (they sum to 100% and are all positive). We can therefore treat them as the probability mass function of a random variable and calculate the Shannon entropy of a given portfolio as follows:

SE = -\sum_{i=1}^{N} w_i \ln w_i   [1.31]

The result is bounded between zero for a single-asset portfolio and \ln N for an equally weighted portfolio. The logic is that when the entropy is equal to zero, there is no uncertainty about the outcome for the "random" variable, since the portfolio is only made of one asset. When entropy reaches its maximum, the uncertainty about the value (from the set of possible outcomes) of the "random" variable is the highest, since each asset is equally weighted. In a sense, increased portfolio diversification increases this uncertainty and raises entropy. Entropy can therefore be used as an inverse measure of concentration. As an illustration, Figure 1.7 shows the Shannon entropies of our sample of equity indices. As expected, the SMI has the lowest entropy of the group, while the equally weighted version of the S&P 500 has the highest.
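The portfolio entropy [1.31] and its bounds can be checked with a few lines of code; a minimal sketch (the function name is ours), using the convention 0 ln 0 = 0:

```python
import math

def shannon_entropy(weights):
    """Shannon entropy of a long-only portfolio, -sum w_i ln w_i
    (equation [1.31]), with the convention 0 ln 0 = 0."""
    return -sum(w * math.log(w) for w in weights if w > 0)

single = [1.0, 0.0, 0.0, 0.0]
equal = [0.25] * 4
tilted = [0.4, 0.3, 0.2, 0.1]

print(shannon_entropy(single))                            # 0.0: no uncertainty
print(abs(shannon_entropy(equal) - math.log(4)) < 1e-12)  # True: maximum is ln N
print(0.0 < shannon_entropy(tilted) < math.log(4))        # True: strictly in between
```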


Figure 1.7. Shannon entropies for various equity indices

The Shannon entropy satisfies properties 1–5 of section 1.2.1, but not property 6. Note that we can directly link Shannon's entropy to the RHK measure, as

\lim_{\alpha \to 1} \mathrm{RHK}(\alpha) = e^{-SE}   [1.32]

As discussed previously, the Shannon entropy is inversely proportional to the level of concentration. To get a directly proportional measure of concentration, we may use a normalized version of the Shannon entropy, called the Theil [THE 67] entropy and calculated as follows:

TE = \mathrm{Max}(SE) - SE = \ln N + \sum_{i=1}^{N} w_i \ln w_i   [1.33]

The result is bounded between 0 for an equally weighted portfolio and \ln N for a single-asset portfolio. It is related to the Hannah–Kay index by the following relationship:

TE = \lim_{\alpha \to 1} \ln\left(N \cdot \mathrm{RHK}(\alpha)\right)   [1.34]


Note that, as the upper limit of the Shannon entropy changes with the number of assets, it is difficult to use it to compare the concentration levels of portfolios of different sizes, particularly if they are well diversified.

1.3.2. Cross entropy and divergence measures

Several applications of probability theory require an appropriate measure of the divergence between probability distributions. Many divergence measures have been proposed, extensively studied by various researchers, and applied to a variety of disciplines. Hereafter, we will just introduce one of them, which we will use later for portfolio construction purposes. If we have two sets of (long-only) portfolio weights, or equivalently, two probability distributions P = (p_1, \dots, p_N)' and Q = (q_1, \dots, q_N)', the cross entropy between P and Q is defined as

CE(P, Q) = -\sum_{i=1}^{N} p_i \ln q_i = -\sum_{i=1}^{N} p_i \ln p_i + \sum_{i=1}^{N} p_i \ln\frac{p_i}{q_i} = SE(P) + D(P, Q)   [1.35]

The last term, D(P, Q), is often referred to as the relative entropy or as the Kullback–Leibler [KUL 51] divergence measure. It is defined as

D(P, Q) = \sum_{i=1}^{N} p_i \ln\frac{p_i}{q_i}   [1.36]

Technically, the Kullback–Leibler divergence measure is a pseudo-distance between the two probability distributions. It is commonly called the Kullback–Leibler distance because its value is non-negative and equals zero only if the two distributions are equal. However, we dislike the term "distance", because it suggests that the Kullback–Leibler divergence has the properties of a metric on the space of probability distributions. This is not the case, because it is not symmetric and it does not obey the triangle inequality. Two other pseudo-distances worth mentioning are the Jeffreys [JEF 48] divergence and the Jensen–Shannon divergence introduced by Lin [LIN 91]. The Jeffreys divergence is defined as follows:

J(P, Q) = D(P, Q) + D(Q, P) = \sum_{i=1}^{N} (p_i - q_i) \ln\frac{p_i}{q_i}   [1.37]


Although it is symmetric, the Jeffreys divergence is not a metric function either, because it does not fulfill the condition of triangle inequality. Related to the Jensen difference proposed by Rao [RAO 82, RAO 85] in a different context, the Jensen–Shannon divergence is based on Jensen's [JEN 06] inequality and the Shannon entropy. It is calculated as the mean of the relative entropy of each distribution to the mean distribution M = (P + Q)/2, which is

JS(P, Q) = \frac{1}{2} D(P, M) + \frac{1}{2} D(Q, M)   [1.38]

The Jensen–Shannon divergence may be thought of as a symmetrized and smoothed variant of the Kullback–Leibler divergence (smoothing is achieved by consideration of the midpoint distribution). It is bounded between 0 and \ln 2. Although it is not a metric either, it can be shown that its square root defines a true metric, or more precisely, that JS(P, Q) is the square of a metric. If needed, we can show that the two distances J(P, Q) and JS(P, Q) are related by the inequality

JS(P, Q) \le \frac{1}{4} J(P, Q)   [1.39]
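The three divergences [1.36]–[1.38] and the inequality [1.39] can be illustrated numerically. A minimal sketch (function names are ours; it assumes q_i > 0 wherever p_i > 0 so that the logarithms are defined):

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence [1.36], with natural logarithms."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jeffreys(p, q):
    """Jeffreys divergence [1.37]: the symmetrized KL divergence."""
    return kl(p, q) + kl(q, p)

def jensen_shannon(p, q):
    """Jensen-Shannon divergence [1.38]: mean KL divergence of each
    distribution to the midpoint distribution."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.4, 0.3, 0.2, 0.1]
q = [0.25, 0.25, 0.25, 0.25]

print(kl(p, q) >= 0 and kl(p, p) == 0)            # True: non-negative, zero iff equal
print(abs(kl(p, q) - kl(q, p)) > 1e-6)            # True: KL is not symmetric
print(jensen_shannon(p, q) <= jeffreys(p, q) / 4)  # True: inequality [1.39]
print(jensen_shannon(p, q) <= math.log(2))         # True: JS bounded by ln 2
```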

We will use these distances in section 2.5.5.3.

1.3.3. Other entropy functions

While the Shannon entropy is probably the most widely known, there are numerous other entropy functions that have been suggested to measure information. Let us mention the most popular.

1.3.3.1. Renyi entropy

The Renyi [REN 61] entropy is a generalization of Shannon's entropy. The Renyi entropy of order \alpha \ge 0, \alpha \neq 1, is defined as

R_{\alpha}(P) = \frac{1}{1-\alpha} \ln \sum_{i=1}^{N} p_i^{\alpha}   [1.40]


In the case of \alpha = 0, we obtain the Hartley entropy. For \alpha \to 1, using L'Hôpital's rule, we can show that R_{\alpha} converges towards the Shannon entropy. For \alpha = 2, we obtain the collision entropy. Renyi also introduced a divergence function of order \alpha > 0, which extends the Kullback–Leibler divergence (relative entropy). Given two probability distributions, or equivalently, two sets of long-only portfolio weights, P and Q, the Renyi divergence of order \alpha \neq 1 is

D_{\alpha}(P, Q) = \frac{1}{\alpha - 1} \ln \sum_{i=1}^{N} p_i^{\alpha} q_i^{1-\alpha}   [1.41]

In the case of \alpha \to 1, we obtain the Kullback–Leibler divergence measure.

1.3.3.2. Kendall information and Tsallis entropy

Per Ord and Stuart [ORD 94], Kendall defined the information content of a discrete probability distribution as

K_{\beta}(P) = \frac{1}{\beta - 1}\left(1 - \sum_{i=1}^{N} p_i^{\beta}\right)   [1.42]

but did not provide any justification for it. In the case of \beta \to 1, we obtain the Shannon entropy. Havrda and Charvat [HAV 67] used a similar quantity, which they called the structural \alpha-entropy. Tsallis [TSA 88] introduced a new form of non-additive entropy, now commonly referred to as the Tsallis entropy. It is defined as:

S_q(P) = \frac{1}{q - 1}\left(1 - \sum_{i=1}^{N} p_i^{q}\right), \qquad \lim_{q \to 1} S_q(P) = -\sum_{i=1}^{N} p_i \ln p_i   [1.43]

Surprisingly, the Tsallis entropy raised considerable controversy – see Cho [CHO 02] for a discussion. As an illustration, the U.S. physicist and Nobel Prize winner Murray Gell-Mann stated that the Tsallis entropy was a true generalization of the Boltzmann–Gibbs entropy, while others consider it at best a fitted model, where q is a fitting parameter for systems that are not understood well enough.
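The convergence of both the Renyi and Tsallis families to the Shannon entropy, and their order-2 special cases, can be verified numerically. A minimal sketch (function names are ours; the limits are approximated by evaluating near order 1):

```python
import math

def shannon(p):
    """Shannon entropy with natural logarithms (equation [1.30])."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def renyi(p, alpha):
    """Renyi entropy of order alpha (equation [1.40], alpha != 1)."""
    return math.log(sum(pi ** alpha for pi in p)) / (1.0 - alpha)

def tsallis(p, q):
    """Tsallis entropy (equation [1.43], q != 1)."""
    return (1.0 - sum(pi ** q for pi in p)) / (q - 1.0)

p = [0.4, 0.3, 0.2, 0.1]
# Both families converge to the Shannon entropy as their order tends to 1
print(abs(renyi(p, 1.0001) - shannon(p)) < 1e-3)    # True
print(abs(tsallis(p, 1.0001) - shannon(p)) < 1e-3)  # True
# Order 2: Renyi gives -ln(HHI) (collision entropy), Tsallis gives 1 - HHI
hhi = sum(pi ** 2 for pi in p)
print(abs(renyi(p, 2) + math.log(hhi)) < 1e-12)     # True
print(abs(tsallis(p, 2) - (1.0 - hhi)) < 1e-12)     # True
```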


1.3.3.3. Hill's effective numbers and the notion of diversity

The Shannon entropy is a profound and useful diversity index, but its value captures the uncertainty in the distribution rather than its diversity10. The distinction between "entropy" and true "diversity" is not merely semantic – it is fundamental. Let us illustrate it with a simple example. Say a biologist has a sample and wants to understand the underlying diversity of its population. The entropy will give him an index, but it is generally a nonlinear function of the underlying diversity and is expressed in unknown units. Our biologist would prefer an estimation of the effective number of species in his sample. The latter is called pure diversity, while the entropy is a (nonlinear) index of diversity. Fortunately, entropies can be transformed into true diversities. Relying on Renyi entropies, Hill [HIL 73] introduced the notion of "effective numbers of species", a.k.a. Hill numbers, and defined them as

{}^{\alpha}D = \left(\sum_{i=1}^{N} p_i^{\alpha}\right)^{\frac{1}{1-\alpha}}   [1.44]

The parameter \alpha is called the "order of the diversity". It can take any value, but we are usually only interested in positive values. For \alpha \to 1, we obtain the exponential of Shannon's entropy:

\lim_{\alpha \to 1} {}^{\alpha}D = \exp(SE) = \exp\left(-\sum_{i=1}^{N} p_i \ln p_i\right)   [1.45]

which is a well-known quantity in biology. A few examples of other Hill numbers are given in Table 1.1. If we consider the above biology example, the units of Hill's numbers would be species. For \alpha = 0, the diversity is the total number of species, regardless of their abundance (all species, rare or common, count equally). For \alpha = 1, we get the number of abundant species; for \alpha = 2, we get the number of very abundant species; etc. As \alpha increases, less weight is placed on the less abundant species. In the context of a portfolio, we just need to replace "species" by "weights".

10 An intuitive definition of “diversity” used in biology is the number of species in a population.


Hill number | Equivalent entropy
{}^{0}D | Max entropy, or Hartley index
{}^{1}D | Shannon entropy
{}^{2}D | Collision entropy
… | …
{}^{\infty}D | Min entropy, or Berger–Parker's index

Table 1.1. Examples of Hill numbers and their equivalent entropy

Hill also suggested a measure of evenness, {}^{a}D / {}^{b}D with a > b, but recommended not using it because it is subject to a high sampling bias. However, this criticism does not apply in finance, where the total number of assets in the investment universe of a given investor can be considered as known.

Figure 1.8. Diversity measure for a fully invested portfolio of three assets, as a function of the weights of assets 1 and 2. The diversity is maximal for an equally weighted portfolio
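The Hill numbers [1.44] and their Shannon limit [1.45] are easy to compute; a minimal sketch (the function name is ours, and the α → 1 case is handled explicitly via the exponential of the Shannon entropy):

```python
import math

def hill(weights, alpha):
    """Hill number of order alpha (equation [1.44]); the alpha -> 1
    case is the exponential of the Shannon entropy (equation [1.45])."""
    if abs(alpha - 1.0) < 1e-12:
        return math.exp(-sum(w * math.log(w) for w in weights if w > 0))
    return sum(w ** alpha for w in weights if w > 0) ** (1.0 / (1.0 - alpha))

equal = [1.0 / 3, 1.0 / 3, 1.0 / 3]
tilted = [0.90, 0.05, 0.05]

print(round(hill(equal, 1), 6))  # 3.0: maximal diversity for equal weights
print(hill(tilted, 0))           # 3.0: order 0 just counts the assets held
print(hill(tilted, 1) < 3.0)     # True: effective number well below 3
print(hill(tilted, 2) < hill(tilted, 1))  # True: higher order, more weight on the 90% position
```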


Meucci [MEU 09] has recently suggested using {}^{1}D = \exp(SE) in the context of a portfolio, and called it the "diversity measure". As expected, the diversity measure reaches its maximum when weights are uniformly distributed across assets. As an illustration, Figure 1.8 shows the diversity measure for a fully invested portfolio of three assets. The maximum diversity level is reached when each asset's weight is equal to 1/3, so that {}^{1}D = 3.

1.3.3.4. Rao quadratic entropy

Rao [RAO 82] defined quadratic entropy as the average difference between two individuals that would be randomly selected (with replacement) in a population. It is measured by

RQE = \sum_{i=1}^{N} \sum_{j=1}^{N} d_{i,j} \, p_i \, p_j   [1.46]

where d_{i,j} is the pairwise distance between the i-th and j-th individuals. To apply the RQE in practice, the pairwise distances d_{i,j} can be defined arbitrarily, with the only requirements that d_{i,j} \ge 0, d_{i,i} = 0 and d_{i,j} = d_{j,i}; they may be calculated by any distance function. We can show that if d_{i,j} = 1 for all i \neq j, RQE reduces to the Simpson diversity index, which is the 1-complement of the HHI. Rao quadratic entropy differs from other entropy measures by the fact that it uses both the relative abundance of individuals (their probability) and their intrinsic difference (d_{i,j}). This is the source of its attraction and flexibility11. We will use it in section 4.4.3.

1.3.4. Entropy-based portfolio optimization and diversification

When estimating a probability distribution for a given system, we should apply the Principle of Maximum Entropy, which states that the distribution that best represents the current state of knowledge (or lack of knowledge) is the one with the largest entropy. Similarly, when very little is known about the risk and return characteristics of the assets in the investment universe, or when the information available is not considered reliable or trustworthy,

11 In general, Rao quadratic entropy can be related to the Shannon entropy through a generalized version of the Tsallis parametric entropy – see Ricotta and Szeidl [RIC 06].


investors need to create a portfolio that bears the least assumptions and is maximally non-committal. Maximizing the portfolio weight entropy is an efficient way to do this, while reflecting the fact that investors are missing information that would otherwise allow them to construct what we would regard as more "certain" portfolios. For instance, if we choose to maximize the Shannon entropy of a portfolio of assets, the corresponding optimization problem is:

w^* = \arg\max_{w} \left(-\sum_{i=1}^{N} w_i \ln w_i\right)   [1.47]

subject to

\sum_{i=1}^{N} w_i = 1   [1.48]

w_i \ge 0 \text{ for all } i   [1.49]
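The program [1.47]–[1.49] can be solved with any constrained optimizer; the sketch below is our own illustration, not the book's method. It uses a simple exponentiated-gradient ascent that stays on the simplex by renormalizing at each step (the gradient of the Shannon entropy with respect to w_i is -(ln w_i + 1)).

```python
import math

def max_entropy_weights(n, iterations=60, eta=0.5):
    """Maximize the Shannon entropy over the simplex [1.47]-[1.49] by
    exponentiated-gradient ascent (a sketch; any strictly positive
    starting point works)."""
    w = [float(i + 1) for i in range(n)]
    w = [wi / sum(w) for wi in w]  # arbitrary feasible starting point
    for _ in range(iterations):
        # Multiplicative update along the entropy gradient -(ln w_i + 1)
        w = [wi * math.exp(-eta * (math.log(wi) + 1.0)) for wi in w]
        total = sum(w)
        w = [wi / total for wi in w]  # project back onto the simplex
    return w

w = max_entropy_weights(5)
# With no extra constraint, the optimum is the equally weighted portfolio
print(all(abs(wi - 0.2) < 1e-9 for wi in w))  # True
```

This numerical result anticipates the discussion that follows: without an additional return or risk constraint, the solution is always the uniform allocation.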

With no additional constraint, this optimization program will deliver a trivial solution: an equally weighted portfolio. This is an intuitive result that comes from Laplace's principle of insufficient reason (a.k.a. the principle of indifference). If we want to assign probabilities to states of a system and see no reason for one state to occur more often than any other, then the different states are assigned equal probabilities. In other words, being most uncertain means being closest to the uniform distribution. Similarly, if we know nothing about individual asset characteristics, then the best way to diversify is by doing an equal allocation. To avoid such a trivial answer, it is common practice to add either a minimum return or a maximum risk as an additional constraint to the optimization problem of equations [1.47–1.49]. We will discuss these approaches in section 2.5.5.

Alternatively, we could try to maximize the Yager [YAG 95] entropy, which is defined as a distance between the effective portfolio weights and weights all equal to 1/N. Mathematically, the Yager entropy of order k is

Q_k(w) = -\left(\sum_{i=1}^{N} \left|w_i - \frac{1}{N}\right|^k\right)^{1/k}   [1.50]

where k \ge 1 is a constant. For k = 1, the Yager entropy is a linear function of the weights. For k \to \infty, the Yager entropy converges to -\max_i |w_i - 1/N|. Here again, constraints are needed when maximizing the Yager entropy


to avoid ending up with the same trivial solution of an equally weighted portfolio. Another possible approach, suggested by Wang and Parkan [WAN 05], is to minimize the maximum distance between adjacent weights. This is called the min–max disparity model, defined as follows:

\min_{w} \max_{i \in \{1, \dots, N-1\}} |w_i - w_{i+1}|   [1.51]

subject to the usual full investment and positive weights constraints, plus any additional required constraint to avoid ending up with a trivial solution. The advantage of this model is that it can be linearized, so it can be solved by linear programming.

1.4. Conclusions on pure weights and entropy-based diversification

The strength of portfolio diversification approaches based exclusively on weights or weight entropies is their parsimony in terms of assumptions. Apart from two very basic ones (full investment and long only), there are no requirements of normally distributed returns or complex parameters to be estimated. This makes their results very robust and often very intuitive. Entropy has the additional advantage of being able to measure risk as well as to describe distributions, which ensures full consistency. Unfortunately, these strengths are also weaknesses, as financial assets are usually tagged with additional information that would be useful to create better-diversified portfolios, but cannot be captured by weights or their entropies.

As an illustration, let us consider the three technology-oriented portfolios described in Table 1.2. Which one of the three is the most diversified? Portfolio A has only five assets, with Apple representing 40% of it. Portfolios B and C both hold 10 assets which are equally weighted. From a size perspective and from a concentration ratio perspective, portfolios B and C seem more diversified than portfolio A. A simple HHI or Shannon entropy calculation would give the same answer, as illustrated in Table 1.3. Portfolios B and C seem to be identical from a quantitative perspective, but a qualitative assessment would rapidly reveal that portfolio C contains several non-U.S. companies, which are likely to increase its diversification compared with portfolio B.


Portfolio A: Apple Inc. (AAPL) 40%; Alphabet Inc. (GOOG) 15%; Visa Inc. (V) 15%; eBay Inc. (EBAY) 15%; ADP LLC (ADP) 15%

Portfolio B: American Express Company (AXP) 10%; Apple Inc. (AAPL) 10%; Dell Inc. (DELL) 10%; Hewlett Packard Co. (HPQ) 10%; Intel Inc. (INTC) 10%; Mastercard Inc. (MA) 10%; Microsoft Corp. (MSFT) 10%; Qualcomm Inc. (QCOM) 10%; Visa Inc. (V) 10%; Western Digital Corp. (WDC) 10%

Portfolio C: Analog Devices Inc. (ADI) 10%; Glu Mobile Inc. (GLUU) 10%; Jabil Circuit Inc. (JBL) 10%; Micron Technology Inc. (MU) 10%; Murata Manufacturing Co. Ltd (6981) 10%; Nidec Corp. (6594) 10%; Qualcomm Inc. (QCOM) 10%; Samsung Electronics (005930) 10%; STMicroelectronics (STM) 10%; Texas Instruments Inc. (TXN) 10%

Table 1.2. Constituents and weights of three hypothetical portfolios

      | Portfolio A | Portfolio B | Portfolio C
CR(1) | 40%         | 10%         | 10%
CR(3) | 70%         | 30%         | 30%
HHI   | 0.25        | 0.10        | 0.10
SE    | 1.50        | 2.30        | 2.30

Table 1.3. Measuring the diversification of our three hypothetical portfolios based on their constituent weights
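The figures in Table 1.3 can be reproduced from the portfolio weights in Table 1.2. A minimal sketch (function names are ours):

```python
import math

def cr(weights, k):
    """Concentration ratio: combined weight of the k largest positions."""
    return sum(sorted(weights, reverse=True)[:k])

def hhi(weights):
    """Herfindahl-Hirschman index: sum of squared weights."""
    return sum(w ** 2 for w in weights)

def shannon(weights):
    """Shannon entropy of the weights (equation [1.31])."""
    return -sum(w * math.log(w) for w in weights if w > 0)

port_a = [0.40, 0.15, 0.15, 0.15, 0.15]
port_b = [0.10] * 10  # portfolio C has the same weight structure

print(round(cr(port_a, 1), 10), round(cr(port_a, 3), 10))  # 0.4 and 0.7
print(round(hhi(port_a), 10))        # 0.25
print(round(shannon(port_a), 2))     # 1.5, i.e. the 1.50 of Table 1.3
print(round(hhi(port_b), 10), round(shannon(port_b), 2))  # 0.1 and 2.3 = ln 10
```

Since portfolios B and C share the same weight vector, every weight-based measure is identical for the two, which is precisely the limitation discussed in the text.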


However, we may also start thinking in terms of ecosystems. To keep things simple, let us define an ecosystem as "competitors plus dependent suppliers plus dependent customers". Then, each company in portfolio A is in its own ecosystem, and the weight of each ecosystem is equal to the weight of its single representative company. In portfolio B, there are essentially two ecosystems, one made of all the pure technology companies (with a total weight of 70%) and the other made of financial services corporations (with a total weight of 30%). Portfolio C is composed of companies that are all among Apple's top suppliers. They belong to the same ecosystem, which has a 100% weight. In this new ecosystem perspective, portfolio A would be considered the most diversified, as illustrated in Table 1.4.

Portfolio A | Portfolio B | Portfolio C
5 different ecosystems | Technology ecosystem 70%; Fin. Services ecosystem 30% | 1 unique ecosystem

Table 1.4. Measuring the diversification of our three hypothetical portfolios based on the weights of their ecosystems

      | Portfolio A | Portfolio B | Portfolio C
CR(1) | 40%         | 70%         | 100%
CR(3) | 70%         | 100%        | 100%
HHI   | 0.25        | 0.58        | 1.00
SE    | 1.50        | 0.61        | 0.00

Table 1.5. Measuring the diversification of our three hypothetical portfolios based on the weights of their ecosystems


It should be clear from this simple example that portfolio diversification is not always as simple as it seems. Using weights and explicit attributes does help, but implicit attributes and a qualitative context are also meaningful and should not be ignored. Therefore, in the next chapters, we will explore models that rely not only on weights, but also on additional information to build diversified portfolios.

2 Modern Portfolio Theory and Diversification

Portfolio construction and diversification were for a long time more of an art than a science. Investors were intuitively aware of the notion of return and risk, but had no mathematically consistent framework to model and build portfolios. In addition, the question of the underlying common characteristic along which some assets may be diverse had never been formally addressed. Thus, there had been no analysis on how to measure the benefits of diversification with respect to this characteristic. Markowitz [MAR 52] was the first to formalize the measurement of portfolio risk and return in a mathematically consistent framework, which he subsequently expanded in Markowitz [MAR 56, MAR 59]. Acknowledging that measuring portfolio risk and portfolio return was only the first step, Markowitz introduced a methodology for assembling portfolios that considers the expected returns and risk characteristics of the underlying assets as well as the investor’s appetite for risk. The result, usually referred to as the modern portfolio theory, pushed portfolio construction toward a science and away from being an art1.

1 Twelve years before Markowitz, de Finetti [DE 40], an Italian mathematician and actuary, published a paper in an actuarial journal that introduced a mean-variance approach similar to that of Markowitz to solve financial problems under uncertainty. Unfortunately, this paper was written in Italian and went unnoticed for several years. Markowitz [MAR 06] nevertheless acknowledged de Finetti’s merits in applying the mean-variance approach to Finance.


2.1. The mathematics of return and risk

Markowitz's core contributions to the world of finance can essentially be summarized as follows: (i) modeling returns as random variables and using their variance as a measure of risk; (ii) providing a formula to calculate the expected return and the variance of a portfolio from the expected returns and co-variances of its components2 and (iii) introducing an optimization framework to build efficient portfolios.

2.1.1. Modeling returns as random variables

Let us assume we are in a single-period model. That is, there is only one time period, which starts at time 0 and ends at time $T$. The investment universe is made of $N \geq 2$ risky assets, which are infinitely divisible and can be traded at time 0 in any fraction, without taxes or transaction costs. The rates of return on the assets from time 0 to time $T$ are given by the $N \times 1$ random column vector:

$R := (R_1, R_2, \dots, R_N)^\top$   [2.1]

where $R_i$ denotes the (random) return of asset number $i$ over the period in question. Initially, we will assume that all returns are jointly normally distributed, so that their joint distribution is completely characterized by their expected value:

$\mu := \mathbb{E}[R] = (\mu_1, \mu_2, \dots, \mu_N)^\top, \quad \text{where } \mu_i = \mathbb{E}[R_i]$   [2.2]

and their covariance matrix:

$\Sigma := \begin{pmatrix} \sigma_{1,1} & \cdots & \sigma_{1,N} \\ \vdots & \ddots & \vdots \\ \sigma_{N,1} & \cdots & \sigma_{N,N} \end{pmatrix}$   [2.3]

2 Markowitz [MAR 93] acknowledges that this formula was unknown to him until he discovered it in a statistical textbook by Uspensky [USP 37].


The non-diagonal elements $\sigma_{i,j} = \mathbb{E}[(R_i - \mu_i)(R_j - \mu_j)]$ of $\Sigma$ denote the covariance terms between asset $i$ and asset $j$ returns. The $\sigma_{i,j}$ terms can be expressed as a function of the correlation $\rho_{i,j}$ between asset $i$ and asset $j$, and their respective standard deviations, as $\sigma_{i,j} = \rho_{i,j}\,\sigma_i\,\sigma_j$. The diagonal elements $\sigma_{i,i} = \sigma_i^2$ of $\Sigma$ denote the variance of asset returns. It is important to note that the quantities in the $\mu$ and $\Sigma$ matrices are not necessarily the moments of a data-generating process, but may be those of a subjective distribution if the investor uses his/her subjective probabilities of various states to define these quantities – see Ingersoll [ING 87].

To be valid, a covariance matrix must be symmetric, as the covariance between $R_i$ and $R_j$ is the same as that between $R_j$ and $R_i$ ($\sigma_{i,j} = \sigma_{j,i}$). It must also be positive semi-definite or, equivalently, all its eigenvalues must be non-negative. For the sake of simplicity, we will assume that there are no redundant assets, that is, no asset return can be obtained as a linear combination of the returns of other assets. This, added to the fact that all assets are risky, implies that the covariance matrix is non-singular.

Occasionally, we will use the correlation matrix of asset returns instead of the covariance matrix. The correlation matrix is defined as:

$C := \begin{pmatrix} 1 & \rho_{1,2} & \cdots & \rho_{1,N} \\ \rho_{2,1} & 1 & \cdots & \rho_{2,N} \\ \vdots & & \ddots & \vdots \\ \rho_{N,1} & \rho_{N,2} & \cdots & 1 \end{pmatrix}$   [2.4]

We will use the operator $\mathrm{diag}(\cdot)$ to generate a diagonal matrix from a vector, so that the diagonal is filled with the vector elements and all non-diagonal entries are set to zero. We will also use the same operator to extract the diagonal of a matrix into a vector, and denote by:

$\sigma := (\sigma_1, \sigma_2, \dots, \sigma_N)^\top$   [2.5]


the vector of the various assets' volatilities, and by:

$\Delta_\sigma := \mathrm{diag}(\sigma) = \begin{pmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_N \end{pmatrix}$   [2.6]

the diagonal matrix created from the assets' volatilities. By construction, we have:

$C = \Delta_\sigma^{-1}\, \Sigma\, \Delta_\sigma^{-1}$   [2.7]

and:

$\Sigma = \Delta_\sigma\, C\, \Delta_\sigma$   [2.8]

2.1.2. Portfolio return and risk statistics

In the above-described framework, investors create their portfolio by allocating capital among the $N$ assets at time 0. As we are in a single-period model, we will assume that (i) investors must hold their portfolio without any change until time $T$ and (ii) investors are concerned about the return on their portfolios at the end of the period. We will represent a portfolio of these assets by the $N \times 1$ column vector:

$w := (w_1, w_2, \dots, w_N)^\top$   [2.9]

where each element $w_i$ denotes the fraction of total capital initially held in asset $i$. Occasionally, we will use the following matrix:

$\Delta_w := \mathrm{diag}(w) = \begin{pmatrix} w_1 & 0 & \cdots & 0 \\ 0 & w_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & w_N \end{pmatrix}$   [2.10]


By construction, all portfolio weights must sum to 100%, which can be expressed as:

$\sum_{i=1}^{N} w_i = w^\top e = 1$   [2.11]

where $e$ is an $N \times 1$ column vector with each element equal to 1. Negative asset weights represent short sales, where the investor receives today's asset price and must pay the then-current price in the future. Initially, we will also assume that all weights are positive or, equivalently, that shorting is not available.

It is relatively easy to quantify the return and the variance of a portfolio from those of its components. Indeed, the return of a portfolio is just a weighted average of the returns of its components:

$R_P = \sum_{i=1}^{N} w_i R_i = w^\top R$   [2.12]

Similarly, the expected return of a portfolio is a weighted average of the expected returns of its components:

$\mu_P = \sum_{i=1}^{N} w_i \mu_i = w^\top \mu$   [2.13]

However, because of diversification benefits, the risk of a portfolio is generally not a weighted average of the risk of its components. For instance, the variance $\sigma_P^2$ of a portfolio can easily be calculated using the weights and the covariance of its components. Mathematically:

$\sigma_P^2 = \sum_{i=1}^{N} \sum_{j=1}^{N} w_i w_j\, \sigma_{i,j} = w^\top \Sigma\, w$   [2.14]
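To make equations [2.12]–[2.15] concrete, here is a small numerical sketch. The three assets and their moments are illustrative assumptions, not data from the book:

```python
import numpy as np

# Illustrative assumptions: three assets with hypothetical expected
# returns, volatilities and correlations.
mu = np.array([0.06, 0.08, 0.10])
vol = np.array([0.10, 0.15, 0.20])
corr = np.array([[1.0, 0.3, 0.2],
                 [0.3, 1.0, 0.4],
                 [0.2, 0.4, 1.0]])
Sigma = np.outer(vol, vol) * corr   # equation [2.8]: Sigma from vols and correlations
w = np.array([0.5, 0.3, 0.2])       # portfolio weights, summing to 1

port_mean = w @ mu                  # equation [2.13]: w'mu -> 0.074
port_var = w @ Sigma @ w            # equation [2.14]: w'Sigma w

# Equation [2.15]: split into pure variance and pure covariance terms
var_part = np.sum(w ** 2 * np.diag(Sigma))
cov_part = port_var - var_part
```

With correlations below one, the resulting portfolio volatility (here about 9.9%) is lower than the 13.5% weighted average of the individual volatilities.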

Equation [2.14] is one of the cornerstones of portfolio diversification. When weights are constrained to be positive, it shows that the variance of a portfolio is linearly and positively related to the covariance of its components or, equivalently, to their correlations. To reduce the variance of a portfolio, we should seek to identify assets with low or even negative covariance. In addition, equation [2.14] shows that the variance of a portfolio is nonlinearly related to the weights of its various components. To reduce the variance of a portfolio, we should seek to find an optimal combination of assets. Segregating the cases where $i = j$ from the cases where $i \neq j$, it is possible to rewrite equation [2.14] as follows:

$\sigma_P^2 = \sum_{i=1}^{N} w_i^2 \sigma_i^2 + \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} w_i w_j\, \sigma_{i,j}$   [2.15]

Equation [2.15] shows that the portfolio risk as measured by its variance is made of two components: (i) a risk associated with the $N$ pure variance terms of the returns of the individual assets and (ii) a risk associated with the $N(N-1)$ pure covariance terms between the returns of the individual assets. As $N$ gets larger, the number of covariance terms largely surpasses the number of variance terms, to the point that the overall contribution of the variance terms becomes negligible compared to that of the covariance terms. This seems to suggest that pure variance risk can be reduced and even eliminated through appropriate diversification, while covariance risk remains in the portfolio.

2.2. Modern Portfolio Theory (MPT)

Now that we have introduced the analytical framework to measure the risk and return of a portfolio from those of its components, we can move forward and start thinking about portfolio construction.

2.2.1. Maximizing utility

In a single-period model, how should a rational, risk-averse investor build his/her portfolio? To keep things simple, we will assume that the investor's initial wealth $W_0$ is entirely invested in his/her portfolio. His/her terminal wealth will therefore be equal to $W_T = W_0(1 + R_P)$, where $R_P$ is the (random) return of his/her portfolio. Following von Neumann and Morgenstern [VON 47], we will assume that our investor derives a certain utility from his/her terminal wealth or, equivalently, a certain utility from the return on his/her portfolio3, denoted by $U(R_P)$. A rational investor should always choose

3 A utility function is a cardinal object that converts wealth outcomes (or, equivalently, portfolio returns) into a subjectively perceived value, that is, investors’ satisfaction or happiness.

Modern Portfolio Theory and Diversification

39

his/her initial portfolio weights $w^*$ so as to maximize the expected value of his/her terminal utility. This is equivalent to solving the following optimization problem:

$w^* = \arg\max_{w}\ \mathbb{E}[U(R_P)]$   [2.16]

subject to:

$w^\top e = 1$   [2.17]

This optimization problem was well known by economists prior to Markowitz4, but it had little applicability in practice, mostly because (i) nobody knew how to measure the portfolio risk and return from the risk and return of its underlying assets; (ii) the choice of an appropriate utility function for a given investor was somewhat subjective and (iii) in the 1950s, it was frequently argued that expected utility maximization was computationally too cumbersome to be implemented5.

2.2.2. Optimal and efficient portfolios

To simplify the full-scale optimization problem, Markowitz [MAR 59] asserted that if $U(R_P)$ can be approximated closely enough by a quadratic function, then $\mathbb{E}[U(R_P)]$ could also be approximated by some function of the portfolio's mean return and variance. Using a second-order Taylor expansion of $U$ around the expected portfolio return and assuming higher-order terms are negligible, we have:

$\mathbb{E}[U(R_P)] \approx U(\mu_P) + \tfrac{1}{2}\, U''(\mu_P)\, \sigma_P^2$   [2.18]

4 The basic ideas behind utility theory date back to the 18th century and Bernoulli [BER 54]. The expected utility model was formally developed by von Neumann and Morgenstern [VON 47], who proved that it was a rational criterion for decision-making under uncertainty. However, they viewed it as a side note in the development of game theory. 5 The idea of explicit utility maximization for portfolio choice has recently resurfaced in the "full-scale optimization" approach introduced by Cremers et al. [CRE 05], which uses the entire historical return distribution in the optimization problem rather than some of its summary statistics. Solving for an optimal solution still relies on numerical optimization and dynamic programming, but it is computationally feasible today.


where a prime denotes differentiation. Unfortunately, this expression still requires an explicit specification of the investor's utility function. To be able to solve the portfolio selection problem analytically, Markowitz suggested using a simple objective utility function that explicitly includes a trade-off between the risk ($\sigma_P^2$) and the return ($\mu_P$) of the portfolio, for instance, a quadratic utility function:

$U(w) = \mu_P - \frac{\lambda}{2}\, \sigma_P^2 = w^\top \mu - \frac{\lambda}{2}\, w^\top \Sigma\, w$   [2.19]

where $\lambda > 0$ denotes the investor's risk aversion coefficient. A small value of $\lambda$ indicates low risk aversion, with $\lambda = 0$ being the extreme case of a risk-neutral investor, who only cares about expected returns. Increasing the value of $\lambda$ puts more weight on risk and therefore represents an investor with a higher degree of risk aversion. In practice, $\lambda$ is often set somewhere between 2 and 4. With such a quadratic utility function, the investor's optimization problem becomes:

$w^* = \arg\max_{w}\ \left( w^\top \mu - \frac{\lambda}{2}\, w^\top \Sigma\, w \right)$   [2.20]

subject to:

$w^\top e = 1$   [2.21]

We will refer to equations [2.20] and [2.21] as the mean-variance utility optimization problem. In the absence of additional constraints, the optimal weights can be found in closed form by using Lagrange multipliers:

$w^* = \tau\, \Sigma^{-1}(\mu - \gamma\, e)$, with the multiplier $\gamma$ chosen so that $e^\top w^* = 1$   [2.22]

where $\tau = 1/\lambda$, the inverse of the risk aversion coefficient, is called the investor's risk tolerance coefficient. The mean-variance utility optimization problem can be further simplified if rational, risk-averse investors only hold efficient portfolios. Markowitz defined an efficient portfolio as one that has minimal variance for a given target return level, or maximum return for a given variance level. This allowed him to introduce two alternative simplified quadratic optimization


procedures. The first one corresponds to minimizing the risk of the portfolio (as measured by portfolio variance) subject to a target expected return level. It requires the following problem to be solved:

$w^* = \arg\min_{w}\ w^\top \Sigma\, w$   [2.23]

subject to:

$w^\top \mu = \mu^*$   [2.24]
$w^\top e = 1$   [2.25]

where $\mu^*$ denotes the target return for the portfolio, which is given exogenously. In practice, individual investors will choose $\mu^*$ in accordance with their expectations. We will refer to equations [2.23]–[2.25] as the mean-variance optimization problem. It is a convex quadratic programming problem, because the objective function is convex and all the constraints are linear functions of the decision variables. In the absence of additional constraints on weights, the optimal solution can be found in closed form by using Lagrange multipliers. The optimal weights are:

$w^* = \frac{B - A\,\mu^*}{D}\, \Sigma^{-1} e + \frac{C\,\mu^* - A}{D}\, \Sigma^{-1} \mu$   [2.26]

with $A = \mu^\top \Sigma^{-1} e$, $B = \mu^\top \Sigma^{-1} \mu$, $C = e^\top \Sigma^{-1} e$ and $D = BC - A^2$. The set of all mean-variance efficient portfolios can be obtained by varying the target return for the portfolio. The result is a parabolic curve in the variance-return space, given by:

$\sigma_P^{*2} = \frac{C\,\mu^{*2} - 2A\,\mu^* + B}{D}$   [2.27]

Equivalently, when the portfolio return is plotted against the portfolio volatility, the set of minimum-variance portfolios is a hyperbolic curve, as illustrated in Figure 2.1. It is important to note that, as most investors are not allowed to sell short, it is common practice to add a non-negativity constraint on weights:

$w_i \geq 0 \quad \text{for all } i$   [2.28]


In such a case, no closed-form solution exists, but we can solve for optimal weights using numerical quadratic programming methods – see, for instance, Best and Grauer [BES 90].
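The closed-form solution of equations [2.26]–[2.27] can be checked numerically. The sketch below computes the A, B, C and D constants for three hypothetical assets (illustrative inputs, not data from the book) and verifies that the resulting weights meet both constraints:

```python
import numpy as np

# Illustrative inputs: expected returns and a positive-definite covariance matrix
mu = np.array([0.06, 0.08, 0.10])
Sigma = np.array([[0.0100, 0.0045, 0.0040],
                  [0.0045, 0.0225, 0.0120],
                  [0.0040, 0.0120, 0.0400]])
e = np.ones(len(mu))
inv = np.linalg.inv(Sigma)

A = mu @ inv @ e
B = mu @ inv @ mu
C = e @ inv @ e
D = B * C - A ** 2

mu_star = 0.08                       # exogenous target return
w_star = ((B - A * mu_star) / D) * (inv @ e) \
       + ((C * mu_star - A) / D) * (inv @ mu)

# Frontier variance at the target return, equation [2.27]
var_star = (C * mu_star ** 2 - 2 * A * mu_star + B) / D
```

By construction, `w_star` sums to one, earns exactly `mu_star`, and its variance `w_star @ Sigma @ w_star` matches `var_star` up to rounding. Varying `mu_star` traces out the efficient frontier.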

Figure 2.1. Set of minimum-variance portfolios obtained by minimizing the variance of the portfolio subject to a target expected return level

Alternatively, the MV optimization problem has an equivalent dual representation, in which the investor seeks to find portfolios that maximize the portfolio expected return for a given level of risk as measured by portfolio variance. The corresponding problem is:

$w^* = \arg\max_{w}\ w^\top \mu$   [2.29]

subject to:

$w^\top \Sigma\, w = \sigma^{*2}$   [2.30]
$w^\top e = 1$   [2.31]

where $\sigma^{*2}$ denotes the target variance for the portfolio. Here again, by varying the target variance for the portfolio, all mean-variance efficient portfolios can be obtained.


Figure 2.2. Set of minimum-variance portfolios obtained by maximizing the return of the portfolio subject to a target volatility level

In practice, the mean-variance optimization problem of equations [2.23]–[2.25] is the most frequently used. This is due to computational conveniences and the fact that investors are usually more willing to specify target expected returns rather than target risk levels. In the following sections, for the sake of simplicity and unless otherwise specified, we will therefore consider this problem when discussing modern portfolio theory.

2.3. Empirical applications

Applying the Markowitz framework to build efficient portfolios seems relatively simple and essentially requires two steps. The first step, which we will illustrate in this section, is essentially qualitative and intuitive. It consists of selecting an investment universe made of assets, ideally with low or even negative correlations. We will call it the quest for uncorrelated assets. The second step, which we will further discuss in section 2.4, is exclusively quantitative. It consists of using an optimizer to identify the optimal weights.


2.3.1. Diversification at the asset level

In most investment universes, the spread between the best- and worst-performing assets is massive. Picking a loser is therefore one of the major risks of investing, particularly in a highly concentrated portfolio. Consider, for instance, the 500 largest companies in the United States as represented by the S&P 500 index. Given the nature of these companies, we could think that their shares represent a relatively safe investment. Unfortunately, this is fundamentally flawed, as illustrated in Figure 2.3, which shows the distribution of these companies' annual returns for the years 2014, 2015 and 2016. Every year, the left tail of that distribution includes stocks with large double-digit losses. Of course, we could argue that there are also stocks in the right tail of that distribution with large double-digit gains – see Table 2.1. However, picking winners and avoiding losers is not an easy game. There is no systematic way to identify in advance which stock will outperform others, and being wrong upfront can turn out to be extremely costly. Concentrating a portfolio into a single stock or even a few stocks will therefore expose investors to very large variations in returns. By contrast, a diversified portfolio allocating to all those stocks in a given universe (such as the S&P 500, or the S&P 500 Equally Weighted Index) would have delivered much more stable results – see Table 2.1.

Figure 2.3. Distribution of the annual returns achieved by the 500 largest stocks in the United States in 2014, 2015 and 2016

2014
Indices: S&P 500 +13.7%; S&P 500 EWI +14.5%
Best performing: Southwest Airlines +126.3%; Electronic Arts +104.9%; Edwards Lifesciences +93.7%; Avago Technologies +93.1%; Allergan +91.6%; Mallinckrodt +89.5%; Delta Air Lines +80.5%; Royal Carib. Cruises +76.9%; Keurig Green Mountain +76.8%; Kroger +64.8%
Worst performing: Transocean −59.6%; Denbury Resources −49.6%; Noble Corp. −46.4%; Genworth Financial −45.3%; Avon Products −44.5%; Ensco −43.9%; Range Resources −36.5%; Freeport-McMoRan −35.8%; QEP Resources −33.9%; Mattel −32.1%

2015
Indices: S&P 500 +1.4%; S&P 500 EWI −2.2%
Best performing: Netflix +134.4%; Amazon.com +117.8%; Activision Blizzard +94.1%; NVIDIA +67.1%; Cablevision Syst. +57.9%; Hormel Foods +54.5%; VeriSign +53.3%; Reynolds American +48.7%; Starbucks +48.2%; First Solar +48.0%
Worst performing: Chesapeake Energy −76.8%; CONSOL Energy −76.9%; Southwestern Energy −73.8%; Freeport-McMoRan −70.1%; Fossil Group −66.8%; Kinder Morgan −62.7%; Micron Technologies −59.6%; NRG Energy −57.2%; Marathon Oil −54.1%; Murphy Oil −53.9%

2016
Indices: S&P 500 +12.0%; S&P 500 EWI +14.8%
Best performing: NVIDIA +223.9%; Oneok +132.8%; Freeport-McMoRan +94.8%; Newmont Mining +89.4%; Applied Materials +72.9%; Quanta Services +72.1%; Spectra Energy +71.6%; Comerica +62.8%; Martin Marietta +62.2%; Idexx Labs. +60.8%
Worst performing: Endo International −74.1%; First Solar −51.4%; Tripadvisor −45.6%; Perrigo −42.5%; Vertex Pharma. −41.5%; Under Armour −38.9%; Stericycle −36.1%; Alexion Pharma. −35.9%; Illumina −33.3%; Mallinckrodt −33.2%

Table 2.1. Best- and worst-performing stocks in the United States in 2014, 2015 and 2016

Besides avoiding the embarrassment of having selected the worst-performing asset(s), diversification allows investors to reduce their portfolio


volatility. As seen with equation [2.15], portfolio volatility depends on the correlation coefficients of its components. The lower the correlation between the assets, the greater the risk reduction that can be derived at the portfolio level. As an illustration, consider an investment universe limited to two assets with identical volatility (say 20% p.a.). Figure 2.4 represents the volatility of the portfolio combining these two assets for various levels of correlation. Lower correlation levels result in lower portfolio volatility. In the extreme case of two perfectly negatively correlated assets, it is even possible to create a portfolio that has zero volatility. 25%

Volatility of the two-asset portfolio Correlation = 1

20% Correlation = 0.5 Correlation = 0.0

15%

Correlation = -0.5 10%

5% Correlation = -1 Weight of asset 1 0% 100%

90%

80%

70%

60%

50%

40%

30%

20%

10%

0%

Figure 2.4. Risk reduction gains from diversifying across two assets vary with the correlation between them
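The pattern in Figure 2.4 follows directly from equation [2.14] restricted to two assets. A minimal sketch, assuming two assets with 20% volatility each:

```python
import numpy as np

def two_asset_vol(w1, sigma1, sigma2, rho):
    """Volatility of a two-asset portfolio, from equation [2.14]."""
    w2 = 1.0 - w1
    var = (w1 ** 2 * sigma1 ** 2 + w2 ** 2 * sigma2 ** 2
           + 2.0 * w1 * w2 * rho * sigma1 * sigma2)
    return np.sqrt(max(var, 0.0))   # guard tiny negative rounding at rho = -1

# Two assets with 20% volatility each, as in Figure 2.4
for rho in (1.0, 0.5, 0.0, -0.5, -1.0):
    vols = [two_asset_vol(w, 0.20, 0.20, rho)
            for w in np.linspace(0.0, 1.0, 101)]
    print(f"rho = {rho:+.1f}: minimum portfolio volatility = {min(vols):.1%}")
```

The lower the correlation, the lower the achievable portfolio volatility; at a correlation of −1, the 50/50 mix has exactly zero volatility.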

Of course, the more assets are included in the universe, the more likely it will be to find assets with low correlations. If their returns are high enough, these lowly-correlated assets will likely be selected by the portfolio optimizer. To get a sense of the potential diversification benefits of an investment universe of $N$ assets, a simple indicator6 that may be considered is their average correlation coefficient, which is calculated as:

$\bar{\rho} = \frac{2}{N(N-1)} \sum_{i=1}^{N} \sum_{j=i+1}^{N} \rho_{i,j}$   [2.32]

As an illustration, Table 2.2 shows the average correlation between every possible pair of stocks in four European equity indices (the FTSE 100, the CAC 40, the DAX and the SMI) in March 2016. On this basis, it seems that the FTSE 100 is the universe with the lowest average correlation and thus the largest potential diversification benefits. However, we should be cautious with that interpretation, because an optimizer will not necessarily invest in all the stocks of the universe. In an extreme scenario, an optimizer may only select two stocks, and the correlation between these two stocks might be quite different from the average correlation between the assets.

6 More advanced indicators are discussed in section 2.3.6.

FTSE 100   CAC 40   DAX    SMI
0.36       0.54     0.67   0.50

Table 2.2. Average correlation between every possible pair of stocks within various European equity indices, as of March 2016
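Equation [2.32] simply averages the N(N−1)/2 distinct pairwise correlations. A sketch on simulated returns (the one-factor simulation below is our illustration, not index data):

```python
import numpy as np

def average_correlation(returns):
    """Equation [2.32]: average of the N(N-1)/2 distinct pairwise
    correlations of a T x N matrix of asset returns."""
    C = np.corrcoef(returns, rowvar=False)
    iu = np.triu_indices(C.shape[0], k=1)   # entries above the diagonal
    return C[iu].mean()

# Simulated illustration: 20 return series sharing a common factor
rng = np.random.default_rng(0)
common = rng.normal(size=(500, 1))
returns = 0.5 * common + rng.normal(size=(500, 20))
print(round(average_correlation(returns), 2))   # close to the theoretical 0.2
```

With these simulation parameters, each pair of series has a theoretical correlation of 0.25 / 1.25 = 0.2, which the sample average approximates.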

In practice, however, it is unusual to see investors apply the Markowitz framework at the single-asset level, especially given that, as the number of assets increases, the number of correlation coefficients to be estimated increases greatly. Investors therefore attempt to reduce the dimensionality of their correlation matrix by grouping assets with supposedly similar characteristics into sub-portfolios, and then they apply the Markowitz framework between the sub-portfolios. This can be done qualitatively or quantitatively. We will illustrate it qualitatively in the next sections using some of the most commonly used grouping criteria, and quantitatively in section 5.2.6.

2.3.2. Diversification by industry sector and/or by country

Although there is no official standard for such a classification, grouping assets by industry sectors sounds relatively intuitive7. Companies within the same industry sector are expected to be highly correlated, because they are influenced by similar macro drivers of revenue and profit growth. By contrast, companies from different industry sectors should be less correlated and perform differently. As an illustration, Table 2.3 shows the annual performances achieved by stocks from various sectors in the United States since 2007. On average, the difference between the best- and worst-performing sectors has exceeded 30% per year. In addition, they seem to change every year, making it difficult to predict losers and winners beforehand.

7 In general, companies active in multiple sectors are classified per the largest and/or highest revenue-producing underlying business, which rarely changes.

Combining different sectors in a portfolio, be it equally weighted by sectors ("Equal Sector") or weighted by market capitalization


("S&P 500"), is therefore an efficient way to enjoy some portfolio diversification and reduce portfolio risk. This is further confirmed in Table 2.4, which shows the correlations between the various sectors in the United States over the period 2015–2016, based on the Select Sector SPDR exchange-traded funds. The largest correlation (0.82) is observed between the Industrials and the Technology sectors, whereas the smallest one is between Financials and Utilities (−0.10). The average correlation between all sectors is only 0.31.

[Table 2.3 ranks, for each year from 2007 to 2016, the ten Select Sector SPDR ETFs together with the S&P 500, the S&P 500 EWI and an Equal Sector portfolio by annual return, from best to worst. Among others, the best performers were Energy (+36.3%) in 2007, Cons. Stap. (−15.0%) in 2008, Tech. (+50.9%) in 2009, Utilities (+19.5%) in 2011 and Cons. Discr. (+42.7%) in 2013; the worst were Financ. (−18.8%) in 2007, Financ. (−55.2%) in 2008, Energy (−8.6%) in 2014 and Energy (−21.5%) in 2015.]

Table 2.3. Sector returns for the 500 largest companies in the United States, as represented by 10 Select Sector SPDR ETFs. The Equal Sector portfolio is rebalanced annually. The S&P 500 Equally Weighted Index (EWI) is rebalanced quarterly


                       XLY    XLP    XLE    XLF    XLV    XLI    XLB    XLK
Cons. Discr. (XLY)
Cons. Staples (XLP)    0.74
Energy (XLE)           0.13   0.05
Financials (XLF)       0.52   0.43   0.09
Healthcare (XLV)       0.17   0.07   0.50   0.04
Industrials (XLI)      0.21   0.08   0.69   0.10   0.72
Materials (XLB)        0.72   0.59   0.20   0.51   0.12   0.22
Technology (XLK)       0.23   0.14   0.59   0.06   0.73   0.82   0.20
Utilities (XLU)        0.01   0.12   0.28  −0.10   0.33   0.40  −0.01   0.41

Table 2.4. Correlation matrix between nine sectors in the United States for the period 2015–2016. Each sector is represented by the corresponding Select Sector SPDR exchange-traded fund

In an international context, an alternative criterion that can be used to regroup assets is their respective countries8. Interestingly, this grouping is also supported by the same two arguments – performance and diversification. On the performance side, none of the major equity markets has consistently outperformed or underperformed its peers on an annual basis. Predicting which market will be a top performer each year is very difficult, and investing in a single market may result in holding the worst-performing one. To avoid that risk, investors can opt to hold a portfolio that is diversified across several countries. On the diversification side, companies in a specific country are expected to be exposed to the same domestic macro shocks (e.g. monetary and fiscal policy cycles, drop in local consumption, increase in tax rates and rise of unemployment) and should tend to have correlated returns. By contrast, companies from different countries may be exposed to different macro shocks and may be less correlated. Table 2.5 shows the correlation between equity markets from different countries. The highest correlations are found between France and Germany (0.94), the United Kingdom and Japan (0.93), Japan and Taiwan (0.93) and the United Kingdom and Taiwan (0.92). The smallest correlation is between Turkey and Japan (0.20). The average correlation between countries is 0.55.

8 The traditional practice was to assume that multinationals were located where their main office was. However, a company's domicile or even the place of its listing does not necessarily provide a guide to its business orientation or investment opportunity. For example, Samsung earns only 10% of its revenues from Korea and McDonald's only 33% of its revenues from the United States. More recently, classification schemes have therefore started prioritizing economic exposure over domicile.

                   EZA   EWQ   EWG   EWC   EWU   EWA   EWJ   EWY   EWT   EWZ   EWW   EIS
S. Africa (EZA)
France (EWQ)       0.69
Germany (EWG)      0.66  0.94
Canada (EWC)       0.68  0.68  0.64
U.K. (EWU)         0.46  0.50  0.47  0.43
Australia (EWA)    0.68  0.66  0.64  0.72  0.44
Japan (EWJ)        0.30  0.27  0.27  0.24  0.93  0.29
S. Korea (EWY)     0.76  0.68  0.66  0.70  0.43  0.74  0.29
Taiwan (EWT)       0.46  0.38  0.37  0.38  0.92  0.43  0.93  0.48
Brazil (EWZ)       0.69  0.55  0.51  0.64  0.37  0.58  0.25  0.63  0.38
Mexico (EWW)       0.75  0.65  0.62  0.68  0.47  0.65  0.32  0.72  0.47  0.70
Israel (EIS)       0.50  0.62  0.60  0.52  0.39  0.59  0.26  0.55  0.33  0.44  0.51
Turkey (TUR)       0.63  0.56  0.54  0.51  0.33  0.46  0.20  0.55  0.31  0.52  0.55  0.39

Table 2.5. Correlations between various countries measured over the period 2015–2016. Each country is represented by an iShares MSCI exchange-traded fund


Diversification by sector and by country has the advantage of being intuitive for investors, has proven relatively stable over time and has been extensively tested empirically. Interestingly, the academic literature has shown mixed evidence as to which of the two is most effective for reducing portfolio risk. While early studies advocate for country diversification, more recent ones find there is more diversification potential among industries than among countries9. Without getting into debate, we will provide a few empirical data points pertaining to the Eurozone. The latter is an interesting case to study with respect to diversification by sector and/or by country, for two reasons. First, the Eurozone introduced the Euro as a unique common currency in January 1999. We would therefore expect the correlation between those countries that adopted the Euro to go up, as they become more integrated economically. Meanwhile, we have no real expectations regarding the correlation between sectors. Second, some countries are now considering a possible exit from the Eurozone, and this may affect their future correlations with other countries, possibly in an inverse way to what happened when they entered it10. Table 2.6 shows the correlations between value-weighted11 portfolios of stocks of various Eurozone countries 7.5 years before and 7.5 years after the introduction of the Euro. Correlations that decreased over the period are highlighted in gray. Although the average correlation between these stock portfolios increased from 0.41 pre-Euro to 0.46 post-Euro, the evidence is mixed when studying country by country. For instance, highly correlated countries, such as France and Germany, saw their correlation increase from

9 See, for instance, Grubel [GRU 68], Levy and Sarnat [LEV 70], Solnik [SOL 74] and Rouwenhorst [ROU 99] for the former group, and Baca et al. [BAC 00], Cavaglia et al. [CAV 00], Kraus [KRA 01], Fratzscher [FRA 02] or Flavin [FLA 04] for the latter. 10 The case of the United Kingdom is a hybrid one, as it is in the European Union, wants to exit from it, but did not adopt the Euro. 11 For each country, the portfolio considered includes at least 60% of the highest market capitalization, a threshold chosen to avoid including less liquid small caps.


0.72 to 0.88. Italy and Luxembourg saw their correlations with all other countries increase. However, Austria and Ireland saw most of their correlations with other countries decrease. Table 2.7 shows the same data but from a sector perspective. Here again, the evidence is mixed. The average sector correlation decreased from 0.62 to 0.55 over the period. While sector correlations declined in 35 out of 45 cases, correlations rose in some sectors, for example, telecommunications and financial services. It therefore seems that the influence of the Euro on potential diversification benefits was not uniform across all sectors.

      AT     BG     FN     FR     BD     IR     IT     LX     NL     PT     ES
AT     –    0.39   0.12   0.31   0.32   0.27   0.39   0.07   0.35   0.10   0.27
BG   0.47    –     0.38   0.74   0.70   0.44   0.71   0.23   0.83   0.33   0.62
FN   0.36  0.50     –     0.65   0.62   0.29   0.43   0.07   0.57   0.43   0.58
FR   0.49  0.56   0.47     –     0.88   0.47   0.83   0.21   0.88   0.54   0.79
BD   0.56  0.62   0.55   0.72     –     0.45   0.76   0.19   0.86   0.52   0.78
IR   0.40  0.47   0.47   0.47   0.52     –     0.48   0.05   0.46   0.20   0.44
IT   0.32  0.43   0.37   0.49   0.51   0.34     –     0.25   0.80   0.43   0.72
LX   0.05  0.04   0.04   0.04   0.05   0.03  −0.05    –     0.25   0.15   0.17
NL   0.51  0.67   0.55   0.66   0.75   0.55   0.47   0.01    –     0.47   0.75
PT   0.35  0.43   0.32   0.43   0.49   0.33   0.28   0.08   0.45    –     0.45
ES   0.50  0.51   0.47   0.60   0.65   0.48   0.51   0.02   0.64   0.54    –

Table 2.6. Correlations between equities of various countries, before and after the introduction of the Euro (AT = Austria, BG = Belgium, LX = Luxembourg, FN = Finland, FR = France, BD = Germany, IR = Ireland, IT = Italy, NL = the Netherlands, ES = Spain, PT = Portugal). “Pre-Euro” correlations are below the diagonal and use data from July 1992 to January 1999. “Post-Euro” correlations are above the diagonal and use data from January 1999 to July 2006. Correlations that decreased over the period are highlighted in gray

Modern Portfolio Theory and Diversification

      BM     CG     CS     FIN    HC     IND    OIL    TEC    TEL    UT
BM     –    0.78   0.61   0.75   0.40   0.73   0.48   0.46   0.34   0.57
CG   0.78    –     0.75   0.82   0.54   0.78   0.48   0.63   0.51   0.64
CS   0.76   0.83    –     0.76   0.45   0.82   0.33   0.66   0.66   0.55
FIN  0.71   0.79   0.72    –     0.59   0.76   0.50   0.58   0.54   0.69
HC   0.68   0.67   0.69   0.61    –     0.36   0.45   0.34   0.27   0.54
IND  0.78   0.82   0.80   0.79   0.68    –     0.40   0.72   0.62   0.53
OIL  0.48   0.61   0.56   0.52   0.54   0.58    –     0.24   0.11   0.49
TEC  0.61   0.71   0.64   0.65   0.57   0.72   0.48    –     0.66   0.36
TEL  0.40   0.48   0.48   0.46   0.37   0.43   0.38   0.41    –     0.49
UT   0.70   0.70   0.72   0.70   0.64   0.71   0.58   0.57   0.35    –

Table 2.7. Correlations between various sectors before and after the introduction of the Euro (BM = Basic Materials, CG = Consumer Goods, CS = Consumer Services, FIN = Financial Services, HC = Health Care, IND = Industrials, OIL = Oil & Gas, TEC = Technologies, TEL = Telecommunications, UT = Utilities). “Pre-Euro” correlations are below the diagonal and use data from July 1992 to January 1999. “Post-Euro” correlations are above the diagonal and use data from January 1999 to July 2006. Correlations that decreased over the period are highlighted in gray

Although most of the examples used so far were equity-based, the same logic is applicable to other asset classes. Consider, for instance, bonds. A simplistic view is that bonds are all related to interest rates and should therefore be highly correlated. The reality is more complex, as segments of the bond market behave differently. For instance, we may split bonds as a function of their time to maturity, issuer quality or collateral nature. Table 2.8 illustrates the correlation matrix between various segments of the bond market. The average correlation is 0.48, but the minimum is 0.03 and the maximum is 0.97, which suggests the potential for some diversification.


                                 SHV    SHY    IEI    IEF    TLH    TLT    LQD    HYG    MUB    EMB
Short-Term T-Bonds (SHV)          –
1–3Y T-Bonds (SHY)               0.28    –
3–7Y T-Bonds (IEI)               0.24   0.88    –
7–10Y T-Bonds (IEF)              0.22   0.80   0.96    –
10–20Y T-Bonds (TLH)             0.20   0.72   0.91   0.97    –
20+Y T-Bonds (TLT)               0.18   0.63   0.84   0.94   0.97    –
Invest. Grade Corp. Bonds (LQD)  0.17   0.68   0.83   0.87   0.86   0.85    –
High Yield Corp. Bonds (HYG)     0.06   0.03   0.04   0.04   0.03   0.04   0.12    –
Municipal Bonds (MUB)            0.24   0.57   0.68   0.70   0.68   0.66   0.62   0.09    –
USD Emg. Mkts Bonds (EMB)        0.08   0.18   0.19   0.17   0.15   0.13   0.36   0.28   0.23    –
MBS Bonds (MBB)                  0.27   0.69   0.81   0.82   0.78   0.73   0.76   0.08   0.62   0.26

Table 2.8. Return correlations for various segments of the bond market (as represented by iShares exchange-traded funds) over the period 2015–2016

2.3.3. Diversification by decile

Introduced by Aranyi [ARA 67], decile range diversification is a systematic approach to portfolio diversification. It consists of scoring all assets following some predefined criteria and then dividing the universe into deciles on the basis of that score. For instance, using the market capitalization as a criterion, we could create 10 groups of stocks, where the first decile (D1) includes the 10% largest companies and the last decile (D10) includes the 10% smallest companies in the universe. For each decile, we would then create an equally weighted portfolio of its components and calculate its performance over time, as well as the correlation between the various deciles. Low correlations would highlight the pairs of deciles that could result in large diversification benefits when mixed in a portfolio.

As an illustration, Table 2.9 shows the correlation of U.S. stocks grouped in 10 decile-based equally weighted portfolios as a function of their past 60-day total return variance over the period January 2010 to October 2016, with monthly rebalancing. The correlation between the first and the last decile is relatively moderate (0.62), but the average correlation remains high (0.90).

      D1     D2     D3     D4     D5     D6     D7     D8     D9     D10
D1   1.00
D2   0.91   1.00
D3   0.86   0.98   1.00
D4   0.83   0.97   0.98   1.00
D5   0.81   0.96   0.98   0.99   1.00
D6   0.80   0.95   0.98   0.99   0.99   1.00
D7   0.77   0.92   0.95   0.97   0.97   0.99   1.00
D8   0.75   0.91   0.95   0.96   0.97   0.98   0.99   1.00
D9   0.71   0.87   0.92   0.94   0.94   0.96   0.97   0.98   1.00
D10  0.62   0.78   0.82   0.84   0.86   0.88   0.90   0.92   0.95   1.00

Table 2.9. Correlation of equally weighted decile portfolios based on the past total return variance of all U.S. stocks. D1 includes the securities with the lowest past total return variance, and D10 includes the securities with the highest past total return variance
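The decile construction described in this section can be sketched in a few lines. The sketch below uses synthetic returns for 50 hypothetical stocks and a single, static decile assignment rather than the monthly rebalancing used for Table 2.9; pandas and NumPy are assumed, and all data are invented for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic daily returns for 50 hypothetical stocks (illustration only)
n_days, n_stocks = 500, 50
returns = pd.DataFrame(rng.normal(0, 0.02, (n_days, n_stocks)),
                       columns=[f"S{i}" for i in range(n_stocks)])

# Score each stock by its past 60-day total return variance
score = returns.rolling(60).var().iloc[-1]

# Assign stocks to deciles: D1 = lowest variance, D10 = highest
deciles = pd.qcut(score, 10, labels=[f"D{i}" for i in range(1, 11)])

# Equally weighted portfolio return series for each decile
decile_returns = pd.DataFrame(
    {d: returns.loc[:, deciles == d].mean(axis=1) for d in deciles.cat.categories})

# Correlation matrix between the decile portfolios (analogue of Table 2.9)
corr = decile_returns.corr()
print(corr.round(2))
```

With real data, low off-diagonal entries would point to the decile pairs offering the largest diversification benefits.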


2.3.4. Diversification by asset class

Many investors choose to reduce the risk of significant declines in their portfolio by including exposure to multiple asset classes. They generally think that various asset classes respond differently to market conditions and changes in the economic environment. For example, a domestic economic event that triggers a decline in U.S. stock prices may result in an increase in domestic bond prices, while international equity prices may not react at all. As an illustration, Table 2.10 illustrates the correlation matrix between several asset classes: TIPS (TIP), gold (GLD), U.S. investment grade bonds (AGG), USD emerging markets bonds (EMB), commodities (GSG), U.S. REITs (VNQ), international real estate (RWX), emerging markets stocks (EEM), EAFE stocks (EFA), U.S. small caps (VB), U.S. large caps (VV) and U.S. mid-caps (VO). The average correlation is 0.18, the minimum is −0.19 and the maximum is 0.94. Clearly, some of these asset classes behave differently in the various phases of the business cycle. This means that combining them appropriately may create interesting diversification benefits.

[The individual entries of Table 2.10 could not be recovered reliably from the extraction and are omitted.]

Table 2.10. Return correlations between various asset classes (as represented by exchange-traded funds) over the period 2015–2016


2.3.5. International diversification and currencies

From a pure risk perspective, the traditional case for international diversification relies on two components. The first component is the potential risk reduction benefits of holding foreign assets, which may have low correlations with domestic assets due to their exposure to different economic, political, institutional and even psychological factors. We have illustrated and discussed this in section 2.3.2. The second component is the potential risk reduction benefits of being exposed to foreign currencies, as investing internationally requires currency conversion at the beginning and at the end of the investment period, if only for performance calculation purposes. If these currencies display a low correlation to domestic assets, there may be some additional diversification benefits.

To fully understand the effect of international diversification on portfolio risk, it is necessary to go over some portfolio construction mathematics. Let us start with a foreign asset, quoted and traded in a foreign currency. The return offered by this asset, from a domestic investor’s perspective, is given by:

(1 + R_Dom) = (1 + R_For)(1 + R_FX)   [2.33]

or equivalently:

R_Dom = R_For + R_FX + R_For·R_FX   [2.34]

where R_Dom is the asset return in the domestic currency (“domestic return”), R_For is the asset return in the foreign currency (“local return”) and R_FX is the rate of change of the exchange rate of the domestic currency per unit of the foreign currency. The variance of the domestic return is:

var(R_Dom) = var(R_For) + var(R_FX) + var(R_For·R_FX) + 2cov(R_For, R_FX) + 2cov(R_For, R_For·R_FX) + 2cov(R_FX, R_For·R_FX)   [2.35]

where var(·) and cov(·,·) denote variance and covariance, respectively. Equation [2.35] shows that the volatility of the domestic currency return is composed of the volatility of the local currency return, the volatility of the exchange rate change and the volatility due to the interaction between the local return and the exchange rate return, which is measured by the last four terms. For practical purposes, the cross product R_For·R_FX is very small, as it is expressed in percentages of percentages. It is thus often ignored, and we can write the following approximation12:

R_Dom ≈ R_For + R_FX   [2.36]

which yields:

var(R_Dom) ≈ var(R_For) + var(R_FX) + 2cov(R_For, R_FX)   [2.37]

The variance of the domestic return is due to the variance of the asset’s local returns, the exchange rate variance and the covariance between local returns and exchange rate returns. While it is obvious that the exchange rate variance contributes additional risk, as variance is always non-negative, the covariance term could be either positive or negative. This means that ultimately the variance of the domestic return could be higher or lower than the variance of the local return. This is illustrated for various equity markets over the period 2010–2016, in Table 2.11, using the euro as the domestic currency. We can see that Japan and Switzerland have domestic returns with a lower variance than local returns, whereas the United States, the United Kingdom and Canada are in the opposite camp. France, Germany and Italy all used the euro over the period, and their domestic returns are identical to their local returns when the reference currency is the euro.

Let us now consider the case of a portfolio made of N investments. For the sake of simplicity, we will assume that each asset is quoted in its own currency – or stated differently, that there are N currencies, which is an upper bound. We have:

R_P,Dom = Σ_i w_i R_i,Dom   [2.38]

and:

var(R_P,Dom) = Σ_i Σ_j w_i w_j cov(R_i,Dom, R_j,Dom)   [2.39]

12 An exact relationship can be derived using continuously compounded returns. For the sake of simplicity, we will continue to use simple returns and therefore work with the approximation.

                     Returns (% p.a.)             Volatility (% p.a.)
                   Market  Currency  Euro       Market  Currency  Euro
U.S. (USA)          12.8     4.5     17.9        15.5     9.5     16.1
Canada (CAN)         6.9     0.9      7.9        13.1     9.4     17.1
Japan (JAP)          9.9     1.1     11.1        21.0    11.9     20.8
U.K. (UK)            7.9     0.6      8.5        16.0     8.4     17.8
France (FR)          6.9      –       6.9        21.5      –      21.5
Germany (GER)        9.8      –       9.8        20.6      –      20.6
Italy (IT)           0.8      –       0.8        26.8      –      26.8
Switzerland (SWI)    6.7     4.7     11.7        15.9    11.1     15.0

Table 2.11. Characteristics of foreign equity returns for a Euro-based investor over the 2010–2016 period. The market return is in local currency, and the currency return is the change in the value of the foreign currency relative to the euro. The following set of indices has been used: S&P 500 (U.S.), S&P/TSX Composite Index (Canada), Topix (Japan), FTSE 100 (U.K.), CAC 40 (France), DAX (Germany), FTSE MIB Index (Italy), SMI (Switzerland)
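Equations [2.33]–[2.37] are easy to verify numerically. The sketch below uses simulated local returns and exchange rate changes (arbitrary parameters, not the data behind Table 2.11) and checks that the approximate variance decomposition of equation [2.37] is close to the exact domestic-return variance; NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated monthly local-currency returns and exchange rate changes
# (illustrative parameters only)
r_for = rng.normal(0.008, 0.05, 10_000)  # local asset return
r_fx = rng.normal(0.000, 0.03, 10_000)   # change in the exchange rate

# Exact domestic return, equations [2.33]-[2.34]
r_dom = (1 + r_for) * (1 + r_fx) - 1

# Approximate variance decomposition, equation [2.37]:
# the cross product r_for * r_fx is ignored
var_approx = r_for.var() + r_fx.var() + 2 * np.cov(r_for, r_fx)[0, 1]

# The approximation error is tiny ("percentages of percentages")
print(r_dom.var(), var_approx)
```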

Using the approximated definition of domestic return as given in equation [2.36], we can rewrite equation [2.39] as:

var(R_P,Dom) ≈ Σ_i Σ_j w_i w_j cov(R_i,For, R_j,For) + Σ_i Σ_j w_i w_j cov(R_i,FX, R_j,FX) + 2 Σ_i Σ_j w_i w_j cov(R_i,For, R_j,FX)   [2.40]

Therefore, the overall portfolio risk depends on the covariance between the local asset returns, the covariance between the exchange rate changes and the cross-covariance between the local asset returns and the exchange rate changes. Equation [2.39] can further be rewritten as:

var(R_P,Dom) ≈ Σ_i w_i² var(R_i,For) + Σ_i Σ_{j≠i} w_i w_j cov(R_i,For, R_j,For) + Σ_i w_i² var(R_i,FX) + Σ_i Σ_{j≠i} w_i w_j cov(R_i,FX, R_j,FX) + 2 Σ_i Σ_j w_i w_j cov(R_i,For, R_j,FX)   [2.41]

Even if they are approximations, equations [2.40] and [2.41] show that exchange rate risk affects the overall portfolio risk in three ways: through its own volatility, through interactions between local market returns and exchange rate changes and through interactions among exchange rate changes themselves in cases where more than one foreign currency is involved. Overall, foreign exchange exposure can be a negative risk contributor if the covariance between the exchange rate changes and the cross-covariance between the local market returns and the exchange rate changes are negative with the following condition:

Σ_i w_i² var(R_i,FX) + Σ_i Σ_{j≠i} w_i w_j cov(R_i,FX, R_j,FX) + 2 Σ_i Σ_j w_i w_j cov(R_i,For, R_j,FX) < 0   [2.42]

If this is true, then foreign exchange exposure is beneficial in terms of risk reduction. As an illustration, Table 2.12 shows the historical correlation between the returns expressed in their respective local currencies of our previous set of equity markets over the period 2010–2016. Table 2.13 shows the historical correlation between the corresponding currencies. Finally, Table 2.14 shows the historical correlation between the various equity markets and the corresponding currencies. The average (local) market return correlation is 0.64, the average currency correlation is 0.42 and the average market/currency correlation is −0.19. If we rely on these average correlation numbers to assess international diversification benefits, it seems that most of the potential diversification gains could come from the low and even negative correlations between equity and currency returns.

                   USA    CAN    JAP    UK     FR     GER    IT
Canada (CAN)       0.73
Japan (JAP)        0.60   0.37
U.K. (UK)          0.81   0.72   0.46
France (FR)        0.76   0.61   0.60   0.80
Germany (GER)      0.72   0.56   0.60   0.70   0.86
Italy (IT)         0.62   0.51   0.55   0.67   0.86   0.75
Switzerland (SWI)  0.65   0.43   0.55   0.62   0.67   0.58   0.58

Table 2.12. Correlations between various equity markets returns, in local currency terms

          USD/EUR  CAD/EUR  JPY/EUR  GBP/EUR
CAD/EUR    0.55
JPY/EUR    0.68     0.43
GBP/EUR    0.57     0.45     0.30
CHF/EUR    0.35     0.21     0.41     0.29

Table 2.13. Correlations between currency returns (against the Euro)


                   USD/EUR  CAD/EUR  JPY/EUR  GBP/EUR  CHF/EUR
U.S. (USA)          −0.54     0.01    −0.60    −0.14    −0.20
Canada (CAN)        −0.43     0.05    −0.40    −0.15    −0.04
Japan (JAP)         −0.19     0.04    −0.61     0.15    −0.03
U.K. (UK)           −0.47    −0.05    −0.42    −0.34    −0.02
France (FR)         −0.35    −0.05    −0.49    −0.02     0.00
Germany (GER)       −0.19     0.10    −0.35     0.12     0.12
Italy (IT)          −0.40    −0.21    −0.51    −0.06    −0.07
Switzerland (SWI)   −0.18     0.05    −0.40     0.05    −0.30

Table 2.14. Correlations between currency returns (against the euro) and local equity market returns
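The decomposition in equations [2.40]–[2.42] can be illustrated numerically, using the approximation of equation [2.36] and simulated data for three hypothetical foreign assets (all parameters below are invented for the sketch; NumPy is assumed):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated local returns and FX changes for N = 3 foreign assets
n_obs = 5_000
local = rng.multivariate_normal(np.zeros(3), 0.04**2 * np.eye(3), n_obs)
fx = rng.multivariate_normal(np.zeros(3), 0.02**2 * np.eye(3), n_obs)
w = np.array([0.40, 0.35, 0.25])

# Approximate domestic returns, equation [2.36]
dom = local + fx
port_var = (dom @ w).var(ddof=1)

# The three blocks of equation [2.40]: local, FX and cross-covariances
cov_all = np.cov(np.hstack([local, fx]).T)
var_local = w @ cov_all[:3, :3] @ w
var_fx = w @ cov_all[3:, 3:] @ w
var_cross = 2 * w @ cov_all[:3, 3:] @ w

# FX exposure reduces risk when var_fx + var_cross < 0, equation [2.42]
print(port_var, var_local + var_fx + var_cross)
```

With sample moments, the decomposition is exact by bilinearity of the covariance, so the two printed numbers coincide up to floating-point error.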

These results are of course based on a small sample of arbitrarily selected countries and currencies, over one unique test period. However, they have generally been confirmed by a variety of academic studies over the past decades. It is therefore not surprising that several investors have chosen to consider currencies as a new separate asset class in their portfolios. That is, their currency allocation is no longer the consequence of their foreign asset selection, but is the result of an independent decision. The latter may be based on risk views, return views or both.

2.3.6. A note on average correlations as a proxy for diversification benefits

As discussed in section 2.3.1, the average correlation between a universe of assets is a good proxy for their potential diversification benefits should they be combined in a portfolio. However, this measure does not consider weights. Intuitively, if an asset receives no allocation, its correlation does not matter in terms of diversification; on the contrary, if it receives a large allocation, its correlation matters more. What is therefore needed is a weighted-average correlation rather than a simple average. In practice, several approaches can be used to calculate it.

Weighted average of all pairwise correlations: The brute force method consists of calculating the full correlation matrix for all assets and weighting each correlation by the weights of the corresponding assets in the portfolio. Of course, elements on the diagonal of the correlation matrix should be ignored, as they are all equal to one. The resulting formula is:

ρ̄_w = (Σ_i Σ_{j≠i} w_i w_j ρ_{i,j}) / (Σ_i Σ_{j≠i} w_i w_j)   [2.43]

When the portfolio is long-only, the weighted-average correlation is bounded:

−1 ≤ ρ̄_w ≤ 1   [2.44]

In the case of an equally weighted portfolio (w_i = 1/N for all i), the weighted-average correlation ρ̄_w is equal to the average correlation ρ̄ defined in equation [2.32]. Its bounds become:

−1/(N − 1) ≤ ρ̄ ≤ 1   [2.45]

In practice, ρ̄_w is the most accurate measure of weighted cross-asset average correlation, but it has the highest computational complexity, as its calculation requires N weights plus N(N − 1)/2 correlations, that is, N(N + 1)/2 inputs. For this reason, it is commonly used only for small portfolios. As soon as N gets large, some approximations are made to reduce computational complexity.

Implied average correlation (a.k.a. implied correlation index): This first approximation starts from the definition of the variance of a portfolio:

σ_P² = Σ_i w_i² σ_i² + 2 Σ_i Σ_{j>i} w_i w_j σ_i σ_j ρ_{i,j}   [2.46]


and replaces the N(N − 1)/2 individual correlations ρ_{i,j} with an average cross-sectional correlation coefficient ρ̂. Equation [2.46] becomes:

σ_P² = Σ_i w_i² σ_i² + 2ρ̂ Σ_i Σ_{j>i} w_i w_j σ_i σ_j   [2.47]

which can be inverted to obtain the average cross-sectional correlation:

ρ̂ = (σ_P² − Σ_i w_i² σ_i²) / ((Σ_i w_i σ_i)² − Σ_i w_i² σ_i²)   [2.48]

It is important to note that equation [2.48] is related to a quantity called the Rayleigh quotient, which is used in some algorithms to obtain an eigenvalue approximation from an eigenvector approximation13.

Volatility-based correlation proxy: The third approach assumes in addition that when the number of assets N gets large, the term Σ_i w_i² σ_i² becomes very small and can therefore be neglected. Then, as an approximation, the average correlation between assets is equal to the squared ratio of the portfolio volatility to the average volatility of its components:

ρ̂_vol = σ_P² / (Σ_i w_i σ_i)²   [2.49]

We can show that as N → ∞ and for each w_i → 0, the result given by equation [2.49] will converge toward that of equation [2.48].

Variance-based correlation proxy: Another approximation is the variance-based correlation proxy suggested by Bossu and Gu [BOS 04] in the context of pricing variance swaps:

ρ̂_var = σ_P² / (Σ_i w_i σ_i²)   [2.50]

13 As a proxy for it, in the case of equity market indices, we may also use a forward-looking measure of the average correlation. For instance, in the United States, the Chicago Board Options Exchange disseminates the CBOE S&P 500 Implied Correlation Index under the ticker symbols ICJ, JCJ and KCJ. This index approximates the expected average correlation of the S&P 500 components by using the implied volatility of index options and the implied volatilities of options on the 50 largest stocks comprised in the S&P 500. However, in the presence of market dislocations, the CBOE S&P 500 Implied Correlation Index may take values greater than one. This occurred three times in November 2008.


It should also be noted that the variance-based proxy of equation [2.50] is always lower than or equal to the volatility-based proxy of equation [2.49]. This property is a straightforward consequence of Jensen’s inequality:

(Σ_i w_i σ_i)² ≤ Σ_i w_i σ_i²   [2.51]
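The four measures above can be compared on a toy example. The sketch below builds a random (but valid) correlation matrix for five hypothetical assets and computes the weighted-average correlation of equation [2.43], the implied correlation of equation [2.48] and the two proxies of equations [2.49] and [2.50]; NumPy is assumed and all inputs are invented.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical inputs: weights, volatilities and a valid correlation matrix
n = 5
w = np.full(n, 1 / n)
sigma = rng.uniform(0.10, 0.30, n)
corr = np.corrcoef(rng.normal(size=(200, n)), rowvar=False)

cov = np.outer(sigma, sigma) * corr
var_p = w @ cov @ w

# Weighted average of all pairwise correlations, equation [2.43]
off = ~np.eye(n, dtype=bool)
ww = np.outer(w, w)
rho_w = (ww * corr)[off].sum() / ww[off].sum()

# Implied average correlation, equation [2.48]
sum_w2s2 = np.sum(w**2 * sigma**2)
rho_imp = (var_p - sum_w2s2) / ((w @ sigma) ** 2 - sum_w2s2)

# Volatility-based [2.49] and variance-based [2.50] proxies
rho_vol = var_p / (w @ sigma) ** 2
rho_var = var_p / (w @ sigma**2)

# Jensen's inequality [2.51] guarantees rho_var <= rho_vol
print(rho_w, rho_imp, rho_vol, rho_var)
```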

2.4. Using MPT in practice: key issues

Markowitz’s mean-variance portfolio optimization is a great conceptual framework, but it is not trivial to translate it into a satisfactory portfolio selection tool in a real-world environment. It is subject to approximation and estimation risk, and combining these two risks with an optimization process has dramatic consequences in terms of portfolio diversification. Simply stated, efficient portfolios will often lack diversification, in the most naïve sense of the word.

2.4.1. Approximation risk

Although often forgotten, the optimal portfolio given, for instance, by equation [2.26] is based on an approximation of the investor’s original utility by a quadratic function of the portfolio return. How good is this approximation? Using a variety of simple utility functions and numerical illustrations, Levy and Markowitz [LEV 79b] and Markowitz [MAR 87] observed that the difference between a quadratic approximation and the explicit utility function is very small in practice. They therefore claim that if an investor chooses a mean-variance efficient portfolio, he/she will maximize his/her expected utility. However, more recently, Lhabitant [LHA 97] has seriously challenged these results on the following basis:

– All the above-mentioned studies used simple utility functions, which have continuous higher-order derivatives. If investors have more complex utility functions, for instance, because they express preferences that do not admit continuous derivatives, the quadratic approximation fails miserably.

– The assumption of a quadratic function is convenient because it eliminates all moments of the return distribution that are of higher order than the variance. However, assuming a quadratic utility function is fatuous because it implies increasing absolute risk aversion and satiation for large amounts of wealth, two features that are unrealistic. Using it as an


approximation for relatively low returns (or equivalently, a narrow range of wealth) might be valid, but it should not be used for investments with a potential for very high returns.

– The utility function approximation may provide an exact result only if the distribution of the portfolio returns features spherical symmetry14 or if investors are indifferent to higher moments of the portfolio return distribution (i.e. they only look at the mean portfolio return and its variance). In all other cases, it remains an approximation and its convergence interval may be very small.

It is interesting to note that even Markowitz [MAR 52] himself acknowledged that mean-variance optimization may not always be optimal and provided an illustration with an investor’s utility function based on mean return, variance and skewness.

2.4.2. Estimation risk

A fundamental assumption of the Markowitz model is that the investor has perfect knowledge of two key inputs, the expected returns of the underlying assets (μ) and their variance-covariance matrix (Σ). In practice, however, these two parameters are not known and must be estimated. Hereafter, we will denote by μ̂ and Σ̂ their estimates. To obtain μ̂ and Σ̂, we must rely on the available information, which is usually limited and only consists of historical returns. Under the assumption that returns are normally and independently identically distributed, the maximum likelihood estimates of μ and Σ are the sample mean and the sample covariance matrix. These are also the method-of-moments estimates when returns are not normally distributed, but the i.i.d. assumption is replaced by weak stationarity (i.e. time-invariant mean and covariance). However, in practice, these assumptions on the behavior of asset returns are rarely verified.
The presence of outliers and/or nonstationary parameters, particularly over long time periods, results in a noisy estimation of the future, which opens the door to estimation errors – see, for instance, Bengtsson and Holst [BEN 02] and Ledoit and Wolf [LED 04a, LED 04b, LED 04c]. 14 In this case, it rules out possible asymmetry in the return distribution of assets, which commonly occurs in practice.

Modern Portfolio Theory and Diversification

67

2.4.2.1. Estimating expected returns When estimating expected returns, the length of the time series directly affects the precision of the sample estimate. As an illustration, consider asset and say we have τ 1 years of historical data available with 1 subperiods of equal length per year to estimate its expected return and its expected volatility . As evidenced by Merton [MER 80], under the geometric Brownian motion framework15, we can show that the precision of the estimates is given by the variance of the estimators. For the expected return , we have: ̂ and for the standard deviation

[2.52] : [2.53]

It should be clear from equations [2.52] and [2.53] that: – The volatility of asset can be estimated much more precisely than its expected returns, as ̂ / 2m. When using daily data, we have 250 days per year, so the volatility estimation is 500 times more precise than the expected return estimation. – Increasing the frequency of the observations ( ) enhances the precision of the standard deviation estimates. At the hypothetical limiting case → ∞, there is no estimation risk for the standard deviation, but this is an unattainable limiting case in practice because the sampling frequency cannot go beyond transaction time. In addition, higher-frequency sampling comes with the cost of microstructure noise, which may result in biased estimates. Moreover, increasing the frequency of the observations ( ) has no impact on the expected return precision. In a sense, the return is a function of the beginning and ending value of an asset, and knowing how the value changed from the start to the end is irrelevant.

15 Goldenberg and Schmidt [GOL 96] discussed the case of more complex diffusion price models, their maximum likelihood estimators and their rate of convergence – or lack thereof – to the true expected return.

68

Portfolio Diversification

– For both estimates, increasing the length of the historical period increases the precision. At the hypothetical limiting case → ∞, there is no estimation risk, but for limited size samples, estimation risk can be quite large. As an illustration, Table 2.15 compares the size of the estimation risk for expected returns and for volatility in various scenarios. While the precision of the volatility estimator increases relatively rapidly, particularly when using higher-frequency data, the precision of the expected return estimator remains weak. Monthly data, Nb. years ̂

1

5

10

20

50

100

1000

25.0%

11.2%

7.9%

5.6%

3.5%

2.5%

0.8%

5.1%

2.3%

1.6%

1.1%

0.7%

0.5%

0.2%

Weekly data, Nb. years ̂

̂

25% . .

1

5

10

20

50

100

1000

25.0%

11.2%

7.9%

5.6%

3.5%

2.5%

0.8%

2.5%

1.1%

0.8%

0.5%

0.3%

0.2%

0.1%

Daily data, Nb. years

25% . .

25% . .

1

5

10

20

50

100

1000

25.0%

11.2%

7.9%

5.6%

3.5%

2.5%

0.8%

1.1%

0.5%

0.4%

0.3%

0.2%

0.1%

0.0%

Table 2.15. Precision of parameter estimates for a geometric Brownian motion sampled at different frequencies

As summarized by Merton [MER 80], “estimating expected returns from time series of realized return data is very difficult”, and this applies regardless of the estimator type. Goldenberg and Schmidt [GOL 10] tested various estimators on 59 years of daily data for the S&P 500 and concluded

Modern Portfolio Theory and Diversification

69

that it is not possible to efficiently estimate the expected return from this data set. In a sense, this also confirmed Black’s [BLA 93] intuition that “we need such a long period to estimate the average return that we have little hope of seeing changes in expected returns”. 2.4.2.2. Estimating the covariance matrix Estimating the covariance matrix and inversing it is also a source of concern. The first issue is that the sample covariance matrix is a biased and inefficient estimator of the true covariance matrix when measured using the intrinsic geometry of positive-definite matrices. The problem has been extensively discussed by Smith [SMI 05], but curiously seems to be ignored by the finance literature. The second issue is the undesirable properties potentially displayed by the sample covariance matrix when the number of assets ( ) is large in comparison with the number of historical observations ( ). As an illustration, say one wants to estimate the covariance matrix for the 500 components of the S&P 500 using 10 years of monthly data. This gives 500 500 250,000 terms to estimate using 500 12 10 60,000 data points. As summarized by Ledoit and Wolf [LED 04b], if , the sample covariance matrix is not of full rank and therefore not invertible, while we need its inverse to determine optimal portfolio weights. If , the sample covariance matrix is invertible. However, if is not significantly larger than , the sample covariance matrix becomes ill-conditioned. Inverting it then amplifies estimation errors when calculating optimal portfolio weights, and as discussed in the next section, this will result in highly concentrated portfolios. It is important to note that when asset returns are normally distributed, the estimate of covariance matrix has a Wishart distribution and its moments are known – see, for instance, Muirhead [MUI 05]. 
In this case, we still have uncertainty, but its form is known: there is an ellipsoidal constraint on the entries of the covariance matrix. This structure may be exploited to increase the speed of numerical algorithms.

70

Portfolio Diversification

2.4.3. Impact on portfolio diversification Approximation and estimation errors exist, but are they an issue? Do they significantly influence the diversification of the optimized portfolio? Unfortunately, the answer is positive. Optimized portfolio weights are extremely sensitive to the expected returns and covariance parameters. In other words, a small variation in the input parameters may lead to a drastic change in the optimized portfolios16. In a sense, the use of an optimizer amplifies approximation and estimation errors in only one direction, as it retains optimistic errors and discards pessimistic ones. Intuitively, if the variance of an asset is underestimated, its expected return is overestimated or its covariance with other assets is underestimated, the optimizer will like this asset and tend to assign a larger weight to it in the optimized portfolios. Thus, the risk of the estimated optimized portfolios will typically be underpredicted and its return over-predicted – see Michaud [MIC 89] or El Karoui [EL 09a, EL 09b]. In addition, optimized portfolios will tend to be highly concentrated in terms of weights and/or involve extremely large positions. If the optimizer is run without constraints, it will very often recommend extreme positive or negative weights in some assets. If it is run with a positive weight constraint, the problem is moderated but does not disappear; we often end up with a portfolio with zero holdings in most assets and very large weights in a few of them. Such portfolios go against the common intuition that weights in a well-diversified portfolio should be spread reasonably evenly across many assets, and thus the weight in any one asset should get smaller as the number of assets available grows. Let us illustrate this with a few examples of increasing complexity. EXAMPLE 2.1.– Consider two perfectly correlated assets with identical volatility, but 20% and 20.1% expected returns, respectively. 
A meanvariance optimizer facing these two assets will invest exclusively in the second one, due to the additional 0.1% expected return it seems to offer. Most of the time, this 0.1% additional expected return will be much smaller

16 See, for instance, Bawa and Klein [BAW 76], Jobson and Korkie [JOB 80], Jorion [JOR 85, JOR 86, JOR 92], Michaud [MIC 89], Best and Grauer [BES 91a, BES 91b, BES 92], Broadie [BRO 93] or Chopra and Ziemba [CHO 93].

Modern Portfolio Theory and Diversification


than the estimation error. A much better allocation – from a diversification perspective – would split the capital equally between the two assets, with a relatively immaterial impact on return.

EXAMPLE 2.2.– Consider the three assets characterized by the following expected returns, volatilities and correlations:

          Expected return   Volatility   Correlation matrix
Asset A   12.1%             15.9%        1.00
Asset B   15.5%             23.3%        0.57   1.00
Asset C    9.2%              7.0%        0.37   0.01   1.00
When targeting an expected return of 13.5%, a mean-variance optimizer will suggest investing in a portfolio made of 19.5% of asset A, 59.3% of asset B and 21.3% of asset C. However, if we change the expected return on asset A from 12.1% to 13.3%, the new optimal portfolio with the same expected return is made of 50.3% of asset A, 35.2% of asset B and 14.5% of asset C. Therefore, a small change in an input parameter (+10%) produces a dramatic change in some of the portfolio weights (up to +160%). As discussed by Hurley and Brimberg [HUR 15], in this example, the linear system produced by the first-order conditions of the optimization program is ill-conditioned, and this results in an extreme sensitivity of the optimal portfolio weights.

EXAMPLE 2.3.– We now consider an investment universe made of nine non-overlapping asset classes, namely US Large Cap Growth, US Large Cap Value, US Small Cap Growth, US Small Cap Value, International Stocks, US Bonds, International Bonds, Commodities and Cash. Figure 2.5 shows the efficient frontier derived from these nine asset classes using historical data from 1995 to 2004. Figure 2.6 shows the asset allocation of various portfolios on the efficient frontier, moving from the lowest risk portfolio (T-Bills, left) to the maximum return portfolio (U.S. Small Cap Value, right). We can see that nearly half of the asset classes in the opportunity set are excluded from optimized portfolios, and that most of the time, the asset allocation is dominated by very large allocations to one or two asset classes.
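The sensitivity in Example 2.2 can be reproduced with the closed-form minimum-variance solution for a target return. This is a sketch using only the full-investment constraint, which is sufficient here since every optimal weight turns out positive; all function and variable names are ours, not the book's:

```python
import numpy as np

def min_var_weights(mu, sigma, corr, target):
    # Closed-form minimum-variance weights for a given target return,
    # with only the full-investment constraint w'1 = 1.
    cov = np.outer(sigma, sigma) * corr
    inv = np.linalg.inv(cov)
    ones = np.ones(len(mu))
    a = ones @ inv @ ones
    b = ones @ inv @ mu
    c = mu @ inv @ mu
    d = a * c - b ** 2
    lam = (a * target - b) / d
    gam = (c - b * target) / d
    return inv @ (lam * mu + gam * ones)

corr = np.array([[1.00, 0.57, 0.37],
                 [0.57, 1.00, 0.01],
                 [0.37, 0.01, 1.00]])
sigma = np.array([0.159, 0.233, 0.070])

w1 = min_var_weights(np.array([0.121, 0.155, 0.092]), sigma, corr, 0.135)
w2 = min_var_weights(np.array([0.133, 0.155, 0.092]), sigma, corr, 0.135)
print(np.round(w1, 3), np.round(w2, 3))  # asset A's weight jumps sharply
```

Both solutions hit the 13.5% target exactly, yet the small bump in asset A's expected return swings its weight dramatically, illustrating the ill-conditioning discussed above.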

[Figure 2.5 plots return (% p.a.) against volatility (% p.a.), showing the efficient frontier together with the individual asset classes: T-Bills, T-Bonds, Intl. Bonds, Commodities, Intl. Stocks, U.S. Large Cap Growth, U.S. Large Cap Value, U.S. Small Cap Growth and U.S. Small Cap Value.]

Figure 2.5. Example of the efficient frontier generated from a set of nine asset classes

[Figure 2.6 is a stacked area chart of asset allocation (%) against portfolio volatility (% p.a.); only T-Bills, T-Bonds, Commodities, U.S. Large Cap Value and U.S. Small Cap Value receive material allocations along the frontier.]

Figure 2.6. Asset allocation of portfolios on the efficient frontier

EXAMPLE 2.4.– According to Modern Portfolio Theory, the “market portfolio”, which is usually approximated by a capitalization-weighted index, should be located on the efficient frontier. While this claim is now widely challenged17, we will just discuss here the diversification level of these capitalization-weighted indices. Figure 2.7 shows the market capitalization breakdown by country for the 45 developed and emerging market countries included in the MSCI All Country World Investable Market Index18. The mix is clearly dominated by the United States with more than 48% of the total, while 31 of the 45 countries represent less than 1% each of the global market capitalization.

Figure 2.7. Global market capitalization breakdown by country

17 See, for instance, Amenc et al. [AME 11].
18 Note that China market capitalization excludes A-shares, which were historically only available to mainland China investors.

Figure 2.8. Top 10 holdings as percentage of each country’s market capitalization

Looking at individual countries reveals some interesting differences in terms of sector and security concentration. On the sector side, larger equity markets (e.g. U.S., U.K., France or Germany) are well diversified, whereas smaller ones are highly concentrated in one or two sectors. For example, Hong Kong derives 56% of its market capitalization from the financial sector and Russia derives 58% of its market capitalization from the energy sector. On the security side, some markets appear very diversified, as illustrated in Figure 2.8. While in the United States and Japan, the 10 largest holdings comprise only 17% and 19% of market capitalization, other countries are much more concentrated. The equity markets of Mexico and Singapore, for example, consist of only 40 and 110 stocks, respectively, and the 10 largest stocks make up a significant 78% and 54% of each country’s market capitalization. Similarly, the largest stocks in South Korea and Russia, for example, represent 20% and 25% of market capitalization, respectively, versus approximately 4% of market capitalization in Japan and in the United States. An extreme example of a situation with a very high concentration was reached in Finland in 1999. At the peak of the dot-com bubble, Nokia’s

market cap reached $250 billion and represented 75% of the Helsinki Stock Exchange. With such a large allocation to one single company, it is difficult to claim that the Finnish market portfolio at that time was well diversified. Finnish investors who nevertheless thought their portfolio was well diversified learned it the hard way in 2013, when Nokia’s market cap had plummeted to $16.6 billion.

2.5. Increasing the diversification of Markowitz portfolios

Several strategies have been suggested in the financial literature to increase the diversification of mean-variance optimized portfolios. We can group them into five broad categories: (i) strategies reducing estimation risk for the input parameters by using more sophisticated estimators; (ii) strategies modifying the optimization problem, for instance, by creating more robust portfolios; (iii) strategies adding constraints on weights and creating, for instance, norm-constrained portfolios; (iv) strategies combining several portfolio weight estimates to average down the possible errors and (v) strategies introducing entropy or weight functions to force a trade-off between optimization and diversification.

2.5.1. Reducing estimation risk for input parameters

Concerning the estimation of the vector of expected returns, Green et al. [GRE 13] offered a review of no less than 333 papers that tackle the question. Nevertheless, as already discussed, estimating expected returns remains a difficult task. For the estimation of the covariance matrix, several alternative estimators can be used. Without aiming to be exhaustive, here are a few examples:
– Impose a special structure on the variances and covariances of the returns to reduce the number of parameters to be estimated. For instance, Elton and Gruber [ELT 73] introduced constant correlation estimators, which assume that every pair of assets in the portfolio has the same correlation coefficient.
This means that only N + 1 terms need to be estimated, namely the N return variances and one correlation coefficient, generally using the corresponding sample quantities. Alternatively, diagonal estimators assume that all asset returns are pairwise uncorrelated. This is also a very strong assumption, but it greatly simplifies the estimation process as only the variances of individual asset returns need to be estimated, once

again generally using sample variances. Using even more extreme simplifications, Ledoit and Wolf [LED 03] suggested using a scalar multiple of the identity matrix, with the mean of the sample variances used as the diagonal entry.
– Linear factor models attempt to reduce the dimensionality of the covariance matrix by using a small number of factor returns. Examples include explicit factor models, implicit factor models (such as PCA), approximate factor structures [CHA 83b] and latent factor models [LAL 99, LAL 00]. We will discuss factor models and their applications to build better-diversified portfolios in Chapter 5.
– Shrinkage procedures strive to achieve a compromise between the instability of the sample covariance estimator and the biases introduced by model-based estimators – see, for instance, Stein [STE 56], Ledoit and Wolf [LED 04a, LED 04b, LED 04c, LED 12, LED 13a, LED 13b, LED 14]. Simply stated, a shrinkage estimator is a convex combination of two extreme estimates, which are usually the unstructured sample covariance matrix S and a structured target covariance matrix F (a.k.a. the shrinkage target). The new estimated covariance matrix becomes δF + (1 − δ)S, where 0 ≤ δ ≤ 1 is called the shrinkage intensity. The challenge is now to set an appropriate target and its associated optimal shrinkage intensity19.
– The sparsity approach suggested by Dempster [DEM 72], Bien and Tibshirani [BIE 11] and Rothman [ROT 12] assumes that beyond the diagonal terms, only a small number of entries in the covariance or the precision matrix differ from zero20. It therefore estimates the covariance matrix from sample multivariate data by maximizing its likelihood while simultaneously penalizing the likelihood function for each entry of the covariance matrix (or its inverse) that is non-zero. The result is a steep reduction of the effective number of parameters and a sparse estimate of the covariance matrix.
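The shrinkage idea can be sketched in a few lines — a convex combination of the sample covariance matrix with a constant-correlation target (helper names are ours, and the shrinkage intensity is fixed for illustration rather than estimated optimally as in Ledoit and Wolf):

```python
import numpy as np

def constant_corr_target(returns):
    # Constant-correlation target: sample variances on the diagonal,
    # one average off-diagonal correlation everywhere else.
    sample = np.cov(returns, rowvar=False)
    sd = np.sqrt(np.diag(sample))
    corr = sample / np.outer(sd, sd)
    n = corr.shape[0]
    avg_rho = (corr.sum() - n) / (n * (n - 1))
    target = avg_rho * np.outer(sd, sd)
    np.fill_diagonal(target, sd ** 2)
    return target

def shrink_cov(returns, delta):
    # Convex combination delta*F + (1 - delta)*S of the structured
    # target F and the unstructured sample covariance matrix S.
    sample = np.cov(returns, rowvar=False)
    return delta * constant_corr_target(returns) + (1.0 - delta) * sample

rng = np.random.default_rng(0)
rets = rng.normal(0.0, 0.01, size=(60, 10))  # 60 periods, 10 assets
cov_hat = shrink_cov(rets, delta=0.5)
```

With more assets than observations the sample matrix alone becomes singular, whereas the shrunk estimate remains well conditioned — the motivation for this whole family of estimators.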

19 The target can either be derived from an assumption or from a model. By using the Frobenius norm, we can define the “distance” between the true covariance matrix and the estimator as a quadratic error function. The shrinkage intensity parameter is usually set to a value that minimizes this quadratic error – see Ledoit and Wolf [LED 03]. 20 In a Gaussian framework, this is equivalent to assuming that the corresponding covariates are independent. This is the case, for instance, when assets belonging to a given group are correlated together, whereas assets pertaining to different groups are more likely to be independent.

– Using higher-frequency data in estimators, as proposed by Jagannathan and Ma [JAG 03], gives promising results but requires more complex estimation techniques that practitioners are less likely to embrace21.
– Decompose the conditional covariance matrix into a product involving the conditional correlation matrix and a diagonal matrix of conditional standard deviations. See, for instance, the constant conditional correlation (CCC) model of Bollerslev [BOL 90], the dynamic conditional correlation (DCC) model of Engle [ENG 02], the time-varying correlation (TVC) model of Tse and Tsui [TSE 02] and the recent dynamic equi-correlation (DECO) model of Engle and Kelly [ENG 09].

None of these techniques focus on generating more diversified portfolios. They aim at generating better-quality estimates as input parameters for the optimizer, with the view that if we have more confidence in the estimates of μ and Σ, we should be willing to accept less diversification in the optimized portfolio. As a side comment, it is interesting to see that much effort is being made to obtain a better estimator of Σ, while the parameter of interest is not the covariance matrix itself but its inverse Σ⁻¹, the so-called precision matrix. In addition, the inverse of the estimated covariance matrix is expected to provide a rather poor estimate of the precision matrix, if only because of the numerical instability of the inversion process – see Muirhead [MUI 05]. In our opinion, going forward, researchers should therefore focus on building better estimators of Σ⁻¹.

2.5.2. Robust portfolio selection

Robust portfolio selection aims to explicitly model the uncertainty around the estimates of μ and Σ by defining an “uncertainty region” for each of them. Given the estimated input parameters μ̂, Σ̂ and their respective uncertainty regions S_μ, S_Σ, we then select a portfolio that performs well for

21 Investors often use different sources and models to produce their estimates, without necessarily verifying their coherence and distributional assumptions. For instance, following Merton’s [MER 80] suggestion that as the sampling interval approaches zero, arbitrarily precise volatility estimates can be obtained, the estimation of variance and covariance parameters is sometimes done using higher-frequency data than for estimating average returns. However, Merton’s suggestion only applies to the ideal case of asset returns generated from an i.i.d. normal distribution. In practice, results are not as promising as Merton’s ideal case – see, for instance, Bai et al. [BAI 01], who showed that using higher-frequency data does not necessarily translate into more precise estimates.

78

Portfolio Diversification

all possible parameter values in these regions. The simplest way to approach the problem is to look at the worst-case scenario of the estimated parameters in their respective uncertainty set. The result is a min–max optimization problem, whose solution is less sensitive to parameter fluctuations, as we are considering the worst case. For instance, the robust equivalent of the traditional Markowitz problem given by equations [2.23]–[2.25] is to minimize the worst-case variance of the portfolio, subject to the constraint that the worst-case expected return on the portfolio is at least μ*. Mathematically:

w* = arg min_w max_{Σ ∈ S_Σ} w′Σw   [2.54]

subject to:

min_{μ ∈ S_μ} w′μ ≥ μ*   [2.55]

w′1 = 1   [2.56]
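For simple uncertainty sets, the inner minimization in [2.55] is available in closed form. A minimal sketch under a box uncertainty set for the expected returns (all numbers hypothetical): for a long-only portfolio, the worst case over the box is attained at its lower corner.

```python
import numpy as np

# Box uncertainty set: mu_hat - delta <= mu <= mu_hat + delta.
# For a long-only portfolio w, min_{mu in box} w'mu = w'(mu_hat - delta),
# so the robust return constraint can be evaluated without any search.
mu_hat = np.array([0.08, 0.12, 0.05])   # estimated expected returns
delta = np.array([0.02, 0.04, 0.01])    # wider box = less confidence
w = np.array([0.4, 0.4, 0.2])           # candidate long-only portfolio

nominal = w @ mu_hat
worst_case = w @ (mu_hat - delta)       # value checked against mu* in [2.55]
print(nominal, worst_case)
```

The robust constraint is thus an ordinary linear constraint with shrunk means, which is why box sets turn the semi-infinite problem into a tractable one.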

This is usually a semi-infinite programming problem, which is computationally challenging and introduces the additional difficulty of having to estimate the uncertainty sets. However, when uncertainty sets are defined as intervals (“box constraints”), for example, μ⁻ ≤ μ ≤ μ⁺, or as ellipsoidal regions (“ellipsoidal constraints”), for example, (μ − μ̂)′Ω⁻¹(μ − μ̂) ≤ 1, the problem becomes a semi-definite programming problem that is easier to solve. In practice, there are multiple ways of formulating and solving robust portfolio selection problems22. As an illustration, Garlappi et al. [GAR 07] derived an explicit formula for the min–max robust solution using the ellipsoidal uncertainty set for μ, assuming that Σ is known and that short-selling is allowed23. Their model shows that an explicit constraint on the weights reduces the amount of possible change of the weight vector in

22 See, for instance, Ceria and Stubbs [CER 06], Goldfarb and Iyengar [GOL 03], Tütüncü and Koenig [TÜT 04] and Garlappi et al. [GAR 07].
23 Zhu et al. [ZHU 09] show that even with an additional no-short-selling constraint, the solution to the min–max robust problem is a solution to the nominal problem of equation [2.20], but with a higher risk-aversion coefficient.

response to changes in the parameter estimates. However, the specification of the uncertainty sets plays a crucial role in robust solutions. In a sense, we now have the choice between dealing with uncertainty specification errors or with estimation errors.

2.5.3. Adding constraints on the weights

Instead of shrinking the moments of asset returns, such as the covariance matrix, as suggested in section 2.5.1, we can also shrink the portfolio-weight vector. This is equivalent to solving the traditional Markowitz problem with additional constraints on weights, for example:

‖w‖ ≤ h   [2.57]

where ‖w‖ is the norm of the portfolio-weight vector and h is a given threshold. Smaller values for h restrict the set of feasible solutions to more diversified portfolios. Examples of possible norms include:
– the 1-norm, defined as ‖w‖₁ = ∑_i |w_i|;
– the 2-norm, defined as ‖w‖₂ = (∑_i w_i²)^{1/2};
– the ∞-norm, defined as ‖w‖_∞ = max_i |w_i|.

In the least-diverse scenario (portfolio made of only one asset), ‖w‖₂ reaches its maximum value, which is 1. In the most diverse scenario (equally weighted portfolio), ‖w‖₂ reaches its minimum value, which is 1/√N. The range of possible values for h is therefore [1/√N, 1]. DeMiguel et al. [DEM 09b] introduced a general framework to analyze the impact of using such constraints on portfolio weights. They showed that this framework nests as special cases the shrinkage approaches of Jagannathan and Ma [JAG 03], Ledoit and Wolf [LED 03, LED 04a, LED 04b, LED 04c] and the equally weighted portfolio studied in DeMiguel et al. [DEM 09a, DEM 09b]. They also proposed using:
– the A-norm, defined through the constraint ‖w‖_A² = w′Aw ≤ h,
where A ∈ ℝ^{N×N} is a positive definite matrix. Although this allows investors to express their preferences for each asset by specifying the matrix A, in practice, the matrix can be difficult to decide. Most of the time, it is

set equal to the identity matrix, which means the A-norm becomes equivalent to the 2-norm. To reduce the impact of estimation risk on mean-variance efficient portfolios and/or to force the optimizer to produce more diversified portfolios, we can also add specific constraints on individual weights. Two types of constraints are of relevance for portfolio diversification, namely the quantity constraints and the cardinality constraint24. The quantity constraints prescribe upper and/or lower bounds on each asset weight. That is, if asset i is held in the portfolio, l_i > 0 is its minimum proportion and u_i ≤ 100% is its maximum proportion, so that l_i ≤ w_i ≤ u_i. It is important to note that:
– The maximum constraint directly avoids an overly concentrated portfolio by restricting the feasible set of solutions only to the more diverse portfolios. If the same weight constraint u is used for all assets, then the possible range for u becomes [1/N, 1]. The case u = 1/N allows only one solution, the equally weighted portfolio, whereas the case u = 1 means the constraint is redundant.
– In practice, l_i represents a “min-buy” or “minimum transaction level” for asset i. Mathematically, it is not a very important constraint because it can be easily satisfied by allocating a minimum proportion l_i to asset i and by allocating a proportion of 100% − l_i to all the assets without the constraint. If the same weight constraint l is used for all the assets, then the possible range for l becomes [0, 1/N]. The case l = 1/N allows only one solution, the equally weighted portfolio, whereas the case l = 0 means the constraint is redundant.
The cardinality constraint imposes a limit on the maximum number of assets to be held within the portfolio.
It is frequently used for (i) smaller dollar-size portfolios, which can only realistically own a limited number of securities; (ii) “focused funds” that are only allowed by their prospectus to own a small collection of securities and (iii) portfolios managed against a benchmark that do not want to turn into indexers. In all cases, it goes against the idea of increasing portfolio diversification. We will therefore ignore it.
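Although we set the cardinality constraint aside, its combinatorial nature is easy to see by brute force for a small universe — a sketch (function and variable names are ours) that enumerates every subset of exactly K assets and keeps the feasible minimum-variance one:

```python
import numpy as np
from itertools import combinations

def cardinality_min_var(cov, k):
    # Exhaustive search for the minimum-variance portfolio holding exactly
    # k assets; on each subset the closed form w = inv(C)1 / 1'inv(C)1 is
    # used and subsets yielding short positions are discarded. Only viable
    # for small N, which is why the general problem needs heuristics.
    n = cov.shape[0]
    best_var, best_w = np.inf, None
    for subset in combinations(range(n), k):
        sub = cov[np.ix_(subset, subset)]
        w_sub = np.linalg.solve(sub, np.ones(k))
        w_sub /= w_sub.sum()
        if np.any(w_sub < 0):          # enforce the long-only quantity constraint
            continue
        var = w_sub @ sub @ w_sub
        if var < best_var:
            best_var = var
            best_w = np.zeros(n)
            best_w[list(subset)] = w_sub
    return best_w, best_var

cov = np.diag([0.04, 0.09, 0.01, 0.16])  # uncorrelated assets for simplicity
w, v = cardinality_min_var(cov, 2)
```

The number of subsets grows combinatorially with N, which is exactly why the literature cited below resorts to branch-and-bound schemes and metaheuristics instead of enumeration.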

24 A third type of constraint, called the class constraint, is sometimes used to limit the total proportion of the portfolio invested in those assets which belong to the same class (e.g. sector, country and industry). Classes of assets are mutually exclusive, and each class of assets can be assigned a lower proportion limit and an upper proportion limit. We will not discuss it here.

These constraints can be incorporated in the traditional portfolio optimization program with the addition of binary variables. Let s = (s₁, …, s_N)′, where s_i is a binary decision variable such that s_i = 1 if asset i is held in the portfolio and s_i = 0 otherwise. The new optimization program is:

w* = arg min_w w′Σw   [2.58]

subject to:

w′μ = μ*   [2.59]

w′1 = 1   [2.60]

s′1 = K   [2.61]

l_i s_i ≤ w_i ≤ u_i s_i for all i = 1, …, N   [2.62]

This is a quadratic mixed-integer nonlinear program, which is NP-hard or, stated differently, extremely difficult to solve. The objective function is positive semi-definite and hence we are minimizing a convex function. However, the cardinality constraint is complicated to deal with because it is discrete and therefore the solution space becomes discontinuous25. Compared to linear integer optimization, quadratic mixed-integer optimization problems have received relatively little attention in the academic literature. Nevertheless, a few algorithms, heuristics and metaheuristic optimization techniques are available for solving them. See, for instance, Fernandez et al. [FER 07] for the neural network method, Chang et al. [CHA 00] for simulated annealing, Tabu search and genetic algorithms, Woodside-Oriakhi et al. [WOO 11] for the revised neighborhood algorithm, Di Gaspero et al. [DI 11] for a combination of the local search and quadratic programming approaches, Lwin et al. [LWI 13] for a mix of the population-based incremental learning and differential evolution approaches or Cui et al. [CUI 14] for a hybrid method combining a metaheuristic approach (particle swarm optimization) for the cardinality constraint and a mathematical programming method for the rest of the program. See also Bienstock [BIE 96] for a branch-and-cut algorithm,

25 Some practitioners therefore prefer relaxing the cardinality constraint from an equality to an inequality.

Bertsimas and Shioda [BER 09] for a tailored procedure based on the Lemke and Howson [LEM 64] pivoting algorithm, Borchers and Mitchell [BOR 94] for a branch-and-bound algorithm, and Li et al. [LI 06] for a convergent Lagrangian method. All these techniques are generally complex to implement and may be slow depending on the number of assets considered. Alternatively, we may try to replace the variance of the portfolio with a measure of risk that is a linear function, or equivalently a linearizable function26, thus enabling the array of algorithms available for mixed-integer linear programming to be used. See, for instance, Speranza [SPE 96], Mansini and Speranza [MAN 97], Kellerer et al. [KEL 97] and Young [YOU 98]. A different path toward building more diversified optimized portfolios consists of using the equally weighted portfolio as a benchmark and imposing the following constraint:

|w_i − 1/N| ≤ h for all i = 1, …, N   [2.63]

The larger h, the looser the constraint. Setting h = 0 results in the optimizer selecting the equally weighted portfolio, which is the best solution when we have “zero information” about the ex-ante parameters μ and Σ. For h ≥ (N − 1)/N, we are back to the unconstrained mean-variance optimizer. This is the best solution when we have perfect information about the ex-ante parameters μ and Σ. In practice, the h parameter should be calibrated to appropriately represent the investor’s tolerance level toward moving away from diversification. One criticism of equation [2.63] is that the constraints are homogeneous across all assets. In practice, it would be logical to impose more stringent constraints on assets with a higher volatility relative to other assets, because their estimation errors are likely to be larger. For instance, we may want to introduce volatility-based constraints:

|w_i − 1/N| ≤ h σ̄/σ_i for all i = 1, …, N   [2.64]

26 This approach is not new. For instance, Sharpe [SHA 67, SHA 71] approximates the quadratic objective function of Markowitz by a linear and piecewise linear function. Jacob [JAC 74] assumes equal weights across assets to formulate the problem as a pure 0-1 problem.

where σ̄ = (1/N) ∑_i σ_i is the average standard deviation of all assets. Albeit interesting, this approach has the drawback of imposing identical weight constraints for all assets with the same standard deviation, while their contribution to the overall portfolio risk and return may be very different. Moreover, as long as they remain within the allowed bounds, deviations from the equally weighted portfolio are not subject to any penalty. One possible solution is to replace the specific constraints on each asset deviation with one global volatility-based constraint on the total “cost” of all deviations. For instance, using a quadratic deviation, we obtain:

∑_i (σ_i/σ̄) (w_i − 1/N)² ≤ h   [2.65]

With this new constraint, stocks with large standard deviations are penalized more when they deviate from the equally weighted solution. Unfortunately, in practice, constraints on weights are often set arbitrarily without much thought about their consequences. They generally increase the diversification of the optimized portfolio, but the danger is that if such constraints are very tight, they will pre-determine its content. One way to think about constraints is that each of them will enter the mean-variance objective function, multiplied by a Lagrange multiplier. The optimizer will produce not only the optimal weights but also the value of the Lagrange multipliers. If a multiplier is different from zero, it means the corresponding constraint is binding. As explained by Grinold and Easton [GRI 98] or Jagannathan and Ma [JAG 03], a constrained optimization with a given set of input parameters is equivalent to an unconstrained optimization with a modified set of input parameters. Thus, introducing constraints is implicitly equivalent to adjusting expected return and/or covariance estimates.

2.5.4. Combining several portfolio weights

The bootstrapping approach is a well-known solution to deal with the estimation risk and the lack of diversification of optimized portfolios. Introduced by Efron [EFF 79], bootstrapping consists of drawing many samples from a given distribution and calculating a statistic on each sample. Ultimately, this will provide an estimate of the distribution of the statistic.

Bootstrapping using historical returns was suggested in the context of portfolio optimization by Jobson and Korkie [JOB 81a] and by Jorion [JOR 92]. The process works as follows. First, we select T observations of the returns on the N assets and compute the sample expected returns μ̂ and the sample covariance matrix Σ̂. We calculate the optimal portfolio using μ̂ and Σ̂ as inputs and memorize it27. We then select another sample of T observations and repeat the procedure as many times as needed. Ultimately, we average the weights of all our optimal portfolios and obtain the optimal bootstrapped portfolio. By construction, this portfolio is likely to display a high degree of diversification and it is unlikely to change if one additional sample is added to the data set (and its corresponding optimal portfolio is averaged with the other ones). As an illustration, we can simply consider the maximum return portfolio (which corresponds to a zero risk-aversion coefficient in equation [2.20]). With traditional mean-variance optimization, this portfolio will generally be made of the highest-performing asset. With re-sampling, the same will happen in each simulation, but the highest-performing asset may not always be the same in each sample. The final averaging of portfolio weights will likely give a portfolio made of several assets, which is therefore better diversified. Bootstrapping returns can also be made from a parametric distribution. For instance, Michaud [MIC 98] suggested re-sampling from a multivariate normal distribution instead of an empirical one and developed a method of constructing “statistically equivalent” efficient frontiers. The empirical research of Delcourt and Petitjean [DEL 11] confirms that re-sampled portfolios generally have better diversification properties and are more stable over time compared with the mean-variance portfolio. It is important to note that Scherer [SCH 02] has raised several objections against the bootstrapping approach in the case of long/short portfolios.
An interesting alternative to re-sampling is portfolio mixtures, which combine the estimated optimal portfolio with either a fixed portfolio or a portfolio that depends on a smaller number of estimated parameters – see, for instance, Golosnoy and Okhrin [GOL 07], Kan and Zhou [KAN 07], DeMiguel et al. [DEM 09a] or Tu and Zhou [TU 11].
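The bootstrap procedure described above can be sketched as follows — minimum-variance weights stand in for a full mean-variance optimization to keep the example short, and all names are ours:

```python
import numpy as np

def bootstrap_min_var(returns, n_boot=200, seed=0):
    # Average the minimum-variance weights, w = inv(S)1 / 1'inv(S)1,
    # across bootstrap resamples of the historical return rows.
    rng = np.random.default_rng(seed)
    t, n = returns.shape
    weights = np.zeros(n)
    for _ in range(n_boot):
        sample = returns[rng.integers(0, t, size=t)]  # resample rows with replacement
        cov = np.cov(sample, rowvar=False)
        w = np.linalg.solve(cov, np.ones(n))
        weights += w / w.sum()
    return weights / n_boot

rng = np.random.default_rng(1)
rets = rng.normal(0.005, 0.04, size=(120, 5))  # 120 periods, 5 assets
w_boot = bootstrap_min_var(rets)
```

Because each resample produces a different covariance estimate, the averaged weights are typically spread over more assets and far less sensitive to any single estimation error than a one-shot optimization.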

27 If the serial dependence of the data is important, we can divide the data into overlapping blocks of fixed length and resample with replacement from these blocks.

2.5.5. Using entropy in the objective function

To avoid an excessive concentration of optimized portfolios in terms of the number of assets, we may also attempt to incorporate the portfolio entropy in the optimization problem of the investor.

2.5.5.1. Minimum entropy

A minimum entropy constraint adds a lower bound H_min > 0 on the entropy H(w) of the portfolio:

H(w) = −∑_i w_i ln w_i ≥ H_min   [2.66]
In the least-diverse scenario (single-asset portfolio), H(w) reaches its minimum value, which is 0. In the most diverse scenario (equally weighted portfolio), H(w) reaches its maximum value, which is ln N. The range of possible values for H_min is therefore [0, ln N], with larger values of H_min resulting in better diversity. It is important to note that imposing a maximum weight constraint max_i w_i ≤ u in a portfolio is equivalent to imposing an entropy constraint like the one in equation [2.66]. Under a maximum weight constraint, the least-diverse portfolio is structured as follows: ⌊1/u⌋ assets have a weight equal to u, one asset has a weight of 1 − ⌊1/u⌋u and the other assets have a weight equal to zero. The lower bound of H(w) is:

H(w) ≥ −⌊1/u⌋ u ln(u) − (1 − ⌊1/u⌋u) ln(1 − ⌊1/u⌋u)   [2.67]

Since 0 ≤ 1 − ⌊1/u⌋u ≤ u ≤ 1, we have ln(1 − ⌊1/u⌋u) ≤ ln(u) ≤ 0. Replacing terms in equation [2.67] yields:

H(w) ≥ −⌊1/u⌋ u ln(u) − (1 − ⌊1/u⌋u) ln(u) = −ln(u)   [2.68]

However, imposing an entropy constraint H(w) ≥ −ln(u) is not necessarily equivalent to imposing a weight constraint max_i w_i ≤ u.
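The link between a maximum weight cap and the entropy bound can be checked numerically — a small sketch drawing random long-only portfolios and verifying that capping the largest weight at u forces the Shannon entropy above −ln(u):

```python
import numpy as np

# If every weight of a long-only portfolio is at most u, then
# -w_i*ln(w_i) >= -w_i*ln(u) for each i, hence H(w) >= -ln(u).
rng = np.random.default_rng(0)
u = 0.30
kept, violations = 0, 0
for _ in range(1000):
    w = rng.dirichlet(np.ones(8))   # random long-only portfolio of 8 assets
    if w.max() > u:
        continue                    # discard portfolios breaching the cap
    kept += 1
    entropy = -np.sum(w * np.log(w))
    if entropy < -np.log(u):
        violations += 1
print(kept, violations)
```

No capped portfolio ever violates the bound, while the converse fails: a portfolio can satisfy H(w) ≥ −ln(u) and still hold one weight above u, which is the asymmetry noted above.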

2.5.5.2. Maximizing entropy

One option suggested by Jiang et al. [JIA 08] and Zheng et al. [ZHE 09] is to replace the variance with the Shannon entropy in the traditional Markowitz program and to add constraints on the portfolio expected return and/or variance. The resulting optimization program is28:

w* = arg max_w −∑_i w_i ln w_i   [2.69]

subject to:

w′μ ≥ μ*   [2.70]

w′Σw ≤ σ*²   [2.71]

w′1 = 1   [2.72]

w ≥ 0   [2.73]

where μ* denotes the target return for the portfolio and σ*² denotes its maximum variance. Both parameters are exogenously given by investors. This is a concave maximization problem over a convex feasible set. We can show that it is technically equivalent to shrinking portfolio weights toward an equally weighted portfolio, a problem that is discussed in the next section.

2.5.5.3. Prior information and minimum cross entropy

In some instances, we may have a given set of reference portfolio weights w₀ in mind that the optimized portfolio weights w* should be close to. For instance, the reference portfolio could be the equally weighted portfolio, that is, w₀ = (1/N, …, 1/N)′, but any other set of reference weights could be used. If both w* and w₀ are long-only, one idea is to minimize the

28 Jaynes [JAY 57] explored the idea of determining the optimal distribution of a system by maximizing its Shannon entropy. The resulting distribution is usually referred to as the MaxEnt distribution in the information theory literature.

pseudo-distance between w* and w₀, for instance, using the Kullback–Leibler divergence. The new optimization problem becomes:

w* = arg min_w ∑_i w_i ln(w_i / w_{0,i})   [2.74]

subject to the constraints [2.70]–[2.73]. This approach is known as the Kullback–Leibler minimum entropy principle, the minimum rectified deviation principle, or the minimum information discrimination principle. As, for an equally weighted reference portfolio, equation [2.74] simplifies to:

w* = arg min_w ∑_i w_i ln w_i + ln N   [2.75]

minimizing the cross entropy with respect to an equally weighted portfolio is equivalent to maximizing the negative Shannon’s [SHA 48] entropy measure. Alternatively, we could also minimize Yager’s entropy or minimize the maximum distance between weights. 2.5.5.4. The Mean-Variance-Entropy An interesting idea introduced by Philippatos and Wilson [PHI 72], Hua and Xingsi [HUA 03], Bera and Park [BER 08] and Ke and Zhang [KE 08] is to modify the mean-variance model to incorporate the portfolio entropy in the objective function of the investor. The resulting optimization problem is: ∗

min



[2.76]

subject to: ∗

[2.77] [2.78]

where ∗ denotes the target return for the portfolio, which is exogenously given, and is a coefficient (sometimes called “momentum factor”) that determines the significance of the term for entropy in the objective function. This model offers a compromise between the risk and the diversification level of a portfolio. For 0, we obtain the traditional mean-variance model.
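The trade-off embedded in equation [2.76] can be illustrated numerically. The sketch below is purely hypothetical (the three-asset covariance matrix and the value of $\lambda$ are made up for illustration, and we use the convention $0 \ln 0 = 0$): with $\lambda = 0$ the objective prefers a concentrated position in the lowest-risk asset, while a positive entropy weight tilts it toward the equally weighted portfolio.

```python
import numpy as np

def mve_objective(w, cov, lam):
    """Mean-variance-entropy objective: w' Sigma w + lambda * sum w_i ln w_i (0 ln 0 := 0)."""
    w = np.asarray(w, dtype=float)
    ent_term = np.sum(np.where(w > 0, w * np.log(np.maximum(w, 1e-300)), 0.0))
    return float(w @ cov @ w + lam * ent_term)

# Hypothetical diagonal covariance matrix: asset 1 has much lower variance.
cov = np.diag([0.01, 0.09, 0.09])
w_conc = np.array([1.0, 0.0, 0.0])   # fully concentrated in the low-risk asset
w_eq = np.full(3, 1.0 / 3.0)         # equally weighted portfolio

# With lambda = 0 (traditional mean-variance), concentration wins on pure variance...
print(mve_objective(w_conc, cov, 0.0) < mve_objective(w_eq, cov, 0.0))    # True
# ...while a positive entropy weight tilts the objective toward diversification.
print(mve_objective(w_eq, cov, 0.05) < mve_objective(w_conc, cov, 0.05))  # True
```

A real application would minimize this objective under the return and budget constraints [2.77]–[2.78] with a nonlinear solver; the point here is only the sign of the entropy term.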


2.5.5.5. Almost efficient solutions

An interesting observation made by Corvalan [COR 05a] is the fact that the region near the efficient frontier is the densest in the whole risk-return space. In general, portfolios seem to crowd there, which implies that there is a very large number of almost efficient portfolios, that is, portfolios located very close to the efficient frontier. To reduce the concentration of a given efficient portfolio, Corvalan [COR 05b] therefore suggested searching for an almost efficient but more diversified portfolio within its close neighborhood. The process is extremely flexible, but involves two optimization procedures. The first optimization generates an efficient, but often poorly diversified portfolio that we will call $P$. This portfolio is characterized by its weights $w^*$ and its expected return and risk pair $(\mu^*, \sigma^*)$. The second optimization looks for portfolio $P$'s most appropriate neighbor. To do so, it defines an infinitesimal region around $(\mu^*, \sigma^*)$ by considering all portfolios whose expected return is between $\mu^* - \Delta\mu^*$ and $\mu^* + \Delta\mu^*$, and whose volatility is between $\sigma^* - \Delta\sigma^*$ and $\sigma^* + \Delta\sigma^*$, where $\Delta\mu^*$ and $\Delta\sigma^*$ are negligible enough from the investor's perspective. The second optimization problem then becomes, for instance:

$$w^{**} = \arg\max_w D(w) \qquad [2.79]$$

subject to:

$$\mu^* - \Delta\mu^* \le w'\mu \le \mu^* + \Delta\mu^* \qquad [2.80]$$

$$\sigma^* - \Delta\sigma^* \le \sqrt{w'\Sigma w} \le \sigma^* + \Delta\sigma^* \qquad [2.81]$$

and the original constraints, for instance:

$$w'e = 1 \qquad [2.82]$$

$$w \ge 0 \qquad [2.83]$$

The objective function $D(w)$ is a measure of diversification for the resulting portfolio, which can be set freely by the investor. For example, we could choose $D(w) = -\sum_{i=1}^{N} w_i^2$.
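A minimal sketch of this two-step idea follows. All numbers are hypothetical, and the first step (finding an efficient portfolio) is replaced by a hand-picked concentrated portfolio; a proper implementation would use a quadratic-programming solver for that step. The second step is approximated by a random search over long-only portfolios within the $(\mu^*, \sigma^*)$ bands, scored by $D(w) = -\sum_i w_i^2$:

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([0.08, 0.10, 0.12])                  # hypothetical expected returns
cov = np.array([[0.04, 0.01, 0.00],                # hypothetical covariance matrix
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])

def stats(w):
    return w @ mu, np.sqrt(w @ cov @ w)

def diversification(w):
    return -np.sum(w ** 2)                          # D(w): higher = more diversified

# Step 1 (stand-in): a concentrated reference portfolio and its (mu*, sigma*).
w_star = np.array([0.10, 0.75, 0.15])
mu_star, sigma_star = stats(w_star)
d_mu, d_sigma = 0.002, 0.005                        # the investor's "negligible" bands

# Step 2: random long-only candidates; keep the most diversified one inside the bands.
best, best_score = w_star, diversification(w_star)
for w in rng.dirichlet(np.ones(3), size=20000):
    m, s = stats(w)
    if abs(m - mu_star) <= d_mu and abs(s - sigma_star) <= d_sigma:
        if diversification(w) > best_score:
            best, best_score = w, diversification(w)

print(best.round(3))
```

By construction, the selected portfolio is never less diversified (in the $D$ sense) than the reference portfolio, and its risk-return pair stays within the investor's tolerance bands.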


2.6. Conclusions on MPT

In summary, the theory and logic behind Markowitz's portfolio theory are sound, but its implementation raises a large number of issues. Due to approximation and estimation errors combined with an optimization process, mean-variance optimization tends to significantly overweight assets with large expected returns, low volatilities and/or negative correlations with other assets, and this results in highly concentrated portfolios. Green and Hollifield [GRE 92] were the first to focus specifically on this concentration issue. They demonstrated that the extreme weights in efficient portfolios are due to the dominance of a single factor in the covariance structure of returns and the consequent high correlation between diversified portfolios. We will come back to these findings in Chapter 5.

3 Naive Portfolio Diversification

Naive portfolio diversification is a heuristic approach to investing and probably the simplest one. It finds its roots in the Babylonian Talmud, which explains why it is sometimes referred to as "Talmudic diversification". In Tractate Baba Mezi'a, folio 42a, Rabbi Issac bar Aha recommends the following asset allocation strategy: "one should always divide one's wealth into three parts: a third in land, a third in merchandise, and a third ready to hand". Some people, less reliable in our opinion, also claim that naive portfolio diversification was inspired by the simple composition of the Negroni cocktail, with equal parts gin, vermouth rosso and Campari. When extended to a larger set of assets, naive portfolio diversification consists in dividing capital allocations evenly amongst all assets. Its goal is to reduce the overall portfolio risk in a simple and intuitive way, without having to go through the mathematical complexities of optimization problems.

3.1. A (very) simplified model

To understand the impact of naive portfolio diversification on portfolio size, let us first analyze it in an excessively simplistic model. Let us assume that all assets have identical expected returns $\mu$ and standard deviations $\sigma$, and that there is an equal correlation ($\bar{\rho}$) between all pairs of assets. In such a case, we can rewrite equation [2.15] as:

$$\sigma_P^2 = \sigma^2 \sum_{i=1}^{N} w_i^2 + \bar{\rho}\,\sigma^2 \sum_{i=1}^{N} \sum_{j \ne i} w_i w_j \qquad [3.1]$$


For any given $i$, the sum of all $w_j$ for all $j \ne i$ must equal $1 - w_i$. Substituting into the last term and simplifying yields:

$$\sigma_P^2 = (1 - \bar{\rho})\,\sigma^2 \sum_{i=1}^{N} w_i^2 + \bar{\rho}\,\sigma^2 \qquad [3.2]$$

In the case of equal weights ($w_i = 1/N$ for all $i$), the portfolio variance is given by:

$$\sigma_P^2 = \frac{(1 - \bar{\rho})\,\sigma^2}{N} + \bar{\rho}\,\sigma^2 \qquad [3.3]$$

Equation [3.3] provides useful insights into the potential benefits of naive diversification. The portfolio variance is made of two terms. The first one can be reduced by increasing the number of assets in the portfolio. The second one depends on the average correlation between assets but not on the number of assets; it represents the lower bound of the portfolio variance in a large naively diversified portfolio. Equation [3.3] can be rewritten as:

$$\sigma_P = \sigma \sqrt{\frac{1 - \bar{\rho}}{N} + \bar{\rho}} \qquad [3.4]$$

which clearly shows that, all other things being equal, (1) the volatility of the portfolio declines when the number of assets in the portfolio increases; (2) the lower the average correlation between assets, the greater the potential diversification benefits; and (3) the limit of the portfolio volatility is $\sigma\sqrt{\bar{\rho}}$, which is reached when $N$ gets very large and all idiosyncratic risk is eliminated.

We will not discuss our (excessively) simplified model further, but we should note that when all assets have identical expected returns and standard deviations, and all correlations are identical, the equally weighted portfolio is efficient in the Markowitz sense, as well as from an entropic perspective.

3.2. The law of average covariance

In a more general context, when all assets are equally weighted but have different expected returns, standard deviations and correlations, it is


relatively easy to show that a portfolio's standard deviation is increasing in the average pairwise correlation of its constituents. From equation [2.14], we have that:

$$\sigma_P^2 = \sum_{i=1}^{N} \sum_{j=1}^{N} w_i w_j \sigma_i \sigma_j \rho_{i,j} \qquad [3.5]$$

Setting $w_i = 1/N$, taking the square root of both sides of equation [3.5] and differentiating with respect to the average pairwise correlation $\bar{\rho}_{i,j}$ yields:

$$\frac{\partial \sigma_P}{\partial \bar{\rho}_{i,j}} > 0 \qquad [3.6]$$

This suggests that, all other things being equal, in an equally weighted portfolio one strategy to minimize the portfolio variance consists in selecting assets with a lower average pairwise return correlation. However, there is more than just correlation coefficients to be considered here. When all weights are equal, the variance of the portfolio becomes:

$$\sigma_P^2 = \frac{1}{N^2} \sum_{i=1}^{N} \sigma_i^2 + \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j \ne i} \sigma_{i,j} \qquad [3.7]$$

The average variance of the assets is:

$$\bar{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{N} \sigma_i^2 \qquad [3.8]$$

and their average covariance is:

$$\bar{\sigma}_{i,j} = \frac{1}{N(N-1)} \sum_{i=1}^{N} \sum_{j \ne i} \sigma_{i,j} \qquad [3.9]$$

By seeing that:

$$\frac{N(N-1)}{N^2} = 1 - \frac{1}{N} \qquad [3.10]$$

we can insert the average variance and the average covariance expressions into the portfolio variance formula and obtain what Markowitz [MAR 76] called the law of average covariance:

$$\sigma_P^2 = \frac{\bar{\sigma}^2}{N} + \left(1 - \frac{1}{N}\right)\bar{\sigma}_{i,j} \qquad [3.11]$$


Equation [3.11] states that the variance of a naively diversified portfolio is a function of the average variance and of the average covariance of the individual assets comprised in the portfolio. As the number of assets in the portfolio increases, the contribution of the average variance shrinks and the variance of the portfolio converges towards the average covariance. The phenomenon can be graphically represented as in Figure 3.1. Since the specific variance contributed by individual assets gets diversified away, it is called "diversifiable risk" or "specific risk". By contrast, the covariance risk that remains in the portfolio when all N assets of the universe are included is called "systematic risk".

Figure 3.1. The effect of increasing the number of assets in a portfolio on the variance of the portfolio. Specific risks can be controlled by portfolio selection. Systematic risk is the risk inherent to the asset that cannot be diversified away
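Both results can be checked numerically. The sketch below uses synthetic numbers (not from the book): it verifies that the law of average covariance [3.11] holds exactly for an arbitrary covariance matrix under equal weights, and shows the equal-correlation volatility [3.4] converging to its floor $\sigma\sqrt{\bar{\rho}}$:

```python
import numpy as np

rng = np.random.default_rng(1)

# --- Law of average covariance [3.11] for an arbitrary covariance matrix ---
N = 8
A = rng.normal(size=(N, N))
cov = A @ A.T                                  # random positive semi-definite covariance
w = np.full(N, 1.0 / N)                        # equal weights

var_direct = float(w @ cov @ w)
avg_var = float(np.mean(np.diag(cov)))         # average variance of the assets
avg_cov = float(cov[~np.eye(N, dtype=bool)].mean())  # average off-diagonal covariance
var_law = avg_var / N + (1 - 1 / N) * avg_cov  # equation [3.11]
print(abs(var_direct - var_law) < 1e-10)       # True: the identity is exact

# --- Equal-correlation special case [3.4]: sigma_P -> sigma * sqrt(rho_bar) ---
sigma, rho_bar = 0.20, 0.30                    # hypothetical parameters
def sigma_p(n):
    return sigma * np.sqrt((1 - rho_bar) / n + rho_bar)

print(round(sigma_p(1), 4), round(sigma_p(10), 4), round(sigma * np.sqrt(rho_bar), 4))
```

The first check holds for any covariance matrix, since [3.11] is an algebraic identity; the second shows how quickly the equal-weight volatility approaches the systematic-risk floor.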

3.3. The relative benefits of naive portfolio diversification

Equation [3.11] quantifies the impact of naive portfolio diversification in absolute terms. To compare the results between different portfolios or over different time periods, it is necessary to work in relative terms. This is done by dividing both sides of equation [3.11] by the average variance of a portfolio of size one, which is the same as the average variance of the


underlying assets of the population ($\bar{\sigma}^2$). The result is usually called the relative portfolio variance (RPV):

$$\text{RPV} = \frac{\sigma_P^2}{\bar{\sigma}^2} = \frac{1}{N} + \left(1 - \frac{1}{N}\right)\frac{\bar{\sigma}_{i,j}}{\bar{\sigma}^2} \qquad [3.12]$$

Rearranging terms yields:

$$\text{RPV} = \frac{\bar{\sigma}_{i,j}}{\bar{\sigma}^2} + \frac{1}{N}\left(1 - \frac{\bar{\sigma}_{i,j}}{\bar{\sigma}^2}\right) \qquad [3.13]$$

Several interesting observations can be made from equation [3.13]:

– The first term, $\bar{\sigma}_{i,j}/\bar{\sigma}^2$, denotes the relative systematic risk that cannot be eliminated by naive diversification. The second term, $(1 - \bar{\sigma}_{i,j}/\bar{\sigma}^2)/N$, represents the diversifiable risk.

– The relative portfolio variance is an inverse function of $N$. For a single-asset portfolio, RPV always equals one. As $N$ increases, RPV decreases and ultimately converges towards $\bar{\sigma}_{i,j}/\bar{\sigma}^2$ as $N \to \infty$.

– The relative benefits of naive diversification are independent of the length of the holding period. However, they depend on the ratio of the average covariance to the average variance of all assets in the population ($\bar{\sigma}_{i,j}/\bar{\sigma}^2$), which may vary from one sample to another and from one period to another.

– In a finite universe of assets, naive diversification cannot eliminate 100% of the diversifiable risk of a portfolio. However, when $N$ is large, naive diversification eliminates on average $1 - 1/N$ percent of the diversifiable risk, independently of the type of assets considered. So, with a portfolio of size two, half of the diversifiable risk is eliminated on average. With a portfolio of size 10, 90% of the diversifiable risk is eliminated on average, etc.

In a more general context, an interesting question is how many assets are required in a portfolio to eliminate a certain percentage of its diversifiable risk, given a universe of $N$ assets. We know from equation [3.13] that the maximum amount of diversifiable risk is $1 - \bar{\sigma}_{i,j}/\bar{\sigma}^2$. Compared with a portfolio of size one, an $M$-asset portfolio eliminates on average $1 - 1/M$ percent of that diversifiable risk. Similarly, an $N$-asset portfolio


(with $M < N$) eliminates on average $1 - 1/N$ percent of its diversifiable risk. From there, we can conclude that an $M$-asset portfolio in an $N$-asset universe eliminates on average $\frac{1 - 1/M}{1 - 1/N}$ percent of its maximum diversifiable risk. Table 3.1 presents the number of assets required in a portfolio to eliminate a certain percentage of diversifiable risks given different population sizes. The smaller the population size, the smaller the required number of assets. As an illustration, to eliminate 98% of the diversifiable risk, we need on average 16 assets in a universe of 25 assets, 37 assets in a universe of 500 assets and 40 assets in an infinite universe.

Population size (N) | 50% | 75% | 90%  | 95%  | 98%  | 99%
2                   | 1.3 | 1.6 | 1.8  | 1.9  | 2.0  | 2.0
5                   | 1.7 | 2.5 | 3.6  | 4.2  | 4.5  | 4.8
10                  | 1.8 | 3.1 | 5.3  | 6.9  | 8.2  | 9.2
25                  | 1.9 | 3.6 | 7.4  | 11.4 | 15.6 | 20.2
50                  | 2.0 | 3.8 | 8.5  | 14.5 | 22.5 | 33.6
100                 | 2.0 | 3.9 | 9.2  | 16.8 | 28.8 | 50.3
500                 | 2.0 | 4.0 | 9.8  | 19.3 | 37.1 | 83.5
1,000               | 2.0 | 4.0 | 9.9  | 19.6 | 38.5 | 91.0
5,000               | 2.0 | 4.0 | 10.0 | 19.9 | 39.7 | 98.1
10,000              | 2.0 | 4.0 | 10.0 | 20.0 | 39.8 | 99.0
∞                   | 2.0 | 4.0 | 10.0 | 20.0 | 40.0 | 100.0

Table 3.1. Number of assets (M, rounded to one decimal) required in a portfolio to eliminate on average a given percentage of diversifiable risk, given different population sizes (N). Column headers give the percentage of diversifiable risk to eliminate
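Most entries of Table 3.1 follow (up to rounding) from combining the two elimination fractions above: since an $M$-asset portfolio removes the fraction $(1 - 1/M)/(1 - 1/N)$ of the maximum diversifiable risk, the size needed to remove a fraction $p$ is $M = 1/(1 - p(1 - 1/N))$. A quick sketch:

```python
def required_size(p, n=float("inf")):
    """Portfolio size M needed to eliminate a fraction p of the maximum
    diversifiable risk in an n-asset universe, from (1 - 1/M)/(1 - 1/n) = p."""
    max_fraction = 1.0 if n == float("inf") else 1.0 - 1.0 / n
    return 1.0 / (1.0 - p * max_fraction)

# A few entries of Table 3.1:
print(round(required_size(0.90, 10), 1))    # 5.3
print(round(required_size(0.95, 25), 1))    # 11.4
print(round(required_size(0.99, 50), 1))    # 33.6
print(round(required_size(0.95), 1))        # 20.0  (infinite universe)
```

The formula also makes the table's limiting behavior explicit: as $N \to \infty$, the required size collapses to $1/(1 - p)$, which is why the bottom row is independent of the asset universe.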

3.4. Empirical tests

Despite the existence of the analytical expressions discussed in the previous sections to quantify the benefits of naive diversification, several academic studies have attempted to quantify them empirically.


3.4.1. Archer and Evans [ARC 68]

A systematic exploration of all possible combinations of assets is generally not feasible from a computational perspective, as the number of portfolios to be considered is far too large. With a population of N assets, we would need to form all possible portfolios of size M, calculate their respective variances and average them. The process would have to be repeated for all possible values of M, so that each portfolio size has an associated average portfolio variance. With a population size of N, the exact number of possible portfolios of size M is given by the binomial coefficient $C(N, M) = N!/(M!(N-M)!)$. For example, for a population of only 100 assets, we would have $C(100,1) = 100$ portfolios of size 1, $C(100,2) = 4{,}950$ portfolios of size 2, $C(100,3) = 161{,}700$ portfolios of size 3, $C(100,4) = 3{,}921{,}225$ portfolios of size 4, etc. Clearly, the number of portfolios to be analyzed explodes as the number of assets increases.

Fortunately, taking all possible combinations of portfolios and averaging them is the same as taking the expectation of equation [3.11]. With a population of size $N$ and a portfolio of size $M$, the expected portfolio variance is given by:

$$E(\sigma_P^2) = \frac{\bar{\sigma}^2}{M} + \left(1 - \frac{1}{M}\right)\bar{\sigma}_{i,j} \qquad [3.14]$$

where $\bar{\sigma}^2$ is the average variance of the assets and $\bar{\sigma}_{i,j}$ is their average covariance. Most empirical studies therefore prefer to rely on sampling procedures to randomly generate a series of portfolios for each size and calculate their variance. This approach can be summarized as follows:

1) Define the population of N assets to be considered.

2) For a given number M of assets, starting with M = 1:

– randomly select M assets in sequence without replacement from the population and create an equally allocated portfolio;

– measure the variance of this portfolio from the time series of asset returns;

– repeat the two previous steps as many times as required to create a series of portfolios of size M and therefore obtain an estimated variance for each of these portfolios;

– calculate the average variance of all these portfolios.


3) Increase M by 1 and repeat step 2 until M = N, or M is considered large enough.

4) Plot the average variance obtained for each portfolio size against M, the portfolio size.

Note that this approach suffers from a "look-ahead" bias, as the sample of N assets can only include those with complete return information for the time-period being examined. Stated differently, assets that have disappeared during the considered period cannot be included in the analysis. However, in practice, this bias is usually neglected.

As an illustration, let us describe the first and most consistently cited portfolio diversification study, namely the one conducted by Archer and Evans [ARC 68]. In their study, Archer and Evans built portfolios of increasing sizes (M = 1, 2, …, 40) by random selection from a sample of N = 470 U.S. stocks listed in the S&P 500 Index and calculated the standard deviation of their semi-annual returns between January 1958 and July 1967. They repeated the process 60 times to obtain 60 observations of the standard deviation for each portfolio size. They then calculated the mean standard deviation $\hat{\sigma}_p(M)$ for each size M and plotted it against M. The result is the curve shown in Figure 3.2. They concluded that the diversification benefits were becoming marginal beyond 10 stocks in the portfolio.

Figure 3.2. Empirical results obtained by Archer and Evans [ARC 68] showing the average reduction of portfolio volatility as a function of the number of assets in a portfolio
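The sampling procedure above can be sketched in a few lines. The returns below are synthetic (a one-factor model with made-up parameters), so the numbers only illustrate the shape of the curve, not Archer and Evans' data:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic universe: one common factor plus idiosyncratic noise (hypothetical parameters).
n_assets, n_periods = 100, 120
factor = rng.normal(0.0, 0.04, size=n_periods)
betas = rng.uniform(0.5, 1.5, size=n_assets)
returns = betas[None, :] * factor[:, None] + rng.normal(0.0, 0.06, (n_periods, n_assets))

def avg_portfolio_std(m, trials=200):
    """Average volatility of equally weighted portfolios of m randomly drawn assets."""
    stds = []
    for _ in range(trials):
        picks = rng.choice(n_assets, size=m, replace=False)
        stds.append(returns[:, picks].mean(axis=1).std())
    return float(np.mean(stds))

curve = {m: avg_portfolio_std(m) for m in (1, 2, 5, 10, 20, 40)}
for m, s in curve.items():
    print(m, round(s, 4))
```

With the seeded generator above, the average volatility drops quickly up to roughly 10 assets and then flattens towards the common-factor floor, mirroring the shape of Figure 3.2.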


Archer and Evans noted that the portfolio average variance asymptotically approached the variance of the population (consisting of all 470 securities) as the portfolio size increased. The population variance was well approximated with only 10 securities, which led them to express their “doubts concerning the economic justification of increasing portfolio sizes beyond 10 or so securities”.

3.4.2. Subsequent studies

The approach pioneered by Archer and Evans [ARC 68] is sometimes referred to as "throwing darts at the Wall Street Journal" because of its random selection of portfolio components. Nevertheless, due to its simplicity, it has been widely used in many subsequent empirical studies. Some of them closely follow the original methodology while others differ in terms of markets, data frequency, holding period and risk measure considered (variance, mean absolute deviation, terminal wealth standard deviation, second-degree stochastic dominance, etc.). Without aiming to be exhaustive, Table 3.2 lists some of these studies and summarizes their findings regarding the number of assets required to create a well-diversified portfolio. While most studies agree on the fact that portfolio risk decreases in a monotone way as a function of portfolio size, they clearly do not reach a consensus on how many assets are needed to achieve a "well-diversified" portfolio. The traditional wisdom from the 1960s, 1970s and early 1980s was that 8 to 30 stocks were generally sufficient. After the mid-1980s, this number seems to have significantly increased above 100 before decreasing back to lower levels in the late 2000s. These surprisingly wide variations are attributable to a combination of three factors: (1) the increase in computing power has generally resulted in a "too large a sample size" problem, also known as Lindley's paradox in the Bayesian literature; (2) there have been intertemporal variations in the average correlations between the underlying assets; and (3) there have been intertemporal variations in the idiosyncratic volatility of the underlying assets.


Authors                       | Number of assets recommended
Archer and Evans [ARC 68]     | 8 to 10
Latane and Young [LAT 69]     | 8 to 16
Fisher and Lorie [FIS 70]     | 8 (80% risk reduction), 16 (90% risk reduction)
Mao [MAO 70]                  | 17 (90% risk reduction), 34 (with 0.2 correlations)
Jennings [JEN 71]             | 15
Wagner and Lau [WAG 71]       | 10 to 15
Fielitz [FIE 74]              | 8
Klemkosky and Martin [KLE 75] | 8 to 14
Upson et al. [UPS 75]         | 16
Elton and Gruber [ELT 77]     | at least 15
Levy [LEV 79a, LEV 79b]       | 8 for 20-year, 128 for 1 to 5-year holding periods
Tole [TOL 82]                 | 60
Gup [GUP 83]                  | 8 to 9
Statman [STA 87]              | 30 to 40
Francis [FRA 91]              | 10 to 15
Levy and Livingston [LEV 95]  | 10
Cleary and Copp [CLE 99]      | 30 to 50
Fabozzi [FAB 99]              | 20
Campbell et al. [CAM 01]      | 20 before the year 1985, 50 in the 1990s
De Vassal [DE 01]             | 15 to 50
Statman [STA 02]              | more than 120
Domian et al. [DOM 03]        | 40 for 5-year, 60 for 20-year holding periods
Malkiel [MAL 02]              | 200
Statman [STA 04]              | 300
Benjelloun [BEN 06]           | 200+ in the U.S., 6 in Qatar and the U.A.E.
Domian et al. [DOM 07]        | more than 100
Kearney and Poti [KEA 08]     | 35 in the year 1974, 166 in the year 2003
Xu [XU 09]                    | much more than 30
Benjelloun [BEN 10b]          | 40 to 50
Daryl and Shawn [DAR 12]      | 10 to 16 (cap weighted), 33 to 41 (eq. weighted)
Zhou [ZHO 14]                 | 10

Table 3.2. Review of the portfolio diversification literature – how many stocks are needed to create a "well-diversified" portfolio


3.4.2.1. Lindley's paradox

Lindley's paradox is a well-known illustration of the disagreement between standard sampling theory significance tests and Bayesian methods when testing a precise null hypothesis ($H_0$: $\theta = \theta_0$) against an unspecified alternative ($H_1$: $\theta \ne \theta_0$). More specifically, we can show that, in a large sample, a null hypothesis may be rejected by standard significance tests while being awarded high odds by Bayesian methods.

Although it is difficult to explain Lindley's paradox without getting into a complex econometrics discussion, let us try to review the intuition behind it. Several of the studies mentioned in Table 3.2 use some sort of statistical test to see if adding assets to a portfolio results in a significant reduction of the variance. To do so, they posit a null hypothesis – say, for instance, that the difference in variance between portfolios of size M and size M + 1 is zero. They then fix the probability of a Type I error (rejecting the null while it is true), run their simulations, obtain a sample of portfolio variances for sizes M and M + 1, calculate their average difference and test whether this quantity is statistically different from zero. As the sample size increases, the probability of a Type II error (accepting the null while it is false) goes to zero, which is desirable, but the probability of a Type I error stays constant. In a sense, there is strict control over Type II errors but no control over Type I errors. This implicitly favors the alternative hypothesis, which is that there has been a significant reduction of variance and, consequently, that a portfolio of size M + 1 is significantly better than one of size M. As the sample size grows, interpreting significance tests therefore becomes difficult. As stated by Kennedy [KEN 17], "almost any parameter can be found to be significantly different from zero if the sample size is sufficiently large". In addition, there is a side effect of Moore's law, which states that computing power approximately doubles every two years.
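The sample-size effect behind Kennedy's quote is easy to reproduce with a back-of-the-envelope z-test (all numbers below are hypothetical): hold a tiny, economically meaningless difference in average variance fixed, let only the sample size grow, and the test statistic eventually clears any fixed critical value:

```python
import math

# Hypothetical numbers: a tiny true difference in average portfolio variance
# and the standard deviation of the simulated variance estimates.
diff, noise_sd, z_crit = 0.0001, 0.01, 1.96   # 1.96 = 5% two-sided critical value

for n in (100, 10_000, 1_000_000):
    z = diff / (noise_sd / math.sqrt(n))      # z-statistic of the mean difference
    verdict = "significant" if abs(z) > z_crit else "not significant"
    print(n, round(z, 2), verdict)
```

The same negligible difference is "not significant" with 100 simulated portfolios but overwhelmingly "significant" with a million, even though its economic magnitude has not changed at all.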
The first study in Table 3.2 was made in 1968, the last in 2014. The computer used by Archer and Evans to run their simulations was what it was and their sample size had to be limited to 60 portfolios. With the increase in computing power, more recent studies have been able to simulate more portfolios than the earlier ones. By increasing their sample sizes, they have indirectly


created a bias towards concluding that adding one more asset to a portfolio provides diversification benefits that are statistically different from zero. This could explain some of the increases in recommended portfolio sizes over time1.

3.4.2.2. Intertemporal variations in correlations

Another possible explanation for the variations in the number of assets required to naively diversify a portfolio is the existence of intertemporal variations in correlations. For instance, an increase in the correlation between assets implies that more of them will be required on average to obtain the same risk reduction benefits. This comes from the two key features of naive diversification, the random selection of assets and their equal weighting. With the former, when assets become more highly correlated, we need to pick more assets on average to find less correlated ones. Unfortunately, the latter feature then implies that these new uncorrelated assets will receive a lower weight in the portfolio and therefore that fewer diversification benefits can be extracted from them2.

Several researchers have been monitoring the evolution of correlations between asset returns. In the 1990s and 2010s, the sentiment was that correlations were generally increasing – see, for instance, Longin and Solnik [LON 95], J.P. Morgan [MOR 10, MOR 11] or Goetzmann et al. [GOE 05] for a very long-term study from 1872 to 2000. However, more recent studies seem to suggest that they have stabilized or even that they have crashed. As an illustration, Figure 3.3 shows the evolution of the correlation between U.S. and international stocks over several decades. Figure 3.4 shows the evolution of the average pairwise correlation of stocks in the MSCI U.S. index. Figure 3.5 shows the evolution of the Morgan Stanley Global Correlation Index, which measures the average 6-month cross-correlations between a series of currencies, bond spreads and major equity markets, including regional indices.

1 Note that there are two possible cures to this problem. The first one is to make the significance level a decreasing function of the sample size. The second one is to use common sense and test whether the difference in variance, which is statistically significant, is meaningful for an investor. For a discussion of these issues, see Beck et al. [BEC 96], McCloskey and Ziliak [MCC 96], Leamer [LEA 78], and Kennedy [KEN 17]. 2 The situation would be different if we selected assets non-randomly and weighted them appropriately. However, this could no longer be called naive diversification.


Figure 3.3. Evolution by decade of the correlation between U.S. stocks (S&P 500) and international stocks (MSCI EAFE Index)

Figure 3.4. Evolution of the average pairwise correlation of stocks included in the MSCI U.S. index


Figure 3.5. Evolution of the Morgan Stanley 6-month cross-asset correlation index

All these figures confirm an initial rise of correlations until mid-2011, followed by a period of relatively higher volatility but generally associated with a decrease in average correlation levels. The initial rise of correlations has been primarily attributed to the globalization of the economy and the integration of capital markets, which have both reduced diversification opportunities. Additional explanations include the development of new risk-management and alpha-extraction techniques, risk-on/off trading, currency carry trades, cross-asset arbitrage and the increased usage of index-based products, which means a buy or sell trade will affect all components of the index at the same time. The reasons behind the most recent drop in correlations are still being debated, but they usually include the loss of linkage between countries and regions because of more divergent economic policies, politics and currency moves. In addition, as we are getting towards the end of an economic cycle, the relationship between credit and equity products gets weaker and individual stocks tend to display more idiosyncratic risk, which lowers the correlations between equity indices.


While it is relatively difficult to forecast whether average correlations will continue to decline or stabilize at current levels, it is reasonable to assume that correlations will generally continue to vary over economic cycles. Consequently, diversification benefits should also be expected to be time-varying.

3.4.2.3. Intertemporal variations in idiosyncratic volatility

Idiosyncratic risk matters when determining how many assets are needed to create a well-diversified portfolio. The intuition behind this is illustrated in Figure 3.6, which shows that as idiosyncratic risk increases, more assets are needed to reach the same level of diversification (assuming undiversifiable risk remains the same). This has two broad implications for portfolio diversification. The first is that in markets where idiosyncratic risk is low, a small number of securities will be needed to allow diversification. This may be the case, for instance, in smaller markets that are either dominated by one unique industry, or where all assets are primarily driven by the same exogenous factor (which in a sense becomes systematic). The second implication is that if the level of idiosyncratic risk varies over time, the number of assets required to diversify a portfolio will also vary.

Figure 3.6. Idiosyncratic risk variations and risk reduction


Campbell et al. [CAM 01] were the first to document that over a four-decade period ranging from the early 1960s to the end of the 1990s, U.S. public firms exhibited an increase in their idiosyncratic volatility3 while the aggregate market as well as industry volatilities remained stable. Consequently, they claimed that 20 stocks were necessary before 1985 to obtain a well-diversified portfolio versus 50 stocks to achieve the same goal in the 1990s. Kearney and Poti [KEA 08] reached a similar conclusion in Europe, with 35 stocks needed in the year 1974, versus 166 in the year 2003. In both studies, the results were not due to a rise in stock return correlations but to the fact that idiosyncratic risk itself had gone up. This upward trend in idiosyncratic volatility was also confirmed by Malkiel and Xu [MAL 03], Fama and French [FAM 04], Wei and Zhang [WEI 06] and Jin and Myers [JIN 06], amongst others. Since this early research, the topic of rising idiosyncratic volatility has been extensively discussed in the empirical finance literature, with two major questions in mind:

– Why has idiosyncratic volatility been rising? Most suggested explanations are based on differences in volatility levels between firms in cross-section analyses. For instance, Malkiel and Xu [MAL 03] use institutional ownership as an explanatory variable, while Wei and Zhang [WEI 06] and Rajgopal and Venkatachalam [RAJ 06] claim that firm fundamentals have become more volatile or opaque. The increased role of small companies in the market and the fact that newly listed firms are increasingly younger and riskier are suggested by Bennett et al. [BEN 03], Brown and Kapadia [BRO 07] and Fink et al. [FIN 10]. Irvine and Pontiff [IRV 09] also observe that product markets are becoming more competitive. Last but not least, Brandt et al. [BRA 10] suggest that irrational "noise" traders drove the high levels of idiosyncratic risk during the Internet boom of the 1990s.
– Should it matter for asset pricing models? Unfortunately, as often in empirical finance, the answer diverges widely across studies. A first group claims that idiosyncratic volatility should not be priced, a second group

3 The increase in firm-specific return volatility came with more volatile income and earnings, lower profitability and lower survival rates.


claims that stocks with higher idiosyncratic volatility command a higher risk premium, and a third group – which is clearly the most puzzling one – argues that the relationship between idiosyncratic volatility and mean returns is negative4. To make matters worse, the way idiosyncratic volatility has been measured often varies between these groups5. More recent empirical studies have gone one step further and suggest that idiosyncratic volatility seems to alternate between low- and high-level regimes. For instance, Brandt et al. [BRA 10] show that, during more recent years, U.S. idiosyncratic volatility has fallen substantially to pre-1990s levels. Sault [SAU 05] obtained similar conclusions in Australia. Bekaert et al. [BEK 12] found no evidence of significant trends in idiosyncratic volatility for 23 developed equity markets. Guo and Savickas [GUO 08] used G7 country data and provided evidence that the value-weighted idiosyncratic volatility increased during the late 1990s and reversed to the pre-1990s level afterwards. Once again, these results seem consistent with the idea that the benefits of portfolio diversification are expected to vary over time.

3.4.3. Analytical (but incorrect) approximations

As discussed in the previous section, several academic studies have attempted to quantify empirically the benefits of naive diversification on portfolio volatility through simulations. As we could expect, their conclusions were generally in line with the analytic expression given by equation [3.11]. Funnily enough, several of these studies have gone one step further – or maybe we should say one step too far. They attempted to fit a regression line through their empirical results to obtain again… an

4 See Bali et al. [BAL 05], Huang et al. [HUA 10, HUA 11] and Fink et al. [FIN 12] for the first group; Goyal and Santa-Clara [GOY 03], Jiang and Lee [JIA 06], Diavatopoulos et al. [DIA 08], Fu [FU 09] and Garcia et al. [GAR 10] for the second group; and Ang et al. [ANG 06, ANG 09] and Guo and Savickas [GUO 10] for the third group.

5 For instance, some papers use the variance of residual returns from a market model ($\sigma_\varepsilon^2$) as the measure of firm-specific return variation, whereas others use return synchronicity, or $R^2$, from the same market model. Unfortunately, these two measures are not interchangeable and can lead to contradictory inferences – see Li et al. [LI 14].

analytical relationship. For instance, Archer and Evans [ARC 68] empirically linked the standard deviation of a portfolio to the number of its underlying assets (M) using the following regression:

σ̂_P(M) = α + β M^{-1}   [3.15]

Archer and Evans considered α̂ as an estimate of the non-diversifiable risk and β̂ M^{-1} as an estimate of the diversifiable risk for a portfolio of size M. Their estimated coefficients are α̂ = 0.1191 and β̂ = 0.08625, which corresponds to the risk-reduction curves of Figures 3.7 (absolute terms) and 3.8 (relative terms). Both curves support their “doubts concerning the economic justification of increasing portfolio sizes beyond 10 or so securities”.
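As an illustration, the following Python sketch (an assumed simulated universe, not the original study's data) averages the volatility of equally weighted portfolios of increasing size M and fits the Archer and Evans regression of equation [3.15] by least squares:

```python
import numpy as np

# Illustrative sketch (assumed one-factor universe): simulate equally weighted
# portfolios of increasing size M, then fit sigma_P(M) = alpha + beta * M^-1.
rng = np.random.default_rng(0)
n_assets, n_obs = 200, 1000
market = rng.normal(0.0, 0.04, n_obs)                     # common (systematic) factor
returns = market[:, None] + rng.normal(0.0, 0.08, (n_obs, n_assets))

sizes = np.arange(1, 41)
avg_vol = np.empty(len(sizes))
for k, m in enumerate(sizes):
    vols = []
    for _ in range(50):                                   # 50 random portfolios of size m
        picks = rng.choice(n_assets, size=m, replace=False)
        vols.append(returns[:, picks].mean(axis=1).std())
    avg_vol[k] = np.mean(vols)

# Least-squares fit of the average portfolio volatility on [1, 1/M]
X = np.column_stack([np.ones(len(sizes)), 1.0 / sizes])
(alpha, beta), *_ = np.linalg.lstsq(X, avg_vol, rcond=None)
print(f"alpha (non-diversifiable) = {alpha:.4f}, beta (diversifiable) = {beta:.4f}")
```

With this simulated design, the fitted intercept approximates the systematic (non-diversifiable) volatility and the slope captures the diversifiable part that vanishes as M grows.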

Figure 3.7. Evolution of the average portfolio volatility as a function of its size, according to Archer and Evans [ARC 68]. The dots represent the empirical observations, the curve the theoretical relationship estimated from these empirical data points


Figure 3.8. Percentage of the diversifiable risk in the portfolio as a function of its size, according to Archer and Evans [ARC 68]

Similarly, Latane and Young [LAT 69] empirically linked the standard deviation of a portfolio to the number of its underlying assets (M) using the following regression:

σ̂_P(M) = α + β M^{-0.5}   [3.16]

Using four data sets, they compared the results of this model with the regression used by Archer and Evans and observed that fitting equation [3.15] gave a higher R-square than fitting equation [3.16] in three of them. They also stated that both models generally gave good results but failed to determine the non-diversifiable risk in the fourth data set.

Unfortunately, Bird and Tippett [BIR 86] have shown that both Archer and Evans’ [ARC 68] and Latane and Young’s [LAT 69] regressions were generally incorrectly specified. Starting back from equation [3.7] and using σ̄² and σ̄_{i,j} as defined in equations [3.8] and [3.9], we can write:

σ_P(M) = σ̄_{i,j}^{1/2} [1 + M^{-1} (σ̄² − σ̄_{i,j})/σ̄_{i,j}]^{1/2}   [3.17]


To simplify the notation, let us denote the term in brackets by 1 + kM^{-1}, where k = (σ̄² − σ̄_{i,j})/σ̄_{i,j}. By applying a Taylor series expansion of (1 + kM^{-1})^{1/2} about the origin, we get:

(1 + kM^{-1})^{1/2} = 1 + (1/2) kM^{-1} − (1/8) k²M^{-2} + …   [3.18]

or equivalently:

σ_P(M) = σ̄_{i,j}^{1/2} [1 + (1/2) kM^{-1} − (1/8) k²M^{-2} + …]   [3.19]

From Taylor’s remainder theorem, we have:

σ_P(M) = σ̄_{i,j}^{1/2} [1 + (1/2) θ_M kM^{-1}]   [3.20]

where:

θ_M = (1 + c)^{-1/2} for some 0 < c < kM^{-1}   [3.21]

which implies θ_M < 1. Taking expectations on both sides of equation [3.20], we obtain:

E[σ_P(M)] = σ̄_{i,j}^{1/2} + (1/2) σ̄_{i,j}^{1/2} k E[θ_M] M^{-1}   [3.22]

This is the exact parametric relationship between a portfolio size and its standard deviation. Comparing equation [3.22] with equation [3.15] of Archer and Evans clearly shows that their regression is generally misspecified, as they implicitly assume θ_M = 1. Consequently, the beta coefficient of their regression is under-estimated, which means that their predicted line over-estimates the rate at which diversifiable risk is eliminated as the portfolio size increases.


An important related question is the following: given a population of N assets, what is the expected variance of an equally weighted portfolio of M randomly chosen assets? An analytical answer to this question was derived by Elton and Gruber [ELT 77]. The expected variance of a portfolio of M assets is:

E[σ̂_P²(M)] = (1/M) σ̄² + (1 − 1/M) σ̄_{i,j}   [3.23]

where σ̄² is the average variance of all assets in the population and σ̄_{i,j} is the average covariance between all assets in the population. If we denote by σ_N² the variance of an equally weighted portfolio of all N assets in the population, then the expected covariance is:

σ̄_{i,j} = (N σ_N² − σ̄²) / (N − 1)   [3.24]

and the expected variance of a portfolio of M randomly chosen assets becomes6:

E[σ̂_P²(M)] = (1/M) σ̄² + (1 − 1/M) (N σ_N² − σ̄²) / (N − 1)   [3.25]
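A quick numerical sanity check of equation [3.25] can be run on an assumed toy population (the covariance matrix below is randomly generated for illustration, not empirical):

```python
import numpy as np

# Toy check of [3.25]: expected variance of an equally weighted portfolio of M
# assets drawn at random (without replacement) from a population of N assets.
rng = np.random.default_rng(1)
N = 50
A = rng.normal(0.0, 0.1, (N, N))
cov = A @ A.T + np.diag(rng.uniform(0.5, 1.5, N))   # a valid covariance matrix

var_bar = np.diag(cov).mean()                       # average variance
cov_bar = cov[~np.eye(N, dtype=bool)].mean()        # average covariance
var_N = cov.sum() / N**2                            # EW full-population variance

M = 10
expected = var_bar / M + (1 - 1 / M) * (N * var_N - var_bar) / (N - 1)  # [3.25]

# Monte Carlo: average variance of EW portfolios of M randomly chosen assets
w = np.full(M, 1 / M)
sims = []
for _ in range(10000):
    idx = rng.choice(N, size=M, replace=False)
    sims.append(w @ cov[np.ix_(idx, idx)] @ w)
print(expected, np.mean(sims))   # the two numbers should be very close
```

The simulation also illustrates equation [3.24]: substituting the average covariance recovered from the full-population variance gives exactly the same expected value as averaging over random subsets.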

Since this is just an expected value, we may want to build a confidence interval around it. This is done using the variance of the variance, which can also be calculated analytically. The resulting expression – equation [3.26] – is lengthy: it is a function of the portfolio size M, the population size N, the variance of the distribution of individual asset variances σ²(σ_i²), the average squared covariance avg(σ_{i,j}²) and the square of the average covariance (σ̄_{i,j})²; the full formula is given in Elton and Gruber [ELT 77]:

σ²(σ̂_P²(M)) = f(M, N, σ²(σ_i²), avg(σ_{i,j}²), (σ̄_{i,j})²)   [3.26]

6 As a check, we can verify that the expected variance of a portfolio of size N is simply the variance of the equally weighted full population portfolio.


where:

– σ²(σ_i²) is the variance associated with the distribution of variances of individual assets.

– σ_{i,j} is the covariance between security i and security j.

– avg(σ_{i,j}²) is the average squared covariance for all assets in the population.

– (σ̄_{i,j})² is the square of the average covariance for all assets in the population.

– σ_i² is the variance of security i.

As a check, we can verify that when M approaches N, the variance of the variance converges to zero. Stated differently, when all assets in the population enter the equally weighted portfolio, its variance converges with the (equally weighted) population portfolio variance with certainty. Note that Elton and Gruber [ELT 77] also derive simpler formulas for equations [3.22] and [3.26] under the additional assumption that asset returns follow a single factor model.

3.5. Economic limits and statistical tests

In a universe of N assets, even though we may be tempted to increase the portfolio size as much as possible to eliminate “almost all” unsystematic risk through diversification, in practice there are two limits to this approach.

The first limit is economic. Adding more positions to a portfolio generally comes with associated costs (for instance transaction, holding and monitoring costs). We should therefore weigh the benefits of portfolio diversification against these costs. The optimal portfolio size is the point where, at the margin, the cost of adding one new position is equal to the benefit of the reduction in portfolio risk – see [STA 87] and [STA 04].

The second limit is statistical. As the portfolio size increases, portfolio volatility will generally decrease, but does it decrease in a statistically significant manner? Various statistical tests are available to answer this question. In the following sections, we will discuss some of these tests in the context of


a base portfolio with M assets and an expanded portfolio with M′ assets, where the former portfolio universe of M assets is a subset of the latter portfolio universe of M′ assets. In the particular case of Archer and Evans, we would have M′ = M + 1 and would stop adding assets as soon as the addition results in non-significant results.

3.5.1. Tests based on variance or standard deviation

Most empirical studies conclude that naively increasing the number of assets in a portfolio reduces its variance, but at a marginally decreasing speed. Surprisingly, very few of them have tested the statistical significance of these diversification benefits. Formally, say portfolio P1 has a size M₁ and a return variance σ₁², while portfolio P2 has a size M₂ and a return variance σ₂². We want to test for equality of variances against the alternative that the variances are unequal:

H₀: σ₁² = σ₂²   [3.27]

H₁: σ₁² ≠ σ₂²   [3.28]

with a significance level of α (or, equivalently, a confidence level of 1 − α).

3.5.1.1. The F-test

When portfolio returns are normally distributed, we can use the F-test to compare the two portfolio variances. The null hypothesis is that the two variances are equal. The test statistic is:

F = σ̂₁² / σ̂₂²   [3.29]

It has an F-distribution with M₁ − 1 and M₂ − 1 degrees of freedom if the null hypothesis of equality of variances is true. Otherwise, it has a non-central F-distribution. The null hypothesis is rejected if F is either too large or too small.

EXAMPLE 3.1.– Say portfolio P1 has 20 assets and a volatility of 20%, while portfolio P2 has 21 assets and a volatility of 16%. For a significance level of 5%, we get the critical values F_{2.5%, 19, 20} = 0.3986 and F_{97.5%, 19, 20} = 2.4821. The test statistic is F = (20%/16%)² = 1.5625, which is between the two critical values, so we cannot reject the equality of variance of P1 and P2. Consequently, adding an extra asset did not statistically significantly decrease the portfolio volatility.
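Example 3.1 can be reproduced in a few lines of Python, assuming, as in the text, that the degrees of freedom are taken as the number of assets minus one:

```python
from scipy import stats

# Reproducing Example 3.1 with scipy
vol1, vol2 = 0.20, 0.16                 # volatilities of P1 (20 assets) and P2 (21 assets)
df1, df2 = 20 - 1, 21 - 1
F = (vol1 / vol2) ** 2                  # test statistic [3.29]

lower = stats.f.ppf(0.025, df1, df2)    # lower critical value, ~0.40
upper = stats.f.ppf(0.975, df1, df2)    # upper critical value, ~2.48
print(F, lower < F < upper)             # F lies between the critical values
```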

This test can be two-tailed as above, in which case we test the null hypothesis against the alternative that the variances are not equal. It can also be one-tailed, in which case we test only in one direction. That is, the first portfolio variance is either greater than or less than the second portfolio variance (but not both). The choice between two-tailed and one-tailed is determined by the nature of the portfolios being examined. Note that, in any case, the F-test is extremely sensitive to the normality assumption – so sensitive that it is not appropriate for most practical applications.

3.5.1.2. The Bartlett test

When the base and extended portfolio returns are normally distributed, the Bartlett test can be used to test if their variances are statistically different. The null hypothesis is that their variances are equal. The Bartlett test statistic is defined as:

T = [(M₁ + M₂ − 2) ln s_p² − (M₁ − 1) ln σ̂₁² − (M₂ − 1) ln σ̂₂²] / [1 + (1/3)(1/(M₁ − 1) + 1/(M₂ − 1) − 1/(M₁ + M₂ − 2))]   [3.30]

where s_p² is the pooled variance defined as:

s_p² = [(M₁ − 1) σ̂₁² + (M₂ − 1) σ̂₂²] / (M₁ + M₂ − 2)   [3.31]

The test statistic follows approximately a χ² distribution. The variances of the two portfolios are judged to be unequal if T is greater than the critical value of the chi-square distribution with one degree of freedom and a confidence level of 1 − α.

EXAMPLE 3.2.– Say portfolio P1 has 20 assets and a volatility of 20%, while portfolio P2 has 21 assets and a volatility of 16%. For a significance level of 5%, the critical value is χ²_{95%, 1} = 3.84. The test statistic is T = 0.37 < 3.84, so we cannot reject the equality of the variance of P1 and P2. Consequently, adding an extra asset did not statistically significantly decrease the portfolio volatility.


This example shows the difficulty of affirming that changes in variance are statistically significant in a small sample. In fact, we would need the volatility of P2 to fall below 12.1% to get a T value that is high enough to affirm that the decrease in variance is statistically significant. In a larger sample, however, we may come to a different conclusion. Note that the Bartlett test is also very sensitive to departures from normality – which is not surprising, as it is derived from the likelihood ratio test under the normal distribution.

3.5.1.3. The Levene test

An alternative to the Bartlett test is the Levene test [LEV 60], which requires access to the returns of both portfolios and not just their variance. Say we have a first series of T₁ returns from portfolio P1, in which each return is denoted R_{1,t}, with t = 1, …, T₁. Say we have a second series of T₂ returns from portfolio P2, in which each return is denoted R_{2,t}, with t = 1, …, T₂. The null hypothesis is that the two portfolio variances are equal or, equivalently, that the average distance to the sample mean is the same for each sample group. The alternative hypothesis is that the two portfolio variances are different. Denoting by Z_{i,t} = |R_{i,t} − R̄_i| the absolute deviation of each return from its group mean, the Levene test statistic is defined as:

W = (N − 2) [T₁ (Z̄₁ − Z̄)² + T₂ (Z̄₂ − Z̄)²] / [∑_t (Z_{1,t} − Z̄₁)² + ∑_t (Z_{2,t} − Z̄₂)²]   [3.32]

where N = T₁ + T₂ is the total number of observations, R̄_i is the arithmetic mean return of the ith portfolio, Z̄₁ and Z̄₂ are the group means of the Z_{i,t} and Z̄ is the overall mean of the Z_{i,t}. The Levene test rejects the hypothesis that the variances are equal if W > F_{α, 1, N−2}, where F_{α, 1, N−2} is the upper critical value of the F-distribution with 1 and N − 2 degrees of freedom at a significance level of α.

The Levene test is less sensitive to departures from normality than the Bartlett test, but it still rejects the null hypothesis of equal volatilities too frequently in the presence of excessively leptokurtic distributions. A natural way to increase the robustness of Levene’s original statistic while retaining good power is to replace the group means in the definition of Z_{i,t} by a more robust estimator of location, for example the median or the trimmed mean – see Brown and Forsythe [BRO 74]. The test based on the median is sometimes referred to as the Bonett test.
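In practice, these tests are readily available in statistical libraries. A minimal sketch, using simulated (assumed, not empirical) return series and SciPy's implementation of the Levene test and its robust variants:

```python
import numpy as np
from scipy import stats

# Levene-type tests on two simulated daily return series (illustrative data)
rng = np.random.default_rng(2)
r1 = rng.normal(0.0, 0.20 / np.sqrt(252), 1000)   # portfolio P1, 20% annual vol
r2 = rng.normal(0.0, 0.16 / np.sqrt(252), 1000)   # portfolio P2, 16% annual vol

w_mean, p_mean = stats.levene(r1, r2, center='mean')     # original Levene
w_med, p_med = stats.levene(r1, r2, center='median')     # Brown-Forsythe variant
w_trim, p_trim = stats.levene(r1, r2, center='trimmed', proportiontocut=0.1)
print(p_mean, p_med, p_trim)   # small p-values reject equal variances
```

With 1,000 observations per series, the difference in volatility is large enough for all three variants to reject the null hypothesis, in contrast with the small-sample situation of Example 3.2.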


3.5.2. Tests of portfolio improvement

As an alternative to tests comparing portfolio variances, we may use a test of risk-adjusted performance improvement, such as the Jobson and Korkie [JOB 81b] and the Gibbons, Ross & Shanken [GIB 89] tests. Both essentially test whether the Sharpe ratios of two portfolios are statistically different, with a null hypothesis that they are equal.

The Jobson and Korkie test requires a total of T returns for each portfolio, with the assumption that their bivariate return distribution does not change over time. The test is defined as:

z = [σ̂₂ (R̄₁ − R_f) − σ̂₁ (R̄₂ − R_f)] / √θ̂   [3.33]

where θ̂ is the asymptotic variance of the expression in the numerator, defined as7:

θ̂ = (1/T) [2 σ̂₁² σ̂₂² − 2 σ̂₁ σ̂₂ σ̂₁,₂ + (1/2)(R̄₁ − R_f)² σ̂₂² + (1/2)(R̄₂ − R_f)² σ̂₁² − ((R̄₁ − R_f)(R̄₂ − R_f) / (σ̂₁ σ̂₂)) σ̂₁,₂²]   [3.34]

where R̄₁ and R̄₂ are the mean returns of the two portfolios, R_f is the risk-free rate, and σ̂₁, σ̂₂ and σ̂₁,₂ are estimates of the standard deviations and covariance of the excess returns of the two portfolios. The z statistic is approximately normally distributed with a zero mean and a unit standard deviation for large samples. A large enough |z| implies a rejection of the null hypothesis, which means that one portfolio is statistically superior to the other. Unfortunately, the Jobson and Korkie test is not robust against tails heavier than the normal distribution. Ledoit and Wolf [LED 03] suggest some (complex) adjustments based on robust inference methods to improve it.

7 This equation was suggested by Memmel [MEM 03] and corrects a typographical error in the original formula of Jobson and Korkie [JOB 81b], which leads to an underestimation of the asymptotic variance and therefore a too-frequent rejection of the null hypothesis.

Alternatively, Gibbons et al. [GIB 89] have suggested a test that compares the maximum Sharpe ratios obtained for the base (ŝ_M) and the extended (ŝ_{M′}) portfolios. Since increasing the number of assets will never hurt the Sharpe ratio of the portfolio, we must have ŝ_{M′} ≥ ŝ_M. The null hypothesis is that ŝ_{M′} is not statistically different from ŝ_M. The test statistic is:

Ŵ = (1 + ŝ_{M′}²) / (1 + ŝ_M²) − 1   [3.35]

and is Wishart distributed. Under the null hypothesis, to reflect the similarity in numerator and denominator, we would expect Ŵ to be close to zero. In practice, a simple transformation yields:

[(T − M′) / (M′ − M)] Ŵ ~ F_{M′−M, T−M′}   [3.36]

This statistic follows an F-distribution with M′ − M and T − M′ degrees of freedom if short-selling is allowed. Examples and extensions of the Gibbons, Ross & Shanken test can be found in Rada and Sentana [RAD 97], including a discussion of the effects of the number of assets and portfolio composition on the test power.

3.6. Naive versus Markowitz diversification

Naive diversification is very naive, but for many investors it offers a number of advantages: no estimation error, no optimization, no covariance matrix inversion, no need for short positions, extremely low portfolio turnover over multiple periods, easy application to a large number of assets, and well diversified portfolios by construction. Given this long list of advantages, the usefulness of more complex solutions for portfolio construction and diversification versus the robustness of the 1/N approach has been vividly discussed in the finance literature. For instance, in a thought-provoking paper, DeMiguel et al. [DEM 09b] have compared 14 competing portfolio construction strategies, including a variety of sample-based mean–variance models, a series of sophisticated extensions of the Markowitz rule, and the 1/N strategy. The result was that no strategy consistently outperformed naive diversification, as estimation


risk eroded nearly all the gains from more sophisticated optimization techniques. In a sense, the impact of the “misallocation errors” of the 1/N strategy was smaller than those of the more sophisticated optimizing models in the presence of an estimation risk. Of course, these results have been severely criticized by advocates of the more complex models, who argued that DeMiguel et al.’s [DEM 09b] research methodology was biased for the following reasons.

– Its naive diversification approach is based on a random selection of assets that are equally allocated in a portfolio. Randomly selecting assets is an academic concept, not something that practitioners can do.

– It only considered portfolios of stocks and not individual stocks when testing competing models. This implicitly gave naive diversification an advantage because diversified portfolios have lower idiosyncratic volatility than individual stocks, so the loss from using naive diversification as opposed to optimal strategies is smaller. However, Dickson [DIC 16] ran a similar exercise with 15 portfolio construction techniques applied over eight empirical data sets comprising individual stocks and also concluded that no strategy consistently outperformed naive diversification in terms of mean excess return, Sharpe ratio and turnover.

– It focused on the tangency portfolio, which targets a conditional expected return higher than the conditional expected return of the 1/N strategy. This placed the mean–variance models at an inherent disadvantage relative to naive diversification, resulting in a higher estimation risk, which in turn led to excessive portfolio turnover and hence poor out-of-sample performance. Kirby and Ostdiek [KIR 12] argued that had the sophisticated optimizing models tested by DeMiguel et al. [DEM 09a], [DEM 09b] targeted the conditional expected return of the 1/N portfolio, the resulting portfolios would have outperformed for most of the monthly data sets.
However, they also admit that this result is no longer valid in the presence of transaction costs.

Obviously, the best choice between using optimizers and relying on naive diversification is not yet clearly identified and the academic debate is still actively ongoing. For instance, Tu and Zhou [TU 08] introduced a new 4-fund theory-based portfolio strategy, which assumes that asset returns are independently and identically distributed (i.i.d.) over time. Their portfolio


optimally combines the Kan and Zhou [KAN 07] 3-fund portfolio with the 1/N portfolio as a shrinkage target. The rate of shrinkage towards the 1/N portfolio is determined by the level of estimation risk. Tu and Zhou claimed that their strategy would have historically outperformed both the 1/N rule and other existing strategies across various scenarios, even with relatively small estimation windows. However, Pflug et al. [PFL 12] demonstrated both theoretically and empirically that the 1/N strategy was rational for some risk-averse investors in the presence of model uncertainty.

Let us also mention the idea explored by Kan and Zhou [KAN 07] and Tu and Zhou [TU 11] of combining several portfolio construction approaches to create the optimal portfolio, for instance by mixing the sample mean–variance portfolio and the equally weighted portfolio. In a sense, this allows the specific risks of each approach to be diversified away by exploiting the imperfect correlation between the different approaches’ parameter estimation errors and the differences in their underlying optimality assumptions. It is also likely to generate more diversified portfolios than those exclusively constructed by optimizers.

Last but not least, let us recall that, according to Zweig [ZWE 98], Markowitz also used naive diversification and justified it as follows: “my intention was to minimize my future regret. So I split my contributions fifty-fifty between bonds and equities”.

3.7. Conclusions on naive diversification

One of the key advantages of naive diversification is that it does not require any information about the underlying assets, such as their expected returns or covariances, to build up a diversified portfolio. This makes it an extremely robust investment strategy. Unfortunately, this advantage can also turn into a disadvantage. Since it ignores the specific characteristics of assets, naive diversification is highly sensitive to the structure of the universe of assets under consideration.
For instance, let us compare the MSCI EAFE Index with the MSCI EAFE Equal Weighted Index. Both indices are designed to represent the performance of large and mid-cap securities across 21 developed markets, including countries in Europe, Australasia and the Far East and excluding the U.S. and Canada. Both indices include 900+ stocks, but the distribution of the weights is very different, as illustrated in Table 3.3.


MSCI EAFE Index             Market weight    MSCI EAFE Equal Weighted Index    Equal weight
Nestle                      1.92%            Caixabank                         0.15%
Roche Holding Genuss        1.34%            Recruit Holdings Co               0.15%
Novartis                    1.32%            Sharp Corp                        0.14%
Toyota Motor Corp           1.28%            Glencore                          0.14%
HSBC Holdings (GB)          1.24%            Nexon Co                          0.14%
BP                          0.91%            South 32 (AU)                     0.14%
Total                       0.89%            Melco Crown Entmt ADR             0.14%
British American Tobacco    0.89%            Idemitsu Kosan Co                 0.14%
Royal Dutch Shell A         0.88%            Noble Group                       0.14%
Royal Dutch Shell B         0.80%            STMicroelectronics                0.14%
Total top 10                11.47%           Total top 10                      1.43%

Table 3.3. Top 10 constituents of the MSCI EAFE Index and the MSCI EAFE Equal Weighted Index on 31 October 2016. The MSCI EAFE Equal Weighted Index is rebalanced quarterly (February, May, August and November)

The weightings of the MSCI EAFE Index components tend to be long-tailed, with a few stocks that have market caps significantly higher than the mean of the index and many stocks that have market caps below the mean. By comparison, the MSCI EAFE Equal Weighted Index has a significant small cap bias, as it under-weights larger stocks and over-weights smaller ones.

Figure 3.9. Country weightings of the MSCI EAFE Index versus the MSCI EAFE Equal Weighted Index (EWI)


Figure 3.10. Sector weightings of the MSCI EAFE Index (left) and the MSCI EAFE Equal Weighted Index (right)

Country and sector exposures will also differ significantly in the two indices, as illustrated by Figures 3.9 and 3.10. In the MSCI EAFE Index, the weight of each country or sector is determined by the total market capitalization of the stocks in that group relative to the market capitalization of the entire index. In the MSCI EAFE Equal Weighted Index, country/sector weights are only determined by the number of stocks in each group (assuming equal weights for all stocks). The difference in weighting between the two indices for a given country/sector will therefore be explained by both the number of stocks in each group and the total market capitalization of the largest stocks in each group. For instance, Japan represents more than a third of the equally weighted index but less than a quarter of the market capitalization index; Australia represents 7.5% of the equally weighted index, while it is almost irrelevant in the market capitalization index; inversely, Switzerland represents 8.7% of the market capitalization index but is almost irrelevant in the equally weighted index, etc. These differences will ultimately affect the performance of these indices but also potentially their level of diversification.


In general, when aiming for naive diversification at the asset level, the resulting portfolio will directly represent the number of assets in each group, regardless of their size or economic importance. One solution to this problem consists of pre-assigning assets to (supposedly homogenous) clusters and then equally weighting clusters rather than assets. This approach is commonly implemented using industry or country groups for equities. On the academic side, it has been tested by Statman [STA 87] and Domian et al. [DOM 03, DOM 07], for instance. On the commercial side, there are various adaptations in the marketplace – see, for instance, the one developed by QS Investors’ Diversification Based Investing (DBI), which has been used in the FTSE Diversification-Based Investing Index Series. We will discuss this technique again in section 5.6.

4 Risk-budgeting and Risk-based Portfolios

As discussed in Chapter 2, expected returns are much more difficult to estimate than the return covariance matrix and small estimation errors push optimizers to generate excessively concentrated portfolios. It is therefore not surprising that some portfolio construction approaches have chosen to ignore expected returns – or, equivalently, assume that all expected returns are equal – and focus exclusively on risk. These strategies are usually referred to as “risk-based” or “risk-budgeting” strategies. In this chapter, we will review four of them: the naive risk parity, the equal risk contribution, the most diversified portfolio and the minimum variance portfolio. All the portfolio construction approaches that we have discussed in Chapters 1–3 can be classified as capital-budgeting strategies. That is, they follow the traditional practice of allocating capital to various assets and then measuring the resulting overall portfolio risk reduction that is attributable to diversification. Many investors implicitly assume that their capital allocation is equal to their risk allocation. This assumption is unfortunately wrong. For illustration purposes, let us consider a 60/40 portfolio with 60% of its capital in equities and 40% in bonds. This portfolio seems relatively well-balanced from a capital perspective. As Figure 4.1 shows, things are very different from a risk point of view. The 60/40 portfolio is overexposed to equity risk, which contributes 94% of the overall portfolio volatility. By contrast, bond risk represents only 6% of the overall portfolio volatility, despite a 40% capital allocation to bonds. Clearly, capital allocation is not equivalent to risk allocation and capital diversification does not imply risk diversification.
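The gap between capital allocation and risk allocation is easy to verify numerically. The sketch below uses assumed, stylized volatilities and an assumed equity/bond correlation (not the historical estimates behind Figure 4.1) and anticipates the risk contribution formulas presented in section 4.2:

```python
import numpy as np

# Illustrative sketch (assumed parameters): risk contributions of a 60/40
# equity/bond portfolio under stylized volatilities and correlation.
w = np.array([0.60, 0.40])                 # capital weights: equities, bonds
vols = np.array([0.15, 0.05])              # assumed annual volatilities
rho = 0.20                                 # assumed equity/bond correlation
corr = np.array([[1.0, rho], [rho, 1.0]])
cov = np.outer(vols, vols) * corr

port_var = w @ cov @ w
pcr = w * (cov @ w) / port_var             # percentage contributions to risk
print(pcr)                                 # equities dominate the portfolio risk
```

Even with these mild assumptions, equities contribute roughly nine-tenths of the portfolio volatility despite a 60% capital weight, in line with the message of Figure 4.1.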


Figure 4.1. Capital contribution (left) versus risk contribution (right) of a U.S. 60/40 portfolio, measured during the 1900–2016 period

From an asset allocation perspective, risk-budgeting strategies are the mirror opposite of capital-budgeting strategies. That is, they start by setting risk limits for the aggregate portfolio and for each of its component assets and allocating risk in line with these risk limits. They then determine the capital allocation that corresponds to these risk allocations. As we will see, thinking in terms of risk is much more powerful than thinking in terms of capital and can generate portfolios that are much better diversified. However, it typically requires three inputs: (1) risk measurement, i.e. a definition of what risk is and how to measure it; (2) risk attribution, i.e. a methodology to calculate the contribution of each position to the overall portfolio risk; and (3) a risk allocation goal, also known as a risk budget.

4.1. Risk measures and their properties

Quantifying the risk of a portfolio is usually achieved by modeling some of its behaviors as a random variable and applying a certain function – called risk measure – to it. For example, the random variable could be the future value of the portfolio and the function could be the standard deviation of that value. Broadly speaking, a risk measure attempts to assign a single


numerical value to the potential random financial loss. Mathematically, it can be described as a mapping from a set of risks to the real line. If we denote the random variable under consideration by X and the risk measure by ρ, we have ρ: X → ℝ. Given a portfolio P, we will denote its risk by ρ(P).

While the standard deviation of returns is the predominant measure of risk that is used in finance, there is a long series of alternative risk measures that one could choose to use. However, not all of them are equally well-behaved. Artzner et al. [ART 99] were the first to propose an axiomatic setting for risk measures in discrete probability spaces, later extended by Delbaen [DEL 00] to the case of arbitrary probability spaces. According to this setting, a risk measure ρ is called coherent if it satisfies the following four axioms:

– Monotonicity: if X ≤ Y, then ρ(X) ≥ ρ(Y). This essentially means that we associate a higher risk with a higher loss potential.

– Cash invariance: if m ∈ ℝ, then ρ(X + m) = ρ(X) − m. This property is also known as translation invariance. It can be interpreted as follows: if a risk-free amount m is added to X, then the risk of X is reduced by the same amount m.

– Positive homogeneity: if λ > 0, then ρ(λX) = λρ(X). That is, if a position is increased in size then the risk of that position increases with the same factor. Stated differently, risk arises from the asset itself and is not a function of the quantity purchased1.

– Sub-additivity: ρ(X + Y) ≤ ρ(X) + ρ(Y). This ensures that a coherent risk measure takes portfolio diversification into account, as investing in both X and Y results in a lower overall risk than the sum of the risk of investing in X and the risk of investing separately in Y.

Although highly desirable for risk management, these axioms are not verified by all risk measures. In this chapter, we will continue to focus primarily on volatility. In addition to being coherent, volatility is a measure that often results in analytical formulas for the quantities that we will want to analyze.
The case of other risk measures will be discussed in Section 6.2 of Chapter 6.
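For volatility, the sub-additivity axiom can be verified on any sample of returns, as it follows from the triangle inequality for standard deviations; a minimal sketch with simulated (assumed) data:

```python
import numpy as np

# Sub-additivity of volatility: std(X + Y) <= std(X) + std(Y) holds for any
# two samples, as a consequence of the triangle (Minkowski) inequality.
rng = np.random.default_rng(3)
x = rng.normal(0.0, 0.02, 500)   # returns of position X (illustrative)
y = rng.normal(0.0, 0.03, 500)   # returns of position Y (illustrative)

print((x + y).std() <= x.std() + y.std())   # True
```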

1 Implicitly, we are assuming there is no liquidity issue for the asset.


4.2. The toolkit for portfolio risk attribution

Having defined risk measurement, we can now turn to risk attribution. When analyzing the risk of a portfolio, the identification of which assets contribute to it the most is not obvious. First, due to diversification effects, the risks of individual assets are generally not additive with respect to the overall portfolio risk. Second, even if the standalone risk of an individual asset is large, it could contribute little to the overall portfolio risk if its correlations with other assets are negative. Therefore, we need a quantitative methodology to measure (1) the marginal contribution of an individual asset to the overall portfolio risk, (2) the proportion of the overall portfolio risk that can be attributed to each of its individual components and (3) the incremental effect that adding a new asset to an existing portfolio may have on the overall portfolio risk. In the following sections, for the sake of simplicity, we will assume positive risk measures that are homogeneous and continuously differentiable.

4.2.1. Marginal contributions to portfolio risk

The marginal contribution to risk (MCR) of an asset is defined as the marginal impact on the overall portfolio risk from an infinitesimal change in the position size of that asset, holding all other positions fixed. Formally, it is the derivative of the portfolio risk with respect to the asset’s weight. For asset i in portfolio P, we have:

MCR_i = ∂ρ(P)/∂w_i   [4.1]

If the sign of MCR_i is positive, then increasing the position size of asset i by an infinitesimal amount will increase the overall portfolio risk; if the sign is negative, then increasing the position size of asset i by an infinitesimal amount will reduce the overall portfolio risk – in a sense, asset i behaves as a hedging instrument in the portfolio. Note that for a given portfolio of N assets, it is possible to calculate a vector that contains all the marginal risks of the portfolio assets. We will denote this vector by MCR.
When volatility is used as a risk measure, marginal contributions to risk can be computed analytically. Denoting by w the vector of portfolio weights and by Σ the covariance matrix of asset returns, we have:

MCR_i = ∂σ_P/∂w_i = (Σw)_i / √(w′Σw)   [4.2]


or equivalently:

MCR_i = ∑_j w_j σ_{i,j} / σ_P = σ_{i,P} / σ_P = β_{i,P} σ_P   [4.3]

where σ_{i,P} is the covariance of asset i with portfolio P and β_{i,P} is the beta of asset i with respect to portfolio P.
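A minimal numerical sketch of equations [4.2] and [4.3], using an assumed three-asset covariance matrix:

```python
import numpy as np

# Marginal contributions to volatility on an assumed 3-asset portfolio,
# checked against the equivalence MCR_i = beta_i * sigma_P of equation [4.3].
w = np.array([0.5, 0.3, 0.2])
cov = np.array([[0.0400, 0.0060, -0.0020],
                [0.0060, 0.0225,  0.0030],
                [-0.0020, 0.0030, 0.0100]])

sigma_p = np.sqrt(w @ cov @ w)
mcr = cov @ w / sigma_p                    # [4.2]: (Sigma w)_i / sqrt(w' Sigma w)
beta = cov @ w / sigma_p**2                # beta of each asset vs the portfolio
print(np.allclose(mcr, beta * sigma_p))   # [4.3] holds: True
```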

4.2.2. Total contributions to portfolio risk (TCR)

The total contribution to risk (TCR) of an asset is defined as the absolute amount of portfolio risk contributed by this asset. It is calculated as the asset’s marginal contribution to risk multiplied by its weight in the portfolio:

TCR_i = w_i MCR_i   [4.4]

For a given portfolio P of N assets, it is possible to calculate a vector of total contributions to portfolio risk:

TCR = (w₁ MCR₁, …, w_N MCR_N)   [4.5]

For positive homogeneous risk measures, we can apply Euler’s theorem and obtain the following equation:

ρ(P) = ∑_i TCR_i   [4.6]

Thus, using total contributions to risk makes risk attribution easier to understand, as it splits risk in portions that are additive and sum up to the total portfolio risk. When volatility is used as a risk measure, total contributions to risk can be computed analytically. We have:

TCR_i = w_i (Σw)_i / √(w′Σw)   [4.7]

or equivalently:

TCR_i = w_i ∑_j w_j σ_{i,j} / σ_P = w_i β_{i,P} σ_P   [4.8]

128

Portfolio Diversification

If needed, one can verify that the sum of the total contributions to portfolio volatility is equal to the volatility of the portfolio: ∑



√ ′



[4.9]

4.2.3. Percentage contributions to portfolio risk (PCR)

The percentage contribution to risk (PCR) of an asset is defined as the relative amount of portfolio risk contributed by this asset. It is calculated as the total contribution to portfolio risk of that asset divided by the overall risk of the portfolio:

$PCR_i = \frac{TCR_i}{\mathcal{R}(P)}$   [4.10]

By construction, we have:

$\sum_{i=1}^{N} PCR_i = 100\%$   [4.11]

However, some of the PCRs can be negative, which means we cannot interpret them as a probability distribution. For instance, when risk is defined as volatility, we have:

$PCR_i = \frac{\sum_{j=1}^{N} w_i w_j \sigma_{i,j}}{\sigma_P^2} = \frac{w_i \sigma_{i,P}}{\sigma_P^2} = w_i \beta_{i,P}$   [4.12]

The percentage contribution to risk of an asset can thus be interpreted as the ratio of the covariance between the asset's component return ($w_i R_i$) and the overall portfolio return to the total variance of the portfolio – that is, the beta of the asset's component return against the portfolio's return.

4.2.4. Illustrations

Let us now illustrate the calculation of risk budgets such as the ones we have just described. We will start with the simple example of a two-asset portfolio and go through the mathematical developments, then use the case of a larger, more realistic portfolio.


4.2.4.1. A two-asset portfolio

In a two-asset portfolio, equation [2.15] becomes:

$\sigma_P^2 = w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2 w_1 w_2 \sigma_{1,2}$   [4.13]

When the two assets are uncorrelated, this simplifies into:

$\sigma_P^2 = w_1^2\sigma_1^2 + w_2^2\sigma_2^2$   [4.14]

and the decomposition of the portfolio variance is straightforward: $w_1^2\sigma_1^2$ is the portfolio variance contribution of asset 1, $w_2^2\sigma_2^2$ is the portfolio variance contribution of asset 2, $w_1^2\sigma_1^2/\sigma_P^2$ is the portfolio variance percentage contribution of asset 1 and $w_2^2\sigma_2^2/\sigma_P^2$ is the portfolio variance percentage contribution of asset 2.

One may be tempted to use the standard deviation of the portfolio instead of the variance. Unfortunately:

$\sigma_P \neq w_1\sigma_1 + w_2\sigma_2$   [4.15]

However, it is possible to obtain an additive decomposition by using the fact that:

$\sigma_P = \frac{w_1^2\sigma_1^2}{\sigma_P} + \frac{w_2^2\sigma_2^2}{\sigma_P}$   [4.16]

This decomposition is additive and shows that $w_1^2\sigma_1^2/\sigma_P$ is the portfolio standard deviation contribution of asset 1 and $w_2^2\sigma_2^2/\sigma_P$ is the portfolio standard deviation contribution of asset 2. Note that percentage contributions to the portfolio standard deviation are the same as percentage contributions to the portfolio variance.

In the more general case, where the two assets are correlated, the decomposition of portfolio risk is less straightforward and requires a split of the covariance term. For instance, we could split it as follows:

$\sigma_P^2 = \left(w_1^2\sigma_1^2 + w_1 w_2 \sigma_{1,2}\right) + \left(w_2^2\sigma_2^2 + w_1 w_2 \sigma_{1,2}\right)$   [4.17]

We again have an additive decomposition and can define $w_1^2\sigma_1^2 + w_1 w_2 \sigma_{1,2}$ as the portfolio variance contribution of asset 1 and $w_2^2\sigma_2^2 + w_1 w_2 \sigma_{1,2}$ as the portfolio variance contribution of asset 2; $(w_1^2\sigma_1^2 + w_1 w_2 \sigma_{1,2})/\sigma_P^2$ is the portfolio variance percentage contribution of asset 1 and $(w_2^2\sigma_2^2 + w_1 w_2 \sigma_{1,2})/\sigma_P^2$ is the portfolio variance percentage contribution of asset 2. We can also define a similar additive decomposition for the portfolio standard deviation:

$\sigma_P = \frac{w_1^2\sigma_1^2 + w_1 w_2 \sigma_{1,2}}{\sigma_P} + \frac{w_2^2\sigma_2^2 + w_1 w_2 \sigma_{1,2}}{\sigma_P}$   [4.18]

and calculate the standard deviation contribution of each asset. This approach can easily be generalized to portfolios with more than two assets. Note that when using equations [4.17] or [4.18], attributing portfolio variance risk or portfolio volatility risk will result in an identical percentage allocation to risk.

EXAMPLE 4.1.– Consider a universe made of five assets with the following volatility and correlation parameters.

Asset   Volatility      Correlation
                          A        B        C        D        E
A         10%           1.00     0.21    −0.49     0.16    −0.31
B         20%           0.21     1.00    −0.19    −0.21     0.42
C         30%          −0.49    −0.19     1.00     0.48     0.16
D         40%           0.16    −0.21     0.48     1.00     0.02
E         50%          −0.31     0.42     0.16     0.02     1.00

An equally weighted portfolio of these five assets would have a variance equal to 0.0284 or, equivalently, a volatility of 16.85%. The variance and the volatility total and percentage attributions based on the methodology described above are given in the following table.

                        Variance attribution        Volatility attribution
Asset   Weight          TCR          PCR            TCR          PCR
A        20%           −0.0004      −1.4%          −0.23%       −1.4%
B        20%            0.0023       8.1%           1.38%        8.1%
C        20%            0.0058      20.4%           3.45%       20.4%
D        20%            0.0085      29.9%           5.01%       29.9%
E        20%            0.0122      43.0%           7.23%       43.0%
Total   100%            0.0284     100.0%          16.85%      100.0%
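The figures in Example 4.1 are easy to reproduce (an illustrative sketch; variable names are ours):

```python
import numpy as np

vols = np.array([0.10, 0.20, 0.30, 0.40, 0.50])
corr = np.array([
    [ 1.00,  0.21, -0.49,  0.16, -0.31],
    [ 0.21,  1.00, -0.19, -0.21,  0.42],
    [-0.49, -0.19,  1.00,  0.48,  0.16],
    [ 0.16, -0.21,  0.48,  1.00,  0.02],
    [-0.31,  0.42,  0.16,  0.02,  1.00],
])
Sigma = corr * np.outer(vols, vols)
w = np.full(5, 0.2)                   # equal capital allocation

var_p = w @ Sigma @ w                 # portfolio variance
vol_p = np.sqrt(var_p)                # portfolio volatility

tcr_var = w * (Sigma @ w)             # total contributions to variance
tcr_vol = tcr_var / vol_p             # equation [4.7]: contributions to volatility
pcr = tcr_var / var_p                 # equation [4.12]: percentage contributions

print(round(var_p, 4), round(vol_p, 4))   # 0.0284 0.1685
print(np.round(pcr * 100, 1))             # [-1.4  8.2 20.5 29.8 42.9]
```

The percentage contributions agree with the book's table to within a tenth of a percentage point; the small differences at the last decimal come from the book computing the percentages from its rounded total contributions.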


It should be clear from this example that an equal capital allocation does not always result in a portfolio that is well-diversified from a risk perspective.

4.3. Risk allocation and risk parity approaches

Having defined risk measurement and risk attribution, we can now turn to the third element of risk-based strategies, which is risk allocation. If the goal is to diversify risk, then a simple but efficient approach is to allocate an equal amount of risk to each asset (or group of assets) in the portfolio. This is usually referred to as risk parity.

4.3.1. Naive risk parity

Naive risk parity (NRP), also known as "pseudo risk parity", is the simplest and most rudimentary form of risk parity strategy. It starts from the observation that, to balance risk contributions in a portfolio, one must intuitively underweight riskier assets and overweight less risky ones. As a simplification, NRP therefore suggests setting portfolio weights in proportion to the inverse of the individual assets' risk. When risk is measured by volatility, this implies that in a fully invested portfolio, each asset receives an allocation with a weight of:

$w_i = \frac{1/\sigma_i}{\sum_{j=1}^{N} 1/\sigma_j}$   [4.19]

or equivalently:

$w_i^* \sigma_i = w_j^* \sigma_j$ for all $i, j$   [4.20]

This portfolio weighting scheme has the advantages of being simple to understand, easy to calculate and independent of the correlation between assets (it only requires the assets' volatilities and ignores correlations). In a sense, it can be viewed as a volatility-adjusted form of naive diversification. Instead of allocating an equal amount of capital to each asset, NRP allocates an equal amount of volatility, which should ensure some sort of portfolio diversification. However, the sum of these equal amounts of allocated

volatility is generally not equal to the actual overall portfolio volatility2. It is larger, because it does not take the correlation effects into account. As a result, if an investor is given a total risk budget for the portfolio and allocates it equally to each asset according to equation [4.19], the final portfolio volatility will be below its original risk budget. The investor will then have to leverage up the resulting portfolio so as to use the risk budget entirely.

EXAMPLE 4.2.– Consider again that our universe is made of five assets described by the following volatility and correlation parameters.

Asset   Volatility      Correlation
                          A        B        C        D        E
A         10%           1.00     0.21    −0.49     0.16    −0.31
B         20%           0.21     1.00    −0.19    −0.21     0.42
C         30%          −0.49    −0.19     1.00     0.48     0.16
D         40%           0.16    −0.21     0.48     1.00     0.02
E         50%          −0.31     0.42     0.16     0.02     1.00

By using equation [4.20], we can obtain the portfolio weights of the naive risk parity portfolio and can then calculate its volatility (10.27% p.a.) and attribute it to its components. The result is as follows:

                        Volatility attribution
Asset   Weights         TCR          PCR
A        43.8%          1.06%       10.4%
B        21.9%          2.30%       22.4%
C        14.6%          1.79%       17.5%
D        10.9%          2.71%       26.4%
E         8.8%          2.41%       23.5%
Total   100%           10.27%      100.0%

As one would expect, the higher-volatility assets have been significantly scaled down, which results in a lower portfolio volatility than in the equally weighted case. The portfolio volatility is not evenly spread on all the assets because the impact of correlations has been ignored during the allocation.

2 Unless all assets are perfectly correlated, in which case risk diversification is useless.
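Example 4.2 can be reproduced in a few lines (an illustrative sketch, not from the book):

```python
import numpy as np

vols = np.array([0.10, 0.20, 0.30, 0.40, 0.50])
corr = np.array([
    [ 1.00,  0.21, -0.49,  0.16, -0.31],
    [ 0.21,  1.00, -0.19, -0.21,  0.42],
    [-0.49, -0.19,  1.00,  0.48,  0.16],
    [ 0.16, -0.21,  0.48,  1.00,  0.02],
    [-0.31,  0.42,  0.16,  0.02,  1.00],
])
Sigma = corr * np.outer(vols, vols)

# Naive risk parity, equation [4.19]: weights proportional to 1 / sigma_i.
w_nrp = (1.0 / vols) / np.sum(1.0 / vols)
vol_p = np.sqrt(w_nrp @ Sigma @ w_nrp)

print(np.round(w_nrp * 100, 1))   # [43.8 21.9 14.6 10.9  8.8]
print(round(vol_p * 100, 2))      # 10.27
```

By construction, every product $w_i \sigma_i$ is identical (each asset is allocated the same amount of volatility), but the realized risk contributions are uneven because correlations are ignored.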


The major drawback of NRP is its extreme sensitivity to the characteristics of the underlying universe of assets and the potential biases this may create. For instance, if all assets have the same volatility, naive risk parity will create an equally weighted portfolio of all available assets, which generally overweights smaller-sized assets compared to capitalization-weighted benchmarks. Similarly, if some assets are substantially more volatile than others, the portfolio will likely become dominated by capital allocated to lower-volatility assets.

Note that an interesting algorithm to choose asset weights while balancing between naive risk parity and equal dollar allocation is given by the following Tikhonov regularization:

$w^* = \arg\min_w \sum_{i=1}^{N} \left(w_i \sigma_i - 1\right)^2 + \lambda \sum_{i=1}^{N} \left(w_i - 1\right)^2$   [4.21]

where $\lambda \geq 0$ represents a measure of the uncertainty we have around estimates of the individual asset volatilities. The resulting (non-normalized) weights are as follows:

$w_i^* = \frac{\sigma_i + \lambda}{\sigma_i^2 + \lambda}$   [4.22]

Clearly, when $\lambda \to \infty$, the portfolio weights converge towards equal weights, while when $\lambda \to 0$, the portfolio weights converge towards naive risk parity.

4.3.2. Maimonides risk parity

Maimonides risk parity is based on an allocation algorithm devised by the 12th Century philosopher Moses ben Maimon [MAI], which was re-introduced into finance by his descendants Maymin and Maymin [MAY 13]. In a bankruptcy, typical solutions include dividing the estate equally among all debt holders or dividing it proportionally among all debts. Maimonides suggested an alternative hybrid approach, which essentially gives higher priority to smaller creditors and reduces the claims of larger ones. It can be described as follows: "if, when the property is divided into equal portions according to the number of creditors, the person owed the least will receive the amount owed him or less, the property is divided into that number of equal portions. If dividing the property into equal portions would give the person owed the least more than he is owed, this is what


should be done: we divide the sum equally among the creditors so that the person owed the least will receive the money that he is owed. He then withdraws. The remaining creditors then divide the balance of the debtor's resources in the following manner."

The problem of allocating estate holdings among debt holders has strong similarities with the problem of allocating capital among assets. Maymin and Maymin [MAY 13] therefore suggested adapting the Maimonides rule to solve it. In their analysis, they used historical volatility as a risk measure and the reciprocal of the volatility as a proxy for "safety". They then allocate "safety" as if it were amounts of money in an estate context. That is, the reciprocal of the portfolio risk is the amount to be allocated and the reciprocals of the individual asset volatilities are the debt amounts. Mathematically, this can be summarized as follows: say the individual asset volatilities are sorted in descending order, $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_N$, and let $R^*$ be the portfolio risk to be allocated. If $\frac{1}{N R^*} \leq \frac{1}{\sigma_1}$, then we create an equally weighted portfolio. If $\frac{1}{N R^*} > \frac{1}{\sigma_1}$, let $k$ be the largest number such that:

$\frac{\frac{1}{R^*} - \sum_{j=1}^{k-1} \frac{1}{\sigma_j}}{N - k + 1} \geq \frac{1}{\sigma_k}$   [4.23]

The Maimonides "weights" are given by the following equation:

$w_i = \frac{1}{\sigma_i}$ for $i = 1, \ldots, k$, and $w_i = \frac{\frac{1}{R^*} - \sum_{j=1}^{k} \frac{1}{\sigma_j}}{N - k}$ for $i = k+1, \ldots, N$   [4.24]

The Maimonides approach is remarkably simple and it can easily be extended to any type of risk measure or quantifiable criterion (e.g. market capitalization, etc.). However, keep in mind that the Maimonides weights are non-normalized – their sum will be equal to the reciprocal of the portfolio risk, which differs from 100%. In addition, the effective volatility of the portfolio is usually lower than $R^*$ due to diversification effects. Therefore, some regularization is required to obtain portfolio weights.
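The quoted division rule itself is easy to state as an algorithm (an illustrative sketch of the estate-division logic, using hypothetical numbers; the mapping to "safety" budgets follows Maymin and Maymin [MAY 13]):

```python
# Maimonides' division rule from the quote above: sort the claims in ascending
# order; at each step, split the remaining estate equally among the remaining
# creditors, but cap each award at the amount owed; the smallest creditor is
# settled first and withdraws.
def maimonides_split(estate, claims):
    claims = sorted(claims)
    awards = []
    remaining = estate
    for i, claim in enumerate(claims):
        share = remaining / (len(claims) - i)   # equal split of what is left
        award = min(claim, share)               # nobody gets more than his claim
        awards.append(award)
        remaining -= award
    return awards

# Estate of 120 with debts of 30, 60 and 90: the smallest creditor is paid in
# full, then the two remaining creditors split the rest equally.
print(maimonides_split(120.0, [30.0, 60.0, 90.0]))   # [30.0, 45.0, 45.0]
```

With an estate large enough that an equal split does not exceed the smallest claim, the rule reduces to equal portions, exactly as in the quote.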


4.3.3. Equal risk contributions

Equal risk contribution (ERC) strategies go one step further than NRP by considering the impact of correlations between assets. Their goal is to create a portfolio where each asset has the same effective risk contribution. Mathematically, this means that the total contributions to risk of all assets are equal:

$TCR_i = TCR_j$ for all $i, j$   [4.25]

Several approaches have been suggested to obtain the weights of the ERC portfolio. For instance, Maillard et al. [MAI 10] propose to solve the following minimization problem:

$w^* = \arg\min_w \sum_{i=1}^{N} \sum_{j=1}^{N} \left(TCR_i - TCR_j\right)^2$   [4.26]

which is subject to the usual constraints on weights. Equivalently, one could argue that the percentage risk contribution of each of the $N$ assets should be equal to $1/N$. To obtain the ERC portfolio weights, one may therefore solve the following optimization problem:

$w^* = \arg\min_w \sum_{i=1}^{N} \left(PCR_i - \frac{1}{N}\right)^2$   [4.27]

which is subject to the usual constraints on weights. Both equations [4.26] and [4.27] introduce constrained nonlinear programs that must be solved numerically, for instance using a sequential quadratic programming (SQP) algorithm or interior point methods (IPMs). Analytical solutions are only available in trivial cases such as the bivariate case and the case of constant correlations.

EXAMPLE 4.3.– The bivariate case. In the case of two assets, the solution is unique and analytical. The weights of the ERC portfolio are $w_1 = \frac{\sigma_1^{-1}}{\sigma_1^{-1} + \sigma_2^{-1}}$ for the first asset and $w_2 = \frac{\sigma_2^{-1}}{\sigma_1^{-1} + \sigma_2^{-1}}$ for the second asset. Interestingly, the solution does not depend on the correlation between the two assets but only on their respective volatilities.


EXAMPLE 4.4.– The $N$-asset trivial cases. If all correlations are equal, the solution is unique and analytical. The weight of asset $i$ in the ERC portfolio is given by the ratio of the inverse of its volatility to the sum of the inverses of all volatilities:

$w_i = \frac{\sigma_i^{-1}}{\sum_{j=1}^{N} \sigma_j^{-1}}$   [4.28]

If all correlations are different, but all asset volatilities are equal, the weight of asset $i$ in the ERC portfolio is given by the ratio between the inverse of the weighted average of the correlations of component $i$ with the other components and the sum of these inverses across all components. Mathematically:

$w_i = \frac{\left(\sum_{j=1}^{N} w_j \rho_{i,j}\right)^{-1}}{\sum_{k=1}^{N} \left(\sum_{j=1}^{N} w_j \rho_{k,j}\right)^{-1}}$   [4.29]

Although this looks like a closed-form solution, it is endogenous, because $w_i$ appears on both sides of the equation – as well as in the fully allocated constraint.

EXAMPLE 4.5.– The $N$-asset general case. Despite the absence of an analytical solution, it is still possible to get a simple intuition of the nature of the solution to such a case. From equations [4.3] and [4.4], we easily observe that in an ERC portfolio:

$w_i \beta_{i,P} = w_j \beta_{j,P} = \frac{1}{N}$   [4.30]

The weight of asset $i$ is therefore given by the following equation:

$w_i = \frac{\beta_{i,P}^{-1}}{\sum_{j=1}^{N} \beta_{j,P}^{-1}}$   [4.31]

High beta values result in low weights and low beta values result in high weights. Therefore, an asset with a high individual volatility and/or high correlations with the other asset classes is penalized in the portfolio allocation. While this result is intuitive, it unfortunately does not really help us solve for analytical weights, since $\beta_{i,P}$ is a function of $w$ and $w$ is a function of $\beta_{i,P}$. We therefore again have an endogeneity problem, which explains why a closed-form solution is generally impossible and why we need to use numerical procedures.


EXAMPLE 4.6.– Consider again our universe made of the five assets described by the following volatility and correlation parameters.

Asset   Volatility      Correlation
                          A        B        C        D        E
A         10%           1.00     0.21    −0.49     0.16    −0.31
B         20%           0.21     1.00    −0.19    −0.21     0.42
C         30%          −0.49    −0.19     1.00     0.48     0.16
D         40%           0.16    −0.21     0.48     1.00     0.02
E         50%          −0.31     0.42     0.16     0.02     1.00

The equal risk contribution portfolio is allocated as follows:

                        Volatility attribution
Asset   Weights         TCR          PCR
A        52.5%          1.77%       20.0%
B        17.3%          1.77%       20.0%
C        15.5%          1.77%       20.0%
D         7.4%          1.77%       20.0%
E         7.3%          1.77%       20.0%
Total   100%            8.89%      100.0%

As one would expect, the portfolio volatility of 8.89% is evenly contributed by each asset. We can also see that ERC approaches tend to allocate more capital to low-risk assets and less capital to high-risk assets. Since the latter are usually associated with higher expected returns, risk parity portfolios will naturally face some relative return losses associated with low-risk assets. Several ERC products therefore enhance returns by applying leverage to their portfolio. We will not discuss them here, as diversification does not change with leverage.

One can show that the ERC portfolio is optimal if we assume a constant correlation between assets and that all assets have the same Sharpe ratio (see Maillard et al. [MAI 10]). In the other (more realistic) cases, it is not necessarily efficient, but it is an attempt to find a trade-off between minimizing risk (minimum variance) and maximizing nominal diversification (equal weights). In particular, one can show that the risk of the ERC portfolio is bounded from below by the risk of the minimum variance (MV) portfolio and from above by the risk of the equally weighted (EW) portfolio3:

$\sigma_{MV} \leq \sigma_{ERC} \leq \sigma_{EW}$   [4.32]

How do ERC approaches perform in terms of diversification? From a risk perspective, they allocate to all the available assets in the investment universe, and each asset contributes by construction an equal amount of effective risk to the portfolio. While this sounds ideal, it also means that they are very sensitive to the choice of the investment universe. For instance, if we have a universe made of 90 stocks from one sector and 10 stocks from other sectors and we perform an ERC allocation at the asset level, the resulting portfolio is unlikely to be well-diversified. A much better choice would be to perform an ERC allocation at the sector level. Mathematically, we can express this by summing risk contributions over all assets in a sector. However, from a capital perspective, there could be several portfolios with different weights that satisfy equation [4.25], and their level of asset diversification may vary.
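The ERC weights of Example 4.6 can be recovered numerically. The cyclical coordinate-descent scheme below is one possible solver (a sketch; it is not an algorithm presented in the book): each weight is updated in turn by solving $w_i (\Sigma w)_i = \sigma_P^2/N$ as a quadratic equation in $w_i$, and the weights are renormalized after each sweep.

```python
import numpy as np

vols = np.array([0.10, 0.20, 0.30, 0.40, 0.50])
corr = np.array([
    [ 1.00,  0.21, -0.49,  0.16, -0.31],
    [ 0.21,  1.00, -0.19, -0.21,  0.42],
    [-0.49, -0.19,  1.00,  0.48,  0.16],
    [ 0.16, -0.21,  0.48,  1.00,  0.02],
    [-0.31,  0.42,  0.16,  0.02,  1.00],
])
Sigma = corr * np.outer(vols, vols)
n = len(vols)

w = np.full(n, 1.0 / n)               # start from equal weights
for _ in range(10000):
    w_old = w.copy()
    target = (w @ Sigma @ w) / n      # equal variance contribution per asset
    for i in range(n):
        b = Sigma[i] @ w - Sigma[i, i] * w[i]   # covariance with the other assets
        # Positive root of Sigma_ii * w_i^2 + b * w_i - target = 0.
        w[i] = (-b + np.sqrt(b * b + 4.0 * Sigma[i, i] * target)) / (2.0 * Sigma[i, i])
    w /= w.sum()                      # keep the portfolio fully invested
    if np.max(np.abs(w - w_old)) < 1e-12:
        break

pcr = w * (Sigma @ w) / (w @ Sigma @ w)   # percentage risk contributions
print(np.round(w * 100, 1))    # close to the book's 52.5, 17.3, 15.5, 7.4, 7.3
print(np.round(pcr * 100, 1))  # all equal to 20.0
```

Each inner update is an exact coordinate minimization of a convex function, so the sweep converges quickly for a positive-definite covariance matrix.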

4.4. The maximum diversification approach Introduced by Choueifaty and Coignard [CHO 08], the maximum diversification approach aims at building a portfolio that maximizes the benefits of diversification. This portfolio is exclusively based on risk parameters and does not use any return forecasts.

3 This result is not very surprising as, by using the Kuhn–Tucker conditions, Maillard et al. [MAI 10] have shown that solving equation [4.26] is equivalent to solving the optimization problem $w^* = \arg\min_w w'\Sigma w$ subject to $\sum_{i=1}^{N} \ln w_i \geq c$ and a positive weight constraint. The solution is the same if we use an equality instead of an inequality, and adding a fully funded constraint is equivalent to replacing the weights $w_i^*$ by $w_i^* / \sum_{j=1}^{N} w_j^*$. One can therefore rewrite the optimization problem as $w^* = \arg\min_w \sqrt{w'\Sigma w} / \left(\prod_{i=1}^{N} w_i\right)^{1/N}$ under a positive weight constraint and a fully funded constraint. This equation is between the minimization of $w'\Sigma w$, which gives the minimum variance portfolio, and the maximization of $\left(\prod_{i=1}^{N} w_i\right)^{1/N}$, which gives the equally weighted portfolio.


4.4.1. The diversification ratio and its properties

An important property of volatility as a risk measure is its sub-additivity. That is, the volatility of a portfolio is never larger than the weighted average volatility of its components:

$\sqrt{w'\Sigma w} \leq \sum_{i=1}^{N} w_i \sigma_i$   [4.33]

The equality corresponds to extreme cases such as a single-asset portfolio, or a portfolio where all assets are perfectly correlated. One may therefore use the ratio between these two quantities to measure portfolio diversification:

$DR(P) = \frac{\sum_{i=1}^{N} w_i \sigma_i}{\sqrt{w'\Sigma w}}$   [4.34]

and call it the diversification ratio. Its numerator is the hypothetical volatility of the portfolio if all assets were perfectly correlated. Its denominator is the effective volatility of the portfolio, which is calculated using prevailing correlation estimates. The result starts at one (single-asset portfolio, or portfolio of perfectly correlated assets, i.e. no diversification) and is unbounded from above, with higher values indicating higher diversification benefits. If all assets are perfectly uncorrelated, then the ratio equals $\sqrt{N}$.

Intuitively, portfolios with concentrated weights or highly correlated assets are poorly diversified and should exhibit a low diversification ratio. Following Choueifaty et al. [CHO 13], we can formalize that intuition by decomposing the diversification ratio into a weighted correlation element and a weighted concentration element. From equation [3.15], we have:

$\sigma_P^2 = \sum_{i \neq j} w_i w_j \sigma_i \sigma_j \rho_{i,j} + \sum_{i=1}^{N} w_i^2 \sigma_i^2$   [4.35]

We denote the volatility-weighted average correlation of the assets in the portfolio by $\rho(w)$, which is defined as follows:

$\rho(w) = \frac{\sum_{i \neq j} w_i w_j \sigma_i \sigma_j \rho_{i,j}}{\sum_{i \neq j} w_i w_j \sigma_i \sigma_j}$   [4.36]

We denote the volatility-weighted concentration ratio of the portfolio by $CR(w)$, which is defined as follows:

$CR(w) = \frac{\sum_{i=1}^{N} (w_i \sigma_i)^2}{\left(\sum_{i=1}^{N} w_i \sigma_i\right)^2}$   [4.37]

The concentration ratio is a generalization of the HHI, calculated in the risk space rather than in the weight space. It is bounded between $1/N$ for an equal volatility-weighted portfolio and 1 for a single-asset portfolio. We can rewrite equation [4.35] as follows:

$\sigma_P^2 = \rho(w) \sum_{i \neq j} w_i w_j \sigma_i \sigma_j + \sum_{i=1}^{N} (w_i \sigma_i)^2$   [4.38]

noting that:

$\sum_{i \neq j} w_i w_j \sigma_i \sigma_j = \left(\sum_{i=1}^{N} w_i \sigma_i\right)^2 - \sum_{i=1}^{N} (w_i \sigma_i)^2$   [4.39]

We can combine equations [4.38] and [4.39] and obtain the following equation:

$\sigma_P^2 = \left[\rho(w)\left(1 - CR(w)\right) + CR(w)\right] \left(\sum_{i=1}^{N} w_i \sigma_i\right)^2$   [4.40]

Dividing both sides by $\left(\sum_{i=1}^{N} w_i \sigma_i\right)^2$ yields the following equation:

$\frac{1}{DR(P)^2} = \rho(w)\left(1 - CR(w)\right) + CR(w)$   [4.41]

from which we get:

$DR(P) = \left[\rho(w)\left(1 - CR(w)\right) + CR(w)\right]^{-1/2}$   [4.42]

This decomposition shows that the diversification ratio increases when the average correlation or the volatility-weighted concentration ratio decreases. Consequently, holding many assets does not necessarily increase a portfolio’s diversification ratio. What is needed is exposure to several diversified sources of risk.
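The decomposition can be verified numerically on the chapter's five-asset universe (an illustrative sketch, not from the book):

```python
import numpy as np

vols = np.array([0.10, 0.20, 0.30, 0.40, 0.50])
corr = np.array([
    [ 1.00,  0.21, -0.49,  0.16, -0.31],
    [ 0.21,  1.00, -0.19, -0.21,  0.42],
    [-0.49, -0.19,  1.00,  0.48,  0.16],
    [ 0.16, -0.21,  0.48,  1.00,  0.02],
    [-0.31,  0.42,  0.16,  0.02,  1.00],
])
Sigma = corr * np.outer(vols, vols)
w = np.full(5, 0.2)

x = w * vols                                  # volatility-weighted exposures
dr = x.sum() / np.sqrt(w @ Sigma @ w)         # diversification ratio [4.34]
cr = np.sum(x**2) / x.sum()**2                # concentration ratio [4.37]

xx = np.outer(x, x)
off = xx * corr                               # correlation-weighted cross terms
rho = (off.sum() - np.trace(off)) / (xx.sum() - np.trace(xx))   # rho(w), [4.36]

print(round(dr, 3))                                  # 1.781
print(round(1.0 / np.sqrt(rho * (1 - cr) + cr), 3))  # 1.781, as in [4.42]
```

Both routes give the same number, confirming that a lower average correlation or a lower concentration ratio raises the diversification ratio.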


An alternative interpretation of the diversification ratio or, more specifically, of its square, is as the effective number of independent sources of risk in a portfolio. Consider, for instance, a portfolio that holds equal risk allocations in two uncorrelated assets with equal volatility. Such a portfolio will have a diversification ratio of $\sqrt{2} \approx 1.41$. Similarly, a portfolio that holds equal risk allocations in three uncorrelated assets with equal volatility will have a diversification ratio of $\sqrt{3} \approx 1.73$, and so on. As an illustration, Figure 4.2 shows the square of the diversification ratio over time for the MSCI U.S., MSCI Europe (EMU) and MSCI World indices. Several observations are clear from this chart. Firstly, the U.S. and European indices are less diversified than the World index. Secondly, increasing the number of countries and industries in a portfolio expands the opportunity set in terms of effective risk factors and results in a greater diversification potential. Thirdly, the number of effective factors in each of these markets has changed through time. For the MSCI World index, this number has varied between 3 and 15. This clearly suggests that diversification benefits are time-varying.

Figure 4.2. Square of the diversification ratio for the U.S., E.U., and world markets (as measured by MSCI indices)

If all correlations were equal to one, then the diversification ratio would also be equal to one, regardless of the volatility-weighted concentration ratio. One criticism of the diversification ratio is that although it does quantify the degree of portfolio diversification, it is a differential diversification measure and not an absolute measure. Stated differently, we cannot really compare two different portfolios on the basis of their diversification ratios.

4.4.2. The most diversified portfolio

Choueifaty and Coignard [CHO 08] define the most diversified portfolio (MDP) as the long-only portfolio that maximizes the diversification ratio introduced in equation [4.34]. The corresponding optimization program is given as follows:

$w_{MDP} = \arg\max_w DR(w)$   [4.43]

which is subject to:

$\sum_{i=1}^{N} w_i = 1$   [4.44]

$w_i \geq 0$   [4.45]
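If the non-negativity constraint [4.45] is dropped, the maximizer of the diversification ratio has a simple closed form, proportional to $\Sigma^{-1}\sigma$ (a sketch for intuition; this is not the book's procedure for the long-only MDP). It makes it easy to verify numerically the equal-correlation property of the MDP discussed next:

```python
import numpy as np

vols = np.array([0.10, 0.20, 0.30, 0.40, 0.50])
corr = np.array([
    [ 1.00,  0.21, -0.49,  0.16, -0.31],
    [ 0.21,  1.00, -0.19, -0.21,  0.42],
    [-0.49, -0.19,  1.00,  0.48,  0.16],
    [ 0.16, -0.21,  0.48,  1.00,  0.02],
    [-0.31,  0.42,  0.16,  0.02,  1.00],
])
Sigma = corr * np.outer(vols, vols)

# Unconstrained maximizer of DR(w): w proportional to inverse(Sigma) @ vols.
w = np.linalg.solve(Sigma, vols)
w /= w.sum()                                       # normalize to full investment

port_vol = np.sqrt(w @ Sigma @ w)
corr_with_port = (Sigma @ w) / (vols * port_vol)   # corr(asset i, portfolio)

print(np.round(corr_with_port, 6))   # every asset has the same correlation
```

Some weights may be negative in this unconstrained version; imposing [4.45] requires a numerical solver, but the equal-correlation property carries over to the assets held by the long-only MDP.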

The solution effectively maximizes the diversification potential of the underlying assets, given their estimated correlation and volatility structure. In some very specific situations, the MDP has an analytical solution. For instance, if all assets have the same volatility, the solution corresponds to the global minimum variance portfolio. In more general cases, one must use numerical procedures to calculate the weights of the most diversified portfolio – see Chincarini and Kim [CHI 06].

Three key properties of the long-only MDP are worth mentioning, because they have a direct impact on its diversification and help in understanding what it represents:

– All assets included in the MDP have the same correlation to it. This property, although initially surprising, is consistent with the fact that the MDP cannot be further diversified – if there were assets with a below-average correlation to the MDP, adding them to the portfolio would reduce its risk.

– Assets that are non-diversifying because they have a high correlation with the MDP are excluded from it. Conversely, assets contained in the MDP have the lowest correlation with it. This property explains why the


long-only MDP is typically expected to be invested in approximately half the assets of the investment universe. In a sense, the other assets are not included in the MDP but are considered to be effectively "represented" in the portfolio.

– The MDP is duplicate-invariant. If an asset is duplicated, for instance due to multiple listings, it will not receive twice the allocation. This is a key advantage over the risk parity portfolios discussed in section 4.3. Similarly, adding a new asset that is a positive linear combination of existing assets should not result in a change of allocation of the MDP.

As an illustration, consider the TOBAM MDP World Equity portfolio described in Table 4.1. One might conclude that this portfolio has zero allocation to the energy sector and a significant exposure to the utilities sector (placing third by dollar allocation). If all sectors were independent and had the same level of volatility, this conclusion would be correct. In this case, it is wrong. In fact, the portfolio has a higher correlation with the energy sector than with the utilities sector. This shows that it is perfectly possible to be highly exposed to a given sector while holding none of its assets. The exposure is sourced in assets from other sectors, which are correlated with the target sector. In this example, the MDP favors the most diversified sectors that also have the most diversifying potential compared to other sectors.

Sector                      Average correlation of      Weight in
                            portfolio to the sector     portfolio (%)
Cons. discretionary         0.893                       14.78
Cons. staples               0.858                       27.88
Energy                      0.835                        0.00
Financials                  0.830                        7.55
Healthcare                  0.876                        8.86
Industrials                 0.885                        5.63
Information technology      0.850                        5.17
Materials                   0.857                        9.36
Telecom. services           0.828                        5.78
Utilities                   0.821                       14.07

Table 4.1. Average correlations and allocations of the TOBAM MDP World Equity portfolio to various sectors (as of December 2011) (source: data from TOBAM)


This better implied diversification can have important implications for the performance of MDP portfolios when specific risks materialize. Consider, for instance, the Fukushima earthquake in March 2011, during which shares of the Tokyo Electric Power Company ("TEPCO"), the firm that was running the Fukushima-Daiichi nuclear power plant, fell by 50% in a few days. Over the year 2011, the MSCI Japan TR fell by 18.1% and had a volatility of 23.5%. By contrast, the MDP Japan portfolio would have only fallen by 9.3%, with a volatility of 18.1%. Surprisingly, this strong outperformance was coupled with a larger allocation to utilities (21.3% vs. 5.1% in the MSCI Japan Index) and a larger allocation to TEPCO itself (4.6% vs. 1.3% in the MSCI Japan Index). However, the rest of the portfolio was, by construction, allocated to stocks that were the least correlated with TEPCO and the least correlated with utilities. In a sense, the MDP Japan portfolio had 4.6% in TEPCO and 95.4% in anti-TEPCO companies – as defined from a correlation perspective. By contrast, the MSCI Japan Index had a smaller exposure to TEPCO, but a higher exposure to TEPCO-like companies. Clearly, for diversification to be effective, one should consider the commonalities behind asset returns. The MDP portfolio does so by using the covariance matrix.

4.4.3. Extensions of the most diversified portfolio

Carmichael et al. [CAR 15a, CAR 15b] extended the MDP problem by introducing a generalization of the diversification ratio based on Rao's quadratic entropy. Given a portfolio of $N$ assets, they define its quadratic entropy as follows:

$H_D(w) = \sum_{i=1}^{N} \sum_{j=1}^{N} w_i w_j \, d_{i,j}$   [4.46]

where $d_{i,j}$ is a dissimilarity function measuring the distance between asset $i$ and asset $j$. The usual requirements for a distance function apply, namely $d_{i,j} \geq 0$, $d_{i,i} = 0$ and $d_{i,j} = d_{j,i}$ for all $i$ and $j$ between 1 and $N$.

There are three interesting interpretations of what $H_D(w)$ measures:

– The first interpretation is in terms of portfolio weights. $H_D(w)$ is the average difference between the weights of a given portfolio and the weights of a single-asset portfolio. This value is very small when the portfolio is close to a single-asset portfolio and is large when the portfolio is far from that scenario and thus well-diversified.

– The second interpretation is in terms of information concentration. The distance $d_{i,j}$ captures the amount of unshared information between assets $i$ and $j$, and $H_D(w)$ aggregates this at the portfolio level. For instance, if all assets are similar, $d_{i,j} = 0$ for all $i$ and $j$, and the portfolio will have $H_D(w) = 0$. If all assets are very different from each other, their distances will be large and $H_D(w)$ will be large as well. $H_D(w)$ is therefore a measure of portfolio concentration in terms of information.

– The third interpretation is in terms of the effective number of independent risk factors. When the distances $d_{i,j}$ are normalized to be in the [0,1] range, $H_D(w)$ can be used to calculate the effective number of independent risk factors in a portfolio, which is given as follows:

$N_{eff}(w) = \frac{1}{1 - H_D(w)}$   [4.47]

This definition generalizes the Bouchaud et al. [BOU 97] approach, and includes the Gini index as a particular case when $d_{i,j} = c(1 - \delta_{i,j})$, where $c$ is any positive constant and $\delta_{i,j}$ is the Kronecker delta defined as $\delta_{i,j} = 1$ if $i = j$ and $\delta_{i,j} = 0$ otherwise.

All these interpretations fortunately converge towards the same idea. For diversification purposes, one should seek to create a portfolio with the highest possible $H_D(w)$. If we define the dissimilarity matrix $D$ by the following equation:

$D = \begin{pmatrix} d_{1,1} & \cdots & d_{1,N} \\ \vdots & \ddots & \vdots \\ d_{N,1} & \cdots & d_{N,N} \end{pmatrix}$   [4.48]

one can solve a new optimization problem, which is:

$w^* = \arg\max_w w' D w$   [4.49]

subject to the usual constraints on weights. Alternatively, one can also solve:

$w^* = \arg\min_w w' \Sigma w$   [4.50]


which is subject to:

$w' D w \geq h$   [4.51]

and to the usual constraints on weights. In equation [4.51], $h$ represents the minimum level of diversification that the investor wants. These optimization problems are extensively discussed by Carmichael et al. [CAR 15a, CAR 15b].

4.5. The minimum variance approach

Minimum variance investing is one of the oldest risk-based strategies. Its goal is to deliver the returns with the lowest possible variance out of any portfolio subject to similar weighting constraints (if any), with no consideration for the possible impact on the portfolio return. It should therefore be considered as an absolute risk-minimizing strategy.

4.5.1. The global minimum variance (GMV) portfolio

The weights of the GMV portfolio are the solution to the following quadratic optimization problem:

$w_{GMV} = \arg\min_w w' \Sigma w$   [4.52]

which is subject to:

$w' \mathbf{1} = 1$   [4.53]

The fully invested restriction is arbitrary but it is commonly used to avoid ending up with a trivial zero-holding portfolio. In the absence of other constraints, this problem can be solved in closed form. For every invertible covariance matrix $\Sigma$, there is a unique solution given by the following equation:

$w_{GMV} = \frac{\Sigma^{-1} \mathbf{1}}{\mathbf{1}' \Sigma^{-1} \mathbf{1}}$   [4.54]
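The closed form [4.54] is straightforward to apply (an illustrative sketch using the chapter's five-asset universe), and one can check that no other fully invested portfolio achieves a lower variance:

```python
import numpy as np

vols = np.array([0.10, 0.20, 0.30, 0.40, 0.50])
corr = np.array([
    [ 1.00,  0.21, -0.49,  0.16, -0.31],
    [ 0.21,  1.00, -0.19, -0.21,  0.42],
    [-0.49, -0.19,  1.00,  0.48,  0.16],
    [ 0.16, -0.21,  0.48,  1.00,  0.02],
    [-0.31,  0.42,  0.16,  0.02,  1.00],
])
Sigma = corr * np.outer(vols, vols)
ones = np.ones(5)

w_gmv = np.linalg.solve(Sigma, ones)
w_gmv /= ones @ w_gmv                 # equation [4.54]
var_gmv = w_gmv @ Sigma @ w_gmv       # equals 1 / (1' Sigma^-1 1)

rng = np.random.default_rng(0)
for _ in range(1000):                 # random fully invested (long/short) portfolios
    w = rng.normal(size=5)
    w /= w.sum()
    assert w @ Sigma @ w >= var_gmv - 1e-12

print(np.round(w_gmv, 3), round(np.sqrt(var_gmv) * 100, 2))
```

Note that nothing prevents some of the resulting weights from being negative, which foreshadows the long/short behavior discussed in the next section.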


The expected return of the GMV portfolio is given by the following equation:

$\mu_{GMV} = \frac{\mu' \Sigma^{-1} \mathbf{1}}{\mathbf{1}' \Sigma^{-1} \mathbf{1}}$   [4.55]

and its variance is given by the following equation:

$\sigma_{GMV}^2 = \frac{1}{\mathbf{1}' \Sigma^{-1} \mathbf{1}}$   [4.56]

The GMV portfolio is generally a long/short and levered portfolio, as the fully invested constraint imposes nothing with regard to the sign of individual weights4. Note that it is often stated in the financial literature that the GMV portfolio is the only efficient portfolio whose weights do not depend on expected returns – see, for instance, Roll and Ross [ROL 77a] or Clarke et al. [CLA 06]. In our opinion, this statement is incorrect, as the expected returns of the components are either explicitly or implicitly used for the calculation of the covariance matrix $\Sigma$.

4.5.2. The GMV portfolio and diversification

The GMV portfolio aims at minimizing portfolio variance through an optimization process, but its objective function does not take into consideration the distribution of the weights in the resulting portfolio. When unconstrained, the GMV portfolio often ends up being highly leveraged, with very large short positions in some assets.

The concentrated nature of the GMV portfolio, as well as some of its implicit biases, are easy to demonstrate under the assumption of a single-index model – see Clarke et al. [CLA 10] and Scherer [SCH 11]. As we will see in Chapter 5, in such a case the covariance matrix is given by the following equation:

$\Sigma = \sigma_M^2 \beta \beta' + \Omega$   [4.57]

4 Roll [ROL 77b], Rudd [RUD 77] and Roll and Ross [ROL 77a] present qualitative arguments for the GMV portfolio to have positive weights. Best and Grauer [BES 92] discuss the conditions under which the GMV portfolio has positive weights.


where \beta is an N \times 1 vector of the betas \beta_i, \sigma_M^2 is the variance of the index and \Delta is the diagonal matrix of the idiosyncratic variances \sigma_{\varepsilon,i}^2. Using the Matrix Inversion Lemma, the inverse covariance matrix is analytically solvable (see Woodbury [WOO 49]) and we obtain the following equation:

\Sigma^{-1} = \Delta^{-1} - \frac{\sigma_M^2}{1 + \sigma_M^2\, \beta' \Delta^{-1} \beta}\, b\, b'   [4.58]

where b = \Delta^{-1} \beta is an N-by-1 vector of idiosyncratic risk-adjusted betas, b_i = \beta_i / \sigma_{\varepsilon,i}^2. In the absence of a non-negative constraint on the weights of the GMV portfolio, the weight of asset i is approximately given by the following equation:

w_{GMV,i} = \frac{\sigma_{GMV}^2}{\sigma_{\varepsilon,i}^2} \left(1 - \frac{\beta_i}{\beta_L}\right)   [4.59]

where the long/short threshold \beta_L is defined as follows:

\beta_L = \frac{1 + \sigma_M^2 \sum_j \beta_j^2 / \sigma_{\varepsilon,j}^2}{\sigma_M^2 \sum_j \beta_j / \sigma_{\varepsilon,j}^2}   [4.60]

Under some conditions, \beta_L can be approximated by the average beta of the universe of N stocks considered (see Clarke et al. [CLA 10]), and even more closely by saying that it should be close to one (the average beta of the full universe). Equation [4.59] then becomes:

w_{GMV,i} \approx \frac{\sigma_{GMV}^2}{\sigma_{\varepsilon,i}^2} \left(1 - \beta_i\right)   [4.61]

which leads to the following two observations. Firstly, all other things being equal, the GMV portfolio is likely to invest more capital into assets with a low residual risk in the factor model. Secondly, the GMV portfolio sizes its components in reverse order of their beta: it tends to hold long positions in low-beta assets and short positions in high-beta assets5. On the long side, the GMV portfolio will allocate more capital to assets with a smaller beta. On the short side, it will assign a larger negative weight to stocks with a larger beta. This

5 The implicit assumption here is that all the betas are positive: “high” or “low” is defined by reference to the market beta, which is equal to one.
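These closed-form expressions are easy to verify numerically. The sketch below (with simulated betas and idiosyncratic variances, an assumption of ours) checks that equations [4.59] and [4.60] reproduce the direct GMV solution under the single-index covariance [4.57]:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma_m2 = 10, 0.04                       # 10 assets, 20% market volatility
beta = rng.uniform(0.5, 1.5, n)              # hypothetical betas
eps2 = rng.uniform(0.01, 0.09, n)            # hypothetical idiosyncratic variances

sigma = sigma_m2 * np.outer(beta, beta) + np.diag(eps2)   # equation [4.57]

# Direct closed-form GMV weights (equation [4.54])
ones = np.ones(n)
x = np.linalg.solve(sigma, ones)
w_direct = x / (ones @ x)
var_gmv = 1.0 / (ones @ x)

# Threshold beta (equation [4.60]) and the implied weights (equation [4.59])
beta_l = (1 + sigma_m2 * np.sum(beta**2 / eps2)) / (sigma_m2 * np.sum(beta / eps2))
w_formula = var_gmv / eps2 * (1 - beta / beta_l)

print(np.allclose(w_direct, w_formula))      # True: assets with beta > beta_L are shorted
```

The agreement is exact in this setting, which makes the low-beta, low-residual-risk tilt of the GMV portfolio easy to inspect directly.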


behavior is of course completely inconsistent with traditional asset pricing theories such as the CAPM, which assume that expected returns increase as a function of beta. In a sense, the GMV portfolio claims to be agnostic with regard to expected returns, but it ends up making the implicit forecast that lower beta stocks will outperform higher beta stocks. Of course, one could argue that beta should not be considered as a proxy for return but as a proxy for risk, but these are two sides of the same coin.

4.5.3. The long-only GMV (LOGMV) portfolio

To force the MV portfolio to be long-only, one can simply add a constraint on its weights. The new optimization problem becomes:

w_{LOGMV} = \arg\min_w w' \Sigma w   [4.62]

which is subject to:

w' \mathbf{1} = 1   [4.63]

w_i \geq 0 \text{ for all } i   [4.64]
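There is no closed-form solution once the positivity constraint binds, but the problem remains a small convex quadratic program. The numpy-only sketch below solves it by projected gradient descent onto the simplex (our own illustrative choice; the references cited below use standard QP techniques), and reproduces a corner solution for two almost identical, perfectly correlated assets:

```python
import numpy as np

def project_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection onto {w : w >= 0, sum w = 1} (Duchi et al., 2008)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def logmv_weights(sigma: np.ndarray, iters: int = 20000) -> np.ndarray:
    """Long-only minimum variance weights by projected gradient descent."""
    n = sigma.shape[0]
    w = np.full(n, 1.0 / n)
    step = 1.0 / np.linalg.eigvalsh(sigma)[-1]   # step 1/L for f(w) = 0.5 w' Sigma w
    for _ in range(iters):
        w = project_simplex(w - step * (sigma @ w))
    return w

vols = np.array([0.199, 0.20])        # two assets, near-identical volatilities
sigma = np.outer(vols, vols)          # perfect correlation (+1)
w = logmv_weights(sigma)
print(w)                              # allocates (almost) everything to the 19.9% asset
```

A 0.1% change in one input volatility flips the allocation from an arbitrary split to a 100%/0% corner, which is precisely the instability discussed next.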

Unfortunately, it is no longer possible to obtain an analytic solution in this case. However, the problem may be solved numerically, as discussed by Best and Grauer [BES 90] and Chincarini and Kim [CHI 06]. Alternatively, Jagannathan and Ma [JAG 03] show that this new problem is equivalent to considering an alternative covariance matrix, shrunk towards a symmetric target that depends on the Lagrange multiplier associated with the positivity constraint. Through this process, the allocation to assets with the highest covariances is reduced, which means those assets are less likely to cause large negative weights.

4.5.4. The LOGMV portfolio and diversification

Unfortunately, the LOGMV portfolio is also poorly diversified, and very sensitive to the input covariance matrix. This can easily be illustrated by an example.


EXAMPLE 4.7.– Consider a portfolio of two assets with identical volatility (20%) and perfect correlation (+1). The LOGMV portfolio may combine these two assets in any proportion, for instance 50% and 50%. However, if the volatility of the first asset becomes 19.9%, it receives 100% of the allocation while the second asset gets nothing.

Like its unconstrained cousin, the LOGMV portfolio is biased and rules out many assets with average to high volatility. Once again, this can also easily be demonstrated under the assumption of a single-index model. In the presence of a non-negative constraint on all the weights, the weight of asset i is approximately given by the following equation:

w_{LOGMV,i} = \frac{\sigma_{LO}^2}{\sigma_{\varepsilon,i}^2} \left(1 - \frac{\beta_i}{\beta_{LO}}\right) \text{ if } \beta_i < \beta_{LO}, \qquad w_{LOGMV,i} = 0 \text{ otherwise}   [4.65]

where \sigma_{LO}^2 is the ex-ante return variance of the global long-only minimum variance portfolio, \sigma_{\varepsilon,i}^2 is the in-sample idiosyncratic variance of asset i and \beta_{LO} is the long-only threshold beta, which is defined as follows:

\beta_{LO} = \frac{1 + \sigma_M^2 \sum_{j \in LO} \beta_j^2 / \sigma_{\varepsilon,j}^2}{\sigma_M^2 \sum_{j \in LO} \beta_j / \sigma_{\varepsilon,j}^2}   [4.66]

where \sigma_M^2 is the in-sample single-index market model variance and the sums run over the assets held in the long-only portfolio. Equations [4.65] and [4.66] clearly illustrate that the LOGMV portfolio is likely to invest into low residual risk and low beta stocks, and consequently rule out many assets. In practice, the LOGMV portfolio is often a mathematical corner solution with null weights in most of the assets and very large holdings in the remaining assets.

4.5.5. Constrained MV portfolio construction

To avoid highly concentrated minimum variance portfolios, one may add constraints on asset weights. This introduces a trade-off between losing optimality on the variance of the portfolio while at the same time gaining diversification. We have already discussed the introduction of such


constraints in the more general setting of section 2.5.3 in Chapter 2. However, the case of the original minimum variance problem under an L_2 norm constraint on weights provides an interesting situation. Mathematically:

w^* = \arg\min_w w' \Sigma w   [4.67]

which is subject to:

w' \mathbf{1} = 1   [4.68]

w' w \leq \delta   [4.69]

Obviously, as \delta decreases, the resulting portfolio becomes more and more diversified. By using the Lagrangian model, one can show that the solution to this problem is as follows:

w^*(\lambda) = \frac{(\Sigma + \lambda I)^{-1} \mathbf{1}}{\mathbf{1}' (\Sigma + \lambda I)^{-1} \mathbf{1}}   [4.70]

where \lambda is a Lagrange multiplier. Equation [4.70] is equivalent to equation [4.54] if we shrink the covariance matrix towards the scaled identity matrix, with shrinkage intensity \lambda. Consequently, for \lambda = 0, we obtain the unconstrained minimum variance portfolio, and for \lambda \to \infty we have an equally weighted portfolio. An interesting value for the shrinkage intensity is given as follows:

\lambda^* = \min\{\lambda \geq 0 \,;\, w^*(\lambda) \geq 0\}   [4.71]

Using \lambda^* in equation [4.70] yields the long-only portfolio with the lowest possible volatility – see Coqueret [COQ 15]. Smaller values of \lambda result in a long/short portfolio that is less diversified, while higher values lead to a portfolio with non-minimal volatility. Another interesting value for the shrinkage intensity is:

\lambda^{**} = \min\{\lambda \geq 0 \,;\, w^*(\lambda)' w^*(\lambda) \leq (1/N + 1)/2\}   [4.72]

Using \lambda^{**} in equation [4.70] gives a “semi-diversified portfolio”, which is characterized by w' w \approx (1/N + 1)/2, approximately halfway between full diversification (w' w = 1/N) and no diversification at all (w' w = 1). Note that this portfolio is generally long/short and leveraged.
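A numerical sketch of equation [4.70] (hypothetical four-asset inputs) shows the interpolation between the GMV and equally weighted portfolios as the shrinkage intensity grows:

```python
import numpy as np

def shrunk_mv(sigma: np.ndarray, lam: float) -> np.ndarray:
    """Ridge-shrunk minimum variance weights, equation [4.70]."""
    a = sigma + lam * np.eye(sigma.shape[0])
    x = np.linalg.solve(a, np.ones(sigma.shape[0]))
    return x / x.sum()

vols = np.array([0.10, 0.15, 0.20, 0.25])
corr = np.full((4, 4), 0.4) + 0.6 * np.eye(4)
sigma = np.outer(vols, vols) * corr

for lam in (0.0, 0.01, 0.1, 10.0):
    w = shrunk_mv(sigma, lam)
    # the concentration measure w'w falls toward 1/N = 0.25 as lambda grows
    print(f"lambda={lam:5.2f}  w'w={w @ w:.3f}  weights={np.round(w, 3)}")
```

At λ = 0 the weights are the unconstrained GMV solution; for very large λ they are indistinguishable from 1/N.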


4.5.6. Empirical properties

The long-lasting success of the GMV portfolio is partially due to the large empirical literature that has documented its higher returns compared to capitalization-weighted benchmarks – see Table 4.2.

Source | Market, benchmark | Analyzed period | Results
Baker and Haugen [BAK 91] | U.S., Wilshire 5000 | 1972–1989 | 21% lower standard deviation and 22% higher returns
Chan et al. [CHA 99] | U.S., value-weighted index of 500 random stocks | 1973–1997 | 16.7% lower standard deviation and 8.6% higher returns
Schwartz [SCH 00] | U.S., S&P 500 | 1979–1998 | 30% lower standard deviation and 6.5% lower returns
Jagannathan and Ma [JAG 03] | U.S., value-weighted index of 500 random stocks | 1968–1999 | 21% lower standard deviation and 6% higher returns
Clarke et al. [CLA 06] | U.S., Russell 1000 | 1968–2005 | 28% lower standard deviation and 7% higher returns
Geiger and Plagge [GEI 07] | Germany: DAX 30, France: CAC 40, Japan: Nikkei, Switzerland: SMI, U.S.: S&P 500 | 2002–2006 | 12–45% lower standard deviation and 74–160% higher returns
Nielsen and Subramanian [NIE 08] | World, MSCI World | 1995–2007 | 26% lower standard deviation and 6% higher returns
Poullaouec [POU 08] | World, MSCI World | 1988–2008 | 23% lower standard deviation and 3.5% higher returns
Baker and Haugen [BAK 12] | 21 developed countries and 12 emerging markets | 1990–2011 | 12–25% lower standard deviations and 5–25% higher returns

Table 4.2. Examples of GMV portfolio performance studies. Most of these studies imposed upper weight limits and a short sales restriction

For Modern Portfolio Theory these higher returns are abhorrent. Since the minimum variance portfolio loads on assets that have low variances and covariances, it is expected to have low betas and thus deliver low expected returns. However, in reality, we get the exact opposite result. Fortunately, a closer look at the minimum variance portfolio provides some intuition


behind its outperformance. Firstly, as discussed in the previous sections, minimum variance investing implicitly picks up low beta and low residual risk stocks, which are well-known and well-documented pricing anomalies in a variety of markets and asset classes. Secondly, unless one uses explicit constraints, minimum variance portfolios tend to display a strong concentration in terms of assets and sectors. We could therefore summarize the situation by saying that minimum variance investing has historically outperformed, but only thanks to significant, implicit bets on assets, sectors and factors.

4.6. Revisiting portfolio construction with a risk-based view

One may revisit some of the portfolio approaches discussed previously and compare them from an allocation perspective. In fact, many of these approaches allocate something in equal proportions, but that something differs from one approach to the other. For instance, the naive portfolio allocates capital equally across assets:

w_i = \frac{1}{N} \text{ for all } i   [4.73]

Therefore, the naive portfolio can be considered as being well-diversified from an asset weight perspective. The GMV portfolio equally allocates the marginal contributions to risk, that is:

\frac{\partial \sigma_P}{\partial w_i} = \frac{\partial \sigma_P}{\partial w_j} \text{ for all } i, j   [4.74]

Intuitively, if this were not the case, it would be possible to reduce the portfolio variance by carrying out a small capital reallocation between the assets with the highest and the lowest marginal variance in the current allocation. The GMV portfolio is therefore well-diversified in terms of marginal contributions to risk: an incremental addition to the weight of an asset will increase the risk of the MVP by the same quantity as an identical incremental addition to the weight of any other asset. The ERC portfolio equally allocates total risk contributions, that is:

w_i \frac{\partial \sigma_P}{\partial w_i} = w_j \frac{\partial \sigma_P}{\partial w_j} \text{ for all } i, j   [4.75]


The ERC portfolio can therefore be considered as being well-diversified from a risk contribution perspective. The MDP portfolio equally allocates the marginal risk divided by the volatility, that is:

\frac{1}{\sigma_i} \frac{\partial \sigma_P}{\partial w_i} = \frac{1}{\sigma_j} \frac{\partial \sigma_P}{\partial w_j} \text{ for all } i, j   [4.76]
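These characterizations can be checked numerically. The sketch below (hypothetical covariance matrix) verifies that the GMV portfolio equalizes marginal contributions to risk and that the total contributions add up to the portfolio volatility (the Euler decomposition):

```python
import numpy as np

vols = np.array([0.10, 0.15, 0.20])
corr = np.full((3, 3), 0.3) + 0.7 * np.eye(3)
sigma = np.outer(vols, vols) * corr

# GMV weights via equation [4.54]
x = np.linalg.solve(sigma, np.ones(3))
w = x / x.sum()

sigma_p = np.sqrt(w @ sigma @ w)
marginal = sigma @ w / sigma_p          # d(sigma_P) / d(w_i)
total = w * marginal                    # total risk contributions

print(np.allclose(marginal, marginal[0]))   # True: equation [4.74] holds at the GMV
print(np.isclose(total.sum(), sigma_p))     # True: contributions sum to sigma_P
```

The first check works because w ∝ Σ⁻¹1 implies Σw ∝ 1; the second is the Euler decomposition of volatility, which underlies the ERC definition [4.75].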

The MDP portfolio can therefore be considered as well-diversified in terms of relative or scaled marginal risk. Alternatively, because the correlations between its components and the portfolio itself are uniform, the MDP portfolio can also be considered as being well-diversified from a correlation perspective. In parallel, one can also show that the volatilities of the MV, ERC and EW portfolios may be ranked in the following order: \sigma_{GMV} \leq \sigma_{ERC} \leq \sigma_{EW}. Unfortunately, the MDP portfolio cannot systematically be included in such a ranking, apart from the obvious \sigma_{GMV} \leq \sigma_{MDP}.

4.7. Conclusions on risk-based approaches

Risk-based portfolio construction represents a novel approach in asset management. Unlike traditional approaches, it only requires the covariance matrix as an input, and therefore it is not exposed to the difficulties of having to estimate expected returns. Examples of such strategies are the minimum risk portfolio, the risk parity portfolio and the maximum diversification portfolio. An important element for all risk-based strategies is their high dependence on the return covariance matrix. While a relatively large number of research papers have covered the impact of the accuracy of covariance matrix forecasts on the performance of mean-variance portfolios, unfortunately much less research has been undertaken on their impact on risk-based strategies other than minimum variance6.

6 See for instance Chan et al. [CHA 99], Ledoit and Wolf [LED 03], Zakamulin [ZAK 15] or Hurley and Brimberg [HUR 15].


From a personal viewpoint, we confess that we prefer the maximum diversification portfolio to the other two portfolios. The minimum risk portfolio tends to be more concentrated and its allocation is usually highly time-varying, unless one adds some constraints on asset weights. The risk parity portfolio is an interesting concept, but by construction it allocates to all assets in the investment universe. This makes it extremely sensitive to the definition of the underlying universe and can lead to significant biases if there are some highly correlated assets and each of them receives an equal risk allocation.

A possible enhancement to risk parity worth briefly discussing is the equal risk bounding approach suggested by Cesarone [CES 16]. Essentially, its key idea is to require that the risk contributions of all assets to the variance of the portfolio do not exceed a given threshold, which might then be minimized. As an initial threshold one could use, for example, the equal risk contribution achieved by the risk parity portfolio. The result is a hard non-convex quadratic programming problem with quadratic constraints, but it can be translated into a finite series of simpler risk parity problems.

5 Factor Models and Portfolio Diversification

Factor models have been around since the early days of modern portfolio theory. In fact, in the same way that Mr Jourdain1 was surprised and delighted to learn that he had been speaking prose all his life without knowing it, investors were using factor models long before these notions were ever formalized in finance. For instance, from an asset allocation perspective, equity investors bullish on small caps would typically increase the proportion of their portfolio allocated to this category and reduce the proportion allocated to larger caps. From a factor perspective, they are decreasing their exposure to the “market capitalization” factor, which is typically defined as a long large caps and short small caps dollar-neutral portfolio. In this simple example, investors have been using factor models without knowing it. Fortunately, as we will see in this chapter, factor models have much more to offer than a simple change of terminology.

5.1. Factor models

5.1.1. Introduction to factor models

In its standard form, a factor model posits that the returns of N assets can be explained by a set of K common factors. Mathematically, we have for asset i:

R_i = \alpha_i + \sum_{k=1}^{K} \beta_{i,k} F_k + \varepsilon_i   [5.1]

1 Mr Jourdain is the main character in “Le bourgeois gentilhomme”, a famous play penned in 1670 by Jean-Baptiste Poquelin, a.k.a. Molière.


where R_i is the return on asset i and F_k is the return on the k-th factor. The term \beta_{i,k} is the sensitivity of asset i to the factor F_k. It is calculated as the ratio of the covariance between asset and factor returns to the variance of factor returns:

\beta_{i,k} = \frac{\mathrm{Cov}(R_i, F_k)}{\mathrm{Var}(F_k)}   [5.2]

The terms \alpha_i and \varepsilon_i in equation [5.1] represent the specific parts of the return of asset i, which cannot be explained by its exposures to the factors. The term \alpha_i is constant2, whereas the term \varepsilon_i is a residual return, which, by construction, has an expected value of zero and is independent of the factors. In this chapter, for the sake of simplicity, we will only discuss linear factor models with constant sensitivities3. It is important to note that:

– A factor can be any measurable characteristic commonly shared by multiple assets. A few examples of factors will be discussed in section 5.2.

– The model defined by equations [5.1] and [5.2] puts no empirical restrictions on asset or factor returns beyond requiring that their means and variances exist. To make it useful and reduce its dimensionality, additional assumptions on the structure of the model are required. A frequently used assumption is that the asset-specific returns are uncorrelated with each other. Mathematically, this implies that their covariance matrix \Delta is diagonal.

– In most factor models, the factors are correlated, with a covariance matrix denoted by \Omega. We will assume that the rank of \Omega equals K, otherwise one of the factors can be eliminated to create an equivalent model with only K-1 factors. We will denote by \Omega_{k,l} the covariance between factors k and l.

2 In an exact factor pricing model, there are no non-risk-based components in expected returns and all alphas are equal to zero. In this case, the portfolio with the maximum Sharpe ratio (the tangency portfolio in the mean standard deviation space) is a linear combination of the k factors. In an approximate factor pricing model, in which the k factors are not sufficient to fully explain returns, then some αi will differ from zero. 3 The case of factor models and diversification when sensitivities are time-varying are discussed in Chen and Keown [CHE 81a, CHE 81b].
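As a small simulated illustration of equation [5.2] (all numbers hypothetical), the sensitivity recovered as a covariance-to-variance ratio matches the slope used to generate the data:

```python
import numpy as np

rng = np.random.default_rng(1)
t = 5000
f = rng.normal(0.0, 0.04, t)                 # one factor's return series
true_beta, alpha = 1.3, 0.001
r = alpha + true_beta * f + rng.normal(0.0, 0.02, t)   # asset returns, eq. [5.1]

# Equation [5.2]: beta as Cov(R_i, F_k) / Var(F_k)
beta_hat = np.cov(r, f)[0, 1] / np.var(f, ddof=1)
print(beta_hat)                              # close to 1.3
```

This ratio coincides with the slope of a univariate OLS regression of asset returns on factor returns, which is how sensitivities are typically estimated in practice.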


– Although there is no constraint on this, in practice, K is usually much smaller than N. In an extreme case, K = N and each factor corresponds to an asset. Then, the K-factor model is equivalent to the original N-asset model.

With a universe of N assets and K factors, we have:

\begin{pmatrix} R_1 \\ \vdots \\ R_N \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_N \end{pmatrix} + \begin{pmatrix} \beta_{1,1} & \ldots & \beta_{1,K} \\ \vdots & \ddots & \vdots \\ \beta_{N,1} & \ldots & \beta_{N,K} \end{pmatrix} \begin{pmatrix} F_1 \\ \vdots \\ F_K \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_N \end{pmatrix}   [5.3]

or equivalently in matrix form:

R = \alpha + B F + \varepsilon   [5.4]

where B is an N \times K matrix of factor betas, F is a K-vector of the factor returns and \alpha and \varepsilon are N-vectors of asset-specific returns. From there, we can derive the factor-based expected returns of the assets:

E(R) = \alpha + B\, E(F)   [5.5]

as well as their factor-based covariance matrix:

\Sigma = B \Omega B' + \Delta   [5.6]

Equation [5.6] defines a strict factor model. The first term, B \Omega B', which is entirely explained by the factors, is sometimes called systematic covariance. The second term, \Delta, is idiosyncratic; it only contains terms which are specific to each asset. For all assets i and j, we have:

\sigma_{i,j} = \beta_i' \Omega \beta_j + \mathrm{Cov}(\varepsilon_i, \varepsilon_j)   [5.7]

where \beta_i = (\beta_{i,1}, \ldots, \beta_{i,K})' is a K \times 1 vector of asset i’s exposures to the factors and \mathrm{Cov}(\varepsilon_i, \varepsilon_j) is the covariance between the residuals of assets i and j. As by assumption the covariance matrix \Delta is diagonal, \mathrm{Cov}(\varepsilon_i, \varepsilon_j) will equal zero if i and j are different, but will equal the variance of the asset’s residual if i = j. It is important to note that if the factor model does not entirely explain the covariance of the asset returns, equation [5.6] becomes:

\Sigma = B \Omega B' + \Delta + \Gamma   [5.8]


where \Gamma represents the expectation of the residuals’ pure covariance terms. This is called an approximate factor model and was originally developed by Chamberlain and Rothschild [CHA 83b]. This could happen, for instance, if there is a model error, for example, a relevant factor has been omitted, or if there is a sampling error – see Lhabitant [LHA 04a]. In such a case, as \Delta accounts for all the diagonal elements, \Gamma only contains non-diagonal elements. We can show that the variance not accounted for by the strict factor model has a non-zero correlation with the common factors. In practice, users of factor models usually ignore this point and assume – most of the time with no justification – that equation [5.6] is correct.

An initial benefit of using factor models is the reduction of the dimensionality of the portfolio selection problem. As discussed in section 2.4.2, estimating the unrestricted covariance matrix for N assets requires the estimation of N(N+1)/2 parameters. By comparison, estimating the same matrix using equation [5.6] and a K-factor model is much simpler, because it only requires the estimation of K(K+1)/2 terms for the matrix \Omega, plus N \times K terms for the beta matrix B and N terms for \Delta, which is diagonal. As an illustration, for a universe of 500 assets, we need to estimate 125,250 parameters, versus only 1,001 in a one-factor model, or 3,015 in a five-factor model. Although the number of estimated parameters remains high, it is much smaller than initially, which should reduce estimation errors and ultimately increase portfolio diversification4.

A second benefit of factor models is a computational improvement when calculating the inverse of \Sigma:

\Sigma^{-1} = \Delta^{-1} - \Delta^{-1} B \left(\Omega^{-1} + B' \Delta^{-1} B\right)^{-1} B' \Delta^{-1}   [5.9]

This expression seems lengthy, but it only requires the inversion of an N \times N diagonal matrix and a K \times K matrix, which is much easier than inverting the original unrestricted covariance matrix. Using a factor model therefore simplifies the calculation of optimized portfolios that are based on \Sigma^{-1}. Additional benefits of factor models will be introduced and discussed later in this chapter.

4 As discussed in section 2.4.3, estimation errors combined with optimizers often result in highly concentrated portfolios.
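Both benefits can be illustrated with hypothetical dimensions (N = 500 assets, K = 5 factors; the diagonal factor covariance below is a simplifying assumption of ours):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 500, 5
b = rng.normal(0.0, 0.3, (n, k))                 # N x K factor betas
omega = np.diag(rng.uniform(0.01, 0.04, k))      # K x K factor covariance
delta = np.diag(rng.uniform(0.01, 0.09, n))      # diagonal idiosyncratic matrix

sigma = b @ omega @ b.T + delta                  # equation [5.6]

# Equation [5.9]: only Delta (diagonal) and a K x K matrix are inverted
d_inv = np.diag(1.0 / np.diag(delta))
core = np.linalg.inv(np.linalg.inv(omega) + b.T @ d_inv @ b)
sigma_inv = d_inv - d_inv @ b @ core @ b.T @ d_inv

print(np.allclose(sigma_inv, np.linalg.inv(sigma)))   # True

# Parameter counts quoted in the text: unrestricted vs five-factor model
print(n * (n + 1) // 2, k * (k + 1) // 2 + n * k + n)
```

The second print reproduces the 125,250 versus 3,015 comparison from the text.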


5.1.2. Examples of factors and factor models

For the sake of illustration, let us now describe a few factor models. One of the simplest ones is the single-index model introduced by Sharpe [SHA 63], which uses the return on an index of stocks called the “market” as a factor. Mathematically:

R_i = \alpha_i + \beta_i R_M + \varepsilon_i   [5.10]

with the assumption that the residual terms \varepsilon_i are uncorrelated to market returns and to one another. By taking the covariance of both sides of equation [5.10], we obtain the variance of asset i as:

\sigma_i^2 = \beta_i^2 \sigma_M^2 + \sigma_{\varepsilon,i}^2   [5.11]

We can see that the variance of each asset is made of two parts: a systematic risk \beta_i^2 \sigma_M^2, which is purely index-driven and thus common to all assets, and a specific risk \sigma_{\varepsilon,i}^2, which is unique, unrewarded and should therefore be eliminated as much as possible through diversification. Because it is assumed that the residual terms are uncorrelated, the sole source of covariance between two assets is the general market, and:

\sigma_{i,j} = \beta_i \beta_j \sigma_M^2   [5.12]

Although the single-index model is simple and convenient, a wealth of empirical research has shown that it is in itself not sufficient to explain the cross-section of expected asset returns. Additional factors may be required, which naturally opens the way to multi-factor models. In practice, a very long list of multi-factor models is available, but most of these just provide a description of the return generation process, with not much theory behind it5. They are usually classified in one of the following three categories:

5 The exceptions are factor models based on arbitrage arguments or on equilibrium arguments. Let us mention, for instance, the Arbitrage Pricing Theory (APT) developed by Ross [ROS 76] for the former and the Intertemporal Capital Asset Pricing Model (ICAPM) developed by Merton [MER 73] for the latter. We will not discuss them here.
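On hypothetical numbers, equation [5.11] splits an asset's variance into its systematic and specific parts:

```python
# Hypothetical inputs: beta 1.2, market volatility 18%, residual volatility 25%
beta_i, sigma_m, sigma_eps = 1.2, 0.18, 0.25

var_total = beta_i**2 * sigma_m**2 + sigma_eps**2      # equation [5.11]
share_systematic = beta_i**2 * sigma_m**2 / var_total

print(f"total vol {var_total**0.5:.1%}, systematic share {share_systematic:.0%}")
```

Diversification can only reduce the specific part; the systematic share is what remains in a well-diversified portfolio.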


– Macro-economic factor models, in which the common factor variables are observable economic variables and/or financial time series. For instance, Chen et al. [CHE 86] used the monthly growth rate of industrial production, the change in expected inflation, the unexpected inflation, the risk premium of the market and the term structure of interest rates (long-term government bond return minus the Treasury bill rate) as factors. Fama and French [FAM 92, FAM 93, FAM 96] suggested a model with the market return, the return of small cap minus large cap stocks (SMB) and the return of high minus low book-to-market stocks (HML) as factors. Carhart [CAR 97] extended this model by adding an additional factor which captures the momentum anomaly by going long past winners and short past losers (WML). Fama and French [FAM 15] revisited their original model by adding a profitability factor defined as robust minus weak (RMW) and an investment factor defined as conservative minus aggressive (CMA), and so on.

– Fundamental factor models, in which the common factor variables are determined using fundamental, asset-specific attributes such as sector or industry membership, firm size (market capitalisation), dividend yield and firm style (value or growth as measured by price-to-book, earnings-to-price, etc.). A well-known example of such a factor model is the GEM2 model sold by Barra, which uses 142 individual factors, grouped as follows: a World factor that reflects the movements of the global equity market, eight global risk indices (value, momentum, liquidity, growth, size, nonlinear size, volatility, financial leverage), 34 industries based on the GICS classification scheme, 55 countries and 44 currencies. Several additional commercial providers have since developed similar models.

– Statistical factor models, in which the common factor variables are hidden (latent) and whose structure is deduced statistically from the analysis of the observed asset returns. We will discuss these models in section 5.5.

In the following sections, we will assume that all factors are investable. For some factors, this is a relatively light assumption – we just need to define an investable proxy. For instance, gaining exposure to the “equity risk premium” can be done by going long a stock index and short T-Bills; gaining exposure to the “size” factor in equities can be done by going long a developed country equity small-cap index and short a developed country


equity large-cap index; gaining exposure to the “credit spread” can be done by going long a high-quality credit index and short a Treasuries index with similar durations; gaining exposure to the “inflation” factor can be done by going long a nominal Treasuries index and short a TIPS index and so on6. For other factors (for instance, GDP growth and unemployment), finding an investable proxy might be more difficult. For the sake of simplicity, we will assume that factor-mimicking portfolios exist and are invested in the same original assets7. We will denote by w_{k,i} the relative weight of asset i in the factor-mimicking portfolio of factor k. Unless otherwise stated, we will also assume that each factor is perfectly replicated by its factor-mimicking portfolio, so that:

F_k = \sum_{i=1}^{N} w_{k,i} R_i   [5.13]
It is important to note that the resulting factor-mimicking portfolio may contain long and short positions, and may be levered or not fully allocated.

5.1.3. Viewing portfolios from a factor lens

Portfolios are traditionally viewed through an asset lens. That is, they are defined in terms of the relative capital (w) allocated to the assets in the universe, whose returns are given by R and covariance matrix by \Sigma. The risk and return of the portfolio are measured from the risk and return of its underlying assets. As seen in equations [2.12] and [2.14], we have:

R_P = \sum_{i=1}^{N} w_i R_i = w' R   [5.14]

6 However, in practice, we need to be cautious as several factor models, especially those issued by the early academics, appear to be investable but were not designed for actual implementation. For instance, in the United States, many equity factors are calculated using all the stocks in the CRSP US Stock Databases, which includes all securities with primary listings on the NYSE, NYSE MKT and NASDAQ. The resulting portfolios typically involve long and short positions in very small market capitalizations, which are often illiquid, subject to very large transaction costs and large borrowing costs. In addition, these portfolios need to be rebalanced monthly, leading to a high turnover that would be complicated and costly to execute in practice. 7 If the creation of a factor-mimicking portfolio requires an additional asset, we can simply augment the original investment universe to include that asset and apply the same logic on the new investment universe made of N + 1 assets.


and:

\sigma_P^2 = \sum_{i=1}^{N} \sum_{j=1}^{N} w_i w_j \sigma_{i,j} = w' \Sigma w   [5.15]

In this context, portfolio construction is done in terms of asset weights, and portfolio diversification is assessed by looking at the distribution of the asset weights w_i.

Alternatively, portfolios may be viewed through a factor lens. In this case, they are defined in terms of exposures (b) to the factors, whose returns are given by F and covariance matrix by \Omega. The risk and return of the portfolio are measured from the risk and return of these factors. Combining equations [5.1] and [5.14], we may write the return on a portfolio as:

R_P = \alpha_P + w' B F + \varepsilon_P   [5.16]

with \alpha_P = w' \alpha and \varepsilon_P = w' \varepsilon. Taking the variance of both sides yields the variance of the portfolio:

\sigma_P^2 = w' B \Omega B' w + w' \Delta w   [5.17]

Defining the vector of portfolio exposures to the factors as b = B' w allows us to rewrite equation [5.16] as:

R_P = \alpha_P + b' F + \varepsilon_P   [5.18]

and equation [5.17] as:

\sigma_P^2 = b' \Omega b + w' \Delta w   [5.19]
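A sketch of the decomposition in equation [5.19], with hypothetical exposures and factor covariances; the printed share is the systematic fraction of portfolio variance:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 9, 4
b_mat = rng.normal(0.5, 0.3, (n, k))             # asset exposures to 4 factors
omega = np.diag([0.03, 0.02, 0.01, 0.005])       # factor covariance (diagonal here)
delta = np.diag(rng.uniform(0.005, 0.02, n))     # idiosyncratic variances

w = np.full(n, 1.0 / n)                          # equally weighted portfolio
b = b_mat.T @ w                                  # portfolio factor exposures b = B'w

var_factor = b @ omega @ b                       # systematic part of eq. [5.19]
var_idio = w @ delta @ w                         # residual part w' Delta w
share = var_factor / (var_factor + var_idio)
print(f"systematic share of variance: {share:.0%}")
```

This is the computation behind statements such as "close to 90% of the underlying risk comes from one factor": it only requires b, Ω and Δ, not the full asset-level covariance matrix.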

In this context, portfolio optimization is performed in terms of the exposures to the factors, and portfolio diversification is assessed by looking at the distribution of the betas b_k, as well as at the size of the vector of exposures b. It is important to note that if a portfolio is defined by a K \times 1 vector of exposures to the factors, it is always possible to calculate its weights in terms of the original underlying assets. This involves multiplying the


portfolio beta to each factor by the weights of its factor-mimicking portfolio. Consequently, the portfolio exposure to asset i is given by:

w_i = \sum_{k=1}^{K} b_k\, w_{k,i}   [5.20]
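A toy application of equation [5.20], with a hypothetical 2-factor, 3-asset mimicking structure:

```python
import numpy as np

# Hypothetical K x N matrix of factor-mimicking portfolio weights w_{k,i}:
# factor 1 is long assets 1 and 2; factor 2 is a dollar-neutral long/short
w_mimic = np.array([[0.5, 0.5, 0.0],
                    [1.0, 0.0, -1.0]])
b = np.array([0.8, 0.3])            # target factor exposures

w_assets = w_mimic.T @ b            # w_i = sum_k b_k * w_{k,i}, equation [5.20]
print(w_assets)                     # asset weights implied by the exposures
```

Here the implied asset weights are 0.7, 0.4 and −0.3, illustrating that the factor view and the asset view of the same portfolio are interchangeable.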

The asset and the factor view of the portfolio are therefore completely equivalent. However, viewing a portfolio through the factor lens clearly highlights the real risk drivers behind it, while the traditional asset lens may not always capture them.

EXAMPLE 5.1.– Consider a portfolio allocated to three asset classes, namely treasury bonds, equities and convertible bonds, and a two-factor model made of equity risk and interest rate risk. Although technically different, the three asset classes have common risk exposures. Treasury bonds will primarily be exposed to interest rate risk, whereas equities will be exposed to equity risk. However, convertible bonds are a hybrid asset class and will typically exhibit time-varying exposure to both the equity risk factor and the interest rate risk factor. Depending on the evolution of their underlying stocks, the convertible bonds will lead to a higher concentration of portfolio risk into the equity risk factor or into the interest rate risk factor. While the asset class lens will not be able to capture this because it keeps thinking in terms of three distinct market segments, a factor model will easily evidence it and analyze the portfolio as a dynamic allocation between two risk sources.

EXAMPLE 5.2.– In 2008, many investors complained that all the components in their balanced portfolios moved in lockstep and ultimately collapsed at the same time. In hindsight, this was highly predictable. Consider, for instance, the hypothetical portfolio represented in Figure 5.1. It is invested across nine asset classes, namely Global Equities (MSCI World), U.S. Bonds (Barclays U.S. Aggregate Index), International Bonds (Barclays Global Aggregate USD-Hedged Index), Inflation Linked Bonds (Barclays U.S. TIPS Index), Private Equity (Cambridge Associate U.S. Private Equity Index), Hedge Funds (HFRI FoF Diversified Index), Real Estate (NCREIF Property Index), Commodities (Bloomberg Commodity TR Index) and Cash (3 month US Libor Index).
This portfolio looks relatively balanced from an asset perspective, with half of its risk in equities and half in other asset classes.


Figure 5.2 shows the same portfolio viewed through a four-factor lens (the four factors are developed equity markets, emerging equity markets, corporate bonds and currencies). The perspective is completely different, with the exposure to the developed markets equity risk factor representing close to 90% of the underlying risk. With so much risk commonality, not much diversification should be expected in this portfolio should equity markets collapse. Looking at the world through the factor lens would have allowed investors to go beyond asset labels and to observe the effective but unintentional risks embedded in their portfolios.

EXAMPLE 5.3.– A few years ago, the Danish pension fund ATP adopted a factor-based investment approach, which they detailed in their 2016 annual report. Figure 5.3 shows a simplified view of the result. Most asset classes are predominantly exposed to the equity factor, followed by the interest rate and inflation factors. According to this, a private equity investment would not add much diversification to an equity portfolio, even if the correlation between the two might appear to be low, due to stale pricing and irregular valuations. In a sense, the lack of liquidity results in lower correlations and is therefore interpreted as a potential source of diversification, which is wrong. Similarly, credit investments have a significant exposure to the equity risk factor, despite investing in a different part of the capital structure. They will generally not be a great source of diversification for equity portfolios.

Figure 5.1. Asset class view of a portfolio (capital allocation)


Figure 5.2. Factor exposure view of a portfolio

Figure 5.3. Factor-based view of various asset classes, according to ATP 2016 annual report


Figure 5.4. Risk structure for the four major factors for traditional assets, according to ATP 2016 annual report

Interestingly, ATP has also set up a long-term "all-weather" guideline for the composition of risk in their portfolio, which is as follows: 35% to the equity factor, 35% to the interest rate factor, 15% to the inflation factor and 15% to other factors. It is worth noting that each risk factor can also be defined in terms of allocations to a mimicking portfolio, as illustrated in Figure 5.4.

5.1.4. Portfolio diversification with factor models

Portfolio diversification in a factor-based world is more complex than that in an asset-based world, essentially because it may occur at three different levels: (i) the factor selection process; (ii) the construction of the factor-mimicking portfolios and (iii) the allocation to the factors to create the portfolio.


– The factor selection process. As discussed earlier, there are several hundred observable variables deemed to have explanatory power for asset returns and which could be considered as factors. Some are grounded in academic research, while others are just intuitive, purely empirical or chosen on the basis of the nature of the underlying assets. When selecting some of them to be included in a parsimonious factor model, they should ideally display a relatively low level of correlation. The intuition is that highly positively or negatively correlated factors ultimately capture very similar information.

Code   Name                  Description
EG     Dev. mkts. growth     MSCI World ("World")
VAL    Value premium         World Value – World Growth
SIZ    Size premium          World Small Cap – World Large Cap
EMG    Emg. mkts growth      MSCI Emg. Markets – World
HY     Credit spreads        Barclays High Yield – Barclays Invt. Grade
DEF    Default risk          Barclays Aaa – Barclays BBB
DUR    Duration risk         Barclays 20Yr TBonds – Barclays 1–3Yr TBonds
RR     Real interest rates   Barclays TIPS
INF    Inflation             Barclays Treasuries – Barclays TIPS
VOL    Volatility            VIX Index

Table 5.1. Examples of equity, fixed income and macro-economic factors

As an illustration, Table 5.1 lists 10 of the most widely studied factors and their investable proxies. We do not claim that this risk factor list is exhaustive or that the investable proxies are the best ones. They are just examples of what is commonly used. Table 5.2 shows the correlations between these factors over a 15-year period. Most of them are much lower than the correlations between assets, asset classes, sectors, countries and other factors discussed in section 2.3. Several of them are negative. This suggests that diversification could have higher potential benefits when allocating capital between factors, rather than between assets, asset classes, sectors, countries and so on. Obviously, the expected diversification benefits should increase if the factors selected are loosely correlated.

       EQ      VAL     SIZ     EMG     HY      DEF     DUR     RR      INF
VAL   −0.06
SIZ    0.04    0.09
EMG    0.33   −0.06    0.36
HY     0.64    0.02    0.30    0.38
DEF    0.48    0.00    0.31    0.33    0.73
DUR   −0.22    0.04   −0.06   −0.15   −0.40   −0.33
RR     0.07   −0.02    0.09    0.12   −0.03    0.18    0.50
INF   −0.35    0.03   −0.22   −0.34   −0.48   −0.59    0.23   −0.63
VOL   −0.68    0.08    0.08   −0.31   −0.47   −0.37    0.20    0.00    0.28

Table 5.2. Correlation between equity, fixed income and macro-economic factors over a 15-year period. Non-positive correlations are highlighted in gray
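The construction of such long-short factor proxies and the estimation of their correlation matrix can be sketched as follows. The return series below are simulated placeholders standing in for the indices of Table 5.1, so the resulting numbers are purely illustrative:

```python
# Sketch: estimating the correlation matrix of long-short factor proxies.
# The index return series are simulated -- in practice they would be loaded
# from a data provider for the indices listed in Table 5.1.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
months = 180  # 15 years of monthly returns, as in Table 5.2
world = pd.Series(rng.normal(0.006, 0.040, months), name="World")
value = pd.Series(rng.normal(0.005, 0.045, months), name="Value")
growth = pd.Series(rng.normal(0.006, 0.045, months), name="Growth")
tips = pd.Series(rng.normal(0.003, 0.010, months), name="TIPS")
tsy = pd.Series(rng.normal(0.003, 0.012, months), name="Treasuries")

factors = pd.DataFrame({
    "EQ": world,            # developed markets growth proxy
    "VAL": value - growth,  # value premium: long value, short growth
    "INF": tsy - tips,      # inflation: long nominal, short real bonds
})
corr = factors.corr()       # pairwise factor correlations
print(corr.round(2))
```

Repeating the exercise over rolling windows would also show the relative stability of factor correlations mentioned in the text.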

– The construction of the factor-mimicking portfolios. Once the factors have been identified, we need to build their investable proxies. The questions are essentially (1) how many assets should be included in each factor-mimicking portfolio and (2) how should these assets be weighted? Both decisions have an impact on the level of diversification and on the quality of


the factor-mimicking portfolios. For instance, we could argue that the factor-mimicking portfolio for the Developed Markets Growth factor of Table 5.1 should contain only Apple. While Apple's market capitalization is very large and its returns explain some of the general behaviors of developed market stocks, it should be obvious that such a single-asset factor-mimicking portfolio would be excessively concentrated, severely biased toward the technology sector and toward larger capitalizations, and poorly diversified in the sense that it would contain unrewarded specific risks. We could disagree on the latter point and claim that Apple will largely outperform the market, due to specific reasons. However, this is a return forecast, not a risk forecast, and it has nothing to do with diversification. Ideally, a good factor-mimicking portfolio should be representative of its universe, pure (not exposed to undesired other factors, which should be neutralized) and well diversified (no specific risk). Clearly, a single-stock portfolio made of Apple does not tick any of these boxes.

– The allocation to the factors. Once the factors have been selected and the factor-mimicking portfolios created, the last step is to decide how much exposure a portfolio should have to each factor. As shown in equation [5.19], the specific variance of the portfolio contributes to the total portfolio variance. In a well-diversified portfolio with no return views, this specific variance is not expected to be rewarded and should therefore be minimized. The complexity comes from the fact that we are now dealing with betas to factors, while the calculation of the specific variance also involves asset weights. Fortunately, we can translate one into the other so that the complexity is only technical. It should be remembered that ultimately we are still allocating capital to the same assets, but we are trying to do it in a more efficient way.
The factor-based portfolio diversification problem is therefore of the same nature as the asset-based portfolio diversification problem discussed in Chapters 1 to 4, except that: (i) we are in a lower-dimension universe, with factors instead of assets; (ii) when factors are appropriately selected (i.e. sufficiently different from each other), their correlations tend to be lower than the correlations between assets and (iii) factor correlations tend to be more stable over time than correlations between single assets.
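The translation between asset weights and factor exposures can be sketched as follows under the linear factor model of this section. All loadings, covariances and weights below are illustrative assumptions, not values from the text:

```python
# Sketch: translating asset weights into factor exposures and decomposing
# portfolio variance into a factor part and a specific part, under a linear
# two-factor model (equity risk, interest rate risk). Numbers are assumed.
import numpy as np

B = np.array([[0.1, 0.9],    # factor loadings: treasury bonds
              [1.0, 0.0],    # equities
              [0.6, 0.4]])   # convertible bonds (hybrid exposure)
sigma_F = np.array([[0.0225, -0.0010],   # factor covariance matrix
                    [-0.0010, 0.0040]])
delta = np.diag([0.0001, 0.0025, 0.0009])  # asset-specific variances
w = np.array([0.4, 0.4, 0.2])              # asset weights

b = B.T @ w                           # factor exposures of the portfolio
factor_var = b @ sigma_F @ b          # systematic (factor) variance
specific_var = w @ delta @ w          # unrewarded specific variance
total_var = factor_var + specific_var # = w' (B Sigma_F B' + Delta) w
assert np.isclose(total_var, w @ (B @ sigma_F @ B.T + delta) @ w)
```

The final assertion checks that the two views of portfolio variance, asset-based and factor-based, agree, which is the equivalence the text relies on.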


Unfortunately, the old issues are still the same. To be able to run a mean-variance optimization, investors need to provide their optimizer with estimates of the factors' expected returns8. This opens the door to estimation errors and potentially concentrated and unstable factor-based portfolios. The solutions discussed in Chapters 3 and 4 are applicable here: for example, naïvely allocate an equal amount of capital to each factor-mimicking portfolio, create a risk-parity portfolio from the factors or even a partial risk parity for some factors but not all of them. How would this work? Consider, for instance, a universe of N assets viewed through the lens of K factors, out of which K_c are country-related (for instance) and K_n are non-country-related. As given in equation [5.6], we have:

Σ = B Σ_F B′ + Δ    [5.21]

where B is the N × K matrix of factor exposures, Σ_F is the K × K factor covariance matrix and Δ is a diagonal matrix of asset-specific variances. We can split the matrices B and Σ_F into a country-related part (denoted by "c") and a non-country-related part (denoted by "n"). We get:

Σ = B_c Σ_F,c B_c′ + B_c Σ_F,cn B_n′ + B_n Σ_F,nc B_c′ + B_n Σ_F,n B_n′ + Δ    [5.22]

Now, say, for instance, we want to implement the restriction of an equal risk contribution by each country. This can be expressed as a set of convex constraints on relative marginal contributions to risk. We saw earlier that the factor exposures of the portfolio were defined as b = B′w. We can split the vector b into a country-related part (denoted by "c") and a non-country-related part (denoted by "n") so that b = (b_c′, b_n′)′. The ERC restriction of equation [4.25] then becomes:

b′ I_i Σ_F b = b′ I_j Σ_F b    for all country-related factors i and j    [5.23]

8 More recently, many market participants have started to use factors based on investable and transparent rules-based portfolios. They usually go by the names of “alternative beta” or “smart beta”. Once these factors seem to have generated a positive rate of return over long periods, some people like to call them “risk premia”. While the above discussion is entirely applicable to these newcomers, we should not forget that such factors generally include a return forecast, or at least the assumption that their return has been and therefore will be positive.


where I_i is a zero matrix whose ith diagonal element is set to 1. Alternatively, we could also use the more straightforward naïve risk parity approach and allocate to each factor an amount of capital proportional to the inverse of its volatility, but this would ignore the covariances between factors. Several risk-based factor allocation strategies have been tested by Bender et al. [BEN 10a] and Ilmanen and Kizer [ILM 12] using equally weighted and equally risk-weighted portfolios. On the basis of Sharpe ratios, both concluded that factor-based diversification is more effective than asset class diversification. However, when building mean-variance efficient portfolios and testing for the statistical significance of their Sharpe ratios, Pappas et al. [PAP 12] found no conclusive evidence of such a domination. The debate therefore remains open.

5.2. Principal component analysis (PCA)

As seen in section 5.1, covariance terms play a very important role when analyzing the diversification of a portfolio from a factor perspective. In an ideal world, however, factors would be uncorrelated and all covariance terms would become zero. This would greatly reduce the complexity of using factor models. While most observable factors are usually correlated, some statistical techniques can be used to create new uncorrelated factors. The most well-known of these techniques is principal component analysis (PCA), developed by Karl Pearson [PEA 01] as an analogue of the principal axis theorem in mechanics. It is commonly used in statistics to reduce the complexity of a data set while minimizing information loss9. As illustrated below, it can also be used to simplify portfolio construction and improve portfolio diversification.

5.2.1. Introduction to PCA

To understand how PCA works, it is helpful to compare it with the traditional factor approach.
Say we want to explain the behavior of N assets, given by their random returns R = (R_1, …, R_N)′ and their covariance matrix Σ.

9 PCA is also known as the discrete Kosambi–Karhunen–Loève transform in signal processing, the Karhunen–Loève expansion in astrophysics, the Hotelling transform in multivariate quality control, the proper orthogonal decomposition in mechanical engineering, the Schmidt–Mirsky theorem in psychometrics and the empirical orthogonal functions in meteorological science.

As discussed in the previous section, the usual approach selects a series of exogenous variables, names them factors and assumes that asset returns are linearly related to these factors. As seen in equation [5.4], the relationship between asset returns and factors is given by:

R_i = α_i + β_i,1 F_1 + … + β_i,K F_K + ε_i    [5.24]

In a traditional factor model, the factors are usually correlated. To keep things simple, let us assume that perfect factor-mimicking portfolios can be defined as portfolios of the original assets so that:

F_k = w_k,1 R_1 + … + w_k,N R_N,    k = 1, …, K    [5.25]

PCA closely follows the path we have just described, but goes one step further. Rather than using pre-specified factors such as sectors or country indices, PCA creates its own set of uncorrelated factors, which are called the principal components. Each principal component is always defined as a linear combination of the original assets so that:

PC_i = w_PCA,i,1 R_1 + … + w_PCA,i,N R_N,    i = 1, …, N    [5.26]

By comparing equations [5.25] and [5.26], we may conclude that PCA is just a variation of a factor model. However, there are some fundamental differences between them. First, factor models assume a theoretical model for the returns while PCA does not. Second, principal components are only defined as linear combinations of the underlying assets while factors could also be defined as nonlinear combinations. Third, principal components are by construction uncorrelated among themselves while factors are usually correlated. The next question is how to select the weights w_PCA,i that define each principal component PC_i. Before addressing this, let us note from equation [5.26] that we can calculate the covariance between PC_i and PC_j as:

Cov(PC_i, PC_j) = w_PCA,i′ Σ w_PCA,j    [5.27]

Factor Models and Portfolio Diversification

and therefore the variance of

175

as: [5.28]

The algorithm to obtain the weights that define the principal components works as follows:

– the first principal component is the linear combination w_PCA,1 of the N assets, with return PC_1, that maximizes σ²(PC_1) subject to the normalization condition w_PCA,1′ w_PCA,1 = 1. It is the only component that is constructed independently of the others;

– the second principal component is the linear combination w_PCA,2 of the N assets, with return PC_2, that maximizes σ²(PC_2) subject to the normalization conditions w_PCA,2′ w_PCA,2 = 1 and Cov(PC_2, PC_1) = 0;

– the ith principal component (i = 3, …, N) is the linear combination w_PCA,i of the N assets, with return PC_i, that maximizes σ²(PC_i) subject to the normalization conditions w_PCA,i′ w_PCA,i = 1 and Cov(PC_i, PC_j) = 0 for j = 1, …, i − 1.

Figure 5.5 geometrically illustrates the construction of PC_1 and PC_2 in the case of a universe of two assets. As N = 2, the original coordinate system is two-dimensional, with the first axis corresponding to the returns of the first asset and the second axis to the returns of the second asset. The top left scatterplot represents observations of the monthly returns of the two assets. The result is an elongated cloud in two dimensions. The top right plot represents the same information, but centered at the respective means (for each asset return, the respective mean is subtracted). The bottom left plot shows PCA1, which corresponds to the direction of maximum elongation (read: variance) of the data. Geometrically, it is a straight line that goes through the widest part of the ellipse, and it can be defined as a linear combination of the original two assets. PCA2 is then found as being orthogonal to PCA1 and aligned with the direction of maximum residual elongation (or residual variance, or, equivalently, the part of the original variance unexplained by PCA1) of the data. PCA2 is also a linear combination of the original two assets. Finally, the bottom right plot rotates the initial axes so that PCA1 is horizontal and PCA2 is vertical.


Figure 5.5. Geometric illustration of the two principal components in a two-dimensional space (universe made of two assets). The same logic is applicable to universes with a higher number of assets, but the result cannot easily be represented in two dimensions
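A minimal numerical counterpart to Figure 5.5, for a simulated two-asset universe (the return-generating parameters are assumptions), might look as follows:

```python
# Sketch: the geometry of Figure 5.5 in code. PCA1 is the direction of
# maximum variance of the centered returns; PCA2 is orthogonal to it.
import numpy as np

rng = np.random.default_rng(1)
common = rng.normal(0.0, 0.04, 240)             # shared return driver
r1 = common + rng.normal(0.0, 0.01, 240)        # asset 1 monthly returns
r2 = 0.8 * common + rng.normal(0.0, 0.01, 240)  # asset 2 monthly returns
R = np.column_stack([r1, r2])
R = R - R.mean(axis=0)                          # center the cloud (top right plot)

cov = np.cov(R, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # returned in ascending order
order = np.argsort(eigenvalues)[::-1]            # re-sort in descending order
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

pca1 = eigenvectors[:, 0]  # direction of maximum elongation
pca2 = eigenvectors[:, 1]  # orthogonal, maximum residual elongation
assert abs(pca1 @ pca2) < 1e-10  # the two directions are orthogonal
```

Plotting the centered cloud together with the lines spanned by `pca1` and `pca2` would reproduce the bottom panels of Figure 5.5.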

It is important to note that no change has been made to the original data itself – PCA does not transform data, it just provides a better viewpoint to understand what drives its variance (the best "implied factors").

5.2.2. The mathematics behind principal components

The key mathematical result behind PCA is a process called spectral decomposition, which factorizes a positive definite matrix (Σ) into a set of eigenvectors (v) and eigenvalues (λ). There are several ways to define


eigenvectors and eigenvalues, but the most common one defines an eigenvector of the matrix Σ as a vector v that satisfies:

Σ v = λ v    [5.29]

where λ is the eigenvalue associated to the eigenvector v. Equation [5.29] essentially states that for certain special vectors (the eigenvectors), the general transformation matrix Σ only scales them by a factor λ (the eigenvalues) and does not rotate them. As the zero vector is not considered as an eigenvector, we can write:

det(Σ − λ I) = 0    [5.30]

Equation [5.30] defines a polynomial of order N, which needs to be solved to obtain the eigenvalues. These are then plugged back into equation [5.29] to get the eigenvectors. For convenience, the scalar eigenvalues of Σ are usually sorted in descending order so that λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_N, and the eigenvectors are usually normalized such that their length equals one:

v′ v = 1    [5.31]

Let Λ = diag(λ_1, λ_2, …, λ_N) be the diagonal matrix of eigenvalues, sorted in decreasing order. Let v_i = (v_1,i, …, v_N,i)′ be the N-dimensional normalized eigenvectors, with v_i′ v_i = 1. Let V = (v_1, v_2, …, v_N) be the square matrix whose columns are the eigenvectors. We can rewrite equation [5.29] as:

Σ V = V Λ    [5.32]

As Σ is positive definite, all its eigenvalues are always non-negative and its eigenvectors are pairwise orthogonal when their eigenvalues are different. V is therefore an orthogonal matrix10, which implies V′ V = I and V⁻¹ = V′. Applying this property to equation [5.32], we obtain the spectral decomposition of Σ:

Σ = V Λ V′    [5.33]

10 A matrix is orthogonal when its product by its transpose is the identity matrix.


We can show that the first principal component PC_1, defined by the weights w_PCA,1, corresponds to the eigenvector v_1 associated with the largest eigenvalue λ_1, which is its variance. More generally, the ith principal component PC_i, defined by the weights w_PCA,i, corresponds to the eigenvector v_i associated with the ith largest eigenvalue λ_i, which is its variance. We have:

w_PCA,i = v_i = arg max { w′ Σ w : w′ w = 1, w′ v_j = 0 for j = 1, …, i − 1 }    [5.34]

σ²(PC_i) = λ_i    [5.35]

Cov(PC_i, PC_j) = 0 for all i ≠ j    [5.36]

To illustrate what this means in practice, let us go back to our two-asset example of Figure 5.5. In the bottom left scatterplot, PC_1 is a new, latent variable, which can be displayed as a line going through the origin and oriented along the direction of the maximal variance (thickness) of the cloud. The variance along this line is the first eigenvalue, and the orientation of the line is defined by the first eigenvector.

5.2.3. From principal components to eigen-portfolios

Each principal component PC_i, or, equivalently, each eigenvector v_i, is defined as a linear combination w_PCA,i of the original assets. In the context of finance, it is convenient to interpret these principal components as portfolios. In the following paragraphs, we will refer to them as eigen-portfolios. Their returns can easily be calculated from the original asset returns R as:

R_PCA,i = w_PCA,i′ R = v_i′ R    [5.37]

and their covariance matrix is the diagonal matrix of eigenvalues Λ. These eigen-portfolios define a new base to represent assets and portfolios. Given a portfolio P defined in terms of weights w of the assets, it is possible to calculate its weights in terms of the eigen-portfolios:

w_PCA = V′ w    [5.38]


The portfolio return is then calculated as:

R_P = w_PCA′ R_PCA    [5.39]

and the portfolio variance as:

σ²(R_P) = w_PCA′ Λ w_PCA = ∑_{i=1}^{N} w²_PCA,i λ_i    [5.40]

Comparing equation [5.40] with the traditional formula to calculate the variance of a portfolio, σ²(R_P) = w′ Σ w, and recalling that w_PCA = V′ w provide a great insight into what PCA allows us to do. Essentially, we can either view the world through the traditional lens, that is, think of a portfolio in terms of its weights w in the original assets universe with its "complex" covariance matrix Σ, or use a new lens and think of a portfolio in terms of its weights w_PCA in the eigen-portfolios universe with its much simpler diagonal covariance matrix Λ. The main advantage of the new universe is that eigen-portfolios are uncorrelated by construction so that there are no pure covariance terms in any calculation. As a result, risk attribution in terms of variance terms becomes additive. In a sense, the original problem of portfolio selection from the N correlated assets has been traded for the reduced problem of portfolio selection from a set of N uncorrelated portfolios. The dimensionality of the problem remains the same, but the new one has no covariance terms to deal with.

At this point, a quick geometric illustration in the case N = 2 might be helpful. Figure 5.6 shows our scatterplot. In the asset space, a point A corresponds to an observation, where asset 1 had a return r_1 and asset 2 had a return r_2. In the principal portfolio space, the same point corresponds to an observation where the principal portfolio 1 had a return r_1* and the principal portfolio 2 had a return r_2*. The point r_1* corresponds to the projection of the point A onto the axis defined by the direction PCA1. This axis has the property that the variance of the projected points is greater than the variance of the points when projected on any other axis passing through the center of the scatterplot11. Similarly, the point r_2* corresponds to the projection of the point A onto the axis defined by the direction PCA2. The scalars r_1* and r_2* are called the principal component scores for the observation corresponding

11 Any axis parallel to PCA1 has the same property, but by convention, PCA axes must all pass through the center of the scatterplot.


to point A. The cosine of the angle between the PCA1 axis and the original asset 1 axis gives the first component of the eigenvector corresponding to PCA1.

Figure 5.6. A portfolio can indifferently be measured in the original two-asset space or in the new two-eigen-portfolio space
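The equivalence between the two lenses can be checked numerically, as in the following sketch; the covariance matrix and weights are illustrative assumptions:

```python
# Sketch: the same portfolio variance computed through the asset lens
# (w' Sigma w) and through the eigen-portfolio lens (sum of w_PCA,i^2 * lambda_i),
# as in equation [5.40]. The covariance matrix is assumed for illustration.
import numpy as np

sigma = np.array([[0.040, 0.012, 0.006],
                  [0.012, 0.025, 0.008],
                  [0.006, 0.008, 0.010]])
w = np.array([0.5, 0.3, 0.2])        # asset weights

lam, V = np.linalg.eigh(sigma)       # spectral decomposition: Sigma = V Lam V'
w_pca = V.T @ w                      # weights in the eigen-portfolio base
var_assets = w @ sigma @ w           # traditional computation, with covariances
var_eigen = np.sum(w_pca**2 * lam)   # additive, no covariance terms
assert np.isclose(var_assets, var_eigen)
```

The additive form is what makes risk attribution per eigen-portfolio straightforward, since each term `w_pca[i]**2 * lam[i]` is a stand-alone variance contribution.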

Let us now explore some examples in a higher dimension. Let us start by analyzing the Dow Jones Industrial Average Index and its 30 stocks. Figure 5.7 plots, in descending order of magnitude, the eigenvalues of their correlation matrix. It is important to remember that the variance of each principal component is equal to the corresponding eigenvalue, so the total variance of the index is the sum of all its eigenvalues. Clearly, most of the variance of the 30 stocks is explained by the variations of the first principal component, PCA1. If needed, the second principal component can be used to explain some of the residual variance, then the third principal component and so on. By definition, all principal components/eigenvectors are defined as long/short portfolios of the 30 original stocks. The contents of the long/short portfolios for PCA1 and PCA2 are represented in Figure 5.8. Interestingly, the PCA1 portfolio is long-only, but this is not always the case.


Figure 5.7. Sorted eigenvalues for the returns on the 30 stocks that comprise the Dow Jones Industrial Index, from October 2014 to October 2015
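The eigenvalue profile of Figure 5.7 can be reproduced in spirit with simulated data; the single common driver and all parameters below are assumptions standing in for the actual Dow Jones stock returns:

```python
# Sketch: sorted eigenvalues of a correlation matrix for a 30-stock universe
# driven by one common factor, and the share of variance each one explains.
import numpy as np

rng = np.random.default_rng(2)
n_stocks, months = 30, 252
market = rng.normal(0.0, 0.04, months)  # common driver shared by all stocks
returns = np.outer(market, np.ones(n_stocks)) \
          + rng.normal(0.0, 0.03, (months, n_stocks))  # add idiosyncratic noise

corr = np.corrcoef(returns, rowvar=False)            # 30 x 30 correlation matrix
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]  # descending order

explained = eigenvalues / eigenvalues.sum()          # share per component
cumulative = np.cumsum(explained)
k_90 = int(np.searchsorted(cumulative, 0.90) + 1)    # components reaching 90%
print(f"PCA1 explains {explained[0]:.0%} of variance; {k_90} components reach 90%")
```

Because the matrix is a correlation matrix, the eigenvalues sum to the number of stocks, so `explained` is simply each eigenvalue divided by 30.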


Figure 5.8. Composition of the eigenvectors for the returns on the 30 stocks that comprise the Dow Jones Industrial Index. PCA1 is in black, PCA2 in gray


The power of principal components comes from the fact that they have the highest possible explanatory power of all linear combinations of assets. Stated differently, we can show that the first K principal components will always do a better (mean-square) job of explaining variance in the original data than any other linear factor model using only K components. In our example on the previous page, PCA1 is the best portfolio/index we could use to explain the variations of the 30 stocks in the universe. Unfortunately, it is a pure statistical index, with no recognizable "brand name" attached to it. This explains why some investors would probably prefer using the Dow Jones Industrial Average Index as a factor despite its lower explanatory power.

A frequent question when using PCA is how many principal components should be used. Using all N principal components, we can reconstruct exactly the original returns R and their covariance matrix Σ. Using only the first K principal components, we will only be able to reconstruct R and Σ approximately. The rationale is that a small λ_i implies a small variance or, equivalently, that the data change little in the direction of component i. We can therefore ignore the principal components that are associated with a very "small" eigenvalue and only focus on the others. In practice, the problem of figuring out how many components need to be considered is still open, but there are some commonly proposed guidelines: (i) retain sufficient components to account for an appropriate threshold percentage of the total variance, for instance, 90%; (ii) use Kaiser's [KAI 70] rule and retain all principal components that have an eigenvalue greater than 1 or (iii) retain the components whose eigenvalues are greater than the average of the eigenvalues.
Because the average eigenvalue is also the average variance of the asset returns, this is equivalent to retaining those components that account for more variance than the average variance of the assets; or (iv) use a scree graph, which plots the value of each successive eigenvalue against the rank order, and find the point where the curve is subjectively steep to the left and linearly decaying on the right.

5.2.4. Eigen-portfolios, diversification and effective number of bets

Viewing the world as allocations to uncorrelated eigen-portfolios rather than as combinations of correlated assets opens the door to interesting diversification analyses. Let us first start with the original universe of


N assets, from which the eigen-portfolios are created. By construction, because of the ordering of eigenvalues (λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_N), the first principal component captures the largest part of the variance in the data, the second principal component captures the largest part of the remaining variance and so on, for the remaining components. In general, the terms s_i = λ_i / ∑_j λ_j are called the relative strengths of the ith principal portfolio. They represent the fraction of the total variance of the assets owing to the ith principal component. On this basis, Rudin and Morgan [RUD 06] introduced a new portfolio diversification index (PDI) to include within one single statistic the diversification potential that a given investment universe actually provides. It is calculated as:

PDI = 2 ∑_{i=1}^{N} i · s_i − 1    [5.41]

The result is numerically bounded between 1 and N. A PDI close to 1 indicates that almost the entire variance of assets can be attributed to a single principal component. Portfolios created from such an investment universe will always be completely non-diversified from a risk perspective, even though they may include all the assets of the universe. A PDI close to N indicates that λ_i / ∑_j λ_j ≈ 1/N for all i, that is, an orthogonal risk structure across assets. Portfolios created from such an investment universe have a large diversification potential. However, their effective diversification will ultimately depend upon their weightings.

Let us now consider a specific portfolio rather than the universe of assets. When looking at it as an allocation to principal portfolios, as seen in equation [5.40], the variance contributions of each principal portfolio are additive. Consequently, the percentage of risk attributable to eigen-portfolio i is given by:

p_i = w²_PCA,i λ_i / σ²(R_P)    [5.42]

which is by construction non-negative, in the range of 0 to 1, and sums to 1. These properties allow us to consider the p_i as a probability mass. Together, these probability masses form the "diversification distribution", which can be summarized, analyzed and optimized using the tools and techniques discussed in Chapter 1. For instance, we could use the Shannon entropy as defined in equation [1.28], but on the new "weights" p_i rather than the asset weights. This gives:

N_Ent = exp(− ∑_{i=1}^{N} p_i ln p_i)    [5.43]

Intuitively, a well-diversified portfolio should not be too concentrated on a small number of p_i values. In the absence of return forecasts, a rational approach is to allocate equal amounts of risk to each eigen-portfolio. Because the variance of each eigen-portfolio is given by λ_i, this gives:

w²_PCA,i λ_i = w²_PCA,j λ_j    for all i and j    [5.44]
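The diversification distribution of equation [5.42], the PDI of equation [5.41] and the entropy-based indicator of equation [5.43] can be sketched together as follows, using an assumed covariance matrix and assumed weights:

```python
# Sketch: relative strengths and PDI of a universe, plus the diversification
# distribution and exponential entropy of a specific portfolio. All inputs
# are illustrative assumptions.
import numpy as np

sigma = np.array([[0.040, 0.012, 0.006],
                  [0.012, 0.025, 0.008],
                  [0.006, 0.008, 0.010]])
w = np.array([0.5, 0.3, 0.2])

lam, V = np.linalg.eigh(sigma)
order = np.argsort(lam)[::-1]            # descending eigenvalues
lam, V = lam[order], V[:, order]

# relative strengths of the principal portfolios and PDI of the universe
s = lam / lam.sum()
pdi = 2.0 * np.sum(np.arange(1, len(s) + 1) * s) - 1.0  # equation [5.41]

# diversification distribution of the portfolio and its exponential entropy
w_pca = V.T @ w
p = (w_pca**2 * lam) / (w @ sigma @ w)                  # equation [5.42]
n_ent = np.exp(-np.sum(p * np.log(p)))                  # equation [5.43]
print(round(pdi, 2), round(n_ent, 2))
```

Both statistics lie between 1 and N; values near N signal risk spread evenly across the uncorrelated directions rather than across asset labels.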

This new asset allocation goal follows the risk parity approach discussed in section 4.3, but it is expressed in terms of exposures to eigen-portfolios. As a note, when using this approach, it is advisable to limit the number of eigen-portfolios. The logic is that as eigen-portfolios associated with very small eigenvalues do not have a high explanatory power, equally allocating risk to them is probably equivalent to investing into noise and treating this noise identically to the higher quality information provided by the first principal components. In addition, we should keep in mind that PCA is a variance analysis approach and, as such, is not sensitive to the sign of returns. Stated differently, PCA considers two vectors of opposite directions as completely identical. While acceptable from a variance analysis point of view, this assumption is no longer valid from a return perspective. Before allocating, investors should therefore ensure that each principal portfolio has a non-negative expected return. If it is not the case, they should short it.

Another interesting diversification indicator is the number of implied uncorrelated bets. For a given universe of assets, Polakow and Gebbie [POL 08] suggested a statistical criterion to determine the number of significant eigenvalues of the covariance matrix Σ, which they interpret as the number of true independent assets that investors can choose from. While useful at the universe level, the technique does not really work for a portfolio, where additional information is available (e.g. weightings) and should be used. A possible solution is to use the Hill number of the Shannon entropy on the basis of eigen-portfolio exposures. It is defined as:

N_Bets = exp(− ∑_{i=1}^{N} p_i ln p_i)    [5.45]


In general, a higher value indicates a more diversified portfolio, whereas a lower value indicates concentration on only a few independent sources of risk. When a portfolio of N assets is made of one unique exposure to a single principal component, then one p_i = 1, all the other values are equal to zero and N_Bets = 1. When the portfolio follows a risk parity and all the p_i are equal to 1/N, then N_Bets = N, which is the maximum possible value. In this case, there are N uncorrelated bets in the portfolio, and they are equally allocated. From a diversification perspective, this is optimal, but from an investment perspective, it implicitly ignores the investment return on each of these bets or, equivalently, assumes that their expected returns are identical and positive. The key advantage of N_Bets versus the original Hill numbers discussed in Chapter 1 is that N_Bets relies on all the available information, for example, the number of assets N, the capital allocated to these assets and the return characteristics of these assets. In the style of Bera and Park [BER 08], various heuristics can also be explored for portfolio construction on the basis of N_Bets. For instance, we could also try to solve the following optimization problem:

w* = arg max_w N_Bets(w)    [5.46]

subject to appropriate constraints on weights, as well as any required additional constraint (e.g. minimum portfolio return and maximum portfolio volatility). The result of the optimization will be a portfolio defined in terms of allocation to the eigen-portfolios. The allocations in terms of original underlying assets can be calculated by inverting equation [5.38].

5.2.5. Extensions to PCA

PCA is an elegant statistical approach that, by means of an orthogonal transformation, converts a set of correlated variables into exposures to uncorrelated factors or summarizes them by uncorrelated factors. These factors can be interpreted as eigen-portfolios, whose empirical properties have been extensively described in the financial literature12. While

12 See Frahm and Wiechers [FRA 11], Dfine [DFI 11], Lohre et al. [LOH 14], Lohre et al. [LOH 12] and Deguest et al. [DEG 13].


optimal from a statistical point of view, these factors have several limitations, which are summarized as follows:

– They are purely statistical and therefore difficult to interpret or relate to known economic quantities.

– They tend to be highly unstable over time when associated with low eigenvalues because of their low explanatory power and the changing nature of what they represent.

– They are not unique due to sign ambiguities. In fact, there are potentially 2^N different possible combinations of principal component bets, because if v is one of the eigenvectors, so is its opposite −v.

– They are not invariant under simple scale transformations. As principal components calculated from asset returns measured in basis points and percentage points are different, so are their associated eigen-portfolios.

– In some instances, their usage results in counter-intuitive results – see, for example, Meucci [MEU 09].

There are other statistical techniques that aim to achieve the same result as PCA, that is, create a new series of uncorrelated factors by applying some transformation to the set of original assets (or more generally, a set of original factors). For instance, we could think of Independent Component Analysis, Linear Discriminant Analysis, Fourier analysis, wavelet decomposition and so on. They are more complex, and some suffer from the same issues as PCA, while others do not. We will not discuss them here. However, a new approach worth mentioning is the minimum torsion bets suggested by Meucci [MEU 15]. The minimum torsion bets approach starts with a set of original correlated factors denoted F, which can be freely specified by investors. As an extreme case, we could set F = R and use the original assets as factors.
The minimum torsion transformation (MTT) essentially rotates the original factors through a linear transformation to create a new set of uncorrelated factors that are closest to the original factors – or, stated differently, that minimize the multi-entry tracking error of the new factors versus the original factors. These new factors are called the minimum torsion bets. Once they have been determined for a given portfolio, we can calculate the exposures of assets or portfolios to these new factors, as well as the minimum torsion diversification distribution and the effective number of minimum torsion bets. We are in a very similar setting to the principal portfolio environment discussed in section 5.2.2, except that minimum torsion bets closely track the original factors and solve all the issues raised about principal portfolios. By contrast, principal components have no reason to be related to the original factors, which does not help in terms of interpreting what they represent economically.

5.2.6. Improving portfolio diversification by pre-clustering assets

One of the key issues of risk-based and naïve portfolio diversification approaches is their sensitivity to the definition of the investment universe. As an illustration, if the investment universe is made of 50 assets from one sector and 10 assets from three other sectors, an equally capital-allocated or risk-allocated portfolio is unlikely to be well diversified – the first sector will dominate. Fortunately, some pre-processing of the investment universe can easily solve the issue. This can be done qualitatively, for instance by choosing a grouping criterion (e.g. sectors) and pooling assets into groups based thereon. For each group, we create a representative portfolio (whose weightings must be defined). The overall portfolio is constructed by allocating to these sub-portfolios. In our earlier example, we had four sectors or sub-portfolios, with one of them comprising 50 assets. The large number of assets in one sector is therefore no longer an issue. Although this approach is frequently used in practice by applying qualitative criteria, it is unfortunate that not even the most basic quantitative information about the underlying assets is considered. Several powerful quantitative techniques – regrouped under the name of “clustering” – are available to pre-process the assets in the above-mentioned situation. Clustering essentially partitions a series of assets into clusters, so that these clusters have a low inter-group correlation and a high intra-group correlation.
Clusters can then be tracked by sub-portfolios allocated to their components, and the asset allocation takes place at the sub-portfolio level. The approach can improve the stability of the portfolio and reduce the impact of estimation error. We will not discuss such clustering techniques here, but refer the reader to Lhabitant [LHA 04a] for a review and Curto et al. [CUR 06] for an illustration based on PCA.
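As a hedged illustration of the idea (not the specific methods surveyed in the references above), a common sketch is to turn correlations into distances and apply hierarchical clustering; the two-sector data below is simulated:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)
# Two "sectors": assets 0-2 share factor A, assets 3-5 share factor B (illustrative).
fA, fB = rng.normal(size=(2, 500))
R = np.empty((500, 6))
for i in range(3):
    R[:, i] = fA + 0.5 * rng.normal(size=500)
    R[:, i + 3] = fB + 0.5 * rng.normal(size=500)

corr = np.corrcoef(R, rowvar=False)
dist = np.sqrt(2.0 * (1.0 - corr))      # correlation-based distance
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")

# Assets from the same sector end up in the same cluster.
assert len(set(labels[:3])) == 1 and len(set(labels[3:])) == 1
assert labels[0] != labels[3]
```

Allocation can then take place across the recovered clusters rather than across the raw assets.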


5.3. Conclusion on factor models Factor models are a very powerful tool that allows us to get a better understanding of what is driving asset returns and risks and, more importantly, to identify the common driving forces behind various assets and their covariance matrix. Using them allows investors to go beyond the asset labels or categories and start thinking in terms of effective risk exposures. The latter can be specified ex ante and correspond to readily identifiable fundamental characteristics, or be identified ex post by looking at the covariance matrix of the asset returns. The mathematics of factor models may be relatively complex, but their underlying concept is intuitively apparent. However, the major danger of using factor models has been perfectly summarized by Andrew Ang [ANG 10], who compared asset classes to food and factors to nutrients. We should eat food to get nutrients, not the other way around. Similarly, we should use factor models to understand what drives asset returns and build better diversified portfolios. Investing directly in factors may be questionable, particularly when these factors are based on portfolios with a limited number of assets and are therefore exposed to specific risks. As shown by Amenc et al. [AME 15], it is not sufficient to allocate to different factors to have a well-diversified portfolio, even if these factors are self-labeled “smart beta”. We must also ensure that the specific risks of the factor-mimicking portfolio are well diversified too.

6 Non-normal Return Distributions, Multiperiod Models and Time Diversification

In this chapter, we will discuss two advanced topics related to portfolio diversification. The first one is the case of non-normally distributed returns, which are typically not well captured when variance is used as a risk measure. The second one is the case of multi-period models, in which investors can rebalance their portfolio content over time. 6.1. Non-normal returns For computational convenience, it is commonly assumed that asset returns are normally distributed, so that portfolio construction only needs to focus on the first two moments of return distributions, namely the mean and the variance. The diversification of variance – or more specifically, its reduction – in a portfolio context is a well-known and well-documented effect, which we have discussed extensively in Chapter 2. Unfortunately, the normal distribution assumption is increasingly challenged by empirical evidence, which tends to suggest that asset returns exhibit clustering in volatility dynamics, asymmetry in upside and downside potentials, and heavier tails than the normal. If needed, think of the Latin American debt crisis in the early 1980s, the stock market crash of 1987, the United States Savings and Loans crisis in 1989–1991, the European exchange rate mechanism crisis in 1992, the Asian financial crisis of 1997, the Russian default in 1998, the burst of the technology bubble in 2000–2001, and the


more recent United States sub-prime mortgage crisis and the ensuing global financial meltdown of 2008. Examples of suggested distributions to replace the normal one when modeling returns include: the stable Paretian distributions by Mandelbrot [MAN 63] and Fama [FAM 63], [FAM 65], the Student’s t-distributions by Blattberg and Gonedes [BLA 74], the hyperbolic distributions by Eberlein and Keller [EBE 95], the finite mixture of normals by Kon [KON 84], and the tempered stable distributions by Bianchi et al. [BIA 08], among others. Using mean–variance analysis in the presence of non-normal returns is likely to produce portfolios that are not optimal from an investor’s perspective. Even though the mean–variance solution may come close to a higher-moment optimal solution, the two differ most of the time – see Athayde and Flores [ATH 04] for a discussion. In addition, an inefficient mean–variance portfolio may in fact be an optimal portfolio when higher moments are considered, and vice versa. Last but not least, diversifying risk from a variance perspective does not necessarily imply that higher order moments such as skewness and kurtosis have been well diversified. To avoid these issues, it is necessary to consider higher moments of the return distribution. This can be done either explicitly within the portfolio construction and diversification process, or implicitly by using a risk measure that is more sophisticated than variance and able to capture larger tail risks.

6.1.1. Higher moments: examples of skewness and kurtosis

Given a set of N assets and their N-dimensional random vector of returns, the set of its q-th order moments is, in general, a tensor. For the sake of simplicity, we will limit our discussion to moments of order 3 (skewness) and order 4 (kurtosis), as higher order ones have no easy intuitive economic interpretation. Note that there are different definitions of skewness and kurtosis in the finance literature.
In the following section, we will use the term skewness to refer to the return distribution’s raw third moment. For asset i, we have:

$s_i = E\left[(R_i - \bar{R}_i)^3\right]$   [6.1]

The skewness of a portfolio P is defined as:

$s_P = E\left[(R_P - \bar{R}_P)^3\right]$   [6.2]

which can be expressed as a function of the underlying assets as:

$s_P = \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} w_i w_j w_k \, s_{i,j,k}$   [6.3]

where

$s_{i,j,k} = E\left[(R_i - \bar{R}_i)(R_j - \bar{R}_j)(R_k - \bar{R}_k)\right]$   [6.4]

The $s_{i,i,i}$ terms denote the skewness of asset i, whereas the $s_{i,j,k}$ terms denote the co-skewness between assets i, j and k. Equation [6.3] can be expressed in matrix form as:

$s_P = w' M_3 (w \otimes w)$   [6.5]

where $\otimes$ denotes the Kronecker product, and $M_3$ is the $N \times N^2$ co-skewness matrix, which is defined as:

$M_3 = E\left[(R - \bar{R})(R - \bar{R})' \otimes (R - \bar{R})'\right]$   [6.6]

Numerical values of the skewness are typically stated in standardized form by dividing by the third power of the standard deviation:

$S_i = s_i / \sigma_i^3$   [6.7]

Similarly, we will use the term kurtosis to refer to the return distribution’s raw fourth moment. For asset i, we have:

$\kappa_i = E\left[(R_i - \bar{R}_i)^4\right]$   [6.8]

The kurtosis of a portfolio P is defined as:

$\kappa_P = E\left[(R_P - \bar{R}_P)^4\right]$   [6.9]

which can be expressed as a function of the underlying assets as:

$\kappa_P = \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} \sum_{l=1}^{N} w_i w_j w_k w_l \, \kappa_{i,j,k,l}$   [6.10]

where

$\kappa_{i,j,k,l} = E\left[(R_i - \bar{R}_i)(R_j - \bar{R}_j)(R_k - \bar{R}_k)(R_l - \bar{R}_l)\right]$   [6.11]

The $\kappa_{i,i,i,i}$ terms are the kurtosis of asset i, whereas the $\kappa_{i,j,k,l}$ terms are the co-kurtosis between assets i, j, k and l. Equation [6.10] can be expressed in matrix form as:

$\kappa_P = w' M_4 (w \otimes w \otimes w)$   [6.12]

where $\otimes$ denotes the Kronecker product, and $M_4$ is the $N \times N^3$ co-kurtosis matrix, which is defined as:

$M_4 = E\left[(R - \bar{R})(R - \bar{R})' \otimes (R - \bar{R})' \otimes (R - \bar{R})'\right]$   [6.13]

Numerical values of the kurtosis are typically stated in standardized form by dividing by the fourth power of the standard deviation:

$K_i = \kappa_i / \sigma_i^4$   [6.14]
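The matrix forms [6.5] and [6.12] can be checked numerically against the moments of the portfolio return series; the data below is simulated for illustration only:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 3, 2000
R = rng.exponential(1.0, (T, N))                 # positively skewed toy returns
w = np.array([0.5, 0.3, 0.2])
X = R - R.mean(axis=0)                           # demeaned returns, one row per date

# Sample versions of the co-skewness matrix M3 (N x N^2), equation [6.6],
# and the co-kurtosis matrix M4 (N x N^3), equation [6.13].
M3 = sum(np.outer(x, np.kron(x, x)) for x in X) / T
M4 = sum(np.outer(x, np.kron(np.kron(x, x), x)) for x in X) / T

# Portfolio third and fourth moments from equations [6.5] and [6.12] ...
s_P = w @ M3 @ np.kron(w, w)
k_P = w @ M4 @ np.kron(np.kron(w, w), w)

# ... match the central moments of the portfolio return series.
rp = X @ w
assert np.isclose(s_P, np.mean(rp ** 3))
assert np.isclose(k_P, np.mean(rp ** 4))
```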

Using equations [6.3] and [6.10], we can show mathematically that skewness and kurtosis do not diversify like variance, in the sense of offering more attractive risk figures in a multi-asset portfolio. As an illustration, let us discuss the case of skewness in an equally weighted portfolio ($w_i = 1/N$). The portfolio skewness is given by:

$s_P = \frac{1}{N^3} \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} s_{i,j,k}$   [6.15]

We can rewrite equation [6.15], separating the skewness terms from the co-skewness terms, as:

$s_P = \frac{1}{N^3} \sum_{i=1}^{N} s_{i,i,i} + \frac{1}{N^3} \sum_{i,j,k \text{ not all equal}} s_{i,j,k}$   [6.16]

Taking expected values on both sides yields:

$s_P = \frac{1}{N^2}\, \bar{s} + \left(1 - \frac{1}{N^2}\right) \bar{s}_{co}$   [6.17]
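Decomposition [6.17] can be verified on simulated data (figures are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 4, 1000
X = rng.exponential(1.0, (T, N))
X -= X.mean(axis=0)                              # demeaned, positively skewed returns
w = np.full(N, 1.0 / N)                          # equally weighted portfolio

# Full co-skewness tensor s[i, j, k], the sample analogue of equation [6.4].
S = np.einsum("ti,tj,tk->ijk", X, X, X) / T

s_P = np.mean((X @ w) ** 3)                      # portfolio skewness, equation [6.15]

idx = np.arange(N)
diag = S[idx, idx, idx]
mask = np.ones(S.shape, dtype=bool)
mask[idx, idx, idx] = False
s_bar, co_bar = diag.mean(), S[mask].mean()      # average skewness / co-skewness

# Equation [6.17]: s_P = s_bar / N^2 + (1 - 1/N^2) * co_bar
assert np.isclose(s_P, s_bar / N**2 + (1 - 1 / N**2) * co_bar)
```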

where $\bar{s}$ is the average skewness and $\bar{s}_{co}$ is the average co-skewness of the assets. All other things being equal, equation [6.17] shows that increasing the portfolio size N will reduce the expected portfolio skewness only if asset returns are positively skewed on average ($\bar{s} > 0$) and are on average less than perfectly “correlated” in the third moment ($\bar{s}_{co} < \bar{s}$). These conditions may or may not hold, i.e. there will be situations where increasing the portfolio size results in an increase of the portfolio skewness coefficient compared to the skewness of its components, whereas in other situations it will result in a decrease. Moreover, unlike variance, the skewness and co-skewness coefficients may be positive or negative. Increasing portfolio size may therefore also change the sign of the portfolio skewness. Likewise, having assets that are individually positively skewed does not guarantee that the portfolio will feature a positive skewness, as the co-skewness terms may be negative.

In summary, we should be cautious in the presence of return distributions with higher moments, and not naïvely believe that what has worked in terms of diversification with the variance will happen again just by increasing portfolio size. In general, skewness and kurtosis risks need to be managed explicitly. As we now have analytical formulas for portfolio skewness and kurtosis, we could include both parameters in the investor’s optimization program. Here again, several approaches are possible. Let us mention a few for the sake of illustration.

Mean–variance optimization with skewness and kurtosis constraints. In this case, we simply add target skewness and kurtosis constraints to the original mean–variance utility problem of equations [2.20] and [2.21]. The new problem to solve becomes:

$w^* = \arg\max_w \left( E[R_P] - \frac{\lambda}{2}\, \sigma_P^2 \right)$   [6.18]

subject to:

$\sum_{i=1}^{N} w_i = 1$   [6.19]

$s_P \geq s^*$   [6.20]

$\kappa_P \leq \kappa^*$   [6.21]
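Problem [6.18]–[6.21] can be sketched with scipy’s SLSQP solver; the returns, risk-aversion coefficient and targets below are purely illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
T, N = 3000, 4
R = rng.exponential(1.0, (T, N)) * [0.010, 0.012, 0.009, 0.011]  # skewed toy returns
mu = R.mean(axis=0)
X = R - mu
lam = 4.0                                                # risk-aversion coefficient

def s_P(w):                                              # portfolio third moment
    return np.mean((X @ w) ** 3)

def k_P(w):                                              # portfolio fourth moment
    return np.mean((X @ w) ** 4)

def neg_utility(w):                                      # mean-variance objective [6.18]
    return -(w @ mu - 0.5 * lam * np.var(X @ w))

cons = [
    {"type": "eq", "fun": lambda w: w.sum() - 1.0},      # budget constraint [6.19]
    {"type": "ineq", "fun": lambda w: s_P(w)},           # [6.20]: skewness >= 0
    {"type": "ineq", "fun": lambda w: 1e-6 - k_P(w)},    # [6.21]: kurtosis <= 1e-6
]
res = minimize(neg_utility, np.full(N, 1.0 / N), method="SLSQP",
               bounds=[(0.0, 1.0)] * N, constraints=cons)
w_star = res.x
assert res.success
assert abs(w_star.sum() - 1.0) < 1e-6
```

Note that, as discussed below, the objective and constraints are non-convex in the weights, so a local solver like SLSQP offers no global optimality guarantee.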

Polynomial approximation of the utility function. In general, we can assume that investors prefer high values for odd moments and low values for even moments – see Scott and Horvath [SCO 80]. We can therefore extend the quadratic approximation of the investor’s utility function by explicitly including skewness and kurtosis. The objective function then becomes a real-valued polynomial of degree four, for example:

$w^* = \arg\max_w \left( E[R_P] - A\, \sigma_P^2 + B\, s_P - C\, \kappa_P \right)$   [6.22]

with $A, B, C \geq 0$. Parameters A, B and C can be related to the investor’s utility function if we use a Taylor series. However, Lhabitant [LHA 97] shows that this approach has severe limitations.

Skewness maximization under mean and variance constraints. Investors with non-increasing absolute risk aversion should like positive skewness (extreme deviations from the mean occurring more on the positive side) and dislike negative skewness. One idea is therefore to link their objective function directly to skewness. For instance:

$w^* = \arg\max_w \; s_P$   [6.23]

subject to:

$\sum_{i=1}^{N} w_i = 1$   [6.24]

$E[R_P] = \mu^*$   [6.25]

$\sigma_P^2 = \sigma^{2*}$   [6.26]


Kurtosis minimization under mean and variance constraints. Investors with non-increasing absolute risk aversion tend to dislike kurtosis (extreme events with a high probability on either side). One idea is therefore to link their objective function directly to kurtosis. For instance:

$w^* = \arg\min_w \; \kappa_P$   [6.27]

subject to:

$\sum_{i=1}^{N} w_i = 1$   [6.28]

$E[R_P] = \mu^*$   [6.29]

$\sigma_P^2 = \sigma^{2*}$   [6.30]

Polynomial goal programming (PGP). This approach was initially suggested by Lai et al. [LAI 06] as a two-step procedure. In the first step, we determine the portfolio that maximizes the expected rate of return, the portfolio that minimizes the variance, the portfolio that maximizes the skewness and the portfolio that minimizes the kurtosis. The aspired levels of expected return ($\mu^*$), variance ($\sigma^{2*}$), skewness ($s^*$) and kurtosis ($\kappa^*$), respectively, are deduced from these four portfolios. In the second step, we aggregate the various objectives into a single objective function. Let $d_1$, $d_2$, $d_3$ and $d_4$ be the non-negative goal variables that account for the deviations of expected return, variance, skewness and kurtosis from their respective aspired levels. They represent the amount of underachievement with respect to the best scenario1. The problem to be solved becomes:

$w^* = \arg\min_w \left( d_1^{\gamma_1} + d_2^{\gamma_2} + d_3^{\gamma_3} + d_4^{\gamma_4} \right)$   [6.31]

subject to:

$E[R_P] + d_1 = \mu^*$   [6.32]

$\sigma_P^2 - d_2 = \sigma^{2*}$   [6.33]

$\sum_{i=1}^{N} w_i = 1$   [6.34]

$s_P + d_3 = s^*$   [6.35]

$\kappa_P - d_4 = \kappa^*$   [6.36]

with $d_1, d_2, d_3, d_4 \geq 0$.

1 Some researchers have suggested using the general Minkowski distance for the specification of the objective function in PGP.

Parameters $\gamma_1$, $\gamma_2$, $\gamma_3$ and $\gamma_4$ represent a given set of investor preferences. A non-negativity constraint on weights may be added if required.

All these new portfolio construction approaches are theoretically appealing, but they suffer from several practical drawbacks. First, as $s_P$ and $\kappa_P$ are cubic and quartic functions of the weights, the optimization problem becomes non-convex and can no longer be solved with traditional nonlinear programming algorithms, which adds complexity and computational cost. Second, the difficulty created by co-skewness and co-kurtosis is the curse of dimensionality. The co-skewness matrix has $N(N+1)(N+2)/3!$ distinct components, and the co-kurtosis matrix has $N(N+1)(N+2)(N+3)/4!$ distinct components. Each of these components must be estimated, which opens the door to potential sources of estimation errors. When using sample data, the problem is exacerbated by the fact that each outlier observation used in the calculation is raised to a higher power than when estimating the variance. Therefore, in a sense, the accuracy of co-skewness and co-kurtosis matrices is even more questionable than the accuracy of covariance matrices2. As discussed in section 2.4.2, estimation errors are one of the primary reasons behind the relatively high concentration of optimized portfolios. In the context of non-normal distributions, the problem is likely to be more severe – the higher moments of return distributions come from rare events, so the impact of estimation errors on higher moments will be larger. We should therefore be extremely cautious when exploring this route.
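The component counts above grow quickly with N; a quick check (the helper names are ours):

```python
from math import comb

def n_coskew(N):
    # Distinct co-skewness elements: N(N+1)(N+2)/3! = C(N+2, 3)
    return comb(N + 2, 3)

def n_cokurt(N):
    # Distinct co-kurtosis elements: N(N+1)(N+2)(N+3)/4! = C(N+3, 4)
    return comb(N + 3, 4)

# A modest 50-asset universe already requires a very large number of estimates.
assert n_coskew(50) == 22100
assert n_cokurt(50) == 292825
```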

2 Note that improved estimators were developed more recently by Martellini and Ziemann [MAR 10], who extended several improved estimates that had been proposed for the covariance matrix to the skewness and kurtosis dimensions. Such improved estimates most notably include the factor-based approach, the constant correlation approach and the statistical shrinkage approach. Similarly, Jondeau et al. [JON 10] extended the use of Principal Component Analysis (PCA) to the skewness and kurtosis dimensions to identify the factors that drive co-skewness and co-kurtosis structures across a large set of time series.


6.1.2. Risk budgeting with higher moments

The extension of risk budgeting techniques to higher moments requires the calculation of marginal and absolute risk contributions with respect to higher moments. Marginal contributions are retrieved by partially differentiating the portfolio’s third and fourth moments with respect to the weight vector:

$\dfrac{\partial s_P}{\partial w} = 3\, M_3 (w \otimes w)$   [6.37]

and

$\dfrac{\partial \kappa_P}{\partial w} = 4\, M_4 (w \otimes w \otimes w)$   [6.38]

Similarly, the absolute risk contributions of the i-th asset are given by:

$A_{s,i} = w_i \left[ M_3 (w \otimes w) \right]_i$   [6.39]

and

$A_{\kappa,i} = w_i \left[ M_4 (w \otimes w \otimes w) \right]_i$   [6.40]

We can show that higher moment absolute risk contributions must sum up to the respective portfolio higher moments. Using equations [6.39] and [6.40], it is relatively straightforward to build a risk parity portfolio extension of equation [4.21]. For instance, we could have:

$w^* = \arg\min_w \left( \lambda_2 f_2(w) + \lambda_3 f_3(w) + \lambda_4 f_4(w) \right)$   [6.41]

subject to:

$\sum_{i=1}^{N} w_i = 1$   [6.42]

$0 \leq w_i \leq 1$   [6.43]

Functions $f_2$, $f_3$ and $f_4$ are defined as the quadratic loss of the absolute risk contributions:

$f_2(w) = \sum_{i=1}^{N} \sum_{j=1}^{N} \left( A_{\sigma^2,i} - A_{\sigma^2,j} \right)^2$   [6.44]

$f_3(w) = \sum_{i=1}^{N} \sum_{j=1}^{N} \left( A_{s,i} - A_{s,j} \right)^2$   [6.45]

$f_4(w) = \sum_{i=1}^{N} \sum_{j=1}^{N} \left( A_{\kappa,i} - A_{\kappa,j} \right)^2$   [6.46]

and parameters $\lambda_2$, $\lambda_3$ and $\lambda_4$ are the investor preferences for the respective moments.
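The marginal and absolute contributions of [6.37]–[6.40], and the property that absolute contributions sum to the portfolio moments, can be verified numerically on simulated data (illustrative figures):

```python
import numpy as np

rng = np.random.default_rng(5)
T, N = 2000, 3
X = rng.exponential(1.0, (T, N)) - 1.0           # skewed toy returns, zero-mean in population
w = np.array([0.5, 0.3, 0.2])

M3 = sum(np.outer(x, np.kron(x, x)) for x in X) / T
M4 = sum(np.outer(x, np.kron(np.kron(x, x), x)) for x in X) / T

marg_s = 3.0 * M3 @ np.kron(w, w)                # marginal contributions, eq. [6.37]
marg_k = 4.0 * M4 @ np.kron(np.kron(w, w), w)    # marginal contributions, eq. [6.38]

A_s = w * (M3 @ np.kron(w, w))                   # absolute contributions, eq. [6.39]
A_k = w * (M4 @ np.kron(np.kron(w, w), w))       # absolute contributions, eq. [6.40]

rp = X @ w
s_P, k_P = np.mean(rp ** 3), np.mean(rp ** 4)

# Euler's theorem for homogeneous functions, and contributions summing to the moments.
assert np.isclose(w @ marg_s, 3.0 * s_P) and np.isclose(w @ marg_k, 4.0 * k_P)
assert np.isclose(A_s.sum(), s_P) and np.isclose(A_k.sum(), k_P)
```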

The program defined by equations [6.41]–[6.43] tries to find the best trade-off solution in terms of risk parity with respect to several moments of the distribution. Baitinger et al. [BAI 17] have tested it empirically on various datasets, and concluded that higher moment approaches outperform the volatility parity approach when the underlying data is characterized by significant non-normality and strong co-dependencies, and underperform otherwise. According to the authors, this makes higher-moment risk parity portfolios ideal candidates for worst-case regimes.

6.1.3. The diversification delta

Since correlation or covariance matrices do not account for higher moments of return distributions, Vermorken et al. [VER 12] introduced a new measure of diversification called the diversification delta (DD). It compares the weighted average entropy of the individual assets with the entropy of the portfolio:

$DD(P) = \dfrac{\sum_{i=1}^{N} w_i H(R_i) - H(R_P)}{\sum_{i=1}^{N} w_i H(R_i)}$   [6.47]

where $H(R)$ is the differential entropy of the returns, which is defined as:

$H(R) = -\int f(r) \ln f(r) \, dr$   [6.48]

and $f$ is the density of $R$. Note that the definition of entropy used here is a continuous one, as it is assumed that returns can follow a continuous distribution. It is essentially a generalization of Shannon’s entropy.

According to Vermorken et al., the diversification delta varies between zero and one. A value of zero indicates a portfolio with one single asset, whereas a value of one indicates that all the idiosyncratic risk has been diversified, so that the portfolio is well diversified. The diversification delta – or more generally, the idea of comparing the uncertainty of individual assets with the uncertainty of a portfolio – is an interesting approach to analyze diversification without having to make assumptions on the distribution of asset returns. As seen in Chapter 1, the notion of entropy is effective in measuring uncertainty, and since it does not assume normality, it is able to take into account higher moments. Unfortunately, differential entropy does not have all the desirable properties of a risk measure: it is not left-bounded, homogeneous or sub-additive3, and unlike the discrete entropy, it can be negative. As a result, the diversification delta may behave poorly. Here are a few examples. For the sake of simplicity, let us consider two assets with normally distributed returns $R_1 \sim N(\mu_1, \sigma_1^2)$ and $R_2 \sim N(\mu_2, \sigma_2^2)$. In such a case, the two-asset portfolio return will also be normally distributed, with $R_P \sim N(w_1\mu_1 + w_2\mu_2,\; w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2 w_1 w_2 \sigma_{12})$. The differential entropy of a normally distributed variable $R$ with variance $\sigma^2$ is:

$H(R) = \frac{1}{2}\ln\left(2\pi e \sigma^2\right)$   [6.49]

Consequently, the diversification delta of the portfolio is:

$DD(P) = \dfrac{w_1 H(R_1) + w_2 H(R_2) - H(R_P)}{w_1 H(R_1) + w_2 H(R_2)}$   [6.50]

which simplifies to:

$DD(P) = 1 - \dfrac{\ln\left(2\pi e \sigma_P^2\right)}{w_1 \ln\left(2\pi e \sigma_1^2\right) + w_2 \ln\left(2\pi e \sigma_2^2\right)}$   [6.51]

3 See Cover and Thomas [COV 91].


EXAMPLE 6.1.– Consider the case of a portfolio made of two uncorrelated assets with normally distributed returns, with σ1 = 10% and σ2 = 2.5%. Figure 6.1 shows the diversification delta of the portfolio as a function of the weight of the first asset. We can clearly see that the diversification delta can be negative, which is not easy to interpret and contradicts Vermorken et al.’s statements. In addition, it displays an erratic behavior as the asset weights change.

EXAMPLE 6.2.– Consider the case of a portfolio made of two assets, where the returns of the second asset are simply defined as twice the returns of the first. It is clear that these two assets have zero diversification potential. Figure 6.2 shows the diversification delta of the portfolio as a function of the weight of the first asset. The result is not only different from zero but also negative, and it varies as a function of the asset weights, which makes no sense.

Figure 6.1. Diversification delta of a portfolio of two independent assets as a function of the weight of the first asset

Figure 6.2. Diversification delta of a portfolio of two related assets as a function of the weight of the first asset
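Example 6.1 can be reproduced in closed form under normality using equations [6.47] and [6.49]; the volatilities are read as 10% and 2.5%, and the weight below is one of the values for which the measure turns negative:

```python
import numpy as np

def H_normal(var):
    # Differential entropy of a normal distribution, equation [6.49]
    return 0.5 * np.log(2.0 * np.pi * np.e * var)

s1, s2 = 0.10, 0.025          # volatilities of the two uncorrelated assets
w = 0.1                       # weight of the first asset
var_p = (w * s1) ** 2 + ((1 - w) * s2) ** 2

avg_H = w * H_normal(s1 ** 2) + (1 - w) * H_normal(s2 ** 2)
dd = (avg_H - H_normal(var_p)) / avg_H           # diversification delta, eq. [6.47]

# Contrary to the claimed [0, 1] range, the diversification delta is negative here.
assert dd < 0.0
```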

To address the issues illustrated in the previous examples, one possible solution is to replace the differential entropy by an exponential entropy, which satisfies the following property:

$\exp\left( H\left( \sum_{i=1}^{N} w_i R_i \right) \right) \leq \sum_{i=1}^{N} w_i \exp\left( H(R_i) \right)$   [6.52]

The revised diversification delta becomes:

$RDD(P) = \dfrac{\sum_{i=1}^{N} w_i \exp\left( H(R_i) \right) - \exp\left( H\left( \sum_{i=1}^{N} w_i R_i \right) \right)}{\sum_{i=1}^{N} w_i \exp\left( H(R_i) \right)}$   [6.53]

Compared to the original diversification delta of equation [6.47], the first term of the numerator is now the weighted arithmetic mean of the exponential entropies of the assets, whereas the original measure implicitly corresponded to their weighted geometric mean. Salazar et al. [SAL 14] discuss the properties of this new measure and illustrate its usefulness in an empirical example of a portfolio of United States stocks and bonds.
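For normal returns, $\exp(H(R))$ is proportional to the standard deviation, so the revised measure [6.53] reduces to $1 - \sigma_P / \sum_i w_i \sigma_i$, which stays in [0, 1]; a quick sweep over weights for the assets of Example 6.1 confirms this:

```python
import numpy as np

def exp_H_normal(var):
    # Exponential differential entropy of a normal: sqrt(2*pi*e) * sigma
    return np.sqrt(2.0 * np.pi * np.e * var)

s1, s2 = 0.10, 0.025                             # same uncorrelated assets as Example 6.1
for w in np.linspace(0.01, 0.99, 99):
    var_p = (w * s1) ** 2 + ((1 - w) * s2) ** 2
    denom = w * exp_H_normal(s1 ** 2) + (1 - w) * exp_H_normal(s2 ** 2)
    rdd = (denom - exp_H_normal(var_p)) / denom  # revised delta, equation [6.53]
    assert 0.0 <= rdd <= 1.0                     # now well behaved
```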


6.1.4. Downside risk measures

An alternative path to considering the non-normality of return distributions in the portfolio construction problem is the use of risk measures more sophisticated than the variance. Common examples of such measures include the Value at Risk (VaR), the Conditional VaR (CVaR), the Average Drawdown (ADD), the Maximum Drawdown (MD), the Conditional Drawdown at Risk (CDaR), etc. All these new risk measures may be used to define new portfolio optimization problems, which typically focus on minimizing some sort of downside risk – see, for instance, Rockafellar and Uryasev [ROC 00], [ROC 02], Alexander and Baptista [ALE 03] and Chekhlov et al. [CHE 05]. Some of them allow for analytical risk decomposition, and therefore for portfolio risk budgeting as well – see Gourieroux et al. [GOU 00] or Scaillet [SCA 04]. Unfortunately, these new risk measures and their associated optimization programs are subject to the same old issue, namely estimation risk. In fact, the issue is even worse than when using the variance, because these risk measures typically focus on the left tail of the return distribution, for which fewer data points are available in the estimation process. For example, when looking at the CVaR at a 99% confidence level, we only have 1% of the observations to estimate the CVaR. With such a small sample, large estimation errors are highly likely – see, for instance, Lim et al. [LIM 11] for a discussion, whose conclusion is that CVaR is a coherent risk measure, but fragile in portfolio optimization. As an illustration, Allen et al. [ALL 16] compare various downside investment optimization strategies in a European context and observe that: (1) none of them dominates naïve diversification and (2) several of them end up with a portfolio of one asset only.

6.1.5. Extreme tail risk modeling

Some investors prefer to ignore what happens in the center of return distributions, which is typically captured by their first two moments, and to focus exclusively on their extreme tails. Modeling non-normally distributed extreme returns for a single asset is not easy, as there are very few observations, if any. Modeling the joint distribution of extreme returns for several assets and their dependence in the extreme tails is even harder.
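The scarcity of tail data behind such estimates is easy to quantify; with roughly four years of daily observations, a 99% CVaR estimate rests on just ten points (simulated illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
T, level = 1000, 0.99                            # about four years of daily returns

r = rng.standard_normal(T)                       # toy return sample
var_cut = np.quantile(r, 1.0 - level)            # empirical 99% VaR cut-off
tail = r[r <= var_cut]                           # observations feeding the CVaR
cvar = -tail.mean()                              # empirical 99% CVaR (as a loss)

# Only 1% of the sample is used, hence potentially large estimation errors.
assert len(tail) == 10
assert cvar > 0.0
```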


Nevertheless, several techniques have been developed to explicitly address this problem. Let us mention multivariate Extreme Value Theory (EVT) and non-parametric estimation techniques of the concordance between rare events4. These techniques are mathematically complex, and their application to a large number of assets is computationally intensive. Consequently, they are rarely used in practice. An alternative is to use copulas, which allow for a separate treatment of the margins of joint risks and the dependence structure between them. For an extensive survey, see Cherubini et al. [CHE 03], Embrechts et al. [EMB 03] or Schmidt [SCH 05a]. This approach is simpler from a computational perspective and allows using distributions with fatter tails than the normal (e.g. Laplace (double exponential), logistic and Cauchy) as models for the marginal distributions, while letting the copula take care of the dependence. Then, Monte Carlo simulations are used to simulate observations from the chosen joint distribution. Unfortunately, this approach essentially replaces estimation risk with specification risk in the choice of the copula. As an illustration, Heyde and Kou [HEY 04] show that even with 20 years of independently and identically distributed daily observations, we cannot distinguish between exponential and power-type tails; as a result, the adequate copula is extremely difficult to specify.

6.1.6. Diversification and heavy tail distributions

As already mentioned, empirical returns for financial assets tend to exhibit a higher frequency of extreme observations than what is predicted by a normal distribution. For instance, over the period 1950–1986, the United States market had a daily standard deviation approximately equal to 1%. If its returns were normally distributed, we should see a 5% daily variation every 14,000 years, a 6.2% daily change once in the life of the universe, and a −23% daily change almost never. Nevertheless, the latter did occur on 19 October 1987.
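The quoted frequencies follow directly from the normal tail probability (assuming 250 trading days per year):

```python
from math import erfc, sqrt

# With a 1% daily standard deviation, a -5% day is a 5-sigma event.
p5 = 0.5 * erfc(5.0 / sqrt(2.0))                 # P(Z < -5) under normality
years5 = 1.0 / (p5 * 250.0)                      # expected waiting time in years
assert 13_000 < years5 < 15_000                  # roughly once every 14,000 years

# A -23% day is a 23-sigma event: effectively impossible under normality.
p23 = 0.5 * erfc(23.0 / sqrt(2.0))
assert p23 < 1e-100
```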
To account for such low-probability but high-consequence events, many researchers have attempted to find tractable distributions to model returns. An early suggestion by Mandelbrot [MAN 63] was to use a

4 See Coles et al. [COL 99] or Hall and Tajvidi [HAL 00] for the former, and Dobrić and Schmid [DOB 05] or Schmidt and Stadtmüller [SCH 05b] for the latter.


power law – a heavy tail distribution5 in which the probability of observing a large return is proportional to a power of its magnitude. Mathematically:

$P(|R| > x) \sim x^{-\alpha}$   [6.54]

The constant coefficient $\alpha$, called the tail index, measures the likelihood of observing outliers. When $\alpha$ is very small, the tail of the distribution is very thick and returns have a highly dispersed distribution. As $\alpha$ gets larger, the tail looks more like that of a normal distribution. Modeling returns using power laws offers several advantages. First, numerous empirical studies have evidenced the presence of heavy-tailed time series in asset returns – see, for instance, Loretan and Phillips [LOR 94], Gabaix et al. [GAB 03], Rachev et al. [RAC 05] and references therein. Second, power laws display strong invariance under aggregation: when two independent power-law distributed variables are combined, either additively or multiplicatively, the one with the fattest tail dominates, and the tail exponent of the combined distribution is the minimum of the tail exponents of the two distributions being combined. Third, power laws characterize one of the three possible limiting distributions for extreme returns – see Lhabitant [LHA 04b].

However, a major issue when using heavy tail distributions is that some of their moments may not exist. More specifically, if $\alpha < 2$, then the variance does not exist, the central limit theorem no longer applies and sums no longer converge to a normal distribution. If $\alpha \leq 1$, the mean does not exist either. While the latter case is rare, several financial time series exhibit a tail index below 2 when modeled by a power law. Therefore, their variance does not exist. Consequently, the standard deviation or volatility ceases to be a good risk measure or a good basis for computing probabilities. The expected utility framework is no longer available either, as it typically involves assumptions on the existence of moments for the risks in consideration. Last but not least, in such a case, diversification is not necessarily optimal and can even increase “risk” – see Ibragimov and Walden [IBR 07] for examples of situations in which Value at Risk becomes a strictly increasing function of the degree of diversification.

It therefore seems that the use of heavy-tailed distributions creates more problems than it solves. In addition, we should apply common sense and remember that most traditional financial assets in a long-only portfolio have a

5 Statisticians divide distributions into three categories: (1) thin tailed, which have a lower and an upper limit, such as the uniform distribution; (2) medium tailed, which have exponentially declining tails, such as the normal distribution; and (3) heavy tailed, which have power law tails, such as the Pareto distribution.


limited liability. Stated differently, their return distribution has a lower limit at −100% for a long position. It therefore cannot have an infinite left tail. Furthermore, the situation becomes even more complex when short positions or derivatives are considered.

6.1.7. A note on conditional correlations

In practice, it is commonly observed that return correlations display different behaviors under different market conditions. Several studies have found them to be much higher than normal during turbulent market conditions – the latter being characterized by large negative returns, high volatility or both6. Reality – as often – is more complex than this, particularly when considering a longer time frame and several periods. Table 6.1 shows, for each of the five regions considered, its average correlation with the other regions, over five consecutive periods, namely the Dotcom Bubble (January 1999 to March 2000), the Dotcom Crisis (April 2000 to March 2003), the Bull Market 1 (April 2003 to March 2008), the Global Financial Crisis (April 2008 to March 2009) and the Bull Market 2 (April 2009 to March 2016). In general, the observations seem to confirm that during a period of large negative returns, average correlations increase. There is one exception, however: Emerging Markets saw their average correlation to other markets decrease between the Dotcom Bubble and the Dotcom Crisis.

Table 6.2 shows, for each sector of the Euro STOXX index, its average correlation with the other sectors, over the same five consecutive periods. As expected, all average correlations increased between the Dotcom Bubble and the Dotcom Crisis. Yet, average correlations did not necessarily decrease between the Dotcom Crisis and the Bull Market 1, nor increase between the Bull Market 1 and the Global Financial Crisis – consumer goods, consumer services and telecom actually saw their average correlation decrease then.
Also, over the entire sample and regardless of the period, the healthcare sector has seen a constant increase in its correlation to the other sectors.

6 See, for instance, Ang and Chen [ANG 02], Campbell et al. [CAM 02], Cappiello et al. [CAP 06], Chua et al. [CHU 09], Longin and Solnik [LON 95, LON 01], Loretan and English [LOR 00], Philips et al. [PHI 12], Meric and Meric [MER 97], Rey [REY 00], Sancetta and Satchell [SAN 07], and Solnik et al. [SOL 96].


Region               Dotcom    Dotcom    Bull       Global Fin.   Bull
                     Bubble    Crisis    Market 1   Crisis        Market 2
MSCI Emg Markets      0.41      0.08      0.60       0.83          0.71
MSCI U.S.A.           0.50      0.54      0.70       0.85          0.77
MSCI Japan Index      0.33      0.42      0.69       0.86          0.77
STOXX Europe 50       0.45      0.53      0.69       0.80          0.74
MSCI World Index      0.55      0.54      0.75       0.86          0.81
Average               0.45      0.42      0.69       0.84          0.76

Table 6.1. Average correlation for different market periods and different regions
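The per-region averages reported in Table 6.1 are simple to reproduce: for each index, average its pairwise correlations with all the other indices over the period. The sketch below illustrates the computation in Python with pandas; the column names and the simulated return series are purely illustrative, not the actual index data used in the tables.

```python
import numpy as np
import pandas as pd

def average_correlations(returns: pd.DataFrame) -> pd.Series:
    """For each column, average its correlation with every other column."""
    corr = returns.corr()
    n = len(corr)
    # Each row of the correlation matrix sums the n pairwise correlations,
    # including the diagonal self-correlation of 1, which we exclude.
    return (corr.sum(axis=1) - 1.0) / (n - 1)

# Illustrative simulated monthly returns: a shared factor induces
# positive cross-correlations, as with equity indices.
rng = np.random.default_rng(0)
common = rng.normal(size=60)
data = pd.DataFrame({
    region: 0.7 * common + 0.7 * rng.normal(size=60)
    for region in ["Emerging", "USA", "Japan", "Europe", "World"]
})
print(average_correlations(data).round(2))
```

Running the same function on returns restricted to each sub-period (e.g. by slicing a date-indexed DataFrame) would yield one column of Table 6.1 per period.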

Sector               Dotcom    Dotcom    Bull       Global Fin.   Bull
                     Bubble    Crisis    Market 1   Crisis        Market 2
Basic materials       0.24      0.56      0.51       0.53          0.67
Consumer goods        0.42      0.68      0.67       0.48          0.67
Consumer services     0.47      0.67      0.68       0.64          0.72
Financials            0.55      0.70      0.67       0.67          0.68
Healthcare            0.03      0.33      0.38       0.48          0.52
Industrials           0.54      0.67      0.69       0.71          0.75
Oil and gas           0.11      0.51      0.43       0.54          0.65
Technology            0.46      0.62      0.53       0.65          0.65
Telecom               0.40      0.42      0.53       0.36          0.54
Utilities             0.40      0.49      0.62       0.69          0.65
Average               0.36      0.56      0.57       0.58          0.65

Table 6.2. Average correlation for different market periods and different sectors of the Euro STOXX index

It therefore seems that for some markets, but not all, correlations have historically increased during periods of market turmoil. Several approaches have been suggested to capture this, and one of the simplest is the exceedance correlation introduced by Longin and Solnik [LON 01]. A correlation at an exceedance level is defined as the correlation between the returns on two assets when both returns register increases or decreases of more than θ standard deviations away from their means. If the returns of assets X and Y have been standardized (by deducting their respective means from the original returns and dividing the result by their respective standard deviations), their exceedance correlation is defined as:

\[
\begin{cases}
\operatorname{correl}\left( x, y \mid x > \theta,\; y > \theta \right) & \text{if } \theta \geq 0 \\
\operatorname{correl}\left( x, y \mid x < \theta,\; y < \theta \right) & \text{if } \theta < 0
\end{cases}
\]
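The piecewise definition above can be implemented directly: standardize both series, keep only the observations where both exceed the threshold (on the same side), and correlate what remains. A minimal Python sketch follows; the function name and the simulated series are illustrative, and real applications would use actual return data.

```python
import numpy as np

def exceedance_correlation(x, y, theta):
    """Exceedance correlation at level theta, in the Longin-Solnik spirit.

    Both series are standardized first. For theta >= 0 the correlation is
    computed on observations where both exceed +theta; for theta < 0, on
    observations where both fall below theta.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardize: subtract the mean, divide by the standard deviation
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    if theta >= 0:
        mask = (x > theta) & (y > theta)
    else:
        mask = (x < theta) & (y < theta)
    if mask.sum() < 3:
        return float("nan")  # too few joint exceedances for an estimate
    return float(np.corrcoef(x[mask], y[mask])[0, 1])

# Illustrative use on simulated correlated returns (not real market data)
rng = np.random.default_rng(42)
z = rng.standard_normal((5000, 2))
x = z[:, 0]
y = 0.6 * z[:, 0] + 0.8 * z[:, 1]  # population correlation of 0.6
for theta in (-1.0, 0.0, 1.0):
    print(theta, round(exceedance_correlation(x, y, theta), 3))
```

Note that as |θ| grows, fewer observations satisfy the joint condition, so the estimates become noisier; for a bivariate normal, exceedance correlations also decline mechanically at larger thresholds even though the unconditional correlation is constant, which is what makes the empirical comparison against the normal benchmark informative.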